diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2009-09-17 23:56:37 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2009-09-17 23:56:37 -0400 |
commit | 1218259b2d09c79ed1113d3a6dbb9a1d6391f5cb (patch) | |
tree | 8f07cd39f6a5f74f41d5be34bc0d843428f04082 /Documentation | |
parent | ca9a702e50287cf429f1c12832319a26a715e70b (diff) | |
parent | 0efb4d20723d58edbad29d1ff98a86b631adb5e6 (diff) |
Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (44 commits)
vsnprintf: remove duplicate comment of vsnprintf
softirq: add BLOCK_IOPOLL to softirq_to_name
oprofile: fix oprofile regression: select RING_BUFFER_ALLOW_SWAP
tracing: switch function prints from %pf to %ps
vsprintf: add %ps that is the same as %pS but is like %pf
tracing: Fix minor bugs for __unregister_ftrace_function_probe
tracing: remove notrace from __kprobes annotation
tracing: optimize global_trace_clock cachelines
MAINTAINERS: Update tracing tree details
ftrace: document function and function graph implementation
tracing: make testing syscall events a separate configuration
tracing: remove some unused macros
ftrace: add compile-time check on F_printk()
tracing: fix F_printk() typos
tracing: have TRACE_EVENT macro use __flags to not shadow parameter
tracing: add static to generated TRACE_EVENT functions
ring-buffer: typecast cmpxchg to fix PowerPC warning
tracing: add filter event logic to special, mmiotrace and boot tracers
tracing: remove trace_event_types.h
tracing: use the new trace_entries.h to create format files
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/trace/events.txt | 184 | ||||
-rw-r--r-- | Documentation/trace/ftrace-design.txt | 233 | ||||
-rw-r--r-- | Documentation/trace/ftrace.txt | 6 |
3 files changed, 422 insertions, 1 deletions
diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt index 90e8b3383ba2..78c45a87be57 100644 --- a/Documentation/trace/events.txt +++ b/Documentation/trace/events.txt | |||
@@ -1,7 +1,7 @@ | |||
1 | Event Tracing | 1 | Event Tracing |
2 | 2 | ||
3 | Documentation written by Theodore Ts'o | 3 | Documentation written by Theodore Ts'o |
4 | Updated by Li Zefan | 4 | Updated by Li Zefan and Tom Zanussi |
5 | 5 | ||
6 | 1. Introduction | 6 | 1. Introduction |
7 | =============== | 7 | =============== |
@@ -97,3 +97,185 @@ The format of this boot option is the same as described in section 2.1. | |||
97 | 97 | ||
98 | See The example provided in samples/trace_events | 98 | See The example provided in samples/trace_events |
99 | 99 | ||
100 | 4. Event formats | ||
101 | ================ | ||
102 | |||
103 | Each trace event has a 'format' file associated with it that contains | ||
104 | a description of each field in a logged event. This information can | ||
105 | be used to parse the binary trace stream, and is also the place to | ||
106 | find the field names that can be used in event filters (see section 5). | ||
107 | |||
108 | It also displays the format string that will be used to print the | ||
109 | event in text mode, along with the event name and ID used for | ||
110 | profiling. | ||
111 | |||
112 | Every event has a set of 'common' fields associated with it; these are | ||
113 | the fields prefixed with 'common_'. The other fields vary between | ||
114 | events and correspond to the fields defined in the TRACE_EVENT | ||
115 | definition for that event. | ||
116 | |||
117 | Each field in the format has the form: | ||
118 | |||
119 | field:field-type field-name; offset:N; size:N; | ||
120 | |||
121 | where offset is the offset of the field in the trace record and size | ||
122 | is the size of the data item, in bytes. | ||
123 | |||
124 | For example, here's the information displayed for the 'sched_wakeup' | ||
125 | event: | ||
126 | |||
127 | # cat /debug/tracing/events/sched/sched_wakeup/format | ||
128 | |||
129 | name: sched_wakeup | ||
130 | ID: 60 | ||
131 | format: | ||
132 | field:unsigned short common_type; offset:0; size:2; | ||
133 | field:unsigned char common_flags; offset:2; size:1; | ||
134 | field:unsigned char common_preempt_count; offset:3; size:1; | ||
135 | field:int common_pid; offset:4; size:4; | ||
136 | field:int common_tgid; offset:8; size:4; | ||
137 | |||
138 | field:char comm[TASK_COMM_LEN]; offset:12; size:16; | ||
139 | field:pid_t pid; offset:28; size:4; | ||
140 | field:int prio; offset:32; size:4; | ||
141 | field:int success; offset:36; size:4; | ||
142 | field:int cpu; offset:40; size:4; | ||
143 | |||
144 | print fmt: "task %s:%d [%d] success=%d [%03d]", REC->comm, REC->pid, | ||
145 | REC->prio, REC->success, REC->cpu | ||
146 | |||
147 | This event contains 10 fields, the first 5 common and the remaining 5 | ||
148 | event-specific. All the fields for this event are numeric, except for | ||
149 | 'comm' which is a string, a distinction important for event filtering. | ||
150 | |||
151 | 5. Event filtering | ||
152 | ================== | ||
153 | |||
154 | Trace events can be filtered in the kernel by associating boolean | ||
155 | 'filter expressions' with them. As soon as an event is logged into | ||
156 | the trace buffer, its fields are checked against the filter expression | ||
157 | associated with that event type. An event with field values that | ||
158 | 'match' the filter will appear in the trace output, and an event whose | ||
159 | values don't match will be discarded. An event with no filter | ||
160 | associated with it matches everything, and is the default when no | ||
161 | filter has been set for an event. | ||
162 | |||
163 | 5.1 Expression syntax | ||
164 | --------------------- | ||
165 | |||
166 | A filter expression consists of one or more 'predicates' that can be | ||
167 | combined using the logical operators '&&' and '||'. A predicate is | ||
168 | simply a clause that compares the value of a field contained within a | ||
169 | logged event with a constant value and returns either 0 or 1 depending | ||
170 | on whether the field value matched (1) or didn't match (0): | ||
171 | |||
172 | field-name relational-operator value | ||
173 | |||
174 | Parentheses can be used to provide arbitrary logical groupings and | ||
175 | double-quotes can be used to prevent the shell from interpreting | ||
176 | operators as shell metacharacters. | ||
177 | |||
178 | The field-names available for use in filters can be found in the | ||
179 | 'format' files for trace events (see section 4). | ||
180 | |||
181 | The relational-operators depend on the type of the field being tested: | ||
182 | |||
183 | The operators available for numeric fields are: | ||
184 | |||
185 | ==, !=, <, <=, >, >= | ||
186 | |||
187 | And for string fields they are: | ||
188 | |||
189 | ==, != | ||
190 | |||
191 | Currently, only exact string matches are supported. | ||
192 | |||
193 | Currently, the maximum number of predicates in a filter is 16. | ||
194 | |||
195 | 5.2 Setting filters | ||
196 | ------------------- | ||
197 | |||
198 | A filter for an individual event is set by writing a filter expression | ||
199 | to the 'filter' file for the given event. | ||
200 | |||
201 | For example: | ||
202 | |||
203 | # cd /debug/tracing/events/sched/sched_wakeup | ||
204 | # echo "common_preempt_count > 4" > filter | ||
205 | |||
206 | A slightly more involved example: | ||
207 | |||
208 | # cd /debug/tracing/events/sched/sched_signal_send | ||
209 | # echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter | ||
210 | |||
211 | If there is an error in the expression, you'll get an 'Invalid | ||
212 | argument' error when setting it, and the erroneous string along with | ||
213 | an error message can be seen by looking at the filter e.g.: | ||
214 | |||
215 | # cd /debug/tracing/events/sched/sched_signal_send | ||
216 | # echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter | ||
217 | -bash: echo: write error: Invalid argument | ||
218 | # cat filter | ||
219 | ((sig >= 10 && sig < 15) || dsig == 17) && comm != bash | ||
220 | ^ | ||
221 | parse_error: Field not found | ||
222 | |||
223 | Currently the caret ('^') for an error always appears at the beginning of | ||
224 | the filter string; the error message should still be useful though | ||
225 | even without more accurate position info. | ||
226 | |||
227 | 5.3 Clearing filters | ||
228 | -------------------- | ||
229 | |||
230 | To clear the filter for an event, write a '0' to the event's filter | ||
231 | file. | ||
232 | |||
233 | To clear the filters for all events in a subsystem, write a '0' to the | ||
234 | subsystem's filter file. | ||
235 | |||
236 | 5.3 Subsystem filters | ||
237 | --------------------- | ||
238 | |||
239 | For convenience, filters for every event in a subsystem can be set or | ||
240 | cleared as a group by writing a filter expression into the filter file | ||
241 | at the root of the subsytem. Note however, that if a filter for any | ||
242 | event within the subsystem lacks a field specified in the subsystem | ||
243 | filter, or if the filter can't be applied for any other reason, the | ||
244 | filter for that event will retain its previous setting. This can | ||
245 | result in an unintended mixture of filters which could lead to | ||
246 | confusing (to the user who might think different filters are in | ||
247 | effect) trace output. Only filters that reference just the common | ||
248 | fields can be guaranteed to propagate successfully to all events. | ||
249 | |||
250 | Here are a few subsystem filter examples that also illustrate the | ||
251 | above points: | ||
252 | |||
253 | Clear the filters on all events in the sched subsytem: | ||
254 | |||
255 | # cd /sys/kernel/debug/tracing/events/sched | ||
256 | # echo 0 > filter | ||
257 | # cat sched_switch/filter | ||
258 | none | ||
259 | # cat sched_wakeup/filter | ||
260 | none | ||
261 | |||
262 | Set a filter using only common fields for all events in the sched | ||
263 | subsytem (all events end up with the same filter): | ||
264 | |||
265 | # cd /sys/kernel/debug/tracing/events/sched | ||
266 | # echo common_pid == 0 > filter | ||
267 | # cat sched_switch/filter | ||
268 | common_pid == 0 | ||
269 | # cat sched_wakeup/filter | ||
270 | common_pid == 0 | ||
271 | |||
272 | Attempt to set a filter using a non-common field for all events in the | ||
273 | sched subsytem (all events but those that have a prev_pid field retain | ||
274 | their old filters): | ||
275 | |||
276 | # cd /sys/kernel/debug/tracing/events/sched | ||
277 | # echo prev_pid == 0 > filter | ||
278 | # cat sched_switch/filter | ||
279 | prev_pid == 0 | ||
280 | # cat sched_wakeup/filter | ||
281 | common_pid == 0 | ||
diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt new file mode 100644 index 000000000000..7003e10f10f5 --- /dev/null +++ b/Documentation/trace/ftrace-design.txt | |||
@@ -0,0 +1,233 @@ | |||
1 | function tracer guts | ||
2 | ==================== | ||
3 | |||
4 | Introduction | ||
5 | ------------ | ||
6 | |||
7 | Here we will cover the architecture pieces that the common function tracing | ||
8 | code relies on for proper functioning. Things are broken down into increasing | ||
9 | complexity so that you can start simple and at least get basic functionality. | ||
10 | |||
11 | Note that this focuses on architecture implementation details only. If you | ||
12 | want more explanation of a feature in terms of common code, review the common | ||
13 | ftrace.txt file. | ||
14 | |||
15 | |||
16 | Prerequisites | ||
17 | ------------- | ||
18 | |||
19 | Ftrace relies on these features being implemented: | ||
20 | STACKTRACE_SUPPORT - implement save_stack_trace() | ||
21 | TRACE_IRQFLAGS_SUPPORT - implement include/asm/irqflags.h | ||
22 | |||
23 | |||
24 | HAVE_FUNCTION_TRACER | ||
25 | -------------------- | ||
26 | |||
27 | You will need to implement the mcount and the ftrace_stub functions. | ||
28 | |||
29 | The exact mcount symbol name will depend on your toolchain. Some call it | ||
30 | "mcount", "_mcount", or even "__mcount". You can probably figure it out by | ||
31 | running something like: | ||
32 | $ echo 'main(){}' | gcc -x c -S -o - - -pg | grep mcount | ||
33 | call mcount | ||
34 | We'll make the assumption below that the symbol is "mcount" just to keep things | ||
35 | nice and simple in the examples. | ||
36 | |||
37 | Keep in mind that the ABI that is in effect inside of the mcount function is | ||
38 | *highly* architecture/toolchain specific. We cannot help you in this regard, | ||
39 | sorry. Dig up some old documentation and/or find someone more familiar than | ||
40 | you to bang ideas off of. Typically, register usage (argument/scratch/etc...) | ||
41 | is a major issue at this point, especially in relation to the location of the | ||
42 | mcount call (before/after function prologue). You might also want to look at | ||
43 | how glibc has implemented the mcount function for your architecture. It might | ||
44 | be (semi-)relevant. | ||
45 | |||
46 | The mcount function should check the function pointer ftrace_trace_function | ||
47 | to see if it is set to ftrace_stub. If it is, there is nothing for you to do, | ||
48 | so return immediately. If it isn't, then call that function in the same way | ||
49 | the mcount function normally calls __mcount_internal -- the first argument is | ||
50 | the "frompc" while the second argument is the "selfpc" (adjusted to remove the | ||
51 | size of the mcount call that is embedded in the function). | ||
52 | |||
53 | For example, if the function foo() calls bar(), when the bar() function calls | ||
54 | mcount(), the arguments mcount() will pass to the tracer are: | ||
55 | "frompc" - the address bar() will use to return to foo() | ||
56 | "selfpc" - the address bar() (with _mcount() size adjustment) | ||
57 | |||
58 | Also keep in mind that this mcount function will be called *a lot*, so | ||
59 | optimizing for the default case of no tracer will help the smooth running of | ||
60 | your system when tracing is disabled. So the start of the mcount function is | ||
61 | typically the bare min with checking things before returning. That also means | ||
62 | the code flow should usually kept linear (i.e. no branching in the nop case). | ||
63 | This is of course an optimization and not a hard requirement. | ||
64 | |||
65 | Here is some pseudo code that should help (these functions should actually be | ||
66 | implemented in assembly): | ||
67 | |||
68 | void ftrace_stub(void) | ||
69 | { | ||
70 | return; | ||
71 | } | ||
72 | |||
73 | void mcount(void) | ||
74 | { | ||
75 | /* save any bare state needed in order to do initial checking */ | ||
76 | |||
77 | extern void (*ftrace_trace_function)(unsigned long, unsigned long); | ||
78 | if (ftrace_trace_function != ftrace_stub) | ||
79 | goto do_trace; | ||
80 | |||
81 | /* restore any bare state */ | ||
82 | |||
83 | return; | ||
84 | |||
85 | do_trace: | ||
86 | |||
87 | /* save all state needed by the ABI (see paragraph above) */ | ||
88 | |||
89 | unsigned long frompc = ...; | ||
90 | unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE; | ||
91 | ftrace_trace_function(frompc, selfpc); | ||
92 | |||
93 | /* restore all state needed by the ABI */ | ||
94 | } | ||
95 | |||
96 | Don't forget to export mcount for modules ! | ||
97 | extern void mcount(void); | ||
98 | EXPORT_SYMBOL(mcount); | ||
99 | |||
100 | |||
101 | HAVE_FUNCTION_TRACE_MCOUNT_TEST | ||
102 | ------------------------------- | ||
103 | |||
104 | This is an optional optimization for the normal case when tracing is turned off | ||
105 | in the system. If you do not enable this Kconfig option, the common ftrace | ||
106 | code will take care of doing the checking for you. | ||
107 | |||
108 | To support this feature, you only need to check the function_trace_stop | ||
109 | variable in the mcount function. If it is non-zero, there is no tracing to be | ||
110 | done at all, so you can return. | ||
111 | |||
112 | This additional pseudo code would simply be: | ||
113 | void mcount(void) | ||
114 | { | ||
115 | /* save any bare state needed in order to do initial checking */ | ||
116 | |||
117 | + if (function_trace_stop) | ||
118 | + return; | ||
119 | |||
120 | extern void (*ftrace_trace_function)(unsigned long, unsigned long); | ||
121 | if (ftrace_trace_function != ftrace_stub) | ||
122 | ... | ||
123 | |||
124 | |||
125 | HAVE_FUNCTION_GRAPH_TRACER | ||
126 | -------------------------- | ||
127 | |||
128 | Deep breath ... time to do some real work. Here you will need to update the | ||
129 | mcount function to check ftrace graph function pointers, as well as implement | ||
130 | some functions to save (hijack) and restore the return address. | ||
131 | |||
132 | The mcount function should check the function pointers ftrace_graph_return | ||
133 | (compare to ftrace_stub) and ftrace_graph_entry (compare to | ||
134 | ftrace_graph_entry_stub). If either of those are not set to the relevant stub | ||
135 | function, call the arch-specific function ftrace_graph_caller which in turn | ||
136 | calls the arch-specific function prepare_ftrace_return. Neither of these | ||
137 | function names are strictly required, but you should use them anyways to stay | ||
138 | consistent across the architecture ports -- easier to compare & contrast | ||
139 | things. | ||
140 | |||
141 | The arguments to prepare_ftrace_return are slightly different than what are | ||
142 | passed to ftrace_trace_function. The second argument "selfpc" is the same, | ||
143 | but the first argument should be a pointer to the "frompc". Typically this is | ||
144 | located on the stack. This allows the function to hijack the return address | ||
145 | temporarily to have it point to the arch-specific function return_to_handler. | ||
146 | That function will simply call the common ftrace_return_to_handler function and | ||
147 | that will return the original return address with which, you can return to the | ||
148 | original call site. | ||
149 | |||
150 | Here is the updated mcount pseudo code: | ||
151 | void mcount(void) | ||
152 | { | ||
153 | ... | ||
154 | if (ftrace_trace_function != ftrace_stub) | ||
155 | goto do_trace; | ||
156 | |||
157 | +#ifdef CONFIG_FUNCTION_GRAPH_TRACER | ||
158 | + extern void (*ftrace_graph_return)(...); | ||
159 | + extern void (*ftrace_graph_entry)(...); | ||
160 | + if (ftrace_graph_return != ftrace_stub || | ||
161 | + ftrace_graph_entry != ftrace_graph_entry_stub) | ||
162 | + ftrace_graph_caller(); | ||
163 | +#endif | ||
164 | |||
165 | /* restore any bare state */ | ||
166 | ... | ||
167 | |||
168 | Here is the pseudo code for the new ftrace_graph_caller assembly function: | ||
169 | #ifdef CONFIG_FUNCTION_GRAPH_TRACER | ||
170 | void ftrace_graph_caller(void) | ||
171 | { | ||
172 | /* save all state needed by the ABI */ | ||
173 | |||
174 | unsigned long *frompc = &...; | ||
175 | unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE; | ||
176 | prepare_ftrace_return(frompc, selfpc); | ||
177 | |||
178 | /* restore all state needed by the ABI */ | ||
179 | } | ||
180 | #endif | ||
181 | |||
182 | For information on how to implement prepare_ftrace_return(), simply look at | ||
183 | the x86 version. The only architecture-specific piece in it is the setup of | ||
184 | the fault recovery table (the asm(...) code). The rest should be the same | ||
185 | across architectures. | ||
186 | |||
187 | Here is the pseudo code for the new return_to_handler assembly function. Note | ||
188 | that the ABI that applies here is different from what applies to the mcount | ||
189 | code. Since you are returning from a function (after the epilogue), you might | ||
190 | be able to skimp on things saved/restored (usually just registers used to pass | ||
191 | return values). | ||
192 | |||
193 | #ifdef CONFIG_FUNCTION_GRAPH_TRACER | ||
194 | void return_to_handler(void) | ||
195 | { | ||
196 | /* save all state needed by the ABI (see paragraph above) */ | ||
197 | |||
198 | void (*original_return_point)(void) = ftrace_return_to_handler(); | ||
199 | |||
200 | /* restore all state needed by the ABI */ | ||
201 | |||
202 | /* this is usually either a return or a jump */ | ||
203 | original_return_point(); | ||
204 | } | ||
205 | #endif | ||
206 | |||
207 | |||
208 | HAVE_FTRACE_NMI_ENTER | ||
209 | --------------------- | ||
210 | |||
211 | If you can't trace NMI functions, then skip this option. | ||
212 | |||
213 | <details to be filled> | ||
214 | |||
215 | |||
216 | HAVE_FTRACE_SYSCALLS | ||
217 | --------------------- | ||
218 | |||
219 | <details to be filled> | ||
220 | |||
221 | |||
222 | HAVE_FTRACE_MCOUNT_RECORD | ||
223 | ------------------------- | ||
224 | |||
225 | See scripts/recordmcount.pl for more info. | ||
226 | |||
227 | <details to be filled> | ||
228 | |||
229 | |||
230 | HAVE_DYNAMIC_FTRACE | ||
231 | --------------------- | ||
232 | |||
233 | <details to be filled> | ||
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt index 355d0f1f8c50..1b6292bbdd6d 100644 --- a/Documentation/trace/ftrace.txt +++ b/Documentation/trace/ftrace.txt | |||
@@ -26,6 +26,12 @@ disabled, and more (ftrace allows for tracer plugins, which | |||
26 | means that the list of tracers can always grow). | 26 | means that the list of tracers can always grow). |
27 | 27 | ||
28 | 28 | ||
29 | Implementation Details | ||
30 | ---------------------- | ||
31 | |||
32 | See ftrace-design.txt for details for arch porters and such. | ||
33 | |||
34 | |||
29 | The File System | 35 | The File System |
30 | --------------- | 36 | --------------- |
31 | 37 | ||