diff options
Diffstat (limited to 'Documentation/ftrace.txt')
-rw-r--r-- | Documentation/ftrace.txt | 403 |
1 files changed, 205 insertions, 198 deletions
diff --git a/Documentation/ftrace.txt b/Documentation/ftrace.txt index 13e4bf054c38..f218f616ff6b 100644 --- a/Documentation/ftrace.txt +++ b/Documentation/ftrace.txt | |||
@@ -2,8 +2,12 @@ | |||
2 | ======================== | 2 | ======================== |
3 | 3 | ||
4 | Copyright 2008 Red Hat Inc. | 4 | Copyright 2008 Red Hat Inc. |
5 | Author: Steven Rostedt <srostedt@redhat.com> | 5 | Author: Steven Rostedt <srostedt@redhat.com> |
6 | License: The GNU Free Documentation License, Version 1.2 | ||
7 | Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton, | ||
8 | John Kacur, and David Teigland. | ||
6 | 9 | ||
10 | Written for: 2.6.27-rc1 | ||
7 | 11 | ||
8 | Introduction | 12 | Introduction |
9 | ------------ | 13 | ------------ |
@@ -15,10 +19,11 @@ issues that take place outside of user-space. | |||
15 | 19 | ||
16 | Although ftrace is the function tracer, it also includes an | 20 | Although ftrace is the function tracer, it also includes an |
17 | infrastructure that allows for other types of tracing. Some of the | 21 | infrastructure that allows for other types of tracing. Some of the |
18 | tracers that are currently in ftrace is a tracer to trace | 22 | tracers that are currently in ftrace include a tracer to trace |
19 | context switches, the time it takes for a high priority task to | 23 | context switches, the time it takes for a high priority task to |
20 | run after it was woken up, the time interrupts are disabled, and | 24 | run after it was woken up, the time interrupts are disabled, and |
21 | more. | 25 | more (ftrace allows for tracer plugins, which means that the list of |
26 | tracers can always grow). | ||
22 | 27 | ||
23 | 28 | ||
24 | The File System | 29 | The File System |
@@ -32,6 +37,8 @@ To mount the debugfs system: | |||
32 | # mkdir /debug | 37 | # mkdir /debug |
33 | # mount -t debugfs nodev /debug | 38 | # mount -t debugfs nodev /debug |
34 | 39 | ||
40 | (Note: it is more common to mount at /sys/kernel/debug, but for simplicity | ||
41 | this document will use /debug) | ||
35 | 42 | ||
36 | That's it! (assuming that you have ftrace configured into your kernel) | 43 | That's it! (assuming that you have ftrace configured into your kernel) |
37 | 44 | ||
@@ -46,21 +53,20 @@ of ftrace. Here is a list of some of the key files: | |||
46 | that is configured. | 53 | that is configured. |
47 | 54 | ||
48 | available_tracers : This holds the different types of tracers that | 55 | available_tracers : This holds the different types of tracers that |
49 | has been compiled into the kernel. The tracers | 56 | have been compiled into the kernel. The tracers |
50 | listed here can be configured by echoing in their | 57 | listed here can be configured by echoing their name |
51 | name into current_tracer. | 58 | into current_tracer. |
52 | 59 | ||
53 | tracing_enabled : This sets or displays whether the current_tracer | 60 | tracing_enabled : This sets or displays whether the current_tracer |
54 | is activated and tracing or not. Echo 0 into this | 61 | is activated and tracing or not. Echo 0 into this |
55 | file to disable the tracer or 1 (or non-zero) to | 62 | file to disable the tracer or 1 to enable it. |
56 | enable it. | ||
57 | 63 | ||
58 | trace : This file holds the output of the trace in a human readable | 64 | trace : This file holds the output of the trace in a human readable |
59 | format. | 65 | format (described below). |
60 | 66 | ||
61 | latency_trace : This file shows the same trace but the information | 67 | latency_trace : This file shows the same trace but the information |
62 | is organized more to display possible latencies | 68 | is organized more to display possible latencies |
63 | in the system. | 69 | in the system (described below). |
64 | 70 | ||
65 | trace_pipe : The output is the same as the "trace" file but this | 71 | trace_pipe : The output is the same as the "trace" file but this |
66 | file is meant to be streamed with live tracing. | 72 | file is meant to be streamed with live tracing. |
@@ -72,7 +78,7 @@ of ftrace. Here is a list of some of the key files: | |||
72 | file, it is consumed, and will not be read | 78 | file, it is consumed, and will not be read |
73 | again with a sequential read. The "trace" and | 79 | again with a sequential read. The "trace" and |
74 | "latency_trace" files are static, and if the | 80 | "latency_trace" files are static, and if the |
75 | tracer isn't adding more data, they will display | 81 | tracer is not adding more data, they will display |
76 | the same information every time they are read. | 82 | the same information every time they are read. |
77 | 83 | ||
78 | iter_ctrl : This file lets the user control the amount of data | 84 | iter_ctrl : This file lets the user control the amount of data |
@@ -89,12 +95,14 @@ of ftrace. Here is a list of some of the key files: | |||
89 | 95 | ||
90 | trace_entries : This sets or displays the number of trace | 96 | trace_entries : This sets or displays the number of trace |
91 | entries each CPU buffer can hold. The tracer buffers | 97 | entries each CPU buffer can hold. The tracer buffers |
92 | are the same size for each CPU, so care must be | 98 | are the same size for each CPU. The displayed number |
93 | taken when modifying the trace_entries. The number | 99 | is the size of the CPU buffer and not total size. The |
94 | of actually entries will be the number given | 100 | trace buffers are allocated in pages (blocks of memory |
95 | times the number of possible CPUS. The buffers | 101 | that the kernel uses for allocation, usually 4 KB in size). |
96 | are saved as individual pages, and the actual entries | 102 | Since each entry is smaller than a page, if the last |
97 | will always be rounded up to entries per page. | 103 | allocated page has room for more entries than were |
104 | requested, the rest of the page is used to allocate | ||
105 | entries. | ||
98 | 106 | ||
99 | This can only be updated when the current_tracer | 107 | This can only be updated when the current_tracer |
100 | is set to "none". | 108 | is set to "none". |
@@ -107,20 +115,19 @@ of ftrace. Here is a list of some of the key files: | |||
107 | on specified CPUS. The format is a hex string | 115 | on specified CPUS. The format is a hex string |
108 | representing the CPUS. | 116 | representing the CPUS. |
109 | 117 | ||
110 | set_ftrace_filter : When dynamic ftrace is configured in, the | 118 | set_ftrace_filter : When dynamic ftrace is configured in (see the |
111 | code is dynamically modified to disable calling | 119 | section below "dynamic ftrace"), the code is dynamically |
112 | of the function profiler (mcount). This lets | 120 | modified (code text rewrite) to disable calling of the |
113 | tracing be configured in with practically no overhead | 121 | function profiler (mcount). This lets tracing be configured |
114 | in performance. This also has a side effect of | 122 | in with practically no overhead in performance. This also |
115 | enabling or disabling specific functions to be | 123 | has a side effect of enabling or disabling specific functions |
116 | traced. Echoing in names of functions into this | 124 | to be traced. Echoing names of functions into this file |
117 | file will limit the trace to only those files. | 125 | will limit the trace to only those functions. |
118 | 126 | ||
119 | set_ftrace_notrace: This has the opposite effect that | 127 | set_ftrace_notrace: This has an effect opposite to that of |
120 | set_ftrace_filter has. Any function that is added | 128 | set_ftrace_filter. Any function that is added here will not |
121 | here will not be traced. If a function exists | 129 | be traced. If a function exists in both set_ftrace_filter |
122 | in both set_ftrace_filter and set_ftrace_notrace | 130 | and set_ftrace_notrace, the function will _not_ be traced. |
123 | the function will _not_ bet traced. | ||
124 | 131 | ||
125 | available_filter_functions : When a function is encountered the first | 132 | available_filter_functions : When a function is encountered the first |
126 | time by the dynamic tracer, it is recorded and | 133 | time by the dynamic tracer, it is recorded and |
@@ -128,32 +135,31 @@ of ftrace. Here is a list of some of the key files: | |||
128 | lists the functions that have been recorded | 135 | lists the functions that have been recorded |
129 | by the dynamic tracer and these functions can | 136 | by the dynamic tracer and these functions can |
130 | be used to set the ftrace filter by the above | 137 | be used to set the ftrace filter by the above |
131 | "set_ftrace_filter" file. | 138 | "set_ftrace_filter" file. (See the section "dynamic ftrace" |
139 | below for more details). | ||
132 | 140 | ||
133 | 141 | ||
134 | The Tracers | 142 | The Tracers |
135 | ----------- | 143 | ----------- |
136 | 144 | ||
137 | Here are the list of current tracers that can be configured. | 145 | Here is the list of current tracers that may be configured. |
138 | 146 | ||
139 | ftrace - function tracer that uses mcount to trace all functions. | 147 | ftrace - function tracer that uses mcount to trace all functions. |
140 | It is possible to filter out which functions that are | ||
141 | traced when dynamic ftrace is configured in. | ||
142 | 148 | ||
143 | sched_switch - traces the context switches between tasks. | 149 | sched_switch - traces the context switches between tasks. |
144 | 150 | ||
145 | irqsoff - traces the areas that disable interrupts and saves off | 151 | irqsoff - traces the areas that disable interrupts and saves |
146 | the trace with the longest max latency. | 152 | the trace with the longest max latency. |
147 | See tracing_max_latency. When a new max is recorded, | 153 | See tracing_max_latency. When a new max is recorded, |
148 | it replaces the old trace. It is best to view this | 154 | it replaces the old trace. It is best to view this |
149 | trace with the latency_trace file. | 155 | trace via the latency_trace file. |
150 | 156 | ||
151 | preemptoff - Similar to irqsoff but traces and records the time | 157 | preemptoff - Similar to irqsoff but traces and records the amount of |
152 | preemption is disabled. | 158 | time for which preemption is disabled. |
153 | 159 | ||
154 | preemptirqsoff - Similar to irqsoff and preemptoff, but traces and | 160 | preemptirqsoff - Similar to irqsoff and preemptoff, but traces and |
155 | records the largest time irqs and/or preemption is | 161 | records the largest time for which irqs and/or preemption |
156 | disabled. | 162 | is disabled. |
157 | 163 | ||
158 | wakeup - Traces and records the max latency that it takes for | 164 | wakeup - Traces and records the max latency that it takes for |
159 | the highest priority task to get scheduled after | 165 | the highest priority task to get scheduled after |
@@ -166,13 +172,13 @@ Here are the list of current tracers that can be configured. | |||
166 | Examples of using the tracer | 172 | Examples of using the tracer |
167 | ---------------------------- | 173 | ---------------------------- |
168 | 174 | ||
169 | Here are typical examples of using the tracers with only controlling | 175 | Here are typical examples of using the tracers when controlling them only |
170 | them with the debugfs interface (without using any user-land utilities). | 176 | with the debugfs interface (without using any user-land utilities). |
171 | 177 | ||
172 | Output format: | 178 | Output format: |
173 | -------------- | 179 | -------------- |
174 | 180 | ||
175 | Here's an example of the output format of the file "trace" | 181 | Here is an example of the output format of the file "trace" |
176 | 182 | ||
177 | -------- | 183 | -------- |
178 | # tracer: ftrace | 184 | # tracer: ftrace |
@@ -184,14 +190,15 @@ Here's an example of the output format of the file "trace" | |||
184 | bash-4251 [01] 10152.583855: _atomic_dec_and_lock <-dput | 190 | bash-4251 [01] 10152.583855: _atomic_dec_and_lock <-dput |
185 | -------- | 191 | -------- |
186 | 192 | ||
187 | A header is printed with the trace that is represented. In this case | 193 | A header is printed with the tracer name that is represented by the trace. |
188 | the tracer is "ftrace". Then a header showing the format. Task name | 194 | In this case the tracer is "ftrace". Then a header showing the format. Task |
189 | "bash", the task PID "4251", the CPU that it was running on | 195 | name "bash", the task PID "4251", the CPU that it was running on |
190 | "01", the timestamp in <secs>.<usecs> format, the function name that was | 196 | "01", the timestamp in <secs>.<usecs> format, the function name that was |
191 | traced "path_put" and the parent function that called this function | 197 | traced "path_put" and the parent function that called this function |
192 | "path_walk". | 198 | "path_walk". The timestamp is the time at which the function was |
199 | entered. | ||
193 | 200 | ||
194 | The sched_switch tracer also includes tracing of task wake ups and | 201 | The sched_switch tracer also includes tracing of task wakeups and |
195 | context switches. | 202 | context switches. |
196 | 203 | ||
197 | ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 2916:115:S | 204 | ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 2916:115:S |
@@ -201,7 +208,7 @@ context switches. | |||
201 | kondemand/1-2916 [01] 1453.070013: 2916:115:S ==> 7:115:R | 208 | kondemand/1-2916 [01] 1453.070013: 2916:115:S ==> 7:115:R |
202 | ksoftirqd/1-7 [01] 1453.070013: 7:115:S ==> 0:140:R | 209 | ksoftirqd/1-7 [01] 1453.070013: 7:115:S ==> 0:140:R |
203 | 210 | ||
204 | Wake ups are represented by a "+" and the context switches show | 211 | Wake ups are represented by a "+" and the context switches are shown as |
205 | "==>". The format is: | 212 | "==>". The format is: |
206 | 213 | ||
207 | Context switches: | 214 | Context switches: |
@@ -216,7 +223,7 @@ Wake ups are represented by a "+" and the context switches show | |||
216 | 223 | ||
217 | <pid>:<prio>:<state> + <pid>:<prio>:<state> | 224 | <pid>:<prio>:<state> + <pid>:<prio>:<state> |
218 | 225 | ||
219 | The prio is the internal kernel priority, which is inverse to the | 226 | The prio is the internal kernel priority, which is the inverse of the |
220 | priority that is usually displayed by user-space tools. Zero represents | 227 | priority that is usually displayed by user-space tools. Zero represents |
221 | the highest priority (99). Prio 100 starts the "nice" priorities with | 228 | the highest priority (99). Prio 100 starts the "nice" priorities with |
222 | 100 being equal to nice -20 and 139 being nice 19. The prio "140" is | 229 | 100 being equal to nice -20 and 139 being nice 19. The prio "140" is |
@@ -227,7 +234,7 @@ Latency trace format | |||
227 | -------------------- | 234 | -------------------- |
228 | 235 | ||
229 | For traces that display latency times, the latency_trace file gives | 236 | For traces that display latency times, the latency_trace file gives |
230 | a bit more information to see why a latency happened. Here's a typical | 237 | somewhat more information to see why a latency happened. Here is a typical |
231 | trace. | 238 | trace. |
232 | 239 | ||
233 | # tracer: irqsoff | 240 | # tracer: irqsoff |
@@ -255,21 +262,20 @@ irqsoff latency trace v1.1.5 on 2.6.26-rc8 | |||
255 | <idle>-0 0d.s1 98us : trace_hardirqs_on (do_softirq) | 262 | <idle>-0 0d.s1 98us : trace_hardirqs_on (do_softirq) |
256 | 263 | ||
257 | 264 | ||
258 | vim:ft=help | ||
259 | |||
260 | 265 | ||
261 | This shows that the current tracer is "irqsoff" tracing the time | 266 | This shows that the current tracer is "irqsoff" tracing the time for which |
262 | interrupts are disabled. It gives the trace version and the kernel | 267 | interrupts were disabled. It gives the trace version and the version |
263 | this was executed on (2.6.26-rc8). Then it displays the max latency | 268 | of the kernel upon which this was executed on (2.6.26-rc8). Then it displays |
264 | in microsecs (97 us). The number of trace entries displayed | 269 | the max latency in microsecs (97 us). The number of trace entries displayed |
265 | by the total number recorded (both are three: #3/3). The type of | 270 | and the total number recorded (both are three: #3/3). The type of |
266 | preemption that was used (PREEMPT). VP, KP, SP, and HP are always zero | 271 | preemption that was used (PREEMPT). VP, KP, SP, and HP are always zero |
267 | and reserved for later use. #P is the number of online CPUS (#P:2). | 272 | and are reserved for later use. #P is the number of online CPUS (#P:2). |
268 | 273 | ||
269 | The task is the process that was running when the latency happened. | 274 | The task is the process that was running when the latency occurred. |
270 | (swapper pid: 0). | 275 | (swapper pid: 0). |
271 | 276 | ||
272 | The start and stop that caused the latencies: | 277 | The start and stop (the functions in which the interrupts were disabled and |
278 | enabled respectively) that caused the latencies: | ||
273 | 279 | ||
274 | apic_timer_interrupt is where the interrupts were disabled. | 280 | apic_timer_interrupt is where the interrupts were disabled. |
275 | do_softirq is where they were enabled again. | 281 | do_softirq is where they were enabled again. |
@@ -281,14 +287,14 @@ explains which is which. | |||
281 | 287 | ||
282 | pid: The PID of that process. | 288 | pid: The PID of that process. |
283 | 289 | ||
284 | CPU#: The CPU that the process was running on. | 290 | CPU#: The CPU which the process was running on. |
285 | 291 | ||
286 | irqs-off: 'd' interrupts are disabled. '.' otherwise. | 292 | irqs-off: 'd' interrupts are disabled. '.' otherwise. |
287 | 293 | ||
288 | need-resched: 'N' task need_resched is set, '.' otherwise. | 294 | need-resched: 'N' task need_resched is set, '.' otherwise. |
289 | 295 | ||
290 | hardirq/softirq: | 296 | hardirq/softirq: |
291 | 'H' - hard irq happened inside a softirq. | 297 | 'H' - hard irq occurred inside a softirq. |
292 | 'h' - hard irq is running | 298 | 'h' - hard irq is running |
293 | 's' - soft irq is running | 299 | 's' - soft irq is running |
294 | '.' - normal context. | 300 | '.' - normal context. |
@@ -297,13 +303,13 @@ explains which is which. | |||
297 | 303 | ||
298 | The above is mostly meaningful for kernel developers. | 304 | The above is mostly meaningful for kernel developers. |
299 | 305 | ||
300 | time: This differs from the trace output where as the trace output | 306 | time: This differs from the trace file output. The trace file output |
301 | contained a absolute timestamp. This timestamp is relative | 307 | includes an absolute timestamp. The timestamp used by the |
302 | to the start of the first entry in the the trace. | 308 | latency_trace file is relative to the start of the trace. |
303 | 309 | ||
304 | delay: This is just to help catch your eye a bit better. And | 310 | delay: This is just to help catch your eye a bit better. And |
305 | needs to be fixed to be only relative to the same CPU. | 311 | needs to be fixed to be only relative to the same CPU. |
306 | The marks is determined by the difference between this | 312 | The marks are determined by the difference between this |
307 | current trace and the next trace. | 313 | current trace and the next trace. |
308 | '!' - greater than preempt_mark_thresh (default 100) | 314 | '!' - greater than preempt_mark_thresh (default 100) |
309 | '+' - greater than 1 microsecond | 315 | '+' - greater than 1 microsecond |
@@ -322,13 +328,13 @@ output. To see what is available, simply cat the file: | |||
322 | print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \ | 328 | print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \ |
323 | noblock nostacktrace nosched-tree | 329 | noblock nostacktrace nosched-tree |
324 | 330 | ||
325 | To disable one of the options, echo in the option appended with "no". | 331 | To disable one of the options, echo in the option prepended with "no". |
326 | 332 | ||
327 | echo noprint-parent > /debug/tracing/iter_ctrl | 333 | echo noprint-parent > /debug/tracing/iter_ctrl |
328 | 334 | ||
329 | To enable an option, leave off the "no". | 335 | To enable an option, leave off the "no". |
330 | 336 | ||
331 | echo sym-offest > /debug/tracing/iter_ctrl | 337 | echo sym-offset > /debug/tracing/iter_ctrl |
332 | 338 | ||
333 | Here are the available options: | 339 | Here are the available options: |
334 | 340 | ||
@@ -344,7 +350,7 @@ Here are the available options: | |||
344 | 350 | ||
345 | sym-offset - Display not only the function name, but also the offset | 351 | sym-offset - Display not only the function name, but also the offset |
346 | in the function. For example, instead of seeing just | 352 | in the function. For example, instead of seeing just |
347 | "ktime_get" you will see "ktime_get+0xb/0x20" | 353 | "ktime_get", you will see "ktime_get+0xb/0x20". |
348 | 354 | ||
349 | sym-offset: | 355 | sym-offset: |
350 | bash-4000 [01] 1477.606694: simple_strtoul+0x6/0xa0 | 356 | bash-4000 [01] 1477.606694: simple_strtoul+0x6/0xa0 |
@@ -364,7 +370,7 @@ Here are the available options: | |||
364 | user applications that can translate the raw numbers better than | 370 | user applications that can translate the raw numbers better than |
365 | having it done in the kernel. | 371 | having it done in the kernel. |
366 | 372 | ||
367 | hex - similar to raw, but the numbers will be in a hexadecimal format. | 373 | hex - Similar to raw, but the numbers will be in a hexadecimal format. |
368 | 374 | ||
369 | bin - This will print out the formats in raw binary. | 375 | bin - This will print out the formats in raw binary. |
370 | 376 | ||
@@ -380,8 +386,8 @@ Here are the available options: | |||
380 | sched_switch | 386 | sched_switch |
381 | ------------ | 387 | ------------ |
382 | 388 | ||
383 | This tracer simply records schedule switches. Here's an example | 389 | This tracer simply records schedule switches. Here is an example |
384 | on how to implement it. | 390 | of how to use it. |
385 | 391 | ||
386 | # echo sched_switch > /debug/tracing/current_tracer | 392 | # echo sched_switch > /debug/tracing/current_tracer |
387 | # echo 1 > /debug/tracing/tracing_enabled | 393 | # echo 1 > /debug/tracing/tracing_enabled |
@@ -416,8 +422,8 @@ the name of the trace and points to the options. The "FUNCTION" | |||
416 | is a misnomer since here it represents the wake ups and context | 422 | is a misnomer since here it represents the wake ups and context |
417 | switches. | 423 | switches. |
418 | 424 | ||
419 | The sched_switch only lists the wake ups (represented with '+') | 425 | The sched_switch file only lists the wake ups (represented with '+') |
420 | and context switches ('==>') with the previous task or current | 426 | and context switches ('==>') with the previous task or current task |
421 | first followed by the next task or task waking up. The format for both | 427 | first followed by the next task or task waking up. The format for both |
422 | of these is PID:KERNEL-PRIO:TASK-STATE. Remember that the KERNEL-PRIO | 428 | of these is PID:KERNEL-PRIO:TASK-STATE. Remember that the KERNEL-PRIO |
423 | is the inverse of the actual priority with zero (0) being the highest | 429 | is the inverse of the actual priority with zero (0) being the highest |
@@ -432,7 +438,8 @@ The task states are: | |||
432 | 438 | ||
433 | R - running : wants to run, may not actually be running | 439 | R - running : wants to run, may not actually be running |
434 | S - sleep : process is waiting to be woken up (handles signals) | 440 | S - sleep : process is waiting to be woken up (handles signals) |
435 | D - deep sleep : process must be woken up (ignores signals) | 441 | D - disk sleep (uninterruptible sleep) : process must be woken up |
442 | (ignores signals) | ||
436 | T - stopped : process suspended | 443 | T - stopped : process suspended |
437 | t - traced : process is being traced (with something like gdb) | 444 | t - traced : process is being traced (with something like gdb) |
438 | Z - zombie : process waiting to be cleaned up | 445 | Z - zombie : process waiting to be cleaned up |
@@ -442,8 +449,8 @@ The task states are: | |||
442 | ftrace_enabled | 449 | ftrace_enabled |
443 | -------------- | 450 | -------------- |
444 | 451 | ||
445 | The following tracers give different output depending on whether | 452 | The following tracers (listed below) give different output depending |
446 | or not the sysctl ftrace_enabled is set. To set ftrace_enabled, | 453 | on whether or not the sysctl ftrace_enabled is set. To set ftrace_enabled, |
447 | one can either use the sysctl function or set it via the proc | 454 | one can either use the sysctl function or set it via the proc |
448 | file system interface. | 455 | file system interface. |
449 | 456 | ||
@@ -470,13 +477,12 @@ interrupt from triggering or the mouse interrupt from letting the | |||
470 | kernel know of a new mouse event. The result is a latency with the | 477 | kernel know of a new mouse event. The result is a latency with the |
471 | reaction time. | 478 | reaction time. |
472 | 479 | ||
473 | The irqsoff tracer tracks the time interrupts are disabled and when | 480 | The irqsoff tracer tracks the time for which interrupts are disabled. |
474 | they are re-enabled. When a new maximum latency is hit, it saves off | 481 | When a new maximum latency is hit, the tracer saves the trace leading up |
475 | the trace so that it may be retrieved at a later time. Every time a | 482 | to that latency point so that every time a new maximum is reached, the old |
476 | new maximum in reached, the old saved trace is discarded and the new | 483 | saved trace is discarded and the new trace is saved. |
477 | trace is saved. | ||
478 | 484 | ||
479 | To reset the maximum, echo 0 into tracing_max_latency. Here's an | 485 | To reset the maximum, echo 0 into tracing_max_latency. Here is an |
480 | example: | 486 | example: |
481 | 487 | ||
482 | # echo irqsoff > /debug/tracing/current_tracer | 488 | # echo irqsoff > /debug/tracing/current_tracer |
@@ -488,14 +494,14 @@ example: | |||
488 | # cat /debug/tracing/latency_trace | 494 | # cat /debug/tracing/latency_trace |
489 | # tracer: irqsoff | 495 | # tracer: irqsoff |
490 | # | 496 | # |
491 | irqsoff latency trace v1.1.5 on 2.6.26-rc8 | 497 | irqsoff latency trace v1.1.5 on 2.6.26 |
492 | -------------------------------------------------------------------- | 498 | -------------------------------------------------------------------- |
493 | latency: 6 us, #3/3, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2) | 499 | latency: 12 us, #3/3, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2) |
494 | ----------------- | 500 | ----------------- |
495 | | task: bash-4269 (uid:0 nice:0 policy:0 rt_prio:0) | 501 | | task: bash-3730 (uid:0 nice:0 policy:0 rt_prio:0) |
496 | ----------------- | 502 | ----------------- |
497 | => started at: copy_page_range | 503 | => started at: sys_setpgid |
498 | => ended at: copy_page_range | 504 | => ended at: sys_setpgid |
499 | 505 | ||
500 | # _------=> CPU# | 506 | # _------=> CPU# |
501 | # / _-----=> irqs-off | 507 | # / _-----=> irqs-off |
@@ -506,21 +512,19 @@ irqsoff latency trace v1.1.5 on 2.6.26-rc8 | |||
506 | # ||||| delay | 512 | # ||||| delay |
507 | # cmd pid ||||| time | caller | 513 | # cmd pid ||||| time | caller |
508 | # \ / ||||| \ | / | 514 | # \ / ||||| \ | / |
509 | bash-4269 1...1 0us+: _spin_lock (copy_page_range) | 515 | bash-3730 1d... 0us : _write_lock_irq (sys_setpgid) |
510 | bash-4269 1...1 7us : _spin_unlock (copy_page_range) | 516 | bash-3730 1d..1 1us+: _write_unlock_irq (sys_setpgid) |
511 | bash-4269 1...2 7us : trace_preempt_on (copy_page_range) | 517 | bash-3730 1d..2 14us : trace_hardirqs_on (sys_setpgid) |
512 | 518 | ||
513 | 519 | ||
514 | vim:ft=help | 520 | Here we see that that we had a latency of 12 microsecs (which is |
521 | very good). The _write_lock_irq in sys_setpgid disabled interrupts. | ||
522 | The difference between the 12 and the displayed timestamp 14us occurred | ||
523 | because the clock was incremented between the time of recording the max | ||
524 | latency and the time of recording the function that had that latency. | ||
515 | 525 | ||
516 | Here we see that that we had a latency of 6 microsecs (which is | 526 | Note the above example had ftrace_enabled not set. If we set the |
517 | very good). The spin_lock in copy_page_range disabled interrupts. | 527 | ftrace_enabled, we get a much larger output: |
518 | The difference between the 6 and the displayed timestamp 7us is | ||
519 | because the clock must have incremented between the time of recording | ||
520 | the max latency and recording the function that had that latency. | ||
521 | |||
522 | Note the above had ftrace_enabled not set. If we set the ftrace_enabled | ||
523 | we get a much larger output: | ||
524 | 528 | ||
525 | # tracer: irqsoff | 529 | # tracer: irqsoff |
526 | # | 530 | # |
@@ -566,27 +570,26 @@ irqsoff latency trace v1.1.5 on 2.6.26-rc8 | |||
566 | ls-4339 0d..2 51us : trace_hardirqs_on (__alloc_pages_internal) | 570 | ls-4339 0d..2 51us : trace_hardirqs_on (__alloc_pages_internal) |
567 | 571 | ||
568 | 572 | ||
569 | vim:ft=help | ||
570 | |||
571 | 573 | ||
572 | Here we traced a 50 microsecond latency. But we also see all the | 574 | Here we traced a 50 microsecond latency. But we also see all the |
573 | functions that were called during that time. Note that enabling | 575 | functions that were called during that time. Note that by enabling |
574 | function tracing we endure an added overhead. This overhead may | 576 | function tracing, we incur an added overhead. This overhead may |
575 | extend the latency times. But never the less, this trace has provided | 577 | extend the latency times. But nevertheless, this trace has provided |
576 | some very helpful debugging. | 578 | some very helpful debugging information. |
577 | 579 | ||
578 | 580 | ||
579 | preemptoff | 581 | preemptoff |
580 | ---------- | 582 | ---------- |
581 | 583 | ||
582 | When preemption is disabled we may be able to receive interrupts but | 584 | When preemption is disabled, we may be able to receive interrupts but |
583 | the task can not be preempted and a higher priority task must wait | 585 | the task cannot be preempted and a higher priority task must wait |
584 | for preemption to be enabled again before it can preempt a lower | 586 | for preemption to be enabled again before it can preempt a lower |
585 | priority task. | 587 | priority task. |
586 | 588 | ||
587 | The preemptoff tracer traces the places that disables preemption. | 589 | The preemptoff tracer traces the places that disable preemption. |
588 | Like the irqsoff, it records the maximum latency that preemption | 590 | Like the irqsoff tracer, it records the maximum latency for which preemption |
589 | was disabled. The control of preemptoff is much like the irqsoff. | 591 | was disabled. The control of preemptoff tracer is much like the irqsoff |
592 | tracer. | ||
590 | 593 | ||
591 | # echo preemptoff > /debug/tracing/current_tracer | 594 | # echo preemptoff > /debug/tracing/current_tracer |
592 | # echo 0 > /debug/tracing/tracing_max_latency | 595 | # echo 0 > /debug/tracing/tracing_max_latency |
@@ -620,8 +623,6 @@ preemptoff latency trace v1.1.5 on 2.6.26-rc8 | |||
620 | sshd-4261 0d.s1 30us : trace_preempt_on (__do_softirq) | 623 | sshd-4261 0d.s1 30us : trace_preempt_on (__do_softirq) |
621 | 624 | ||
622 | 625 | ||
623 | vim:ft=help | ||
624 | |||
625 | This has some more changes. Preemption was disabled when an interrupt | 626 | This has some more changes. Preemption was disabled when an interrupt |
626 | came in (notice the 'h'), and was enabled while doing a softirq. | 627 | came in (notice the 'h'), and was enabled while doing a softirq. |
627 | (notice the 's'). But we also see that interrupts have been disabled | 628 | (notice the 's'). But we also see that interrupts have been disabled |
@@ -689,16 +690,16 @@ The above is an example of the preemptoff trace with ftrace_enabled | |||
689 | set. Here we see that interrupts were disabled the entire time. | 690 | set. Here we see that interrupts were disabled the entire time. |
690 | The irq_enter code lets us know that we entered an interrupt 'h'. | 691 | The irq_enter code lets us know that we entered an interrupt 'h'. |
691 | Before that, the functions being traced still show that it is not | 692 | Before that, the functions being traced still show that it is not |
692 | in an interrupt, but we can see by the functions themselves that | 693 | in an interrupt, but we can see from the functions themselves that |
693 | this is not the case. | 694 | this is not the case. |
694 | 695 | ||
695 | Notice that the __do_softirq when called doesn't have a preempt_count. | 696 | Notice that __do_softirq when called does not have a preempt_count. |
696 | It may seem that we missed a preempt enabled. What really happened | 697 | It may seem that we missed a preempt enabling. What really happened |
697 | is that the preempt count is held on the threads stack and we | 698 | is that the preempt count is held on the thread's stack and we |
698 | switched to the softirq stack (4K stacks in effect). The code | 699 | switched to the softirq stack (4K stacks in effect). The code |
699 | does not copy the preempt count, but because interrupts are disabled | 700 | does not copy the preempt count, but because interrupts are disabled, |
700 | we don't need to worry about it. Having a tracer like this is good | 701 | we do not need to worry about it. Having a tracer like this is good |
701 | to let people know what really happens inside the kernel. | 702 | for letting people know what really happens inside the kernel. |
702 | 703 | ||
703 | 704 | ||
704 | preemptirqsoff | 705 | preemptirqsoff |
@@ -708,7 +709,7 @@ Knowing the locations that have interrupts disabled or preemption | |||
708 | disabled for the longest times is helpful. But sometimes we would | 709 | disabled for the longest times is helpful. But sometimes we would |
709 | like to know when either preemption and/or interrupts are disabled. | 710 | like to know when either preemption and/or interrupts are disabled. |
710 | 711 | ||
711 | The following code: | 712 | Consider the following code: |
712 | 713 | ||
713 | local_irq_disable(); | 714 | local_irq_disable(); |
714 | call_function_with_irqs_off(); | 715 | call_function_with_irqs_off(); |
@@ -732,7 +733,7 @@ To record this time, use the preemptirqsoff tracer. | |||
732 | 733 | ||
733 | Again, using this trace is much like the irqsoff and preemptoff tracers. | 734 | Again, using this trace is much like the irqsoff and preemptoff tracers. |
734 | 735 | ||
735 | # echo preemptoff > /debug/tracing/current_tracer | 736 | # echo preemptirqsoff > /debug/tracing/current_tracer |
736 | # echo 0 > /debug/tracing/tracing_max_latency | 737 | # echo 0 > /debug/tracing/tracing_max_latency |
737 | # echo 1 > /debug/tracing/tracing_enabled | 738 | # echo 1 > /debug/tracing/tracing_enabled |
738 | # ls -ltr | 739 | # ls -ltr |
@@ -764,12 +765,10 @@ preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8 | |||
764 | ls-4860 0d.s1 294us : trace_preempt_on (__do_softirq) | 765 | ls-4860 0d.s1 294us : trace_preempt_on (__do_softirq) |
765 | 766 | ||
766 | 767 | ||
767 | vim:ft=help | ||
768 | |||
769 | 768 | ||
770 | The trace_hardirqs_off_thunk is called from assembly on x86 when | 769 | The trace_hardirqs_off_thunk is called from assembly on x86 when |
771 | interrupts are disabled in the assembly code. Without the function | 770 | interrupts are disabled in the assembly code. Without the function |
772 | tracing, we don't know if interrupts were enabled within the preemption | 771 | tracing, we do not know if interrupts were enabled within the preemption |
773 | points. We do see that it started with preemption enabled. | 772 | points. We do see that it started with preemption enabled. |
774 | 773 | ||
775 | Here is a trace with ftrace_enabled set: | 774 | Here is a trace with ftrace_enabled set: |
@@ -860,25 +859,25 @@ preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8 | |||
860 | 859 | ||
861 | This is a very interesting trace. It started with the preemption of | 860 | This is a very interesting trace. It started with the preemption of |
862 | the ls task. We see that the task had the "need_resched" bit set | 861 | the ls task. We see that the task had the "need_resched" bit set |
863 | with the 'N' in the trace. Interrupts are disabled in the spin_lock | 862 | via the 'N' in the trace. Interrupts were disabled before the spin_lock |
864 | and the trace started. We see that a schedule took place to run | 863 | at the beginning of the trace. We see that a schedule took place to run |
865 | sshd. When the interrupts were enabled we took an interrupt. | 864 | sshd. When the interrupts were enabled, we took an interrupt. |
866 | On return of the interrupt the softirq ran. We took another interrupt | 865 | On return from the interrupt handler, the softirq ran. We took another |
867 | while running the softirq as we see with the capital 'H'. | 866 | interrupt while running the softirq as we see from the capital 'H'. |
868 | 867 | ||
869 | 868 | ||
870 | wakeup | 869 | wakeup |
871 | ------ | 870 | ------ |
872 | 871 | ||
873 | In Real-Time environment it is very important to know the wakeup | 872 | In a Real-Time environment it is very important to know the wakeup |
874 | time it takes for the highest priority task that wakes up to the | 873 | time it takes for the highest priority task that is woken up to the |
875 | time it executes. This is also known as "schedule latency". | 874 | time that it executes. This is also known as "schedule latency". |
876 | I stress the point that this is about RT tasks. It is also important | 875 | I stress the point that this is about RT tasks. It is also important |
877 | to know the scheduling latency of non-RT tasks, but the average | 876 | to know the scheduling latency of non-RT tasks, but the average |
878 | schedule latency is better for non-RT tasks. Tools like | 877 | schedule latency is better for non-RT tasks. Tools like |
879 | LatencyTop is more appropriate for such measurements. | 878 | LatencyTop are more appropriate for such measurements. |
880 | 879 | ||
881 | Real-Time environments is interested in the worst case latency. | 880 | Real-Time environments are interested in the worst case latency. |
882 | That is the longest latency it takes for something to happen, and | 881 | That is the longest latency it takes for something to happen, and |
883 | not the average. We can have a very fast scheduler that may only | 882 | not the average. We can have a very fast scheduler that may only |
884 | have a large latency once in a while, but that would not work well | 883 | have a large latency once in a while, but that would not work well |
@@ -889,8 +888,8 @@ tasks that are unpredictable will overwrite the worst case latency | |||
889 | of RT tasks. | 888 | of RT tasks. |
890 | 889 | ||
891 | Since this tracer only deals with RT tasks, we will run this slightly | 890 | Since this tracer only deals with RT tasks, we will run this slightly |
892 | different than we did with the previous tracers. Instead of performing | 891 | differently than we did with the previous tracers. Instead of performing |
893 | an 'ls' we will run 'sleep 1' under 'chrt' which changes the | 892 | an 'ls', we will run 'sleep 1' under 'chrt' which changes the |
894 | priority of the task. | 893 | priority of the task. |
895 | 894 | ||
896 | # echo wakeup > /debug/tracing/current_tracer | 895 | # echo wakeup > /debug/tracing/current_tracer |
@@ -921,12 +920,10 @@ wakeup latency trace v1.1.5 on 2.6.26-rc8 | |||
921 | <idle>-0 1d..4 4us : schedule (cpu_idle) | 920 | <idle>-0 1d..4 4us : schedule (cpu_idle) |
922 | 921 | ||
923 | 922 | ||
924 | vim:ft=help | ||
925 | |||
926 | 923 | ||
927 | Running this on an idle system we see that it only took 4 microseconds | 924 | Running this on an idle system, we see that it only took 4 microseconds |
928 | to perform the task switch. Note, since the trace marker in the | 925 | to perform the task switch. Note, since the trace marker in the |
929 | schedule is before the actual "switch" we stop the tracing when | 926 | schedule is before the actual "switch", we stop the tracing when |
930 | the recorded task is about to schedule in. This may change if | 927 | the recorded task is about to schedule in. This may change if |
931 | we add a new marker at the end of the scheduler. | 928 | we add a new marker at the end of the scheduler. |
932 | 929 | ||
@@ -991,13 +988,16 @@ ksoftirq-7 1d..6 49us : sub_preempt_count (_spin_unlock) | |||
991 | ksoftirq-7 1d..4 50us : schedule (__cond_resched) | 988 | ksoftirq-7 1d..4 50us : schedule (__cond_resched) |
992 | 989 | ||
993 | The interrupt went off while running ksoftirqd. This task runs at | 990 | The interrupt went off while running ksoftirqd. This task runs at |
994 | SCHED_OTHER. Why didn't we see the 'N' set early? This may be | 991 | SCHED_OTHER. Why did not we see the 'N' set early? This may be |
995 | a harmless bug with x86_32 and 4K stacks. The need_reched() function | 992 | a harmless bug with x86_32 and 4K stacks. On x86_32 with 4K stacks |
996 | that tests if we need to reschedule looks on the actual stack. | 993 | configured, the interrupt and softirq run with their own stack. |
997 | Where as the setting of the NEED_RESCHED bit happens on the | 994 | Some information is held on the top of the task's stack (need_resched |
998 | task's stack. But because we are in a hard interrupt, the test | 995 | and preempt_count are both stored there). The setting of the NEED_RESCHED |
999 | is with the interrupts stack which has that to be false. We don't | 996 | bit is done directly to the task's stack, but the reading of the |
1000 | see the 'N' until we switch back to the task's stack. | 997 | NEED_RESCHED is done by looking at the current stack, which in this case |
998 | is the stack for the hard interrupt. This hides the fact that NEED_RESCHED | ||
999 | has been set. We do not see the 'N' until we switch back to the task's | ||
1000 | assigned stack. | ||
1001 | 1001 | ||
1002 | ftrace | 1002 | ftrace |
1003 | ------ | 1003 | ------ |
@@ -1036,14 +1036,14 @@ this tracer is a nop. | |||
1036 | [...] | 1036 | [...] |
1037 | 1037 | ||
1038 | 1038 | ||
1039 | Note: It is sometimes better to enable or disable tracing directly from | 1039 | Note: ftrace uses ring buffers to store the above entries. The newest data |
1040 | a program, because the buffer may be overflowed by the echo commands | 1040 | may overwrite the oldest data. Sometimes using echo to stop the trace |
1041 | before you get to the point you want to trace. It is also easier to | 1041 | is not sufficient because the tracing could have overwritten the data |
1042 | stop the tracing at the point that you hit the part that you are | 1042 | that you wanted to record. For this reason, it is sometimes better to |
1043 | interested in. Since the ftrace buffer is a ring buffer with the | 1043 | disable tracing directly from a program. This allows you to stop the |
1044 | oldest data being overwritten, usually it is sufficient to start the | 1044 | tracing at the point that you hit the part that you are interested in. |
1045 | tracer with an echo command but have you code stop it. Something | 1045 | To disable the tracing directly from a C program, something like following |
1046 | like the following is usually appropriate for this. | 1046 | code snippet can be used: |
1047 | 1047 | ||
1048 | int trace_fd; | 1048 | int trace_fd; |
1049 | [...] | 1049 | [...] |
@@ -1052,25 +1052,31 @@ int main(int argc, char *argv[]) { | |||
1052 | trace_fd = open("/debug/tracing/tracing_enabled", O_WRONLY); | 1052 | trace_fd = open("/debug/tracing/tracing_enabled", O_WRONLY); |
1053 | [...] | 1053 | [...] |
1054 | if (condition_hit()) { | 1054 | if (condition_hit()) { |
1055 | write(trace_fd, "0", 1); | 1055 | write(trace_fd, "0", 1); |
1056 | } | 1056 | } |
1057 | [...] | 1057 | [...] |
1058 | } | 1058 | } |
1059 | 1059 | ||
1060 | Note: Here we hard coded the path name. The debugfs mount is not | ||
1061 | guaranteed to be at /debug (and is more commonly at /sys/kernel/debug). | ||
1062 | For simple one time traces, the above is sufficent. For anything else, | ||
1063 | a search through /proc/mounts may be needed to find where the debugfs | ||
1064 | file-system is mounted. | ||
1060 | 1065 | ||
1061 | dynamic ftrace | 1066 | dynamic ftrace |
1062 | -------------- | 1067 | -------------- |
1063 | 1068 | ||
1064 | If CONFIG_DYNAMIC_FTRACE is set, then the system will run with | 1069 | If CONFIG_DYNAMIC_FTRACE is set, the system will run with |
1065 | virtually no overhead when function tracing is disabled. The way | 1070 | virtually no overhead when function tracing is disabled. The way |
1066 | this works is the mcount function call (placed at the start of | 1071 | this works is the mcount function call (placed at the start of |
1067 | every kernel function, produced by the -pg switch in gcc), starts | 1072 | every kernel function, produced by the -pg switch in gcc), starts |
1068 | of pointing to a simple return. | 1073 | of pointing to a simple return. (Enabling FTRACE will include the |
1074 | -pg switch in the compiling of the kernel.) | ||
1069 | 1075 | ||
1070 | When dynamic ftrace is initialized, it calls kstop_machine to make it | 1076 | When dynamic ftrace is initialized, it calls kstop_machine to make |
1071 | act like a uniprocessor so that it can freely modify code without | 1077 | the machine act like a uniprocessor so that it can freely modify code |
1072 | worrying about other processors executing that same code. At | 1078 | without worrying about other processors executing that same code. At |
1073 | initialization, the mcount calls are change to call a "record_ip" | 1079 | initialization, the mcount calls are changed to call a "record_ip" |
1074 | function. After this, the first time a kernel function is called, | 1080 | function. After this, the first time a kernel function is called, |
1075 | it has the calling address saved in a hash table. | 1081 | it has the calling address saved in a hash table. |
1076 | 1082 | ||
@@ -1078,15 +1084,15 @@ Later on the ftraced kernel thread is awoken and will again call | |||
1078 | kstop_machine if new functions have been recorded. The ftraced thread | 1084 | kstop_machine if new functions have been recorded. The ftraced thread |
1079 | will change all calls to mcount to "nop". Just calling mcount | 1085 | will change all calls to mcount to "nop". Just calling mcount |
1080 | and having mcount return has shown a 10% overhead. By converting | 1086 | and having mcount return has shown a 10% overhead. By converting |
1081 | it to a nop, there is no recordable overhead to the system. | 1087 | it to a nop, there is no measurable overhead to the system. |
1082 | 1088 | ||
1083 | One special side-effect to the recording of the functions being | 1089 | One special side-effect to the recording of the functions being |
1084 | traced, is that we can now selectively choose which functions we | 1090 | traced is that we can now selectively choose which functions we |
1085 | want to trace and which ones we want the mcount calls to remain as | 1091 | wish to trace and which ones we want the mcount calls to remain as |
1086 | nops. | 1092 | nops. |
1087 | 1093 | ||
1088 | Two files that contain to the enabling and disabling of recorded | 1094 | Two files are used, one for enabling and one for disabling the tracing |
1089 | functions are: | 1095 | of specified functions. They are: |
1090 | 1096 | ||
1091 | set_ftrace_filter | 1097 | set_ftrace_filter |
1092 | 1098 | ||
@@ -1094,7 +1100,7 @@ and | |||
1094 | 1100 | ||
1095 | set_ftrace_notrace | 1101 | set_ftrace_notrace |
1096 | 1102 | ||
1097 | A list of available functions that you can add to this files is listed | 1103 | A list of available functions that you can add to these files is listed |
1098 | in: | 1104 | in: |
1099 | 1105 | ||
1100 | available_filter_functions | 1106 | available_filter_functions |
@@ -1108,7 +1114,7 @@ pick_next_task_fair | |||
1108 | mutex_lock | 1114 | mutex_lock |
1109 | [...] | 1115 | [...] |
1110 | 1116 | ||
1111 | If I'm only interested in sys_nanosleep and hrtimer_interrupt: | 1117 | If I am only interested in sys_nanosleep and hrtimer_interrupt: |
1112 | 1118 | ||
1113 | # echo sys_nanosleep hrtimer_interrupt \ | 1119 | # echo sys_nanosleep hrtimer_interrupt \ |
1114 | > /debug/tracing/set_ftrace_filter | 1120 | > /debug/tracing/set_ftrace_filter |
@@ -1125,21 +1131,21 @@ If I'm only interested in sys_nanosleep and hrtimer_interrupt: | |||
1125 | usleep-4134 [00] 1317.070111: sys_nanosleep <-syscall_call | 1131 | usleep-4134 [00] 1317.070111: sys_nanosleep <-syscall_call |
1126 | <idle>-0 [00] 1317.070115: hrtimer_interrupt <-smp_apic_timer_interrupt | 1132 | <idle>-0 [00] 1317.070115: hrtimer_interrupt <-smp_apic_timer_interrupt |
1127 | 1133 | ||
1128 | To see what functions are being traced, you can cat the file: | 1134 | To see which functions are being traced, you can cat the file: |
1129 | 1135 | ||
1130 | # cat /debug/tracing/set_ftrace_filter | 1136 | # cat /debug/tracing/set_ftrace_filter |
1131 | hrtimer_interrupt | 1137 | hrtimer_interrupt |
1132 | sys_nanosleep | 1138 | sys_nanosleep |
1133 | 1139 | ||
1134 | 1140 | ||
1135 | Perhaps this isn't enough. The filters also allow simple wild cards. | 1141 | Perhaps this is not enough. The filters also allow simple wild cards. |
1136 | Only the following is currently available | 1142 | Only the following are currently available |
1137 | 1143 | ||
1138 | <match>* - will match functions that begins with <match> | 1144 | <match>* - will match functions that begin with <match> |
1139 | *<match> - will match functions that end with <match> | 1145 | *<match> - will match functions that end with <match> |
1140 | *<match>* - will match functions that have <match> in it | 1146 | *<match>* - will match functions that have <match> in it |
1141 | 1147 | ||
1142 | Thats all the wild cards that are allowed. | 1148 | These are the only wild cards which are supported. |
1143 | 1149 | ||
1144 | <match>*<match> will not work. | 1150 | <match>*<match> will not work. |
1145 | 1151 | ||
@@ -1187,7 +1193,7 @@ This is because the '>' and '>>' act just like they do in bash. | |||
1187 | To rewrite the filters, use '>' | 1193 | To rewrite the filters, use '>' |
1188 | To append to the filters, use '>>' | 1194 | To append to the filters, use '>>' |
1189 | 1195 | ||
1190 | To clear out a filter so that all functions will be recorded again. | 1196 | To clear out a filter so that all functions will be recorded again: |
1191 | 1197 | ||
1192 | # echo > /debug/tracing/set_ftrace_filter | 1198 | # echo > /debug/tracing/set_ftrace_filter |
1193 | # cat /debug/tracing/set_ftrace_filter | 1199 | # cat /debug/tracing/set_ftrace_filter |
@@ -1246,24 +1252,24 @@ ftraced | |||
1246 | 1252 | ||
1247 | As mentioned above, when dynamic ftrace is configured in, a kernel | 1253 | As mentioned above, when dynamic ftrace is configured in, a kernel |
1248 | thread wakes up once a second and checks to see if there are mcount | 1254 | thread wakes up once a second and checks to see if there are mcount |
1249 | calls that need to be converted into nops. If there is not, then | 1255 | calls that need to be converted into nops. If there are not any, then |
1250 | it simply goes back to sleep. But if there is, it will call | 1256 | it simply goes back to sleep. But if there are some, it will call |
1251 | kstop_machine to convert the calls to nops. | 1257 | kstop_machine to convert the calls to nops. |
1252 | 1258 | ||
1253 | There may be a case that you do not want this added latency. | 1259 | There may be a case in which you do not want this added latency. |
1254 | Perhaps you are doing some audio recording and this activity might | 1260 | Perhaps you are doing some audio recording and this activity might |
1255 | cause skips in the playback. There is an interface to disable | 1261 | cause skips in the playback. There is an interface to disable |
1256 | and enable the ftraced kernel thread. | 1262 | and enable the "ftraced" kernel thread. |
1257 | 1263 | ||
1258 | # echo 0 > /debug/tracing/ftraced_enabled | 1264 | # echo 0 > /debug/tracing/ftraced_enabled |
1259 | 1265 | ||
1260 | This will disable the calling of the kstop_machine to update the | 1266 | This will disable the calling of kstop_machine to update the |
1261 | mcount calls to nops. Remember that there's a large overhead | 1267 | mcount calls to nops. Remember that there is a large overhead |
1262 | to calling mcount. Without this kernel thread, that overhead will | 1268 | to calling mcount. Without this kernel thread, that overhead will |
1263 | exist. | 1269 | exist. |
1264 | 1270 | ||
1265 | Any write to the ftraced_enabled file will cause the kstop_machine | 1271 | If there are recorded calls to mcount, any write to the ftraced_enabled |
1266 | to run if there are recorded calls to mcount. This means that a | 1272 | file will cause the kstop_machine to run. This means that a |
1267 | user can manually perform the updates when they want to by simply | 1273 | user can manually perform the updates when they want to by simply |
1268 | echoing a '0' into the ftraced_enabled file. | 1274 | echoing a '0' into the ftraced_enabled file. |
1269 | 1275 | ||
@@ -1274,8 +1280,8 @@ that uses ftrace function recording. | |||
1274 | trace_pipe | 1280 | trace_pipe |
1275 | ---------- | 1281 | ---------- |
1276 | 1282 | ||
1277 | The trace_pipe outputs the same as trace, but the effect on the | 1283 | The trace_pipe outputs the same content as the trace file, but the effect |
1278 | tracing is different. Every read from trace_pipe is consumed. | 1284 | on the tracing is different. Every read from trace_pipe is consumed. |
1279 | This means that subsequent reads will be different. The trace | 1285 | This means that subsequent reads will be different. The trace |
1280 | is live. | 1286 | is live. |
1281 | 1287 | ||
@@ -1305,7 +1311,7 @@ is live. | |||
1305 | bash-4043 [00] 41.267111: select_task_rq_rt <-try_to_wake_up | 1311 | bash-4043 [00] 41.267111: select_task_rq_rt <-try_to_wake_up |
1306 | 1312 | ||
1307 | 1313 | ||
1308 | Note, reading the trace_pipe will block until more input is added. | 1314 | Note, reading the trace_pipe file will block until more input is added. |
1309 | By changing the tracer, trace_pipe will issue an EOF. We needed | 1315 | By changing the tracer, trace_pipe will issue an EOF. We needed |
1310 | to set the ftrace tracer _before_ cating the trace_pipe file. | 1316 | to set the ftrace tracer _before_ cating the trace_pipe file. |
1311 | 1317 | ||
@@ -1314,8 +1320,8 @@ trace entries | |||
1314 | ------------- | 1320 | ------------- |
1315 | 1321 | ||
1316 | Having too much or not enough data can be troublesome in diagnosing | 1322 | Having too much or not enough data can be troublesome in diagnosing |
1317 | some issue in the kernel. The file trace_entries is used to modify | 1323 | an issue in the kernel. The file trace_entries is used to modify |
1318 | the size of the internal trace buffers. The numbers listed | 1324 | the size of the internal trace buffers. The number listed |
1319 | is the number of entries that can be recorded per CPU. To know | 1325 | is the number of entries that can be recorded per CPU. To know |
1320 | the full size, multiply the number of possible CPUS with the | 1326 | the full size, multiply the number of possible CPUS with the |
1321 | number of entries. | 1327 | number of entries. |
@@ -1323,8 +1329,9 @@ number of entries. | |||
1323 | # cat /debug/tracing/trace_entries | 1329 | # cat /debug/tracing/trace_entries |
1324 | 65620 | 1330 | 65620 |
1325 | 1331 | ||
1326 | Note, to modify this you must have tracing fulling disabled. To do that, | 1332 | Note, to modify this, you must have tracing completely disabled. To do that, |
1327 | echo "none" into the current_tracer. | 1333 | echo "none" into the current_tracer. If the current_tracer is not set |
1334 | to "none", an EINVAL error will be returned. | ||
1328 | 1335 | ||
1329 | # echo none > /debug/tracing/current_tracer | 1336 | # echo none > /debug/tracing/current_tracer |
1330 | # echo 100000 > /debug/tracing/trace_entries | 1337 | # echo 100000 > /debug/tracing/trace_entries |
@@ -1333,18 +1340,18 @@ echo "none" into the current_tracer. | |||
1333 | 1340 | ||
1334 | 1341 | ||
1335 | Notice that we echoed in 100,000 but the size is 100,045. The entries | 1342 | Notice that we echoed in 100,000 but the size is 100,045. The entries |
1336 | are held by individual pages. It allocates the number of pages it takes | 1343 | are held in individual pages. It allocates the number of pages it takes |
1337 | to fulfill the request. If more entries may fit on the last page | 1344 | to fulfill the request. If more entries may fit on the last page |
1338 | it will add them. | 1345 | then they will be added. |
1339 | 1346 | ||
1340 | # echo 1 > /debug/tracing/trace_entries | 1347 | # echo 1 > /debug/tracing/trace_entries |
1341 | # cat /debug/tracing/trace_entries | 1348 | # cat /debug/tracing/trace_entries |
1342 | 85 | 1349 | 85 |
1343 | 1350 | ||
1344 | This shows us that 85 entries can fit on a single page. | 1351 | This shows us that 85 entries can fit in a single page. |
1345 | 1352 | ||
1346 | The number of pages that will be allocated is a percentage of available | 1353 | The number of pages which will be allocated is limited to a percentage |
1347 | memory. Allocating too much will produces an error. | 1354 | of available memory. Allocating too much will produce an error. |
1348 | 1355 | ||
1349 | # echo 1000000000000 > /debug/tracing/trace_entries | 1356 | # echo 1000000000000 > /debug/tracing/trace_entries |
1350 | -bash: echo: write error: Cannot allocate memory | 1357 | -bash: echo: write error: Cannot allocate memory |