diff options
-rw-r--r-- | Documentation/ftrace.txt | 956 |
1 files changed, 530 insertions, 426 deletions
diff --git a/Documentation/ftrace.txt b/Documentation/ftrace.txt index 055bcd2992d..2041ee951c1 100644 --- a/Documentation/ftrace.txt +++ b/Documentation/ftrace.txt | |||
@@ -15,31 +15,31 @@ Introduction | |||
15 | 15 | ||
16 | Ftrace is an internal tracer designed to help out developers and | 16 | Ftrace is an internal tracer designed to help out developers and |
17 | designers of systems to find what is going on inside the kernel. | 17 | designers of systems to find what is going on inside the kernel. |
18 | It can be used for debugging or analyzing latencies and performance | 18 | It can be used for debugging or analyzing latencies and |
19 | issues that take place outside of user-space. | 19 | performance issues that take place outside of user-space. |
20 | 20 | ||
21 | Although ftrace is the function tracer, it also includes an | 21 | Although ftrace is the function tracer, it also includes an |
22 | infrastructure that allows for other types of tracing. Some of the | 22 | infrastructure that allows for other types of tracing. Some of |
23 | tracers that are currently in ftrace include a tracer to trace | 23 | the tracers that are currently in ftrace include a tracer to |
24 | context switches, the time it takes for a high priority task to | 24 | trace context switches, the time it takes for a high priority |
25 | run after it was woken up, the time interrupts are disabled, and | 25 | task to run after it was woken up, the time interrupts are |
26 | more (ftrace allows for tracer plugins, which means that the list of | 26 | disabled, and more (ftrace allows for tracer plugins, which |
27 | tracers can always grow). | 27 | means that the list of tracers can always grow). |
28 | 28 | ||
29 | 29 | ||
30 | The File System | 30 | The File System |
31 | --------------- | 31 | --------------- |
32 | 32 | ||
33 | Ftrace uses the debugfs file system to hold the control files as well | 33 | Ftrace uses the debugfs file system to hold the control files as |
34 | as the files to display output. | 34 | well as the files to display output. |
35 | 35 | ||
36 | To mount the debugfs system: | 36 | To mount the debugfs system: |
37 | 37 | ||
38 | # mkdir /debug | 38 | # mkdir /debug |
39 | # mount -t debugfs nodev /debug | 39 | # mount -t debugfs nodev /debug |
40 | 40 | ||
41 | (Note: it is more common to mount at /sys/kernel/debug, but for simplicity | 41 | ( Note: it is more common to mount at /sys/kernel/debug, but for |
42 | this document will use /debug) | 42 | simplicity this document will use /debug) |
43 | 43 | ||
44 | That's it! (assuming that you have ftrace configured into your kernel) | 44 | That's it! (assuming that you have ftrace configured into your kernel) |
45 | 45 | ||
@@ -50,94 +50,124 @@ of ftrace. Here is a list of some of the key files: | |||
50 | 50 | ||
51 | Note: all time values are in microseconds. | 51 | Note: all time values are in microseconds. |
52 | 52 | ||
53 | current_tracer: This is used to set or display the current tracer | 53 | current_tracer: |
54 | that is configured. | 54 | |
55 | 55 | This is used to set or display the current tracer | |
56 | available_tracers: This holds the different types of tracers that | 56 | that is configured. |
57 | have been compiled into the kernel. The tracers | 57 | |
58 | listed here can be configured by echoing their name | 58 | available_tracers: |
59 | into current_tracer. | 59 | |
60 | 60 | This holds the different types of tracers that | |
61 | tracing_enabled: This sets or displays whether the current_tracer | 61 | have been compiled into the kernel. The |
62 | is activated and tracing or not. Echo 0 into this | 62 | tracers listed here can be configured by |
63 | file to disable the tracer or 1 to enable it. | 63 | echoing their name into current_tracer. |
64 | 64 | ||
65 | trace: This file holds the output of the trace in a human readable | 65 | tracing_enabled: |
66 | format (described below). | 66 | |
67 | 67 | This sets or displays whether the current_tracer | |
68 | latency_trace: This file shows the same trace but the information | 68 | is activated and tracing or not. Echo 0 into this |
69 | is organized more to display possible latencies | 69 | file to disable the tracer or 1 to enable it. |
70 | in the system (described below). | 70 | |
71 | 71 | trace: | |
72 | trace_pipe: The output is the same as the "trace" file but this | 72 | |
73 | file is meant to be streamed with live tracing. | 73 | This file holds the output of the trace in a human |
74 | Reads from this file will block until new data | 74 | readable format (described below). |
75 | is retrieved. Unlike the "trace" and "latency_trace" | 75 | |
76 | files, this file is a consumer. This means reading | 76 | latency_trace: |
77 | from this file causes sequential reads to display | 77 | |
78 | more current data. Once data is read from this | 78 | This file shows the same trace but the information |
79 | file, it is consumed, and will not be read | 79 | is organized more to display possible latencies |
80 | again with a sequential read. The "trace" and | 80 | in the system (described below). |
81 | "latency_trace" files are static, and if the | 81 | |
82 | tracer is not adding more data, they will display | 82 | trace_pipe: |
83 | the same information every time they are read. | 83 | |
84 | 84 | The output is the same as the "trace" file but this | |
85 | trace_options: This file lets the user control the amount of data | 85 | file is meant to be streamed with live tracing. |
86 | that is displayed in one of the above output | 86 | Reads from this file will block until new data |
87 | files. | 87 | is retrieved. Unlike the "trace" and "latency_trace" |
88 | 88 | files, this file is a consumer. This means reading | |
89 | trace_max_latency: Some of the tracers record the max latency. | 89 | from this file causes sequential reads to display |
90 | For example, the time interrupts are disabled. | 90 | more current data. Once data is read from this |
91 | This time is saved in this file. The max trace | 91 | file, it is consumed, and will not be read |
92 | will also be stored, and displayed by either | 92 | again with a sequential read. The "trace" and |
93 | "trace" or "latency_trace". A new max trace will | 93 | "latency_trace" files are static, and if the |
94 | only be recorded if the latency is greater than | 94 | tracer is not adding more data, they will display |
95 | the value in this file. (in microseconds) | 95 | the same information every time they are read. |
96 | 96 | ||
97 | buffer_size_kb: This sets or displays the number of kilobytes each CPU | 97 | trace_options: |
98 | buffer can hold. The tracer buffers are the same size | 98 | |
99 | for each CPU. The displayed number is the size of the | 99 | This file lets the user control the amount of data |
100 | CPU buffer and not total size of all buffers. The | 100 | that is displayed in one of the above output |
101 | trace buffers are allocated in pages (blocks of memory | 101 | files. |
102 | that the kernel uses for allocation, usually 4 KB in size). | 102 | |
103 | If the last page allocated has room for more bytes | 103 | trace_max_latency: |
104 | than requested, the rest of the page will be used, | 104 | |
105 | making the actual allocation bigger than requested. | 105 | Some of the tracers record the max latency. |
106 | (Note, the size may not be a multiple of the page size due | 106 | For example, the time interrupts are disabled. |
107 | to buffer managment overhead.) | 107 | This time is saved in this file. The max trace |
108 | 108 | will also be stored, and displayed by either | |
109 | This can only be updated when the current_tracer | 109 | "trace" or "latency_trace". A new max trace will |
110 | is set to "nop". | 110 | only be recorded if the latency is greater than |
111 | 111 | the value in this file. (in microseconds) | |
112 | tracing_cpumask: This is a mask that lets the user only trace | 112 | |
113 | on specified CPUS. The format is a hex string | 113 | buffer_size_kb: |
114 | representing the CPUS. | 114 | |
115 | 115 | This sets or displays the number of kilobytes each CPU | |
116 | set_ftrace_filter: When dynamic ftrace is configured in (see the | 116 | buffer can hold. The tracer buffers are the same size |
117 | section below "dynamic ftrace"), the code is dynamically | 117 | for each CPU. The displayed number is the size of the |
118 | modified (code text rewrite) to disable calling of the | 118 | CPU buffer and not total size of all buffers. The |
119 | function profiler (mcount). This lets tracing be configured | 119 | trace buffers are allocated in pages (blocks of memory |
120 | in with practically no overhead in performance. This also | 120 | that the kernel uses for allocation, usually 4 KB in size). |
121 | has a side effect of enabling or disabling specific functions | 121 | If the last page allocated has room for more bytes |
122 | to be traced. Echoing names of functions into this file | 122 | than requested, the rest of the page will be used, |
123 | will limit the trace to only those functions. | 123 | making the actual allocation bigger than requested. |
124 | 124 | ( Note, the size may not be a multiple of the page size | |
125 | set_ftrace_notrace: This has an effect opposite to that of | 125 | due to buffer managment overhead. ) |
126 | set_ftrace_filter. Any function that is added here will not | 126 | |
127 | be traced. If a function exists in both set_ftrace_filter | 127 | This can only be updated when the current_tracer |
128 | and set_ftrace_notrace, the function will _not_ be traced. | 128 | is set to "nop". |
129 | 129 | ||
130 | set_ftrace_pid: Have the function tracer only trace a single thread. | 130 | tracing_cpumask: |
131 | 131 | ||
132 | set_graph_function: Select the function where the trace have to start | 132 | This is a mask that lets the user only trace |
133 | with the function graph tracer (See the section | 133 | on specified CPUS. The format is a hex string |
134 | "dynamic ftrace" for more details). | 134 | representing the CPUS. |
135 | 135 | ||
136 | available_filter_functions: This lists the functions that ftrace | 136 | set_ftrace_filter: |
137 | has processed and can trace. These are the function | 137 | |
138 | names that you can pass to "set_ftrace_filter" or | 138 | When dynamic ftrace is configured in (see the |
139 | "set_ftrace_notrace". (See the section "dynamic ftrace" | 139 | section below "dynamic ftrace"), the code is dynamically |
140 | below for more details.) | 140 | modified (code text rewrite) to disable calling of the |
141 | function profiler (mcount). This lets tracing be configured | ||
142 | in with practically no overhead in performance. This also | ||
143 | has a side effect of enabling or disabling specific functions | ||
144 | to be traced. Echoing names of functions into this file | ||
145 | will limit the trace to only those functions. | ||
146 | |||
147 | set_ftrace_notrace: | ||
148 | |||
149 | This has an effect opposite to that of | ||
150 | set_ftrace_filter. Any function that is added here will not | ||
151 | be traced. If a function exists in both set_ftrace_filter | ||
152 | and set_ftrace_notrace, the function will _not_ be traced. | ||
153 | |||
154 | set_ftrace_pid: | ||
155 | |||
156 | Have the function tracer only trace a single thread. | ||
157 | |||
158 | set_graph_function: | ||
159 | |||
160 | Set a "trigger" function where tracing should start | ||
161 | with the function graph tracer (See the section | ||
162 | "dynamic ftrace" for more details). | ||
163 | |||
164 | available_filter_functions: | ||
165 | |||
166 | This lists the functions that ftrace | ||
167 | has processed and can trace. These are the function | ||
168 | names that you can pass to "set_ftrace_filter" or | ||
169 | "set_ftrace_notrace". (See the section "dynamic ftrace" | ||
170 | below for more details.) | ||
141 | 171 | ||
142 | 172 | ||
143 | The Tracers | 173 | The Tracers |
@@ -145,44 +175,66 @@ The Tracers | |||
145 | 175 | ||
146 | Here is the list of current tracers that may be configured. | 176 | Here is the list of current tracers that may be configured. |
147 | 177 | ||
148 | function - function tracer that uses mcount to trace all functions. | 178 | "function" |
179 | |||
180 | Function call tracer to trace all kernel functions. | ||
181 | |||
182 | "function_graph_tracer" | ||
183 | |||
184 | Similar to the function tracer except that the | ||
185 | function tracer probes the functions on their entry | ||
186 | whereas the function graph tracer traces on both entry | ||
187 | and exit of the functions. It then provides the ability | ||
188 | to draw a graph of function calls similar to C code | ||
189 | source. | ||
190 | |||
191 | "sched_switch" | ||
192 | |||
193 | Traces the context switches and wakeups between tasks. | ||
194 | |||
195 | "irqsoff" | ||
196 | |||
197 | Traces the areas that disable interrupts and saves | ||
198 | the trace with the longest max latency. | ||
199 | See tracing_max_latency. When a new max is recorded, | ||
200 | it replaces the old trace. It is best to view this | ||
201 | trace via the latency_trace file. | ||
149 | 202 | ||
150 | function_graph_tracer - similar to the function tracer except that the | 203 | "preemptoff" |
151 | function tracer probes the functions on their entry whereas the | ||
152 | function graph tracer traces on both entry and exit of the | ||
153 | functions. It then provides the ability to draw a graph of | ||
154 | function calls like a primitive C code source. | ||
155 | 204 | ||
156 | sched_switch - traces the context switches between tasks. | 205 | Similar to irqsoff but traces and records the amount of |
206 | time for which preemption is disabled. | ||
157 | 207 | ||
158 | irqsoff - traces the areas that disable interrupts and saves | 208 | "preemptirqsoff" |
159 | the trace with the longest max latency. | ||
160 | See tracing_max_latency. When a new max is recorded, | ||
161 | it replaces the old trace. It is best to view this | ||
162 | trace via the latency_trace file. | ||
163 | 209 | ||
164 | preemptoff - Similar to irqsoff but traces and records the amount of | 210 | Similar to irqsoff and preemptoff, but traces and |
165 | time for which preemption is disabled. | 211 | records the largest time for which irqs and/or preemption |
212 | is disabled. | ||
166 | 213 | ||
167 | preemptirqsoff - Similar to irqsoff and preemptoff, but traces and | 214 | "wakeup" |
168 | records the largest time for which irqs and/or preemption | ||
169 | is disabled. | ||
170 | 215 | ||
171 | wakeup - Traces and records the max latency that it takes for | 216 | Traces and records the max latency that it takes for |
172 | the highest priority task to get scheduled after | 217 | the highest priority task to get scheduled after |
173 | it has been woken up. | 218 | it has been woken up. |
174 | 219 | ||
175 | nop - This is not a tracer. To remove all tracers from tracing | 220 | "hw-branch-tracer" |
176 | simply echo "nop" into current_tracer. | ||
177 | 221 | ||
178 | hw-branch-tracer - traces branches on all cpu's in a circular buffer. | 222 | Uses the BTS CPU feature on x86 CPUs to traces all |
223 | branches executed. | ||
224 | |||
225 | "nop" | ||
226 | |||
227 | This is the "trace nothing" tracer. To remove all | ||
228 | tracers from tracing simply echo "nop" into | ||
229 | current_tracer. | ||
179 | 230 | ||
180 | 231 | ||
181 | Examples of using the tracer | 232 | Examples of using the tracer |
182 | ---------------------------- | 233 | ---------------------------- |
183 | 234 | ||
184 | Here are typical examples of using the tracers when controlling them only | 235 | Here are typical examples of using the tracers when controlling |
185 | with the debugfs interface (without using any user-land utilities). | 236 | them only with the debugfs interface (without using any |
237 | user-land utilities). | ||
186 | 238 | ||
187 | Output format: | 239 | Output format: |
188 | -------------- | 240 | -------------- |
@@ -199,16 +251,16 @@ Here is an example of the output format of the file "trace" | |||
199 | bash-4251 [01] 10152.583855: _atomic_dec_and_lock <-dput | 251 | bash-4251 [01] 10152.583855: _atomic_dec_and_lock <-dput |
200 | -------- | 252 | -------- |
201 | 253 | ||
202 | A header is printed with the tracer name that is represented by the trace. | 254 | A header is printed with the tracer name that is represented by |
203 | In this case the tracer is "function". Then a header showing the format. Task | 255 | the trace. In this case the tracer is "function". Then a header |
204 | name "bash", the task PID "4251", the CPU that it was running on | 256 | showing the format. Task name "bash", the task PID "4251", the |
205 | "01", the timestamp in <secs>.<usecs> format, the function name that was | 257 | CPU that it was running on "01", the timestamp in <secs>.<usecs> |
206 | traced "path_put" and the parent function that called this function | 258 | format, the function name that was traced "path_put" and the |
207 | "path_walk". The timestamp is the time at which the function was | 259 | parent function that called this function "path_walk". The |
208 | entered. | 260 | timestamp is the time at which the function was entered. |
209 | 261 | ||
210 | The sched_switch tracer also includes tracing of task wakeups and | 262 | The sched_switch tracer also includes tracing of task wakeups |
211 | context switches. | 263 | and context switches. |
212 | 264 | ||
213 | ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 2916:115:S | 265 | ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 2916:115:S |
214 | ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 10:115:S | 266 | ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 10:115:S |
@@ -217,8 +269,8 @@ context switches. | |||
217 | kondemand/1-2916 [01] 1453.070013: 2916:115:S ==> 7:115:R | 269 | kondemand/1-2916 [01] 1453.070013: 2916:115:S ==> 7:115:R |
218 | ksoftirqd/1-7 [01] 1453.070013: 7:115:S ==> 0:140:R | 270 | ksoftirqd/1-7 [01] 1453.070013: 7:115:S ==> 0:140:R |
219 | 271 | ||
220 | Wake ups are represented by a "+" and the context switches are shown as | 272 | Wake ups are represented by a "+" and the context switches are |
221 | "==>". The format is: | 273 | shown as "==>". The format is: |
222 | 274 | ||
223 | Context switches: | 275 | Context switches: |
224 | 276 | ||
@@ -232,19 +284,20 @@ Wake ups are represented by a "+" and the context switches are shown as | |||
232 | 284 | ||
233 | <pid>:<prio>:<state> + <pid>:<prio>:<state> | 285 | <pid>:<prio>:<state> + <pid>:<prio>:<state> |
234 | 286 | ||
235 | The prio is the internal kernel priority, which is the inverse of the | 287 | The prio is the internal kernel priority, which is the inverse |
236 | priority that is usually displayed by user-space tools. Zero represents | 288 | of the priority that is usually displayed by user-space tools. |
237 | the highest priority (99). Prio 100 starts the "nice" priorities with | 289 | Zero represents the highest priority (99). Prio 100 starts the |
238 | 100 being equal to nice -20 and 139 being nice 19. The prio "140" is | 290 | "nice" priorities with 100 being equal to nice -20 and 139 being |
239 | reserved for the idle task which is the lowest priority thread (pid 0). | 291 | nice 19. The prio "140" is reserved for the idle task which is |
292 | the lowest priority thread (pid 0). | ||
240 | 293 | ||
241 | 294 | ||
242 | Latency trace format | 295 | Latency trace format |
243 | -------------------- | 296 | -------------------- |
244 | 297 | ||
245 | For traces that display latency times, the latency_trace file gives | 298 | For traces that display latency times, the latency_trace file |
246 | somewhat more information to see why a latency happened. Here is a typical | 299 | gives somewhat more information to see why a latency happened. |
247 | trace. | 300 | Here is a typical trace. |
248 | 301 | ||
249 | # tracer: irqsoff | 302 | # tracer: irqsoff |
250 | # | 303 | # |
@@ -271,20 +324,20 @@ irqsoff latency trace v1.1.5 on 2.6.26-rc8 | |||
271 | <idle>-0 0d.s1 98us : trace_hardirqs_on (do_softirq) | 324 | <idle>-0 0d.s1 98us : trace_hardirqs_on (do_softirq) |
272 | 325 | ||
273 | 326 | ||
327 | This shows that the current tracer is "irqsoff" tracing the time | ||
328 | for which interrupts were disabled. It gives the trace version | ||
329 | and the version of the kernel upon which this was executed on | ||
330 | (2.6.26-rc8). Then it displays the max latency in microsecs (97 | ||
331 | us). The number of trace entries displayed and the total number | ||
332 | recorded (both are three: #3/3). The type of preemption that was | ||
333 | used (PREEMPT). VP, KP, SP, and HP are always zero and are | ||
334 | reserved for later use. #P is the number of online CPUS (#P:2). | ||
274 | 335 | ||
275 | This shows that the current tracer is "irqsoff" tracing the time for which | 336 | The task is the process that was running when the latency |
276 | interrupts were disabled. It gives the trace version and the version | 337 | occurred. (swapper pid: 0). |
277 | of the kernel upon which this was executed on (2.6.26-rc8). Then it displays | ||
278 | the max latency in microsecs (97 us). The number of trace entries displayed | ||
279 | and the total number recorded (both are three: #3/3). The type of | ||
280 | preemption that was used (PREEMPT). VP, KP, SP, and HP are always zero | ||
281 | and are reserved for later use. #P is the number of online CPUS (#P:2). | ||
282 | |||
283 | The task is the process that was running when the latency occurred. | ||
284 | (swapper pid: 0). | ||
285 | 338 | ||
286 | The start and stop (the functions in which the interrupts were disabled and | 339 | The start and stop (the functions in which the interrupts were |
287 | enabled respectively) that caused the latencies: | 340 | disabled and enabled respectively) that caused the latencies: |
288 | 341 | ||
289 | apic_timer_interrupt is where the interrupts were disabled. | 342 | apic_timer_interrupt is where the interrupts were disabled. |
290 | do_softirq is where they were enabled again. | 343 | do_softirq is where they were enabled again. |
@@ -320,12 +373,12 @@ The above is mostly meaningful for kernel developers. | |||
320 | latency_trace file is relative to the start of the trace. | 373 | latency_trace file is relative to the start of the trace. |
321 | 374 | ||
322 | delay: This is just to help catch your eye a bit better. And | 375 | delay: This is just to help catch your eye a bit better. And |
323 | needs to be fixed to be only relative to the same CPU. | 376 | needs to be fixed to be only relative to the same CPU. |
324 | The marks are determined by the difference between this | 377 | The marks are determined by the difference between this |
325 | current trace and the next trace. | 378 | current trace and the next trace. |
326 | '!' - greater than preempt_mark_thresh (default 100) | 379 | '!' - greater than preempt_mark_thresh (default 100) |
327 | '+' - greater than 1 microsecond | 380 | '+' - greater than 1 microsecond |
328 | ' ' - less than or equal to 1 microsecond. | 381 | ' ' - less than or equal to 1 microsecond. |
329 | 382 | ||
330 | The rest is the same as the 'trace' file. | 383 | The rest is the same as the 'trace' file. |
331 | 384 | ||
@@ -333,14 +386,15 @@ The above is mostly meaningful for kernel developers. | |||
333 | trace_options | 386 | trace_options |
334 | ------------- | 387 | ------------- |
335 | 388 | ||
336 | The trace_options file is used to control what gets printed in the trace | 389 | The trace_options file is used to control what gets printed in |
337 | output. To see what is available, simply cat the file: | 390 | the trace output. To see what is available, simply cat the file: |
338 | 391 | ||
339 | cat /debug/tracing/trace_options | 392 | cat /debug/tracing/trace_options |
340 | print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \ | 393 | print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \ |
341 | noblock nostacktrace nosched-tree nouserstacktrace nosym-userobj | 394 | noblock nostacktrace nosched-tree nouserstacktrace nosym-userobj |
342 | 395 | ||
343 | To disable one of the options, echo in the option prepended with "no". | 396 | To disable one of the options, echo in the option prepended with |
397 | "no". | ||
344 | 398 | ||
345 | echo noprint-parent > /debug/tracing/trace_options | 399 | echo noprint-parent > /debug/tracing/trace_options |
346 | 400 | ||
@@ -350,8 +404,8 @@ To enable an option, leave off the "no". | |||
350 | 404 | ||
351 | Here are the available options: | 405 | Here are the available options: |
352 | 406 | ||
353 | print-parent - On function traces, display the calling function | 407 | print-parent - On function traces, display the calling (parent) |
354 | as well as the function being traced. | 408 | function as well as the function being traced. |
355 | 409 | ||
356 | print-parent: | 410 | print-parent: |
357 | bash-4000 [01] 1477.606694: simple_strtoul <-strict_strtoul | 411 | bash-4000 [01] 1477.606694: simple_strtoul <-strict_strtoul |
@@ -360,15 +414,16 @@ Here are the available options: | |||
360 | bash-4000 [01] 1477.606694: simple_strtoul | 414 | bash-4000 [01] 1477.606694: simple_strtoul |
361 | 415 | ||
362 | 416 | ||
363 | sym-offset - Display not only the function name, but also the offset | 417 | sym-offset - Display not only the function name, but also the |
364 | in the function. For example, instead of seeing just | 418 | offset in the function. For example, instead of |
365 | "ktime_get", you will see "ktime_get+0xb/0x20". | 419 | seeing just "ktime_get", you will see |
420 | "ktime_get+0xb/0x20". | ||
366 | 421 | ||
367 | sym-offset: | 422 | sym-offset: |
368 | bash-4000 [01] 1477.606694: simple_strtoul+0x6/0xa0 | 423 | bash-4000 [01] 1477.606694: simple_strtoul+0x6/0xa0 |
369 | 424 | ||
370 | sym-addr - this will also display the function address as well as | 425 | sym-addr - this will also display the function address as well |
371 | the function name. | 426 | as the function name. |
372 | 427 | ||
373 | sym-addr: | 428 | sym-addr: |
374 | bash-4000 [01] 1477.606694: simple_strtoul <c0339346> | 429 | bash-4000 [01] 1477.606694: simple_strtoul <c0339346> |
@@ -378,35 +433,41 @@ Here are the available options: | |||
378 | bash 4000 1 0 00000000 00010a95 [58127d26] 1720.415ms \ | 433 | bash 4000 1 0 00000000 00010a95 [58127d26] 1720.415ms \ |
379 | (+0.000ms): simple_strtoul (strict_strtoul) | 434 | (+0.000ms): simple_strtoul (strict_strtoul) |
380 | 435 | ||
381 | raw - This will display raw numbers. This option is best for use with | 436 | raw - This will display raw numbers. This option is best for |
382 | user applications that can translate the raw numbers better than | 437 | use with user applications that can translate the raw |
383 | having it done in the kernel. | 438 | numbers better than having it done in the kernel. |
384 | 439 | ||
385 | hex - Similar to raw, but the numbers will be in a hexadecimal format. | 440 | hex - Similar to raw, but the numbers will be in a hexadecimal |
441 | format. | ||
386 | 442 | ||
387 | bin - This will print out the formats in raw binary. | 443 | bin - This will print out the formats in raw binary. |
388 | 444 | ||
389 | block - TBD (needs update) | 445 | block - TBD (needs update) |
390 | 446 | ||
391 | stacktrace - This is one of the options that changes the trace itself. | 447 | stacktrace - This is one of the options that changes the trace |
392 | When a trace is recorded, so is the stack of functions. | 448 | itself. When a trace is recorded, so is the stack |
393 | This allows for back traces of trace sites. | 449 | of functions. This allows for back traces of |
450 | trace sites. | ||
394 | 451 | ||
395 | userstacktrace - This option changes the trace. | 452 | userstacktrace - This option changes the trace. It records a |
396 | It records a stacktrace of the current userspace thread. | 453 | stacktrace of the current userspace thread. |
397 | 454 | ||
398 | sym-userobj - when user stacktrace are enabled, look up which object the | 455 | sym-userobj - when user stacktrace are enabled, look up which |
399 | address belongs to, and print a relative address | 456 | object the address belongs to, and print a |
400 | This is especially useful when ASLR is on, otherwise you don't | 457 | relative address. This is especially useful when |
401 | get a chance to resolve the address to object/file/line after the app is no | 458 | ASLR is on, otherwise you don't get a chance to |
402 | longer running | 459 | resolve the address to object/file/line after |
460 | the app is no longer running | ||
403 | 461 | ||
404 | The lookup is performed when you read trace,trace_pipe,latency_trace. Example: | 462 | The lookup is performed when you read |
463 | trace,trace_pipe,latency_trace. Example: | ||
405 | 464 | ||
406 | a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0 | 465 | a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0 |
407 | x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6] | 466 | x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6] |
408 | 467 | ||
409 | sched-tree - TBD (any users??) | 468 | sched-tree - trace all tasks that are on the runqueue, at |
469 | every scheduling event. Will add overhead if | ||
470 | there's a lot of tasks running at once. | ||
410 | 471 | ||
411 | 472 | ||
412 | sched_switch | 473 | sched_switch |
@@ -443,18 +504,19 @@ of how to use it. | |||
443 | [...] | 504 | [...] |
444 | 505 | ||
445 | 506 | ||
446 | As we have discussed previously about this format, the header shows | 507 | As we have discussed previously about this format, the header |
447 | the name of the trace and points to the options. The "FUNCTION" | 508 | shows the name of the trace and points to the options. The |
448 | is a misnomer since here it represents the wake ups and context | 509 | "FUNCTION" is a misnomer since here it represents the wake ups |
449 | switches. | 510 | and context switches. |
450 | 511 | ||
451 | The sched_switch file only lists the wake ups (represented with '+') | 512 | The sched_switch file only lists the wake ups (represented with |
452 | and context switches ('==>') with the previous task or current task | 513 | '+') and context switches ('==>') with the previous task or |
453 | first followed by the next task or task waking up. The format for both | 514 | current task first followed by the next task or task waking up. |
454 | of these is PID:KERNEL-PRIO:TASK-STATE. Remember that the KERNEL-PRIO | 515 | The format for both of these is PID:KERNEL-PRIO:TASK-STATE. |
455 | is the inverse of the actual priority with zero (0) being the highest | 516 | Remember that the KERNEL-PRIO is the inverse of the actual |
456 | priority and the nice values starting at 100 (nice -20). Below is | 517 | priority with zero (0) being the highest priority and the nice |
457 | a quick chart to map the kernel priority to user land priorities. | 518 | values starting at 100 (nice -20). Below is a quick chart to map |
519 | the kernel priority to user land priorities. | ||
458 | 520 | ||
459 | Kernel priority: 0 to 99 ==> user RT priority 99 to 0 | 521 | Kernel priority: 0 to 99 ==> user RT priority 99 to 0 |
460 | Kernel priority: 100 to 139 ==> user nice -20 to 19 | 522 | Kernel priority: 100 to 139 ==> user nice -20 to 19 |
@@ -475,10 +537,10 @@ The task states are: | |||
475 | ftrace_enabled | 537 | ftrace_enabled |
476 | -------------- | 538 | -------------- |
477 | 539 | ||
478 | The following tracers (listed below) give different output depending | 540 | The following tracers (listed below) give different output |
479 | on whether or not the sysctl ftrace_enabled is set. To set ftrace_enabled, | 541 | depending on whether or not the sysctl ftrace_enabled is set. To |
480 | one can either use the sysctl function or set it via the proc | 542 | set ftrace_enabled, one can either use the sysctl function or |
481 | file system interface. | 543 | set it via the proc file system interface. |
482 | 544 | ||
483 | sysctl kernel.ftrace_enabled=1 | 545 | sysctl kernel.ftrace_enabled=1 |
484 | 546 | ||
@@ -486,12 +548,12 @@ file system interface. | |||
486 | 548 | ||
487 | echo 1 > /proc/sys/kernel/ftrace_enabled | 549 | echo 1 > /proc/sys/kernel/ftrace_enabled |
488 | 550 | ||
489 | To disable ftrace_enabled simply replace the '1' with '0' in | 551 | To disable ftrace_enabled simply replace the '1' with '0' in the |
490 | the above commands. | 552 | above commands. |
491 | 553 | ||
492 | When ftrace_enabled is set the tracers will also record the functions | 554 | When ftrace_enabled is set the tracers will also record the |
493 | that are within the trace. The descriptions of the tracers | 555 | functions that are within the trace. The descriptions of the |
494 | will also show an example with ftrace enabled. | 556 | tracers will also show an example with ftrace enabled. |
495 | 557 | ||
496 | 558 | ||
497 | irqsoff | 559 | irqsoff |
@@ -499,17 +561,18 @@ irqsoff | |||
499 | 561 | ||
500 | When interrupts are disabled, the CPU can not react to any other | 562 | When interrupts are disabled, the CPU can not react to any other |
501 | external event (besides NMIs and SMIs). This prevents the timer | 563 | external event (besides NMIs and SMIs). This prevents the timer |
502 | interrupt from triggering or the mouse interrupt from letting the | 564 | interrupt from triggering or the mouse interrupt from letting |
503 | kernel know of a new mouse event. The result is a latency with the | 565 | the kernel know of a new mouse event. The result is a latency |
504 | reaction time. | 566 | with the reaction time. |
505 | 567 | ||
506 | The irqsoff tracer tracks the time for which interrupts are disabled. | 568 | The irqsoff tracer tracks the time for which interrupts are |
507 | When a new maximum latency is hit, the tracer saves the trace leading up | 569 | disabled. When a new maximum latency is hit, the tracer saves |
508 | to that latency point so that every time a new maximum is reached, the old | 570 | the trace leading up to that latency point so that every time a |
509 | saved trace is discarded and the new trace is saved. | 571 | new maximum is reached, the old saved trace is discarded and the |
572 | new trace is saved. | ||
510 | 573 | ||
511 | To reset the maximum, echo 0 into tracing_max_latency. Here is an | 574 | To reset the maximum, echo 0 into tracing_max_latency. Here is |
512 | example: | 575 | an example: |
513 | 576 | ||
514 | # echo irqsoff > /debug/tracing/current_tracer | 577 | # echo irqsoff > /debug/tracing/current_tracer |
515 | # echo 0 > /debug/tracing/tracing_max_latency | 578 | # echo 0 > /debug/tracing/tracing_max_latency |
@@ -544,10 +607,11 @@ irqsoff latency trace v1.1.5 on 2.6.26 | |||
544 | 607 | ||
545 | 608 | ||
546 | Here we see that that we had a latency of 12 microsecs (which is | 609 | Here we see that that we had a latency of 12 microsecs (which is |
547 | very good). The _write_lock_irq in sys_setpgid disabled interrupts. | 610 | very good). The _write_lock_irq in sys_setpgid disabled |
548 | The difference between the 12 and the displayed timestamp 14us occurred | 611 | interrupts. The difference between the 12 and the displayed |
549 | because the clock was incremented between the time of recording the max | 612 | timestamp 14us occurred because the clock was incremented |
550 | latency and the time of recording the function that had that latency. | 613 | between the time of recording the max latency and the time of |
614 | recording the function that had that latency. | ||
551 | 615 | ||
552 | Note the above example had ftrace_enabled not set. If we set the | 616 | Note the above example had ftrace_enabled not set. If we set the |
553 | ftrace_enabled, we get a much larger output: | 617 | ftrace_enabled, we get a much larger output: |
@@ -598,24 +662,24 @@ irqsoff latency trace v1.1.5 on 2.6.26-rc8 | |||
598 | 662 | ||
599 | 663 | ||
600 | Here we traced a 50 microsecond latency. But we also see all the | 664 | Here we traced a 50 microsecond latency. But we also see all the |
601 | functions that were called during that time. Note that by enabling | 665 | functions that were called during that time. Note that by |
602 | function tracing, we incur an added overhead. This overhead may | 666 | enabling function tracing, we incur an added overhead. This |
603 | extend the latency times. But nevertheless, this trace has provided | 667 | overhead may extend the latency times. But nevertheless, this |
604 | some very helpful debugging information. | 668 | trace has provided some very helpful debugging information. |
605 | 669 | ||
606 | 670 | ||
607 | preemptoff | 671 | preemptoff |
608 | ---------- | 672 | ---------- |
609 | 673 | ||
610 | When preemption is disabled, we may be able to receive interrupts but | 674 | When preemption is disabled, we may be able to receive |
611 | the task cannot be preempted and a higher priority task must wait | 675 | interrupts but the task cannot be preempted and a higher |
612 | for preemption to be enabled again before it can preempt a lower | 676 | priority task must wait for preemption to be enabled again |
613 | priority task. | 677 | before it can preempt a lower priority task. |
614 | 678 | ||
615 | The preemptoff tracer traces the places that disable preemption. | 679 | The preemptoff tracer traces the places that disable preemption. |
616 | Like the irqsoff tracer, it records the maximum latency for which preemption | 680 | Like the irqsoff tracer, it records the maximum latency for |
617 | was disabled. The control of preemptoff tracer is much like the irqsoff | 681 | which preemption was disabled. The control of preemptoff tracer |
618 | tracer. | 682 | is much like the irqsoff tracer. |
619 | 683 | ||
620 | # echo preemptoff > /debug/tracing/current_tracer | 684 | # echo preemptoff > /debug/tracing/current_tracer |
621 | # echo 0 > /debug/tracing/tracing_max_latency | 685 | # echo 0 > /debug/tracing/tracing_max_latency |
@@ -649,11 +713,12 @@ preemptoff latency trace v1.1.5 on 2.6.26-rc8 | |||
649 | sshd-4261 0d.s1 30us : trace_preempt_on (__do_softirq) | 713 | sshd-4261 0d.s1 30us : trace_preempt_on (__do_softirq) |
650 | 714 | ||
651 | 715 | ||
652 | This has some more changes. Preemption was disabled when an interrupt | 716 | This has some more changes. Preemption was disabled when an |
653 | came in (notice the 'h'), and was enabled while doing a softirq. | 717 | interrupt came in (notice the 'h'), and was enabled while doing |
654 | (notice the 's'). But we also see that interrupts have been disabled | 718 | a softirq. (notice the 's'). But we also see that interrupts |
655 | when entering the preempt off section and leaving it (the 'd'). | 719 | have been disabled when entering the preempt off section and |
656 | We do not know if interrupts were enabled in the mean time. | 720 | leaving it (the 'd'). We do not know if interrupts were enabled |
721 | in the mean time. | ||
657 | 722 | ||
658 | # tracer: preemptoff | 723 | # tracer: preemptoff |
659 | # | 724 | # |
@@ -712,28 +777,30 @@ preemptoff latency trace v1.1.5 on 2.6.26-rc8 | |||
712 | sshd-4261 0d.s1 64us : trace_preempt_on (__do_softirq) | 777 | sshd-4261 0d.s1 64us : trace_preempt_on (__do_softirq) |
713 | 778 | ||
714 | 779 | ||
715 | The above is an example of the preemptoff trace with ftrace_enabled | 780 | The above is an example of the preemptoff trace with |
716 | set. Here we see that interrupts were disabled the entire time. | 781 | ftrace_enabled set. Here we see that interrupts were disabled |
717 | The irq_enter code lets us know that we entered an interrupt 'h'. | 782 | the entire time. The irq_enter code lets us know that we entered |
718 | Before that, the functions being traced still show that it is not | 783 | an interrupt 'h'. Before that, the functions being traced still |
719 | in an interrupt, but we can see from the functions themselves that | 784 | show that it is not in an interrupt, but we can see from the |
720 | this is not the case. | 785 | functions themselves that this is not the case. |
721 | 786 | ||
722 | Notice that __do_softirq when called does not have a preempt_count. | 787 | Notice that __do_softirq when called does not have a |
723 | It may seem that we missed a preempt enabling. What really happened | 788 | preempt_count. It may seem that we missed a preempt enabling. |
724 | is that the preempt count is held on the thread's stack and we | 789 | What really happened is that the preempt count is held on the |
725 | switched to the softirq stack (4K stacks in effect). The code | 790 | thread's stack and we switched to the softirq stack (4K stacks |
726 | does not copy the preempt count, but because interrupts are disabled, | 791 | in effect). The code does not copy the preempt count, but |
727 | we do not need to worry about it. Having a tracer like this is good | 792 | because interrupts are disabled, we do not need to worry about |
728 | for letting people know what really happens inside the kernel. | 793 | it. Having a tracer like this is good for letting people know |
794 | what really happens inside the kernel. | ||
729 | 795 | ||
730 | 796 | ||
731 | preemptirqsoff | 797 | preemptirqsoff |
732 | -------------- | 798 | -------------- |
733 | 799 | ||
734 | Knowing the locations that have interrupts disabled or preemption | 800 | Knowing the locations that have interrupts disabled or |
735 | disabled for the longest times is helpful. But sometimes we would | 801 | preemption disabled for the longest times is helpful. But |
736 | like to know when either preemption and/or interrupts are disabled. | 802 | sometimes we would like to know when either preemption and/or |
803 | interrupts are disabled. | ||
737 | 804 | ||
738 | Consider the following code: | 805 | Consider the following code: |
739 | 806 | ||
@@ -753,11 +820,13 @@ The preemptoff tracer will record the total length of | |||
753 | call_function_with_irqs_and_preemption_off() and | 820 | call_function_with_irqs_and_preemption_off() and |
754 | call_function_with_preemption_off(). | 821 | call_function_with_preemption_off(). |
755 | 822 | ||
756 | But neither will trace the time that interrupts and/or preemption | 823 | But neither will trace the time that interrupts and/or |
757 | is disabled. This total time is the time that we can not schedule. | 824 | preemption is disabled. This total time is the time that we can |
758 | To record this time, use the preemptirqsoff tracer. | 825 | not schedule. To record this time, use the preemptirqsoff |
826 | tracer. | ||
759 | 827 | ||
760 | Again, using this trace is much like the irqsoff and preemptoff tracers. | 828 | Again, using this trace is much like the irqsoff and preemptoff |
829 | tracers. | ||
761 | 830 | ||
762 | # echo preemptirqsoff > /debug/tracing/current_tracer | 831 | # echo preemptirqsoff > /debug/tracing/current_tracer |
763 | # echo 0 > /debug/tracing/tracing_max_latency | 832 | # echo 0 > /debug/tracing/tracing_max_latency |
@@ -793,9 +862,10 @@ preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8 | |||
793 | 862 | ||
794 | 863 | ||
795 | The trace_hardirqs_off_thunk is called from assembly on x86 when | 864 | The trace_hardirqs_off_thunk is called from assembly on x86 when |
796 | interrupts are disabled in the assembly code. Without the function | 865 | interrupts are disabled in the assembly code. Without the |
797 | tracing, we do not know if interrupts were enabled within the preemption | 866 | function tracing, we do not know if interrupts were enabled |
798 | points. We do see that it started with preemption enabled. | 867 | within the preemption points. We do see that it started with |
868 | preemption enabled. | ||
799 | 869 | ||
800 | Here is a trace with ftrace_enabled set: | 870 | Here is a trace with ftrace_enabled set: |
801 | 871 | ||
@@ -883,40 +953,42 @@ preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8 | |||
883 | sshd-4261 0d.s1 105us : trace_preempt_on (__do_softirq) | 953 | sshd-4261 0d.s1 105us : trace_preempt_on (__do_softirq) |
884 | 954 | ||
885 | 955 | ||
886 | This is a very interesting trace. It started with the preemption of | 956 | This is a very interesting trace. It started with the preemption |
887 | the ls task. We see that the task had the "need_resched" bit set | 957 | of the ls task. We see that the task had the "need_resched" bit |
888 | via the 'N' in the trace. Interrupts were disabled before the spin_lock | 958 | set via the 'N' in the trace. Interrupts were disabled before |
889 | at the beginning of the trace. We see that a schedule took place to run | 959 | the spin_lock at the beginning of the trace. We see that a |
890 | sshd. When the interrupts were enabled, we took an interrupt. | 960 | schedule took place to run sshd. When the interrupts were |
891 | On return from the interrupt handler, the softirq ran. We took another | 961 | enabled, we took an interrupt. On return from the interrupt |
892 | interrupt while running the softirq as we see from the capital 'H'. | 962 | handler, the softirq ran. We took another interrupt while |
963 | running the softirq as we see from the capital 'H'. | ||
893 | 964 | ||
894 | 965 | ||
895 | wakeup | 966 | wakeup |
896 | ------ | 967 | ------ |
897 | 968 | ||
898 | In a Real-Time environment it is very important to know the wakeup | 969 | In a Real-Time environment it is very important to know the |
899 | time it takes for the highest priority task that is woken up to the | 970 | wakeup time it takes for the highest priority task that is woken |
900 | time that it executes. This is also known as "schedule latency". | 971 | up to the time that it executes. This is also known as "schedule |
901 | I stress the point that this is about RT tasks. It is also important | 972 | latency". I stress the point that this is about RT tasks. It is |
902 | to know the scheduling latency of non-RT tasks, but the average | 973 | also important to know the scheduling latency of non-RT tasks, |
903 | schedule latency is better for non-RT tasks. Tools like | 974 | but the average schedule latency is better for non-RT tasks. |
904 | LatencyTop are more appropriate for such measurements. | 975 | Tools like LatencyTop are more appropriate for such |
976 | measurements. | ||
905 | 977 | ||
906 | Real-Time environments are interested in the worst case latency. | 978 | Real-Time environments are interested in the worst case latency. |
907 | That is the longest latency it takes for something to happen, and | 979 | That is the longest latency it takes for something to happen, |
908 | not the average. We can have a very fast scheduler that may only | 980 | and not the average. We can have a very fast scheduler that may |
909 | have a large latency once in a while, but that would not work well | 981 | only have a large latency once in a while, but that would not |
910 | with Real-Time tasks. The wakeup tracer was designed to record | 982 | work well with Real-Time tasks. The wakeup tracer was designed |
911 | the worst case wakeups of RT tasks. Non-RT tasks are not recorded | 983 | to record the worst case wakeups of RT tasks. Non-RT tasks are |
912 | because the tracer only records one worst case and tracing non-RT | 984 | not recorded because the tracer only records one worst case and |
913 | tasks that are unpredictable will overwrite the worst case latency | 985 | tracing non-RT tasks that are unpredictable will overwrite the |
914 | of RT tasks. | 986 | worst case latency of RT tasks. |
915 | 987 | ||
916 | Since this tracer only deals with RT tasks, we will run this slightly | 988 | Since this tracer only deals with RT tasks, we will run this |
917 | differently than we did with the previous tracers. Instead of performing | 989 | slightly differently than we did with the previous tracers. |
918 | an 'ls', we will run 'sleep 1' under 'chrt' which changes the | 990 | Instead of performing an 'ls', we will run 'sleep 1' under |
919 | priority of the task. | 991 | 'chrt' which changes the priority of the task. |
920 | 992 | ||
921 | # echo wakeup > /debug/tracing/current_tracer | 993 | # echo wakeup > /debug/tracing/current_tracer |
922 | # echo 0 > /debug/tracing/tracing_max_latency | 994 | # echo 0 > /debug/tracing/tracing_max_latency |
@@ -946,17 +1018,16 @@ wakeup latency trace v1.1.5 on 2.6.26-rc8 | |||
946 | <idle>-0 1d..4 4us : schedule (cpu_idle) | 1018 | <idle>-0 1d..4 4us : schedule (cpu_idle) |
947 | 1019 | ||
948 | 1020 | ||
1021 | Running this on an idle system, we see that it only took 4 | ||
1022 | microseconds to perform the task switch. Note, since the trace | ||
1023 | marker in the schedule is before the actual "switch", we stop | ||
1024 | the tracing when the recorded task is about to schedule in. This | ||
1025 | may change if we add a new marker at the end of the scheduler. | ||
949 | 1026 | ||
950 | Running this on an idle system, we see that it only took 4 microseconds | 1027 | Notice that the recorded task is 'sleep' with the PID of 4901 |
951 | to perform the task switch. Note, since the trace marker in the | 1028 | and it has an rt_prio of 5. This priority is user-space priority |
952 | schedule is before the actual "switch", we stop the tracing when | 1029 | and not the internal kernel priority. The policy is 1 for |
953 | the recorded task is about to schedule in. This may change if | 1030 | SCHED_FIFO and 2 for SCHED_RR. |
954 | we add a new marker at the end of the scheduler. | ||
955 | |||
956 | Notice that the recorded task is 'sleep' with the PID of 4901 and it | ||
957 | has an rt_prio of 5. This priority is user-space priority and not | ||
958 | the internal kernel priority. The policy is 1 for SCHED_FIFO and 2 | ||
959 | for SCHED_RR. | ||
960 | 1031 | ||
961 | Doing the same with chrt -r 5 and ftrace_enabled set. | 1032 | Doing the same with chrt -r 5 and ftrace_enabled set. |
962 | 1033 | ||
@@ -1013,24 +1084,25 @@ ksoftirq-7 1d..6 49us : _spin_unlock (tracing_record_cmdline) | |||
1013 | ksoftirq-7 1d..6 49us : sub_preempt_count (_spin_unlock) | 1084 | ksoftirq-7 1d..6 49us : sub_preempt_count (_spin_unlock) |
1014 | ksoftirq-7 1d..4 50us : schedule (__cond_resched) | 1085 | ksoftirq-7 1d..4 50us : schedule (__cond_resched) |
1015 | 1086 | ||
1016 | The interrupt went off while running ksoftirqd. This task runs at | 1087 | The interrupt went off while running ksoftirqd. This task runs |
1017 | SCHED_OTHER. Why did not we see the 'N' set early? This may be | 1088 | at SCHED_OTHER. Why did not we see the 'N' set early? This may |
1018 | a harmless bug with x86_32 and 4K stacks. On x86_32 with 4K stacks | 1089 | be a harmless bug with x86_32 and 4K stacks. On x86_32 with 4K |
1019 | configured, the interrupt and softirq run with their own stack. | 1090 | stacks configured, the interrupt and softirq run with their own |
1020 | Some information is held on the top of the task's stack (need_resched | 1091 | stack. Some information is held on the top of the task's stack |
1021 | and preempt_count are both stored there). The setting of the NEED_RESCHED | 1092 | (need_resched and preempt_count are both stored there). The |
1022 | bit is done directly to the task's stack, but the reading of the | 1093 | setting of the NEED_RESCHED bit is done directly to the task's |
1023 | NEED_RESCHED is done by looking at the current stack, which in this case | 1094 | stack, but the reading of the NEED_RESCHED is done by looking at |
1024 | is the stack for the hard interrupt. This hides the fact that NEED_RESCHED | 1095 | the current stack, which in this case is the stack for the hard |
1025 | has been set. We do not see the 'N' until we switch back to the task's | 1096 | interrupt. This hides the fact that NEED_RESCHED has been set. |
1097 | We do not see the 'N' until we switch back to the task's | ||
1026 | assigned stack. | 1098 | assigned stack. |
1027 | 1099 | ||
1028 | function | 1100 | function |
1029 | -------- | 1101 | -------- |
1030 | 1102 | ||
1031 | This tracer is the function tracer. Enabling the function tracer | 1103 | This tracer is the function tracer. Enabling the function tracer |
1032 | can be done from the debug file system. Make sure the ftrace_enabled is | 1104 | can be done from the debug file system. Make sure the |
1033 | set; otherwise this tracer is a nop. | 1105 | ftrace_enabled is set; otherwise this tracer is a nop. |
1034 | 1106 | ||
1035 | # sysctl kernel.ftrace_enabled=1 | 1107 | # sysctl kernel.ftrace_enabled=1 |
1036 | # echo function > /debug/tracing/current_tracer | 1108 | # echo function > /debug/tracing/current_tracer |
@@ -1060,14 +1132,15 @@ set; otherwise this tracer is a nop. | |||
1060 | [...] | 1132 | [...] |
1061 | 1133 | ||
1062 | 1134 | ||
1063 | Note: function tracer uses ring buffers to store the above entries. | 1135 | Note: function tracer uses ring buffers to store the above |
1064 | The newest data may overwrite the oldest data. Sometimes using echo to | 1136 | entries. The newest data may overwrite the oldest data. |
1065 | stop the trace is not sufficient because the tracing could have overwritten | 1137 | Sometimes using echo to stop the trace is not sufficient because |
1066 | the data that you wanted to record. For this reason, it is sometimes better to | 1138 | the tracing could have overwritten the data that you wanted to |
1067 | disable tracing directly from a program. This allows you to stop the | 1139 | record. For this reason, it is sometimes better to disable |
1068 | tracing at the point that you hit the part that you are interested in. | 1140 | tracing directly from a program. This allows you to stop the |
1069 | To disable the tracing directly from a C program, something like following | 1141 | tracing at the point that you hit the part that you are |
1070 | code snippet can be used: | 1142 | interested in. To disable the tracing directly from a C program, |
1143 | something like following code snippet can be used: | ||
1071 | 1144 | ||
1072 | int trace_fd; | 1145 | int trace_fd; |
1073 | [...] | 1146 | [...] |
@@ -1082,10 +1155,10 @@ int main(int argc, char *argv[]) { | |||
1082 | } | 1155 | } |
1083 | 1156 | ||
1084 | Note: Here we hard coded the path name. The debugfs mount is not | 1157 | Note: Here we hard coded the path name. The debugfs mount is not |
1085 | guaranteed to be at /debug (and is more commonly at /sys/kernel/debug). | 1158 | guaranteed to be at /debug (and is more commonly at |
1086 | For simple one time traces, the above is sufficent. For anything else, | 1159 | /sys/kernel/debug). For simple one time traces, the above is |
1087 | a search through /proc/mounts may be needed to find where the debugfs | 1160 | sufficent. For anything else, a search through /proc/mounts may |
1088 | file-system is mounted. | 1161 | be needed to find where the debugfs file-system is mounted. |
1089 | 1162 | ||
1090 | 1163 | ||
1091 | Single thread tracing | 1164 | Single thread tracing |
@@ -1186,10 +1259,11 @@ following format: | |||
1186 | 0 scheduler_tick+0x1b6/0x1bf <- scheduler_tick+0x1aa/0x1bf | 1259 | 0 scheduler_tick+0x1b6/0x1bf <- scheduler_tick+0x1aa/0x1bf |
1187 | 1260 | ||
1188 | 1261 | ||
1189 | The tracer may be used to dump the trace for the oops'ing cpu on a | 1262 | The tracer may be used to dump the trace for the oops'ing cpu on |
1190 | kernel oops into the system log. To enable this, ftrace_dump_on_oops | 1263 | a kernel oops into the system log. To enable this, |
1191 | must be set. To set ftrace_dump_on_oops, one can either use the sysctl | 1264 | ftrace_dump_on_oops must be set. To set ftrace_dump_on_oops, one |
1192 | function or set it via the proc system interface. | 1265 | can either use the sysctl function or set it via the proc system |
1266 | interface. | ||
1193 | 1267 | ||
1194 | sysctl kernel.ftrace_dump_on_oops=1 | 1268 | sysctl kernel.ftrace_dump_on_oops=1 |
1195 | 1269 | ||
@@ -1198,8 +1272,8 @@ or | |||
1198 | echo 1 > /proc/sys/kernel/ftrace_dump_on_oops | 1272 | echo 1 > /proc/sys/kernel/ftrace_dump_on_oops |
1199 | 1273 | ||
1200 | 1274 | ||
1201 | Here's an example of such a dump after a null pointer dereference in a | 1275 | Here's an example of such a dump after a null pointer |
1202 | kernel module: | 1276 | dereference in a kernel module: |
1203 | 1277 | ||
1204 | [57848.105921] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 | 1278 | [57848.105921] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 |
1205 | [57848.106019] IP: [<ffffffffa0000006>] open+0x6/0x14 [oops] | 1279 | [57848.106019] IP: [<ffffffffa0000006>] open+0x6/0x14 [oops] |
@@ -1239,25 +1313,34 @@ kernel module: | |||
1239 | function graph tracer | 1313 | function graph tracer |
1240 | --------------------------- | 1314 | --------------------------- |
1241 | 1315 | ||
1242 | This tracer is similar to the function tracer except that it probes | 1316 | This tracer is similar to the function tracer except that it |
1243 | a function on its entry and its exit. | 1317 | probes a function on its entry and its exit. This is done by |
1244 | This is done by setting a dynamically allocated stack of return addresses on each | 1318 | using a dynamically allocated stack of return addresses in each |
1245 | task_struct. Then the tracer overwrites the return address of each function traced | 1319 | task_struct. On function entry the tracer overwrites the return |
1246 | to set a custom probe. Thus the original return address is stored on the stack of return | 1320 | address of each function traced to set a custom probe. Thus the |
1247 | address in the task_struct. | 1321 | original return address is stored on the stack of return address |
1322 | in the task_struct. | ||
1248 | 1323 | ||
1249 | Probing on both extremities of a function leads to special features such as | 1324 | Probing on both ends of a function leads to special features |
1325 | such as: | ||
1250 | 1326 | ||
1251 | _ measure of function's time execution | 1327 | - measure of a function's time execution |
1252 | _ having a reliable call stack to draw function calls graph | 1328 | - having a reliable call stack to draw function calls graph |
1253 | 1329 | ||
1254 | This tracer is useful in several situations: | 1330 | This tracer is useful in several situations: |
1255 | 1331 | ||
1256 | _ you want to find the reason of a strange kernel behavior and need to see | 1332 | - you want to find the reason of a strange kernel behavior and |
1257 | what happens in detail on any areas (or specific ones). | 1333 | need to see what happens in detail on any areas (or specific |
1258 | _ you are experiencing weird latencies but it's difficult to find its origin. | 1334 | ones). |
1259 | _ you want to find quickly which path is taken by a specific function | 1335 | |
1260 | _ you just want to see what happens inside your kernel | 1336 | - you are experiencing weird latencies but it's difficult to |
1337 | find its origin. | ||
1338 | |||
1339 | - you want to find quickly which path is taken by a specific | ||
1340 | function | ||
1341 | |||
1342 | - you just want to peek inside a working kernel and want to see | ||
1343 | what happens there. | ||
1261 | 1344 | ||
1262 | # tracer: function_graph | 1345 | # tracer: function_graph |
1263 | # | 1346 | # |
@@ -1282,24 +1365,28 @@ _ you just want to see what happens inside your kernel | |||
1282 | 0) 0.586 us | _spin_unlock(); | 1365 | 0) 0.586 us | _spin_unlock(); |
1283 | 1366 | ||
1284 | 1367 | ||
1285 | There are several columns that can be dynamically enabled/disabled. | 1368 | There are several columns that can be dynamically |
1286 | You can use every combination of options you want, depending on your needs. | 1369 | enabled/disabled. You can use every combination of options you |
1370 | want, depending on your needs. | ||
1287 | 1371 | ||
1288 | _ The cpu number on which the function executed is default enabled. | 1372 | - The cpu number on which the function executed is default |
1289 | It is sometimes better to only trace one cpu (see tracing_cpu_mask file) | 1373 | enabled. It is sometimes better to only trace one cpu (see |
1290 | or you might sometimes see unordered function calls while cpu tracing switch. | 1374 | tracing_cpu_mask file) or you might sometimes see unordered |
1375 | function calls while cpu tracing switch. | ||
1291 | 1376 | ||
1292 | hide: echo nofuncgraph-cpu > /debug/tracing/trace_options | 1377 | hide: echo nofuncgraph-cpu > /debug/tracing/trace_options |
1293 | show: echo funcgraph-cpu > /debug/tracing/trace_options | 1378 | show: echo funcgraph-cpu > /debug/tracing/trace_options |
1294 | 1379 | ||
1295 | _ The duration (function's time of execution) is displayed on the closing bracket | 1380 | - The duration (function's time of execution) is displayed on |
1296 | line of a function or on the same line than the current function in case of a leaf | 1381 | the closing bracket line of a function or on the same line |
1297 | one. It is default enabled. | 1382 | than the current function in case of a leaf one. It is default |
1383 | enabled. | ||
1298 | 1384 | ||
1299 | hide: echo nofuncgraph-duration > /debug/tracing/trace_options | 1385 | hide: echo nofuncgraph-duration > /debug/tracing/trace_options |
1300 | show: echo funcgraph-duration > /debug/tracing/trace_options | 1386 | show: echo funcgraph-duration > /debug/tracing/trace_options |
1301 | 1387 | ||
1302 | _ The overhead field precedes the duration one in case of reached duration thresholds. | 1388 | - The overhead field precedes the duration field in case of |
1389 | reached duration thresholds. | ||
1303 | 1390 | ||
1304 | hide: echo nofuncgraph-overhead > /debug/tracing/trace_options | 1391 | hide: echo nofuncgraph-overhead > /debug/tracing/trace_options |
1305 | show: echo funcgraph-overhead > /debug/tracing/trace_options | 1392 | show: echo funcgraph-overhead > /debug/tracing/trace_options |
@@ -1328,8 +1415,8 @@ _ The overhead field precedes the duration one in case of reached duration thres | |||
1328 | ! means that the function exceeded 100 usecs. | 1415 | ! means that the function exceeded 100 usecs. |
1329 | 1416 | ||
1330 | 1417 | ||
1331 | _ The task/pid field displays the thread cmdline and pid which executed the function. | 1418 | - The task/pid field displays the thread cmdline and pid which |
1332 | It is default disabled. | 1419 | executed the function. It is default disabled. |
1333 | 1420 | ||
1334 | hide: echo nofuncgraph-proc > /debug/tracing/trace_options | 1421 | hide: echo nofuncgraph-proc > /debug/tracing/trace_options |
1335 | show: echo funcgraph-proc > /debug/tracing/trace_options | 1422 | show: echo funcgraph-proc > /debug/tracing/trace_options |
@@ -1351,8 +1438,9 @@ _ The task/pid field displays the thread cmdline and pid which executed the func | |||
1351 | 0) sh-4802 | + 49.370 us | } | 1438 | 0) sh-4802 | + 49.370 us | } |
1352 | 1439 | ||
1353 | 1440 | ||
1354 | _ The absolute time field is an absolute timestamp given by the clock since | 1441 | - The absolute time field is an absolute timestamp given by the |
1355 | it started. A snapshot of this time is given on each entry/exit of functions | 1442 | system clock since it started. A snapshot of this time is |
1443 | given on each entry/exit of functions | ||
1356 | 1444 | ||
1357 | hide: echo nofuncgraph-abstime > /debug/tracing/trace_options | 1445 | hide: echo nofuncgraph-abstime > /debug/tracing/trace_options |
1358 | show: echo funcgraph-abstime > /debug/tracing/trace_options | 1446 | show: echo funcgraph-abstime > /debug/tracing/trace_options |
@@ -1377,9 +1465,10 @@ _ The absolute time field is an absolute timestamp given by the clock since | |||
1377 | 360.774530 | 1) 0.594 us | __phys_addr(); | 1465 | 360.774530 | 1) 0.594 us | __phys_addr(); |
1378 | 1466 | ||
1379 | 1467 | ||
1380 | You can put some comments on specific functions by using ftrace_printk() | 1468 | You can put some comments on specific functions by using |
1381 | For example, if you want to put a comment inside the __might_sleep() function, | 1469 | ftrace_printk() For example, if you want to put a comment inside |
1382 | you just have to include <linux/ftrace.h> and call ftrace_printk() inside __might_sleep() | 1470 | the __might_sleep() function, you just have to include |
1471 | <linux/ftrace.h> and call ftrace_printk() inside __might_sleep() | ||
1383 | 1472 | ||
1384 | ftrace_printk("I'm a comment!\n") | 1473 | ftrace_printk("I'm a comment!\n") |
1385 | 1474 | ||
@@ -1390,8 +1479,9 @@ will produce: | |||
1390 | 1) 1.449 us | } | 1479 | 1) 1.449 us | } |
1391 | 1480 | ||
1392 | 1481 | ||
1393 | You might find other useful features for this tracer on the "dynamic ftrace" | 1482 | You might find other useful features for this tracer in the |
1394 | section such as tracing only specific functions or tasks. | 1483 | following "dynamic ftrace" section such as tracing only specific |
1484 | functions or tasks. | ||
1395 | 1485 | ||
1396 | dynamic ftrace | 1486 | dynamic ftrace |
1397 | -------------- | 1487 | -------------- |
@@ -1399,43 +1489,45 @@ dynamic ftrace | |||
1399 | If CONFIG_DYNAMIC_FTRACE is set, the system will run with | 1489 | If CONFIG_DYNAMIC_FTRACE is set, the system will run with |
1400 | virtually no overhead when function tracing is disabled. The way | 1490 | virtually no overhead when function tracing is disabled. The way |
1401 | this works is the mcount function call (placed at the start of | 1491 | this works is the mcount function call (placed at the start of |
1402 | every kernel function, produced by the -pg switch in gcc), starts | 1492 | every kernel function, produced by the -pg switch in gcc), |
1403 | of pointing to a simple return. (Enabling FTRACE will include the | 1493 | starts of pointing to a simple return. (Enabling FTRACE will |
1404 | -pg switch in the compiling of the kernel.) | 1494 | include the -pg switch in the compiling of the kernel.) |
1405 | 1495 | ||
1406 | At compile time every C file object is run through the | 1496 | At compile time every C file object is run through the |
1407 | recordmcount.pl script (located in the scripts directory). This | 1497 | recordmcount.pl script (located in the scripts directory). This |
1408 | script will process the C object using objdump to find all the | 1498 | script will process the C object using objdump to find all the |
1409 | locations in the .text section that call mcount. (Note, only | 1499 | locations in the .text section that call mcount. (Note, only the |
1410 | the .text section is processed, since processing other sections | 1500 | .text section is processed, since processing other sections like |
1411 | like .init.text may cause races due to those sections being freed). | 1501 | .init.text may cause races due to those sections being freed). |
1412 | 1502 | ||
1413 | A new section called "__mcount_loc" is created that holds references | 1503 | A new section called "__mcount_loc" is created that holds |
1414 | to all the mcount call sites in the .text section. This section is | 1504 | references to all the mcount call sites in the .text section. |
1415 | compiled back into the original object. The final linker will add | 1505 | This section is compiled back into the original object. The |
1416 | all these references into a single table. | 1506 | final linker will add all these references into a single table. |
1417 | 1507 | ||
1418 | On boot up, before SMP is initialized, the dynamic ftrace code | 1508 | On boot up, before SMP is initialized, the dynamic ftrace code |
1419 | scans this table and updates all the locations into nops. It also | 1509 | scans this table and updates all the locations into nops. It |
1420 | records the locations, which are added to the available_filter_functions | 1510 | also records the locations, which are added to the |
1421 | list. Modules are processed as they are loaded and before they are | 1511 | available_filter_functions list. Modules are processed as they |
1422 | executed. When a module is unloaded, it also removes its functions from | 1512 | are loaded and before they are executed. When a module is |
1423 | the ftrace function list. This is automatic in the module unload | 1513 | unloaded, it also removes its functions from the ftrace function |
1424 | code, and the module author does not need to worry about it. | 1514 | list. This is automatic in the module unload code, and the |
1425 | 1515 | module author does not need to worry about it. | |
1426 | When tracing is enabled, kstop_machine is called to prevent races | 1516 | |
1427 | with the CPUS executing code being modified (which can cause the | 1517 | When tracing is enabled, kstop_machine is called to prevent |
1428 | CPU to do undesireable things), and the nops are patched back | 1518 | races with the CPUS executing code being modified (which can |
1429 | to calls. But this time, they do not call mcount (which is just | 1519 | cause the CPU to do undesireable things), and the nops are |
1430 | a function stub). They now call into the ftrace infrastructure. | 1520 | patched back to calls. But this time, they do not call mcount |
1521 | (which is just a function stub). They now call into the ftrace | ||
1522 | infrastructure. | ||
1431 | 1523 | ||
1432 | One special side-effect to the recording of the functions being | 1524 | One special side-effect to the recording of the functions being |
1433 | traced is that we can now selectively choose which functions we | 1525 | traced is that we can now selectively choose which functions we |
1434 | wish to trace and which ones we want the mcount calls to remain as | 1526 | wish to trace and which ones we want the mcount calls to remain |
1435 | nops. | 1527 | as nops. |
1436 | 1528 | ||
1437 | Two files are used, one for enabling and one for disabling the tracing | 1529 | Two files are used, one for enabling and one for disabling the |
1438 | of specified functions. They are: | 1530 | tracing of specified functions. They are: |
1439 | 1531 | ||
1440 | set_ftrace_filter | 1532 | set_ftrace_filter |
1441 | 1533 | ||
@@ -1443,8 +1535,8 @@ and | |||
1443 | 1535 | ||
1444 | set_ftrace_notrace | 1536 | set_ftrace_notrace |
1445 | 1537 | ||
1446 | A list of available functions that you can add to these files is listed | 1538 | A list of available functions that you can add to these files is |
1447 | in: | 1539 | listed in: |
1448 | 1540 | ||
1449 | available_filter_functions | 1541 | available_filter_functions |
1450 | 1542 | ||
@@ -1481,8 +1573,8 @@ hrtimer_interrupt | |||
1481 | sys_nanosleep | 1573 | sys_nanosleep |
1482 | 1574 | ||
1483 | 1575 | ||
1484 | Perhaps this is not enough. The filters also allow simple wild cards. | 1576 | Perhaps this is not enough. The filters also allow simple wild |
1485 | Only the following are currently available | 1577 | cards. Only the following are currently available |
1486 | 1578 | ||
1487 | <match>* - will match functions that begin with <match> | 1579 | <match>* - will match functions that begin with <match> |
1488 | *<match> - will match functions that end with <match> | 1580 | *<match> - will match functions that end with <match> |
@@ -1492,9 +1584,9 @@ These are the only wild cards which are supported. | |||
1492 | 1584 | ||
1493 | <match>*<match> will not work. | 1585 | <match>*<match> will not work. |
1494 | 1586 | ||
1495 | Note: It is better to use quotes to enclose the wild cards, otherwise | 1587 | Note: It is better to use quotes to enclose the wild cards, |
1496 | the shell may expand the parameters into names of files in the local | 1588 | otherwise the shell may expand the parameters into names |
1497 | directory. | 1589 | of files in the local directory. |
1498 | 1590 | ||
1499 | # echo 'hrtimer_*' > /debug/tracing/set_ftrace_filter | 1591 | # echo 'hrtimer_*' > /debug/tracing/set_ftrace_filter |
1500 | 1592 | ||
@@ -1540,7 +1632,8 @@ This is because the '>' and '>>' act just like they do in bash. | |||
1540 | To rewrite the filters, use '>' | 1632 | To rewrite the filters, use '>' |
1541 | To append to the filters, use '>>' | 1633 | To append to the filters, use '>>' |
1542 | 1634 | ||
1543 | To clear out a filter so that all functions will be recorded again: | 1635 | To clear out a filter so that all functions will be recorded |
1636 | again: | ||
1544 | 1637 | ||
1545 | # echo > /debug/tracing/set_ftrace_filter | 1638 | # echo > /debug/tracing/set_ftrace_filter |
1546 | # cat /debug/tracing/set_ftrace_filter | 1639 | # cat /debug/tracing/set_ftrace_filter |
@@ -1572,7 +1665,8 @@ hrtimer_get_res | |||
1572 | hrtimer_init_sleeper | 1665 | hrtimer_init_sleeper |
1573 | 1666 | ||
1574 | 1667 | ||
1575 | The set_ftrace_notrace prevents those functions from being traced. | 1668 | The set_ftrace_notrace prevents those functions from being |
1669 | traced. | ||
1576 | 1670 | ||
1577 | # echo '*preempt*' '*lock*' > /debug/tracing/set_ftrace_notrace | 1671 | # echo '*preempt*' '*lock*' > /debug/tracing/set_ftrace_notrace |
1578 | 1672 | ||
@@ -1595,18 +1689,20 @@ Produces: | |||
1595 | We can see that there's no more lock or preempt tracing. | 1689 | We can see that there's no more lock or preempt tracing. |
1596 | 1690 | ||
1597 | 1691 | ||
1598 | * Dynamic ftrace with the function graph tracer * | 1692 | Dynamic ftrace with the function graph tracer |
1693 | --------------------------------------------- | ||
1599 | 1694 | ||
1695 | Although what has been explained above concerns both the | ||
1696 | function tracer and the function-graph-tracer, there are some | ||
1697 | special features only available in the function-graph tracer. | ||
1600 | 1698 | ||
1601 | Although what has been explained above concerns both the function tracer and | 1699 | If you want to trace only one function and all of its children, |
1602 | the function_graph_tracer, the following concerns only the latter. | 1700 | you just have to echo its name into set_graph_function: |
1603 | 1701 | ||
1604 | If you want to trace only one function and all of its childs, you just have | 1702 | echo __do_fault > set_graph_function |
1605 | to echo its name on set_graph_function: | ||
1606 | 1703 | ||
1607 | echo __do_fault > set_graph_function | 1704 | will produce the following "expanded" trace of the __do_fault() |
1608 | 1705 | function: | |
1609 | will produce the following: | ||
1610 | 1706 | ||
1611 | 0) | __do_fault() { | 1707 | 0) | __do_fault() { |
1612 | 0) | filemap_fault() { | 1708 | 0) | filemap_fault() { |
@@ -1643,23 +1739,24 @@ will produce the following: | |||
1643 | 0) 2.793 us | } | 1739 | 0) 2.793 us | } |
1644 | 0) + 14.012 us | } | 1740 | 0) + 14.012 us | } |
1645 | 1741 | ||
1646 | You can also select several functions: | 1742 | You can also expand several functions at once: |
1647 | 1743 | ||
1648 | echo sys_open > set_graph_function | 1744 | echo sys_open > set_graph_function |
1649 | echo sys_close >> set_graph_function | 1745 | echo sys_close >> set_graph_function |
1650 | 1746 | ||
1651 | Now if you want to go back to trace all functions | 1747 | Now if you want to go back to trace all functions you can clear |
1748 | this special filter via: | ||
1652 | 1749 | ||
1653 | echo > set_graph_function | 1750 | echo > set_graph_function |
1654 | 1751 | ||
1655 | 1752 | ||
1656 | trace_pipe | 1753 | trace_pipe |
1657 | ---------- | 1754 | ---------- |
1658 | 1755 | ||
1659 | The trace_pipe outputs the same content as the trace file, but the effect | 1756 | The trace_pipe outputs the same content as the trace file, but |
1660 | on the tracing is different. Every read from trace_pipe is consumed. | 1757 | the effect on the tracing is different. Every read from |
1661 | This means that subsequent reads will be different. The trace | 1758 | trace_pipe is consumed. This means that subsequent reads will be |
1662 | is live. | 1759 | different. The trace is live. |
1663 | 1760 | ||
1664 | # echo function > /debug/tracing/current_tracer | 1761 | # echo function > /debug/tracing/current_tracer |
1665 | # cat /debug/tracing/trace_pipe > /tmp/trace.out & | 1762 | # cat /debug/tracing/trace_pipe > /tmp/trace.out & |
@@ -1687,38 +1784,45 @@ is live. | |||
1687 | bash-4043 [00] 41.267111: select_task_rq_rt <-try_to_wake_up | 1784 | bash-4043 [00] 41.267111: select_task_rq_rt <-try_to_wake_up |
1688 | 1785 | ||
1689 | 1786 | ||
1690 | Note, reading the trace_pipe file will block until more input is added. | 1787 | Note, reading the trace_pipe file will block until more input is |
1691 | By changing the tracer, trace_pipe will issue an EOF. We needed | 1788 | added. By changing the tracer, trace_pipe will issue an EOF. We |
1692 | to set the function tracer _before_ we "cat" the trace_pipe file. | 1789 | needed to set the function tracer _before_ we "cat" the |
1790 | trace_pipe file. | ||
1693 | 1791 | ||
1694 | 1792 | ||
1695 | trace entries | 1793 | trace entries |
1696 | ------------- | 1794 | ------------- |
1697 | 1795 | ||
1698 | Having too much or not enough data can be troublesome in diagnosing | 1796 | Having too much or not enough data can be troublesome in |
1699 | an issue in the kernel. The file buffer_size_kb is used to modify | 1797 | diagnosing an issue in the kernel. The file buffer_size_kb is |
1700 | the size of the internal trace buffers. The number listed | 1798 | used to modify the size of the internal trace buffers. The |
1701 | is the number of entries that can be recorded per CPU. To know | 1799 | number listed is the number of entries that can be recorded per |
1702 | the full size, multiply the number of possible CPUS with the | 1800 | CPU. To know the full size, multiply the number of possible CPUS |
1703 | number of entries. | 1801 | with the number of entries. |
1704 | 1802 | ||
1705 | # cat /debug/tracing/buffer_size_kb | 1803 | # cat /debug/tracing/buffer_size_kb |
1706 | 1408 (units kilobytes) | 1804 | 1408 (units kilobytes) |
1707 | 1805 | ||
1708 | Note, to modify this, you must have tracing completely disabled. To do that, | 1806 | Note, to modify this, you must have tracing completely disabled. |
1709 | echo "nop" into the current_tracer. If the current_tracer is not set | 1807 | To do that, echo "nop" into the current_tracer. If the |
1710 | to "nop", an EINVAL error will be returned. | 1808 | current_tracer is not set to "nop", an EINVAL error will be |
1809 | returned. | ||
1711 | 1810 | ||
1712 | # echo nop > /debug/tracing/current_tracer | 1811 | # echo nop > /debug/tracing/current_tracer |
1713 | # echo 10000 > /debug/tracing/buffer_size_kb | 1812 | # echo 10000 > /debug/tracing/buffer_size_kb |
1714 | # cat /debug/tracing/buffer_size_kb | 1813 | # cat /debug/tracing/buffer_size_kb |
1715 | 10000 (units kilobytes) | 1814 | 10000 (units kilobytes) |
1716 | 1815 | ||
1717 | The number of pages which will be allocated is limited to a percentage | 1816 | The number of pages which will be allocated is limited to a |
1718 | of available memory. Allocating too much will produce an error. | 1817 | percentage of available memory. Allocating too much will produce |
1818 | an error. | ||
1719 | 1819 | ||
1720 | # echo 1000000000000 > /debug/tracing/buffer_size_kb | 1820 | # echo 1000000000000 > /debug/tracing/buffer_size_kb |
1721 | -bash: echo: write error: Cannot allocate memory | 1821 | -bash: echo: write error: Cannot allocate memory |
1722 | # cat /debug/tracing/buffer_size_kb | 1822 | # cat /debug/tracing/buffer_size_kb |
1723 | 85 | 1823 | 85 |
1724 | 1824 | ||
1825 | ----------- | ||
1826 | |||
1827 | More details can be found in the source code, in the | ||
1828 | kernel/tracing/*.c files. | ||