aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorSteven Rostedt <rostedt@goodmis.org>2008-07-14 16:41:12 -0400
committerLinus Torvalds <torvalds@linux-foundation.org>2008-07-14 16:57:56 -0400
commita41eebab7537890409ea9dfe0fcda9b5fbdb090d (patch)
tree3383faebf8896b1806a3bbc058e3ef8db5c4092e
parent17489c058e8c63ab5ebdc67ab52ca70d1bc270b1 (diff)
ftrace: document updates
The following updates were recommended by Elias Oltmanns and Randy Dunlap. [ updates based on Andrew Morton's comments are still to come. ] Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-rw-r--r--Documentation/ftrace.txt134
1 files changed, 71 insertions, 63 deletions
diff --git a/Documentation/ftrace.txt b/Documentation/ftrace.txt
index 13e4bf054c38..77d3faa1a611 100644
--- a/Documentation/ftrace.txt
+++ b/Documentation/ftrace.txt
@@ -2,8 +2,11 @@
2 ======================== 2 ========================
3 3
4Copyright 2008 Red Hat Inc. 4Copyright 2008 Red Hat Inc.
5Author: Steven Rostedt <srostedt@redhat.com> 5 Author: Steven Rostedt <srostedt@redhat.com>
6 License: The GNU Free Documentation License, Version 1.2
7Reviewers: Elias Oltmanns and Randy Dunlap
6 8
9Writen for: 2.6.26-rc8 linux-2.6-tip.git tip/tracing/ftrace branch
7 10
8Introduction 11Introduction
9------------ 12------------
@@ -46,7 +49,7 @@ of ftrace. Here is a list of some of the key files:
46 that is configured. 49 that is configured.
47 50
48 available_tracers : This holds the different types of tracers that 51 available_tracers : This holds the different types of tracers that
49 has been compiled into the kernel. The tracers 52 have been compiled into the kernel. The tracers
50 listed here can be configured by echoing in their 53 listed here can be configured by echoing in their
51 name into current_tracer. 54 name into current_tracer.
52 55
@@ -90,11 +93,13 @@ of ftrace. Here is a list of some of the key files:
90 trace_entries : This sets or displays the number of trace 93 trace_entries : This sets or displays the number of trace
91 entries each CPU buffer can hold. The tracer buffers 94 entries each CPU buffer can hold. The tracer buffers
92 are the same size for each CPU, so care must be 95 are the same size for each CPU, so care must be
93 taken when modifying the trace_entries. The number 96 taken when modifying the trace_entries. The trace
94 of actually entries will be the number given 97 buffers are allocated in pages (blocks of memory that
95 times the number of possible CPUS. The buffers 98 the kernel uses for allocation, usually 4 KB in size).
96 are saved as individual pages, and the actual entries 99 Since each entry is smaller than a page, if the last
97 will always be rounded up to entries per page. 100 allocated page has room for more entries than were
101 requested, the rest of the page is used to allocate
102 entries.
98 103
99 This can only be updated when the current_tracer 104 This can only be updated when the current_tracer
100 is set to "none". 105 is set to "none".
@@ -114,13 +119,13 @@ of ftrace. Here is a list of some of the key files:
114 in performance. This also has a side effect of 119 in performance. This also has a side effect of
115 enabling or disabling specific functions to be 120 enabling or disabling specific functions to be
116 traced. Echoing in names of functions into this 121 traced. Echoing in names of functions into this
117 file will limit the trace to only those files. 122 file will limit the trace to only these functions.
118 123
119 set_ftrace_notrace: This has the opposite effect that 124 set_ftrace_notrace: This has the opposite effect that
120 set_ftrace_filter has. Any function that is added 125 set_ftrace_filter has. Any function that is added
121 here will not be traced. If a function exists 126 here will not be traced. If a function exists
122 in both set_ftrace_filter and set_ftrace_notrace 127 in both set_ftrace_filter and set_ftrace_notrace,
123 the function will _not_ bet traced. 128 the function will _not_ be traced.
124 129
125 available_filter_functions : When a function is encountered the first 130 available_filter_functions : When a function is encountered the first
126 time by the dynamic tracer, it is recorded and 131 time by the dynamic tracer, it is recorded and
@@ -138,7 +143,7 @@ Here are the list of current tracers that can be configured.
138 143
139 ftrace - function tracer that uses mcount to trace all functions. 144 ftrace - function tracer that uses mcount to trace all functions.
140 It is possible to filter out which functions that are 145 It is possible to filter out which functions that are
141 traced when dynamic ftrace is configured in. 146 to be traced when dynamic ftrace is configured in.
142 147
143 sched_switch - traces the context switches between tasks. 148 sched_switch - traces the context switches between tasks.
144 149
@@ -297,13 +302,13 @@ explains which is which.
297 302
298The above is mostly meaningful for kernel developers. 303The above is mostly meaningful for kernel developers.
299 304
300 time: This differs from the trace output where as the trace output 305 time: This differs from the trace file output. The trace file output
301 contained a absolute timestamp. This timestamp is relative 306 included an absolute timestamp. The timestamp used by the
302 to the start of the first entry in the the trace. 307 latency_trace file is relative to the start of the trace.
303 308
304 delay: This is just to help catch your eye a bit better. And 309 delay: This is just to help catch your eye a bit better. And
305 needs to be fixed to be only relative to the same CPU. 310 needs to be fixed to be only relative to the same CPU.
306 The marks is determined by the difference between this 311 The marks are determined by the difference between this
307 current trace and the next trace. 312 current trace and the next trace.
308 '!' - greater than preempt_mark_thresh (default 100) 313 '!' - greater than preempt_mark_thresh (default 100)
309 '+' - greater than 1 microsecond 314 '+' - greater than 1 microsecond
@@ -322,13 +327,13 @@ output. To see what is available, simply cat the file:
322 print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \ 327 print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \
323 noblock nostacktrace nosched-tree 328 noblock nostacktrace nosched-tree
324 329
325To disable one of the options, echo in the option appended with "no". 330To disable one of the options, echo in the option prepended with "no".
326 331
327 echo noprint-parent > /debug/tracing/iter_ctrl 332 echo noprint-parent > /debug/tracing/iter_ctrl
328 333
329To enable an option, leave off the "no". 334To enable an option, leave off the "no".
330 335
331 echo sym-offest > /debug/tracing/iter_ctrl 336 echo sym-offset > /debug/tracing/iter_ctrl
332 337
333Here are the available options: 338Here are the available options:
334 339
@@ -344,7 +349,7 @@ Here are the available options:
344 349
345 sym-offset - Display not only the function name, but also the offset 350 sym-offset - Display not only the function name, but also the offset
346 in the function. For example, instead of seeing just 351 in the function. For example, instead of seeing just
347 "ktime_get" you will see "ktime_get+0xb/0x20" 352 "ktime_get", you will see "ktime_get+0xb/0x20".
348 353
349 sym-offset: 354 sym-offset:
350 bash-4000 [01] 1477.606694: simple_strtoul+0x6/0xa0 355 bash-4000 [01] 1477.606694: simple_strtoul+0x6/0xa0
@@ -364,7 +369,7 @@ Here are the available options:
364 user applications that can translate the raw numbers better than 369 user applications that can translate the raw numbers better than
365 having it done in the kernel. 370 having it done in the kernel.
366 371
367 hex - similar to raw, but the numbers will be in a hexadecimal format. 372 hex - Similar to raw, but the numbers will be in a hexadecimal format.
368 373
369 bin - This will print out the formats in raw binary. 374 bin - This will print out the formats in raw binary.
370 375
@@ -381,7 +386,7 @@ sched_switch
381------------ 386------------
382 387
383This tracer simply records schedule switches. Here's an example 388This tracer simply records schedule switches. Here's an example
384on how to implement it. 389of how to use it.
385 390
386 # echo sched_switch > /debug/tracing/current_tracer 391 # echo sched_switch > /debug/tracing/current_tracer
387 # echo 1 > /debug/tracing/tracing_enabled 392 # echo 1 > /debug/tracing/tracing_enabled
@@ -470,7 +475,7 @@ interrupt from triggering or the mouse interrupt from letting the
470kernel know of a new mouse event. The result is a latency with the 475kernel know of a new mouse event. The result is a latency with the
471reaction time. 476reaction time.
472 477
473The irqsoff tracer tracks the time interrupts are disabled and when 478The irqsoff tracer tracks the time interrupts are disabled to the time
474they are re-enabled. When a new maximum latency is hit, it saves off 479they are re-enabled. When a new maximum latency is hit, it saves off
475the trace so that it may be retrieved at a later time. Every time a 480the trace so that it may be retrieved at a later time. Every time a
476new maximum in reached, the old saved trace is discarded and the new 481new maximum in reached, the old saved trace is discarded and the new
@@ -519,7 +524,7 @@ The difference between the 6 and the displayed timestamp 7us is
519because the clock must have incremented between the time of recording 524because the clock must have incremented between the time of recording
520the max latency and recording the function that had that latency. 525the max latency and recording the function that had that latency.
521 526
522Note the above had ftrace_enabled not set. If we set the ftrace_enabled 527Note the above had ftrace_enabled not set. If we set the ftrace_enabled,
523we get a much larger output: 528we get a much larger output:
524 529
525# tracer: irqsoff 530# tracer: irqsoff
@@ -570,21 +575,21 @@ vim:ft=help
570 575
571 576
572Here we traced a 50 microsecond latency. But we also see all the 577Here we traced a 50 microsecond latency. But we also see all the
573functions that were called during that time. Note that enabling 578functions that were called during that time. Note that by enabling
574function tracing we endure an added overhead. This overhead may 579function tracing, we endure an added overhead. This overhead may
575extend the latency times. But never the less, this trace has provided 580extend the latency times. But nevertheless, this trace has provided
576some very helpful debugging. 581some very helpful debugging information.
577 582
578 583
579preemptoff 584preemptoff
580---------- 585----------
581 586
582When preemption is disabled we may be able to receive interrupts but 587When preemption is disabled, we may be able to receive interrupts but
583the task can not be preempted and a higher priority task must wait 588the task cannot be preempted and a higher priority task must wait
584for preemption to be enabled again before it can preempt a lower 589for preemption to be enabled again before it can preempt a lower
585priority task. 590priority task.
586 591
587The preemptoff tracer traces the places that disables preemption. 592The preemptoff tracer traces the places that disable preemption.
588Like the irqsoff, it records the maximum latency that preemption 593Like the irqsoff, it records the maximum latency that preemption
589was disabled. The control of preemptoff is much like the irqsoff. 594was disabled. The control of preemptoff is much like the irqsoff.
590 595
@@ -696,7 +701,7 @@ Notice that the __do_softirq when called doesn't have a preempt_count.
696It may seem that we missed a preempt enabled. What really happened 701It may seem that we missed a preempt enabled. What really happened
697is that the preempt count is held on the threads stack and we 702is that the preempt count is held on the threads stack and we
698switched to the softirq stack (4K stacks in effect). The code 703switched to the softirq stack (4K stacks in effect). The code
699does not copy the preempt count, but because interrupts are disabled 704does not copy the preempt count, but because interrupts are disabled,
700we don't need to worry about it. Having a tracer like this is good 705we don't need to worry about it. Having a tracer like this is good
701to let people know what really happens inside the kernel. 706to let people know what really happens inside the kernel.
702 707
@@ -732,7 +737,7 @@ To record this time, use the preemptirqsoff tracer.
732 737
733Again, using this trace is much like the irqsoff and preemptoff tracers. 738Again, using this trace is much like the irqsoff and preemptoff tracers.
734 739
735 # echo preemptoff > /debug/tracing/current_tracer 740 # echo preemptirqsoff > /debug/tracing/current_tracer
736 # echo 0 > /debug/tracing/tracing_max_latency 741 # echo 0 > /debug/tracing/tracing_max_latency
737 # echo 1 > /debug/tracing/tracing_enabled 742 # echo 1 > /debug/tracing/tracing_enabled
738 # ls -ltr 743 # ls -ltr
@@ -862,9 +867,9 @@ This is a very interesting trace. It started with the preemption of
862the ls task. We see that the task had the "need_resched" bit set 867the ls task. We see that the task had the "need_resched" bit set
863with the 'N' in the trace. Interrupts are disabled in the spin_lock 868with the 'N' in the trace. Interrupts are disabled in the spin_lock
864and the trace started. We see that a schedule took place to run 869and the trace started. We see that a schedule took place to run
865sshd. When the interrupts were enabled we took an interrupt. 870sshd. When the interrupts were enabled, we took an interrupt.
866On return of the interrupt the softirq ran. We took another interrupt 871On return from the interrupt handler, the softirq ran. We took another
867while running the softirq as we see with the capital 'H'. 872interrupt while running the softirq as we see with the capital 'H'.
868 873
869 874
870wakeup 875wakeup
@@ -876,9 +881,9 @@ time it executes. This is also known as "schedule latency".
876I stress the point that this is about RT tasks. It is also important 881I stress the point that this is about RT tasks. It is also important
877to know the scheduling latency of non-RT tasks, but the average 882to know the scheduling latency of non-RT tasks, but the average
878schedule latency is better for non-RT tasks. Tools like 883schedule latency is better for non-RT tasks. Tools like
879LatencyTop is more appropriate for such measurements. 884LatencyTop are more appropriate for such measurements.
880 885
881Real-Time environments is interested in the worst case latency. 886Real-Time environments are interested in the worst case latency.
882That is the longest latency it takes for something to happen, and 887That is the longest latency it takes for something to happen, and
883not the average. We can have a very fast scheduler that may only 888not the average. We can have a very fast scheduler that may only
884have a large latency once in a while, but that would not work well 889have a large latency once in a while, but that would not work well
@@ -889,8 +894,8 @@ tasks that are unpredictable will overwrite the worst case latency
889of RT tasks. 894of RT tasks.
890 895
891Since this tracer only deals with RT tasks, we will run this slightly 896Since this tracer only deals with RT tasks, we will run this slightly
892different than we did with the previous tracers. Instead of performing 897differently than we did with the previous tracers. Instead of performing
893an 'ls' we will run 'sleep 1' under 'chrt' which changes the 898an 'ls', we will run 'sleep 1' under 'chrt' which changes the
894priority of the task. 899priority of the task.
895 900
896 # echo wakeup > /debug/tracing/current_tracer 901 # echo wakeup > /debug/tracing/current_tracer
@@ -924,9 +929,9 @@ wakeup latency trace v1.1.5 on 2.6.26-rc8
924vim:ft=help 929vim:ft=help
925 930
926 931
927Running this on an idle system we see that it only took 4 microseconds 932Running this on an idle system, we see that it only took 4 microseconds
928to perform the task switch. Note, since the trace marker in the 933to perform the task switch. Note, since the trace marker in the
929schedule is before the actual "switch" we stop the tracing when 934schedule is before the actual "switch", we stop the tracing when
930the recorded task is about to schedule in. This may change if 935the recorded task is about to schedule in. This may change if
931we add a new marker at the end of the scheduler. 936we add a new marker at the end of the scheduler.
932 937
@@ -992,12 +997,15 @@ ksoftirq-7 1d..4 50us : schedule (__cond_resched)
992 997
993The interrupt went off while running ksoftirqd. This task runs at 998The interrupt went off while running ksoftirqd. This task runs at
994SCHED_OTHER. Why didn't we see the 'N' set early? This may be 999SCHED_OTHER. Why didn't we see the 'N' set early? This may be
995a harmless bug with x86_32 and 4K stacks. The need_reched() function 1000a harmless bug with x86_32 and 4K stacks. On x86_32 with 4K stacks
996that tests if we need to reschedule looks on the actual stack. 1001configured, the interrupt and softirq runs with their own stack.
997Where as the setting of the NEED_RESCHED bit happens on the 1002Some information is held on the top of the task's stack (need_resched
998task's stack. But because we are in a hard interrupt, the test 1003and preempt_count are both stored there). The setting of the NEED_RESCHED
999is with the interrupts stack which has that to be false. We don't 1004bit is done directly to the task's stack, but the reading of the
1000see the 'N' until we switch back to the task's stack. 1005NEED_RESCHED is done by looking at the current stack, which in this case
1006is the stack for the hard interrupt. This hides the fact that NEED_RESCHED
1007has been set. We don't see the 'N' until we switch back to the task's
1008assigned stack.
1001 1009
1002ftrace 1010ftrace
1003------ 1011------
@@ -1067,10 +1075,10 @@ this works is the mcount function call (placed at the start of
1067every kernel function, produced by the -pg switch in gcc), starts 1075every kernel function, produced by the -pg switch in gcc), starts
1068of pointing to a simple return. 1076of pointing to a simple return.
1069 1077
1070When dynamic ftrace is initialized, it calls kstop_machine to make it 1078When dynamic ftrace is initialized, it calls kstop_machine to make
1071act like a uniprocessor so that it can freely modify code without 1079the machine act like a uniprocessor so that it can freely modify code
1072worrying about other processors executing that same code. At 1080without worrying about other processors executing that same code. At
1073initialization, the mcount calls are change to call a "record_ip" 1081initialization, the mcount calls are changed to call a "record_ip"
1074function. After this, the first time a kernel function is called, 1082function. After this, the first time a kernel function is called,
1075it has the calling address saved in a hash table. 1083it has the calling address saved in a hash table.
1076 1084
@@ -1085,8 +1093,8 @@ traced, is that we can now selectively choose which functions we
1085want to trace and which ones we want the mcount calls to remain as 1093want to trace and which ones we want the mcount calls to remain as
1086nops. 1094nops.
1087 1095
1088Two files that contain to the enabling and disabling of recorded 1096Two files are used, one for enabling and one for disabling the tracing
1089functions are: 1097of recorded functions. They are:
1090 1098
1091 set_ftrace_filter 1099 set_ftrace_filter
1092 1100
@@ -1094,7 +1102,7 @@ and
1094 1102
1095 set_ftrace_notrace 1103 set_ftrace_notrace
1096 1104
1097A list of available functions that you can add to this files is listed 1105A list of available functions that you can add to these files is listed
1098in: 1106in:
1099 1107
1100 available_filter_functions 1108 available_filter_functions
@@ -1133,9 +1141,9 @@ sys_nanosleep
1133 1141
1134 1142
1135Perhaps this isn't enough. The filters also allow simple wild cards. 1143Perhaps this isn't enough. The filters also allow simple wild cards.
1136Only the following is currently available 1144Only the following are currently available
1137 1145
1138 <match>* - will match functions that begins with <match> 1146 <match>* - will match functions that begin with <match>
1139 *<match> - will match functions that end with <match> 1147 *<match> - will match functions that end with <match>
1140 *<match>* - will match functions that have <match> in it 1148 *<match>* - will match functions that have <match> in it
1141 1149
@@ -1187,7 +1195,7 @@ This is because the '>' and '>>' act just like they do in bash.
1187To rewrite the filters, use '>' 1195To rewrite the filters, use '>'
1188To append to the filters, use '>>' 1196To append to the filters, use '>>'
1189 1197
1190To clear out a filter so that all functions will be recorded again. 1198To clear out a filter so that all functions will be recorded again:
1191 1199
1192 # echo > /debug/tracing/set_ftrace_filter 1200 # echo > /debug/tracing/set_ftrace_filter
1193 # cat /debug/tracing/set_ftrace_filter 1201 # cat /debug/tracing/set_ftrace_filter
@@ -1246,8 +1254,8 @@ ftraced
1246 1254
1247As mentioned above, when dynamic ftrace is configured in, a kernel 1255As mentioned above, when dynamic ftrace is configured in, a kernel
1248thread wakes up once a second and checks to see if there are mcount 1256thread wakes up once a second and checks to see if there are mcount
1249calls that need to be converted into nops. If there is not, then 1257calls that need to be converted into nops. If there are not any, then
1250it simply goes back to sleep. But if there is, it will call 1258it simply goes back to sleep. But if there are some, it will call
1251kstop_machine to convert the calls to nops. 1259kstop_machine to convert the calls to nops.
1252 1260
1253There may be a case that you do not want this added latency. 1261There may be a case that you do not want this added latency.
@@ -1262,8 +1270,8 @@ mcount calls to nops. Remember that there's a large overhead
1262to calling mcount. Without this kernel thread, that overhead will 1270to calling mcount. Without this kernel thread, that overhead will
1263exist. 1271exist.
1264 1272
1265Any write to the ftraced_enabled file will cause the kstop_machine 1273If there are recorded calls to mcount, any write to the ftraced_enabled
1266to run if there are recorded calls to mcount. This means that a 1274file will cause the kstop_machine to run. This means that a
1267user can manually perform the updates when they want to by simply 1275user can manually perform the updates when they want to by simply
1268echoing a '0' into the ftraced_enabled file. 1276echoing a '0' into the ftraced_enabled file.
1269 1277
@@ -1315,7 +1323,7 @@ trace entries
1315 1323
1316Having too much or not enough data can be troublesome in diagnosing 1324Having too much or not enough data can be troublesome in diagnosing
1317some issue in the kernel. The file trace_entries is used to modify 1325some issue in the kernel. The file trace_entries is used to modify
1318the size of the internal trace buffers. The numbers listed 1326the size of the internal trace buffers. The number listed
1319is the number of entries that can be recorded per CPU. To know 1327is the number of entries that can be recorded per CPU. To know
1320the full size, multiply the number of possible CPUS with the 1328the full size, multiply the number of possible CPUS with the
1321number of entries. 1329number of entries.
@@ -1323,7 +1331,7 @@ number of entries.
1323 # cat /debug/tracing/trace_entries 1331 # cat /debug/tracing/trace_entries
132465620 133265620
1325 1333
1326Note, to modify this you must have tracing fulling disabled. To do that, 1334Note, to modify this, you must have tracing completely disabled. To do that,
1327echo "none" into the current_tracer. 1335echo "none" into the current_tracer.
1328 1336
1329 # echo none > /debug/tracing/current_tracer 1337 # echo none > /debug/tracing/current_tracer
@@ -1344,7 +1352,7 @@ it will add them.
1344This shows us that 85 entries can fit on a single page. 1352This shows us that 85 entries can fit on a single page.
1345 1353
1346The number of pages that will be allocated is a percentage of available 1354The number of pages that will be allocated is a percentage of available
1347memory. Allocating too much will produces an error. 1355memory. Allocating too much will produce an error.
1348 1356
1349 # echo 1000000000000 > /debug/tracing/trace_entries 1357 # echo 1000000000000 > /debug/tracing/trace_entries
1350-bash: echo: write error: Cannot allocate memory 1358-bash: echo: write error: Cannot allocate memory