diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/controllers/cpuacct.txt | 32 | ||||
-rw-r--r-- | Documentation/ftrace.txt | 149 | ||||
-rw-r--r-- | Documentation/kernel-parameters.txt | 8 | ||||
-rw-r--r-- | Documentation/lockstat.txt | 51 | ||||
-rw-r--r-- | Documentation/markers.txt | 29 | ||||
-rw-r--r-- | Documentation/scheduler/sched-arch.txt | 4 | ||||
-rw-r--r-- | Documentation/tracepoints.txt | 94 |
7 files changed, 270 insertions, 97 deletions
diff --git a/Documentation/controllers/cpuacct.txt b/Documentation/controllers/cpuacct.txt new file mode 100644 index 000000000000..bb775fbe43d7 --- /dev/null +++ b/Documentation/controllers/cpuacct.txt | |||
@@ -0,0 +1,32 @@ | |||
1 | CPU Accounting Controller | ||
2 | ------------------------- | ||
3 | |||
4 | The CPU accounting controller is used to group tasks using cgroups and | ||
5 | account the CPU usage of these groups of tasks. | ||
6 | |||
7 | The CPU accounting controller supports multi-hierarchy groups. An accounting | ||
8 | group accumulates the CPU usage of all of its child groups and the tasks | ||
9 | directly present in its group. | ||
10 | |||
11 | Accounting groups can be created by first mounting the cgroup filesystem. | ||
12 | |||
13 | # mkdir /cgroups | ||
14 | # mount -t cgroup -ocpuacct none /cgroups | ||
15 | |||
16 | With the above step, the initial or the parent accounting group | ||
17 | becomes visible at /cgroups. At bootup, this group includes all the | ||
18 | tasks in the system. /cgroups/tasks lists the tasks in this cgroup. | ||
19 | /cgroups/cpuacct.usage gives the CPU time (in nanoseconds) obtained by | ||
20 | this group which is essentially the CPU time obtained by all the tasks | ||
21 | in the system. | ||
22 | |||
23 | New accounting groups can be created under the parent group /cgroups. | ||
24 | |||
25 | # cd /cgroups | ||
26 | # mkdir g1 | ||
27 | # echo $$ > g1 | ||
28 | |||
29 | The above steps create a new group g1 and move the current shell | ||
30 | process (bash) into it. CPU time consumed by this bash and its children | ||
31 | can be obtained from g1/cpuacct.usage and the same is accumulated in | ||
32 | /cgroups/cpuacct.usage also. | ||
diff --git a/Documentation/ftrace.txt b/Documentation/ftrace.txt index 9cc4d685dde5..803b1318b13d 100644 --- a/Documentation/ftrace.txt +++ b/Documentation/ftrace.txt | |||
@@ -82,7 +82,7 @@ of ftrace. Here is a list of some of the key files: | |||
82 | tracer is not adding more data, they will display | 82 | tracer is not adding more data, they will display |
83 | the same information every time they are read. | 83 | the same information every time they are read. |
84 | 84 | ||
85 | iter_ctrl: This file lets the user control the amount of data | 85 | trace_options: This file lets the user control the amount of data |
86 | that is displayed in one of the above output | 86 | that is displayed in one of the above output |
87 | files. | 87 | files. |
88 | 88 | ||
@@ -94,10 +94,10 @@ of ftrace. Here is a list of some of the key files: | |||
94 | only be recorded if the latency is greater than | 94 | only be recorded if the latency is greater than |
95 | the value in this file. (in microseconds) | 95 | the value in this file. (in microseconds) |
96 | 96 | ||
97 | trace_entries: This sets or displays the number of bytes each CPU | 97 | buffer_size_kb: This sets or displays the number of kilobytes each CPU |
98 | buffer can hold. The tracer buffers are the same size | 98 | buffer can hold. The tracer buffers are the same size |
99 | for each CPU. The displayed number is the size of the | 99 | for each CPU. The displayed number is the size of the |
100 | CPU buffer and not total size of all buffers. The | 100 | CPU buffer and not total size of all buffers. The |
101 | trace buffers are allocated in pages (blocks of memory | 101 | trace buffers are allocated in pages (blocks of memory |
102 | that the kernel uses for allocation, usually 4 KB in size). | 102 | that the kernel uses for allocation, usually 4 KB in size). |
103 | If the last page allocated has room for more bytes | 103 | If the last page allocated has room for more bytes |
@@ -127,6 +127,8 @@ of ftrace. Here is a list of some of the key files: | |||
127 | be traced. If a function exists in both set_ftrace_filter | 127 | be traced. If a function exists in both set_ftrace_filter |
128 | and set_ftrace_notrace, the function will _not_ be traced. | 128 | and set_ftrace_notrace, the function will _not_ be traced. |
129 | 129 | ||
130 | set_ftrace_pid: Have the function tracer only trace a single thread. | ||
131 | |||
130 | available_filter_functions: This lists the functions that ftrace | 132 | available_filter_functions: This lists the functions that ftrace |
131 | has processed and can trace. These are the function | 133 | has processed and can trace. These are the function |
132 | names that you can pass to "set_ftrace_filter" or | 134 | names that you can pass to "set_ftrace_filter" or |
@@ -316,23 +318,23 @@ The above is mostly meaningful for kernel developers. | |||
316 | The rest is the same as the 'trace' file. | 318 | The rest is the same as the 'trace' file. |
317 | 319 | ||
318 | 320 | ||
319 | iter_ctrl | 321 | trace_options |
320 | --------- | 322 | ------------- |
321 | 323 | ||
322 | The iter_ctrl file is used to control what gets printed in the trace | 324 | The trace_options file is used to control what gets printed in the trace |
323 | output. To see what is available, simply cat the file: | 325 | output. To see what is available, simply cat the file: |
324 | 326 | ||
325 | cat /debug/tracing/iter_ctrl | 327 | cat /debug/tracing/trace_options |
326 | print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \ | 328 | print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \ |
327 | noblock nostacktrace nosched-tree | 329 | noblock nostacktrace nosched-tree nouserstacktrace nosym-userobj |
328 | 330 | ||
329 | To disable one of the options, echo in the option prepended with "no". | 331 | To disable one of the options, echo in the option prepended with "no". |
330 | 332 | ||
331 | echo noprint-parent > /debug/tracing/iter_ctrl | 333 | echo noprint-parent > /debug/tracing/trace_options |
332 | 334 | ||
333 | To enable an option, leave off the "no". | 335 | To enable an option, leave off the "no". |
334 | 336 | ||
335 | echo sym-offset > /debug/tracing/iter_ctrl | 337 | echo sym-offset > /debug/tracing/trace_options |
336 | 338 | ||
337 | Here are the available options: | 339 | Here are the available options: |
338 | 340 | ||
@@ -378,6 +380,20 @@ Here are the available options: | |||
378 | When a trace is recorded, so is the stack of functions. | 380 | When a trace is recorded, so is the stack of functions. |
379 | This allows for back traces of trace sites. | 381 | This allows for back traces of trace sites. |
380 | 382 | ||
383 | userstacktrace - This option changes the trace. | ||
384 | It records a stacktrace of the current userspace thread. | ||
385 | |||
386 | sym-userobj - when user stacktrace are enabled, look up which object the | ||
387 | address belongs to, and print a relative address | ||
388 | This is especially useful when ASLR is on, otherwise you don't | ||
389 | get a chance to resolve the address to object/file/line after the app is no | ||
390 | longer running | ||
391 | |||
392 | The lookup is performed when you read trace,trace_pipe,latency_trace. Example: | ||
393 | |||
394 | a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0 | ||
395 | x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6] | ||
396 | |||
381 | sched-tree - TBD (any users??) | 397 | sched-tree - TBD (any users??) |
382 | 398 | ||
383 | 399 | ||
@@ -1059,6 +1075,83 @@ For simple one time traces, the above is sufficent. For anything else, | |||
1059 | a search through /proc/mounts may be needed to find where the debugfs | 1075 | a search through /proc/mounts may be needed to find where the debugfs |
1060 | file-system is mounted. | 1076 | file-system is mounted. |
1061 | 1077 | ||
1078 | |||
1079 | Single thread tracing | ||
1080 | --------------------- | ||
1081 | |||
1082 | By writing into /debug/tracing/set_ftrace_pid you can trace a | ||
1083 | single thread. For example: | ||
1084 | |||
1085 | # cat /debug/tracing/set_ftrace_pid | ||
1086 | no pid | ||
1087 | # echo 3111 > /debug/tracing/set_ftrace_pid | ||
1088 | # cat /debug/tracing/set_ftrace_pid | ||
1089 | 3111 | ||
1090 | # echo function > /debug/tracing/current_tracer | ||
1091 | # cat /debug/tracing/trace | head | ||
1092 | # tracer: function | ||
1093 | # | ||
1094 | # TASK-PID CPU# TIMESTAMP FUNCTION | ||
1095 | # | | | | | | ||
1096 | yum-updatesd-3111 [003] 1637.254676: finish_task_switch <-thread_return | ||
1097 | yum-updatesd-3111 [003] 1637.254681: hrtimer_cancel <-schedule_hrtimeout_range | ||
1098 | yum-updatesd-3111 [003] 1637.254682: hrtimer_try_to_cancel <-hrtimer_cancel | ||
1099 | yum-updatesd-3111 [003] 1637.254683: lock_hrtimer_base <-hrtimer_try_to_cancel | ||
1100 | yum-updatesd-3111 [003] 1637.254685: fget_light <-do_sys_poll | ||
1101 | yum-updatesd-3111 [003] 1637.254686: pipe_poll <-do_sys_poll | ||
1102 | # echo -1 > /debug/tracing/set_ftrace_pid | ||
1103 | # cat /debug/tracing/trace |head | ||
1104 | # tracer: function | ||
1105 | # | ||
1106 | # TASK-PID CPU# TIMESTAMP FUNCTION | ||
1107 | # | | | | | | ||
1108 | ##### CPU 3 buffer started #### | ||
1109 | yum-updatesd-3111 [003] 1701.957688: free_poll_entry <-poll_freewait | ||
1110 | yum-updatesd-3111 [003] 1701.957689: remove_wait_queue <-free_poll_entry | ||
1111 | yum-updatesd-3111 [003] 1701.957691: fput <-free_poll_entry | ||
1112 | yum-updatesd-3111 [003] 1701.957692: audit_syscall_exit <-sysret_audit | ||
1113 | yum-updatesd-3111 [003] 1701.957693: path_put <-audit_syscall_exit | ||
1114 | |||
1115 | If you want to trace a function when executing, you could use | ||
1116 | something like this simple program: | ||
1117 | |||
1118 | #include <stdio.h> | ||
1119 | #include <stdlib.h> | ||
1120 | #include <sys/types.h> | ||
1121 | #include <sys/stat.h> | ||
1122 | #include <fcntl.h> | ||
1123 | #include <unistd.h> | ||
1124 | |||
1125 | int main (int argc, char **argv) | ||
1126 | { | ||
1127 | if (argc < 1) | ||
1128 | exit(-1); | ||
1129 | |||
1130 | if (fork() > 0) { | ||
1131 | int fd, ffd; | ||
1132 | char line[64]; | ||
1133 | int s; | ||
1134 | |||
1135 | ffd = open("/debug/tracing/current_tracer", O_WRONLY); | ||
1136 | if (ffd < 0) | ||
1137 | exit(-1); | ||
1138 | write(ffd, "nop", 3); | ||
1139 | |||
1140 | fd = open("/debug/tracing/set_ftrace_pid", O_WRONLY); | ||
1141 | s = sprintf(line, "%d\n", getpid()); | ||
1142 | write(fd, line, s); | ||
1143 | |||
1144 | write(ffd, "function", 8); | ||
1145 | |||
1146 | close(fd); | ||
1147 | close(ffd); | ||
1148 | |||
1149 | execvp(argv[1], argv+1); | ||
1150 | } | ||
1151 | |||
1152 | return 0; | ||
1153 | } | ||
1154 | |||
1062 | dynamic ftrace | 1155 | dynamic ftrace |
1063 | -------------- | 1156 | -------------- |
1064 | 1157 | ||
@@ -1158,7 +1251,11 @@ These are the only wild cards which are supported. | |||
1158 | 1251 | ||
1159 | <match>*<match> will not work. | 1252 | <match>*<match> will not work. |
1160 | 1253 | ||
1161 | # echo hrtimer_* > /debug/tracing/set_ftrace_filter | 1254 | Note: It is better to use quotes to enclose the wild cards, otherwise |
1255 | the shell may expand the parameters into names of files in the local | ||
1256 | directory. | ||
1257 | |||
1258 | # echo 'hrtimer_*' > /debug/tracing/set_ftrace_filter | ||
1162 | 1259 | ||
1163 | Produces: | 1260 | Produces: |
1164 | 1261 | ||
@@ -1213,7 +1310,7 @@ Again, now we want to append. | |||
1213 | # echo sys_nanosleep > /debug/tracing/set_ftrace_filter | 1310 | # echo sys_nanosleep > /debug/tracing/set_ftrace_filter |
1214 | # cat /debug/tracing/set_ftrace_filter | 1311 | # cat /debug/tracing/set_ftrace_filter |
1215 | sys_nanosleep | 1312 | sys_nanosleep |
1216 | # echo hrtimer_* >> /debug/tracing/set_ftrace_filter | 1313 | # echo 'hrtimer_*' >> /debug/tracing/set_ftrace_filter |
1217 | # cat /debug/tracing/set_ftrace_filter | 1314 | # cat /debug/tracing/set_ftrace_filter |
1218 | hrtimer_run_queues | 1315 | hrtimer_run_queues |
1219 | hrtimer_run_pending | 1316 | hrtimer_run_pending |
@@ -1299,41 +1396,29 @@ trace entries | |||
1299 | ------------- | 1396 | ------------- |
1300 | 1397 | ||
1301 | Having too much or not enough data can be troublesome in diagnosing | 1398 | Having too much or not enough data can be troublesome in diagnosing |
1302 | an issue in the kernel. The file trace_entries is used to modify | 1399 | an issue in the kernel. The file buffer_size_kb is used to modify |
1303 | the size of the internal trace buffers. The number listed | 1400 | the size of the internal trace buffers. The number listed |
1304 | is the number of entries that can be recorded per CPU. To know | 1401 | is the number of entries that can be recorded per CPU. To know |
1305 | the full size, multiply the number of possible CPUS with the | 1402 | the full size, multiply the number of possible CPUS with the |
1306 | number of entries. | 1403 | number of entries. |
1307 | 1404 | ||
1308 | # cat /debug/tracing/trace_entries | 1405 | # cat /debug/tracing/buffer_size_kb |
1309 | 65620 | 1406 | 1408 (units kilobytes) |
1310 | 1407 | ||
1311 | Note, to modify this, you must have tracing completely disabled. To do that, | 1408 | Note, to modify this, you must have tracing completely disabled. To do that, |
1312 | echo "nop" into the current_tracer. If the current_tracer is not set | 1409 | echo "nop" into the current_tracer. If the current_tracer is not set |
1313 | to "nop", an EINVAL error will be returned. | 1410 | to "nop", an EINVAL error will be returned. |
1314 | 1411 | ||
1315 | # echo nop > /debug/tracing/current_tracer | 1412 | # echo nop > /debug/tracing/current_tracer |
1316 | # echo 100000 > /debug/tracing/trace_entries | 1413 | # echo 10000 > /debug/tracing/buffer_size_kb |
1317 | # cat /debug/tracing/trace_entries | 1414 | # cat /debug/tracing/buffer_size_kb |
1318 | 100045 | 1415 | 10000 (units kilobytes) |
1319 | |||
1320 | |||
1321 | Notice that we echoed in 100,000 but the size is 100,045. The entries | ||
1322 | are held in individual pages. It allocates the number of pages it takes | ||
1323 | to fulfill the request. If more entries may fit on the last page | ||
1324 | then they will be added. | ||
1325 | |||
1326 | # echo 1 > /debug/tracing/trace_entries | ||
1327 | # cat /debug/tracing/trace_entries | ||
1328 | 85 | ||
1329 | |||
1330 | This shows us that 85 entries can fit in a single page. | ||
1331 | 1416 | ||
1332 | The number of pages which will be allocated is limited to a percentage | 1417 | The number of pages which will be allocated is limited to a percentage |
1333 | of available memory. Allocating too much will produce an error. | 1418 | of available memory. Allocating too much will produce an error. |
1334 | 1419 | ||
1335 | # echo 1000000000000 > /debug/tracing/trace_entries | 1420 | # echo 1000000000000 > /debug/tracing/buffer_size_kb |
1336 | -bash: echo: write error: Cannot allocate memory | 1421 | -bash: echo: write error: Cannot allocate memory |
1337 | # cat /debug/tracing/trace_entries | 1422 | # cat /debug/tracing/buffer_size_kb |
1338 | 85 | 1423 | 85 |
1339 | 1424 | ||
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index e0f346d201ed..2919a2e91938 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt | |||
@@ -750,6 +750,14 @@ and is between 256 and 4096 characters. It is defined in the file | |||
750 | parameter will force ia64_sal_cache_flush to call | 750 | parameter will force ia64_sal_cache_flush to call |
751 | ia64_pal_cache_flush instead of SAL_CACHE_FLUSH. | 751 | ia64_pal_cache_flush instead of SAL_CACHE_FLUSH. |
752 | 752 | ||
753 | ftrace=[tracer] | ||
754 | [ftrace] will set and start the specified tracer | ||
755 | as early as possible in order to facilitate early | ||
756 | boot debugging. | ||
757 | |||
758 | ftrace_dump_on_oops | ||
759 | [ftrace] will dump the trace buffers on oops. | ||
760 | |||
753 | gamecon.map[2|3]= | 761 | gamecon.map[2|3]= |
754 | [HW,JOY] Multisystem joystick and NES/SNES/PSX pad | 762 | [HW,JOY] Multisystem joystick and NES/SNES/PSX pad |
755 | support via parallel port (up to 5 devices per port) | 763 | support via parallel port (up to 5 devices per port) |
diff --git a/Documentation/lockstat.txt b/Documentation/lockstat.txt index 4ba4664ce5c3..9cb9138f7a79 100644 --- a/Documentation/lockstat.txt +++ b/Documentation/lockstat.txt | |||
@@ -71,35 +71,50 @@ Look at the current lock statistics: | |||
71 | 71 | ||
72 | # less /proc/lock_stat | 72 | # less /proc/lock_stat |
73 | 73 | ||
74 | 01 lock_stat version 0.2 | 74 | 01 lock_stat version 0.3 |
75 | 02 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | 75 | 02 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
76 | 03 class name con-bounces contentions waittime-min waittime-max waittime-total acq-bounces acquisitions holdtime-min holdtime-max holdtime-total | 76 | 03 class name con-bounces contentions waittime-min waittime-max waittime-total acq-bounces acquisitions holdtime-min holdtime-max holdtime-total |
77 | 04 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | 77 | 04 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
78 | 05 | 78 | 05 |
79 | 06 &inode->i_data.tree_lock-W: 15 21657 0.18 1093295.30 11547131054.85 58 10415 0.16 87.51 6387.60 | 79 | 06 &mm->mmap_sem-W: 233 538 18446744073708 22924.27 607243.51 1342 45806 1.71 8595.89 1180582.34 |
80 | 07 &inode->i_data.tree_lock-R: 0 0 0.00 0.00 0.00 23302 231198 0.25 8.45 98023.38 | 80 | 07 &mm->mmap_sem-R: 205 587 18446744073708 28403.36 731975.00 1940 412426 0.58 187825.45 6307502.88 |
81 | 08 -------------------------- | 81 | 08 --------------- |
82 | 09 &inode->i_data.tree_lock 0 [<ffffffff8027c08f>] add_to_page_cache+0x5f/0x190 | 82 | 09 &mm->mmap_sem 487 [<ffffffff8053491f>] do_page_fault+0x466/0x928 |
83 | 10 | 83 | 10 &mm->mmap_sem 179 [<ffffffff802a6200>] sys_mprotect+0xcd/0x21d |
84 | 11 ............................................................................................................................................................................................... | 84 | 11 &mm->mmap_sem 279 [<ffffffff80210a57>] sys_mmap+0x75/0xce |
85 | 12 | 85 | 12 &mm->mmap_sem 76 [<ffffffff802a490b>] sys_munmap+0x32/0x59 |
86 | 13 dcache_lock: 1037 1161 0.38 45.32 774.51 6611 243371 0.15 306.48 77387.24 | 86 | 13 --------------- |
87 | 14 ----------- | 87 | 14 &mm->mmap_sem 270 [<ffffffff80210a57>] sys_mmap+0x75/0xce |
88 | 15 dcache_lock 180 [<ffffffff802c0d7e>] sys_getcwd+0x11e/0x230 | 88 | 15 &mm->mmap_sem 431 [<ffffffff8053491f>] do_page_fault+0x466/0x928 |
89 | 16 dcache_lock 165 [<ffffffff802c002a>] d_alloc+0x15a/0x210 | 89 | 16 &mm->mmap_sem 138 [<ffffffff802a490b>] sys_munmap+0x32/0x59 |
90 | 17 dcache_lock 33 [<ffffffff8035818d>] _atomic_dec_and_lock+0x4d/0x70 | 90 | 17 &mm->mmap_sem 145 [<ffffffff802a6200>] sys_mprotect+0xcd/0x21d |
91 | 18 dcache_lock 1 [<ffffffff802beef8>] shrink_dcache_parent+0x18/0x130 | 91 | 18 |
92 | 19 ............................................................................................................................................................................................... | ||
93 | 20 | ||
94 | 21 dcache_lock: 621 623 0.52 118.26 1053.02 6745 91930 0.29 316.29 118423.41 | ||
95 | 22 ----------- | ||
96 | 23 dcache_lock 179 [<ffffffff80378274>] _atomic_dec_and_lock+0x34/0x54 | ||
97 | 24 dcache_lock 113 [<ffffffff802cc17b>] d_alloc+0x19a/0x1eb | ||
98 | 25 dcache_lock 99 [<ffffffff802ca0dc>] d_rehash+0x1b/0x44 | ||
99 | 26 dcache_lock 104 [<ffffffff802cbca0>] d_instantiate+0x36/0x8a | ||
100 | 27 ----------- | ||
101 | 28 dcache_lock 192 [<ffffffff80378274>] _atomic_dec_and_lock+0x34/0x54 | ||
102 | 29 dcache_lock 98 [<ffffffff802ca0dc>] d_rehash+0x1b/0x44 | ||
103 | 30 dcache_lock 72 [<ffffffff802cc17b>] d_alloc+0x19a/0x1eb | ||
104 | 31 dcache_lock 112 [<ffffffff802cbca0>] d_instantiate+0x36/0x8a | ||
92 | 105 | ||
93 | This excerpt shows the first two lock class statistics. Line 01 shows the | 106 | This excerpt shows the first two lock class statistics. Line 01 shows the |
94 | output version - each time the format changes this will be updated. Line 02-04 | 107 | output version - each time the format changes this will be updated. Line 02-04 |
95 | show the header with column descriptions. Lines 05-10 and 13-18 show the actual | 108 | show the header with column descriptions. Lines 05-18 and 20-31 show the actual |
96 | statistics. These statistics come in two parts; the actual stats separated by a | 109 | statistics. These statistics come in two parts; the actual stats separated by a |
97 | short separator (line 08, 14) from the contention points. | 110 | short separator (line 08, 13) from the contention points. |
98 | 111 | ||
99 | The first lock (05-10) is a read/write lock, and shows two lines above the | 112 | The first lock (05-18) is a read/write lock, and shows two lines above the |
100 | short separator. The contention points don't match the column descriptors, | 113 | short separator. The contention points don't match the column descriptors, |
101 | they have two: contentions and [<IP>] symbol. | 114 | they have two: contentions and [<IP>] symbol. The second set of contention |
115 | points are the points we're contending with. | ||
102 | 116 | ||
117 | The integer part of the time values is in us. | ||
103 | 118 | ||
104 | View the top contending locks: | 119 | View the top contending locks: |
105 | 120 | ||
diff --git a/Documentation/markers.txt b/Documentation/markers.txt index 089f6138fcd9..d2b3d0e91b26 100644 --- a/Documentation/markers.txt +++ b/Documentation/markers.txt | |||
@@ -51,11 +51,16 @@ to call) for the specific marker through marker_probe_register() and can be | |||
51 | activated by calling marker_arm(). Marker deactivation can be done by calling | 51 | activated by calling marker_arm(). Marker deactivation can be done by calling |
52 | marker_disarm() as many times as marker_arm() has been called. Removing a probe | 52 | marker_disarm() as many times as marker_arm() has been called. Removing a probe |
53 | is done through marker_probe_unregister(); it will disarm the probe. | 53 | is done through marker_probe_unregister(); it will disarm the probe. |
54 | marker_synchronize_unregister() must be called before the end of the module exit | 54 | |
55 | function to make sure there is no caller left using the probe. This, and the | 55 | marker_synchronize_unregister() must be called between probe unregistration and |
56 | fact that preemption is disabled around the probe call, make sure that probe | 56 | the first occurrence of |
57 | removal and module unload are safe. See the "Probe example" section below for a | 57 | - the end of module exit function, |
58 | sample probe module. | 58 | to make sure there is no caller left using the probe; |
59 | - the free of any resource used by the probes, | ||
60 | to make sure the probes wont be accessing invalid data. | ||
61 | This, and the fact that preemption is disabled around the probe call, make sure | ||
62 | that probe removal and module unload are safe. See the "Probe example" section | ||
63 | below for a sample probe module. | ||
59 | 64 | ||
60 | The marker mechanism supports inserting multiple instances of the same marker. | 65 | The marker mechanism supports inserting multiple instances of the same marker. |
61 | Markers can be put in inline functions, inlined static functions, and | 66 | Markers can be put in inline functions, inlined static functions, and |
@@ -70,6 +75,20 @@ a printk warning which identifies the inconsistency: | |||
70 | 75 | ||
71 | "Format mismatch for probe probe_name (format), marker (format)" | 76 | "Format mismatch for probe probe_name (format), marker (format)" |
72 | 77 | ||
78 | Another way to use markers is to simply define the marker without generating any | ||
79 | function call to actually call into the marker. This is useful in combination | ||
80 | with tracepoint probes in a scheme like this : | ||
81 | |||
82 | void probe_tracepoint_name(unsigned int arg1, struct task_struct *tsk); | ||
83 | |||
84 | DEFINE_MARKER_TP(marker_eventname, tracepoint_name, probe_tracepoint_name, | ||
85 | "arg1 %u pid %d"); | ||
86 | |||
87 | notrace void probe_tracepoint_name(unsigned int arg1, struct task_struct *tsk) | ||
88 | { | ||
89 | struct marker *marker = &GET_MARKER(kernel_irq_entry); | ||
90 | /* write data to trace buffers ... */ | ||
91 | } | ||
73 | 92 | ||
74 | * Probe / marker example | 93 | * Probe / marker example |
75 | 94 | ||
diff --git a/Documentation/scheduler/sched-arch.txt b/Documentation/scheduler/sched-arch.txt index 941615a9769b..d43dbcbd163b 100644 --- a/Documentation/scheduler/sched-arch.txt +++ b/Documentation/scheduler/sched-arch.txt | |||
@@ -8,7 +8,7 @@ Context switch | |||
8 | By default, the switch_to arch function is called with the runqueue | 8 | By default, the switch_to arch function is called with the runqueue |
9 | locked. This is usually not a problem unless switch_to may need to | 9 | locked. This is usually not a problem unless switch_to may need to |
10 | take the runqueue lock. This is usually due to a wake up operation in | 10 | take the runqueue lock. This is usually due to a wake up operation in |
11 | the context switch. See include/asm-ia64/system.h for an example. | 11 | the context switch. See arch/ia64/include/asm/system.h for an example. |
12 | 12 | ||
13 | To request the scheduler call switch_to with the runqueue unlocked, | 13 | To request the scheduler call switch_to with the runqueue unlocked, |
14 | you must `#define __ARCH_WANT_UNLOCKED_CTXSW` in a header file | 14 | you must `#define __ARCH_WANT_UNLOCKED_CTXSW` in a header file |
@@ -23,7 +23,7 @@ disabled. Interrupts may be enabled over the call if it is likely to | |||
23 | introduce a significant interrupt latency by adding the line | 23 | introduce a significant interrupt latency by adding the line |
24 | `#define __ARCH_WANT_INTERRUPTS_ON_CTXSW` in the same place as for | 24 | `#define __ARCH_WANT_INTERRUPTS_ON_CTXSW` in the same place as for |
25 | unlocked context switches. This define also implies | 25 | unlocked context switches. This define also implies |
26 | `__ARCH_WANT_UNLOCKED_CTXSW`. See include/asm-arm/system.h for an | 26 | `__ARCH_WANT_UNLOCKED_CTXSW`. See arch/arm/include/asm/system.h for an |
27 | example. | 27 | example. |
28 | 28 | ||
29 | 29 | ||
diff --git a/Documentation/tracepoints.txt b/Documentation/tracepoints.txt index 5d354e167494..6f0a044f5b5e 100644 --- a/Documentation/tracepoints.txt +++ b/Documentation/tracepoints.txt | |||
@@ -3,28 +3,30 @@ | |||
3 | Mathieu Desnoyers | 3 | Mathieu Desnoyers |
4 | 4 | ||
5 | 5 | ||
6 | This document introduces Linux Kernel Tracepoints and their use. It provides | 6 | This document introduces Linux Kernel Tracepoints and their use. It |
7 | examples of how to insert tracepoints in the kernel and connect probe functions | 7 | provides examples of how to insert tracepoints in the kernel and |
8 | to them and provides some examples of probe functions. | 8 | connect probe functions to them and provides some examples of probe |
9 | functions. | ||
9 | 10 | ||
10 | 11 | ||
11 | * Purpose of tracepoints | 12 | * Purpose of tracepoints |
12 | 13 | ||
13 | A tracepoint placed in code provides a hook to call a function (probe) that you | 14 | A tracepoint placed in code provides a hook to call a function (probe) |
14 | can provide at runtime. A tracepoint can be "on" (a probe is connected to it) or | 15 | that you can provide at runtime. A tracepoint can be "on" (a probe is |
15 | "off" (no probe is attached). When a tracepoint is "off" it has no effect, | 16 | connected to it) or "off" (no probe is attached). When a tracepoint is |
16 | except for adding a tiny time penalty (checking a condition for a branch) and | 17 | "off" it has no effect, except for adding a tiny time penalty |
17 | space penalty (adding a few bytes for the function call at the end of the | 18 | (checking a condition for a branch) and space penalty (adding a few |
18 | instrumented function and adds a data structure in a separate section). When a | 19 | bytes for the function call at the end of the instrumented function |
19 | tracepoint is "on", the function you provide is called each time the tracepoint | 20 | and adds a data structure in a separate section). When a tracepoint |
20 | is executed, in the execution context of the caller. When the function provided | 21 | is "on", the function you provide is called each time the tracepoint |
21 | ends its execution, it returns to the caller (continuing from the tracepoint | 22 | is executed, in the execution context of the caller. When the function |
22 | site). | 23 | provided ends its execution, it returns to the caller (continuing from |
24 | the tracepoint site). | ||
23 | 25 | ||
24 | You can put tracepoints at important locations in the code. They are | 26 | You can put tracepoints at important locations in the code. They are |
25 | lightweight hooks that can pass an arbitrary number of parameters, | 27 | lightweight hooks that can pass an arbitrary number of parameters, |
26 | which prototypes are described in a tracepoint declaration placed in a header | 28 | which prototypes are described in a tracepoint declaration placed in a |
27 | file. | 29 | header file. |
28 | 30 | ||
29 | They can be used for tracing and performance accounting. | 31 | They can be used for tracing and performance accounting. |
30 | 32 | ||
@@ -42,14 +44,16 @@ In include/trace/subsys.h : | |||
42 | 44 | ||
43 | #include <linux/tracepoint.h> | 45 | #include <linux/tracepoint.h> |
44 | 46 | ||
45 | DEFINE_TRACE(subsys_eventname, | 47 | DECLARE_TRACE(subsys_eventname, |
46 | TPPTOTO(int firstarg, struct task_struct *p), | 48 | TPPROTO(int firstarg, struct task_struct *p), |
47 | TPARGS(firstarg, p)); | 49 | TPARGS(firstarg, p)); |
48 | 50 | ||
49 | In subsys/file.c (where the tracing statement must be added) : | 51 | In subsys/file.c (where the tracing statement must be added) : |
50 | 52 | ||
51 | #include <trace/subsys.h> | 53 | #include <trace/subsys.h> |
52 | 54 | ||
55 | DEFINE_TRACE(subsys_eventname); | ||
56 | |||
53 | void somefct(void) | 57 | void somefct(void) |
54 | { | 58 | { |
55 | ... | 59 | ... |
@@ -61,31 +65,41 @@ Where : | |||
61 | - subsys_eventname is an identifier unique to your event | 65 | - subsys_eventname is an identifier unique to your event |
62 | - subsys is the name of your subsystem. | 66 | - subsys is the name of your subsystem. |
63 | - eventname is the name of the event to trace. | 67 | - eventname is the name of the event to trace. |
64 | - TPPTOTO(int firstarg, struct task_struct *p) is the prototype of the function | ||
65 | called by this tracepoint. | ||
66 | - TPARGS(firstarg, p) are the parameters names, same as found in the prototype. | ||
67 | 68 | ||
68 | Connecting a function (probe) to a tracepoint is done by providing a probe | 69 | - TPPROTO(int firstarg, struct task_struct *p) is the prototype of the |
69 | (function to call) for the specific tracepoint through | 70 | function called by this tracepoint. |
70 | register_trace_subsys_eventname(). Removing a probe is done through | ||
71 | unregister_trace_subsys_eventname(); it will remove the probe sure there is no | ||
72 | caller left using the probe when it returns. Probe removal is preempt-safe | ||
73 | because preemption is disabled around the probe call. See the "Probe example" | ||
74 | section below for a sample probe module. | ||
75 | |||
76 | The tracepoint mechanism supports inserting multiple instances of the same | ||
77 | tracepoint, but a single definition must be made of a given tracepoint name over | ||
78 | all the kernel to make sure no type conflict will occur. Name mangling of the | ||
79 | tracepoints is done using the prototypes to make sure typing is correct. | ||
80 | Verification of probe type correctness is done at the registration site by the | ||
81 | compiler. Tracepoints can be put in inline functions, inlined static functions, | ||
82 | and unrolled loops as well as regular functions. | ||
83 | |||
84 | The naming scheme "subsys_event" is suggested here as a convention intended | ||
85 | to limit collisions. Tracepoint names are global to the kernel: they are | ||
86 | considered as being the same whether they are in the core kernel image or in | ||
87 | modules. | ||
88 | 71 | ||
72 | - TPARGS(firstarg, p) are the parameters names, same as found in the | ||
73 | prototype. | ||
74 | |||
75 | Connecting a function (probe) to a tracepoint is done by providing a | ||
76 | probe (function to call) for the specific tracepoint through | ||
77 | register_trace_subsys_eventname(). Removing a probe is done through | ||
78 | unregister_trace_subsys_eventname(); it will remove the probe. | ||
79 | |||
80 | tracepoint_synchronize_unregister() must be called before the end of | ||
81 | the module exit function to make sure there is no caller left using | ||
82 | the probe. This, and the fact that preemption is disabled around the | ||
83 | probe call, make sure that probe removal and module unload are safe. | ||
84 | See the "Probe example" section below for a sample probe module. | ||
85 | |||
86 | The tracepoint mechanism supports inserting multiple instances of the | ||
87 | same tracepoint, but a single definition must be made of a given | ||
88 | tracepoint name over all the kernel to make sure no type conflict will | ||
89 | occur. Name mangling of the tracepoints is done using the prototypes | ||
90 | to make sure typing is correct. Verification of probe type correctness | ||
91 | is done at the registration site by the compiler. Tracepoints can be | ||
92 | put in inline functions, inlined static functions, and unrolled loops | ||
93 | as well as regular functions. | ||
94 | |||
95 | The naming scheme "subsys_event" is suggested here as a convention | ||
96 | intended to limit collisions. Tracepoint names are global to the | ||
97 | kernel: they are considered as being the same whether they are in the | ||
98 | core kernel image or in modules. | ||
99 | |||
100 | If the tracepoint has to be used in kernel modules, an | ||
101 | EXPORT_TRACEPOINT_SYMBOL_GPL() or EXPORT_TRACEPOINT_SYMBOL() can be | ||
102 | used to export the defined tracepoints. | ||
89 | 103 | ||
90 | * Probe / tracepoint example | 104 | * Probe / tracepoint example |
91 | 105 | ||