aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/controllers/cpuacct.txt32
-rw-r--r--Documentation/ftrace.txt62
-rw-r--r--Documentation/kernel-parameters.txt8
-rw-r--r--Documentation/lockstat.txt51
-rw-r--r--Documentation/markers.txt14
-rw-r--r--Documentation/scheduler/sched-arch.txt4
-rw-r--r--Documentation/tracepoints.txt92
7 files changed, 174 insertions, 89 deletions
diff --git a/Documentation/controllers/cpuacct.txt b/Documentation/controllers/cpuacct.txt
new file mode 100644
index 000000000000..bb775fbe43d7
--- /dev/null
+++ b/Documentation/controllers/cpuacct.txt
@@ -0,0 +1,32 @@
1CPU Accounting Controller
2-------------------------
3
4The CPU accounting controller is used to group tasks using cgroups and
5account the CPU usage of these groups of tasks.
6
7The CPU accounting controller supports multi-hierarchy groups. An accounting
8group accumulates the CPU usage of all of its child groups and the tasks
9directly present in its group.
10
11Accounting groups can be created by first mounting the cgroup filesystem.
12
13# mkdir /cgroups
14# mount -t cgroup -ocpuacct none /cgroups
15
16With the above step, the initial or the parent accounting group
17becomes visible at /cgroups. At bootup, this group includes all the
18tasks in the system. /cgroups/tasks lists the tasks in this cgroup.
19/cgroups/cpuacct.usage gives the CPU time (in nanoseconds) obtained by
20this group which is essentially the CPU time obtained by all the tasks
21in the system.
22
23New accounting groups can be created under the parent group /cgroups.
24
25# cd /cgroups
26# mkdir g1
27# echo $$ > g1
28
29The above steps create a new group g1 and move the current shell
30process (bash) into it. CPU time consumed by this bash and its children
31can be obtained from g1/cpuacct.usage and the same is accumulated in
32/cgroups/cpuacct.usage also.
diff --git a/Documentation/ftrace.txt b/Documentation/ftrace.txt
index 9cc4d685dde5..35a78bc6651d 100644
--- a/Documentation/ftrace.txt
+++ b/Documentation/ftrace.txt
@@ -82,7 +82,7 @@ of ftrace. Here is a list of some of the key files:
82 tracer is not adding more data, they will display 82 tracer is not adding more data, they will display
83 the same information every time they are read. 83 the same information every time they are read.
84 84
85 iter_ctrl: This file lets the user control the amount of data 85 trace_options: This file lets the user control the amount of data
86 that is displayed in one of the above output 86 that is displayed in one of the above output
87 files. 87 files.
88 88
@@ -94,10 +94,10 @@ of ftrace. Here is a list of some of the key files:
94 only be recorded if the latency is greater than 94 only be recorded if the latency is greater than
95 the value in this file. (in microseconds) 95 the value in this file. (in microseconds)
96 96
97 trace_entries: This sets or displays the number of bytes each CPU 97 buffer_size_kb: This sets or displays the number of kilobytes each CPU
98 buffer can hold. The tracer buffers are the same size 98 buffer can hold. The tracer buffers are the same size
99 for each CPU. The displayed number is the size of the 99 for each CPU. The displayed number is the size of the
100 CPU buffer and not total size of all buffers. The 100 CPU buffer and not total size of all buffers. The
101 trace buffers are allocated in pages (blocks of memory 101 trace buffers are allocated in pages (blocks of memory
102 that the kernel uses for allocation, usually 4 KB in size). 102 that the kernel uses for allocation, usually 4 KB in size).
103 If the last page allocated has room for more bytes 103 If the last page allocated has room for more bytes
@@ -316,23 +316,23 @@ The above is mostly meaningful for kernel developers.
316 The rest is the same as the 'trace' file. 316 The rest is the same as the 'trace' file.
317 317
318 318
319iter_ctrl 319trace_options
320--------- 320-------------
321 321
322The iter_ctrl file is used to control what gets printed in the trace 322The trace_options file is used to control what gets printed in the trace
323output. To see what is available, simply cat the file: 323output. To see what is available, simply cat the file:
324 324
325 cat /debug/tracing/iter_ctrl 325 cat /debug/tracing/trace_options
326 print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \ 326 print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \
327 noblock nostacktrace nosched-tree 327 noblock nostacktrace nosched-tree nouserstacktrace nosym-userobj
328 328
329To disable one of the options, echo in the option prepended with "no". 329To disable one of the options, echo in the option prepended with "no".
330 330
331 echo noprint-parent > /debug/tracing/iter_ctrl 331 echo noprint-parent > /debug/tracing/trace_options
332 332
333To enable an option, leave off the "no". 333To enable an option, leave off the "no".
334 334
335 echo sym-offset > /debug/tracing/iter_ctrl 335 echo sym-offset > /debug/tracing/trace_options
336 336
337Here are the available options: 337Here are the available options:
338 338
@@ -378,6 +378,20 @@ Here are the available options:
378 When a trace is recorded, so is the stack of functions. 378 When a trace is recorded, so is the stack of functions.
379 This allows for back traces of trace sites. 379 This allows for back traces of trace sites.
380 380
381 userstacktrace - This option changes the trace.
382 It records a stacktrace of the current userspace thread.
383
384 sym-userobj - when user stacktrace are enabled, look up which object the
385 address belongs to, and print a relative address
386 This is especially useful when ASLR is on, otherwise you don't
387 get a chance to resolve the address to object/file/line after the app is no
388 longer running
389
390 The lookup is performed when you read trace,trace_pipe,latency_trace. Example:
391
392 a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0
393x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
394
381 sched-tree - TBD (any users??) 395 sched-tree - TBD (any users??)
382 396
383 397
@@ -1299,41 +1313,29 @@ trace entries
1299------------- 1313-------------
1300 1314
1301Having too much or not enough data can be troublesome in diagnosing 1315Having too much or not enough data can be troublesome in diagnosing
1302an issue in the kernel. The file trace_entries is used to modify 1316an issue in the kernel. The file buffer_size_kb is used to modify
1303the size of the internal trace buffers. The number listed 1317the size of the internal trace buffers. The number listed
1304is the number of entries that can be recorded per CPU. To know 1318is the number of entries that can be recorded per CPU. To know
1305the full size, multiply the number of possible CPUS with the 1319the full size, multiply the number of possible CPUS with the
1306number of entries. 1320number of entries.
1307 1321
1308 # cat /debug/tracing/trace_entries 1322 # cat /debug/tracing/buffer_size_kb
130965620 13231408 (units kilobytes)
1310 1324
1311Note, to modify this, you must have tracing completely disabled. To do that, 1325Note, to modify this, you must have tracing completely disabled. To do that,
1312echo "nop" into the current_tracer. If the current_tracer is not set 1326echo "nop" into the current_tracer. If the current_tracer is not set
1313to "nop", an EINVAL error will be returned. 1327to "nop", an EINVAL error will be returned.
1314 1328
1315 # echo nop > /debug/tracing/current_tracer 1329 # echo nop > /debug/tracing/current_tracer
1316 # echo 100000 > /debug/tracing/trace_entries 1330 # echo 10000 > /debug/tracing/buffer_size_kb
1317 # cat /debug/tracing/trace_entries 1331 # cat /debug/tracing/buffer_size_kb
1318100045 133210000 (units kilobytes)
1319
1320
1321Notice that we echoed in 100,000 but the size is 100,045. The entries
1322are held in individual pages. It allocates the number of pages it takes
1323to fulfill the request. If more entries may fit on the last page
1324then they will be added.
1325
1326 # echo 1 > /debug/tracing/trace_entries
1327 # cat /debug/tracing/trace_entries
132885
1329
1330This shows us that 85 entries can fit in a single page.
1331 1333
1332The number of pages which will be allocated is limited to a percentage 1334The number of pages which will be allocated is limited to a percentage
1333of available memory. Allocating too much will produce an error. 1335of available memory. Allocating too much will produce an error.
1334 1336
1335 # echo 1000000000000 > /debug/tracing/trace_entries 1337 # echo 1000000000000 > /debug/tracing/buffer_size_kb
1336-bash: echo: write error: Cannot allocate memory 1338-bash: echo: write error: Cannot allocate memory
1337 # cat /debug/tracing/trace_entries 1339 # cat /debug/tracing/buffer_size_kb
133885 134085
1339 1341
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index e0f346d201ed..2919a2e91938 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -750,6 +750,14 @@ and is between 256 and 4096 characters. It is defined in the file
750 parameter will force ia64_sal_cache_flush to call 750 parameter will force ia64_sal_cache_flush to call
751 ia64_pal_cache_flush instead of SAL_CACHE_FLUSH. 751 ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
752 752
753 ftrace=[tracer]
754 [ftrace] will set and start the specified tracer
755 as early as possible in order to facilitate early
756 boot debugging.
757
758 ftrace_dump_on_oops
759 [ftrace] will dump the trace buffers on oops.
760
753 gamecon.map[2|3]= 761 gamecon.map[2|3]=
754 [HW,JOY] Multisystem joystick and NES/SNES/PSX pad 762 [HW,JOY] Multisystem joystick and NES/SNES/PSX pad
755 support via parallel port (up to 5 devices per port) 763 support via parallel port (up to 5 devices per port)
diff --git a/Documentation/lockstat.txt b/Documentation/lockstat.txt
index 4ba4664ce5c3..9cb9138f7a79 100644
--- a/Documentation/lockstat.txt
+++ b/Documentation/lockstat.txt
@@ -71,35 +71,50 @@ Look at the current lock statistics:
71 71
72# less /proc/lock_stat 72# less /proc/lock_stat
73 73
7401 lock_stat version 0.2 7401 lock_stat version 0.3
7502 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 7502 -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
7603 class name con-bounces contentions waittime-min waittime-max waittime-total acq-bounces acquisitions holdtime-min holdtime-max holdtime-total 7603 class name con-bounces contentions waittime-min waittime-max waittime-total acq-bounces acquisitions holdtime-min holdtime-max holdtime-total
7704 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 7704 -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
7805 7805
7906 &inode->i_data.tree_lock-W: 15 21657 0.18 1093295.30 11547131054.85 58 10415 0.16 87.51 6387.60 7906 &mm->mmap_sem-W: 233 538 18446744073708 22924.27 607243.51 1342 45806 1.71 8595.89 1180582.34
8007 &inode->i_data.tree_lock-R: 0 0 0.00 0.00 0.00 23302 231198 0.25 8.45 98023.38 8007 &mm->mmap_sem-R: 205 587 18446744073708 28403.36 731975.00 1940 412426 0.58 187825.45 6307502.88
8108 -------------------------- 8108 ---------------
8209 &inode->i_data.tree_lock 0 [<ffffffff8027c08f>] add_to_page_cache+0x5f/0x190 8209 &mm->mmap_sem 487 [<ffffffff8053491f>] do_page_fault+0x466/0x928
8310 8310 &mm->mmap_sem 179 [<ffffffff802a6200>] sys_mprotect+0xcd/0x21d
8411 ............................................................................................................................................................................................... 8411 &mm->mmap_sem 279 [<ffffffff80210a57>] sys_mmap+0x75/0xce
8512 8512 &mm->mmap_sem 76 [<ffffffff802a490b>] sys_munmap+0x32/0x59
8613 dcache_lock: 1037 1161 0.38 45.32 774.51 6611 243371 0.15 306.48 77387.24 8613 ---------------
8714 ----------- 8714 &mm->mmap_sem 270 [<ffffffff80210a57>] sys_mmap+0x75/0xce
8815 dcache_lock 180 [<ffffffff802c0d7e>] sys_getcwd+0x11e/0x230 8815 &mm->mmap_sem 431 [<ffffffff8053491f>] do_page_fault+0x466/0x928
8916 dcache_lock 165 [<ffffffff802c002a>] d_alloc+0x15a/0x210 8916 &mm->mmap_sem 138 [<ffffffff802a490b>] sys_munmap+0x32/0x59
9017 dcache_lock 33 [<ffffffff8035818d>] _atomic_dec_and_lock+0x4d/0x70 9017 &mm->mmap_sem 145 [<ffffffff802a6200>] sys_mprotect+0xcd/0x21d
9118 dcache_lock 1 [<ffffffff802beef8>] shrink_dcache_parent+0x18/0x130 9118
9219 ...............................................................................................................................................................................................
9320
9421 dcache_lock: 621 623 0.52 118.26 1053.02 6745 91930 0.29 316.29 118423.41
9522 -----------
9623 dcache_lock 179 [<ffffffff80378274>] _atomic_dec_and_lock+0x34/0x54
9724 dcache_lock 113 [<ffffffff802cc17b>] d_alloc+0x19a/0x1eb
9825 dcache_lock 99 [<ffffffff802ca0dc>] d_rehash+0x1b/0x44
9926 dcache_lock 104 [<ffffffff802cbca0>] d_instantiate+0x36/0x8a
10027 -----------
10128 dcache_lock 192 [<ffffffff80378274>] _atomic_dec_and_lock+0x34/0x54
10229 dcache_lock 98 [<ffffffff802ca0dc>] d_rehash+0x1b/0x44
10330 dcache_lock 72 [<ffffffff802cc17b>] d_alloc+0x19a/0x1eb
10431 dcache_lock 112 [<ffffffff802cbca0>] d_instantiate+0x36/0x8a
92 105
93This excerpt shows the first two lock class statistics. Line 01 shows the 106This excerpt shows the first two lock class statistics. Line 01 shows the
94output version - each time the format changes this will be updated. Line 02-04 107output version - each time the format changes this will be updated. Line 02-04
95show the header with column descriptions. Lines 05-10 and 13-18 show the actual 108show the header with column descriptions. Lines 05-18 and 20-31 show the actual
96statistics. These statistics come in two parts; the actual stats separated by a 109statistics. These statistics come in two parts; the actual stats separated by a
97short separator (line 08, 14) from the contention points. 110short separator (line 08, 13) from the contention points.
98 111
99The first lock (05-10) is a read/write lock, and shows two lines above the 112The first lock (05-18) is a read/write lock, and shows two lines above the
100short separator. The contention points don't match the column descriptors, 113short separator. The contention points don't match the column descriptors,
101they have two: contentions and [<IP>] symbol. 114they have two: contentions and [<IP>] symbol. The second set of contention
115points are the points we're contending with.
102 116
117The integer part of the time values is in us.
103 118
104View the top contending locks: 119View the top contending locks:
105 120
diff --git a/Documentation/markers.txt b/Documentation/markers.txt
index 089f6138fcd9..6d275e4ef385 100644
--- a/Documentation/markers.txt
+++ b/Documentation/markers.txt
@@ -70,6 +70,20 @@ a printk warning which identifies the inconsistency:
70 70
71"Format mismatch for probe probe_name (format), marker (format)" 71"Format mismatch for probe probe_name (format), marker (format)"
72 72
73Another way to use markers is to simply define the marker without generating any
74function call to actually call into the marker. This is useful in combination
75with tracepoint probes in a scheme like this :
76
77void probe_tracepoint_name(unsigned int arg1, struct task_struct *tsk);
78
79DEFINE_MARKER_TP(marker_eventname, tracepoint_name, probe_tracepoint_name,
80 "arg1 %u pid %d");
81
82notrace void probe_tracepoint_name(unsigned int arg1, struct task_struct *tsk)
83{
84 struct marker *marker = &GET_MARKER(kernel_irq_entry);
85 /* write data to trace buffers ... */
86}
73 87
74* Probe / marker example 88* Probe / marker example
75 89
diff --git a/Documentation/scheduler/sched-arch.txt b/Documentation/scheduler/sched-arch.txt
index 941615a9769b..d43dbcbd163b 100644
--- a/Documentation/scheduler/sched-arch.txt
+++ b/Documentation/scheduler/sched-arch.txt
@@ -8,7 +8,7 @@ Context switch
8By default, the switch_to arch function is called with the runqueue 8By default, the switch_to arch function is called with the runqueue
9locked. This is usually not a problem unless switch_to may need to 9locked. This is usually not a problem unless switch_to may need to
10take the runqueue lock. This is usually due to a wake up operation in 10take the runqueue lock. This is usually due to a wake up operation in
11the context switch. See include/asm-ia64/system.h for an example. 11the context switch. See arch/ia64/include/asm/system.h for an example.
12 12
13To request the scheduler call switch_to with the runqueue unlocked, 13To request the scheduler call switch_to with the runqueue unlocked,
14you must `#define __ARCH_WANT_UNLOCKED_CTXSW` in a header file 14you must `#define __ARCH_WANT_UNLOCKED_CTXSW` in a header file
@@ -23,7 +23,7 @@ disabled. Interrupts may be enabled over the call if it is likely to
23introduce a significant interrupt latency by adding the line 23introduce a significant interrupt latency by adding the line
24`#define __ARCH_WANT_INTERRUPTS_ON_CTXSW` in the same place as for 24`#define __ARCH_WANT_INTERRUPTS_ON_CTXSW` in the same place as for
25unlocked context switches. This define also implies 25unlocked context switches. This define also implies
26`__ARCH_WANT_UNLOCKED_CTXSW`. See include/asm-arm/system.h for an 26`__ARCH_WANT_UNLOCKED_CTXSW`. See arch/arm/include/asm/system.h for an
27example. 27example.
28 28
29 29
diff --git a/Documentation/tracepoints.txt b/Documentation/tracepoints.txt
index 5d354e167494..2d42241a25c3 100644
--- a/Documentation/tracepoints.txt
+++ b/Documentation/tracepoints.txt
@@ -3,28 +3,30 @@
3 Mathieu Desnoyers 3 Mathieu Desnoyers
4 4
5 5
6This document introduces Linux Kernel Tracepoints and their use. It provides 6This document introduces Linux Kernel Tracepoints and their use. It
7examples of how to insert tracepoints in the kernel and connect probe functions 7provides examples of how to insert tracepoints in the kernel and
8to them and provides some examples of probe functions. 8connect probe functions to them and provides some examples of probe
9functions.
9 10
10 11
11* Purpose of tracepoints 12* Purpose of tracepoints
12 13
13A tracepoint placed in code provides a hook to call a function (probe) that you 14A tracepoint placed in code provides a hook to call a function (probe)
14can provide at runtime. A tracepoint can be "on" (a probe is connected to it) or 15that you can provide at runtime. A tracepoint can be "on" (a probe is
15"off" (no probe is attached). When a tracepoint is "off" it has no effect, 16connected to it) or "off" (no probe is attached). When a tracepoint is
16except for adding a tiny time penalty (checking a condition for a branch) and 17"off" it has no effect, except for adding a tiny time penalty
17space penalty (adding a few bytes for the function call at the end of the 18(checking a condition for a branch) and space penalty (adding a few
18instrumented function and adds a data structure in a separate section). When a 19bytes for the function call at the end of the instrumented function
19tracepoint is "on", the function you provide is called each time the tracepoint 20and adds a data structure in a separate section). When a tracepoint
20is executed, in the execution context of the caller. When the function provided 21is "on", the function you provide is called each time the tracepoint
21ends its execution, it returns to the caller (continuing from the tracepoint 22is executed, in the execution context of the caller. When the function
22site). 23provided ends its execution, it returns to the caller (continuing from
24the tracepoint site).
23 25
24You can put tracepoints at important locations in the code. They are 26You can put tracepoints at important locations in the code. They are
25lightweight hooks that can pass an arbitrary number of parameters, 27lightweight hooks that can pass an arbitrary number of parameters,
26which prototypes are described in a tracepoint declaration placed in a header 28which prototypes are described in a tracepoint declaration placed in a
27file. 29header file.
28 30
29They can be used for tracing and performance accounting. 31They can be used for tracing and performance accounting.
30 32
@@ -42,7 +44,7 @@ In include/trace/subsys.h :
42 44
43#include <linux/tracepoint.h> 45#include <linux/tracepoint.h>
44 46
45DEFINE_TRACE(subsys_eventname, 47DECLARE_TRACE(subsys_eventname,
46 TPPTOTO(int firstarg, struct task_struct *p), 48 TPPTOTO(int firstarg, struct task_struct *p),
47 TPARGS(firstarg, p)); 49 TPARGS(firstarg, p));
48 50
@@ -50,6 +52,8 @@ In subsys/file.c (where the tracing statement must be added) :
50 52
51#include <trace/subsys.h> 53#include <trace/subsys.h>
52 54
55DEFINE_TRACE(subsys_eventname);
56
53void somefct(void) 57void somefct(void)
54{ 58{
55 ... 59 ...
@@ -61,31 +65,41 @@ Where :
61- subsys_eventname is an identifier unique to your event 65- subsys_eventname is an identifier unique to your event
62 - subsys is the name of your subsystem. 66 - subsys is the name of your subsystem.
63 - eventname is the name of the event to trace. 67 - eventname is the name of the event to trace.
64- TPPTOTO(int firstarg, struct task_struct *p) is the prototype of the function
65 called by this tracepoint.
66- TPARGS(firstarg, p) are the parameters names, same as found in the prototype.
67 68
68Connecting a function (probe) to a tracepoint is done by providing a probe 69- TPPTOTO(int firstarg, struct task_struct *p) is the prototype of the
69(function to call) for the specific tracepoint through 70 function called by this tracepoint.
70register_trace_subsys_eventname(). Removing a probe is done through
71unregister_trace_subsys_eventname(); it will remove the probe sure there is no
72caller left using the probe when it returns. Probe removal is preempt-safe
73because preemption is disabled around the probe call. See the "Probe example"
74section below for a sample probe module.
75
76The tracepoint mechanism supports inserting multiple instances of the same
77tracepoint, but a single definition must be made of a given tracepoint name over
78all the kernel to make sure no type conflict will occur. Name mangling of the
79tracepoints is done using the prototypes to make sure typing is correct.
80Verification of probe type correctness is done at the registration site by the
81compiler. Tracepoints can be put in inline functions, inlined static functions,
82and unrolled loops as well as regular functions.
83
84The naming scheme "subsys_event" is suggested here as a convention intended
85to limit collisions. Tracepoint names are global to the kernel: they are
86considered as being the same whether they are in the core kernel image or in
87modules.
88 71
72- TPARGS(firstarg, p) are the parameters names, same as found in the
73 prototype.
74
75Connecting a function (probe) to a tracepoint is done by providing a
76probe (function to call) for the specific tracepoint through
77register_trace_subsys_eventname(). Removing a probe is done through
78unregister_trace_subsys_eventname(); it will remove the probe.
79
80tracepoint_synchronize_unregister() must be called before the end of
81the module exit function to make sure there is no caller left using
82the probe. This, and the fact that preemption is disabled around the
83probe call, make sure that probe removal and module unload are safe.
84See the "Probe example" section below for a sample probe module.
85
86The tracepoint mechanism supports inserting multiple instances of the
87same tracepoint, but a single definition must be made of a given
88tracepoint name over all the kernel to make sure no type conflict will
89occur. Name mangling of the tracepoints is done using the prototypes
90to make sure typing is correct. Verification of probe type correctness
91is done at the registration site by the compiler. Tracepoints can be
92put in inline functions, inlined static functions, and unrolled loops
93as well as regular functions.
94
95The naming scheme "subsys_event" is suggested here as a convention
96intended to limit collisions. Tracepoint names are global to the
97kernel: they are considered as being the same whether they are in the
98core kernel image or in modules.
99
100If the tracepoint has to be used in kernel modules, an
101EXPORT_TRACEPOINT_SYMBOL_GPL() or EXPORT_TRACEPOINT_SYMBOL() can be
102used to export the defined tracepoints.
89 103
90* Probe / tracepoint example 104* Probe / tracepoint example
91 105