aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/trace
diff options
context:
space:
mode:
authorGlenn Elliott <gelliott@cs.unc.edu>2012-03-04 19:47:13 -0500
committerGlenn Elliott <gelliott@cs.unc.edu>2012-03-04 19:47:13 -0500
commitc71c03bda1e86c9d5198c5d83f712e695c4f2a1e (patch)
treeecb166cb3e2b7e2adb3b5e292245fefd23381ac8 /Documentation/trace
parentea53c912f8a86a8567697115b6a0d8152beee5c8 (diff)
parent6a00f206debf8a5c8899055726ad127dbeeed098 (diff)
Merge branch 'mpi-master' into wip-k-fmlpwip-k-fmlp
Conflicts: litmus/sched_cedf.c
Diffstat (limited to 'Documentation/trace')
-rw-r--r--Documentation/trace/events-power.txt90
-rw-r--r--Documentation/trace/events.txt8
-rw-r--r--Documentation/trace/ftrace-design.txt7
-rw-r--r--Documentation/trace/ftrace.txt151
-rw-r--r--Documentation/trace/kprobetrace.txt17
-rw-r--r--Documentation/trace/postprocess/trace-vmscan-postprocess.pl50
-rw-r--r--Documentation/trace/ring-buffer-design.txt2
7 files changed, 179 insertions, 146 deletions
diff --git a/Documentation/trace/events-power.txt b/Documentation/trace/events-power.txt
new file mode 100644
index 000000000000..96d87b67fe37
--- /dev/null
+++ b/Documentation/trace/events-power.txt
@@ -0,0 +1,90 @@
1
2 Subsystem Trace Points: power
3
4The power tracing system captures events related to power transitions
5within the kernel. Broadly speaking there are three major subheadings:
6
7 o Power state switch which reports events related to suspend (S-states),
8 cpuidle (C-states) and cpufreq (P-states)
9 o System clock related changes
10 o Power domains related changes and transitions
11
12This document describes what each of the tracepoints is and why they
13might be useful.
14
15Cf. include/trace/events/power.h for the events definitions.
16
171. Power state switch events
18============================
19
201.1 New trace API
21-----------------
22
23A 'cpu' event class gathers the CPU-related events: cpuidle and
24cpufreq.
25
26cpu_idle "state=%lu cpu_id=%lu"
27cpu_frequency "state=%lu cpu_id=%lu"
28
29A suspend event is used to indicate the system going in and out of the
30suspend mode:
31
32machine_suspend "state=%lu"
33
34
35Note: the value of '-1' or '4294967295' for state means an exit from the current state,
36i.e. trace_cpu_idle(4, smp_processor_id()) means that the system
37enters the idle state 4, while trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id())
38means that the system exits the previous idle state.
39
40The event which has 'state=4294967295' in the trace is very important to the user
41space tools which are using it to detect the end of the current state, and so to
42correctly draw the states diagrams and to calculate accurate statistics etc.
43
441.2 DEPRECATED trace API
45------------------------
46
47A new Kconfig option CONFIG_EVENT_POWER_TRACING_DEPRECATED with the default value of
48'y' has been created. This allows the legacy trace power API to be used conjointly
49with the new trace API.
50The Kconfig option, the old trace API (in include/trace/events/power.h) and the
51old trace points will disappear in a future release (namely 2.6.41).
52
53power_start "type=%lu state=%lu cpu_id=%lu"
54power_frequency "type=%lu state=%lu cpu_id=%lu"
55power_end "cpu_id=%lu"
56
57The 'type' parameter takes one of those macros:
58 . POWER_NONE = 0,
59 . POWER_CSTATE = 1, /* C-State */
60 . POWER_PSTATE = 2, /* Fequency change or DVFS */
61
62The 'state' parameter is set depending on the type:
63 . Target C-state for type=POWER_CSTATE,
64 . Target frequency for type=POWER_PSTATE,
65
66power_end is used to indicate the exit of a state, corresponding to the latest
67power_start event.
68
692. Clocks events
70================
71The clock events are used for clock enable/disable and for
72clock rate change.
73
74clock_enable "%s state=%lu cpu_id=%lu"
75clock_disable "%s state=%lu cpu_id=%lu"
76clock_set_rate "%s state=%lu cpu_id=%lu"
77
78The first parameter gives the clock name (e.g. "gpio1_iclk").
79The second parameter is '1' for enable, '0' for disable, the target
80clock rate for set_rate.
81
823. Power domains events
83=======================
84The power domain events are used for power domains transitions
85
86power_domain_target "%s state=%lu cpu_id=%lu"
87
88The first parameter gives the power domain name (e.g. "mpu_pwrdm").
89The second parameter is the power domain target state.
90
diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
index 09bd8e902989..b510564aac7e 100644
--- a/Documentation/trace/events.txt
+++ b/Documentation/trace/events.txt
@@ -125,7 +125,7 @@ is the size of the data item, in bytes.
125For example, here's the information displayed for the 'sched_wakeup' 125For example, here's the information displayed for the 'sched_wakeup'
126event: 126event:
127 127
128# cat /debug/tracing/events/sched/sched_wakeup/format 128# cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/format
129 129
130name: sched_wakeup 130name: sched_wakeup
131ID: 60 131ID: 60
@@ -201,19 +201,19 @@ to the 'filter' file for the given event.
201 201
202For example: 202For example:
203 203
204# cd /debug/tracing/events/sched/sched_wakeup 204# cd /sys/kernel/debug/tracing/events/sched/sched_wakeup
205# echo "common_preempt_count > 4" > filter 205# echo "common_preempt_count > 4" > filter
206 206
207A slightly more involved example: 207A slightly more involved example:
208 208
209# cd /debug/tracing/events/sched/sched_signal_send 209# cd /sys/kernel/debug/tracing/events/signal/signal_generate
210# echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter 210# echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter
211 211
212If there is an error in the expression, you'll get an 'Invalid 212If there is an error in the expression, you'll get an 'Invalid
213argument' error when setting it, and the erroneous string along with 213argument' error when setting it, and the erroneous string along with
214an error message can be seen by looking at the filter e.g.: 214an error message can be seen by looking at the filter e.g.:
215 215
216# cd /debug/tracing/events/sched/sched_signal_send 216# cd /sys/kernel/debug/tracing/events/signal/signal_generate
217# echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter 217# echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter
218-bash: echo: write error: Invalid argument 218-bash: echo: write error: Invalid argument
219# cat filter 219# cat filter
diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt
index dc52bd442c92..79fcafc7fd64 100644
--- a/Documentation/trace/ftrace-design.txt
+++ b/Documentation/trace/ftrace-design.txt
@@ -247,6 +247,13 @@ You need very few things to get the syscalls tracing in an arch.
247- Support the TIF_SYSCALL_TRACEPOINT thread flags. 247- Support the TIF_SYSCALL_TRACEPOINT thread flags.
248- Put the trace_sys_enter() and trace_sys_exit() tracepoints calls from ptrace 248- Put the trace_sys_enter() and trace_sys_exit() tracepoints calls from ptrace
249 in the ptrace syscalls tracing path. 249 in the ptrace syscalls tracing path.
250- If the system call table on this arch is more complicated than a simple array
251 of addresses of the system calls, implement an arch_syscall_addr to return
252 the address of a given system call.
253- If the symbol names of the system calls do not match the function names on
254 this arch, define ARCH_HAS_SYSCALL_MATCH_SYM_NAME in asm/ftrace.h and
255 implement arch_syscall_match_sym_name with the appropriate logic to return
256 true if the function name corresponds with the symbol name.
250- Tag this arch as HAVE_SYSCALL_TRACEPOINTS. 257- Tag this arch as HAVE_SYSCALL_TRACEPOINTS.
251 258
252 259
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 557c1edeccaf..1ebc24cf9a55 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -80,11 +80,11 @@ of ftrace. Here is a list of some of the key files:
80 tracers listed here can be configured by 80 tracers listed here can be configured by
81 echoing their name into current_tracer. 81 echoing their name into current_tracer.
82 82
83 tracing_enabled: 83 tracing_on:
84 84
85 This sets or displays whether the current_tracer 85 This sets or displays whether writing to the trace
86 is activated and tracing or not. Echo 0 into this 86 ring buffer is enabled. Echo 0 into this file to disable
87 file to disable the tracer or 1 to enable it. 87 the tracer or 1 to enable it.
88 88
89 trace: 89 trace:
90 90
@@ -202,10 +202,6 @@ Here is the list of current tracers that may be configured.
202 to draw a graph of function calls similar to C code 202 to draw a graph of function calls similar to C code
203 source. 203 source.
204 204
205 "sched_switch"
206
207 Traces the context switches and wakeups between tasks.
208
209 "irqsoff" 205 "irqsoff"
210 206
211 Traces the areas that disable interrupts and saves 207 Traces the areas that disable interrupts and saves
@@ -273,39 +269,6 @@ format, the function name that was traced "path_put" and the
273parent function that called this function "path_walk". The 269parent function that called this function "path_walk". The
274timestamp is the time at which the function was entered. 270timestamp is the time at which the function was entered.
275 271
276The sched_switch tracer also includes tracing of task wakeups
277and context switches.
278
279 ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 2916:115:S
280 ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 10:115:S
281 ksoftirqd/1-7 [01] 1453.070013: 7:115:R ==> 10:115:R
282 events/1-10 [01] 1453.070013: 10:115:S ==> 2916:115:R
283 kondemand/1-2916 [01] 1453.070013: 2916:115:S ==> 7:115:R
284 ksoftirqd/1-7 [01] 1453.070013: 7:115:S ==> 0:140:R
285
286Wake ups are represented by a "+" and the context switches are
287shown as "==>". The format is:
288
289 Context switches:
290
291 Previous task Next Task
292
293 <pid>:<prio>:<state> ==> <pid>:<prio>:<state>
294
295 Wake ups:
296
297 Current task Task waking up
298
299 <pid>:<prio>:<state> + <pid>:<prio>:<state>
300
301The prio is the internal kernel priority, which is the inverse
302of the priority that is usually displayed by user-space tools.
303Zero represents the highest priority (99). Prio 100 starts the
304"nice" priorities with 100 being equal to nice -20 and 139 being
305nice 19. The prio "140" is reserved for the idle task which is
306the lowest priority thread (pid 0).
307
308
309Latency trace format 272Latency trace format
310-------------------- 273--------------------
311 274
@@ -491,78 +454,10 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
491 latencies, as described in "Latency 454 latencies, as described in "Latency
492 trace format". 455 trace format".
493 456
494sched_switch 457 overwrite - This controls what happens when the trace buffer is
495------------ 458 full. If "1" (default), the oldest events are
496 459 discarded and overwritten. If "0", then the newest
497This tracer simply records schedule switches. Here is an example 460 events are discarded.
498of how to use it.
499
500 # echo sched_switch > current_tracer
501 # echo 1 > tracing_enabled
502 # sleep 1
503 # echo 0 > tracing_enabled
504 # cat trace
505
506# tracer: sched_switch
507#
508# TASK-PID CPU# TIMESTAMP FUNCTION
509# | | | | |
510 bash-3997 [01] 240.132281: 3997:120:R + 4055:120:R
511 bash-3997 [01] 240.132284: 3997:120:R ==> 4055:120:R
512 sleep-4055 [01] 240.132371: 4055:120:S ==> 3997:120:R
513 bash-3997 [01] 240.132454: 3997:120:R + 4055:120:S
514 bash-3997 [01] 240.132457: 3997:120:R ==> 4055:120:R
515 sleep-4055 [01] 240.132460: 4055:120:D ==> 3997:120:R
516 bash-3997 [01] 240.132463: 3997:120:R + 4055:120:D
517 bash-3997 [01] 240.132465: 3997:120:R ==> 4055:120:R
518 <idle>-0 [00] 240.132589: 0:140:R + 4:115:S
519 <idle>-0 [00] 240.132591: 0:140:R ==> 4:115:R
520 ksoftirqd/0-4 [00] 240.132595: 4:115:S ==> 0:140:R
521 <idle>-0 [00] 240.132598: 0:140:R + 4:115:S
522 <idle>-0 [00] 240.132599: 0:140:R ==> 4:115:R
523 ksoftirqd/0-4 [00] 240.132603: 4:115:S ==> 0:140:R
524 sleep-4055 [01] 240.133058: 4055:120:S ==> 3997:120:R
525 [...]
526
527
528As we have discussed previously about this format, the header
529shows the name of the trace and points to the options. The
530"FUNCTION" is a misnomer since here it represents the wake ups
531and context switches.
532
533The sched_switch file only lists the wake ups (represented with
534'+') and context switches ('==>') with the previous task or
535current task first followed by the next task or task waking up.
536The format for both of these is PID:KERNEL-PRIO:TASK-STATE.
537Remember that the KERNEL-PRIO is the inverse of the actual
538priority with zero (0) being the highest priority and the nice
539values starting at 100 (nice -20). Below is a quick chart to map
540the kernel priority to user land priorities.
541
542 Kernel Space User Space
543 ===============================================================
544 0(high) to 98(low) user RT priority 99(high) to 1(low)
545 with SCHED_RR or SCHED_FIFO
546 ---------------------------------------------------------------
547 99 sched_priority is not used in scheduling
548 decisions(it must be specified as 0)
549 ---------------------------------------------------------------
550 100(high) to 139(low) user nice -20(high) to 19(low)
551 ---------------------------------------------------------------
552 140 idle task priority
553 ---------------------------------------------------------------
554
555The task states are:
556
557 R - running : wants to run, may not actually be running
558 S - sleep : process is waiting to be woken up (handles signals)
559 D - disk sleep (uninterruptible sleep) : process must be woken up
560 (ignores signals)
561 T - stopped : process suspended
562 t - traced : process is being traced (with something like gdb)
563 Z - zombie : process waiting to be cleaned up
564 X - unknown
565
566 461
567ftrace_enabled 462ftrace_enabled
568-------------- 463--------------
@@ -607,10 +502,10 @@ an example:
607 # echo irqsoff > current_tracer 502 # echo irqsoff > current_tracer
608 # echo latency-format > trace_options 503 # echo latency-format > trace_options
609 # echo 0 > tracing_max_latency 504 # echo 0 > tracing_max_latency
610 # echo 1 > tracing_enabled 505 # echo 1 > tracing_on
611 # ls -ltr 506 # ls -ltr
612 [...] 507 [...]
613 # echo 0 > tracing_enabled 508 # echo 0 > tracing_on
614 # cat trace 509 # cat trace
615# tracer: irqsoff 510# tracer: irqsoff
616# 511#
@@ -715,10 +610,10 @@ is much like the irqsoff tracer.
715 # echo preemptoff > current_tracer 610 # echo preemptoff > current_tracer
716 # echo latency-format > trace_options 611 # echo latency-format > trace_options
717 # echo 0 > tracing_max_latency 612 # echo 0 > tracing_max_latency
718 # echo 1 > tracing_enabled 613 # echo 1 > tracing_on
719 # ls -ltr 614 # ls -ltr
720 [...] 615 [...]
721 # echo 0 > tracing_enabled 616 # echo 0 > tracing_on
722 # cat trace 617 # cat trace
723# tracer: preemptoff 618# tracer: preemptoff
724# 619#
@@ -863,10 +758,10 @@ tracers.
863 # echo preemptirqsoff > current_tracer 758 # echo preemptirqsoff > current_tracer
864 # echo latency-format > trace_options 759 # echo latency-format > trace_options
865 # echo 0 > tracing_max_latency 760 # echo 0 > tracing_max_latency
866 # echo 1 > tracing_enabled 761 # echo 1 > tracing_on
867 # ls -ltr 762 # ls -ltr
868 [...] 763 [...]
869 # echo 0 > tracing_enabled 764 # echo 0 > tracing_on
870 # cat trace 765 # cat trace
871# tracer: preemptirqsoff 766# tracer: preemptirqsoff
872# 767#
@@ -1026,9 +921,9 @@ Instead of performing an 'ls', we will run 'sleep 1' under
1026 # echo wakeup > current_tracer 921 # echo wakeup > current_tracer
1027 # echo latency-format > trace_options 922 # echo latency-format > trace_options
1028 # echo 0 > tracing_max_latency 923 # echo 0 > tracing_max_latency
1029 # echo 1 > tracing_enabled 924 # echo 1 > tracing_on
1030 # chrt -f 5 sleep 1 925 # chrt -f 5 sleep 1
1031 # echo 0 > tracing_enabled 926 # echo 0 > tracing_on
1032 # cat trace 927 # cat trace
1033# tracer: wakeup 928# tracer: wakeup
1034# 929#
@@ -1140,9 +1035,9 @@ ftrace_enabled is set; otherwise this tracer is a nop.
1140 1035
1141 # sysctl kernel.ftrace_enabled=1 1036 # sysctl kernel.ftrace_enabled=1
1142 # echo function > current_tracer 1037 # echo function > current_tracer
1143 # echo 1 > tracing_enabled 1038 # echo 1 > tracing_on
1144 # usleep 1 1039 # usleep 1
1145 # echo 0 > tracing_enabled 1040 # echo 0 > tracing_on
1146 # cat trace 1041 # cat trace
1147# tracer: function 1042# tracer: function
1148# 1043#
@@ -1180,7 +1075,7 @@ int trace_fd;
1180[...] 1075[...]
1181int main(int argc, char *argv[]) { 1076int main(int argc, char *argv[]) {
1182 [...] 1077 [...]
1183 trace_fd = open(tracing_file("tracing_enabled"), O_WRONLY); 1078 trace_fd = open(tracing_file("tracing_on"), O_WRONLY);
1184 [...] 1079 [...]
1185 if (condition_hit()) { 1080 if (condition_hit()) {
1186 write(trace_fd, "0", 1); 1081 write(trace_fd, "0", 1);
@@ -1631,9 +1526,9 @@ If I am only interested in sys_nanosleep and hrtimer_interrupt:
1631 # echo sys_nanosleep hrtimer_interrupt \ 1526 # echo sys_nanosleep hrtimer_interrupt \
1632 > set_ftrace_filter 1527 > set_ftrace_filter
1633 # echo function > current_tracer 1528 # echo function > current_tracer
1634 # echo 1 > tracing_enabled 1529 # echo 1 > tracing_on
1635 # usleep 1 1530 # usleep 1
1636 # echo 0 > tracing_enabled 1531 # echo 0 > tracing_on
1637 # cat trace 1532 # cat trace
1638# tracer: ftrace 1533# tracer: ftrace
1639# 1534#
@@ -1879,9 +1774,9 @@ different. The trace is live.
1879 # echo function > current_tracer 1774 # echo function > current_tracer
1880 # cat trace_pipe > /tmp/trace.out & 1775 # cat trace_pipe > /tmp/trace.out &
1881[1] 4153 1776[1] 4153
1882 # echo 1 > tracing_enabled 1777 # echo 1 > tracing_on
1883 # usleep 1 1778 # usleep 1
1884 # echo 0 > tracing_enabled 1779 # echo 0 > tracing_on
1885 # cat trace 1780 # cat trace
1886# tracer: function 1781# tracer: function
1887# 1782#
diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index 5f77d94598dd..c83bd6b4e6e8 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -42,11 +42,25 @@ Synopsis of kprobe_events
42 +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**) 42 +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
43 NAME=FETCHARG : Set NAME as the argument name of FETCHARG. 43 NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
44 FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types 44 FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
45 (u8/u16/u32/u64/s8/s16/s32/s64) and string are supported. 45 (u8/u16/u32/u64/s8/s16/s32/s64), "string" and bitfield
46 are supported.
46 47
47 (*) only for return probe. 48 (*) only for return probe.
48 (**) this is useful for fetching a field of data structures. 49 (**) this is useful for fetching a field of data structures.
49 50
51Types
52-----
53Several types are supported for fetch-args. Kprobe tracer will access memory
54by given type. Prefix 's' and 'u' means those types are signed and unsigned
55respectively. Traced arguments are shown in decimal (signed) or hex (unsigned).
56String type is a special type, which fetches a "null-terminated" string from
57kernel space. This means it will fail and store NULL if the string container
58has been paged out.
59Bitfield is another special type, which takes 3 parameters, bit-width, bit-
60offset, and container-size (usually 32). The syntax is;
61
62 b<bit-width>@<bit-offset>/<container-size>
63
50 64
51Per-Probe Event Filtering 65Per-Probe Event Filtering
52------------------------- 66-------------------------
@@ -106,7 +120,6 @@ format:
106 field:unsigned char common_flags; offset:2; size:1; signed:0; 120 field:unsigned char common_flags; offset:2; size:1; signed:0;
107 field:unsigned char common_preempt_count; offset:3; size:1;signed:0; 121 field:unsigned char common_preempt_count; offset:3; size:1;signed:0;
108 field:int common_pid; offset:4; size:4; signed:1; 122 field:int common_pid; offset:4; size:4; signed:1;
109 field:int common_lock_depth; offset:8; size:4; signed:1;
110 123
111 field:unsigned long __probe_ip; offset:12; size:4; signed:0; 124 field:unsigned long __probe_ip; offset:12; size:4; signed:0;
112 field:int __probe_nargs; offset:16; size:4; signed:1; 125 field:int __probe_nargs; offset:16; size:4; signed:1;
diff --git a/Documentation/trace/postprocess/trace-vmscan-postprocess.pl b/Documentation/trace/postprocess/trace-vmscan-postprocess.pl
index 1b55146d1c8d..12cecc83cd91 100644
--- a/Documentation/trace/postprocess/trace-vmscan-postprocess.pl
+++ b/Documentation/trace/postprocess/trace-vmscan-postprocess.pl
@@ -46,7 +46,7 @@ use constant HIGH_KSWAPD_LATENCY => 20;
46use constant HIGH_KSWAPD_REWAKEUP => 21; 46use constant HIGH_KSWAPD_REWAKEUP => 21;
47use constant HIGH_NR_SCANNED => 22; 47use constant HIGH_NR_SCANNED => 22;
48use constant HIGH_NR_TAKEN => 23; 48use constant HIGH_NR_TAKEN => 23;
49use constant HIGH_NR_RECLAIM => 24; 49use constant HIGH_NR_RECLAIMED => 24;
50use constant HIGH_NR_CONTIG_DIRTY => 25; 50use constant HIGH_NR_CONTIG_DIRTY => 25;
51 51
52my %perprocesspid; 52my %perprocesspid;
@@ -58,11 +58,13 @@ my $opt_read_procstat;
58my $total_wakeup_kswapd; 58my $total_wakeup_kswapd;
59my ($total_direct_reclaim, $total_direct_nr_scanned); 59my ($total_direct_reclaim, $total_direct_nr_scanned);
60my ($total_direct_latency, $total_kswapd_latency); 60my ($total_direct_latency, $total_kswapd_latency);
61my ($total_direct_nr_reclaimed);
61my ($total_direct_writepage_file_sync, $total_direct_writepage_file_async); 62my ($total_direct_writepage_file_sync, $total_direct_writepage_file_async);
62my ($total_direct_writepage_anon_sync, $total_direct_writepage_anon_async); 63my ($total_direct_writepage_anon_sync, $total_direct_writepage_anon_async);
63my ($total_kswapd_nr_scanned, $total_kswapd_wake); 64my ($total_kswapd_nr_scanned, $total_kswapd_wake);
64my ($total_kswapd_writepage_file_sync, $total_kswapd_writepage_file_async); 65my ($total_kswapd_writepage_file_sync, $total_kswapd_writepage_file_async);
65my ($total_kswapd_writepage_anon_sync, $total_kswapd_writepage_anon_async); 66my ($total_kswapd_writepage_anon_sync, $total_kswapd_writepage_anon_async);
67my ($total_kswapd_nr_reclaimed);
66 68
67# Catch sigint and exit on request 69# Catch sigint and exit on request
68my $sigint_report = 0; 70my $sigint_report = 0;
@@ -104,7 +106,7 @@ my $regex_kswapd_wake_default = 'nid=([0-9]*) order=([0-9]*)';
104my $regex_kswapd_sleep_default = 'nid=([0-9]*)'; 106my $regex_kswapd_sleep_default = 'nid=([0-9]*)';
105my $regex_wakeup_kswapd_default = 'nid=([0-9]*) zid=([0-9]*) order=([0-9]*)'; 107my $regex_wakeup_kswapd_default = 'nid=([0-9]*) zid=([0-9]*) order=([0-9]*)';
106my $regex_lru_isolate_default = 'isolate_mode=([0-9]*) order=([0-9]*) nr_requested=([0-9]*) nr_scanned=([0-9]*) nr_taken=([0-9]*) contig_taken=([0-9]*) contig_dirty=([0-9]*) contig_failed=([0-9]*)'; 108my $regex_lru_isolate_default = 'isolate_mode=([0-9]*) order=([0-9]*) nr_requested=([0-9]*) nr_scanned=([0-9]*) nr_taken=([0-9]*) contig_taken=([0-9]*) contig_dirty=([0-9]*) contig_failed=([0-9]*)';
107my $regex_lru_shrink_inactive_default = 'lru=([A-Z_]*) nr_scanned=([0-9]*) nr_reclaimed=([0-9]*) priority=([0-9]*)'; 109my $regex_lru_shrink_inactive_default = 'nid=([0-9]*) zid=([0-9]*) nr_scanned=([0-9]*) nr_reclaimed=([0-9]*) priority=([0-9]*) flags=([A-Z_|]*)';
108my $regex_lru_shrink_active_default = 'lru=([A-Z_]*) nr_scanned=([0-9]*) nr_rotated=([0-9]*) priority=([0-9]*)'; 110my $regex_lru_shrink_active_default = 'lru=([A-Z_]*) nr_scanned=([0-9]*) nr_rotated=([0-9]*) priority=([0-9]*)';
109my $regex_writepage_default = 'page=([0-9a-f]*) pfn=([0-9]*) flags=([A-Z_|]*)'; 111my $regex_writepage_default = 'page=([0-9a-f]*) pfn=([0-9]*) flags=([A-Z_|]*)';
110 112
@@ -203,8 +205,8 @@ $regex_lru_shrink_inactive = generate_traceevent_regex(
203 "vmscan/mm_vmscan_lru_shrink_inactive", 205 "vmscan/mm_vmscan_lru_shrink_inactive",
204 $regex_lru_shrink_inactive_default, 206 $regex_lru_shrink_inactive_default,
205 "nid", "zid", 207 "nid", "zid",
206 "lru", 208 "nr_scanned", "nr_reclaimed", "priority",
207 "nr_scanned", "nr_reclaimed", "priority"); 209 "flags");
208$regex_lru_shrink_active = generate_traceevent_regex( 210$regex_lru_shrink_active = generate_traceevent_regex(
209 "vmscan/mm_vmscan_lru_shrink_active", 211 "vmscan/mm_vmscan_lru_shrink_active",
210 $regex_lru_shrink_active_default, 212 $regex_lru_shrink_active_default,
@@ -371,10 +373,29 @@ EVENT_PROCESS:
371 print " $regex_lru_isolate/o\n"; 373 print " $regex_lru_isolate/o\n";
372 next; 374 next;
373 } 375 }
376 my $isolate_mode = $1;
374 my $nr_scanned = $4; 377 my $nr_scanned = $4;
375 my $nr_contig_dirty = $7; 378 my $nr_contig_dirty = $7;
376 $perprocesspid{$process_pid}->{HIGH_NR_SCANNED} += $nr_scanned; 379
380 # To closer match vmstat scanning statistics, only count isolate_both
381 # and isolate_inactive as scanning. isolate_active is rotation
382 # isolate_inactive == 0
383 # isolate_active == 1
384 # isolate_both == 2
385 if ($isolate_mode != 1) {
386 $perprocesspid{$process_pid}->{HIGH_NR_SCANNED} += $nr_scanned;
387 }
377 $perprocesspid{$process_pid}->{HIGH_NR_CONTIG_DIRTY} += $nr_contig_dirty; 388 $perprocesspid{$process_pid}->{HIGH_NR_CONTIG_DIRTY} += $nr_contig_dirty;
389 } elsif ($tracepoint eq "mm_vmscan_lru_shrink_inactive") {
390 $details = $5;
391 if ($details !~ /$regex_lru_shrink_inactive/o) {
392 print "WARNING: Failed to parse mm_vmscan_lru_shrink_inactive as expected\n";
393 print " $details\n";
394 print " $regex_lru_shrink_inactive/o\n";
395 next;
396 }
397 my $nr_reclaimed = $4;
398 $perprocesspid{$process_pid}->{HIGH_NR_RECLAIMED} += $nr_reclaimed;
378 } elsif ($tracepoint eq "mm_vmscan_writepage") { 399 } elsif ($tracepoint eq "mm_vmscan_writepage") {
379 $details = $5; 400 $details = $5;
380 if ($details !~ /$regex_writepage/o) { 401 if ($details !~ /$regex_writepage/o) {
@@ -464,8 +485,8 @@ sub dump_stats {
464 485
465 # Print out process activity 486 # Print out process activity
466 printf("\n"); 487 printf("\n");
467 printf("%-" . $max_strlen . "s %8s %10s %8s %8s %8s %8s %8s\n", "Process", "Direct", "Wokeup", "Pages", "Pages", "Pages", "Time"); 488 printf("%-" . $max_strlen . "s %8s %10s %8s %8s %8s %8s %8s %8s\n", "Process", "Direct", "Wokeup", "Pages", "Pages", "Pages", "Pages", "Time");
468 printf("%-" . $max_strlen . "s %8s %10s %8s %8s %8s %8s %8s\n", "details", "Rclms", "Kswapd", "Scanned", "Sync-IO", "ASync-IO", "Stalled"); 489 printf("%-" . $max_strlen . "s %8s %10s %8s %8s %8s %8s %8s %8s\n", "details", "Rclms", "Kswapd", "Scanned", "Rclmed", "Sync-IO", "ASync-IO", "Stalled");
469 foreach $process_pid (keys %stats) { 490 foreach $process_pid (keys %stats) {
470 491
471 if (!$stats{$process_pid}->{MM_VMSCAN_DIRECT_RECLAIM_BEGIN}) { 492 if (!$stats{$process_pid}->{MM_VMSCAN_DIRECT_RECLAIM_BEGIN}) {
@@ -475,6 +496,7 @@ sub dump_stats {
475 $total_direct_reclaim += $stats{$process_pid}->{MM_VMSCAN_DIRECT_RECLAIM_BEGIN}; 496 $total_direct_reclaim += $stats{$process_pid}->{MM_VMSCAN_DIRECT_RECLAIM_BEGIN};
476 $total_wakeup_kswapd += $stats{$process_pid}->{MM_VMSCAN_WAKEUP_KSWAPD}; 497 $total_wakeup_kswapd += $stats{$process_pid}->{MM_VMSCAN_WAKEUP_KSWAPD};
477 $total_direct_nr_scanned += $stats{$process_pid}->{HIGH_NR_SCANNED}; 498 $total_direct_nr_scanned += $stats{$process_pid}->{HIGH_NR_SCANNED};
499 $total_direct_nr_reclaimed += $stats{$process_pid}->{HIGH_NR_RECLAIMED};
478 $total_direct_writepage_file_sync += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC}; 500 $total_direct_writepage_file_sync += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC};
479 $total_direct_writepage_anon_sync += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC}; 501 $total_direct_writepage_anon_sync += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC};
480 $total_direct_writepage_file_async += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC}; 502 $total_direct_writepage_file_async += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC};
@@ -489,11 +511,12 @@ sub dump_stats {
489 $index++; 511 $index++;
490 } 512 }
491 513
492 printf("%-" . $max_strlen . "s %8d %10d %8u %8u %8u %8.3f", 514 printf("%-" . $max_strlen . "s %8d %10d %8u %8u %8u %8u %8.3f",
493 $process_pid, 515 $process_pid,
494 $stats{$process_pid}->{MM_VMSCAN_DIRECT_RECLAIM_BEGIN}, 516 $stats{$process_pid}->{MM_VMSCAN_DIRECT_RECLAIM_BEGIN},
495 $stats{$process_pid}->{MM_VMSCAN_WAKEUP_KSWAPD}, 517 $stats{$process_pid}->{MM_VMSCAN_WAKEUP_KSWAPD},
496 $stats{$process_pid}->{HIGH_NR_SCANNED}, 518 $stats{$process_pid}->{HIGH_NR_SCANNED},
519 $stats{$process_pid}->{HIGH_NR_RECLAIMED},
497 $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC} + $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC}, 520 $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC} + $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC},
498 $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC} + $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_ASYNC}, 521 $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC} + $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_ASYNC},
499 $this_reclaim_delay / 1000); 522 $this_reclaim_delay / 1000);
@@ -529,8 +552,8 @@ sub dump_stats {
529 552
530 # Print out kswapd activity 553 # Print out kswapd activity
531 printf("\n"); 554 printf("\n");
532 printf("%-" . $max_strlen . "s %8s %10s %8s %8s %8s %8s\n", "Kswapd", "Kswapd", "Order", "Pages", "Pages", "Pages"); 555 printf("%-" . $max_strlen . "s %8s %10s %8s %8s %8s %8s\n", "Kswapd", "Kswapd", "Order", "Pages", "Pages", "Pages", "Pages");
533 printf("%-" . $max_strlen . "s %8s %10s %8s %8s %8s %8s\n", "Instance", "Wakeups", "Re-wakeup", "Scanned", "Sync-IO", "ASync-IO"); 556 printf("%-" . $max_strlen . "s %8s %10s %8s %8s %8s %8s\n", "Instance", "Wakeups", "Re-wakeup", "Scanned", "Rclmed", "Sync-IO", "ASync-IO");
534 foreach $process_pid (keys %stats) { 557 foreach $process_pid (keys %stats) {
535 558
536 if (!$stats{$process_pid}->{MM_VMSCAN_KSWAPD_WAKE}) { 559 if (!$stats{$process_pid}->{MM_VMSCAN_KSWAPD_WAKE}) {
@@ -539,16 +562,18 @@ sub dump_stats {
539 562
540 $total_kswapd_wake += $stats{$process_pid}->{MM_VMSCAN_KSWAPD_WAKE}; 563 $total_kswapd_wake += $stats{$process_pid}->{MM_VMSCAN_KSWAPD_WAKE};
541 $total_kswapd_nr_scanned += $stats{$process_pid}->{HIGH_NR_SCANNED}; 564 $total_kswapd_nr_scanned += $stats{$process_pid}->{HIGH_NR_SCANNED};
565 $total_kswapd_nr_reclaimed += $stats{$process_pid}->{HIGH_NR_RECLAIMED};
542 $total_kswapd_writepage_file_sync += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC}; 566 $total_kswapd_writepage_file_sync += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC};
543 $total_kswapd_writepage_anon_sync += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC}; 567 $total_kswapd_writepage_anon_sync += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC};
544 $total_kswapd_writepage_file_async += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC}; 568 $total_kswapd_writepage_file_async += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC};
545 $total_kswapd_writepage_anon_async += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_ASYNC}; 569 $total_kswapd_writepage_anon_async += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_ASYNC};
546 570
547 printf("%-" . $max_strlen . "s %8d %10d %8u %8i %8u", 571 printf("%-" . $max_strlen . "s %8d %10d %8u %8u %8i %8u",
548 $process_pid, 572 $process_pid,
549 $stats{$process_pid}->{MM_VMSCAN_KSWAPD_WAKE}, 573 $stats{$process_pid}->{MM_VMSCAN_KSWAPD_WAKE},
550 $stats{$process_pid}->{HIGH_KSWAPD_REWAKEUP}, 574 $stats{$process_pid}->{HIGH_KSWAPD_REWAKEUP},
551 $stats{$process_pid}->{HIGH_NR_SCANNED}, 575 $stats{$process_pid}->{HIGH_NR_SCANNED},
576 $stats{$process_pid}->{HIGH_NR_RECLAIMED},
552 $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC} + $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC}, 577 $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC} + $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC},
553 $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC} + $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_ASYNC}); 578 $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC} + $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_ASYNC});
554 579
@@ -579,6 +604,7 @@ sub dump_stats {
579 print "\nSummary\n"; 604 print "\nSummary\n";
580 print "Direct reclaims: $total_direct_reclaim\n"; 605 print "Direct reclaims: $total_direct_reclaim\n";
581 print "Direct reclaim pages scanned: $total_direct_nr_scanned\n"; 606 print "Direct reclaim pages scanned: $total_direct_nr_scanned\n";
607 print "Direct reclaim pages reclaimed: $total_direct_nr_reclaimed\n";
582 print "Direct reclaim write file sync I/O: $total_direct_writepage_file_sync\n"; 608 print "Direct reclaim write file sync I/O: $total_direct_writepage_file_sync\n";
583 print "Direct reclaim write anon sync I/O: $total_direct_writepage_anon_sync\n"; 609 print "Direct reclaim write anon sync I/O: $total_direct_writepage_anon_sync\n";
584 print "Direct reclaim write file async I/O: $total_direct_writepage_file_async\n"; 610 print "Direct reclaim write file async I/O: $total_direct_writepage_file_async\n";
@@ -588,6 +614,7 @@ sub dump_stats {
588 print "\n"; 614 print "\n";
589 print "Kswapd wakeups: $total_kswapd_wake\n"; 615 print "Kswapd wakeups: $total_kswapd_wake\n";
590 print "Kswapd pages scanned: $total_kswapd_nr_scanned\n"; 616 print "Kswapd pages scanned: $total_kswapd_nr_scanned\n";
617 print "Kswapd pages reclaimed: $total_kswapd_nr_reclaimed\n";
591 print "Kswapd reclaim write file sync I/O: $total_kswapd_writepage_file_sync\n"; 618 print "Kswapd reclaim write file sync I/O: $total_kswapd_writepage_file_sync\n";
592 print "Kswapd reclaim write anon sync I/O: $total_kswapd_writepage_anon_sync\n"; 619 print "Kswapd reclaim write anon sync I/O: $total_kswapd_writepage_anon_sync\n";
593 print "Kswapd reclaim write file async I/O: $total_kswapd_writepage_file_async\n"; 620 print "Kswapd reclaim write file async I/O: $total_kswapd_writepage_file_async\n";
@@ -612,6 +639,7 @@ sub aggregate_perprocesspid() {
612 $perprocess{$process}->{MM_VMSCAN_WAKEUP_KSWAPD} += $perprocesspid{$process_pid}->{MM_VMSCAN_WAKEUP_KSWAPD}; 639 $perprocess{$process}->{MM_VMSCAN_WAKEUP_KSWAPD} += $perprocesspid{$process_pid}->{MM_VMSCAN_WAKEUP_KSWAPD};
613 $perprocess{$process}->{HIGH_KSWAPD_REWAKEUP} += $perprocesspid{$process_pid}->{HIGH_KSWAPD_REWAKEUP}; 640 $perprocess{$process}->{HIGH_KSWAPD_REWAKEUP} += $perprocesspid{$process_pid}->{HIGH_KSWAPD_REWAKEUP};
614 $perprocess{$process}->{HIGH_NR_SCANNED} += $perprocesspid{$process_pid}->{HIGH_NR_SCANNED}; 641 $perprocess{$process}->{HIGH_NR_SCANNED} += $perprocesspid{$process_pid}->{HIGH_NR_SCANNED};
642 $perprocess{$process}->{HIGH_NR_RECLAIMED} += $perprocesspid{$process_pid}->{HIGH_NR_RECLAIMED};
615 $perprocess{$process}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC} += $perprocesspid{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC}; 643 $perprocess{$process}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC} += $perprocesspid{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC};
616 $perprocess{$process}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC} += $perprocesspid{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC}; 644 $perprocess{$process}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC} += $perprocesspid{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC};
617 $perprocess{$process}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC} += $perprocesspid{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC}; 645 $perprocess{$process}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC} += $perprocesspid{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC};
diff --git a/Documentation/trace/ring-buffer-design.txt b/Documentation/trace/ring-buffer-design.txt
index d299ff31df57..7d350b496585 100644
--- a/Documentation/trace/ring-buffer-design.txt
+++ b/Documentation/trace/ring-buffer-design.txt
@@ -237,7 +237,7 @@ with the previous write.
237 |written | 237 |written |
238 +---------+ 238 +---------+
239 |written | 239 |written |
240 +---------+ <--- next positon for write (current commit) 240 +---------+ <--- next position for write (current commit)
241 | empty | 241 | empty |
242 +---------+ 242 +---------+
243 243