diff options
Diffstat (limited to 'tools/perf/Documentation')
-rw-r--r-- | tools/perf/Documentation/Makefile | 2 | ||||
-rw-r--r-- | tools/perf/Documentation/examples.txt | 225 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-record.txt | 60 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-stat.txt | 2 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-top.txt | 112 |
5 files changed, 383 insertions, 18 deletions
diff --git a/tools/perf/Documentation/Makefile b/tools/perf/Documentation/Makefile index 5457192e1b41..bdd3b7ecad0a 100644 --- a/tools/perf/Documentation/Makefile +++ b/tools/perf/Documentation/Makefile | |||
@@ -35,7 +35,7 @@ man7dir=$(mandir)/man7 | |||
35 | # DESTDIR= | 35 | # DESTDIR= |
36 | 36 | ||
37 | ASCIIDOC=asciidoc | 37 | ASCIIDOC=asciidoc |
38 | ASCIIDOC_EXTRA = | 38 | ASCIIDOC_EXTRA = --unsafe |
39 | MANPAGE_XSL = manpage-normal.xsl | 39 | MANPAGE_XSL = manpage-normal.xsl |
40 | XMLTO_EXTRA = | 40 | XMLTO_EXTRA = |
41 | INSTALL?=install | 41 | INSTALL?=install |
diff --git a/tools/perf/Documentation/examples.txt b/tools/perf/Documentation/examples.txt new file mode 100644 index 000000000000..8eb6c489fb15 --- /dev/null +++ b/tools/perf/Documentation/examples.txt | |||
@@ -0,0 +1,225 @@ | |||
1 | |||
2 | ------------------------------ | ||
3 | ****** perf by examples ****** | ||
4 | ------------------------------ | ||
5 | |||
6 | [ From an e-mail by Ingo Molnar, http://lkml.org/lkml/2009/8/4/346 ] | ||
7 | |||
8 | |||
9 | First, discovery/enumeration of available counters can be done via | ||
10 | 'perf list': | ||
11 | |||
12 | titan:~> perf list | ||
13 | [...] | ||
14 | kmem:kmalloc [Tracepoint event] | ||
15 | kmem:kmem_cache_alloc [Tracepoint event] | ||
16 | kmem:kmalloc_node [Tracepoint event] | ||
17 | kmem:kmem_cache_alloc_node [Tracepoint event] | ||
18 | kmem:kfree [Tracepoint event] | ||
19 | kmem:kmem_cache_free [Tracepoint event] | ||
20 | kmem:mm_page_free_direct [Tracepoint event] | ||
21 | kmem:mm_pagevec_free [Tracepoint event] | ||
22 | kmem:mm_page_alloc [Tracepoint event] | ||
23 | kmem:mm_page_alloc_zone_locked [Tracepoint event] | ||
24 | kmem:mm_page_pcpu_drain [Tracepoint event] | ||
25 | kmem:mm_page_alloc_extfrag [Tracepoint event] | ||
26 | |||
27 | Then any (or all) of the above event sources can be activated and | ||
28 | measured. For example the page alloc/free properties of a 'hackbench | ||
29 | run' are: | ||
30 | |||
31 | titan:~> perf stat -e kmem:mm_page_pcpu_drain -e kmem:mm_page_alloc | ||
32 | -e kmem:mm_pagevec_free -e kmem:mm_page_free_direct ./hackbench 10 | ||
33 | Time: 0.575 | ||
34 | |||
35 | Performance counter stats for './hackbench 10': | ||
36 | |||
37 | 13857 kmem:mm_page_pcpu_drain | ||
38 | 27576 kmem:mm_page_alloc | ||
39 | 6025 kmem:mm_pagevec_free | ||
40 | 20934 kmem:mm_page_free_direct | ||
41 | |||
42 | 0.613972165 seconds time elapsed | ||
43 | |||
44 | You can observe the statistical properties as well, by using the | ||
45 | 'repeat the workload N times' feature of perf stat: | ||
46 | |||
47 | titan:~> perf stat --repeat 5 -e kmem:mm_page_pcpu_drain -e | ||
48 | kmem:mm_page_alloc -e kmem:mm_pagevec_free -e | ||
49 | kmem:mm_page_free_direct ./hackbench 10 | ||
50 | Time: 0.627 | ||
51 | Time: 0.644 | ||
52 | Time: 0.564 | ||
53 | Time: 0.559 | ||
54 | Time: 0.626 | ||
55 | |||
56 | Performance counter stats for './hackbench 10' (5 runs): | ||
57 | |||
58 | 12920 kmem:mm_page_pcpu_drain ( +- 3.359% ) | ||
59 | 25035 kmem:mm_page_alloc ( +- 3.783% ) | ||
60 | 6104 kmem:mm_pagevec_free ( +- 0.934% ) | ||
61 | 18376 kmem:mm_page_free_direct ( +- 4.941% ) | ||
62 | |||
63 | 0.643954516 seconds time elapsed ( +- 2.363% ) | ||
64 | |||
65 | Furthermore, these tracepoints can be used to sample the workload as | ||
66 | well. For example the page allocations done by a 'git gc' can be | ||
67 | captured the following way: | ||
68 | |||
69 | titan:~/git> perf record -f -e kmem:mm_page_alloc -c 1 ./git gc | ||
70 | Counting objects: 1148, done. | ||
71 | Delta compression using up to 2 threads. | ||
72 | Compressing objects: 100% (450/450), done. | ||
73 | Writing objects: 100% (1148/1148), done. | ||
74 | Total 1148 (delta 690), reused 1148 (delta 690) | ||
75 | [ perf record: Captured and wrote 0.267 MB perf.data (~11679 samples) ] | ||
76 | |||
77 | To check which functions generated page allocations: | ||
78 | |||
79 | titan:~/git> perf report | ||
80 | # Samples: 10646 | ||
81 | # | ||
82 | # Overhead Command Shared Object | ||
83 | # ........ ............... .......................... | ||
84 | # | ||
85 | 23.57% git-repack /lib64/libc-2.5.so | ||
86 | 21.81% git /lib64/libc-2.5.so | ||
87 | 14.59% git ./git | ||
88 | 11.79% git-repack ./git | ||
89 | 7.12% git /lib64/ld-2.5.so | ||
90 | 3.16% git-repack /lib64/libpthread-2.5.so | ||
91 | 2.09% git-repack /bin/bash | ||
92 | 1.97% rm /lib64/libc-2.5.so | ||
93 | 1.39% mv /lib64/ld-2.5.so | ||
94 | 1.37% mv /lib64/libc-2.5.so | ||
95 | 1.12% git-repack /lib64/ld-2.5.so | ||
96 | 0.95% rm /lib64/ld-2.5.so | ||
97 | 0.90% git-update-serv /lib64/libc-2.5.so | ||
98 | 0.73% git-update-serv /lib64/ld-2.5.so | ||
99 | 0.68% perf /lib64/libpthread-2.5.so | ||
100 | 0.64% git-repack /usr/lib64/libz.so.1.2.3 | ||
101 | |||
102 | Or to see it on a more finegrained level: | ||
103 | |||
104 | titan:~/git> perf report --sort comm,dso,symbol | ||
105 | # Samples: 10646 | ||
106 | # | ||
107 | # Overhead Command Shared Object Symbol | ||
108 | # ........ ............... .......................... ...... | ||
109 | # | ||
110 | 9.35% git-repack ./git [.] insert_obj_hash | ||
111 | 9.12% git ./git [.] insert_obj_hash | ||
112 | 7.31% git /lib64/libc-2.5.so [.] memcpy | ||
113 | 6.34% git-repack /lib64/libc-2.5.so [.] _int_malloc | ||
114 | 6.24% git-repack /lib64/libc-2.5.so [.] memcpy | ||
115 | 5.82% git-repack /lib64/libc-2.5.so [.] __GI___fork | ||
116 | 5.47% git /lib64/libc-2.5.so [.] _int_malloc | ||
117 | 2.99% git /lib64/libc-2.5.so [.] memset | ||
118 | |||
119 | Furthermore, call-graph sampling can be done too, of page | ||
120 | allocations - to see precisely what kind of page allocations there | ||
121 | are: | ||
122 | |||
123 | titan:~/git> perf record -f -g -e kmem:mm_page_alloc -c 1 ./git gc | ||
124 | Counting objects: 1148, done. | ||
125 | Delta compression using up to 2 threads. | ||
126 | Compressing objects: 100% (450/450), done. | ||
127 | Writing objects: 100% (1148/1148), done. | ||
128 | Total 1148 (delta 690), reused 1148 (delta 690) | ||
129 | [ perf record: Captured and wrote 0.963 MB perf.data (~42069 samples) ] | ||
130 | |||
131 | titan:~/git> perf report -g | ||
132 | # Samples: 10686 | ||
133 | # | ||
134 | # Overhead Command Shared Object | ||
135 | # ........ ............... .......................... | ||
136 | # | ||
137 | 23.25% git-repack /lib64/libc-2.5.so | ||
138 | | | ||
139 | |--50.00%-- _int_free | ||
140 | | | ||
141 | |--37.50%-- __GI___fork | ||
142 | | make_child | ||
143 | | | ||
144 | |--12.50%-- ptmalloc_unlock_all2 | ||
145 | | make_child | ||
146 | | | ||
147 | --6.25%-- __GI_strcpy | ||
148 | 21.61% git /lib64/libc-2.5.so | ||
149 | | | ||
150 | |--30.00%-- __GI_read | ||
151 | | | | ||
152 | | --83.33%-- git_config_from_file | ||
153 | | git_config | ||
154 | | | | ||
155 | [...] | ||
156 | |||
157 | Or you can observe the whole system's page allocations for 10 | ||
158 | seconds: | ||
159 | |||
160 | titan:~/git> perf stat -a -e kmem:mm_page_pcpu_drain -e | ||
161 | kmem:mm_page_alloc -e kmem:mm_pagevec_free -e | ||
162 | kmem:mm_page_free_direct sleep 10 | ||
163 | |||
164 | Performance counter stats for 'sleep 10': | ||
165 | |||
166 | 171585 kmem:mm_page_pcpu_drain | ||
167 | 322114 kmem:mm_page_alloc | ||
168 | 73623 kmem:mm_pagevec_free | ||
169 | 254115 kmem:mm_page_free_direct | ||
170 | |||
171 | 10.000591410 seconds time elapsed | ||
172 | |||
173 | Or observe how fluctuating the page allocations are, via statistical | ||
174 | analysis done over ten 1-second intervals: | ||
175 | |||
176 | titan:~/git> perf stat --repeat 10 -a -e kmem:mm_page_pcpu_drain -e | ||
177 | kmem:mm_page_alloc -e kmem:mm_pagevec_free -e | ||
178 | kmem:mm_page_free_direct sleep 1 | ||
179 | |||
180 | Performance counter stats for 'sleep 1' (10 runs): | ||
181 | |||
182 | 17254 kmem:mm_page_pcpu_drain ( +- 3.709% ) | ||
183 | 34394 kmem:mm_page_alloc ( +- 4.617% ) | ||
184 | 7509 kmem:mm_pagevec_free ( +- 4.820% ) | ||
185 | 25653 kmem:mm_page_free_direct ( +- 3.672% ) | ||
186 | |||
187 | 1.058135029 seconds time elapsed ( +- 3.089% ) | ||
188 | |||
189 | Or you can annotate the recorded 'git gc' run on a per symbol basis | ||
190 | and check which instructions/source-code generated page allocations: | ||
191 | |||
192 | titan:~/git> perf annotate __GI___fork | ||
193 | ------------------------------------------------ | ||
194 | Percent | Source code & Disassembly of libc-2.5.so | ||
195 | ------------------------------------------------ | ||
196 | : | ||
197 | : | ||
198 | : Disassembly of section .plt: | ||
199 | : Disassembly of section .text: | ||
200 | : | ||
201 | : 00000031a2e95560 <__fork>: | ||
202 | [...] | ||
203 | 0.00 : 31a2e95602: b8 38 00 00 00 mov $0x38,%eax | ||
204 | 0.00 : 31a2e95607: 0f 05 syscall | ||
205 | 83.42 : 31a2e95609: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax | ||
206 | 0.00 : 31a2e9560f: 0f 87 4d 01 00 00 ja 31a2e95762 <__fork+0x202> | ||
207 | 0.00 : 31a2e95615: 85 c0 test %eax,%eax | ||
208 | |||
209 | ( this shows that 83.42% of __GI___fork's page allocations come from | ||
210 | the 0x38 system call it performs. ) | ||
211 | |||
212 | etc. etc. - a lot more is possible. I could list a dozen of | ||
213 | other different usecases straight away - neither of which is | ||
214 | possible via /proc/vmstat. | ||
215 | |||
216 | /proc/vmstat is not in the same league really, in terms of | ||
217 | expressive power of system analysis and performance | ||
218 | analysis. | ||
219 | |||
220 | All that the above results needed were those new tracepoints | ||
221 | in include/tracing/events/kmem.h. | ||
222 | |||
223 | Ingo | ||
224 | |||
225 | |||
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index 1dbc1eeb4c01..6be696b0a2bb 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt | |||
@@ -29,13 +29,67 @@ OPTIONS | |||
29 | Select the PMU event. Selection can be a symbolic event name | 29 | Select the PMU event. Selection can be a symbolic event name |
30 | (use 'perf list' to list all events) or a raw PMU | 30 | (use 'perf list' to list all events) or a raw PMU |
31 | event (eventsel+umask) in the form of rNNN where NNN is a | 31 | event (eventsel+umask) in the form of rNNN where NNN is a |
32 | hexadecimal event descriptor. | 32 | hexadecimal event descriptor. |
33 | 33 | ||
34 | -a:: | 34 | -a:: |
35 | system-wide collection | 35 | System-wide collection. |
36 | 36 | ||
37 | -l:: | 37 | -l:: |
38 | scale counter values | 38 | Scale counter values. |
39 | |||
40 | -p:: | ||
41 | --pid=:: | ||
42 | Record events on existing pid. | ||
43 | |||
44 | -r:: | ||
45 | --realtime=:: | ||
46 | Collect data with this RT SCHED_FIFO priority. | ||
47 | -A:: | ||
48 | --append:: | ||
49 | Append to the output file to do incremental profiling. | ||
50 | |||
51 | -f:: | ||
52 | --force:: | ||
53 | Overwrite existing data file. | ||
54 | |||
55 | -c:: | ||
56 | --count=:: | ||
57 | Event period to sample. | ||
58 | |||
59 | -o:: | ||
60 | --output=:: | ||
61 | Output file name. | ||
62 | |||
63 | -i:: | ||
64 | --inherit:: | ||
65 | Child tasks inherit counters. | ||
66 | -F:: | ||
67 | --freq=:: | ||
68 | Profile at this frequency. | ||
69 | |||
70 | -m:: | ||
71 | --mmap-pages=:: | ||
72 | Number of mmap data pages. | ||
73 | |||
74 | -g:: | ||
75 | --call-graph:: | ||
76 | Do call-graph (stack chain/backtrace) recording. | ||
77 | |||
78 | -v:: | ||
79 | --verbose:: | ||
80 | Be more verbose (show counter open errors, etc). | ||
81 | |||
82 | -s:: | ||
83 | --stat:: | ||
84 | Per thread counts. | ||
85 | |||
86 | -d:: | ||
87 | --data:: | ||
88 | Sample addresses. | ||
89 | |||
90 | -n:: | ||
91 | --no-samples:: | ||
92 | Don't sample. | ||
39 | 93 | ||
40 | SEE ALSO | 94 | SEE ALSO |
41 | -------- | 95 | -------- |
diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt index 0d74346d21ab..484080dd5b6f 100644 --- a/tools/perf/Documentation/perf-stat.txt +++ b/tools/perf/Documentation/perf-stat.txt | |||
@@ -40,7 +40,7 @@ OPTIONS | |||
40 | -a:: | 40 | -a:: |
41 | system-wide collection | 41 | system-wide collection |
42 | 42 | ||
43 | -S:: | 43 | -c:: |
44 | scale counter values | 44 | scale counter values |
45 | 45 | ||
46 | EXAMPLES | 46 | EXAMPLES |
diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt index 539d01289725..4a7d558dc309 100644 --- a/tools/perf/Documentation/perf-top.txt +++ b/tools/perf/Documentation/perf-top.txt | |||
@@ -3,36 +3,122 @@ perf-top(1) | |||
3 | 3 | ||
4 | NAME | 4 | NAME |
5 | ---- | 5 | ---- |
6 | perf-top - Run a command and profile it | 6 | perf-top - System profiling tool. |
7 | 7 | ||
8 | SYNOPSIS | 8 | SYNOPSIS |
9 | -------- | 9 | -------- |
10 | [verse] | 10 | [verse] |
11 | 'perf top' [-e <EVENT> | --event=EVENT] [-l] [-a] <command> | 11 | 'perf top' [-e <EVENT> | --event=EVENT] [<options>] |
12 | 12 | ||
13 | DESCRIPTION | 13 | DESCRIPTION |
14 | ----------- | 14 | ----------- |
15 | This command runs a command and gathers a performance counter profile | 15 | This command generates and displays a performance counter profile in realtime. |
16 | from it. | ||
17 | 16 | ||
18 | 17 | ||
19 | OPTIONS | 18 | OPTIONS |
20 | ------- | 19 | ------- |
21 | <command>...:: | 20 | -a:: |
22 | Any command you can specify in a shell. | 21 | --all-cpus:: |
22 | System-wide collection. (default) | ||
23 | |||
24 | -c <count>:: | ||
25 | --count=<count>:: | ||
26 | Event period to sample. | ||
27 | |||
28 | -C <cpu>:: | ||
29 | --CPU=<cpu>:: | ||
30 | CPU to profile. | ||
31 | |||
32 | -d <seconds>:: | ||
33 | --delay=<seconds>:: | ||
34 | Number of seconds to delay between refreshes. | ||
23 | 35 | ||
24 | -e:: | 36 | -e <event>:: |
25 | --event=:: | 37 | --event=<event>:: |
26 | Select the PMU event. Selection can be a symbolic event name | 38 | Select the PMU event. Selection can be a symbolic event name |
27 | (use 'perf list' to list all events) or a raw PMU | 39 | (use 'perf list' to list all events) or a raw PMU |
28 | event (eventsel+umask) in the form of rNNN where NNN is a | 40 | event (eventsel+umask) in the form of rNNN where NNN is a |
29 | hexadecimal event descriptor. | 41 | hexadecimal event descriptor. |
30 | 42 | ||
31 | -a:: | 43 | -E <entries>:: |
32 | system-wide collection | 44 | --entries=<entries>:: |
45 | Display this many functions. | ||
46 | |||
47 | -f <count>:: | ||
48 | --count-filter=<count>:: | ||
49 | Only display functions with more events than this. | ||
50 | |||
51 | -F <freq>:: | ||
52 | --freq=<freq>:: | ||
53 | Profile at this frequency. | ||
54 | |||
55 | -i:: | ||
56 | --inherit:: | ||
57 | Child tasks inherit counters, only makes sens with -p option. | ||
58 | |||
59 | -k <path>:: | ||
60 | --vmlinux=<path>:: | ||
61 | Path to vmlinux. Required for annotation functionality. | ||
62 | |||
63 | -m <pages>:: | ||
64 | --mmap-pages=<pages>:: | ||
65 | Number of mmapped data pages. | ||
66 | |||
67 | -p <pid>:: | ||
68 | --pid=<pid>:: | ||
69 | Profile events on existing pid. | ||
70 | |||
71 | -r <priority>:: | ||
72 | --realtime=<priority>:: | ||
73 | Collect data with this RT SCHED_FIFO priority. | ||
74 | |||
75 | -s <symbol>:: | ||
76 | --sym-annotate=<symbol>:: | ||
77 | Annotate this symbol. Requires -k option. | ||
78 | |||
79 | -v:: | ||
80 | --verbose:: | ||
81 | Be more verbose (show counter open errors, etc). | ||
82 | |||
83 | -z:: | ||
84 | --zero:: | ||
85 | Zero history across display updates. | ||
86 | |||
87 | INTERACTIVE PROMPTING KEYS | ||
88 | -------------------------- | ||
89 | |||
90 | [d]:: | ||
91 | Display refresh delay. | ||
92 | |||
93 | [e]:: | ||
94 | Number of entries to display. | ||
95 | |||
96 | [E]:: | ||
97 | Event to display when multiple counters are active. | ||
98 | |||
99 | [f]:: | ||
100 | Profile display filter (>= hit count). | ||
101 | |||
102 | [F]:: | ||
103 | Annotation display filter (>= % of total). | ||
104 | |||
105 | [s]:: | ||
106 | Annotate symbol. | ||
107 | |||
108 | [S]:: | ||
109 | Stop annotation, return to full profile display. | ||
110 | |||
111 | [w]:: | ||
112 | Toggle between weighted sum and individual count[E]r profile. | ||
113 | |||
114 | [z]:: | ||
115 | Toggle event count zeroing across display updates. | ||
116 | |||
117 | [qQ]:: | ||
118 | Quit. | ||
119 | |||
120 | Pressing any unmapped key displays a menu, and prompts for input. | ||
33 | 121 | ||
34 | -l:: | ||
35 | scale counter values | ||
36 | 122 | ||
37 | SEE ALSO | 123 | SEE ALSO |
38 | -------- | 124 | -------- |