author | Jiri Olsa <jolsa@kernel.org> | 2016-08-26 04:36:12 -0400 |
---|---|---|
committer | Arnaldo Carvalho de Melo <acme@redhat.com> | 2016-10-21 09:32:01 -0400 |
commit | 465f27a3b2a21cdd1561537c8f2cf293b1d77da4 (patch) | |
tree | bd25e1646a3d051f56029b18f937f6e4144dacc6 | |
parent | 9a406eb610e3676611ce3e32d2e6c55ccc7e5d61 (diff) |
perf c2c: Add man page and credits
Add man page for c2c command and credits to builtin-c2c.c file.
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-twbp391v8v9f5idp584hlfov@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-rw-r--r-- | tools/perf/Documentation/perf-c2c.txt | 276 | ||||
-rw-r--r-- | tools/perf/builtin-c2c.c | 11 |
2 files changed, 287 insertions, 0 deletions
diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
new file mode 100644
index 000000000000..ba2f4de399c3
--- /dev/null
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -0,0 +1,276 @@ | |||
1 | perf-c2c(1) | ||
2 | =========== | ||
3 | |||
4 | NAME | ||
5 | ---- | ||
6 | perf-c2c - Shared Data C2C/HITM Analyzer. | ||
7 | |||
8 | SYNOPSIS | ||
9 | -------- | ||
10 | [verse] | ||
11 | 'perf c2c record' [<options>] <command> | ||
12 | 'perf c2c record' [<options>] -- [<record command options>] <command> | ||
13 | 'perf c2c report' [<options>] | ||
14 | |||
15 | DESCRIPTION | ||
16 | ----------- | ||
17 | C2C stands for Cache To Cache. | ||
18 | |||
19 | The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows | ||
20 | you to track down cacheline contention. | ||
21 | |||
22 | The tool is based on x86's load latency and precise store facility events | ||
23 | provided by Intel CPUs. These events provide: | ||
24 | - memory address of the access | ||
25 | - type of the access (load and store details) | ||
26 | - latency (in cycles) of the load access | ||
27 | |||
28 | The c2c tool provides means to record this data and report back access details | ||
29 | for cachelines with highest contention - highest number of HITM accesses. | ||
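
As an illustration of the kind of contention this tool is meant to expose, here is a minimal sketch of two layouts for a pair of counters written by different threads. The struct names, the helper and the 64-byte cacheline size are assumptions made for this example; none of them come from perf itself.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_SIZE 64ULL /* assumption: 64-byte cachelines */

/* Hypothetical layout prone to contention: both fields share one cacheline. */
struct counters_shared {
	uint64_t a; /* written by one thread */
	uint64_t b; /* written by another thread */
};

/* Hypothetical fix: each field is aligned to its own cacheline. */
struct counters_padded {
	_Alignas(CACHELINE_SIZE) uint64_t a;
	_Alignas(CACHELINE_SIZE) uint64_t b;
};

/*
 * Returns non-zero when two field offsets fall into the same cacheline.
 * This assumes the enclosing object itself starts on a cacheline boundary.
 */
int same_cacheline(size_t off1, size_t off2)
{
	return off1 / CACHELINE_SIZE == off2 / CACHELINE_SIZE;
}
```

With the first layout, stores from two CPUs to 'a' and 'b' would be expected to show up as accesses to two offsets of the same cacheline; with the second, they land in different cachelines.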
30 | |||
31 | The basic workflow with this tool follows the standard record/report phases. | ||
32 | The user runs the record command to record event data and the report command | ||
33 | to display it. | ||
34 | |||
35 | |||
36 | RECORD OPTIONS | ||
37 | -------------- | ||
38 | -e:: | ||
39 | --event=:: | ||
40 | Select the PMU event. Use 'perf mem record -e list' | ||
41 | to list available events. | ||
42 | |||
43 | -v:: | ||
44 | --verbose:: | ||
45 | Be more verbose (show counter open errors, etc). | ||
46 | |||
47 | -l:: | ||
48 | --ldlat:: | ||
49 | Configure mem-loads latency. | ||
50 | |||
51 | -k:: | ||
52 | --all-kernel:: | ||
53 | Configure all used events to run in kernel space. | ||
54 | |||
55 | -u:: | ||
56 | --all-user:: | ||
57 | Configure all used events to run in user space. | ||
58 | |||
59 | REPORT OPTIONS | ||
60 | -------------- | ||
61 | -k:: | ||
62 | --vmlinux=<file>:: | ||
63 | vmlinux pathname | ||
64 | |||
65 | -v:: | ||
66 | --verbose:: | ||
67 | Be more verbose (show counter open errors, etc). | ||
68 | |||
69 | -i:: | ||
70 | --input:: | ||
71 | Specify the input file to process. | ||
72 | |||
73 | -N:: | ||
74 | --node-info:: | ||
75 | Show extra node info in report (see NODE INFO section) | ||
76 | |||
77 | -c:: | ||
78 | --coalesce:: | ||
79 | Specify sorting fields for single cacheline display. | ||
80 | The following fields are available: tid,pid,iaddr,dso | ||
81 | (see COALESCE) | ||
82 | |||
83 | -g:: | ||
84 | --call-graph:: | ||
85 | Set up callchain parameters. | ||
86 | Please refer to the perf-report man page for details. | ||
87 | |||
88 | --stdio:: | ||
89 | Force the stdio output (see STDIO OUTPUT) | ||
90 | |||
91 | --stats:: | ||
92 | Display only statistic tables and force stdio mode. | ||
93 | |||
94 | --full-symbols:: | ||
95 | Display full length of symbols. | ||
96 | |||
97 | C2C RECORD | ||
98 | ---------- | ||
99 | The perf c2c record command sets up options related to HITM cacheline analysis | ||
100 | and calls the standard perf record command. | ||
101 | |||
102 | The following perf record options are configured by default: | ||
103 | (check the perf record man page for details) | ||
104 | |||
105 | -W,-d,--sample-cpu | ||
106 | |||
107 | Unless specified otherwise with the '-e' option, the following events are monitored by | ||
108 | default: | ||
109 | |||
110 | cpu/mem-loads,ldlat=30/P | ||
111 | cpu/mem-stores/P | ||
112 | |||
113 | The user can pass any 'perf record' option after the '--' mark, for example (to enable | ||
114 | callchains and system wide monitoring): | ||
115 | |||
116 | $ perf c2c record -- -g -a | ||
117 | |||
118 | Please check the RECORD OPTIONS section for specific c2c record options. | ||
119 | |||
120 | C2C REPORT | ||
121 | ---------- | ||
122 | The perf c2c report command displays shared data analysis. It comes in two | ||
123 | display modes: stdio and tui (default). | ||
124 | |||
125 | The report command workflow is as follows: | ||
126 | - sort all the data based on the cacheline address | ||
127 | - store access details for each cacheline | ||
128 | - sort all cachelines based on user settings | ||
129 | - display data | ||
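
The first three workflow steps can be sketched as follows. This is only an illustration under stated assumptions: the struct and function names are hypothetical, the cacheline size is assumed to be 64 bytes, and the final ordering uses the HITM count as the sort key. It is not the code used in builtin-c2c.c.

```c
#include <stdint.h>
#include <stdlib.h>

#define CACHELINE_SIZE 64ULL /* assumption: 64-byte cachelines */

struct sample {              /* hypothetical: one recorded memory access */
	uint64_t addr;       /* data address of the access */
	int hitm;            /* non-zero if the load hit a modified line */
};

struct cl_stats {            /* hypothetical: per-cacheline summary */
	uint64_t cl_addr;    /* cacheline address */
	unsigned int records;/* all accesses to this cacheline */
	unsigned int hitm;   /* HITM accesses to this cacheline */
};

/* Cacheline address: the data address with the offset bits cleared. */
uint64_t cl_address(uint64_t addr)
{
	return addr & ~(CACHELINE_SIZE - 1);
}

/* Offset of the access within its cacheline. */
uint64_t cl_offset(uint64_t addr)
{
	return addr & (CACHELINE_SIZE - 1);
}

static int cmp_sample(const void *a, const void *b)
{
	uint64_t x = cl_address(((const struct sample *)a)->addr);
	uint64_t y = cl_address(((const struct sample *)b)->addr);

	return (x > y) - (x < y);
}

static int cmp_hitm_desc(const void *a, const void *b)
{
	unsigned int x = ((const struct cl_stats *)a)->hitm;
	unsigned int y = ((const struct cl_stats *)b)->hitm;

	return (x < y) - (x > y);
}

/*
 * Sorts samples by cacheline address, accumulates per-cacheline counts
 * and orders the cachelines by HITM count, highest first.
 * 'out' must have room for n entries; returns the number of cachelines.
 */
size_t build_report(struct sample *s, size_t n, struct cl_stats *out)
{
	size_t i, ncl = 0;

	qsort(s, n, sizeof(*s), cmp_sample);
	for (i = 0; i < n; i++) {
		uint64_t cl = cl_address(s[i].addr);

		if (ncl == 0 || out[ncl - 1].cl_addr != cl) {
			out[ncl].cl_addr = cl;
			out[ncl].records = 0;
			out[ncl].hitm = 0;
			ncl++;
		}
		out[ncl - 1].records++;
		if (s[i].hitm)
			out[ncl - 1].hitm++;
	}
	qsort(out, ncl, sizeof(*out), cmp_hitm_desc);
	return ncl;
}
```

Sorting the samples first keeps all accesses to one cacheline adjacent, so a single pass is enough to build the per-cacheline summary before the final ordering.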
130 | |||
131 | In general, perf c2c report output consists of 2 basic views: | ||
132 | 1) most expensive cachelines list | ||
133 | 2) offsets details for each cacheline | ||
134 | |||
135 | For each cacheline in the 1) list we display the following data: | ||
136 | (Both stdio and TUI modes follow the same fields output) | ||
137 | |||
138 | Index | ||
139 | - zero based index to identify the cacheline | ||
140 | |||
141 | Cacheline | ||
142 | - cacheline address (hex number) | ||
143 | |||
144 | Total records | ||
145 | - sum of all accesses to the cacheline | ||
146 | |||
147 | Rmt/Lcl Hitm | ||
148 | - cacheline percentage of all Remote/Local HITM accesses | ||
149 | |||
150 | LLC Load Hitm - Total, Lcl, Rmt | ||
151 | - count of Total/Local/Remote load HITMs | ||
152 | |||
153 | Store Reference - Total, L1Hit, L1Miss | ||
154 | Total - all store accesses | ||
155 | L1Hit - store accesses that hit L1 | ||
156 | L1Miss - store accesses that missed L1 | ||
157 | |||
158 | Load Dram | ||
159 | - count of local and remote DRAM accesses | ||
160 | |||
161 | LLC Ld Miss | ||
162 | - count of all accesses that missed LLC | ||
163 | |||
164 | Total Loads | ||
165 | - sum of all load accesses | ||
166 | |||
167 | Core Load Hit - FB, L1, L2 | ||
168 | - count of load hits in FB (Fill Buffer), L1 and L2 cache | ||
169 | |||
170 | LLC Load Hit - Llc, Rmt | ||
171 | - count of LLC and Remote load hits | ||
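
The 'Rmt/Lcl Hitm' percentage can be read as one cacheline's share of all HITM accesses of that kind. A minimal sketch of that reading, with a hypothetical helper name and the zero-total handling chosen for this example only:

```c
/*
 * Hypothetical helper: percentage of all (remote or local) HITM accesses
 * that belong to one cacheline. Returns 0 when no HITM was recorded at all.
 */
double hitm_share(unsigned int cl_hitm, unsigned int total_hitm)
{
	if (total_hitm == 0)
		return 0.0;
	return 100.0 * cl_hitm / total_hitm;
}
```

For example, a cacheline with 25 remote HITMs out of 100 recorded remote HITMs would be reported with a 25% remote share under this reading.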
172 | |||
173 | For each offset in the 2) list we display the following data: | ||
174 | |||
175 | HITM - Rmt, Lcl | ||
176 | - % of Remote/Local HITM accesses for given offset within cacheline | ||
177 | |||
178 | Store Refs - L1 Hit, L1 Miss | ||
179 | - % of store accesses that hit/missed L1 for given offset within cacheline | ||
180 | |||
181 | Data address - Offset | ||
182 | - offset address | ||
183 | |||
184 | Pid | ||
185 | - pid of the process responsible for the accesses | ||
186 | |||
187 | Tid | ||
188 | - tid of the thread responsible for the accesses | ||
189 | |||
190 | Code address | ||
191 | - code address responsible for the accesses | ||
192 | |||
193 | cycles - rmt hitm, lcl hitm, load | ||
194 | - sum of cycles for given accesses - Remote/Local HITM and generic load | ||
195 | |||
196 | cpu cnt | ||
197 | - number of cpus that participated in the access | ||
198 | |||
199 | Symbol | ||
200 | - code symbol related to the 'Code address' value | ||
201 | |||
202 | Shared Object | ||
203 | - shared object name related to the 'Code address' value | ||
204 | |||
205 | Source:Line | ||
206 | - source information related to the 'Code address' value | ||
207 | |||
208 | Node | ||
209 | - nodes participating in the access (see NODE INFO section) | ||
210 | |||
211 | NODE INFO | ||
212 | --------- | ||
213 | The 'Node' field displays nodes that access the given cacheline | ||
214 | offset. Its output comes in 3 flavors: | ||
215 | - node IDs separated by ',' | ||
216 | - node IDs with stats for each ID, in the following format: | ||
217 | Node{cpus %hitms %stores} | ||
218 | - node IDs with a list of affected CPUs, in the following format: | ||
219 | Node{cpu list} | ||
220 | |||
221 | The user can switch between the above flavors with the -N option or | ||
222 | use the 'n' key to switch interactively in TUI mode. | ||
223 | |||
224 | COALESCE | ||
225 | -------- | ||
226 | The user can specify how to sort offsets for a cacheline. | ||
227 | |||
228 | The following fields are available and govern the final | ||
229 | set of output fields for the cacheline offsets output: | ||
230 | |||
231 | tid - coalesced by process TIDs | ||
232 | pid - coalesced by process PIDs | ||
233 | iaddr - coalesced by code address, following fields are displayed: | ||
234 | Code address, Code symbol, Shared Object, Source line | ||
235 | dso - coalesced by shared object | ||
236 | |||
237 | By default the coalescing is set up with 'pid,tid,iaddr'. | ||
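
One way to picture coalescing is as grouping offset entries that agree on every selected field. The sketch below parses a comma separated field list and compares two entries under the resulting key set. All names (the KEY_ flags, struct access, both functions) are hypothetical and chosen for this example; the actual implementation in builtin-c2c.c may differ.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flags, one per coalesce field. */
enum { KEY_PID = 1, KEY_TID = 2, KEY_IADDR = 4, KEY_DSO = 8 };

struct access {              /* hypothetical: one offset entry */
	int pid;
	int tid;
	uint64_t iaddr;      /* code address */
	const char *dso;     /* shared object name */
};

/* Turns e.g. "pid,tid,iaddr" into a flag mask; returns -1 on unknown field. */
int parse_coalesce(const char *spec)
{
	int keys = 0;

	while (*spec) {
		size_t len = strcspn(spec, ",");

		if (len == 3 && !strncmp(spec, "pid", 3))
			keys |= KEY_PID;
		else if (len == 3 && !strncmp(spec, "tid", 3))
			keys |= KEY_TID;
		else if (len == 5 && !strncmp(spec, "iaddr", 5))
			keys |= KEY_IADDR;
		else if (len == 3 && !strncmp(spec, "dso", 3))
			keys |= KEY_DSO;
		else
			return -1;
		spec += len;
		if (*spec == ',')
			spec++;
	}
	return keys;
}

/* Two entries coalesce when they match on every selected field. */
int same_group(const struct access *a, const struct access *b, int keys)
{
	if ((keys & KEY_PID) && a->pid != b->pid)
		return 0;
	if ((keys & KEY_TID) && a->tid != b->tid)
		return 0;
	if ((keys & KEY_IADDR) && a->iaddr != b->iaddr)
		return 0;
	if ((keys & KEY_DSO) && strcmp(a->dso, b->dso) != 0)
		return 0;
	return 1;
}
```

Under this sketch, coalescing by 'pid' alone merges accesses from all threads of a process, while adding 'tid' or 'iaddr' splits them further.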
238 | |||
239 | STDIO OUTPUT | ||
240 | ------------ | ||
241 | The stdio output displays data on standard output. | ||
242 | |||
243 | The following tables are displayed: | ||
244 | Trace Event Information | ||
245 | - overall statistics of memory accesses | ||
246 | |||
247 | Global Shared Cache Line Event Information | ||
248 | - overall statistics on shared cachelines | ||
249 | |||
250 | Shared Data Cache Line Table | ||
251 | - list of most expensive cachelines | ||
252 | |||
253 | Shared Cache Line Distribution Pareto | ||
254 | - list of all accessed offsets for each cacheline | ||
255 | |||
256 | TUI OUTPUT | ||
257 | ---------- | ||
258 | The TUI output provides an interactive interface to navigate | ||
259 | through the cachelines list and to display offset details. | ||
260 | |||
261 | For details please refer to the help window by pressing '?' key. | ||
262 | |||
263 | CREDITS | ||
264 | ------- | ||
265 | Although Don Zickus, Dick Fowles and Joe Mario worked together | ||
266 | to get this implemented, we got lots of early help from Arnaldo | ||
267 | Carvalho de Melo, Stephane Eranian, Jiri Olsa and Andi Kleen. | ||
268 | |||
269 | C2C BLOG | ||
270 | -------- | ||
271 | Check Joe's blog on the c2c tool for a detailed use case explanation: | ||
272 | https://joemario.github.io/blog/2016/09/01/c2c-blog/ | ||
273 | |||
274 | SEE ALSO | ||
275 | -------- | ||
276 | linkperf:perf-record[1], linkperf:perf-mem[1] | ||
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 4af66835bcb7..32c9e62be5a2 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1,3 +1,14 @@ | |||
1 | /* | ||
2 | * This is a rewrite of the original c2c tool introduced here: | ||
3 | * http://lwn.net/Articles/588866/ | ||
4 | * | ||
5 | * The original tool was changed to fit in current perf state. | ||
6 | * | ||
7 | * Original authors: | ||
8 | * Don Zickus <dzickus@redhat.com> | ||
9 | * Dick Fowles <fowles@inreach.com> | ||
10 | * Joe Mario <jmario@redhat.com> | ||
11 | */ | ||
1 | #include <linux/compiler.h> | 12 | #include <linux/compiler.h> |
2 | #include <linux/kernel.h> | 13 | #include <linux/kernel.h> |
3 | #include <linux/stringify.h> | 14 | #include <linux/stringify.h> |