author Jiri Olsa <jolsa@kernel.org> 2016-08-26 04:36:12 -0400
committer Arnaldo Carvalho de Melo <acme@redhat.com> 2016-10-21 09:32:01 -0400
commit 465f27a3b2a21cdd1561537c8f2cf293b1d77da4
tree bd25e1646a3d051f56029b18f937f6e4144dacc6
parent 9a406eb610e3676611ce3e32d2e6c55ccc7e5d61
perf c2c: Add man page and credits
Add man page for c2c command and credits to builtin-c2c.c file.

Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-twbp391v8v9f5idp584hlfov@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-rw-r--r-- tools/perf/Documentation/perf-c2c.txt 276
-rw-r--r-- tools/perf/builtin-c2c.c 11
2 files changed, 287 insertions, 0 deletions
diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
new file mode 100644
index 000000000000..ba2f4de399c3
--- /dev/null
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -0,0 +1,276 @@
1perf-c2c(1)
2===========
3
4NAME
5----
6perf-c2c - Shared Data C2C/HITM Analyzer.
7
8SYNOPSIS
9--------
10[verse]
11'perf c2c record' [<options>] <command>
12'perf c2c record' [<options>] -- [<record command options>] <command>
13'perf c2c report' [<options>]
14
15DESCRIPTION
16-----------
17C2C stands for Cache To Cache.
18
19The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows
20you to track down cacheline contention.
21
22The tool is based on x86's load latency and precise store facility events
23provided by Intel CPUs. These events provide:
24 - memory address of the access
25 - type of the access (load and store details)
26 - latency (in cycles) of the load access
27
28The c2c tool provides means to record this data and report back access details
29for the cachelines with the highest contention, that is, the highest number of HITM accesses.
30
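As an illustration of the kind of contention this tool is meant to expose, here is a minimal C sketch of a layout prone to false sharing next to a padded alternative. The 64-byte cacheline size, the struct names and the `same_cacheline` helper are assumptions made for this example; none of them come from perf itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cachelines. Check your CPU before relying on this. */
#define CACHELINE_SIZE 64

/* Two counters that land in the same cacheline: writers on different
 * CPUs contend for the line even though they touch different fields. */
struct counters_shared {
	uint64_t a;	/* written by thread A */
	uint64_t b;	/* written by thread B */
};

/* The same counters, each aligned to the start of its own cacheline. */
struct counters_padded {
	_Alignas(CACHELINE_SIZE) uint64_t a;
	_Alignas(CACHELINE_SIZE) uint64_t b;
};

/* Return 1 when two byte offsets inside a cacheline-aligned object
 * fall into the same cacheline, 0 otherwise. */
static int same_cacheline(size_t off_a, size_t off_b)
{
	return off_a / CACHELINE_SIZE == off_b / CACHELINE_SIZE;
}
```

With the shared layout, concurrent writers to `a` and `b` would be expected to show up as HITM accesses on a single cacheline. The padded layout trades memory for separate cachelines.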
31The basic workflow with this tool follows the standard record/report phases.
32The user runs the record command to record event data and the report command to
33display it.
34
35
36RECORD OPTIONS
37--------------
38-e::
39--event=::
40 Select the PMU event. Use 'perf mem record -e list'
41 to list available events.
42
43-v::
44--verbose::
45 Be more verbose (show counter open errors, etc).
46
47-l::
48--ldlat::
49 Configure mem-loads latency.
50
51-k::
52--all-kernel::
53 Configure all used events to run in kernel space.
54
55-u::
56--all-user::
57 Configure all used events to run in user space.
58
59REPORT OPTIONS
60--------------
61-k::
62--vmlinux=<file>::
63 vmlinux pathname
64
65-v::
66--verbose::
67 Be more verbose (show counter open errors, etc).
68
69-i::
70--input::
71 Specify the input file to process.
72
73-N::
74--node-info::
75 Show extra node info in report (see NODE INFO section)
76
77-c::
78--coalesce::
79 Specify sorting fields for single cacheline display.
80 The following fields are available: tid,pid,iaddr,dso
81 (see COALESCE)
82
83-g::
84--call-graph::
85 Setup callchains parameters.
86 Please refer to perf-report man page for details.
87
88--stdio::
89 Force the stdio output (see STDIO OUTPUT)
90
91--stats::
92 Display only statistic tables and force stdio mode.
93
94--full-symbols::
95 Display full length of symbols.
96
97C2C RECORD
98----------
99The perf c2c record command sets up options related to HITM cacheline analysis
100and calls the standard perf record command.
101
102The following perf record options are configured by default
103(check the perf record man page for details):
104
105 -W,-d,--sample-cpu
106
107Unless specified otherwise with the '-e' option, the following events are monitored by
108default:
109
110 cpu/mem-loads,ldlat=30/P
111 cpu/mem-stores/P
112
113The user can pass any 'perf record' option after the '--' mark, for example to enable
114callchains and system-wide monitoring:
115
116 $ perf c2c record -- -g -a
117
118Please check the RECORD OPTIONS section for specific c2c record options.
119
120C2C REPORT
121----------
122The perf c2c report command displays shared data analysis. It comes in two
123display modes: stdio and tui (default).
124
125The report command workflow is as follows:
126 - sort all the data based on the cacheline address
127 - store access details for each cacheline
128 - sort all cachelines based on user settings
129 - display data
130
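The workflow above can be sketched in C as follows. This is a simplified illustration and not the code in builtin-c2c.c: the `sample` and `line_stats` structs, the fixed `MAX_LINES` capacity, the linear lookup and the 64-byte cacheline size are all assumptions made for the example, and "user settings" is reduced to sorting by HITM count.

```c
#include <stdint.h>
#include <stdlib.h>

#define CACHELINE_SIZE 64	/* assumption: 64-byte cachelines */
#define MAX_LINES 16		/* hypothetical fixed capacity for the sketch */

struct sample {			/* hypothetical, simplified sample record */
	uint64_t addr;		/* data address of the access */
	int is_hitm;		/* non-zero if the load hit a modified line */
};

struct line_stats {
	uint64_t cacheline;	/* cacheline base address */
	unsigned int records;	/* all accesses to this cacheline */
	unsigned int hitm;	/* HITM accesses to this cacheline */
};

/* qsort comparator: highest HITM count first. */
static int cmp_hitm_desc(const void *a, const void *b)
{
	const struct line_stats *la = a, *lb = b;

	return (lb->hitm > la->hitm) - (lb->hitm < la->hitm);
}

/* Group samples by cacheline, accumulate per-line stats, then sort the
 * lines by HITM count. Returns the number of cachelines stored in out[],
 * which must have room for MAX_LINES entries. */
static size_t build_report(const struct sample *s, size_t n,
			   struct line_stats *out)
{
	size_t nr = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t base = s[i].addr & ~(uint64_t)(CACHELINE_SIZE - 1);
		size_t j;

		for (j = 0; j < nr && out[j].cacheline != base; j++)
			;
		if (j == nr) {
			if (nr == MAX_LINES)
				continue;	/* sketch only: drop extra lines */
			out[nr++] = (struct line_stats){ .cacheline = base };
		}
		out[j].records++;
		out[j].hitm += s[i].is_hitm ? 1 : 0;
	}
	qsort(out, nr, sizeof(*out), cmp_hitm_desc);
	return nr;
}
```

A real implementation would use a hash or tree keyed on the cacheline address instead of the linear scan, which is kept here only for brevity.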
131In general, the perf c2c report output consists of 2 basic views:
132 1) most expensive cachelines list
133 2) offsets details for each cacheline
134
135For each cacheline in the 1) list we display the following data:
136(Both stdio and TUI modes follow the same fields output)
137
138 Index
139 - zero based index to identify the cacheline
140
141 Cacheline
142 - cacheline address (hex number)
143
144 Total records
145 - sum of all accesses to the cacheline
146
147 Rmt/Lcl Hitm
148 - cacheline percentage of all Remote/Local HITM accesses
149
150 LLC Load Hitm - Total, Lcl, Rmt
151 - count of Total/Local/Remote load HITMs
152
153 Store Reference - Total, L1Hit, L1Miss
154 Total - all store accesses
155 L1Hit - store accesses that hit L1
156 L1Miss - store accesses that missed L1
157
158 Load Dram
159 - count of local and remote DRAM accesses
160
161 LLC Ld Miss
162 - count of all accesses that missed LLC
163
164 Total Loads
165 - sum of all load accesses
166
167 Core Load Hit - FB, L1, L2
168 - count of load hits in FB (Fill Buffer), L1 and L2 cache
169
170 LLC Load Hit - Llc, Rmt
171 - count of LLC and Remote load hits
172
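The 'Rmt/Lcl Hitm' column is described above as the cacheline's percentage of all Remote/Local HITM accesses. A minimal sketch of that arithmetic, assuming it is the plain ratio of the cacheline's count to the global count (the `hitm_pct` helper is hypothetical and not taken from perf):

```c
/* Share of all HITM accesses that belong to one cacheline, in percent.
 * Returns 0 when no HITM accesses were recorded at all. */
static double hitm_pct(unsigned int line_hitm, unsigned int total_hitm)
{
	if (total_hitm == 0)
		return 0.0;
	return 100.0 * line_hitm / total_hitm;
}
```

For example, a cacheline with 25 remote HITMs out of 100 recorded remote HITMs would account for 25% under this assumption.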
173For each offset in the 2) list we display the following data:
174
175 HITM - Rmt, Lcl
176 - % of Remote/Local HITM accesses for given offset within cacheline
177
178 Store Refs - L1 Hit, L1 Miss
179 - % of store accesses that hit/missed L1 for given offset within cacheline
180
181 Data address - Offset
182 - offset address
183
184 Pid
185 - pid of the process responsible for the accesses
186
187 Tid
188 - tid of the thread responsible for the accesses
189
190 Code address
191 - code address responsible for the accesses
192
193 cycles - rmt hitm, lcl hitm, load
194 - sum of cycles for given accesses - Remote/Local HITM and generic load
195
196 cpu cnt
197 - number of cpus that participated in the access
198
199 Symbol
200 - code symbol related to the 'Code address' value
201
202 Shared Object
203 - shared object name related to the 'Code address' value
204
205 Source:Line
206 - source information related to the 'Code address' value
207
208 Node
209 - nodes participating in the access (see NODE INFO section)
210
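The 'Cacheline' address and the 'Data address - Offset' field relate to a sampled data address as a base and a remainder. A minimal sketch of that split, assuming 64-byte cachelines (both helper names are hypothetical):

```c
#include <stdint.h>

#define CACHELINE_SIZE 64	/* assumption: 64-byte cachelines */

/* Base address of the cacheline that contains addr. */
static uint64_t cacheline_base(uint64_t addr)
{
	return addr & ~(uint64_t)(CACHELINE_SIZE - 1);
}

/* Byte offset of addr within its cacheline. */
static uint64_t cacheline_offset(uint64_t addr)
{
	return addr & (uint64_t)(CACHELINE_SIZE - 1);
}
```

Two accesses with the same base but different offsets touch different data in the same cacheline, which is the pattern the per-offset view is meant to reveal.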
211NODE INFO
212---------
213The 'Node' field displays the nodes that access the given cacheline
214offset. Its output comes in 3 flavors:
215 - node IDs separated by ','
216 - node IDs with stats for each ID, in the following format:
217 Node{cpus %hitms %stores}
218 - node IDs with a list of affected CPUs, in the following format:
219 Node{cpu list}
220
221The user can switch between the above flavors with the -N option or
222use the 'n' key to switch interactively in TUI mode.
223
224COALESCE
225--------
226The user can specify how to sort offsets for a cacheline.
227
228The following fields are available and govern the final
229set of output fields for the cacheline offsets output:
230
231 tid - coalesced by process TIDs
232 pid - coalesced by process PIDs
233 iaddr - coalesced by code address, following fields are displayed:
234 Code address, Code symbol, Shared Object, Source line
235 dso - coalesced by shared object
236
237By default the coalescing is set up with 'pid,tid,iaddr'.
238
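One way to picture coalescing with the default 'pid,tid,iaddr' key is to sort per-offset entries by that key and merge entries whose keys are equal. The sketch below does exactly that. It is an illustration under that assumption, not perf's implementation, and the `offset_entry` struct and function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

struct offset_entry {		/* hypothetical, simplified per-offset record */
	int pid;
	int tid;
	uint64_t iaddr;		/* code address */
	unsigned int hitm;	/* HITM accesses attributed to this entry */
};

/* Order entries by pid, then tid, then code address. */
static int cmp_pid_tid_iaddr(const void *a, const void *b)
{
	const struct offset_entry *x = a, *y = b;

	if (x->pid != y->pid)
		return x->pid < y->pid ? -1 : 1;
	if (x->tid != y->tid)
		return x->tid < y->tid ? -1 : 1;
	if (x->iaddr != y->iaddr)
		return x->iaddr < y->iaddr ? -1 : 1;
	return 0;
}

/* Sort entries by the key, then merge neighbours with an equal key by
 * summing their HITM counts. Returns the number of entries left. */
static size_t coalesce(struct offset_entry *e, size_t n)
{
	size_t out = 0;

	if (n == 0)
		return 0;
	qsort(e, n, sizeof(*e), cmp_pid_tid_iaddr);
	for (size_t i = 1; i < n; i++) {
		if (cmp_pid_tid_iaddr(&e[out], &e[i]) == 0)
			e[out].hitm += e[i].hitm;
		else
			e[++out] = e[i];
	}
	return out + 1;
}
```

Dropping a field from the key (for example coalescing on 'pid,iaddr' only) would merge more entries, since threads of the same process would no longer be kept apart.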
239STDIO OUTPUT
240------------
241The stdio output displays data on standard output.
242
243The following tables are displayed:
244 Trace Event Information
245 - overall statistics of memory accesses
246
247 Global Shared Cache Line Event Information
248 - overall statistics on shared cachelines
249
250 Shared Data Cache Line Table
251 - list of most expensive cachelines
252
253 Shared Cache Line Distribution Pareto
254 - list of all accessed offsets for each cacheline
255
256TUI OUTPUT
257----------
258The TUI output provides an interactive interface to navigate
259through the cachelines list and to display offset details.
260
261For details please refer to the help window by pressing the '?' key.
262
263CREDITS
264-------
265Although Don Zickus, Dick Fowles and Joe Mario worked together
266to get this implemented, we got lots of early help from Arnaldo
267Carvalho de Melo, Stephane Eranian, Jiri Olsa and Andi Kleen.
268
269C2C BLOG
270--------
271Check Joe's blog on the c2c tool for a detailed use case explanation:
272 https://joemario.github.io/blog/2016/09/01/c2c-blog/
273
274SEE ALSO
275--------
276linkperf:perf-record[1], linkperf:perf-mem[1]
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 4af66835bcb7..32c9e62be5a2 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1,3 +1,14 @@
1/*
2 * This is a rewrite of the original c2c tool introduced here:
3 * http://lwn.net/Articles/588866/
4 *
5 * The original tool was changed to fit in current perf state.
6 *
7 * Original authors:
8 * Don Zickus <dzickus@redhat.com>
9 * Dick Fowles <fowles@inreach.com>
10 * Joe Mario <jmario@redhat.com>
11 */
12#include <linux/compiler.h>
13#include <linux/kernel.h>
14#include <linux/stringify.h>