diff options
author | Roberto Agostino Vitillo <ravitillo@lbl.gov> | 2012-02-09 17:21:03 -0500 |
---|---|---|
committer | Ingo Molnar <mingo@elte.hu> | 2012-03-09 02:26:05 -0500 |
commit | b50311dc2ac1c04ad19163c2359910b25e16caf6 (patch) | |
tree | 80a4489b2b268b7512dd4eb566858a6bae8bfffe /tools/perf/Documentation | |
parent | bdfebd848f2a14e639031a0b0e61d7c7ee5e5fd2 (diff) |
perf report: Add support for taken branch sampling
This patch adds support for taken branch sampling, i.e, the
PERF_SAMPLE_BRANCH_STACK feature to perf report. In other
words, to display histograms based on taken branches rather
than executed instructions addresses.
The new option is called -b and it takes no argument. To
generate meaningful output, the perf.data must have been
obtained using perf record -b xxx ... where xxx is a branch
filter option.
The output shows symbols, modules, sorted by 'who branches
where' the most often. The percentages reported in the first
column refer to the total number of branches captured and
not the usual number of samples.
Here is a quick example.
Here branchy is simple test program which looks as follows:
void f2(void)
{}
void f3(void)
{}
void f1(unsigned long n)
{
if (n & 1UL)
f2();
else
f3();
}
int main(void)
{
unsigned long i;
for (i=0; i < N; i++)
f1(i);
return 0;
}
Here is the output captured on Nehalem, if we are
only interested in user level function calls.
$ perf record -b any_call,u -e cycles:u branchy
$ perf report -b --sort=symbol
52.34% [.] main [.] f1
24.04% [.] f1 [.] f3
23.60% [.] f1 [.] f2
0.01% [k] _IO_new_file_xsputn [k] _IO_file_overflow
0.01% [k] _IO_vfprintf_internal [k] _IO_new_file_xsputn
0.01% [k] _IO_vfprintf_internal [k] strchrnul
0.01% [k] __printf [k] _IO_vfprintf_internal
0.01% [k] main [k] __printf
About half (52%) of the call branches captured are from main()
-> f1(). The second half (24%+23%) is split in two equal shares
between f1() -> f2(), f1() ->f3(). The output is as expected
given the code.
It should be noted, that using -b in perf record does not
eliminate information in the perf.data file. Consequently, a
typical profile can also be obtained by perf report by simply
not using its -b option.
It is possible to sort on branch related columns:
- dso_from, symbol_from
- dso_to, symbol_to
- mispredict
Signed-off-by: Roberto Agostino Vitillo <ravitillo@lbl.gov>
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: acme@redhat.com
Cc: robert.richter@amd.com
Cc: ming.m.lin@intel.com
Cc: andi@firstfloor.org
Cc: asharma@fb.com
Cc: vweaver1@eecs.utk.edu
Cc: khandual@linux.vnet.ibm.com
Cc: dsahern@gmail.com
Link: http://lkml.kernel.org/r/1328826068-11713-14-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Diffstat (limited to 'tools/perf/Documentation')
-rw-r--r-- | tools/perf/Documentation/perf-report.txt | 7 |
1 files changed, 7 insertions, 0 deletions
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index 9b430e98712e..19b9092cf8b7 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt | |||
@@ -153,6 +153,13 @@ OPTIONS | |||
153 | information which may be very large and thus may clutter the display. | 153 | information which may be very large and thus may clutter the display. |
154 | It currently includes: cpu and numa topology of the host system. | 154 | It currently includes: cpu and numa topology of the host system. |
155 | 155 | ||
156 | -b:: | ||
157 | --branch-stack:: | ||
158 | Use the addresses of sampled taken branches instead of the instruction | ||
159 | address to build the histograms. To generate meaningful output, the | ||
160 | perf.data file must have been obtained using perf record -b xxx where | ||
161 | xxx is a branch filter option. | ||
162 | |||
156 | SEE ALSO | 163 | SEE ALSO |
157 | -------- | 164 | -------- |
158 | linkperf:perf-stat[1], linkperf:perf-annotate[1] | 165 | linkperf:perf-stat[1], linkperf:perf-annotate[1] |