author    Linus Torvalds <torvalds@linux-foundation.org>  2013-11-11 20:06:34 -0500
committer Linus Torvalds <torvalds@linux-foundation.org>  2013-11-11 20:06:34 -0500
commit    ad5d69899e52792671c1aa6c7360464c7edfe09c (patch)
tree      21833c1fdab4b3cf791d4fdc86dd578e4a620514 /tools/perf/builtin-stat.c
parent    ef1417a5a6a400dbc1a2f44da716ab146a29ddc4 (diff)
parent    caea6cf52139116e43e615d87fcbf9823e197fdf (diff)
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "As a first remark I'd like to note that the way to build perf tooling has been simplified and sped up, in the future it should be enough for you to build perf via:

        cd tools/perf/
        make install

  (ie without the -j option.) The build system will figure out the number of CPUs and will do a parallel build+install.

  The various build system inefficiencies and breakages Linus reported against the v3.12 pull request should now be resolved - please (re-)report any remaining annoyances or bugs.

  Main changes on the perf kernel side:

   * Performance optimizations:
      . perf ring-buffer code optimizations, by Peter Zijlstra
      . perf ring-buffer code optimizations, by Oleg Nesterov
      . x86 NMI call-stack processing optimizations, by Peter Zijlstra
      . perf context-switch optimizations, by Peter Zijlstra
      . perf sampling speedups, by Peter Zijlstra
      . x86 Intel PEBS processing speedups, by Peter Zijlstra

   * Enhanced hardware support:
      . for Intel Ivy Bridge-EP uncore PMUs, by Zheng Yan
      . for Haswell transactions, by Andi Kleen, Peter Zijlstra

   * Core perf events code enhancements and fixes by Oleg Nesterov:
      . for uprobes, if fork() is called with pending ret-probes
      . for uprobes platform support code

   * New ABI details by Andi Kleen:
      . Report x86 Haswell TSX transaction abort cost as weight

  Main changes on the perf tooling side (some of these tooling changes utilize the above kernel side changes):

   * 'perf report/top' enhancements:
      . Convert callchain children list to rbtree, greatly reducing the time taken for callchain processing, from Namhyung Kim.
      . Add new COMM infrastructure, further improving histogram processing, from Frédéric Weisbecker, one fix from Namhyung Kim.
      . Add /proc/kcore based live-annotation improvements, including build-id cache support, multi map 'call' instruction navigation fixes, kcore address validation, objdump workarounds. From Adrian Hunter.
      . Show progress on histogram collapsing, which can take a long time, from Namhyung Kim.
      . Add --max-stack option to limit callchain stack scan in 'top' and 'report', improving callchain processing when reducing the stack depth is an option, from Waiman Long.
      . Add new option --ignore-vmlinux for perf top, from Willy Tarreau.

   * 'perf trace' enhancements:
      . 'perf trace' now can use 'perf probe' dynamic tracepoints to hook into the userspace -> kernel pathname copy so that it can map fds to pathnames without reading /proc/pid/fd/ symlinks. From Arnaldo Carvalho de Melo.
      . Show VFS path associated with fd in live sessions, using a 'vfs_getname' 'perf probe' created dynamic tracepoint or by looking at /proc/pid/fd, from Arnaldo Carvalho de Melo.
      . Add 'trace' beautifiers for lots of syscall arguments, from Arnaldo Carvalho de Melo.
      . Implement more compact 'trace' output by suppressing zeroed args, from Arnaldo Carvalho de Melo.
      . Show thread COMM by default in 'trace', from Arnaldo Carvalho de Melo.
      . Add option to show full timestamp in 'trace', from David Ahern.
      . Add 'record' command in 'trace', to record raw_syscalls:*, from David Ahern.
      . Add summary option to dump syscall statistics in 'trace', from David Ahern.
      . Improve error messages in 'trace', providing hints about system configuration steps needed for using it, from Ramkumar Ramachandra.
      . 'perf trace' now emits hints as to why tracing is not possible, helping the user to set up the system to allow tracing in the desired permission granularity, telling if the problem is due to debugfs not being mounted, not enough permission for !root, the /proc/sys/kernel/perf_event_paranoid value, etc. From Arnaldo Carvalho de Melo.

   * 'perf record' enhancements:
      . Check maximum frequency rate for record/top, emitting better error messages, from Jiri Olsa.
      . 'perf record' code cleanups, from David Ahern.
      . Improve write_output error message in 'perf record', from Adrian Hunter.
      . Allow specifying B/K/M/G unit to the --mmap-pages arguments, from Jiri Olsa.
      . Fix command line callchain attribute tests to handle the new -g/--call-chain semantics, from Arnaldo Carvalho de Melo.

   * 'perf kvm' enhancements:
      . Disable live kvm command if timerfd is not supported, from David Ahern.
      . Fix detection of non-core features, from David Ahern.

   * 'perf list' enhancements:
      . Add usage to 'perf list', from David Ahern.
      . Show error in 'perf list' if tracepoints are not available, from Pekka Enberg.

   * 'perf probe' enhancements:
      . Support "$vars" meta argument syntax for local variables, allowing asking for all possible variables at a given probe point to be collected when it hits, from Masami Hiramatsu.

   * 'perf sched' enhancements:
      . Address the root cause of that 'perf sched' stack initialization build slowdown, by programmatically setting a big array after moving the global variable back to the stack. Fix from Adrian Hunter.

   * 'perf script' enhancements:
      . Set up output options for in-stream attributes, from Adrian Hunter.
      . Print addr by default for BTS in 'perf script', from Adrian Hunter.

   * 'perf stat' enhancements:
      . Improved messages when doing profiling in all or a subset of CPUs using a workload as the session delimiter, as in:

           'perf stat --cpu 0,2 sleep 10s'

        from Arnaldo Carvalho de Melo.
      . Add units to nanosec-based counters in 'perf stat', from David Ahern.
      . Remove bogus info when using 'perf stat' -e cycles/instructions, from Ramkumar Ramachandra.

   * 'perf lock' enhancements:
      . 'perf lock' fixes and cleanups, from Davidlohr Bueso.

   * 'perf test' enhancements:
      . Fixup PERF_SAMPLE_TRANSACTION handling in sample synthesizing and 'perf test', from Adrian Hunter.
      . Clarify the "sample parsing" test entry, from Arnaldo Carvalho de Melo.
      . Consider PERF_SAMPLE_TRANSACTION in the "sample parsing" test, from Arnaldo Carvalho de Melo.
      . Memory leak fixes in 'perf test', from Felipe Pena.

   * 'perf bench' enhancements:
      . Change the procps visible command-name of individual benchmark tests plus cleanups, from Ingo Molnar.

   * Generic perf tooling infrastructure/plumbing changes:
      . Separating data file properties from session, code reorganization from Jiri Olsa.
      . Fix version when building out of tree, as when using one of these:

           $ make help | grep perf
             perf-tar-src-pkg    - Build perf-3.12.0.tar source tarball
             perf-targz-src-pkg  - Build perf-3.12.0.tar.gz source tarball
             perf-tarbz2-src-pkg - Build perf-3.12.0.tar.bz2 source tarball
             perf-tarxz-src-pkg  - Build perf-3.12.0.tar.xz source tarball
           $

        from David Ahern.
      . Enhance option parse error message, showing just the help lines of the options affected, from Namhyung Kim.
      . libtraceevent updates from upstream trace-cmd repo, from Steven Rostedt.
      . Always use perf_evsel__set_sample_bit to set sample_type, from Adrian Hunter.
      . Memory and mmap leak fixes from Chenggang Qin.
      . Assorted build fixes from David Ahern and Jiri Olsa.
      . Speed up and prettify the build system, from Ingo Molnar.
      . Implement addr2line directly using libbfd, from Roberto Vitillo.
      . Separate the GTK support in a separate libperf-gtk.so DSO, that is only loaded when --gtk is specified, from Namhyung Kim.
      . perf bash completion fixes and improvements from Ramkumar Ramachandra.
      . Support for Openembedded/Yocto -dbg packages, from Ricardo Ribalda Delgado.

  And lots and lots of other fixes and code reorganizations that did not make it into the list, see the shortlog, diffstat and the Git log for details!"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (300 commits)
  uprobes: Fix the memory out of bound overwrite in copy_insn()
  uprobes: Fix the wrong usage of current->utask in uprobe_copy_process()
  perf tools: Remove unneeded include
  perf record: Remove post_processing_offset variable
  perf record: Remove advance_output function
  perf record: Refactor feature handling into a separate function
  perf trace: Don't relookup fields by name in each sample
  perf tools: Fix version when building out of tree
  perf evsel: Ditch evsel->handler.data field
  uprobes: Export write_opcode() as uprobe_write_opcode()
  uprobes: Introduce arch_uprobe->ixol
  uprobes: Kill module_init() and module_exit()
  uprobes: Move function declarations out of arch
  perf/x86/intel: Add Ivy Bridge-EP uncore IRP box support
  perf/x86/intel/uncore: Add filter support for IvyBridge-EP QPI boxes
  perf: Factor out strncpy() in perf_event_mmap_event()
  tools/perf: Add required memory barriers
  perf: Fix arch_perf_out_copy_user default
  perf: Update a stale comment
  perf: Optimize perf_output_begin() -- address calculation
  ...
Diffstat (limited to 'tools/perf/builtin-stat.c')
-rw-r--r--  tools/perf/builtin-stat.c  211
1 file changed, 186 insertions(+), 25 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 5098f144b92d..0fc1c941a73c 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -46,6 +46,7 @@
 #include "util/util.h"
 #include "util/parse-options.h"
 #include "util/parse-events.h"
+#include "util/pmu.h"
 #include "util/event.h"
 #include "util/evlist.h"
 #include "util/evsel.h"
@@ -70,6 +71,41 @@ static void print_counter_aggr(struct perf_evsel *counter, char *prefix);
 static void print_counter(struct perf_evsel *counter, char *prefix);
 static void print_aggr(char *prefix);
 
+/* Default events used for perf stat -T */
+static const char * const transaction_attrs[] = {
+	"task-clock",
+	"{"
+	"instructions,"
+	"cycles,"
+	"cpu/cycles-t/,"
+	"cpu/tx-start/,"
+	"cpu/el-start/,"
+	"cpu/cycles-ct/"
+	"}"
+};
+
+/* More limited version when the CPU does not have all events. */
+static const char * const transaction_limited_attrs[] = {
+	"task-clock",
+	"{"
+	"instructions,"
+	"cycles,"
+	"cpu/cycles-t/,"
+	"cpu/tx-start/"
+	"}"
+};
+
+/* must match transaction_attrs and the beginning limited_attrs */
+enum {
+	T_TASK_CLOCK,
+	T_INSTRUCTIONS,
+	T_CYCLES,
+	T_CYCLES_IN_TX,
+	T_TRANSACTION_START,
+	T_ELISION_START,
+	T_CYCLES_IN_TX_CP,
+};
+
 static struct perf_evlist *evsel_list;
 
 static struct perf_target target = {
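The two attr tables added above rely on C adjacent-string-literal concatenation: each boils down to a plain "task-clock" event plus one brace-delimited event group, the same syntax 'perf stat -e' accepts on the command line. A small standalone sketch (illustration only, not part of the patch) that prints what the literals concatenate to:

#include <stdio.h>

static const char * const transaction_attrs[] = {
	"task-clock",
	"{"
	"instructions,"
	"cycles,"
	"cpu/cycles-t/,"
	"cpu/tx-start/,"
	"cpu/el-start/,"
	"cpu/cycles-ct/"
	"}"
};

int main(void)
{
	unsigned int i;

	/* Prints two specs: "task-clock" and the whole "{...}" group. */
	for (i = 0; i < sizeof(transaction_attrs) / sizeof(transaction_attrs[0]); i++)
		printf("attr[%u]: %s\n", i, transaction_attrs[i]);
	return 0;
}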
@@ -90,6 +126,7 @@ static enum aggr_mode aggr_mode = AGGR_GLOBAL;
 static volatile pid_t child_pid = -1;
 static bool null_run = false;
 static int detailed_run = 0;
+static bool transaction_run;
 static bool big_num = true;
 static int big_num_opt = -1;
 static const char *csv_sep = NULL;
@@ -214,7 +251,10 @@ static struct stats runtime_l1_icache_stats[MAX_NR_CPUS];
 static struct stats runtime_ll_cache_stats[MAX_NR_CPUS];
 static struct stats runtime_itlb_cache_stats[MAX_NR_CPUS];
 static struct stats runtime_dtlb_cache_stats[MAX_NR_CPUS];
+static struct stats runtime_cycles_in_tx_stats[MAX_NR_CPUS];
 static struct stats walltime_nsecs_stats;
+static struct stats runtime_transaction_stats[MAX_NR_CPUS];
+static struct stats runtime_elision_stats[MAX_NR_CPUS];
 
 static void perf_stat__reset_stats(struct perf_evlist *evlist)
 {
@@ -236,6 +276,11 @@ static void perf_stat__reset_stats(struct perf_evlist *evlist)
 	memset(runtime_ll_cache_stats, 0, sizeof(runtime_ll_cache_stats));
 	memset(runtime_itlb_cache_stats, 0, sizeof(runtime_itlb_cache_stats));
 	memset(runtime_dtlb_cache_stats, 0, sizeof(runtime_dtlb_cache_stats));
+	memset(runtime_cycles_in_tx_stats, 0,
+			sizeof(runtime_cycles_in_tx_stats));
+	memset(runtime_transaction_stats, 0,
+		sizeof(runtime_transaction_stats));
+	memset(runtime_elision_stats, 0, sizeof(runtime_elision_stats));
 	memset(&walltime_nsecs_stats, 0, sizeof(walltime_nsecs_stats));
 }
 
@@ -274,6 +319,29 @@ static inline int nsec_counter(struct perf_evsel *evsel)
 	return 0;
 }
 
+static struct perf_evsel *nth_evsel(int n)
+{
+	static struct perf_evsel **array;
+	static int array_len;
+	struct perf_evsel *ev;
+	int j;
+
+	/* Assumes this only called when evsel_list does not change anymore. */
+	if (!array) {
+		list_for_each_entry(ev, &evsel_list->entries, node)
+			array_len++;
+		array = malloc(array_len * sizeof(void *));
+		if (!array)
+			exit(ENOMEM);
+		j = 0;
+		list_for_each_entry(ev, &evsel_list->entries, node)
+			array[j++] = ev;
+	}
+	if (n < array_len)
+		return array[n];
+	return NULL;
+}
+
 /*
  * Update various tracking values we maintain to print
  * more semantic information such as miss/hit ratios,
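nth_evsel() lazily flattens evsel_list into an array so the T_* indices defined earlier can be looked up in O(1) on every counter update; the array is built once and kept for the life of the process. A standalone sketch (illustration only, not perf code) of the same lazy-flatten idiom over an ordinary singly linked list:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

struct node {
	const char *name;
	struct node *next;
};

/*
 * Same idiom as nth_evsel(): walk the list once, cache the element
 * pointers in a static array, then serve index lookups in O(1).
 * Only valid once the list has stopped changing.
 */
static struct node *nth_node(struct node *head, int n)
{
	static struct node **array;
	static int array_len;
	struct node *it;
	int j = 0;

	if (!array) {
		for (it = head; it; it = it->next)
			array_len++;
		array = malloc(array_len * sizeof(*array));
		if (!array)
			exit(ENOMEM);
		for (it = head; it; it = it->next)
			array[j++] = it;
	}
	return n < array_len ? array[n] : NULL;
}

int main(void)
{
	struct node cycles       = { "cycles", NULL };
	struct node instructions = { "instructions", &cycles };
	struct node task_clock   = { "task-clock", &instructions };
	struct node *found = nth_node(&task_clock, 2);

	printf("index 2 -> %s\n", found ? found->name : "(none)");
	return 0;
}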
@@ -285,6 +353,15 @@ static void update_shadow_stats(struct perf_evsel *counter, u64 *count)
 		update_stats(&runtime_nsecs_stats[0], count[0]);
 	else if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES))
 		update_stats(&runtime_cycles_stats[0], count[0]);
+	else if (transaction_run &&
+		 perf_evsel__cmp(counter, nth_evsel(T_CYCLES_IN_TX)))
+		update_stats(&runtime_cycles_in_tx_stats[0], count[0]);
+	else if (transaction_run &&
+		 perf_evsel__cmp(counter, nth_evsel(T_TRANSACTION_START)))
+		update_stats(&runtime_transaction_stats[0], count[0]);
+	else if (transaction_run &&
+		 perf_evsel__cmp(counter, nth_evsel(T_ELISION_START)))
+		update_stats(&runtime_elision_stats[0], count[0]);
 	else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
 		update_stats(&runtime_stalled_cycles_front_stats[0], count[0]);
 	else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_BACKEND))
@@ -629,10 +706,13 @@ static void nsec_printout(int cpu, int nr, struct perf_evsel *evsel, double avg)
 {
 	double msecs = avg / 1e6;
 	const char *fmt = csv_output ? "%.6f%s%s" : "%18.6f%s%-25s";
+	char name[25];
 
 	aggr_printout(evsel, cpu, nr);
 
-	fprintf(output, fmt, msecs, csv_sep, perf_evsel__name(evsel));
+	scnprintf(name, sizeof(name), "%s%s",
+		  perf_evsel__name(evsel), csv_output ? "" : " (msec)");
+	fprintf(output, fmt, msecs, csv_sep, name);
 
 	if (evsel->cgrp)
 		fprintf(output, "%s%s", csv_sep, evsel->cgrp->name);
@@ -828,7 +908,7 @@ static void print_ll_cache_misses(int cpu,
 
 static void abs_printout(int cpu, int nr, struct perf_evsel *evsel, double avg)
 {
-	double total, ratio = 0.0;
+	double total, ratio = 0.0, total2;
 	const char *fmt;
 
 	if (csv_output)
@@ -853,11 +933,10 @@ static void abs_printout(int cpu, int nr, struct perf_evsel *evsel, double avg)
 
 	if (perf_evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) {
 		total = avg_stats(&runtime_cycles_stats[cpu]);
-		if (total)
+		if (total) {
 			ratio = avg / total;
-
-		fprintf(output, " # %5.2f insns per cycle ", ratio);
-
+			fprintf(output, " # %5.2f insns per cycle ", ratio);
+		}
 		total = avg_stats(&runtime_stalled_cycles_front_stats[cpu]);
 		total = max(total, avg_stats(&runtime_stalled_cycles_back_stats[cpu]));
 
@@ -920,10 +999,47 @@ static void abs_printout(int cpu, int nr, struct perf_evsel *evsel, double avg)
 	} else if (perf_evsel__match(evsel, HARDWARE, HW_CPU_CYCLES)) {
 		total = avg_stats(&runtime_nsecs_stats[cpu]);
 
+		if (total) {
+			ratio = avg / total;
+			fprintf(output, " # %8.3f GHz ", ratio);
+		}
+	} else if (transaction_run &&
+		   perf_evsel__cmp(evsel, nth_evsel(T_CYCLES_IN_TX))) {
+		total = avg_stats(&runtime_cycles_stats[cpu]);
+		if (total)
+			fprintf(output,
+				" # %5.2f%% transactional cycles ",
+				100.0 * (avg / total));
+	} else if (transaction_run &&
+		   perf_evsel__cmp(evsel, nth_evsel(T_CYCLES_IN_TX_CP))) {
+		total = avg_stats(&runtime_cycles_stats[cpu]);
+		total2 = avg_stats(&runtime_cycles_in_tx_stats[cpu]);
+		if (total2 < avg)
+			total2 = avg;
+		if (total)
+			fprintf(output,
+				" # %5.2f%% aborted cycles ",
+				100.0 * ((total2-avg) / total));
+	} else if (transaction_run &&
+		   perf_evsel__cmp(evsel, nth_evsel(T_TRANSACTION_START)) &&
+		   avg > 0 &&
+		   runtime_cycles_in_tx_stats[cpu].n != 0) {
+		total = avg_stats(&runtime_cycles_in_tx_stats[cpu]);
+
 		if (total)
-			ratio = 1.0 * avg / total;
+			ratio = total / avg;
 
-		fprintf(output, " # %8.3f GHz ", ratio);
+		fprintf(output, " # %8.0f cycles / transaction ", ratio);
+	} else if (transaction_run &&
+		   perf_evsel__cmp(evsel, nth_evsel(T_ELISION_START)) &&
+		   avg > 0 &&
+		   runtime_cycles_in_tx_stats[cpu].n != 0) {
+		total = avg_stats(&runtime_cycles_in_tx_stats[cpu]);
+
+		if (total)
+			ratio = total / avg;
+
+		fprintf(output, " # %8.0f cycles / elision ", ratio);
 	} else if (runtime_nsecs_stats[cpu].n != 0) {
 		char unit = 'M';
 
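The new abs_printout() branches derive their ratios from the shadow stats filled in by update_shadow_stats(): transactional cycles are cycles-t over total cycles, aborted cycles are the excess of cycles-t over cycles-ct, and cycles per transaction/elision divide cycles-t by the respective start counts. A standalone sketch of the same arithmetic with made-up counts (illustrative numbers only, not measured data):

#include <stdio.h>

int main(void)
{
	/* Hypothetical counts for one "perf stat -T" run (made-up numbers). */
	double cycles       = 1000000000.0;	/* cycles                        */
	double cycles_in_tx =  250000000.0;	/* cpu/cycles-t/                 */
	double cycles_ct    =  200000000.0;	/* cpu/cycles-ct/ (committed tx) */
	double tx_start     =      500000.0;	/* cpu/tx-start/                 */
	double el_start     =      100000.0;	/* cpu/el-start/                 */
	double total2;

	/* Share of all cycles spent inside transactions. */
	printf("%5.2f%% transactional cycles\n", 100.0 * (cycles_in_tx / cycles));

	/*
	 * cycles-t is clamped to at least cycles-ct; whatever it has in
	 * excess of cycles-ct is treated as work thrown away by aborts.
	 */
	total2 = cycles_in_tx > cycles_ct ? cycles_in_tx : cycles_ct;
	printf("%5.2f%% aborted cycles\n", 100.0 * ((total2 - cycles_ct) / cycles));

	/* Average transaction/elision length, in in-transaction cycles. */
	printf("%8.0f cycles / transaction\n", cycles_in_tx / tx_start);
	printf("%8.0f cycles / elision\n", cycles_in_tx / el_start);
	return 0;
}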
@@ -1116,7 +1232,11 @@ static void print_stat(int argc, const char **argv)
 	if (!csv_output) {
 		fprintf(output, "\n");
 		fprintf(output, " Performance counter stats for ");
-		if (!perf_target__has_task(&target)) {
+		if (target.system_wide)
+			fprintf(output, "\'system wide");
+		else if (target.cpu_list)
+			fprintf(output, "\'CPU(s) %s", target.cpu_list);
+		else if (!perf_target__has_task(&target)) {
 			fprintf(output, "\'%s", argv[0]);
 			for (i = 1; i < argc; i++)
 				fprintf(output, " %s", argv[i]);
@@ -1237,6 +1357,16 @@ static int perf_stat_init_aggr_mode(void)
 	return 0;
 }
 
+static int setup_events(const char * const *attrs, unsigned len)
+{
+	unsigned i;
+
+	for (i = 0; i < len; i++) {
+		if (parse_events(evsel_list, attrs[i]))
+			return -1;
+	}
+	return 0;
+}
 
 /*
  * Add default attributes, if there were no attributes specified or
@@ -1355,6 +1485,22 @@ static int add_default_attributes(void)
 	if (null_run)
 		return 0;
 
+	if (transaction_run) {
+		int err;
+		if (pmu_have_event("cpu", "cycles-ct") &&
+		    pmu_have_event("cpu", "el-start"))
+			err = setup_events(transaction_attrs,
+					   ARRAY_SIZE(transaction_attrs));
+		else
+			err = setup_events(transaction_limited_attrs,
+					   ARRAY_SIZE(transaction_limited_attrs));
+		if (err < 0) {
+			fprintf(stderr, "Cannot set up transaction events\n");
+			return -1;
+		}
+		return 0;
+	}
+
 	if (!evsel_list->nr_entries) {
 		if (perf_evlist__add_default_attrs(evsel_list, default_attrs) < 0)
 			return -1;
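With -T, the full transaction_attrs set is chosen only when the PMU advertises both the cycles-ct and el-start events; otherwise the limited set is parsed instead. A standalone sketch of that selection with a stubbed pmu_have_event() (the stub is hypothetical; the real helper consults the PMU's sysfs event aliases):

#include <stdio.h>
#include <string.h>

/* Hypothetical stub standing in for perf's pmu_have_event(); here we
 * pretend the CPU lacks the TSX elision event. */
static int pmu_have_event(const char *pmu, const char *name)
{
	(void)pmu;
	return strcmp(name, "el-start") != 0;
}

int main(void)
{
	const char *chosen;

	if (pmu_have_event("cpu", "cycles-ct") &&
	    pmu_have_event("cpu", "el-start"))
		chosen = "transaction_attrs (full event set)";
	else
		chosen = "transaction_limited_attrs (no el-start/cycles-ct)";

	printf("perf stat -T would parse: %s\n", chosen);
	return 0;
}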
@@ -1389,6 +1535,8 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 	int output_fd = 0;
 	const char *output_name = NULL;
 	const struct option options[] = {
+	OPT_BOOLEAN('T', "transaction", &transaction_run,
+		    "hardware transaction statistics"),
 	OPT_CALLBACK('e', "event", &evsel_list, "event",
 		     "event selector. use 'perf list' to list available events",
 		     parse_events_option),
@@ -1448,7 +1596,7 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
1448 "perf stat [<options>] [<command>]", 1596 "perf stat [<options>] [<command>]",
1449 NULL 1597 NULL
1450 }; 1598 };
1451 int status = -ENOMEM, run_idx; 1599 int status = -EINVAL, run_idx;
1452 const char *mode; 1600 const char *mode;
1453 1601
1454 setlocale(LC_ALL, ""); 1602 setlocale(LC_ALL, "");
@@ -1466,12 +1614,15 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 
 	if (output_name && output_fd) {
 		fprintf(stderr, "cannot use both --output and --log-fd\n");
-		usage_with_options(stat_usage, options);
+		parse_options_usage(stat_usage, options, "o", 1);
+		parse_options_usage(NULL, options, "log-fd", 0);
+		goto out;
 	}
 
 	if (output_fd < 0) {
 		fprintf(stderr, "argument to --log-fd must be a > 0\n");
-		usage_with_options(stat_usage, options);
+		parse_options_usage(stat_usage, options, "log-fd", 0);
+		goto out;
 	}
 
 	if (!output) {
@@ -1508,16 +1659,21 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 		/* User explicitly passed -B? */
 		if (big_num_opt == 1) {
 			fprintf(stderr, "-B option not supported with -x\n");
-			usage_with_options(stat_usage, options);
+			parse_options_usage(stat_usage, options, "B", 1);
+			parse_options_usage(NULL, options, "x", 1);
+			goto out;
 		} else /* Nope, so disable big number formatting */
 			big_num = false;
 	} else if (big_num_opt == 0) /* User passed --no-big-num */
 		big_num = false;
 
-	if (!argc && !perf_target__has_task(&target))
+	if (!argc && perf_target__none(&target))
 		usage_with_options(stat_usage, options);
+
 	if (run_count < 0) {
-		usage_with_options(stat_usage, options);
+		pr_err("Run count must be a positive number\n");
+		parse_options_usage(stat_usage, options, "r", 1);
+		goto out;
 	} else if (run_count == 0) {
 		forever = true;
 		run_count = 1;
@@ -1529,8 +1685,10 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 		fprintf(stderr, "both cgroup and no-aggregation "
 			"modes only available in system-wide mode\n");
 
-		usage_with_options(stat_usage, options);
-		return -1;
+		parse_options_usage(stat_usage, options, "G", 1);
+		parse_options_usage(NULL, options, "A", 1);
+		parse_options_usage(NULL, options, "a", 1);
+		goto out;
 	}
 
 	if (add_default_attributes())
@@ -1539,25 +1697,28 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 	perf_target__validate(&target);
 
 	if (perf_evlist__create_maps(evsel_list, &target) < 0) {
-		if (perf_target__has_task(&target))
+		if (perf_target__has_task(&target)) {
 			pr_err("Problems finding threads of monitor\n");
-		if (perf_target__has_cpu(&target))
+			parse_options_usage(stat_usage, options, "p", 1);
+			parse_options_usage(NULL, options, "t", 1);
+		} else if (perf_target__has_cpu(&target)) {
 			perror("failed to parse CPUs map");
-
-		usage_with_options(stat_usage, options);
-		return -1;
+			parse_options_usage(stat_usage, options, "C", 1);
+			parse_options_usage(NULL, options, "a", 1);
+		}
+		goto out;
 	}
 	if (interval && interval < 100) {
 		pr_err("print interval must be >= 100ms\n");
-		usage_with_options(stat_usage, options);
-		return -1;
+		parse_options_usage(stat_usage, options, "I", 1);
+		goto out_free_maps;
 	}
 
 	if (perf_evlist__alloc_stats(evsel_list, interval))
 		goto out_free_maps;
 
 	if (perf_stat_init_aggr_mode())
-		goto out;
+		goto out_free_maps;
 
 	/*
 	 * We dont want to block the signals - that would cause