aboutsummaryrefslogtreecommitdiffstats
path: root/include/uapi/linux
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2015-04-14 17:37:47 -0400
committerLinus Torvalds <torvalds@linux-foundation.org>2015-04-14 17:37:47 -0400
commit6c8a53c9e6a151fffb07f8b4c34bd1e33dddd467 (patch)
tree791caf826ef136c521a97b7878f226b6ba1c1d75 /include/uapi/linux
parente95e7f627062be5e6ce971ce873e6234c91ffc50 (diff)
parent066450be419fa48007a9f29e19828f2a86198754 (diff)
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar: "Core kernel changes: - One of the more interesting features in this cycle is the ability to attach eBPF programs (user-defined, sandboxed bytecode executed by the kernel) to kprobes. This allows user-defined instrumentation on a live kernel image that can never crash, hang or interfere with the kernel negatively. (Right now it's limited to root-only, but in the future we might allow unprivileged use as well.) (Alexei Starovoitov) - Another non-trivial feature is per event clockid support: this allows, amongst other things, the selection of different clock sources for event timestamps traced via perf. This feature is sought by people who'd like to merge perf generated events with external events that were measured with different clocks: - cluster wide profiling - for system wide tracing with user-space events, - JIT profiling events etc. Matching perf tooling support is added as well, available via the -k, --clockid <clockid> parameter to perf record et al. (Peter Zijlstra) Hardware enablement kernel changes: - x86 Intel Processor Trace (PT) support: which is a hardware tracer on steroids, available on Broadwell CPUs. The hardware trace stream is directly output into the user-space ring-buffer, using the 'AUX' data format extension that was added to the perf core to support hardware constraints such as the necessity to have the tracing buffer physically contiguous. This patch-set was developed for two years and this is the result. A simple way to make use of this is to use BTS tracing, the PT driver emulates BTS output - available via the 'intel_bts' PMU. More explicit PT specific tooling support is in the works as well - will probably be ready by 4.2. (Alexander Shishkin, Peter Zijlstra) - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware feature of Intel Xeon CPUs that allows the measurement and allocation/partitioning of caches to individual workloads. These kernel changes expose the measurement side as a new PMU driver, which exposes various QoS related PMU events. (The partitioning change is work in progress and is planned to be merged as a cgroup extension.) (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P Waskiewicz Jr) - x86 Intel Haswell LBR call stack support: this is a new Haswell feature that allows the hardware recording of call chains, plus tooling support. To activate this feature you have to enable it via the new 'lbr' call-graph recording option: perf record --call-graph lbr perf report or: perf top --call-graph lbr This hardware feature is a lot faster than stack walk or dwarf based unwinding, but has some limitations: - It reuses the current LBR facility, so LBR call stack and branch record can not be enabled at the same time. - It is only available for user-space callchains. (Yan, Zheng) - x86 Intel Broadwell CPU support and various event constraints and event table fixes for earlier models. (Andi Kleen) - x86 Intel HT CPUs event scheduling workarounds. This is a complex CPU bug affecting the SNB,IVB,HSW families that results in counter value corruption. The mitigation code is automatically enabled and is transparent. (Maria Dimakopoulou, Stephane Eranian) The perf tooling side had a ton of changes in this cycle as well, so I'm only able to list the user visible changes here, in addition to the tooling changes outlined above: User visible changes affecting all tools: - Improve support of compressed kernel modules (Jiri Olsa) - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo) - Bash completion for subcommands (Yunlong Song) - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa) - Support missing -f to override perf.data file ownership. (Yunlong Song) - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo) User visible changes in individual tools: 'perf data': New tool for converting perf.data to other formats, initially for the CTF (Common Trace Format) from LTTng (Jiri Olsa, Sebastian Siewior) 'perf diff': Add --kallsyms option (David Ahern) 'perf list': Allow listing events with 'tracepoint' prefix (Yunlong Song) Sort the output of the command (Yunlong Song) 'perf kmem': Respect -i option (Jiri Olsa) Print big numbers using thousands' group (Namhyung Kim) Allow -v option (Namhyung Kim) Fix alignment of slab result table (Namhyung Kim) 'perf probe': Support multiple probes on different binaries on the same command line (Masami Hiramatsu) Support unnamed union/structure members data collection. (Masami Hiramatsu) Check kprobes blacklist when adding new events. (Masami Hiramatsu) 'perf record': Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra) Support recording running/enabled time (Andi Kleen) 'perf sched': Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song) 'perf report' and 'perf top': Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo) Indicate which callchain entries are annotated in the TUI hists browser (Arnaldo Carvalho de Melo) Add pid/tid filtering to 'report' and 'script' commands (David Ahern) Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT events (Arnaldo Carvalho de Melo) 'perf stat': Report unsupported events properly (Suzuki K. Poulose) Output running time and run/enabled ratio in CSV mode (Andi Kleen) 'perf trace': Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo) Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo) Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo) Dump stack on segfaults (Arnaldo Carvalho de Melo) No need to explicitely enable evsels for workload started from perf, let it be enabled via perf_event_attr.enable_on_exec, removing some events that take place in the 'perf trace' before a workload is really started by it. (Arnaldo Carvalho de Melo) Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo) There's also been a ton of infrastructure work done, such as the split-out of perf's build system into tools/build/ and other changes - see the shortlog and changelog for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits) perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init() perf evlist: Fix type for references to data_head/tail perf probe: Check the orphaned -x option perf probe: Support multiple probes on different binaries perf buildid-list: Fix segfault when show DSOs with hits perf tools: Fix cross-endian analysis perf tools: Fix error path to do closedir() when synthesizing threads perf tools: Fix synthesizing fork_event.ppid for non-main thread perf tools: Add 'I' event modifier for exclude_idle bit perf report: Don't call map__kmap if map is NULL. perf tests: Fix attr tests perf probe: Fix ARM 32 building error perf tools: Merge all perf_event_attr print functions perf record: Add clockid parameter perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10 perf sched replay: Support using -f to override perf.data file ownership perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task perf sched replay: Fix the segmentation fault problem caused by pr_err in threads perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations ...
Diffstat (limited to 'include/uapi/linux')
-rw-r--r--include/uapi/linux/bpf.h5
-rw-r--r--include/uapi/linux/perf_event.h115
2 files changed, 103 insertions, 17 deletions
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 45da7ec7d274..cc47ef41076a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -118,6 +118,7 @@ enum bpf_map_type {
118enum bpf_prog_type { 118enum bpf_prog_type {
119 BPF_PROG_TYPE_UNSPEC, 119 BPF_PROG_TYPE_UNSPEC,
120 BPF_PROG_TYPE_SOCKET_FILTER, 120 BPF_PROG_TYPE_SOCKET_FILTER,
121 BPF_PROG_TYPE_KPROBE,
121}; 122};
122 123
123/* flags for BPF_MAP_UPDATE_ELEM command */ 124/* flags for BPF_MAP_UPDATE_ELEM command */
@@ -151,6 +152,7 @@ union bpf_attr {
151 __u32 log_level; /* verbosity level of verifier */ 152 __u32 log_level; /* verbosity level of verifier */
152 __u32 log_size; /* size of user buffer */ 153 __u32 log_size; /* size of user buffer */
153 __aligned_u64 log_buf; /* user supplied buffer */ 154 __aligned_u64 log_buf; /* user supplied buffer */
155 __u32 kern_version; /* checked when prog_type=kprobe */
154 }; 156 };
155} __attribute__((aligned(8))); 157} __attribute__((aligned(8)));
156 158
@@ -162,6 +164,9 @@ enum bpf_func_id {
162 BPF_FUNC_map_lookup_elem, /* void *map_lookup_elem(&map, &key) */ 164 BPF_FUNC_map_lookup_elem, /* void *map_lookup_elem(&map, &key) */
163 BPF_FUNC_map_update_elem, /* int map_update_elem(&map, &key, &value, flags) */ 165 BPF_FUNC_map_update_elem, /* int map_update_elem(&map, &key, &value, flags) */
164 BPF_FUNC_map_delete_elem, /* int map_delete_elem(&map, &key) */ 166 BPF_FUNC_map_delete_elem, /* int map_delete_elem(&map, &key) */
167 BPF_FUNC_probe_read, /* int bpf_probe_read(void *dst, int size, void *src) */
168 BPF_FUNC_ktime_get_ns, /* u64 bpf_ktime_get_ns(void) */
169 BPF_FUNC_trace_printk, /* int bpf_trace_printk(const char *fmt, int fmt_size, ...) */
165 __BPF_FUNC_MAX_ID, 170 __BPF_FUNC_MAX_ID,
166}; 171};
167 172
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 9b79abbd1ab8..309211b3eb67 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -152,21 +152,42 @@ enum perf_event_sample_format {
152 * The branch types can be combined, however BRANCH_ANY covers all types 152 * The branch types can be combined, however BRANCH_ANY covers all types
153 * of branches and therefore it supersedes all the other types. 153 * of branches and therefore it supersedes all the other types.
154 */ 154 */
155enum perf_branch_sample_type_shift {
156 PERF_SAMPLE_BRANCH_USER_SHIFT = 0, /* user branches */
157 PERF_SAMPLE_BRANCH_KERNEL_SHIFT = 1, /* kernel branches */
158 PERF_SAMPLE_BRANCH_HV_SHIFT = 2, /* hypervisor branches */
159
160 PERF_SAMPLE_BRANCH_ANY_SHIFT = 3, /* any branch types */
161 PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT = 4, /* any call branch */
162 PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT = 5, /* any return branch */
163 PERF_SAMPLE_BRANCH_IND_CALL_SHIFT = 6, /* indirect calls */
164 PERF_SAMPLE_BRANCH_ABORT_TX_SHIFT = 7, /* transaction aborts */
165 PERF_SAMPLE_BRANCH_IN_TX_SHIFT = 8, /* in transaction */
166 PERF_SAMPLE_BRANCH_NO_TX_SHIFT = 9, /* not in transaction */
167 PERF_SAMPLE_BRANCH_COND_SHIFT = 10, /* conditional branches */
168
169 PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT = 11, /* call/ret stack */
170
171 PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
172};
173
155enum perf_branch_sample_type { 174enum perf_branch_sample_type {
156 PERF_SAMPLE_BRANCH_USER = 1U << 0, /* user branches */ 175 PERF_SAMPLE_BRANCH_USER = 1U << PERF_SAMPLE_BRANCH_USER_SHIFT,
157 PERF_SAMPLE_BRANCH_KERNEL = 1U << 1, /* kernel branches */ 176 PERF_SAMPLE_BRANCH_KERNEL = 1U << PERF_SAMPLE_BRANCH_KERNEL_SHIFT,
158 PERF_SAMPLE_BRANCH_HV = 1U << 2, /* hypervisor branches */ 177 PERF_SAMPLE_BRANCH_HV = 1U << PERF_SAMPLE_BRANCH_HV_SHIFT,
159 178
160 PERF_SAMPLE_BRANCH_ANY = 1U << 3, /* any branch types */ 179 PERF_SAMPLE_BRANCH_ANY = 1U << PERF_SAMPLE_BRANCH_ANY_SHIFT,
161 PERF_SAMPLE_BRANCH_ANY_CALL = 1U << 4, /* any call branch */ 180 PERF_SAMPLE_BRANCH_ANY_CALL = 1U << PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT,
162 PERF_SAMPLE_BRANCH_ANY_RETURN = 1U << 5, /* any return branch */ 181 PERF_SAMPLE_BRANCH_ANY_RETURN = 1U << PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT,
163 PERF_SAMPLE_BRANCH_IND_CALL = 1U << 6, /* indirect calls */ 182 PERF_SAMPLE_BRANCH_IND_CALL = 1U << PERF_SAMPLE_BRANCH_IND_CALL_SHIFT,
164 PERF_SAMPLE_BRANCH_ABORT_TX = 1U << 7, /* transaction aborts */ 183 PERF_SAMPLE_BRANCH_ABORT_TX = 1U << PERF_SAMPLE_BRANCH_ABORT_TX_SHIFT,
165 PERF_SAMPLE_BRANCH_IN_TX = 1U << 8, /* in transaction */ 184 PERF_SAMPLE_BRANCH_IN_TX = 1U << PERF_SAMPLE_BRANCH_IN_TX_SHIFT,
166 PERF_SAMPLE_BRANCH_NO_TX = 1U << 9, /* not in transaction */ 185 PERF_SAMPLE_BRANCH_NO_TX = 1U << PERF_SAMPLE_BRANCH_NO_TX_SHIFT,
167 PERF_SAMPLE_BRANCH_COND = 1U << 10, /* conditional branches */ 186 PERF_SAMPLE_BRANCH_COND = 1U << PERF_SAMPLE_BRANCH_COND_SHIFT,
168 187
169 PERF_SAMPLE_BRANCH_MAX = 1U << 11, /* non-ABI */ 188 PERF_SAMPLE_BRANCH_CALL_STACK = 1U << PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT,
189
190 PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
170}; 191};
171 192
172#define PERF_SAMPLE_BRANCH_PLM_ALL \ 193#define PERF_SAMPLE_BRANCH_PLM_ALL \
@@ -240,6 +261,7 @@ enum perf_event_read_format {
240#define PERF_ATTR_SIZE_VER3 96 /* add: sample_regs_user */ 261#define PERF_ATTR_SIZE_VER3 96 /* add: sample_regs_user */
241 /* add: sample_stack_user */ 262 /* add: sample_stack_user */
242#define PERF_ATTR_SIZE_VER4 104 /* add: sample_regs_intr */ 263#define PERF_ATTR_SIZE_VER4 104 /* add: sample_regs_intr */
264#define PERF_ATTR_SIZE_VER5 112 /* add: aux_watermark */
243 265
244/* 266/*
245 * Hardware event_id to monitor via a performance monitoring event: 267 * Hardware event_id to monitor via a performance monitoring event:
@@ -305,7 +327,8 @@ struct perf_event_attr {
305 exclude_callchain_user : 1, /* exclude user callchains */ 327 exclude_callchain_user : 1, /* exclude user callchains */
306 mmap2 : 1, /* include mmap with inode data */ 328 mmap2 : 1, /* include mmap with inode data */
307 comm_exec : 1, /* flag comm events that are due to an exec */ 329 comm_exec : 1, /* flag comm events that are due to an exec */
308 __reserved_1 : 39; 330 use_clockid : 1, /* use @clockid for time fields */
331 __reserved_1 : 38;
309 332
310 union { 333 union {
311 __u32 wakeup_events; /* wakeup every n events */ 334 __u32 wakeup_events; /* wakeup every n events */
@@ -334,8 +357,7 @@ struct perf_event_attr {
334 */ 357 */
335 __u32 sample_stack_user; 358 __u32 sample_stack_user;
336 359
337 /* Align to u64. */ 360 __s32 clockid;
338 __u32 __reserved_2;
339 /* 361 /*
340 * Defines set of regs to dump for each sample 362 * Defines set of regs to dump for each sample
341 * state captured on: 363 * state captured on:
@@ -345,6 +367,12 @@ struct perf_event_attr {
345 * See asm/perf_regs.h for details. 367 * See asm/perf_regs.h for details.
346 */ 368 */
347 __u64 sample_regs_intr; 369 __u64 sample_regs_intr;
370
371 /*
372 * Wakeup watermark for AUX area
373 */
374 __u32 aux_watermark;
375 __u32 __reserved_2; /* align to __u64 */
348}; 376};
349 377
350#define perf_flags(attr) (*(&(attr)->read_format + 1)) 378#define perf_flags(attr) (*(&(attr)->read_format + 1))
@@ -360,6 +388,7 @@ struct perf_event_attr {
360#define PERF_EVENT_IOC_SET_OUTPUT _IO ('$', 5) 388#define PERF_EVENT_IOC_SET_OUTPUT _IO ('$', 5)
361#define PERF_EVENT_IOC_SET_FILTER _IOW('$', 6, char *) 389#define PERF_EVENT_IOC_SET_FILTER _IOW('$', 6, char *)
362#define PERF_EVENT_IOC_ID _IOR('$', 7, __u64 *) 390#define PERF_EVENT_IOC_ID _IOR('$', 7, __u64 *)
391#define PERF_EVENT_IOC_SET_BPF _IOW('$', 8, __u32)
363 392
364enum perf_event_ioc_flags { 393enum perf_event_ioc_flags {
365 PERF_IOC_FLAG_GROUP = 1U << 0, 394 PERF_IOC_FLAG_GROUP = 1U << 0,
@@ -500,9 +529,30 @@ struct perf_event_mmap_page {
500 * In this case the kernel will not over-write unread data. 529 * In this case the kernel will not over-write unread data.
501 * 530 *
502 * See perf_output_put_handle() for the data ordering. 531 * See perf_output_put_handle() for the data ordering.
532 *
533 * data_{offset,size} indicate the location and size of the perf record
534 * buffer within the mmapped area.
503 */ 535 */
504 __u64 data_head; /* head in the data section */ 536 __u64 data_head; /* head in the data section */
505 __u64 data_tail; /* user-space written tail */ 537 __u64 data_tail; /* user-space written tail */
538 __u64 data_offset; /* where the buffer starts */
539 __u64 data_size; /* data buffer size */
540
541 /*
542 * AUX area is defined by aux_{offset,size} fields that should be set
543 * by the userspace, so that
544 *
545 * aux_offset >= data_offset + data_size
546 *
547 * prior to mmap()ing it. Size of the mmap()ed area should be aux_size.
548 *
549 * Ring buffer pointers aux_{head,tail} have the same semantics as
550 * data_{head,tail} and same ordering rules apply.
551 */
552 __u64 aux_head;
553 __u64 aux_tail;
554 __u64 aux_offset;
555 __u64 aux_size;
506}; 556};
507 557
508#define PERF_RECORD_MISC_CPUMODE_MASK (7 << 0) 558#define PERF_RECORD_MISC_CPUMODE_MASK (7 << 0)
@@ -725,6 +775,31 @@ enum perf_event_type {
725 */ 775 */
726 PERF_RECORD_MMAP2 = 10, 776 PERF_RECORD_MMAP2 = 10,
727 777
778 /*
779 * Records that new data landed in the AUX buffer part.
780 *
781 * struct {
782 * struct perf_event_header header;
783 *
784 * u64 aux_offset;
785 * u64 aux_size;
786 * u64 flags;
787 * struct sample_id sample_id;
788 * };
789 */
790 PERF_RECORD_AUX = 11,
791
792 /*
793 * Indicates that instruction trace has started
794 *
795 * struct {
796 * struct perf_event_header header;
797 * u32 pid;
798 * u32 tid;
799 * };
800 */
801 PERF_RECORD_ITRACE_START = 12,
802
728 PERF_RECORD_MAX, /* non-ABI */ 803 PERF_RECORD_MAX, /* non-ABI */
729}; 804};
730 805
@@ -742,6 +817,12 @@ enum perf_callchain_context {
742 PERF_CONTEXT_MAX = (__u64)-4095, 817 PERF_CONTEXT_MAX = (__u64)-4095,
743}; 818};
744 819
820/**
821 * PERF_RECORD_AUX::flags bits
822 */
823#define PERF_AUX_FLAG_TRUNCATED 0x01 /* record was truncated to fit */
824#define PERF_AUX_FLAG_OVERWRITE 0x02 /* snapshot from overwrite mode */
825
745#define PERF_FLAG_FD_NO_GROUP (1UL << 0) 826#define PERF_FLAG_FD_NO_GROUP (1UL << 0)
746#define PERF_FLAG_FD_OUTPUT (1UL << 1) 827#define PERF_FLAG_FD_OUTPUT (1UL << 1)
747#define PERF_FLAG_PID_CGROUP (1UL << 2) /* pid=cgroup id, per-cpu mode only */ 828#define PERF_FLAG_PID_CGROUP (1UL << 2) /* pid=cgroup id, per-cpu mode only */