author	Linus Torvalds <torvalds@linux-foundation.org>	2015-04-14 17:37:47 -0400
committer	Linus Torvalds <torvalds@linux-foundation.org>	2015-04-14 17:37:47 -0400
commit	6c8a53c9e6a151fffb07f8b4c34dddd467 (patch)
tree	791caf826ef136c521a97b7878f226b6ba1c1d75 /include
parent	e95e7f627062be5e6ce971ce873e6234c91ffc50 (diff)
parent	066450be419fa48007a9f29e19828f2a86198754 (diff)
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "Core kernel changes:

   - One of the more interesting features in this cycle is the ability
     to attach eBPF programs (user-defined, sandboxed bytecode executed
     by the kernel) to kprobes.

     This allows user-defined instrumentation on a live kernel image
     that can never crash, hang or interfere with the kernel negatively.
     (Right now it's limited to root-only, but in the future we might
     allow unprivileged use as well.)

     (Alexei Starovoitov)

   - Another non-trivial feature is per event clockid support: this
     allows, amongst other things, the selection of different clock
     sources for event timestamps traced via perf.

     This feature is sought by people who'd like to merge perf generated
     events with external events that were measured with different
     clocks:

       - cluster wide profiling

       - for system wide tracing with user-space events,

       - JIT profiling events

     etc.  Matching perf tooling support is added as well, available via
     the -k, --clockid <clockid> parameter to perf record et al.

     (Peter Zijlstra)

  Hardware enablement kernel changes:

   - x86 Intel Processor Trace (PT) support: which is a hardware tracer
     on steroids, available on Broadwell CPUs.

     The hardware trace stream is directly output into the user-space
     ring-buffer, using the 'AUX' data format extension that was added
     to the perf core to support hardware constraints such as the
     necessity to have the tracing buffer physically contiguous.

     This patch-set was developed for two years and this is the result.
     A simple way to make use of this is to use BTS tracing, the PT
     driver emulates BTS output - available via the 'intel_bts' PMU.
     More explicit PT specific tooling support is in the works as well -
     will probably be ready by 4.2.

     (Alexander Shishkin, Peter Zijlstra)

   - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware
     feature of Intel Xeon CPUs that allows the measurement and
     allocation/partitioning of caches to individual workloads.

     These kernel changes expose the measurement side as a new PMU
     driver, which exposes various QoS related PMU events. (The
     partitioning change is work in progress and is planned to be merged
     as a cgroup extension.)

     (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P
     Waskiewicz Jr)

   - x86 Intel Haswell LBR call stack support: this is a new Haswell
     feature that allows the hardware recording of call chains, plus
     tooling support.  To activate this feature you have to enable it
     via the new 'lbr' call-graph recording option:

        perf record --call-graph lbr
        perf report

     or:

        perf top --call-graph lbr

     This hardware feature is a lot faster than stack walk or dwarf
     based unwinding, but has some limitations:

       - It reuses the current LBR facility, so LBR call stack and
         branch record can not be enabled at the same time.

       - It is only available for user-space callchains.

     (Yan, Zheng)

   - x86 Intel Broadwell CPU support and various event constraints and
     event table fixes for earlier models.

     (Andi Kleen)

   - x86 Intel HT CPUs event scheduling workarounds.  This is a complex
     CPU bug affecting the SNB,IVB,HSW families that results in counter
     value corruption.  The mitigation code is automatically enabled and
     is transparent.

     (Maria Dimakopoulou, Stephane Eranian)

  The perf tooling side had a ton of changes in this cycle as well, so
  I'm only able to list the user visible changes here, in addition to
  the tooling changes outlined above:

  User visible changes affecting all tools:

   - Improve support of compressed kernel modules (Jiri Olsa)

   - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo)

   - Bash completion for subcommands (Yunlong Song)

   - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa)

   - Support missing -f to override perf.data file ownership. (Yunlong Song)

   - Show the first event with an invalid filter (David Ahern, Arnaldo
     Carvalho de Melo)

  User visible changes in individual tools:

  'perf data':

   - New tool for converting perf.data to other formats, initially for
     the CTF (Common Trace Format) from LTTng (Jiri Olsa, Sebastian
     Siewior)

  'perf diff':

   - Add --kallsyms option (David Ahern)

  'perf list':

   - Allow listing events with 'tracepoint' prefix (Yunlong Song)

   - Sort the output of the command (Yunlong Song)

  'perf kmem':

   - Respect -i option (Jiri Olsa)

   - Print big numbers using thousands' group (Namhyung Kim)

   - Allow -v option (Namhyung Kim)

   - Fix alignment of slab result table (Namhyung Kim)

  'perf probe':

   - Support multiple probes on different binaries on the same command
     line (Masami Hiramatsu)

   - Support unnamed union/structure members data collection. (Masami
     Hiramatsu)

   - Check kprobes blacklist when adding new events. (Masami Hiramatsu)

  'perf record':

   - Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra)

   - Support recording running/enabled time (Andi Kleen)

  'perf sched':

   - Improve the performance of 'perf sched replay' on high CPU core
     count machines (Yunlong Song)

  'perf report' and 'perf top':

   - Allow annotating entries in callchains in the hists browser
     (Arnaldo Carvalho de Melo)

   - Indicate which callchain entries are annotated in the TUI hists
     browser (Arnaldo Carvalho de Melo)

   - Add pid/tid filtering to 'report' and 'script' commands (David
     Ahern)

   - Consider PERF_RECORD_ events with cpumode == 0 in 'perf top',
     removing one cause of long term memory usage buildup, i.e. not
     processing PERF_RECORD_EXIT events (Arnaldo Carvalho de Melo)

  'perf stat':

   - Report unsupported events properly (Suzuki K. Poulose)

   - Output running time and run/enabled ratio in CSV mode (Andi Kleen)

  'perf trace':

   - Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho
     de Melo)

   - Only insert blank duration bracket when tracing syscalls (Arnaldo
     Carvalho de Melo)

   - Filter out the trace pid when no threads are specified (Arnaldo
     Carvalho de Melo)

   - Dump stack on segfaults (Arnaldo Carvalho de Melo)

   - No need to explicitly enable evsels for workload started from perf,
     let it be enabled via perf_event_attr.enable_on_exec, removing some
     events that take place in the 'perf trace' before a workload is
     really started by it. (Arnaldo Carvalho de Melo)

   - Allow mixing with tracepoints and suppressing plain syscalls.
     (Arnaldo Carvalho de Melo)

  There's also been a ton of infrastructure work done, such as the
  split-out of perf's build system into tools/build/ and other changes -
  see the shortlog and changelog for details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits)
  perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init()
  perf evlist: Fix type for references to data_head/tail
  perf probe: Check the orphaned -x option
  perf probe: Support multiple probes on different binaries
  perf buildid-list: Fix segfault when show DSOs with hits
  perf tools: Fix cross-endian analysis
  perf tools: Fix error path to do closedir() when synthesizing threads
  perf tools: Fix synthesizing fork_event.ppid for non-main thread
  perf tools: Add 'I' event modifier for exclude_idle bit
  perf report: Don't call map__kmap if map is NULL.
  perf tests: Fix attr tests
  perf probe: Fix ARM 32 building error
  perf tools: Merge all perf_event_attr print functions
  perf record: Add clockid parameter
  perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10
  perf sched replay: Support using -f to override perf.data file ownership
  perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files
  perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task
  perf sched replay: Fix the segmentation fault problem caused by pr_err in threads
  perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations
  ...
Diffstat (limited to 'include')
 include/linux/bpf.h             |  20
 include/linux/ftrace_event.h    |  14
 include/linux/perf_event.h      | 121
 include/linux/watchdog.h        |   8
 include/uapi/linux/bpf.h        |   5
 include/uapi/linux/perf_event.h | 115
6 files changed, 256 insertions(+), 27 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bbfceb756452..c2e21113ecc0 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -113,8 +113,6 @@ struct bpf_prog_type_list {
 	enum bpf_prog_type type;
 };
 
-void bpf_register_prog_type(struct bpf_prog_type_list *tl);
-
 struct bpf_prog;
 
 struct bpf_prog_aux {
@@ -129,11 +127,25 @@ struct bpf_prog_aux {
 };
 
 #ifdef CONFIG_BPF_SYSCALL
+void bpf_register_prog_type(struct bpf_prog_type_list *tl);
+
 void bpf_prog_put(struct bpf_prog *prog);
+struct bpf_prog *bpf_prog_get(u32 ufd);
 #else
-static inline void bpf_prog_put(struct bpf_prog *prog) {}
+static inline void bpf_register_prog_type(struct bpf_prog_type_list *tl)
+{
+}
+
+static inline struct bpf_prog *bpf_prog_get(u32 ufd)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+
+static inline void bpf_prog_put(struct bpf_prog *prog)
+{
+}
 #endif
-struct bpf_prog *bpf_prog_get(u32 ufd);
+
 /* verify correctness of eBPF program */
 int bpf_check(struct bpf_prog *fp, union bpf_attr *attr);
 
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 112cf49d9576..46e83c2156c6 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -13,6 +13,7 @@ struct trace_array;
 struct trace_buffer;
 struct tracer;
 struct dentry;
+struct bpf_prog;
 
 struct trace_print_flags {
 	unsigned long		mask;
@@ -252,6 +253,7 @@ enum {
 	TRACE_EVENT_FL_WAS_ENABLED_BIT,
 	TRACE_EVENT_FL_USE_CALL_FILTER_BIT,
 	TRACE_EVENT_FL_TRACEPOINT_BIT,
+	TRACE_EVENT_FL_KPROBE_BIT,
 };
 
 /*
@@ -265,6 +267,7 @@ enum {
  *  it is best to clear the buffers that used it).
  *  USE_CALL_FILTER - For ftrace internal events, don't use file filter
  *  TRACEPOINT    - Event is a tracepoint
+ *  KPROBE        - Event is a kprobe
  */
 enum {
 	TRACE_EVENT_FL_FILTERED		= (1 << TRACE_EVENT_FL_FILTERED_BIT),
@@ -274,6 +277,7 @@ enum {
 	TRACE_EVENT_FL_WAS_ENABLED	= (1 << TRACE_EVENT_FL_WAS_ENABLED_BIT),
 	TRACE_EVENT_FL_USE_CALL_FILTER	= (1 << TRACE_EVENT_FL_USE_CALL_FILTER_BIT),
 	TRACE_EVENT_FL_TRACEPOINT	= (1 << TRACE_EVENT_FL_TRACEPOINT_BIT),
+	TRACE_EVENT_FL_KPROBE		= (1 << TRACE_EVENT_FL_KPROBE_BIT),
 };
 
 struct ftrace_event_call {
@@ -303,6 +307,7 @@ struct ftrace_event_call {
 #ifdef CONFIG_PERF_EVENTS
 	int				perf_refcount;
 	struct hlist_head __percpu	*perf_events;
+	struct bpf_prog			*prog;
 
 	int	(*perf_perm)(struct ftrace_event_call *,
 			     struct perf_event *);
@@ -548,6 +553,15 @@ event_trigger_unlock_commit_regs(struct ftrace_event_file *file,
 		event_triggers_post_call(file, tt);
 }
 
+#ifdef CONFIG_BPF_SYSCALL
+unsigned int trace_call_bpf(struct bpf_prog *prog, void *ctx);
+#else
+static inline unsigned int trace_call_bpf(struct bpf_prog *prog, void *ctx)
+{
+	return 1;
+}
+#endif
+
 enum {
 	FILTER_OTHER = 0,
 	FILTER_STATIC_STRING,
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2b621982938d..61992cf2e977 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -53,6 +53,7 @@ struct perf_guest_info_callbacks {
 #include <linux/sysfs.h>
 #include <linux/perf_regs.h>
 #include <linux/workqueue.h>
+#include <linux/cgroup.h>
 #include <asm/local.h>
 
 struct perf_callchain_entry {
@@ -118,10 +119,19 @@ struct hw_perf_event {
 			struct hrtimer	hrtimer;
 		};
 		struct { /* tracepoint */
-			struct task_struct	*tp_target;
 			/* for tp_event->class */
 			struct list_head	tp_list;
 		};
+		struct { /* intel_cqm */
+			int			cqm_state;
+			int			cqm_rmid;
+			struct list_head	cqm_events_entry;
+			struct list_head	cqm_groups_entry;
+			struct list_head	cqm_group_entry;
+		};
+		struct { /* itrace */
+			int			itrace_started;
+		};
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 		struct { /* breakpoint */
 			/*
@@ -129,12 +139,12 @@ struct hw_perf_event {
 			 * problem hw_breakpoint has with context
 			 * creation and event initalization.
 			 */
-			struct task_struct		*bp_target;
 			struct arch_hw_breakpoint	info;
 			struct list_head		bp_list;
 		};
 #endif
 	};
+	struct task_struct		*target;
 	int				state;
 	local64_t			prev_count;
 	u64				sample_period;
@@ -166,6 +176,11 @@ struct perf_event;
  * pmu::capabilities flags
  */
 #define PERF_PMU_CAP_NO_INTERRUPT		0x01
+#define PERF_PMU_CAP_NO_NMI			0x02
+#define PERF_PMU_CAP_AUX_NO_SG			0x04
+#define PERF_PMU_CAP_AUX_SW_DOUBLEBUF		0x08
+#define PERF_PMU_CAP_EXCLUSIVE			0x10
+#define PERF_PMU_CAP_ITRACE			0x20
 
 /**
  * struct pmu - generic performance monitoring unit
@@ -186,6 +201,7 @@ struct pmu {
 
 	int * __percpu			pmu_disable_count;
 	struct perf_cpu_context * __percpu pmu_cpu_context;
+	atomic_t			exclusive_cnt; /* < 0: cpu; > 0: tsk */
 	int				task_ctx_nr;
 	int				hrtimer_interval_ms;
 
@@ -262,9 +278,32 @@ struct pmu {
 	int (*event_idx)		(struct perf_event *event); /*optional */
 
 	/*
-	 * flush branch stack on context-switches (needed in cpu-wide mode)
+	 * context-switches callback
+	 */
+	void (*sched_task)		(struct perf_event_context *ctx,
+					bool sched_in);
+	/*
+	 * PMU specific data size
+	 */
+	size_t				task_ctx_size;
+
+
+	/*
+	 * Return the count value for a counter.
+	 */
+	u64 (*count)			(struct perf_event *event); /*optional*/
+
+	/*
+	 * Set up pmu-private data structures for an AUX area
 	 */
-	void (*flush_branch_stack)	(void);
+	void *(*setup_aux)		(int cpu, void **pages,
+					 int nr_pages, bool overwrite);
+					/* optional */
+
+	/*
+	 * Free pmu-private AUX data structures
+	 */
+	void (*free_aux)		(void *aux); /* optional */
 };
 
 /**
@@ -300,6 +339,7 @@ struct swevent_hlist {
 #define PERF_ATTACH_CONTEXT	0x01
 #define PERF_ATTACH_GROUP	0x02
 #define PERF_ATTACH_TASK	0x04
+#define PERF_ATTACH_TASK_DATA	0x08
 
 struct perf_cgroup;
 struct ring_buffer;
@@ -438,6 +478,7 @@ struct perf_event {
 	struct pid_namespace		*ns;
 	u64				id;
 
+	u64				(*clock)(void);
 	perf_overflow_handler_t		overflow_handler;
 	void				*overflow_handler_context;
 
@@ -504,7 +545,7 @@ struct perf_event_context {
 	u64				generation;
 	int				pin_count;
 	int				nr_cgroups;	 /* cgroup evts */
-	int				nr_branch_stack; /* branch_stack evt */
+	void				*task_ctx_data; /* pmu specific data */
 	struct rcu_head			rcu_head;
 
 	struct delayed_work		orphans_remove;
@@ -536,12 +577,52 @@ struct perf_output_handle {
 	struct ring_buffer		*rb;
 	unsigned long			wakeup;
 	unsigned long			size;
-	void				*addr;
+	union {
+		void			*addr;
+		unsigned long		head;
+	};
 	int				page;
 };
 
+#ifdef CONFIG_CGROUP_PERF
+
+/*
+ * perf_cgroup_info keeps track of time_enabled for a cgroup.
+ * This is a per-cpu dynamically allocated data structure.
+ */
+struct perf_cgroup_info {
+	u64				time;
+	u64				timestamp;
+};
+
+struct perf_cgroup {
+	struct cgroup_subsys_state	css;
+	struct perf_cgroup_info	__percpu *info;
+};
+
+/*
+ * Must ensure cgroup is pinned (css_get) before calling
+ * this function. In other words, we cannot call this function
+ * if there is no cgroup event for the current CPU context.
+ */
+static inline struct perf_cgroup *
+perf_cgroup_from_task(struct task_struct *task)
+{
+	return container_of(task_css(task, perf_event_cgrp_id),
+			    struct perf_cgroup, css);
+}
+#endif /* CONFIG_CGROUP_PERF */
+
 #ifdef CONFIG_PERF_EVENTS
 
+extern void *perf_aux_output_begin(struct perf_output_handle *handle,
+				   struct perf_event *event);
+extern void perf_aux_output_end(struct perf_output_handle *handle,
+				unsigned long size, bool truncated);
+extern int perf_aux_output_skip(struct perf_output_handle *handle,
+				unsigned long size);
+extern void *perf_get_aux(struct perf_output_handle *handle);
+
 extern int perf_pmu_register(struct pmu *pmu, const char *name, int type);
 extern void perf_pmu_unregister(struct pmu *pmu);
 
@@ -558,6 +639,8 @@ extern void perf_event_delayed_put(struct task_struct *task);
 extern void perf_event_print_debug(void);
 extern void perf_pmu_disable(struct pmu *pmu);
 extern void perf_pmu_enable(struct pmu *pmu);
+extern void perf_sched_cb_dec(struct pmu *pmu);
+extern void perf_sched_cb_inc(struct pmu *pmu);
 extern int perf_event_task_disable(void);
 extern int perf_event_task_enable(void);
 extern int perf_event_refresh(struct perf_event *event, int refresh);
@@ -731,6 +814,11 @@ static inline void perf_event_task_sched_out(struct task_struct *prev,
 		__perf_event_task_sched_out(prev, next);
 }
 
+static inline u64 __perf_event_count(struct perf_event *event)
+{
+	return local64_read(&event->count) + atomic64_read(&event->child_count);
+}
+
 extern void perf_event_mmap(struct vm_area_struct *vma);
 extern struct perf_guest_info_callbacks *perf_guest_cbs;
 extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
@@ -800,6 +888,16 @@ static inline bool has_branch_stack(struct perf_event *event)
 	return event->attr.sample_type & PERF_SAMPLE_BRANCH_STACK;
 }
 
+static inline bool needs_branch_stack(struct perf_event *event)
+{
+	return event->attr.branch_sample_type != 0;
+}
+
+static inline bool has_aux(struct perf_event *event)
+{
+	return event->pmu->setup_aux;
+}
+
 extern int perf_output_begin(struct perf_output_handle *handle,
 			     struct perf_event *event, unsigned int size);
 extern void perf_output_end(struct perf_output_handle *handle);
@@ -815,6 +913,17 @@ extern void perf_event_disable(struct perf_event *event);
 extern int __perf_event_disable(void *info);
 extern void perf_event_task_tick(void);
 #else /* !CONFIG_PERF_EVENTS: */
+static inline void *
+perf_aux_output_begin(struct perf_output_handle *handle,
+		      struct perf_event *event)		{ return NULL; }
+static inline void
+perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,
+		    bool truncated)			{ }
+static inline int
+perf_aux_output_skip(struct perf_output_handle *handle,
+		     unsigned long size)		{ return -EINVAL; }
+static inline void *
+perf_get_aux(struct perf_output_handle *handle)		{ return NULL; }
 static inline void
 perf_event_task_sched_in(struct task_struct *prev,
 			 struct task_struct *task)	{ }
diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
index 395b70e0eccf..a746bf5216f8 100644
--- a/include/linux/watchdog.h
+++ b/include/linux/watchdog.h
@@ -137,4 +137,12 @@ extern int watchdog_init_timeout(struct watchdog_device *wdd,
 extern int watchdog_register_device(struct watchdog_device *);
 extern void watchdog_unregister_device(struct watchdog_device *);
 
+#ifdef CONFIG_HARDLOCKUP_DETECTOR
+void watchdog_nmi_disable_all(void);
+void watchdog_nmi_enable_all(void);
+#else
+static inline void watchdog_nmi_disable_all(void) {}
+static inline void watchdog_nmi_enable_all(void) {}
+#endif
+
 #endif /* ifndef _LINUX_WATCHDOG_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 45da7ec7d274..cc47ef41076a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -118,6 +118,7 @@ enum bpf_map_type {
 enum bpf_prog_type {
 	BPF_PROG_TYPE_UNSPEC,
 	BPF_PROG_TYPE_SOCKET_FILTER,
+	BPF_PROG_TYPE_KPROBE,
 };
 
 /* flags for BPF_MAP_UPDATE_ELEM command */
@@ -151,6 +152,7 @@ union bpf_attr {
 		__u32		log_level;	/* verbosity level of verifier */
 		__u32		log_size;	/* size of user buffer */
 		__aligned_u64	log_buf;	/* user supplied buffer */
+		__u32		kern_version;	/* checked when prog_type=kprobe */
 	};
 } __attribute__((aligned(8)));
 
@@ -162,6 +164,9 @@ enum bpf_func_id {
 	BPF_FUNC_map_lookup_elem, /* void *map_lookup_elem(&map, &key) */
 	BPF_FUNC_map_update_elem, /* int map_update_elem(&map, &key, &value, flags) */
 	BPF_FUNC_map_delete_elem, /* int map_delete_elem(&map, &key) */
+	BPF_FUNC_probe_read,      /* int bpf_probe_read(void *dst, int size, void *src) */
+	BPF_FUNC_ktime_get_ns,    /* u64 bpf_ktime_get_ns(void) */
+	BPF_FUNC_trace_printk,    /* int bpf_trace_printk(const char *fmt, int fmt_size, ...) */
 	__BPF_FUNC_MAX_ID,
 };
 
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 9b79abbd1ab8..309211b3eb67 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -152,21 +152,42 @@ enum perf_event_sample_format {
  * The branch types can be combined, however BRANCH_ANY covers all types
  * of branches and therefore it supersedes all the other types.
  */
+enum perf_branch_sample_type_shift {
+	PERF_SAMPLE_BRANCH_USER_SHIFT		= 0, /* user branches */
+	PERF_SAMPLE_BRANCH_KERNEL_SHIFT		= 1, /* kernel branches */
+	PERF_SAMPLE_BRANCH_HV_SHIFT		= 2, /* hypervisor branches */
+
+	PERF_SAMPLE_BRANCH_ANY_SHIFT		= 3, /* any branch types */
+	PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT	= 4, /* any call branch */
+	PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT	= 5, /* any return branch */
+	PERF_SAMPLE_BRANCH_IND_CALL_SHIFT	= 6, /* indirect calls */
+	PERF_SAMPLE_BRANCH_ABORT_TX_SHIFT	= 7, /* transaction aborts */
+	PERF_SAMPLE_BRANCH_IN_TX_SHIFT		= 8, /* in transaction */
+	PERF_SAMPLE_BRANCH_NO_TX_SHIFT		= 9, /* not in transaction */
+	PERF_SAMPLE_BRANCH_COND_SHIFT		= 10, /* conditional branches */
+
+	PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT	= 11, /* call/ret stack */
+
+	PERF_SAMPLE_BRANCH_MAX_SHIFT		/* non-ABI */
+};
+
 enum perf_branch_sample_type {
-	PERF_SAMPLE_BRANCH_USER		= 1U << 0, /* user branches */
-	PERF_SAMPLE_BRANCH_KERNEL	= 1U << 1, /* kernel branches */
-	PERF_SAMPLE_BRANCH_HV		= 1U << 2, /* hypervisor branches */
+	PERF_SAMPLE_BRANCH_USER		= 1U << PERF_SAMPLE_BRANCH_USER_SHIFT,
+	PERF_SAMPLE_BRANCH_KERNEL	= 1U << PERF_SAMPLE_BRANCH_KERNEL_SHIFT,
+	PERF_SAMPLE_BRANCH_HV		= 1U << PERF_SAMPLE_BRANCH_HV_SHIFT,
 
-	PERF_SAMPLE_BRANCH_ANY		= 1U << 3, /* any branch types */
-	PERF_SAMPLE_BRANCH_ANY_CALL	= 1U << 4, /* any call branch */
-	PERF_SAMPLE_BRANCH_ANY_RETURN	= 1U << 5, /* any return branch */
-	PERF_SAMPLE_BRANCH_IND_CALL	= 1U << 6, /* indirect calls */
-	PERF_SAMPLE_BRANCH_ABORT_TX	= 1U << 7, /* transaction aborts */
-	PERF_SAMPLE_BRANCH_IN_TX	= 1U << 8, /* in transaction */
-	PERF_SAMPLE_BRANCH_NO_TX	= 1U << 9, /* not in transaction */
-	PERF_SAMPLE_BRANCH_COND		= 1U << 10, /* conditional branches */
+	PERF_SAMPLE_BRANCH_ANY		= 1U << PERF_SAMPLE_BRANCH_ANY_SHIFT,
+	PERF_SAMPLE_BRANCH_ANY_CALL	= 1U << PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT,
+	PERF_SAMPLE_BRANCH_ANY_RETURN	= 1U << PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT,
+	PERF_SAMPLE_BRANCH_IND_CALL	= 1U << PERF_SAMPLE_BRANCH_IND_CALL_SHIFT,
+	PERF_SAMPLE_BRANCH_ABORT_TX	= 1U << PERF_SAMPLE_BRANCH_ABORT_TX_SHIFT,
+	PERF_SAMPLE_BRANCH_IN_TX	= 1U << PERF_SAMPLE_BRANCH_IN_TX_SHIFT,
+	PERF_SAMPLE_BRANCH_NO_TX	= 1U << PERF_SAMPLE_BRANCH_NO_TX_SHIFT,
+	PERF_SAMPLE_BRANCH_COND		= 1U << PERF_SAMPLE_BRANCH_COND_SHIFT,
 
-	PERF_SAMPLE_BRANCH_MAX		= 1U << 11, /* non-ABI */
+	PERF_SAMPLE_BRANCH_CALL_STACK	= 1U << PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT,
+
+	PERF_SAMPLE_BRANCH_MAX		= 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
 };
 
 #define PERF_SAMPLE_BRANCH_PLM_ALL \
@@ -240,6 +261,7 @@ enum perf_event_read_format {
 #define PERF_ATTR_SIZE_VER3	96	/* add: sample_regs_user */
 					/* add: sample_stack_user */
 #define PERF_ATTR_SIZE_VER4	104	/* add: sample_regs_intr */
+#define PERF_ATTR_SIZE_VER5	112	/* add: aux_watermark */
 
 /*
  * Hardware event_id to monitor via a performance monitoring event:
@@ -305,7 +327,8 @@ struct perf_event_attr {
 				exclude_callchain_user : 1, /* exclude user callchains */
 				mmap2          :  1, /* include mmap with inode data */
 				comm_exec      :  1, /* flag comm events that are due to an exec */
-				__reserved_1   : 39;
+				use_clockid    :  1, /* use @clockid for time fields */
+				__reserved_1   : 38;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -334,8 +357,7 @@ struct perf_event_attr {
 	 */
 	__u32	sample_stack_user;
 
-	/* Align to u64. */
-	__u32	__reserved_2;
+	__s32	clockid;
 	/*
 	 * Defines set of regs to dump for each sample
 	 * state captured on:
@@ -345,6 +367,12 @@ struct perf_event_attr {
 	 * See asm/perf_regs.h for details.
 	 */
 	__u64	sample_regs_intr;
+
+	/*
+	 * Wakeup watermark for AUX area
+	 */
+	__u32	aux_watermark;
+	__u32	__reserved_2;	/* align to __u64 */
 };
 
 #define perf_flags(attr)	(*(&(attr)->read_format + 1))
@@ -360,6 +388,7 @@ struct perf_event_attr {
 #define PERF_EVENT_IOC_SET_OUTPUT	_IO ('$', 5)
 #define PERF_EVENT_IOC_SET_FILTER	_IOW('$', 6, char *)
 #define PERF_EVENT_IOC_ID		_IOR('$', 7, __u64 *)
+#define PERF_EVENT_IOC_SET_BPF		_IOW('$', 8, __u32)
 
 enum perf_event_ioc_flags {
 	PERF_IOC_FLAG_GROUP		= 1U << 0,
@@ -500,9 +529,30 @@ struct perf_event_mmap_page {
 	 * In this case the kernel will not over-write unread data.
 	 *
 	 * See perf_output_put_handle() for the data ordering.
+	 *
+	 * data_{offset,size} indicate the location and size of the perf record
+	 * buffer within the mmapped area.
 	 */
 	__u64   data_head;		/* head in the data section */
 	__u64	data_tail;		/* user-space written tail */
+	__u64	data_offset;		/* where the buffer starts */
+	__u64	data_size;		/* data buffer size */
+
+	/*
+	 * AUX area is defined by aux_{offset,size} fields that should be set
+	 * by the userspace, so that
+	 *
+	 *   aux_offset >= data_offset + data_size
+	 *
+	 * prior to mmap()ing it. Size of the mmap()ed area should be aux_size.
+	 *
+	 * Ring buffer pointers aux_{head,tail} have the same semantics as
+	 * data_{head,tail} and same ordering rules apply.
+	 */
+	__u64	aux_head;
+	__u64	aux_tail;
+	__u64	aux_offset;
+	__u64	aux_size;
 };
 
 #define PERF_RECORD_MISC_CPUMODE_MASK		(7 << 0)
@@ -725,6 +775,31 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_MMAP2			= 10,
 
+	/*
+	 * Records that new data landed in the AUX buffer part.
+	 *
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *
+	 *	u64				aux_offset;
+	 *	u64				aux_size;
+	 *	u64				flags;
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_AUX				= 11,
+
+	/*
+	 * Indicates that instruction trace has started
+	 *
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u32				pid;
+	 *	u32				tid;
+	 * };
+	 */
+	PERF_RECORD_ITRACE_START		= 12,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
@@ -742,6 +817,12 @@ enum perf_callchain_context {
 	PERF_CONTEXT_MAX		= (__u64)-4095,
 };
 
+/**
+ * PERF_RECORD_AUX::flags bits
+ */
+#define PERF_AUX_FLAG_TRUNCATED		0x01	/* record was truncated to fit */
+#define PERF_AUX_FLAG_OVERWRITE		0x02	/* snapshot from overwrite mode */
+
 #define PERF_FLAG_FD_NO_GROUP		(1UL << 0)
 #define PERF_FLAG_FD_OUTPUT		(1UL << 1)
 #define PERF_FLAG_PID_CGROUP		(1UL << 2) /* pid=cgroup id, per-cpu mode only */