author     Linus Torvalds <torvalds@linux-foundation.org>    2012-10-01 13:28:49 -0400
committer  Linus Torvalds <torvalds@linux-foundation.org>    2012-10-01 13:28:49 -0400
commit     7e92daaefa68e5ef1e1732e45231e73adbb724e7 (patch)
tree       8e7f8ac9d82654df4c65939c6682f95510e22977 /kernel/events
parent     7a68294278ae714ce2632a54f0f46916dca64f56 (diff)
parent     1d787d37c8ff6612b8151c6dff15bfa7347bcbdf (diff)
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf update from Ingo Molnar:
 "Lots of changes in this cycle as well, with hundreds of commits from
  over 30 contributors.  Most of the activity was on the tooling side.

  Higher level changes:

   - New 'perf kvm' analysis tool, from Xiao Guangrong.

   - New 'perf trace' system-wide tracing tool.

   - uprobes fixes + cleanups from Oleg Nesterov.

   - Lots of patches to make perf build on Android out of box, from
     Irina Tirdea.

   - Extend the ftrace function tracing utility to be more dynamic for
     its users.  It allows for data passing to the callback functions,
     as well as reading regs as if a breakpoint were to trigger at
     function entry.  The main goal of this patch series was to allow
     kprobes to use ftrace as an optimized probe point when a probe is
     placed on an ftrace nop.  With lots of help from Masami Hiramatsu,
     and going through lots of iterations, we finally came up with a
     good solution.

   - Add cpumask for uncore pmu, use it in 'stat', from Yan, Zheng.

   - Various tracing updates from Steve Rostedt.

   - Clean up and improve 'perf sched' performance by eliminating lots
     of needless calls to libtraceevent.

   - Event group parsing support, from Jiri Olsa.

   - UI/gtk refactorings and improvements from Namhyung Kim.

   - Add support for non-tracepoint events in perf script python, from
     Feng Tang.

   - Add --symbols to 'script', similar to the one in 'report', from
     Feng Tang.

  Infrastructure enhancements and fixes:

   - Convert the trace builtins to use the growing evsel/evlist
     tracepoint infrastructure, removing several open coded constructs
     like switch-like series of strcmp to dispatch events, etc.
     Basically what had already been showcased in 'perf sched'.

   - Add evsel constructor for tracepoints, that uses libtraceevent
     just to parse the /format events file, use it in a new 'perf test'
     to make sure the libtraceevent format parsing regressions can be
     more readily caught.

   - Some strange errors were happening in some builds, but not on the
     next, reported by several people; the problem was that some parser
     related files, generated during the build, didn't have proper make
     deps, fix from Eric Sandeen.

   - Introduce struct and cache information about the environment where
     a perf.data file was captured, from Namhyung Kim.

   - Fix handling of unresolved samples when --symbols is used in
     'report', from Feng Tang.

   - Add union member access support to 'probe', from Hyeoncheol Lee.

   - Fixups to die() removal, from Namhyung Kim.

   - Render fixes for the TUI, from Namhyung Kim.

   - Don't enable annotation in non symbolic view, from Namhyung Kim.

   - Fix pipe mode in 'report', from Namhyung Kim.

   - Move related stats code from stat to util/, will be used by the
     'stat' kvm tool, from Xiao Guangrong.

   - Remove die()/exit() calls from several tools.

   - Resolve vdso callchains, from Jiri Olsa.

   - Don't pass const char pointers to basename, so that we can
     unconditionally use libgen.h and thus avoid ifdef BIONIC lines,
     from David Ahern.

   - Refactor hist formatting so that it can be reused with the GTK
     browser, from Namhyung Kim.

   - Fix build for another rbtree.c change, from Adrian Hunter.

   - Make 'perf diff' command work with evsel hists, from Jiri Olsa.

   - Use the only field_sep var that is set up: symbol_conf.field_sep,
     fix from Jiri Olsa.

   - .gitignore compiled python binaries, from Namhyung Kim.

   - Get rid of die() in more libtraceevent places, from Namhyung Kim.

   - Rename libtraceevent 'private' struct member to 'priv' so that it
     works in C++, from Steven Rostedt.

   - Remove lots of exit()/die() calls from tools so that the main perf
     exit routine can take place, from David Ahern.

   - Fix x86 build on x86-64, from David Ahern.

   - {int,str,rb}list fixes from Suzuki K Poulose.

   - perf.data header fixes from Namhyung Kim.

   - Allow user to indicate objdump path, needed in cross environments,
     from Maciek Borzecki.

   - Fix hardware cache event name generation, fix from Jiri Olsa.

   - Add round trip test for sw, hw and cache event names, catching the
     problem Jiri fixed; after Jiri's patch, the test passes
     successfully.

   - Clean target should do clean for lib/traceevent too, fix from
     David Ahern.

   - Check the right variable for allocation failure, fix from
     Namhyung Kim.

   - Set up evsel->tp_format regardless of evsel->name being set
     already, fix from Namhyung Kim.

   - Oprofile fixes from Robert Richter.

   - Remove perf_event_attr needless version inflation, from Jiri Olsa.

   - Introduce libtraceevent strerror-like error reporting facility,
     from Namhyung Kim.

   - Add pmu mappings to perf.data header and use event names from cmd
     line, from Robert Richter.

   - Fix include order for bison/flex-generated C files, from Ben
     Hutchings.

   - Build fixes and documentation corrections from David Ahern.

   - Assorted cleanups from Robert Richter.

   - Let O= makes handle relative paths, from Steven Rostedt.

   - perf script python fixes, from Feng Tang.

   - Initial bash completion support, from Frederic Weisbecker.

   - Allow building without libelf, from Namhyung Kim.

   - Support DWARF CFI based unwind to have callchains when %bp based
     unwinding is not possible, from Jiri Olsa.

   - Symbol resolution fixes; while fixing support for PPC64 files with
     an .opt ELF section was the end goal, several fixes for code that
     handles all architectures and cleanups are included, from Cody
     Schafer.

   - Assorted fixes for Documentation and build in 32 bit, from Robert
     Richter.

   - Cache the libtraceevent event_format associated to each evsel
     early, so that we avoid relookups, i.e. calling pevent_find_event
     repeatedly when processing tracepoint events.

     [ This is to reduce the surface contact with libtraceevents and
       make clear what is that the perf tools needs from that lib: so
       far parsing the common and per event fields. ]

   - Don't stop the build if the audit libraries are not installed, fix
     from Namhyung Kim.

   - Fix bfd.h/libbfd detection with recent binutils, from Markus
     Trippelsdorf.

   - Improve warning message when libunwind devel packages not present,
     from Jiri Olsa"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (282 commits)
  perf trace: Add aliases for some syscalls
  perf probe: Print an enum type variable in "enum variable-name" format when showing accessible variables
  perf tools: Check libaudit availability for perf-trace builtin
  perf hists: Add missing period_* fields when collapsing a hist entry
  perf trace: New tool
  perf evsel: Export the event_format constructor
  perf evsel: Introduce rawptr() method
  perf tools: Use perf_evsel__newtp in the event parser
  perf evsel: The tracepoint constructor should store sys:name
  perf evlist: Introduce set_filter() method
  perf evlist: Renane set_filters method to apply_filters
  perf test: Add test to check we correctly parse and match syscall open parms
  perf evsel: Handle endianity in intval method
  perf evsel: Know if byte swap is needed
  perf tools: Allow handling a NULL cpu_map as meaning "all cpus"
  perf evsel: Improve tracepoint constructor setup
  tools lib traceevent: Fix error path on pevent_parse_event
  perf test: Fix build failure
  trace: Move trace event enable from fs_initcall to core_initcall
  tracing: Add an option for disabling markers
  ...
Diffstat (limited to 'kernel/events')
-rw-r--r--   kernel/events/callchain.c    |  38
-rw-r--r--   kernel/events/core.c         | 214
-rw-r--r--   kernel/events/internal.h     |  82
-rw-r--r--   kernel/events/ring_buffer.c  |  10
-rw-r--r--   kernel/events/uprobes.c      | 248
5 files changed, 425 insertions(+), 167 deletions(-)
diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index 98d4597f43d6..c77206184b8b 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -159,6 +159,11 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
         int rctx;
         struct perf_callchain_entry *entry;
 
+        int kernel = !event->attr.exclude_callchain_kernel;
+        int user   = !event->attr.exclude_callchain_user;
+
+        if (!kernel && !user)
+                return NULL;
 
         entry = get_callchain_entry(&rctx);
         if (rctx == -1)
@@ -169,24 +174,29 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
 
         entry->nr = 0;
 
-        if (!user_mode(regs)) {
+        if (kernel && !user_mode(regs)) {
                 perf_callchain_store(entry, PERF_CONTEXT_KERNEL);
                 perf_callchain_kernel(entry, regs);
-                if (current->mm)
-                        regs = task_pt_regs(current);
-                else
-                        regs = NULL;
         }
 
-        if (regs) {
-                /*
-                 * Disallow cross-task user callchains.
-                 */
-                if (event->ctx->task && event->ctx->task != current)
-                        goto exit_put;
-
-                perf_callchain_store(entry, PERF_CONTEXT_USER);
-                perf_callchain_user(entry, regs);
+        if (user) {
+                if (!user_mode(regs)) {
+                        if (current->mm)
+                                regs = task_pt_regs(current);
+                        else
+                                regs = NULL;
+                }
+
+                if (regs) {
+                        /*
+                         * Disallow cross-task user callchains.
+                         */
+                        if (event->ctx->task && event->ctx->task != current)
+                                goto exit_put;
+
+                        perf_callchain_store(entry, PERF_CONTEXT_USER);
+                        perf_callchain_user(entry, regs);
+                }
         }
 
 exit_put:
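The two exclude_callchain_* attribute bits tested above are set from user space when the event is opened. A minimal user-side sketch (not part of this patch; the helper name and sampling parameters are illustrative):

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <string.h>
#include <unistd.h>

/* Open a CPU-cycles sampling event whose callchains skip kernel frames. */
static int open_user_only_callchain_event(pid_t pid)
{
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        attr.sample_period = 100000;                    /* illustrative period */
        attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_CALLCHAIN;
        attr.exclude_callchain_kernel = 1;              /* new attribute bit in this cycle */

        /* perf_event_open() has no glibc wrapper, so go through syscall(). */
        return syscall(__NR_perf_event_open, &attr, pid, -1, -1, 0);
}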
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7fee567153f0..7b9df353ba1b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -36,6 +36,7 @@
 #include <linux/perf_event.h>
 #include <linux/ftrace_event.h>
 #include <linux/hw_breakpoint.h>
+#include <linux/mm_types.h>
 
 #include "internal.h"
 
@@ -3764,6 +3765,132 @@ int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
 }
 EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
 
+static void
+perf_output_sample_regs(struct perf_output_handle *handle,
+                        struct pt_regs *regs, u64 mask)
+{
+        int bit;
+
+        for_each_set_bit(bit, (const unsigned long *) &mask,
+                         sizeof(mask) * BITS_PER_BYTE) {
+                u64 val;
+
+                val = perf_reg_value(regs, bit);
+                perf_output_put(handle, val);
+        }
+}
+
+static void perf_sample_regs_user(struct perf_regs_user *regs_user,
+                                  struct pt_regs *regs)
+{
+        if (!user_mode(regs)) {
+                if (current->mm)
+                        regs = task_pt_regs(current);
+                else
+                        regs = NULL;
+        }
+
+        if (regs) {
+                regs_user->regs = regs;
+                regs_user->abi  = perf_reg_abi(current);
+        }
+}
+
+/*
+ * Get remaining task size from user stack pointer.
+ *
+ * It'd be better to take stack vma map and limit this more
+ * precisly, but there's no way to get it safely under interrupt,
+ * so using TASK_SIZE as limit.
+ */
+static u64 perf_ustack_task_size(struct pt_regs *regs)
+{
+        unsigned long addr = perf_user_stack_pointer(regs);
+
+        if (!addr || addr >= TASK_SIZE)
+                return 0;
+
+        return TASK_SIZE - addr;
+}
+
+static u16
+perf_sample_ustack_size(u16 stack_size, u16 header_size,
+                        struct pt_regs *regs)
+{
+        u64 task_size;
+
+        /* No regs, no stack pointer, no dump. */
+        if (!regs)
+                return 0;
+
+        /*
+         * Check if we fit in with the requested stack size into the:
+         * - TASK_SIZE
+         *   If we don't, we limit the size to the TASK_SIZE.
+         *
+         * - remaining sample size
+         *   If we don't, we customize the stack size to
+         *   fit in to the remaining sample size.
+         */
+
+        task_size  = min((u64) USHRT_MAX, perf_ustack_task_size(regs));
+        stack_size = min(stack_size, (u16) task_size);
+
+        /* Current header size plus static size and dynamic size. */
+        header_size += 2 * sizeof(u64);
+
+        /* Do we fit in with the current stack dump size? */
+        if ((u16) (header_size + stack_size) < header_size) {
+                /*
+                 * If we overflow the maximum size for the sample,
+                 * we customize the stack dump size to fit in.
+                 */
+                stack_size = USHRT_MAX - header_size - sizeof(u64);
+                stack_size = round_up(stack_size, sizeof(u64));
+        }
+
+        return stack_size;
+}
+
+static void
+perf_output_sample_ustack(struct perf_output_handle *handle, u64 dump_size,
+                          struct pt_regs *regs)
+{
+        /* Case of a kernel thread, nothing to dump */
+        if (!regs) {
+                u64 size = 0;
+                perf_output_put(handle, size);
+        } else {
+                unsigned long sp;
+                unsigned int rem;
+                u64 dyn_size;
+
+                /*
+                 * We dump:
+                 * static size
+                 *   - the size requested by user or the best one we can fit
+                 *     in to the sample max size
+                 * data
+                 *   - user stack dump data
+                 * dynamic size
+                 *   - the actual dumped size
+                 */
+
+                /* Static size. */
+                perf_output_put(handle, dump_size);
+
+                /* Data. */
+                sp = perf_user_stack_pointer(regs);
+                rem = __output_copy_user(handle, (void *) sp, dump_size);
+                dyn_size = dump_size - rem;
+
+                perf_output_skip(handle, rem);
+
+                /* Dynamic size. */
+                perf_output_put(handle, dyn_size);
+        }
+}
+
 static void __perf_event_header__init_id(struct perf_event_header *header,
                                          struct perf_sample_data *data,
                                          struct perf_event *event)
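For reference, the record layout these helpers emit into the ring buffer can be summarized as below (an illustrative sketch only; the struct names are descriptive and are not kernel types):

/* Illustrative layout of the new sample payloads. */
struct sample_regs_user_payload {
        __u64 abi;      /* PERF_SAMPLE_REGS_ABI_NONE (0) when no user regs are available */
        __u64 regs[];   /* one value per bit set in attr.sample_regs_user, present only if abi != 0 */
};

struct sample_stack_user_payload {
        __u64 size;     /* static size: requested dump size, clamped by perf_sample_ustack_size() */
        char  data[];   /* 'size' bytes of user stack; a short copy is padded via perf_output_skip() */
        /* ...followed by: __u64 dyn_size;  -- the number of bytes actually copied */
};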
@@ -4024,6 +4151,28 @@ void perf_output_sample(struct perf_output_handle *handle,
                         perf_output_put(handle, nr);
                 }
         }
+
+        if (sample_type & PERF_SAMPLE_REGS_USER) {
+                u64 abi = data->regs_user.abi;
+
+                /*
+                 * If there are no regs to dump, notice it through
+                 * first u64 being zero (PERF_SAMPLE_REGS_ABI_NONE).
+                 */
+                perf_output_put(handle, abi);
+
+                if (abi) {
+                        u64 mask = event->attr.sample_regs_user;
+                        perf_output_sample_regs(handle,
+                                                data->regs_user.regs,
+                                                mask);
+                }
+        }
+
+        if (sample_type & PERF_SAMPLE_STACK_USER)
+                perf_output_sample_ustack(handle,
+                                          data->stack_user_size,
+                                          data->regs_user.regs);
 }
 
 void perf_prepare_sample(struct perf_event_header *header,
@@ -4075,6 +4224,49 @@ void perf_prepare_sample(struct perf_event_header *header,
                 }
                 header->size += size;
         }
+
+        if (sample_type & PERF_SAMPLE_REGS_USER) {
+                /* regs dump ABI info */
+                int size = sizeof(u64);
+
+                perf_sample_regs_user(&data->regs_user, regs);
+
+                if (data->regs_user.regs) {
+                        u64 mask = event->attr.sample_regs_user;
+                        size += hweight64(mask) * sizeof(u64);
+                }
+
+                header->size += size;
+        }
+
+        if (sample_type & PERF_SAMPLE_STACK_USER) {
+                /*
+                 * Either we need PERF_SAMPLE_STACK_USER bit to be allways
+                 * processed as the last one or have additional check added
+                 * in case new sample type is added, because we could eat
+                 * up the rest of the sample size.
+                 */
+                struct perf_regs_user *uregs = &data->regs_user;
+                u16 stack_size = event->attr.sample_stack_user;
+                u16 size = sizeof(u64);
+
+                if (!uregs->abi)
+                        perf_sample_regs_user(uregs, regs);
+
+                stack_size = perf_sample_ustack_size(stack_size, header->size,
+                                                     uregs->regs);
+
+                /*
+                 * If there is something to dump, add space for the dump
+                 * itself and for the field that tells the dynamic size,
+                 * which is how many have been actually dumped.
+                 */
+                if (stack_size)
+                        size += sizeof(u64) + stack_size;
+
+                data->stack_user_size = stack_size;
+                header->size += size;
+        }
 }
 
 static void perf_event_output(struct perf_event *event,
@@ -6151,6 +6343,28 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr,
                         attr->branch_sample_type = mask;
                 }
         }
+
+        if (attr->sample_type & PERF_SAMPLE_REGS_USER) {
+                ret = perf_reg_validate(attr->sample_regs_user);
+                if (ret)
+                        return ret;
+        }
+
+        if (attr->sample_type & PERF_SAMPLE_STACK_USER) {
+                if (!arch_perf_have_user_stack_dump())
+                        return -ENOSYS;
+
+                /*
+                 * We have __u32 type for the size, but so far
+                 * we can only use __u16 as maximum due to the
+                 * __u16 sample size limit.
+                 */
+                if (attr->sample_stack_user >= USHRT_MAX)
+                        ret = -EINVAL;
+                else if (!IS_ALIGNED(attr->sample_stack_user, sizeof(u64)))
+                        ret = -EINVAL;
+        }
+
 out:
         return ret;
 
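A user-space sketch that satisfies the checks above (the helper name and the 8 KiB size are illustrative; the mask is the arch-specific register bitmap that perf_reg_validate() vets):

#include <linux/perf_event.h>
#include <string.h>

/* Request a user-register dump plus an 8 KiB user-stack dump per sample. */
static void setup_user_dump_attr(struct perf_event_attr *attr, __u64 user_regs_mask)
{
        memset(attr, 0, sizeof(*attr));
        attr->size = sizeof(*attr);
        attr->sample_type |= PERF_SAMPLE_REGS_USER | PERF_SAMPLE_STACK_USER;
        attr->sample_regs_user = user_regs_mask;        /* arch-specific bit mask */
        attr->sample_stack_user = 8192;                 /* must be a multiple of sizeof(u64) and < USHRT_MAX */
}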
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index a096c19f2c2a..d56a64c99a8b 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -2,6 +2,7 @@
 #define _KERNEL_EVENTS_INTERNAL_H
 
 #include <linux/hardirq.h>
+#include <linux/uaccess.h>
 
 /* Buffer handling */
 
@@ -76,30 +77,53 @@ static inline unsigned long perf_data_size(struct ring_buffer *rb)
         return rb->nr_pages << (PAGE_SHIFT + page_order(rb));
 }
 
-static inline void
-__output_copy(struct perf_output_handle *handle,
-                      const void *buf, unsigned int len)
+#define DEFINE_OUTPUT_COPY(func_name, memcpy_func)                      \
+static inline unsigned int                                              \
+func_name(struct perf_output_handle *handle,                           \
+          const void *buf, unsigned int len)                           \
+{                                                                       \
+        unsigned long size, written;                                    \
+                                                                        \
+        do {                                                            \
+                size = min_t(unsigned long, handle->size, len);         \
+                                                                        \
+                written = memcpy_func(handle->addr, buf, size);         \
+                                                                        \
+                len -= written;                                         \
+                handle->addr += written;                                \
+                buf += written;                                         \
+                handle->size -= written;                                \
+                if (!handle->size) {                                    \
+                        struct ring_buffer *rb = handle->rb;            \
+                                                                        \
+                        handle->page++;                                 \
+                        handle->page &= rb->nr_pages - 1;               \
+                        handle->addr = rb->data_pages[handle->page];    \
+                        handle->size = PAGE_SIZE << page_order(rb);     \
+                }                                                       \
+        } while (len && written == size);                               \
+                                                                        \
+        return len;                                                     \
+}
+
+static inline int memcpy_common(void *dst, const void *src, size_t n)
 {
-        do {
-                unsigned long size = min_t(unsigned long, handle->size, len);
-
-                memcpy(handle->addr, buf, size);
-
-                len -= size;
-                handle->addr += size;
-                buf += size;
-                handle->size -= size;
-                if (!handle->size) {
-                        struct ring_buffer *rb = handle->rb;
-
-                        handle->page++;
-                        handle->page &= rb->nr_pages - 1;
-                        handle->addr = rb->data_pages[handle->page];
-                        handle->size = PAGE_SIZE << page_order(rb);
-                }
-        } while (len);
+        memcpy(dst, src, n);
+        return n;
 }
 
+DEFINE_OUTPUT_COPY(__output_copy, memcpy_common)
+
+#define MEMCPY_SKIP(dst, src, n) (n)
+
+DEFINE_OUTPUT_COPY(__output_skip, MEMCPY_SKIP)
+
+#ifndef arch_perf_out_copy_user
+#define arch_perf_out_copy_user __copy_from_user_inatomic
+#endif
+
+DEFINE_OUTPUT_COPY(__output_copy_user, arch_perf_out_copy_user)
+
 /* Callchain handling */
 extern struct perf_callchain_entry *
 perf_callchain(struct perf_event *event, struct pt_regs *regs);
@@ -134,4 +158,20 @@ static inline void put_recursion_context(int *recursion, int rctx)
         recursion[rctx]--;
 }
 
+#ifdef CONFIG_HAVE_PERF_USER_STACK_DUMP
+static inline bool arch_perf_have_user_stack_dump(void)
+{
+        return true;
+}
+
+#define perf_user_stack_pointer(regs) user_stack_pointer(regs)
+#else
+static inline bool arch_perf_have_user_stack_dump(void)
+{
+        return false;
+}
+
+#define perf_user_stack_pointer(regs) 0
+#endif /* CONFIG_HAVE_PERF_USER_STACK_DUMP */
+
 #endif /* _KERNEL_EVENTS_INTERNAL_H */
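To make the macro concrete, DEFINE_OUTPUT_COPY(__output_skip, MEMCPY_SKIP) expands to roughly the following (hand-expanded sketch, not literal preprocessor output):

/* MEMCPY_SKIP(dst, src, n) is just (n): nothing is copied, the output cursor only advances. */
static inline unsigned int
__output_skip(struct perf_output_handle *handle, const void *buf, unsigned int len)
{
        unsigned long size, written;

        do {
                size = min_t(unsigned long, handle->size, len);
                written = size;                 /* MEMCPY_SKIP(handle->addr, buf, size) */

                len -= written;
                handle->addr += written;
                buf += written;
                handle->size -= written;
                if (!handle->size) {            /* wrap to the next data page */
                        struct ring_buffer *rb = handle->rb;

                        handle->page++;
                        handle->page &= rb->nr_pages - 1;
                        handle->addr = rb->data_pages[handle->page];
                        handle->size = PAGE_SIZE << page_order(rb);
                }
        } while (len && written == size);

        return len;                             /* bytes left unconsumed (0 when the whole request fit) */
}

The same macro generates __output_copy() with plain memcpy() and __output_copy_user() with the arch-provided user-copy routine plugged in as memcpy_func.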
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 6ddaba43fb7a..23cb34ff3973 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -182,10 +182,16 @@ out:
         return -ENOSPC;
 }
 
-void perf_output_copy(struct perf_output_handle *handle,
+unsigned int perf_output_copy(struct perf_output_handle *handle,
                       const void *buf, unsigned int len)
 {
-        __output_copy(handle, buf, len);
+        return __output_copy(handle, buf, len);
+}
+
+unsigned int perf_output_skip(struct perf_output_handle *handle,
+                              unsigned int len)
+{
+        return __output_skip(handle, NULL, len);
 }
 
 void perf_output_end(struct perf_output_handle *handle)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c08a22d02f72..912ef48d28ab 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -280,12 +280,10 @@ static int read_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_
         if (ret <= 0)
                 return ret;
 
-        lock_page(page);
         vaddr_new = kmap_atomic(page);
         vaddr &= ~PAGE_MASK;
         memcpy(opcode, vaddr_new + vaddr, UPROBE_SWBP_INSN_SIZE);
         kunmap_atomic(vaddr_new);
-        unlock_page(page);
 
         put_page(page);
 
@@ -334,7 +332,7 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned
          */
         result = is_swbp_at_addr(mm, vaddr);
         if (result == 1)
-                return -EEXIST;
+                return 0;
 
         if (result)
                 return result;
@@ -347,24 +345,22 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned
  * @mm: the probed process address space.
  * @auprobe: arch specific probepoint information.
  * @vaddr: the virtual address to insert the opcode.
- * @verify: if true, verify existance of breakpoint instruction.
  *
  * For mm @mm, restore the original opcode (opcode) at @vaddr.
  * Return 0 (success) or a negative errno.
  */
 int __weak
-set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, bool verify)
+set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
 {
-        if (verify) {
-                int result;
+        int result;
+
+        result = is_swbp_at_addr(mm, vaddr);
+        if (!result)
+                return -EINVAL;
 
-                result = is_swbp_at_addr(mm, vaddr);
-                if (!result)
-                        return -EINVAL;
+        if (result != 1)
+                return result;
 
-                if (result != 1)
-                        return result;
-        }
         return write_opcode(auprobe, mm, vaddr, *(uprobe_opcode_t *)auprobe->insn);
 }
 
@@ -415,11 +411,10 @@ static struct uprobe *__find_uprobe(struct inode *inode, loff_t offset)
 static struct uprobe *find_uprobe(struct inode *inode, loff_t offset)
 {
         struct uprobe *uprobe;
-        unsigned long flags;
 
-        spin_lock_irqsave(&uprobes_treelock, flags);
+        spin_lock(&uprobes_treelock);
         uprobe = __find_uprobe(inode, offset);
-        spin_unlock_irqrestore(&uprobes_treelock, flags);
+        spin_unlock(&uprobes_treelock);
 
         return uprobe;
 }
@@ -466,12 +461,11 @@ static struct uprobe *__insert_uprobe(struct uprobe *uprobe)
  */
 static struct uprobe *insert_uprobe(struct uprobe *uprobe)
 {
-        unsigned long flags;
         struct uprobe *u;
 
-        spin_lock_irqsave(&uprobes_treelock, flags);
+        spin_lock(&uprobes_treelock);
         u = __insert_uprobe(uprobe);
-        spin_unlock_irqrestore(&uprobes_treelock, flags);
+        spin_unlock(&uprobes_treelock);
 
         /* For now assume that the instruction need not be single-stepped */
         uprobe->flags |= UPROBE_SKIP_SSTEP;
@@ -649,6 +643,7 @@ static int
 install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
                         struct vm_area_struct *vma, unsigned long vaddr)
 {
+        bool first_uprobe;
         int ret;
 
         /*
@@ -659,7 +654,7 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
          * Hence behave as if probe already existed.
          */
         if (!uprobe->consumers)
-                return -EEXIST;
+                return 0;
 
         if (!(uprobe->flags & UPROBE_COPY_INSN)) {
                 ret = copy_insn(uprobe, vma->vm_file);
@@ -681,17 +676,18 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
         }
 
         /*
-         * Ideally, should be updating the probe count after the breakpoint
-         * has been successfully inserted. However a thread could hit the
-         * breakpoint we just inserted even before the probe count is
-         * incremented. If this is the first breakpoint placed, breakpoint
-         * notifier might ignore uprobes and pass the trap to the thread.
-         * Hence increment before and decrement on failure.
+         * set MMF_HAS_UPROBES in advance for uprobe_pre_sstep_notifier(),
+         * the task can hit this breakpoint right after __replace_page().
          */
-        atomic_inc(&mm->uprobes_state.count);
+        first_uprobe = !test_bit(MMF_HAS_UPROBES, &mm->flags);
+        if (first_uprobe)
+                set_bit(MMF_HAS_UPROBES, &mm->flags);
+
         ret = set_swbp(&uprobe->arch, mm, vaddr);
-        if (ret)
-                atomic_dec(&mm->uprobes_state.count);
+        if (!ret)
+                clear_bit(MMF_RECALC_UPROBES, &mm->flags);
+        else if (first_uprobe)
+                clear_bit(MMF_HAS_UPROBES, &mm->flags);
 
         return ret;
 }
@@ -699,8 +695,12 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
 static void
 remove_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vaddr)
 {
-        if (!set_orig_insn(&uprobe->arch, mm, vaddr, true))
-                atomic_dec(&mm->uprobes_state.count);
+        /* can happen if uprobe_register() fails */
+        if (!test_bit(MMF_HAS_UPROBES, &mm->flags))
+                return;
+
+        set_bit(MMF_RECALC_UPROBES, &mm->flags);
+        set_orig_insn(&uprobe->arch, mm, vaddr);
 }
 
 /*
@@ -710,11 +710,9 @@ remove_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vad
  */
 static void delete_uprobe(struct uprobe *uprobe)
 {
-        unsigned long flags;
-
-        spin_lock_irqsave(&uprobes_treelock, flags);
+        spin_lock(&uprobes_treelock);
         rb_erase(&uprobe->rb_node, &uprobes_tree);
-        spin_unlock_irqrestore(&uprobes_treelock, flags);
+        spin_unlock(&uprobes_treelock);
         iput(uprobe->inode);
         put_uprobe(uprobe);
         atomic_dec(&uprobe_events);
@@ -831,17 +829,11 @@ static int register_for_each_vma(struct uprobe *uprobe, bool is_register)
                     vaddr_to_offset(vma, info->vaddr) != uprobe->offset)
                         goto unlock;
 
-                if (is_register) {
+                if (is_register)
                         err = install_breakpoint(uprobe, mm, vma, info->vaddr);
-                        /*
-                         * We can race against uprobe_mmap(), see the
-                         * comment near uprobe_hash().
-                         */
-                        if (err == -EEXIST)
-                                err = 0;
-                } else {
+                else
                         remove_breakpoint(uprobe, mm, info->vaddr);
-                }
+
  unlock:
                 up_write(&mm->mmap_sem);
  free:
@@ -908,7 +900,8 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
         }
 
         mutex_unlock(uprobes_hash(inode));
-        put_uprobe(uprobe);
+        if (uprobe)
+                put_uprobe(uprobe);
 
         return ret;
 }
@@ -978,7 +971,6 @@ static void build_probe_list(struct inode *inode,
                              struct list_head *head)
 {
         loff_t min, max;
-        unsigned long flags;
         struct rb_node *n, *t;
         struct uprobe *u;
 
@@ -986,7 +978,7 @@ static void build_probe_list(struct inode *inode,
         min = vaddr_to_offset(vma, start);
         max = min + (end - start) - 1;
 
-        spin_lock_irqsave(&uprobes_treelock, flags);
+        spin_lock(&uprobes_treelock);
         n = find_node_in_range(inode, min, max);
         if (n) {
                 for (t = n; t; t = rb_prev(t)) {
@@ -1004,27 +996,20 @@ static void build_probe_list(struct inode *inode,
                         atomic_inc(&u->ref);
                 }
         }
-        spin_unlock_irqrestore(&uprobes_treelock, flags);
+        spin_unlock(&uprobes_treelock);
 }
 
 /*
- * Called from mmap_region.
- * called with mm->mmap_sem acquired.
+ * Called from mmap_region/vma_adjust with mm->mmap_sem acquired.
  *
- * Return -ve no if we fail to insert probes and we cannot
- * bail-out.
- * Return 0 otherwise. i.e:
- *
- * - successful insertion of probes
- * - (or) no possible probes to be inserted.
- * - (or) insertion of probes failed but we can bail-out.
+ * Currently we ignore all errors and always return 0, the callers
+ * can't handle the failure anyway.
  */
 int uprobe_mmap(struct vm_area_struct *vma)
 {
         struct list_head tmp_list;
         struct uprobe *uprobe, *u;
         struct inode *inode;
-        int ret, count;
 
         if (!atomic_read(&uprobe_events) || !valid_vma(vma, true))
                 return 0;
@@ -1036,44 +1021,35 @@ int uprobe_mmap(struct vm_area_struct *vma)
         mutex_lock(uprobes_mmap_hash(inode));
         build_probe_list(inode, vma, vma->vm_start, vma->vm_end, &tmp_list);
 
-        ret = 0;
-        count = 0;
-
         list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) {
-                if (!ret) {
+                if (!fatal_signal_pending(current)) {
                         unsigned long vaddr = offset_to_vaddr(vma, uprobe->offset);
-
-                        ret = install_breakpoint(uprobe, vma->vm_mm, vma, vaddr);
-                        /*
-                         * We can race against uprobe_register(), see the
-                         * comment near uprobe_hash().
-                         */
-                        if (ret == -EEXIST) {
-                                ret = 0;
-
-                                if (!is_swbp_at_addr(vma->vm_mm, vaddr))
-                                        continue;
-
-                                /*
-                                 * Unable to insert a breakpoint, but
-                                 * breakpoint lies underneath. Increment the
-                                 * probe count.
-                                 */
-                                atomic_inc(&vma->vm_mm->uprobes_state.count);
-                        }
-
-                        if (!ret)
-                                count++;
+                        install_breakpoint(uprobe, vma->vm_mm, vma, vaddr);
                 }
                 put_uprobe(uprobe);
         }
-
         mutex_unlock(uprobes_mmap_hash(inode));
 
-        if (ret)
-                atomic_sub(count, &vma->vm_mm->uprobes_state.count);
+        return 0;
+}
 
-        return ret;
+static bool
+vma_has_uprobes(struct vm_area_struct *vma, unsigned long start, unsigned long end)
+{
+        loff_t min, max;
+        struct inode *inode;
+        struct rb_node *n;
+
+        inode = vma->vm_file->f_mapping->host;
+
+        min = vaddr_to_offset(vma, start);
+        max = min + (end - start) - 1;
+
+        spin_lock(&uprobes_treelock);
+        n = find_node_in_range(inode, min, max);
+        spin_unlock(&uprobes_treelock);
+
+        return !!n;
 }
 
 /*
@@ -1081,37 +1057,18 @@ int uprobe_mmap(struct vm_area_struct *vma)
  */
 void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned long end)
 {
-        struct list_head tmp_list;
-        struct uprobe *uprobe, *u;
-        struct inode *inode;
-
         if (!atomic_read(&uprobe_events) || !valid_vma(vma, false))
                 return;
 
         if (!atomic_read(&vma->vm_mm->mm_users)) /* called by mmput() ? */
                 return;
 
-        if (!atomic_read(&vma->vm_mm->uprobes_state.count))
-                return;
-
-        inode = vma->vm_file->f_mapping->host;
-        if (!inode)
+        if (!test_bit(MMF_HAS_UPROBES, &vma->vm_mm->flags) ||
+            test_bit(MMF_RECALC_UPROBES, &vma->vm_mm->flags))
                 return;
 
-        mutex_lock(uprobes_mmap_hash(inode));
-        build_probe_list(inode, vma, start, end, &tmp_list);
-
-        list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) {
-                unsigned long vaddr = offset_to_vaddr(vma, uprobe->offset);
-                /*
-                 * An unregister could have removed the probe before
-                 * unmap. So check before we decrement the count.
-                 */
-                if (is_swbp_at_addr(vma->vm_mm, vaddr) == 1)
-                        atomic_dec(&vma->vm_mm->uprobes_state.count);
-                put_uprobe(uprobe);
-        }
-        mutex_unlock(uprobes_mmap_hash(inode));
+        if (vma_has_uprobes(vma, start, end))
+                set_bit(MMF_RECALC_UPROBES, &vma->vm_mm->flags);
 }
 
 /* Slot allocation for XOL */
@@ -1213,13 +1170,15 @@ void uprobe_clear_state(struct mm_struct *mm)
         kfree(area);
 }
 
-/*
- * uprobe_reset_state - Free the area allocated for slots.
- */
-void uprobe_reset_state(struct mm_struct *mm)
+void uprobe_dup_mmap(struct mm_struct *oldmm, struct mm_struct *newmm)
 {
-        mm->uprobes_state.xol_area = NULL;
-        atomic_set(&mm->uprobes_state.count, 0);
+        newmm->uprobes_state.xol_area = NULL;
+
+        if (test_bit(MMF_HAS_UPROBES, &oldmm->flags)) {
+                set_bit(MMF_HAS_UPROBES, &newmm->flags);
+                /* unconditionally, dup_mmap() skips VM_DONTCOPY vmas */
+                set_bit(MMF_RECALC_UPROBES, &newmm->flags);
+        }
 }
 
 /*
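uprobe_dup_mmap() replaces uprobe_reset_state() and is meant to be invoked on the fork path, outside this diffstat; the comment in the hunk above points at dup_mmap() as the real call site. A sketch of the intended usage (copy_mm_uprobes() is a hypothetical wrapper, for illustration only):

/* Hypothetical wrapper: propagate uprobe state from the parent mm to a new child mm. */
static void copy_mm_uprobes(struct mm_struct *oldmm, struct mm_struct *newmm)
{
        /* sets MMF_HAS_UPROBES and MMF_RECALC_UPROBES in the child as needed */
        uprobe_dup_mmap(oldmm, newmm);
}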
@@ -1437,6 +1396,25 @@ static bool can_skip_sstep(struct uprobe *uprobe, struct pt_regs *regs)
         return false;
 }
 
+static void mmf_recalc_uprobes(struct mm_struct *mm)
+{
+        struct vm_area_struct *vma;
+
+        for (vma = mm->mmap; vma; vma = vma->vm_next) {
+                if (!valid_vma(vma, false))
+                        continue;
+                /*
+                 * This is not strictly accurate, we can race with
+                 * uprobe_unregister() and see the already removed
+                 * uprobe if delete_uprobe() was not yet called.
+                 */
+                if (vma_has_uprobes(vma, vma->vm_start, vma->vm_end))
+                        return;
+        }
+
+        clear_bit(MMF_HAS_UPROBES, &mm->flags);
+}
+
 static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
 {
         struct mm_struct *mm = current->mm;
@@ -1458,11 +1436,24 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
         } else {
                 *is_swbp = -EFAULT;
         }
+
+        if (!uprobe && test_and_clear_bit(MMF_RECALC_UPROBES, &mm->flags))
+                mmf_recalc_uprobes(mm);
         up_read(&mm->mmap_sem);
 
         return uprobe;
 }
 
+void __weak arch_uprobe_enable_step(struct arch_uprobe *arch)
+{
+        user_enable_single_step(current);
+}
+
+void __weak arch_uprobe_disable_step(struct arch_uprobe *arch)
+{
+        user_disable_single_step(current);
+}
+
 /*
  * Run handler and ask thread to singlestep.
  * Ensure all non-fatal signals cannot interrupt thread while it singlesteps.
@@ -1509,7 +1500,7 @@ static void handle_swbp(struct pt_regs *regs)
 
         utask->state = UTASK_SSTEP;
         if (!pre_ssout(uprobe, regs, bp_vaddr)) {
-                user_enable_single_step(current);
+                arch_uprobe_enable_step(&uprobe->arch);
                 return;
         }
 
@@ -1518,17 +1509,15 @@ cleanup_ret:
                 utask->active_uprobe = NULL;
                 utask->state = UTASK_RUNNING;
         }
-        if (uprobe) {
-                if (!(uprobe->flags & UPROBE_SKIP_SSTEP))
+        if (!(uprobe->flags & UPROBE_SKIP_SSTEP))
 
                 /*
                  * cannot singlestep; cannot skip instruction;
                  * re-execute the instruction.
                  */
                 instruction_pointer_set(regs, bp_vaddr);
 
         put_uprobe(uprobe);
-        }
 }
 
 /*
@@ -1547,10 +1536,10 @@ static void handle_singlestep(struct uprobe_task *utask, struct pt_regs *regs)
         else
                 WARN_ON_ONCE(1);
 
+        arch_uprobe_disable_step(&uprobe->arch);
         put_uprobe(uprobe);
         utask->active_uprobe = NULL;
         utask->state = UTASK_RUNNING;
-        user_disable_single_step(current);
         xol_free_insn_slot(current);
 
         spin_lock_irq(&current->sighand->siglock);
@@ -1589,8 +1578,7 @@ int uprobe_pre_sstep_notifier(struct pt_regs *regs)
 {
         struct uprobe_task *utask;
 
-        if (!current->mm || !atomic_read(&current->mm->uprobes_state.count))
-                /* task is currently not uprobed */
+        if (!current->mm || !test_bit(MMF_HAS_UPROBES, &current->mm->flags))
                 return 0;
 
         utask = current->utask;
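For orientation, the registration side that ultimately drives install_breakpoint() and remove_breakpoint() is the uprobe_register()/uprobe_unregister() pair visible in the hunk headers above. A minimal in-kernel consumer sketch (the handler signature is an assumption based on the 3.6-era struct uprobe_consumer, not taken from this diff):

static int my_uprobe_handler(struct uprobe_consumer *self, struct pt_regs *regs)
{
        /* runs in task context each time the probed instruction traps */
        return 0;
}

static struct uprobe_consumer my_consumer = {
        .handler = my_uprobe_handler,
};

/* inode + file offset identify the probed instruction */
static int attach_probe(struct inode *inode, loff_t offset)
{
        return uprobe_register(inode, offset, &my_consumer);
}

static void detach_probe(struct inode *inode, loff_t offset)
{
        uprobe_unregister(inode, offset, &my_consumer);
}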