Diffstat (limited to 'Documentation/kprobes.txt'):
-rw-r--r-- Documentation/kprobes.txt | 203
1 files changed, 190 insertions, 13 deletions
diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
index 053037a1fe6d..6653017680dd 100644
--- a/Documentation/kprobes.txt
+++ b/Documentation/kprobes.txt
@@ -1,6 +1,7 @@
 Title : Kernel Probes (Kprobes)
 Authors : Jim Keniston <jkenisto@us.ibm.com>
-        : Prasanna S Panchamukhi <prasanna@in.ibm.com>
+        : Prasanna S Panchamukhi <prasanna.panchamukhi@gmail.com>
+        : Masami Hiramatsu <mhiramat@redhat.com>
 
 CONTENTS
 
@@ -15,6 +16,7 @@ CONTENTS
 9. Jprobes Example
 10. Kretprobes Example
 Appendix A: The kprobes debugfs interface
+Appendix B: The kprobes sysctl interface
 
 1. Concepts: Kprobes, Jprobes, Return Probes
 
@@ -42,13 +44,13 @@ registration/unregistration of a group of *probes. These functions
 can speed up unregistration process when you have to unregister
 a lot of probes at once.
 
-The next three subsections explain how the different types of
-probes work. They explain certain things that you'll need to
-know in order to make the best use of Kprobes -- e.g., the
-difference between a pre_handler and a post_handler, and how
-to use the maxactive and nmissed fields of a kretprobe. But
-if you're in a hurry to start using Kprobes, you can skip ahead
-to section 2.
+The next four subsections explain how the different types of
+probes work and how jump optimization works. They explain certain
+things that you'll need to know in order to make the best use of
+Kprobes -- e.g., the difference between a pre_handler and
+a post_handler, and how to use the maxactive and nmissed fields of
+a kretprobe. But if you're in a hurry to start using Kprobes, you
+can skip ahead to section 2.
 
 1.1 How Does a Kprobe Work?
 
@@ -161,13 +163,123 @@ In case probed function is entered but there is no kretprobe_instance
 object available, then in addition to incrementing the nmissed count,
 the user entry_handler invocation is also skipped.
 
+1.4 How Does Jump Optimization Work?
+
+If your kernel is built with CONFIG_OPTPROBES=y (currently this flag
+is automatically set 'y' on x86/x86-64, non-preemptive kernel) and
+the "debug.kprobes_optimization" kernel parameter is set to 1 (see
+sysctl(8)), Kprobes tries to reduce probe-hit overhead by using a jump
+instruction instead of a breakpoint instruction at each probepoint.
+
+1.4.1 Init a Kprobe
+
+When a probe is registered, before attempting this optimization,
+Kprobes inserts an ordinary, breakpoint-based kprobe at the specified
+address. So, even if it's not possible to optimize this particular
+probepoint, there'll be a probe there.
+
+1.4.2 Safety Check
+
+Before optimizing a probe, Kprobes performs the following safety checks:
+
+- Kprobes verifies that the region that will be replaced by the jump
+instruction (the "optimized region") lies entirely within one function.
+(A jump instruction is multiple bytes, and so may overlay multiple
+instructions.)
+
+- Kprobes analyzes the entire function and verifies that there is no
+jump into the optimized region. Specifically:
+  - the function contains no indirect jump;
+  - the function contains no instruction that causes an exception (since
+  the fixup code triggered by the exception could jump back into the
+  optimized region -- Kprobes checks the exception tables to verify this);
+  and
+  - there is no near jump to the optimized region (other than to the first
+  byte).
+
+- For each instruction in the optimized region, Kprobes verifies that
+the instruction can be executed out of line.
+
+1.4.3 Preparing Detour Buffer
+
+Next, Kprobes prepares a "detour" buffer, which contains the following
+instruction sequence:
+- code to push the CPU's registers (emulating a breakpoint trap)
+- a call to the trampoline code which calls user's probe handlers.
+- code to restore registers
+- the instructions from the optimized region
+- a jump back to the original execution path.
+
+1.4.4 Pre-optimization
+
+After preparing the detour buffer, Kprobes verifies that none of the
+following situations exist:
+- The probe has either a break_handler (i.e., it's a jprobe) or a
+post_handler.
+- Other instructions in the optimized region are probed.
+- The probe is disabled.
+In any of the above cases, Kprobes won't start optimizing the probe.
+Since these are temporary situations, Kprobes tries to start
+optimizing it again if the situation is changed.
+
+If the kprobe can be optimized, Kprobes enqueues the kprobe to an
+optimizing list, and kicks the kprobe-optimizer workqueue to optimize
+it. If the to-be-optimized probepoint is hit before being optimized,
+Kprobes returns control to the original instruction path by setting
+the CPU's instruction pointer to the copied code in the detour buffer
+-- thus at least avoiding the single-step.
+
+1.4.5 Optimization
+
+The Kprobe-optimizer doesn't insert the jump instruction immediately;
+rather, it calls synchronize_sched() for safety first, because it's
+possible for a CPU to be interrupted in the middle of executing the
+optimized region(*). As you know, synchronize_sched() can ensure
+that all interruptions that were active when synchronize_sched()
+was called are done, but only if CONFIG_PREEMPT=n. So, this version
+of kprobe optimization supports only kernels with CONFIG_PREEMPT=n.(**)
+
+After that, the Kprobe-optimizer calls stop_machine() to replace
+the optimized region with a jump instruction to the detour buffer,
+using text_poke_smp().
+
+1.4.6 Unoptimization
+
+When an optimized kprobe is unregistered, disabled, or blocked by
+another kprobe, it will be unoptimized. If this happens before
+the optimization is complete, the kprobe is just dequeued from the
+optimized list. If the optimization has been done, the jump is
+replaced with the original code (except for an int3 breakpoint in
+the first byte) by using text_poke_smp().
+
+(*)Please imagine that the 2nd instruction is interrupted and then
+the optimizer replaces the 2nd instruction with the jump *address*
+while the interrupt handler is running. When the interrupt
+returns to original address, there is no valid instruction,
+and it causes an unexpected result.
+
+(**)This optimization-safety checking may be replaced with the
+stop-machine method that ksplice uses for supporting a CONFIG_PREEMPT=y
+kernel.
+
+NOTE for geeks:
+The jump optimization changes the kprobe's pre_handler behavior.
+Without optimization, the pre_handler can change the kernel's execution
+path by changing regs->ip and returning 1. However, when the probe
+is optimized, that modification is ignored. Thus, if you want to
+tweak the kernel's execution path, you need to suppress optimization,
+using one of the following techniques:
+- Specify an empty function for the kprobe's post_handler or break_handler.
+  or
+- Execute 'sysctl -w debug.kprobes_optimization=n'
+
 2. Architectures Supported
 
 Kprobes, jprobes, and return probes are implemented on the following
 architectures:
 
-- i386
-- x86_64 (AMD-64, EM64T)
+- i386 (Supports jump optimization)
+- x86_64 (AMD-64, EM64T) (Supports jump optimization)
 - ppc64
 - ia64 (Does not support probes on instruction slot1.)
 - sparc64 (Return probes not yet implemented.)
@@ -214,7 +326,7 @@ occurs during execution of kp->pre_handler or kp->post_handler,
 or during single-stepping of the probed instruction, Kprobes calls
 kp->fault_handler. Any or all handlers can be NULL. If kp->flags
 is set KPROBE_FLAG_DISABLED, that kp will be registered but disabled,
-so, it's handlers aren't hit until calling enable_kprobe(kp).
+so, its handlers aren't hit until calling enable_kprobe(kp).
 
 NOTE:
 1. With the introduction of the "symbol_name" field to struct kprobe,
@@ -389,7 +501,10 @@ the probe which has been registered.
 
 Kprobes allows multiple probes at the same address. Currently,
 however, there cannot be multiple jprobes on the same function at
-the same time.
+the same time. Also, a probepoint for which there is a jprobe or
+a post_handler cannot be optimized. So if you install a jprobe,
+or a kprobe with a post_handler, at an optimized probepoint, the
+probepoint will be unoptimized automatically.
 
 In general, you can install a probe anywhere in the kernel.
 In particular, you can probe interrupt handlers. Known exceptions
@@ -453,6 +568,38 @@ reason, Kprobes doesn't support return probes (or kprobes or jprobes)
 on the x86_64 version of __switch_to(); the registration functions
 return -EINVAL.
 
+On x86/x86-64, since the Jump Optimization of Kprobes modifies
+instructions widely, there are some limitations to optimization. To
+explain it, we introduce some terminology. Imagine a 3-instruction
+sequence consisting of two 2-byte instructions and one 3-byte
+instruction.
+
+        IA
+         |
+[-2][-1][0][1][2][3][4][5][6][7]
+        [ins1][ins2][  ins3 ]
+        [<-     DCR       ->]
+           [<- JTPR ->]
+
+ins1: 1st Instruction
+ins2: 2nd Instruction
+ins3: 3rd Instruction
+IA: Insertion Address
+JTPR: Jump Target Prohibition Region
+DCR: Detoured Code Region
+
+The instructions in DCR are copied to the out-of-line buffer
+of the kprobe, because the bytes in DCR are replaced by
+a 5-byte jump instruction. So there are several limitations.
+
+a) The instructions in DCR must be relocatable.
+b) The instructions in DCR must not include a call instruction.
+c) JTPR must not be targeted by any jump or call instruction.
+d) DCR must not straddle the border between functions.
+
+Anyway, these limitations are checked by the in-kernel instruction
+decoder, so you don't need to worry about that.
+
 6. Probe Overhead
 
 On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0
@@ -476,6 +623,19 @@ k = 0.49 usec; j = 0.76; r = 0.80; kr = 0.82; jr = 1.07
 ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU)
 k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99
 
+6.1 Optimized Probe Overhead
+
+Typically, an optimized kprobe hit takes 0.07 to 0.1 microseconds to
+process. Here are sample overhead figures (in usec) for x86 architectures.
+k = unoptimized kprobe, b = boosted (single-step skipped), o = optimized kprobe,
+r = unoptimized kretprobe, rb = boosted kretprobe, ro = optimized kretprobe.
+
+i386: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips
+k = 0.80 usec; b = 0.33; o = 0.05; r = 1.10; rb = 0.61; ro = 0.33
+
+x86-64: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips
+k = 0.99 usec; b = 0.43; o = 0.06; r = 1.24; rb = 0.68; ro = 0.30
+
 7. TODO
 
 a. SystemTap (http://sourceware.org/systemtap): Provides a simplified
@@ -523,7 +683,8 @@ is also specified. Following columns show probe status. If the probe is on
 a virtual address that is no longer valid (module init sections, module
 virtual addresses that correspond to modules that've been unloaded),
 such probes are marked with [GONE]. If the probe is temporarily disabled,
-such probes are marked with [DISABLED].
+such probes are marked with [DISABLED]. If the probe is optimized, it is
+marked with [OPTIMIZED].
 
 /sys/kernel/debug/kprobes/enabled: Turn kprobes ON/OFF forcibly.
 
@@ -533,3 +694,19 @@ registered probes will be disarmed, till such time a "1" is echoed to this
 file. Note that this knob just disarms and arms all kprobes and doesn't
 change each probe's disabling state. This means that disabled kprobes (marked
 [DISABLED]) will be not enabled if you turn ON all kprobes by this knob.
+
+
+Appendix B: The kprobes sysctl interface
+
+/proc/sys/debug/kprobes-optimization: Turn kprobes optimization ON/OFF.
+
+When CONFIG_OPTPROBES=y, this sysctl interface appears and it provides
+a knob to globally and forcibly turn jump optimization (see section
+1.4) ON or OFF. By default, jump optimization is allowed (ON).
+If you echo "0" to this file or set "debug.kprobes_optimization" to
+0 via sysctl, all optimized probes will be unoptimized, and any new
+probes registered after that will not be optimized. Note that this
+knob *changes* the optimized state. This means that optimized probes
+(marked [OPTIMIZED]) will be unoptimized ([OPTIMIZED] tag will be
+removed). If the knob is turned on, they will be optimized again.
+
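
Note (not part of the patch): the DCR/JTPR terminology that this patch
adds to section 5 can be illustrated with a small user-space sketch. It
is not kernel code and does not reflect how the in-kernel instruction
decoder is implemented. The function name optimized_regions and the
half-open (start, end) offset convention are assumptions made for this
illustration only. The 5-byte jump size comes from the text, and the
JTPR is read here as the bytes of the jump other than the first one,
which is one interpretation of the diagram.

```python
JUMP_SIZE = 5  # x86 jump length in bytes, as stated in the patch text


def optimized_regions(insn_lengths, jump_size=JUMP_SIZE):
    """Return (dcr, jtpr) as half-open byte offsets relative to IA.

    insn_lengths lists the lengths of the consecutive instructions
    starting at the insertion address (hypothetical input; the kernel
    obtains these from its own instruction decoder).
    """
    end = 0
    for length in insn_lengths:
        if end >= jump_size:
            break  # the jump is already fully covered
        end += length  # whole instructions must be detoured
    if end < jump_size:
        raise ValueError("not enough instruction bytes to hold the jump")
    dcr = (0, end)         # Detoured Code Region: copied out of line
    jtpr = (1, jump_size)  # assumed: jump bytes except the first
    return dcr, jtpr


# The example from the patch: two 2-byte instructions and one 3-byte one.
print(optimized_regions([2, 2, 3]))  # prints ((0, 7), (1, 5))
```

For the sequence in the diagram, the DCR spans all three instructions
(bytes 0 through 6) because the 5-byte jump ends in the middle of ins3,
while the JTPR covers bytes 1 through 4.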