aboutsummaryrefslogtreecommitdiffstats
path: root/include
diff options
context:
space:
mode:
authorMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>2008-07-18 12:16:16 -0400
committerIngo Molnar <mingo@elte.hu>2008-10-14 04:28:28 -0400
commit97e1c18e8d17bd87e1e383b2e9d9fc740332c8e2 (patch)
tree5c6bfce8bacf04c2993a0029788a1370a483afa6 /include
parente7f2f9918c0e97aa98ba147ca387e2c7238f0711 (diff)
tracing: Kernel Tracepoints
Implementation of kernel tracepoints. Inspired from the Linux Kernel Markers. Allows complete typing verification by declaring both tracing statement inline functions and probe registration/unregistration static inline functions within the same macro "DEFINE_TRACE". No format string is required. See the tracepoint Documentation and Samples patches for usage examples. Taken from the documentation patch : "A tracepoint placed in code provides a hook to call a function (probe) that you can provide at runtime. A tracepoint can be "on" (a probe is connected to it) or "off" (no probe is attached). When a tracepoint is "off" it has no effect, except for adding a tiny time penalty (checking a condition for a branch) and space penalty (adding a few bytes for the function call at the end of the instrumented function and adds a data structure in a separate section). When a tracepoint is "on", the function you provide is called each time the tracepoint is executed, in the execution context of the caller. When the function provided ends its execution, it returns to the caller (continuing from the tracepoint site). You can put tracepoints at important locations in the code. They are lightweight hooks that can pass an arbitrary number of parameters, which prototypes are described in a tracepoint declaration placed in a header file." Addition and removal of tracepoints is synchronized by RCU using the scheduler (and preempt_disable) as guarantees to find a quiescent state (this is really RCU "classic"). The update side uses rcu_barrier_sched() with call_rcu_sched() and the read/execute side uses "preempt_disable()/preempt_enable()". We make sure the previous array containing probes, which has been scheduled for deletion by the rcu callback, is indeed freed before we proceed to the next update. It therefore limits the rate of modification of a single tracepoint to one update per RCU period. The objective here is to permit fast batch add/removal of probes on _different_ tracepoints. Changelog : - Use #name ":" #proto as string to identify the tracepoint in the tracepoint table. This will make sure not type mismatch happens due to connexion of a probe with the wrong type to a tracepoint declared with the same name in a different header. - Add tracepoint_entry_free_old. - Change __TO_TRACE to get rid of the 'i' iterator. Masami Hiramatsu <mhiramat@redhat.com> : Tested on x86-64. Performance impact of a tracepoint : same as markers, except that it adds about 70 bytes of instructions in an unlikely branch of each instrumented function (the for loop, the stack setup and the function call). It currently adds a memory read, a test and a conditional branch at the instrumentation site (in the hot path). Immediate values will eventually change this into a load immediate, test and branch, which removes the memory read which will make the i-cache impact smaller (changing the memory read for a load immediate removes 3-4 bytes per site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it also saves the d-cache hit). About the performance impact of tracepoints (which is comparable to markers), even without immediate values optimizations, tests done by Hideo Aoki on ia64 show no regression. His test case was using hackbench on a kernel where scheduler instrumentation (about 5 events in code scheduler code) was added. Quoting Hideo Aoki about Markers : I evaluated overhead of kernel marker using linux-2.6-sched-fixes git tree, which includes several markers for LTTng, using an ia64 server. While the immediate trace mark feature isn't implemented on ia64, there is no major performance regression. So, I think that we don't have any issues to propose merging marker point patches into Linus's tree from the viewpoint of performance impact. I prepared two kernels to evaluate. The first one was compiled without CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS. I downloaded the original hackbench from the following URL: http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c I ran hackbench 5 times in each condition and calculated the average and difference between the kernels. The parameter of hackbench: every 50 from 50 to 800 The number of CPUs of the server: 2, 4, and 8 Below is the results. As you can see, major performance regression wasn't found in any case. Even if number of processes increases, differences between marker-enabled kernel and marker- disabled kernel doesn't increase. Moreover, if number of CPUs increases, the differences doesn't increase either. Curiously, marker-enabled kernel is better than marker-disabled kernel in more than half cases, although I guess it comes from the difference of memory access pattern. * 2 CPUs Number of | without | with | diff | diff | processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] | -------------------------------------------------------------- 50 | 4.811 | 4.872 | +0.061 | +1.27 | 100 | 9.854 | 10.309 | +0.454 | +4.61 | 150 | 15.602 | 15.040 | -0.562 | -3.6 | 200 | 20.489 | 20.380 | -0.109 | -0.53 | 250 | 25.798 | 25.652 | -0.146 | -0.56 | 300 | 31.260 | 30.797 | -0.463 | -1.48 | 350 | 36.121 | 35.770 | -0.351 | -0.97 | 400 | 42.288 | 42.102 | -0.186 | -0.44 | 450 | 47.778 | 47.253 | -0.526 | -1.1 | 500 | 51.953 | 52.278 | +0.325 | +0.63 | 550 | 58.401 | 57.700 | -0.701 | -1.2 | 600 | 63.334 | 63.222 | -0.112 | -0.18 | 650 | 68.816 | 68.511 | -0.306 | -0.44 | 700 | 74.667 | 74.088 | -0.579 | -0.78 | 750 | 78.612 | 79.582 | +0.970 | +1.23 | 800 | 85.431 | 85.263 | -0.168 | -0.2 | -------------------------------------------------------------- * 4 CPUs Number of | without | with | diff | diff | processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] | -------------------------------------------------------------- 50 | 2.586 | 2.584 | -0.003 | -0.1 | 100 | 5.254 | 5.283 | +0.030 | +0.56 | 150 | 8.012 | 8.074 | +0.061 | +0.76 | 200 | 11.172 | 11.000 | -0.172 | -1.54 | 250 | 13.917 | 14.036 | +0.119 | +0.86 | 300 | 16.905 | 16.543 | -0.362 | -2.14 | 350 | 19.901 | 20.036 | +0.135 | +0.68 | 400 | 22.908 | 23.094 | +0.186 | +0.81 | 450 | 26.273 | 26.101 | -0.172 | -0.66 | 500 | 29.554 | 29.092 | -0.461 | -1.56 | 550 | 32.377 | 32.274 | -0.103 | -0.32 | 600 | 35.855 | 35.322 | -0.533 | -1.49 | 650 | 39.192 | 38.388 | -0.804 | -2.05 | 700 | 41.744 | 41.719 | -0.025 | -0.06 | 750 | 45.016 | 44.496 | -0.520 | -1.16 | 800 | 48.212 | 47.603 | -0.609 | -1.26 | -------------------------------------------------------------- * 8 CPUs Number of | without | with | diff | diff | processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] | -------------------------------------------------------------- 50 | 2.094 | 2.072 | -0.022 | -1.07 | 100 | 4.162 | 4.273 | +0.111 | +2.66 | 150 | 6.485 | 6.540 | +0.055 | +0.84 | 200 | 8.556 | 8.478 | -0.078 | -0.91 | 250 | 10.458 | 10.258 | -0.200 | -1.91 | 300 | 12.425 | 12.750 | +0.325 | +2.62 | 350 | 14.807 | 14.839 | +0.032 | +0.22 | 400 | 16.801 | 16.959 | +0.158 | +0.94 | 450 | 19.478 | 19.009 | -0.470 | -2.41 | 500 | 21.296 | 21.504 | +0.208 | +0.98 | 550 | 23.842 | 23.979 | +0.137 | +0.57 | 600 | 26.309 | 26.111 | -0.198 | -0.75 | 650 | 28.705 | 28.446 | -0.259 | -0.9 | 700 | 31.233 | 31.394 | +0.161 | +0.52 | 750 | 34.064 | 33.720 | -0.344 | -1.01 | 800 | 36.320 | 36.114 | -0.206 | -0.57 | -------------------------------------------------------------- Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: 'Peter Zijlstra' <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Diffstat (limited to 'include')
-rw-r--r--include/asm-generic/vmlinux.lds.h6
-rw-r--r--include/linux/module.h17
-rw-r--r--include/linux/tracepoint.h127
3 files changed, 149 insertions, 1 deletions
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 7440a0dceddb..3d8e472a09c8 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -52,7 +52,10 @@
52 . = ALIGN(8); \ 52 . = ALIGN(8); \
53 VMLINUX_SYMBOL(__start___markers) = .; \ 53 VMLINUX_SYMBOL(__start___markers) = .; \
54 *(__markers) \ 54 *(__markers) \
55 VMLINUX_SYMBOL(__stop___markers) = .; 55 VMLINUX_SYMBOL(__stop___markers) = .; \
56 VMLINUX_SYMBOL(__start___tracepoints) = .; \
57 *(__tracepoints) \
58 VMLINUX_SYMBOL(__stop___tracepoints) = .;
56 59
57#define RO_DATA(align) \ 60#define RO_DATA(align) \
58 . = ALIGN((align)); \ 61 . = ALIGN((align)); \
@@ -61,6 +64,7 @@
61 *(.rodata) *(.rodata.*) \ 64 *(.rodata) *(.rodata.*) \
62 *(__vermagic) /* Kernel version magic */ \ 65 *(__vermagic) /* Kernel version magic */ \
63 *(__markers_strings) /* Markers: strings */ \ 66 *(__markers_strings) /* Markers: strings */ \
67 *(__tracepoints_strings)/* Tracepoints: strings */ \
64 } \ 68 } \
65 \ 69 \
66 .rodata1 : AT(ADDR(.rodata1) - LOAD_OFFSET) { \ 70 .rodata1 : AT(ADDR(.rodata1) - LOAD_OFFSET) { \
diff --git a/include/linux/module.h b/include/linux/module.h
index 68e09557c951..8b6113503863 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -16,6 +16,7 @@
16#include <linux/kobject.h> 16#include <linux/kobject.h>
17#include <linux/moduleparam.h> 17#include <linux/moduleparam.h>
18#include <linux/marker.h> 18#include <linux/marker.h>
19#include <linux/tracepoint.h>
19#include <asm/local.h> 20#include <asm/local.h>
20 21
21#include <asm/module.h> 22#include <asm/module.h>
@@ -331,6 +332,10 @@ struct module
331 struct marker *markers; 332 struct marker *markers;
332 unsigned int num_markers; 333 unsigned int num_markers;
333#endif 334#endif
335#ifdef CONFIG_TRACEPOINTS
336 struct tracepoint *tracepoints;
337 unsigned int num_tracepoints;
338#endif
334 339
335#ifdef CONFIG_MODULE_UNLOAD 340#ifdef CONFIG_MODULE_UNLOAD
336 /* What modules depend on me? */ 341 /* What modules depend on me? */
@@ -454,6 +459,9 @@ extern void print_modules(void);
454 459
455extern void module_update_markers(void); 460extern void module_update_markers(void);
456 461
462extern void module_update_tracepoints(void);
463extern int module_get_iter_tracepoints(struct tracepoint_iter *iter);
464
457#else /* !CONFIG_MODULES... */ 465#else /* !CONFIG_MODULES... */
458#define EXPORT_SYMBOL(sym) 466#define EXPORT_SYMBOL(sym)
459#define EXPORT_SYMBOL_GPL(sym) 467#define EXPORT_SYMBOL_GPL(sym)
@@ -558,6 +566,15 @@ static inline void module_update_markers(void)
558{ 566{
559} 567}
560 568
569static inline void module_update_tracepoints(void)
570{
571}
572
573static inline int module_get_iter_tracepoints(struct tracepoint_iter *iter)
574{
575 return 0;
576}
577
561#endif /* CONFIG_MODULES */ 578#endif /* CONFIG_MODULES */
562 579
563struct device_driver; 580struct device_driver;
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
new file mode 100644
index 000000000000..e623a6fca5c3
--- /dev/null
+++ b/include/linux/tracepoint.h
@@ -0,0 +1,127 @@
1#ifndef _LINUX_TRACEPOINT_H
2#define _LINUX_TRACEPOINT_H
3
4/*
5 * Kernel Tracepoint API.
6 *
7 * See Documentation/tracepoint.txt.
8 *
9 * (C) Copyright 2008 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
10 *
11 * Heavily inspired from the Linux Kernel Markers.
12 *
13 * This file is released under the GPLv2.
14 * See the file COPYING for more details.
15 */
16
17#include <linux/types.h>
18#include <linux/rcupdate.h>
19
20struct module;
21struct tracepoint;
22
23struct tracepoint {
24 const char *name; /* Tracepoint name */
25 int state; /* State. */
26 void **funcs;
27} __attribute__((aligned(8)));
28
29
30#define TPPROTO(args...) args
31#define TPARGS(args...) args
32
33#ifdef CONFIG_TRACEPOINTS
34
35/*
36 * it_func[0] is never NULL because there is at least one element in the array
37 * when the array itself is non NULL.
38 */
39#define __DO_TRACE(tp, proto, args) \
40 do { \
41 void **it_func; \
42 \
43 rcu_read_lock_sched(); \
44 it_func = rcu_dereference((tp)->funcs); \
45 if (it_func) { \
46 do { \
47 ((void(*)(proto))(*it_func))(args); \
48 } while (*(++it_func)); \
49 } \
50 rcu_read_unlock_sched(); \
51 } while (0)
52
53/*
54 * Make sure the alignment of the structure in the __tracepoints section will
55 * not add unwanted padding between the beginning of the section and the
56 * structure. Force alignment to the same alignment as the section start.
57 */
58#define DEFINE_TRACE(name, proto, args) \
59 static inline void trace_##name(proto) \
60 { \
61 static const char __tpstrtab_##name[] \
62 __attribute__((section("__tracepoints_strings"))) \
63 = #name ":" #proto; \
64 static struct tracepoint __tracepoint_##name \
65 __attribute__((section("__tracepoints"), aligned(8))) = \
66 { __tpstrtab_##name, 0, NULL }; \
67 if (unlikely(__tracepoint_##name.state)) \
68 __DO_TRACE(&__tracepoint_##name, \
69 TPPROTO(proto), TPARGS(args)); \
70 } \
71 static inline int register_trace_##name(void (*probe)(proto)) \
72 { \
73 return tracepoint_probe_register(#name ":" #proto, \
74 (void *)probe); \
75 } \
76 static inline void unregister_trace_##name(void (*probe)(proto))\
77 { \
78 tracepoint_probe_unregister(#name ":" #proto, \
79 (void *)probe); \
80 }
81
82extern void tracepoint_update_probe_range(struct tracepoint *begin,
83 struct tracepoint *end);
84
85#else /* !CONFIG_TRACEPOINTS */
86#define DEFINE_TRACE(name, proto, args) \
87 static inline void _do_trace_##name(struct tracepoint *tp, proto) \
88 { } \
89 static inline void trace_##name(proto) \
90 { } \
91 static inline int register_trace_##name(void (*probe)(proto)) \
92 { \
93 return -ENOSYS; \
94 } \
95 static inline void unregister_trace_##name(void (*probe)(proto))\
96 { }
97
98static inline void tracepoint_update_probe_range(struct tracepoint *begin,
99 struct tracepoint *end)
100{ }
101#endif /* CONFIG_TRACEPOINTS */
102
103/*
104 * Connect a probe to a tracepoint.
105 * Internal API, should not be used directly.
106 */
107extern int tracepoint_probe_register(const char *name, void *probe);
108
109/*
110 * Disconnect a probe from a tracepoint.
111 * Internal API, should not be used directly.
112 */
113extern int tracepoint_probe_unregister(const char *name, void *probe);
114
115struct tracepoint_iter {
116 struct module *module;
117 struct tracepoint *tracepoint;
118};
119
120extern void tracepoint_iter_start(struct tracepoint_iter *iter);
121extern void tracepoint_iter_next(struct tracepoint_iter *iter);
122extern void tracepoint_iter_stop(struct tracepoint_iter *iter);
123extern void tracepoint_iter_reset(struct tracepoint_iter *iter);
124extern int tracepoint_get_iter_range(struct tracepoint **tracepoint,
125 struct tracepoint *begin, struct tracepoint *end);
126
127#endif