aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2010-08-06 12:30:52 -0400
committerLinus Torvalds <torvalds@linux-foundation.org>2010-08-06 12:30:52 -0400
commit4aed2fd8e3181fea7c09ba79cf64e7e3f4413bf9 (patch)
tree1f69733e5daab4915a76a41de0e4d1dc61e12cfb /Documentation
parent3a3527b6461b1298cc53ce72f336346739297ac8 (diff)
parentfc9ea5a1e53ee54f681e226d735008e2a6f8f470 (diff)
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (162 commits) tracing/kprobes: unregister_trace_probe needs to be called under mutex perf: expose event__process function perf events: Fix mmap offset determination perf, powerpc: fsl_emb: Restore setting perf_sample_data.period perf, powerpc: Convert the FSL driver to use local64_t perf tools: Don't keep unreferenced maps when unmaps are detected perf session: Invalidate last_match when removing threads from rb_tree perf session: Free the ref_reloc_sym memory at the right place x86,mmiotrace: Add support for tracing STOS instruction perf, sched migration: Librarize task states and event headers helpers perf, sched migration: Librarize the GUI class perf, sched migration: Make the GUI class client agnostic perf, sched migration: Make it vertically scrollable perf, sched migration: Parameterize cpu height and spacing perf, sched migration: Fix key bindings perf, sched migration: Ignore unhandled task states perf, sched migration: Handle ignored migrate out events perf: New migration tool overview tracing: Drop cpparg() macro perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call ... Fix up trivial conflicts in Makefile and drivers/cpufreq/cpufreq.c
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/debugfs-kmemtrace71
-rw-r--r--Documentation/kernel-parameters.txt2
-rw-r--r--Documentation/trace/ftrace-design.txt153
-rw-r--r--Documentation/trace/kmemtrace.txt126
-rw-r--r--Documentation/trace/kprobetrace.txt2
5 files changed, 151 insertions, 203 deletions
diff --git a/Documentation/ABI/testing/debugfs-kmemtrace b/Documentation/ABI/testing/debugfs-kmemtrace
deleted file mode 100644
index 5e6a92a02d85..000000000000
--- a/Documentation/ABI/testing/debugfs-kmemtrace
+++ /dev/null
@@ -1,71 +0,0 @@
1What: /sys/kernel/debug/kmemtrace/
2Date: July 2008
3Contact: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
4Description:
5
6In kmemtrace-enabled kernels, the following files are created:
7
8/sys/kernel/debug/kmemtrace/
9 cpu<n> (0400) Per-CPU tracing data, see below. (binary)
10 total_overruns (0400) Total number of bytes which were dropped from
11 cpu<n> files because of full buffer condition,
12 non-binary. (text)
13 abi_version (0400) Kernel's kmemtrace ABI version. (text)
14
15Each per-CPU file should be read according to the relay interface. That is,
16the reader should set affinity to that specific CPU and, as currently done by
17the userspace application (though there are other methods), use poll() with
18an infinite timeout before every read(). Otherwise, erroneous data may be
19read. The binary data has the following _core_ format:
20
21 Event ID (1 byte) Unsigned integer, one of:
22 0 - represents an allocation (KMEMTRACE_EVENT_ALLOC)
23 1 - represents a freeing of previously allocated memory
24 (KMEMTRACE_EVENT_FREE)
25 Type ID (1 byte) Unsigned integer, one of:
26 0 - this is a kmalloc() / kfree()
27 1 - this is a kmem_cache_alloc() / kmem_cache_free()
28 2 - this is a __get_free_pages() et al.
29 Event size (2 bytes) Unsigned integer representing the
30 size of this event. Used to extend
31 kmemtrace. Discard the bytes you
32 don't know about.
33 Sequence number (4 bytes) Signed integer used to reorder data
34 logged on SMP machines. Wraparound
35 must be taken into account, although
36 it is unlikely.
37 Caller address (8 bytes) Return address to the caller.
38 Pointer to mem (8 bytes) Pointer to target memory area. Can be
39 NULL, but not all such calls might be
40 recorded.
41
42In case of KMEMTRACE_EVENT_ALLOC events, the next fields follow:
43
44 Requested bytes (8 bytes) Total number of requested bytes,
45 unsigned, must not be zero.
46 Allocated bytes (8 bytes) Total number of actually allocated
47 bytes, unsigned, must not be lower
48 than requested bytes.
49 Requested flags (4 bytes) GFP flags supplied by the caller.
50 Target CPU (4 bytes) Signed integer, valid for event id 1.
51 If equal to -1, target CPU is the same
52 as origin CPU, but the reverse might
53 not be true.
54
55The data is made available in the same endianness the machine has.
56
57Other event ids and type ids may be defined and added. Other fields may be
58added by increasing event size, but see below for details.
59Every modification to the ABI, including new id definitions, are followed
60by bumping the ABI version by one.
61
62Adding new data to the packet (features) is done at the end of the mandatory
63data:
64 Feature size (2 byte)
65 Feature ID (1 byte)
66 Feature data (Feature size - 3 bytes)
67
68
69Users:
70 kmemtrace-user - git://repo.or.cz/kmemtrace-user.git
71
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index f72ba727441f..f20c7abc0329 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1816,6 +1816,8 @@ and is between 256 and 4096 characters. It is defined in the file
1816 1816
1817 nousb [USB] Disable the USB subsystem 1817 nousb [USB] Disable the USB subsystem
1818 1818
1819 nowatchdog [KNL] Disable the lockup detector.
1820
1819 nowb [ARM] 1821 nowb [ARM]
1820 1822
1821 nox2apic [X86-64,APIC] Do not enable x2APIC mode. 1823 nox2apic [X86-64,APIC] Do not enable x2APIC mode.
diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt
index f1f81afee8a0..dc52bd442c92 100644
--- a/Documentation/trace/ftrace-design.txt
+++ b/Documentation/trace/ftrace-design.txt
@@ -13,6 +13,9 @@ Note that this focuses on architecture implementation details only. If you
13want more explanation of a feature in terms of common code, review the common 13want more explanation of a feature in terms of common code, review the common
14ftrace.txt file. 14ftrace.txt file.
15 15
16Ideally, everyone who wishes to retain performance while supporting tracing in
17their kernel should make it all the way to dynamic ftrace support.
18
16 19
17Prerequisites 20Prerequisites
18------------- 21-------------
@@ -215,7 +218,7 @@ An arch may pass in a unique value (frame pointer) to both the entering and
215exiting of a function. On exit, the value is compared and if it does not 218exiting of a function. On exit, the value is compared and if it does not
216match, then it will panic the kernel. This is largely a sanity check for bad 219match, then it will panic the kernel. This is largely a sanity check for bad
217code generation with gcc. If gcc for your port sanely updates the frame 220code generation with gcc. If gcc for your port sanely updates the frame
218pointer under different opitmization levels, then ignore this option. 221pointer under different optimization levels, then ignore this option.
219 222
220However, adding support for it isn't terribly difficult. In your assembly code 223However, adding support for it isn't terribly difficult. In your assembly code
221that calls prepare_ftrace_return(), pass the frame pointer as the 3rd argument. 224that calls prepare_ftrace_return(), pass the frame pointer as the 3rd argument.
@@ -234,7 +237,7 @@ If you can't trace NMI functions, then skip this option.
234 237
235 238
236HAVE_SYSCALL_TRACEPOINTS 239HAVE_SYSCALL_TRACEPOINTS
237--------------------- 240------------------------
238 241
239You need very few things to get the syscalls tracing in an arch. 242You need very few things to get the syscalls tracing in an arch.
240 243
@@ -250,12 +253,152 @@ You need very few things to get the syscalls tracing in an arch.
250HAVE_FTRACE_MCOUNT_RECORD 253HAVE_FTRACE_MCOUNT_RECORD
251------------------------- 254-------------------------
252 255
253See scripts/recordmcount.pl for more info. 256See scripts/recordmcount.pl for more info. Just fill in the arch-specific
257details for how to locate the addresses of mcount call sites via objdump.
258This option doesn't make much sense without also implementing dynamic ftrace.
254 259
260
261HAVE_DYNAMIC_FTRACE
262-------------------
263
264You will first need HAVE_FTRACE_MCOUNT_RECORD and HAVE_FUNCTION_TRACER, so
265scroll your reader back up if you got over eager.
266
267Once those are out of the way, you will need to implement:
268 - asm/ftrace.h:
269 - MCOUNT_ADDR
270 - ftrace_call_adjust()
271 - struct dyn_arch_ftrace{}
272 - asm code:
273 - mcount() (new stub)
274 - ftrace_caller()
275 - ftrace_call()
276 - ftrace_stub()
277 - C code:
278 - ftrace_dyn_arch_init()
279 - ftrace_make_nop()
280 - ftrace_make_call()
281 - ftrace_update_ftrace_func()
282
283First you will need to fill out some arch details in your asm/ftrace.h.
284
285Define MCOUNT_ADDR as the address of your mcount symbol similar to:
286 #define MCOUNT_ADDR ((unsigned long)mcount)
287Since no one else will have a decl for that function, you will need to:
288 extern void mcount(void);
289
290You will also need the helper function ftrace_call_adjust(). Most people
291will be able to stub it out like so:
292 static inline unsigned long ftrace_call_adjust(unsigned long addr)
293 {
294 return addr;
295 }
255<details to be filled> 296<details to be filled>
256 297
298Lastly you will need the custom dyn_arch_ftrace structure. If you need
299some extra state when runtime patching arbitrary call sites, this is the
300place. For now though, create an empty struct:
301 struct dyn_arch_ftrace {
302 /* No extra data needed */
303 };
304
305With the header out of the way, we can fill out the assembly code. While we
306did already create a mcount() function earlier, dynamic ftrace only wants a
307stub function. This is because the mcount() will only be used during boot
308and then all references to it will be patched out never to return. Instead,
309the guts of the old mcount() will be used to create a new ftrace_caller()
310function. Because the two are hard to merge, it will most likely be a lot
311easier to have two separate definitions split up by #ifdefs. Same goes for
312the ftrace_stub() as that will now be inlined in ftrace_caller().
313
314Before we get confused anymore, let's check out some pseudo code so you can
315implement your own stuff in assembly:
257 316
258HAVE_DYNAMIC_FTRACE 317void mcount(void)
259--------------------- 318{
319 return;
320}
321
322void ftrace_caller(void)
323{
324 /* implement HAVE_FUNCTION_TRACE_MCOUNT_TEST if you desire */
325
326 /* save all state needed by the ABI (see paragraph above) */
327
328 unsigned long frompc = ...;
329 unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE;
330
331ftrace_call:
332 ftrace_stub(frompc, selfpc);
333
334 /* restore all state needed by the ABI */
335
336ftrace_stub:
337 return;
338}
339
340This might look a little odd at first, but keep in mind that we will be runtime
341patching multiple things. First, only functions that we actually want to trace
342will be patched to call ftrace_caller(). Second, since we only have one tracer
343active at a time, we will patch the ftrace_caller() function itself to call the
344specific tracer in question. That is the point of the ftrace_call label.
345
346With that in mind, let's move on to the C code that will actually be doing the
347runtime patching. You'll need a little knowledge of your arch's opcodes in
348order to make it through the next section.
349
350Every arch has an init callback function. If you need to do something early on
351to initialize some state, this is the time to do that. Otherwise, this simple
352function below should be sufficient for most people:
353
354int __init ftrace_dyn_arch_init(void *data)
355{
356 /* return value is done indirectly via data */
357 *(unsigned long *)data = 0;
358
359 return 0;
360}
361
362There are two functions that are used to do runtime patching of arbitrary
363functions. The first is used to turn the mcount call site into a nop (which
364is what helps us retain runtime performance when not tracing). The second is
365used to turn the mcount call site into a call to an arbitrary location (but
366typically that is ftracer_caller()). See the general function definition in
367linux/ftrace.h for the functions:
368 ftrace_make_nop()
369 ftrace_make_call()
370The rec->ip value is the address of the mcount call site that was collected
371by the scripts/recordmcount.pl during build time.
372
373The last function is used to do runtime patching of the active tracer. This
374will be modifying the assembly code at the location of the ftrace_call symbol
375inside of the ftrace_caller() function. So you should have sufficient padding
376at that location to support the new function calls you'll be inserting. Some
377people will be using a "call" type instruction while others will be using a
378"branch" type instruction. Specifically, the function is:
379 ftrace_update_ftrace_func()
380
381
382HAVE_DYNAMIC_FTRACE + HAVE_FUNCTION_GRAPH_TRACER
383------------------------------------------------
384
385The function grapher needs a few tweaks in order to work with dynamic ftrace.
386Basically, you will need to:
387 - update:
388 - ftrace_caller()
389 - ftrace_graph_call()
390 - ftrace_graph_caller()
391 - implement:
392 - ftrace_enable_ftrace_graph_caller()
393 - ftrace_disable_ftrace_graph_caller()
260 394
261<details to be filled> 395<details to be filled>
396Quick notes:
397 - add a nop stub after the ftrace_call location named ftrace_graph_call;
398 stub needs to be large enough to support a call to ftrace_graph_caller()
399 - update ftrace_graph_caller() to work with being called by the new
400 ftrace_caller() since some semantics may have changed
401 - ftrace_enable_ftrace_graph_caller() will runtime patch the
402 ftrace_graph_call location with a call to ftrace_graph_caller()
403 - ftrace_disable_ftrace_graph_caller() will runtime patch the
404 ftrace_graph_call location with nops
diff --git a/Documentation/trace/kmemtrace.txt b/Documentation/trace/kmemtrace.txt
deleted file mode 100644
index 6308735e58ca..000000000000
--- a/Documentation/trace/kmemtrace.txt
+++ /dev/null
@@ -1,126 +0,0 @@
1 kmemtrace - Kernel Memory Tracer
2
3 by Eduard - Gabriel Munteanu
4 <eduard.munteanu@linux360.ro>
5
6I. Introduction
7===============
8
9kmemtrace helps kernel developers figure out two things:
101) how different allocators (SLAB, SLUB etc.) perform
112) how kernel code allocates memory and how much
12
13To do this, we trace every allocation and export information to the userspace
14through the relay interface. We export things such as the number of requested
15bytes, the number of bytes actually allocated (i.e. including internal
16fragmentation), whether this is a slab allocation or a plain kmalloc() and so
17on.
18
19The actual analysis is performed by a userspace tool (see section III for
20details on where to get it from). It logs the data exported by the kernel,
21processes it and (as of writing this) can provide the following information:
22- the total amount of memory allocated and fragmentation per call-site
23- the amount of memory allocated and fragmentation per allocation
24- total memory allocated and fragmentation in the collected dataset
25- number of cross-CPU allocation and frees (makes sense in NUMA environments)
26
27Moreover, it can potentially find inconsistent and erroneous behavior in
28kernel code, such as using slab free functions on kmalloc'ed memory or
29allocating less memory than requested (but not truly failed allocations).
30
31kmemtrace also makes provisions for tracing on some arch and analysing the
32data on another.
33
34II. Design and goals
35====================
36
37kmemtrace was designed to handle rather large amounts of data. Thus, it uses
38the relay interface to export whatever is logged to userspace, which then
39stores it. Analysis and reporting is done asynchronously, that is, after the
40data is collected and stored. By design, it allows one to log and analyse
41on different machines and different arches.
42
43As of writing this, the ABI is not considered stable, though it might not
44change much. However, no guarantees are made about compatibility yet. When
45deemed stable, the ABI should still allow easy extension while maintaining
46backward compatibility. This is described further in Documentation/ABI.
47
48Summary of design goals:
49 - allow logging and analysis to be done across different machines
50 - be fast and anticipate usage in high-load environments (*)
51 - be reasonably extensible
52 - make it possible for GNU/Linux distributions to have kmemtrace
53 included in their repositories
54
55(*) - one of the reasons Pekka Enberg's original userspace data analysis
56 tool's code was rewritten from Perl to C (although this is more than a
57 simple conversion)
58
59
60III. Quick usage guide
61======================
62
631) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable
64CONFIG_KMEMTRACE).
65
662) Get the userspace tool and build it:
67$ git clone git://repo.or.cz/kmemtrace-user.git # current repository
68$ cd kmemtrace-user/
69$ ./autogen.sh
70$ ./configure
71$ make
72
733) Boot the kmemtrace-enabled kernel if you haven't, preferably in the
74'single' runlevel (so that relay buffers don't fill up easily), and run
75kmemtrace:
76# '$' does not mean user, but root here.
77$ mount -t debugfs none /sys/kernel/debug
78$ mount -t proc none /proc
79$ cd path/to/kmemtrace-user/
80$ ./kmemtraced
81Wait a bit, then stop it with CTRL+C.
82$ cat /sys/kernel/debug/kmemtrace/total_overruns # Check if we didn't
83 # overrun, should
84 # be zero.
85$ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to
86 check its correctness]
87$ ./kmemtrace-report
88
89Now you should have a nice and short summary of how the allocator performs.
90
91IV. FAQ and known issues
92========================
93
94Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' is non-zero, how do I fix
95this? Should I worry?
96A: If it's non-zero, this affects kmemtrace's accuracy, depending on how
97large the number is. You can fix it by supplying a higher
98'kmemtrace.subbufs=N' kernel parameter.
99---
100
101Q: kmemtrace_check reports errors, how do I fix this? Should I worry?
102A: This is a bug and should be reported. It can occur for a variety of
103reasons:
104 - possible bugs in relay code
105 - possible misuse of relay by kmemtrace
106 - timestamps being collected unorderly
107Or you may fix it yourself and send us a patch.
108---
109
110Q: kmemtrace_report shows many errors, how do I fix this? Should I worry?
111A: This is a known issue and I'm working on it. These might be true errors
112in kernel code, which may have inconsistent behavior (e.g. allocating memory
113with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed
114out this behavior may work with SLAB, but may fail with other allocators.
115
116It may also be due to lack of tracing in some unusual allocator functions.
117
118We don't want bug reports regarding this issue yet.
119---
120
121V. See also
122===========
123
124Documentation/kernel-parameters.txt
125Documentation/ABI/testing/debugfs-kmemtrace
126
diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index ec94748ae65b..5f77d94598dd 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -42,7 +42,7 @@ Synopsis of kprobe_events
42 +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**) 42 +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
43 NAME=FETCHARG : Set NAME as the argument name of FETCHARG. 43 NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
44 FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types 44 FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
45 (u8/u16/u32/u64/s8/s16/s32/s64) are supported. 45 (u8/u16/u32/u64/s8/s16/s32/s64) and string are supported.
46 46
47 (*) only for return probe. 47 (*) only for return probe.
48 (**) this is useful for fetching a field of data structures. 48 (**) this is useful for fetching a field of data structures.