diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/ABI/testing/debugfs-kmemtrace | 71 | ||||
-rw-r--r-- | Documentation/kernel-parameters.txt | 2 | ||||
-rw-r--r-- | Documentation/trace/ftrace-design.txt | 153 | ||||
-rw-r--r-- | Documentation/trace/kmemtrace.txt | 126 | ||||
-rw-r--r-- | Documentation/trace/kprobetrace.txt | 2 |
5 files changed, 151 insertions, 203 deletions
diff --git a/Documentation/ABI/testing/debugfs-kmemtrace b/Documentation/ABI/testing/debugfs-kmemtrace deleted file mode 100644 index 5e6a92a02d85..000000000000 --- a/Documentation/ABI/testing/debugfs-kmemtrace +++ /dev/null | |||
@@ -1,71 +0,0 @@ | |||
1 | What: /sys/kernel/debug/kmemtrace/ | ||
2 | Date: July 2008 | ||
3 | Contact: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> | ||
4 | Description: | ||
5 | |||
6 | In kmemtrace-enabled kernels, the following files are created: | ||
7 | |||
8 | /sys/kernel/debug/kmemtrace/ | ||
9 | cpu<n> (0400) Per-CPU tracing data, see below. (binary) | ||
10 | total_overruns (0400) Total number of bytes which were dropped from | ||
11 | cpu<n> files because of full buffer condition, | ||
12 | non-binary. (text) | ||
13 | abi_version (0400) Kernel's kmemtrace ABI version. (text) | ||
14 | |||
15 | Each per-CPU file should be read according to the relay interface. That is, | ||
16 | the reader should set affinity to that specific CPU and, as currently done by | ||
17 | the userspace application (though there are other methods), use poll() with | ||
18 | an infinite timeout before every read(). Otherwise, erroneous data may be | ||
19 | read. The binary data has the following _core_ format: | ||
20 | |||
21 | Event ID (1 byte) Unsigned integer, one of: | ||
22 | 0 - represents an allocation (KMEMTRACE_EVENT_ALLOC) | ||
23 | 1 - represents a freeing of previously allocated memory | ||
24 | (KMEMTRACE_EVENT_FREE) | ||
25 | Type ID (1 byte) Unsigned integer, one of: | ||
26 | 0 - this is a kmalloc() / kfree() | ||
27 | 1 - this is a kmem_cache_alloc() / kmem_cache_free() | ||
28 | 2 - this is a __get_free_pages() et al. | ||
29 | Event size (2 bytes) Unsigned integer representing the | ||
30 | size of this event. Used to extend | ||
31 | kmemtrace. Discard the bytes you | ||
32 | don't know about. | ||
33 | Sequence number (4 bytes) Signed integer used to reorder data | ||
34 | logged on SMP machines. Wraparound | ||
35 | must be taken into account, although | ||
36 | it is unlikely. | ||
37 | Caller address (8 bytes) Return address to the caller. | ||
38 | Pointer to mem (8 bytes) Pointer to target memory area. Can be | ||
39 | NULL, but not all such calls might be | ||
40 | recorded. | ||
41 | |||
42 | In case of KMEMTRACE_EVENT_ALLOC events, the next fields follow: | ||
43 | |||
44 | Requested bytes (8 bytes) Total number of requested bytes, | ||
45 | unsigned, must not be zero. | ||
46 | Allocated bytes (8 bytes) Total number of actually allocated | ||
47 | bytes, unsigned, must not be lower | ||
48 | than requested bytes. | ||
49 | Requested flags (4 bytes) GFP flags supplied by the caller. | ||
50 | Target CPU (4 bytes) Signed integer, valid for event id 1. | ||
51 | If equal to -1, target CPU is the same | ||
52 | as origin CPU, but the reverse might | ||
53 | not be true. | ||
54 | |||
55 | The data is made available in the same endianness the machine has. | ||
56 | |||
57 | Other event ids and type ids may be defined and added. Other fields may be | ||
58 | added by increasing event size, but see below for details. | ||
59 | Every modification to the ABI, including new id definitions, are followed | ||
60 | by bumping the ABI version by one. | ||
61 | |||
62 | Adding new data to the packet (features) is done at the end of the mandatory | ||
63 | data: | ||
64 | Feature size (2 byte) | ||
65 | Feature ID (1 byte) | ||
66 | Feature data (Feature size - 3 bytes) | ||
67 | |||
68 | |||
69 | Users: | ||
70 | kmemtrace-user - git://repo.or.cz/kmemtrace-user.git | ||
71 | |||
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index f72ba727441f..f20c7abc0329 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt | |||
@@ -1816,6 +1816,8 @@ and is between 256 and 4096 characters. It is defined in the file | |||
1816 | 1816 | ||
1817 | nousb [USB] Disable the USB subsystem | 1817 | nousb [USB] Disable the USB subsystem |
1818 | 1818 | ||
1819 | nowatchdog [KNL] Disable the lockup detector. | ||
1820 | |||
1819 | nowb [ARM] | 1821 | nowb [ARM] |
1820 | 1822 | ||
1821 | nox2apic [X86-64,APIC] Do not enable x2APIC mode. | 1823 | nox2apic [X86-64,APIC] Do not enable x2APIC mode. |
diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt index f1f81afee8a0..dc52bd442c92 100644 --- a/Documentation/trace/ftrace-design.txt +++ b/Documentation/trace/ftrace-design.txt | |||
@@ -13,6 +13,9 @@ Note that this focuses on architecture implementation details only. If you | |||
13 | want more explanation of a feature in terms of common code, review the common | 13 | want more explanation of a feature in terms of common code, review the common |
14 | ftrace.txt file. | 14 | ftrace.txt file. |
15 | 15 | ||
16 | Ideally, everyone who wishes to retain performance while supporting tracing in | ||
17 | their kernel should make it all the way to dynamic ftrace support. | ||
18 | |||
16 | 19 | ||
17 | Prerequisites | 20 | Prerequisites |
18 | ------------- | 21 | ------------- |
@@ -215,7 +218,7 @@ An arch may pass in a unique value (frame pointer) to both the entering and | |||
215 | exiting of a function. On exit, the value is compared and if it does not | 218 | exiting of a function. On exit, the value is compared and if it does not |
216 | match, then it will panic the kernel. This is largely a sanity check for bad | 219 | match, then it will panic the kernel. This is largely a sanity check for bad |
217 | code generation with gcc. If gcc for your port sanely updates the frame | 220 | code generation with gcc. If gcc for your port sanely updates the frame |
218 | pointer under different opitmization levels, then ignore this option. | 221 | pointer under different optimization levels, then ignore this option. |
219 | 222 | ||
220 | However, adding support for it isn't terribly difficult. In your assembly code | 223 | However, adding support for it isn't terribly difficult. In your assembly code |
221 | that calls prepare_ftrace_return(), pass the frame pointer as the 3rd argument. | 224 | that calls prepare_ftrace_return(), pass the frame pointer as the 3rd argument. |
@@ -234,7 +237,7 @@ If you can't trace NMI functions, then skip this option. | |||
234 | 237 | ||
235 | 238 | ||
236 | HAVE_SYSCALL_TRACEPOINTS | 239 | HAVE_SYSCALL_TRACEPOINTS |
237 | --------------------- | 240 | ------------------------ |
238 | 241 | ||
239 | You need very few things to get the syscalls tracing in an arch. | 242 | You need very few things to get the syscalls tracing in an arch. |
240 | 243 | ||
@@ -250,12 +253,152 @@ You need very few things to get the syscalls tracing in an arch. | |||
250 | HAVE_FTRACE_MCOUNT_RECORD | 253 | HAVE_FTRACE_MCOUNT_RECORD |
251 | ------------------------- | 254 | ------------------------- |
252 | 255 | ||
253 | See scripts/recordmcount.pl for more info. | 256 | See scripts/recordmcount.pl for more info. Just fill in the arch-specific |
257 | details for how to locate the addresses of mcount call sites via objdump. | ||
258 | This option doesn't make much sense without also implementing dynamic ftrace. | ||
254 | 259 | ||
260 | |||
261 | HAVE_DYNAMIC_FTRACE | ||
262 | ------------------- | ||
263 | |||
264 | You will first need HAVE_FTRACE_MCOUNT_RECORD and HAVE_FUNCTION_TRACER, so | ||
265 | scroll your reader back up if you got over eager. | ||
266 | |||
267 | Once those are out of the way, you will need to implement: | ||
268 | - asm/ftrace.h: | ||
269 | - MCOUNT_ADDR | ||
270 | - ftrace_call_adjust() | ||
271 | - struct dyn_arch_ftrace{} | ||
272 | - asm code: | ||
273 | - mcount() (new stub) | ||
274 | - ftrace_caller() | ||
275 | - ftrace_call() | ||
276 | - ftrace_stub() | ||
277 | - C code: | ||
278 | - ftrace_dyn_arch_init() | ||
279 | - ftrace_make_nop() | ||
280 | - ftrace_make_call() | ||
281 | - ftrace_update_ftrace_func() | ||
282 | |||
283 | First you will need to fill out some arch details in your asm/ftrace.h. | ||
284 | |||
285 | Define MCOUNT_ADDR as the address of your mcount symbol similar to: | ||
286 | #define MCOUNT_ADDR ((unsigned long)mcount) | ||
287 | Since no one else will have a decl for that function, you will need to: | ||
288 | extern void mcount(void); | ||
289 | |||
290 | You will also need the helper function ftrace_call_adjust(). Most people | ||
291 | will be able to stub it out like so: | ||
292 | static inline unsigned long ftrace_call_adjust(unsigned long addr) | ||
293 | { | ||
294 | return addr; | ||
295 | } | ||
255 | <details to be filled> | 296 | <details to be filled> |
256 | 297 | ||
298 | Lastly you will need the custom dyn_arch_ftrace structure. If you need | ||
299 | some extra state when runtime patching arbitrary call sites, this is the | ||
300 | place. For now though, create an empty struct: | ||
301 | struct dyn_arch_ftrace { | ||
302 | /* No extra data needed */ | ||
303 | }; | ||
304 | |||
305 | With the header out of the way, we can fill out the assembly code. While we | ||
306 | did already create a mcount() function earlier, dynamic ftrace only wants a | ||
307 | stub function. This is because the mcount() will only be used during boot | ||
308 | and then all references to it will be patched out never to return. Instead, | ||
309 | the guts of the old mcount() will be used to create a new ftrace_caller() | ||
310 | function. Because the two are hard to merge, it will most likely be a lot | ||
311 | easier to have two separate definitions split up by #ifdefs. Same goes for | ||
312 | the ftrace_stub() as that will now be inlined in ftrace_caller(). | ||
313 | |||
314 | Before we get confused anymore, let's check out some pseudo code so you can | ||
315 | implement your own stuff in assembly: | ||
257 | 316 | ||
258 | HAVE_DYNAMIC_FTRACE | 317 | void mcount(void) |
259 | --------------------- | 318 | { |
319 | return; | ||
320 | } | ||
321 | |||
322 | void ftrace_caller(void) | ||
323 | { | ||
324 | /* implement HAVE_FUNCTION_TRACE_MCOUNT_TEST if you desire */ | ||
325 | |||
326 | /* save all state needed by the ABI (see paragraph above) */ | ||
327 | |||
328 | unsigned long frompc = ...; | ||
329 | unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE; | ||
330 | |||
331 | ftrace_call: | ||
332 | ftrace_stub(frompc, selfpc); | ||
333 | |||
334 | /* restore all state needed by the ABI */ | ||
335 | |||
336 | ftrace_stub: | ||
337 | return; | ||
338 | } | ||
339 | |||
340 | This might look a little odd at first, but keep in mind that we will be runtime | ||
341 | patching multiple things. First, only functions that we actually want to trace | ||
342 | will be patched to call ftrace_caller(). Second, since we only have one tracer | ||
343 | active at a time, we will patch the ftrace_caller() function itself to call the | ||
344 | specific tracer in question. That is the point of the ftrace_call label. | ||
345 | |||
346 | With that in mind, let's move on to the C code that will actually be doing the | ||
347 | runtime patching. You'll need a little knowledge of your arch's opcodes in | ||
348 | order to make it through the next section. | ||
349 | |||
350 | Every arch has an init callback function. If you need to do something early on | ||
351 | to initialize some state, this is the time to do that. Otherwise, this simple | ||
352 | function below should be sufficient for most people: | ||
353 | |||
354 | int __init ftrace_dyn_arch_init(void *data) | ||
355 | { | ||
356 | /* return value is done indirectly via data */ | ||
357 | *(unsigned long *)data = 0; | ||
358 | |||
359 | return 0; | ||
360 | } | ||
361 | |||
362 | There are two functions that are used to do runtime patching of arbitrary | ||
363 | functions. The first is used to turn the mcount call site into a nop (which | ||
364 | is what helps us retain runtime performance when not tracing). The second is | ||
365 | used to turn the mcount call site into a call to an arbitrary location (but | ||
366 | typically that is ftracer_caller()). See the general function definition in | ||
367 | linux/ftrace.h for the functions: | ||
368 | ftrace_make_nop() | ||
369 | ftrace_make_call() | ||
370 | The rec->ip value is the address of the mcount call site that was collected | ||
371 | by the scripts/recordmcount.pl during build time. | ||
372 | |||
373 | The last function is used to do runtime patching of the active tracer. This | ||
374 | will be modifying the assembly code at the location of the ftrace_call symbol | ||
375 | inside of the ftrace_caller() function. So you should have sufficient padding | ||
376 | at that location to support the new function calls you'll be inserting. Some | ||
377 | people will be using a "call" type instruction while others will be using a | ||
378 | "branch" type instruction. Specifically, the function is: | ||
379 | ftrace_update_ftrace_func() | ||
380 | |||
381 | |||
382 | HAVE_DYNAMIC_FTRACE + HAVE_FUNCTION_GRAPH_TRACER | ||
383 | ------------------------------------------------ | ||
384 | |||
385 | The function grapher needs a few tweaks in order to work with dynamic ftrace. | ||
386 | Basically, you will need to: | ||
387 | - update: | ||
388 | - ftrace_caller() | ||
389 | - ftrace_graph_call() | ||
390 | - ftrace_graph_caller() | ||
391 | - implement: | ||
392 | - ftrace_enable_ftrace_graph_caller() | ||
393 | - ftrace_disable_ftrace_graph_caller() | ||
260 | 394 | ||
261 | <details to be filled> | 395 | <details to be filled> |
396 | Quick notes: | ||
397 | - add a nop stub after the ftrace_call location named ftrace_graph_call; | ||
398 | stub needs to be large enough to support a call to ftrace_graph_caller() | ||
399 | - update ftrace_graph_caller() to work with being called by the new | ||
400 | ftrace_caller() since some semantics may have changed | ||
401 | - ftrace_enable_ftrace_graph_caller() will runtime patch the | ||
402 | ftrace_graph_call location with a call to ftrace_graph_caller() | ||
403 | - ftrace_disable_ftrace_graph_caller() will runtime patch the | ||
404 | ftrace_graph_call location with nops | ||
diff --git a/Documentation/trace/kmemtrace.txt b/Documentation/trace/kmemtrace.txt deleted file mode 100644 index 6308735e58ca..000000000000 --- a/Documentation/trace/kmemtrace.txt +++ /dev/null | |||
@@ -1,126 +0,0 @@ | |||
1 | kmemtrace - Kernel Memory Tracer | ||
2 | |||
3 | by Eduard - Gabriel Munteanu | ||
4 | <eduard.munteanu@linux360.ro> | ||
5 | |||
6 | I. Introduction | ||
7 | =============== | ||
8 | |||
9 | kmemtrace helps kernel developers figure out two things: | ||
10 | 1) how different allocators (SLAB, SLUB etc.) perform | ||
11 | 2) how kernel code allocates memory and how much | ||
12 | |||
13 | To do this, we trace every allocation and export information to the userspace | ||
14 | through the relay interface. We export things such as the number of requested | ||
15 | bytes, the number of bytes actually allocated (i.e. including internal | ||
16 | fragmentation), whether this is a slab allocation or a plain kmalloc() and so | ||
17 | on. | ||
18 | |||
19 | The actual analysis is performed by a userspace tool (see section III for | ||
20 | details on where to get it from). It logs the data exported by the kernel, | ||
21 | processes it and (as of writing this) can provide the following information: | ||
22 | - the total amount of memory allocated and fragmentation per call-site | ||
23 | - the amount of memory allocated and fragmentation per allocation | ||
24 | - total memory allocated and fragmentation in the collected dataset | ||
25 | - number of cross-CPU allocation and frees (makes sense in NUMA environments) | ||
26 | |||
27 | Moreover, it can potentially find inconsistent and erroneous behavior in | ||
28 | kernel code, such as using slab free functions on kmalloc'ed memory or | ||
29 | allocating less memory than requested (but not truly failed allocations). | ||
30 | |||
31 | kmemtrace also makes provisions for tracing on some arch and analysing the | ||
32 | data on another. | ||
33 | |||
34 | II. Design and goals | ||
35 | ==================== | ||
36 | |||
37 | kmemtrace was designed to handle rather large amounts of data. Thus, it uses | ||
38 | the relay interface to export whatever is logged to userspace, which then | ||
39 | stores it. Analysis and reporting is done asynchronously, that is, after the | ||
40 | data is collected and stored. By design, it allows one to log and analyse | ||
41 | on different machines and different arches. | ||
42 | |||
43 | As of writing this, the ABI is not considered stable, though it might not | ||
44 | change much. However, no guarantees are made about compatibility yet. When | ||
45 | deemed stable, the ABI should still allow easy extension while maintaining | ||
46 | backward compatibility. This is described further in Documentation/ABI. | ||
47 | |||
48 | Summary of design goals: | ||
49 | - allow logging and analysis to be done across different machines | ||
50 | - be fast and anticipate usage in high-load environments (*) | ||
51 | - be reasonably extensible | ||
52 | - make it possible for GNU/Linux distributions to have kmemtrace | ||
53 | included in their repositories | ||
54 | |||
55 | (*) - one of the reasons Pekka Enberg's original userspace data analysis | ||
56 | tool's code was rewritten from Perl to C (although this is more than a | ||
57 | simple conversion) | ||
58 | |||
59 | |||
60 | III. Quick usage guide | ||
61 | ====================== | ||
62 | |||
63 | 1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable | ||
64 | CONFIG_KMEMTRACE). | ||
65 | |||
66 | 2) Get the userspace tool and build it: | ||
67 | $ git clone git://repo.or.cz/kmemtrace-user.git # current repository | ||
68 | $ cd kmemtrace-user/ | ||
69 | $ ./autogen.sh | ||
70 | $ ./configure | ||
71 | $ make | ||
72 | |||
73 | 3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the | ||
74 | 'single' runlevel (so that relay buffers don't fill up easily), and run | ||
75 | kmemtrace: | ||
76 | # '$' does not mean user, but root here. | ||
77 | $ mount -t debugfs none /sys/kernel/debug | ||
78 | $ mount -t proc none /proc | ||
79 | $ cd path/to/kmemtrace-user/ | ||
80 | $ ./kmemtraced | ||
81 | Wait a bit, then stop it with CTRL+C. | ||
82 | $ cat /sys/kernel/debug/kmemtrace/total_overruns # Check if we didn't | ||
83 | # overrun, should | ||
84 | # be zero. | ||
85 | $ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to | ||
86 | check its correctness] | ||
87 | $ ./kmemtrace-report | ||
88 | |||
89 | Now you should have a nice and short summary of how the allocator performs. | ||
90 | |||
91 | IV. FAQ and known issues | ||
92 | ======================== | ||
93 | |||
94 | Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' is non-zero, how do I fix | ||
95 | this? Should I worry? | ||
96 | A: If it's non-zero, this affects kmemtrace's accuracy, depending on how | ||
97 | large the number is. You can fix it by supplying a higher | ||
98 | 'kmemtrace.subbufs=N' kernel parameter. | ||
99 | --- | ||
100 | |||
101 | Q: kmemtrace_check reports errors, how do I fix this? Should I worry? | ||
102 | A: This is a bug and should be reported. It can occur for a variety of | ||
103 | reasons: | ||
104 | - possible bugs in relay code | ||
105 | - possible misuse of relay by kmemtrace | ||
106 | - timestamps being collected unorderly | ||
107 | Or you may fix it yourself and send us a patch. | ||
108 | --- | ||
109 | |||
110 | Q: kmemtrace_report shows many errors, how do I fix this? Should I worry? | ||
111 | A: This is a known issue and I'm working on it. These might be true errors | ||
112 | in kernel code, which may have inconsistent behavior (e.g. allocating memory | ||
113 | with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed | ||
114 | out this behavior may work with SLAB, but may fail with other allocators. | ||
115 | |||
116 | It may also be due to lack of tracing in some unusual allocator functions. | ||
117 | |||
118 | We don't want bug reports regarding this issue yet. | ||
119 | --- | ||
120 | |||
121 | V. See also | ||
122 | =========== | ||
123 | |||
124 | Documentation/kernel-parameters.txt | ||
125 | Documentation/ABI/testing/debugfs-kmemtrace | ||
126 | |||
diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt index ec94748ae65b..5f77d94598dd 100644 --- a/Documentation/trace/kprobetrace.txt +++ b/Documentation/trace/kprobetrace.txt | |||
@@ -42,7 +42,7 @@ Synopsis of kprobe_events | |||
42 | +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**) | 42 | +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**) |
43 | NAME=FETCHARG : Set NAME as the argument name of FETCHARG. | 43 | NAME=FETCHARG : Set NAME as the argument name of FETCHARG. |
44 | FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types | 44 | FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types |
45 | (u8/u16/u32/u64/s8/s16/s32/s64) are supported. | 45 | (u8/u16/u32/u64/s8/s16/s32/s64) and string are supported. |
46 | 46 | ||
47 | (*) only for return probe. | 47 | (*) only for return probe. |
48 | (**) this is useful for fetching a field of data structures. | 48 | (**) this is useful for fetching a field of data structures. |