diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/trace/ftrace-design.txt | 14 | ||||
-rw-r--r-- | Documentation/trace/mmiotrace.txt | 15 | ||||
-rw-r--r-- | Documentation/trace/tracepoint-analysis.txt | 60 |
3 files changed, 45 insertions, 44 deletions
diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt index 641a1ef2a7ff..239f14b2b55a 100644 --- a/Documentation/trace/ftrace-design.txt +++ b/Documentation/trace/ftrace-design.txt | |||
@@ -53,14 +53,14 @@ size of the mcount call that is embedded in the function). | |||
53 | For example, if the function foo() calls bar(), when the bar() function calls | 53 | For example, if the function foo() calls bar(), when the bar() function calls |
54 | mcount(), the arguments mcount() will pass to the tracer are: | 54 | mcount(), the arguments mcount() will pass to the tracer are: |
55 | "frompc" - the address bar() will use to return to foo() | 55 | "frompc" - the address bar() will use to return to foo() |
56 | "selfpc" - the address bar() (with _mcount() size adjustment) | 56 | "selfpc" - the address bar() (with mcount() size adjustment) |
57 | 57 | ||
58 | Also keep in mind that this mcount function will be called *a lot*, so | 58 | Also keep in mind that this mcount function will be called *a lot*, so |
59 | optimizing for the default case of no tracer will help the smooth running of | 59 | optimizing for the default case of no tracer will help the smooth running of |
60 | your system when tracing is disabled. So the start of the mcount function is | 60 | your system when tracing is disabled. So the start of the mcount function is |
61 | typically the bare min with checking things before returning. That also means | 61 | typically the bare minimum with checking things before returning. That also |
62 | the code flow should usually kept linear (i.e. no branching in the nop case). | 62 | means the code flow should usually be kept linear (i.e. no branching in the nop |
63 | This is of course an optimization and not a hard requirement. | 63 | case). This is of course an optimization and not a hard requirement. |
64 | 64 | ||
65 | Here is some pseudo code that should help (these functions should actually be | 65 | Here is some pseudo code that should help (these functions should actually be |
66 | implemented in assembly): | 66 | implemented in assembly): |
@@ -131,10 +131,10 @@ some functions to save (hijack) and restore the return address. | |||
131 | 131 | ||
132 | The mcount function should check the function pointers ftrace_graph_return | 132 | The mcount function should check the function pointers ftrace_graph_return |
133 | (compare to ftrace_stub) and ftrace_graph_entry (compare to | 133 | (compare to ftrace_stub) and ftrace_graph_entry (compare to |
134 | ftrace_graph_entry_stub). If either of those are not set to the relevant stub | 134 | ftrace_graph_entry_stub). If either of those is not set to the relevant stub |
135 | function, call the arch-specific function ftrace_graph_caller which in turn | 135 | function, call the arch-specific function ftrace_graph_caller which in turn |
136 | calls the arch-specific function prepare_ftrace_return. Neither of these | 136 | calls the arch-specific function prepare_ftrace_return. Neither of these |
137 | function names are strictly required, but you should use them anyways to stay | 137 | function names is strictly required, but you should use them anyway to stay |
138 | consistent across the architecture ports -- easier to compare & contrast | 138 | consistent across the architecture ports -- easier to compare & contrast |
139 | things. | 139 | things. |
140 | 140 | ||
@@ -144,7 +144,7 @@ but the first argument should be a pointer to the "frompc". Typically this is | |||
144 | located on the stack. This allows the function to hijack the return address | 144 | located on the stack. This allows the function to hijack the return address |
145 | temporarily to have it point to the arch-specific function return_to_handler. | 145 | temporarily to have it point to the arch-specific function return_to_handler. |
146 | That function will simply call the common ftrace_return_to_handler function and | 146 | That function will simply call the common ftrace_return_to_handler function and |
147 | that will return the original return address with which, you can return to the | 147 | that will return the original return address with which you can return to the |
148 | original call site. | 148 | original call site. |
149 | 149 | ||
150 | Here is the updated mcount pseudo code: | 150 | Here is the updated mcount pseudo code: |
diff --git a/Documentation/trace/mmiotrace.txt b/Documentation/trace/mmiotrace.txt index 162effbfbdec..664e7386d89e 100644 --- a/Documentation/trace/mmiotrace.txt +++ b/Documentation/trace/mmiotrace.txt | |||
@@ -44,7 +44,8 @@ Check for lost events. | |||
44 | Usage | 44 | Usage |
45 | ----- | 45 | ----- |
46 | 46 | ||
47 | Make sure debugfs is mounted to /sys/kernel/debug. If not, (requires root privileges) | 47 | Make sure debugfs is mounted to /sys/kernel/debug. |
48 | If not (requires root privileges): | ||
48 | $ mount -t debugfs debugfs /sys/kernel/debug | 49 | $ mount -t debugfs debugfs /sys/kernel/debug |
49 | 50 | ||
50 | Check that the driver you are about to trace is not loaded. | 51 | Check that the driver you are about to trace is not loaded. |
@@ -91,7 +92,7 @@ $ dmesg > dmesg.txt | |||
91 | $ tar zcf pciid-nick-mmiotrace.tar.gz mydump.txt lspci.txt dmesg.txt | 92 | $ tar zcf pciid-nick-mmiotrace.tar.gz mydump.txt lspci.txt dmesg.txt |
92 | and then send the .tar.gz file. The trace compresses considerably. Replace | 93 | and then send the .tar.gz file. The trace compresses considerably. Replace |
93 | "pciid" and "nick" with the PCI ID or model name of your piece of hardware | 94 | "pciid" and "nick" with the PCI ID or model name of your piece of hardware |
94 | under investigation and your nick name. | 95 | under investigation and your nickname. |
95 | 96 | ||
96 | 97 | ||
97 | How Mmiotrace Works | 98 | How Mmiotrace Works |
@@ -100,7 +101,7 @@ How Mmiotrace Works | |||
100 | Access to hardware IO-memory is gained by mapping addresses from PCI bus by | 101 | Access to hardware IO-memory is gained by mapping addresses from PCI bus by |
101 | calling one of the ioremap_*() functions. Mmiotrace is hooked into the | 102 | calling one of the ioremap_*() functions. Mmiotrace is hooked into the |
102 | __ioremap() function and gets called whenever a mapping is created. Mapping is | 103 | __ioremap() function and gets called whenever a mapping is created. Mapping is |
103 | an event that is recorded into the trace log. Note, that ISA range mappings | 104 | an event that is recorded into the trace log. Note that ISA range mappings |
104 | are not caught, since the mapping always exists and is returned directly. | 105 | are not caught, since the mapping always exists and is returned directly. |
105 | 106 | ||
106 | MMIO accesses are recorded via page faults. Just before __ioremap() returns, | 107 | MMIO accesses are recorded via page faults. Just before __ioremap() returns, |
@@ -122,11 +123,11 @@ Trace Log Format | |||
122 | ---------------- | 123 | ---------------- |
123 | 124 | ||
124 | The raw log is text and easily filtered with e.g. grep and awk. One record is | 125 | The raw log is text and easily filtered with e.g. grep and awk. One record is |
125 | one line in the log. A record starts with a keyword, followed by keyword | 126 | one line in the log. A record starts with a keyword, followed by keyword- |
126 | dependant arguments. Arguments are separated by a space, or continue until the | 127 | dependent arguments. Arguments are separated by a space, or continue until the |
127 | end of line. The format for version 20070824 is as follows: | 128 | end of line. The format for version 20070824 is as follows: |
128 | 129 | ||
129 | Explanation Keyword Space separated arguments | 130 | Explanation Keyword Space-separated arguments |
130 | --------------------------------------------------------------------------- | 131 | --------------------------------------------------------------------------- |
131 | 132 | ||
132 | read event R width, timestamp, map id, physical, value, PC, PID | 133 | read event R width, timestamp, map id, physical, value, PC, PID |
@@ -136,7 +137,7 @@ iounmap event UNMAP timestamp, map id, PC, PID | |||
136 | marker MARK timestamp, text | 137 | marker MARK timestamp, text |
137 | version VERSION the string "20070824" | 138 | version VERSION the string "20070824" |
138 | info for reader LSPCI one line from lspci -v | 139 | info for reader LSPCI one line from lspci -v |
139 | PCI address map PCIDEV space separated /proc/bus/pci/devices data | 140 | PCI address map PCIDEV space-separated /proc/bus/pci/devices data |
140 | unk. opcode UNKNOWN timestamp, map id, physical, data, PC, PID | 141 | unk. opcode UNKNOWN timestamp, map id, physical, data, PC, PID |
141 | 142 | ||
142 | Timestamp is in seconds with decimals. Physical is a PCI bus address, virtual | 143 | Timestamp is in seconds with decimals. Physical is a PCI bus address, virtual |
diff --git a/Documentation/trace/tracepoint-analysis.txt b/Documentation/trace/tracepoint-analysis.txt index 5eb4e487e667..87bee3c129ba 100644 --- a/Documentation/trace/tracepoint-analysis.txt +++ b/Documentation/trace/tracepoint-analysis.txt | |||
@@ -10,8 +10,8 @@ Tracepoints (see Documentation/trace/tracepoints.txt) can be used without | |||
10 | creating custom kernel modules to register probe functions using the event | 10 | creating custom kernel modules to register probe functions using the event |
11 | tracing infrastructure. | 11 | tracing infrastructure. |
12 | 12 | ||
13 | Simplistically, tracepoints will represent an important event that when can | 13 | Simplistically, tracepoints represent important events that can be |
14 | be taken in conjunction with other tracepoints to build a "Big Picture" of | 14 | taken in conjunction with other tracepoints to build a "Big Picture" of |
15 | what is going on within the system. There are a large number of methods for | 15 | what is going on within the system. There are a large number of methods for |
16 | gathering and interpreting these events. Lacking any current Best Practises, | 16 | gathering and interpreting these events. Lacking any current Best Practises, |
17 | this document describes some of the methods that can be used. | 17 | this document describes some of the methods that can be used. |
@@ -33,12 +33,12 @@ calling | |||
33 | 33 | ||
34 | will give a fair indication of the number of events available. | 34 | will give a fair indication of the number of events available. |
35 | 35 | ||
36 | 2.2 PCL | 36 | 2.2 PCL (Performance Counters for Linux) |
37 | ------- | 37 | ------- |
38 | 38 | ||
39 | Discovery and enumeration of all counters and events, including tracepoints | 39 | Discovery and enumeration of all counters and events, including tracepoints, |
40 | are available with the perf tool. Getting a list of available events is a | 40 | are available with the perf tool. Getting a list of available events is a |
41 | simple case of | 41 | simple case of: |
42 | 42 | ||
43 | $ perf list 2>&1 | grep Tracepoint | 43 | $ perf list 2>&1 | grep Tracepoint |
44 | ext4:ext4_free_inode [Tracepoint event] | 44 | ext4:ext4_free_inode [Tracepoint event] |
@@ -49,19 +49,19 @@ simple case of | |||
49 | [ .... remaining output snipped .... ] | 49 | [ .... remaining output snipped .... ] |
50 | 50 | ||
51 | 51 | ||
52 | 2. Enabling Events | 52 | 3. Enabling Events |
53 | ================== | 53 | ================== |
54 | 54 | ||
55 | 2.1 System-Wide Event Enabling | 55 | 3.1 System-Wide Event Enabling |
56 | ------------------------------ | 56 | ------------------------------ |
57 | 57 | ||
58 | See Documentation/trace/events.txt for a proper description on how events | 58 | See Documentation/trace/events.txt for a proper description on how events |
59 | can be enabled system-wide. A short example of enabling all events related | 59 | can be enabled system-wide. A short example of enabling all events related |
60 | to page allocation would look something like | 60 | to page allocation would look something like: |
61 | 61 | ||
62 | $ for i in `find /sys/kernel/debug/tracing/events -name "enable" | grep mm_`; do echo 1 > $i; done | 62 | $ for i in `find /sys/kernel/debug/tracing/events -name "enable" | grep mm_`; do echo 1 > $i; done |
63 | 63 | ||
64 | 2.2 System-Wide Event Enabling with SystemTap | 64 | 3.2 System-Wide Event Enabling with SystemTap |
65 | --------------------------------------------- | 65 | --------------------------------------------- |
66 | 66 | ||
67 | In SystemTap, tracepoints are accessible using the kernel.trace() function | 67 | In SystemTap, tracepoints are accessible using the kernel.trace() function |
@@ -86,7 +86,7 @@ were allocating the pages. | |||
86 | print_count() | 86 | print_count() |
87 | } | 87 | } |
88 | 88 | ||
89 | 2.3 System-Wide Event Enabling with PCL | 89 | 3.3 System-Wide Event Enabling with PCL |
90 | --------------------------------------- | 90 | --------------------------------------- |
91 | 91 | ||
92 | By specifying the -a switch and analysing sleep, the system-wide events | 92 | By specifying the -a switch and analysing sleep, the system-wide events |
@@ -107,16 +107,16 @@ for a duration of time can be examined. | |||
107 | Similarly, one could execute a shell and exit it as desired to get a report | 107 | Similarly, one could execute a shell and exit it as desired to get a report |
108 | at that point. | 108 | at that point. |
109 | 109 | ||
110 | 2.4 Local Event Enabling | 110 | 3.4 Local Event Enabling |
111 | ------------------------ | 111 | ------------------------ |
112 | 112 | ||
113 | Documentation/trace/ftrace.txt describes how to enable events on a per-thread | 113 | Documentation/trace/ftrace.txt describes how to enable events on a per-thread |
114 | basis using set_ftrace_pid. | 114 | basis using set_ftrace_pid. |
115 | 115 | ||
116 | 2.5 Local Event Enablement with PCL | 116 | 3.5 Local Event Enablement with PCL |
117 | ----------------------------------- | 117 | ----------------------------------- |
118 | 118 | ||
119 | Events can be activate and tracked for the duration of a process on a local | 119 | Events can be activated and tracked for the duration of a process on a local |
120 | basis using PCL such as follows. | 120 | basis using PCL such as follows. |
121 | 121 | ||
122 | $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ | 122 | $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ |
@@ -131,18 +131,18 @@ basis using PCL such as follows. | |||
131 | 131 | ||
132 | 0.973913387 seconds time elapsed | 132 | 0.973913387 seconds time elapsed |
133 | 133 | ||
134 | 3. Event Filtering | 134 | 4. Event Filtering |
135 | ================== | 135 | ================== |
136 | 136 | ||
137 | Documentation/trace/ftrace.txt covers in-depth how to filter events in | 137 | Documentation/trace/ftrace.txt covers in-depth how to filter events in |
138 | ftrace. Obviously using grep and awk of trace_pipe is an option as well | 138 | ftrace. Obviously using grep and awk of trace_pipe is an option as well |
139 | as any script reading trace_pipe. | 139 | as any script reading trace_pipe. |
140 | 140 | ||
141 | 4. Analysing Event Variances with PCL | 141 | 5. Analysing Event Variances with PCL |
142 | ===================================== | 142 | ===================================== |
143 | 143 | ||
144 | Any workload can exhibit variances between runs and it can be important | 144 | Any workload can exhibit variances between runs and it can be important |
145 | to know what the standard deviation in. By and large, this is left to the | 145 | to know what the standard deviation is. By and large, this is left to the |
146 | performance analyst to do it by hand. In the event that the discrete event | 146 | performance analyst to do it by hand. In the event that the discrete event |
147 | occurrences are useful to the performance analyst, then perf can be used. | 147 | occurrences are useful to the performance analyst, then perf can be used. |
148 | 148 | ||
@@ -166,7 +166,7 @@ In the event that some higher-level event is required that depends on some | |||
166 | aggregation of discrete events, then a script would need to be developed. | 166 | aggregation of discrete events, then a script would need to be developed. |
167 | 167 | ||
168 | Using --repeat, it is also possible to view how events are fluctuating over | 168 | Using --repeat, it is also possible to view how events are fluctuating over |
169 | time on a system wide basis using -a and sleep. | 169 | time on a system-wide basis using -a and sleep. |
170 | 170 | ||
171 | $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ | 171 | $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ |
172 | -e kmem:mm_pagevec_free \ | 172 | -e kmem:mm_pagevec_free \ |
@@ -180,7 +180,7 @@ time on a system wide basis using -a and sleep. | |||
180 | 180 | ||
181 | 1.002251757 seconds time elapsed ( +- 0.005% ) | 181 | 1.002251757 seconds time elapsed ( +- 0.005% ) |
182 | 182 | ||
183 | 5. Higher-Level Analysis with Helper Scripts | 183 | 6. Higher-Level Analysis with Helper Scripts |
184 | ============================================ | 184 | ============================================ |
185 | 185 | ||
186 | When events are enabled the events that are triggering can be read from | 186 | When events are enabled the events that are triggering can be read from |
@@ -190,11 +190,11 @@ be gathered on-line as appropriate. Examples of post-processing might include | |||
190 | 190 | ||
191 | o Reading information from /proc for the PID that triggered the event | 191 | o Reading information from /proc for the PID that triggered the event |
192 | o Deriving a higher-level event from a series of lower-level events. | 192 | o Deriving a higher-level event from a series of lower-level events. |
193 | o Calculate latencies between two events | 193 | o Calculating latencies between two events |
194 | 194 | ||
195 | Documentation/trace/postprocess/trace-pagealloc-postprocess.pl is an example | 195 | Documentation/trace/postprocess/trace-pagealloc-postprocess.pl is an example |
196 | script that can read trace_pipe from STDIN or a copy of a trace. When used | 196 | script that can read trace_pipe from STDIN or a copy of a trace. When used |
197 | on-line, it can be interrupted once to generate a report without existing | 197 | on-line, it can be interrupted once to generate a report without exiting |
198 | and twice to exit. | 198 | and twice to exit. |
199 | 199 | ||
200 | Simplistically, the script just reads STDIN and counts up events but it | 200 | Simplistically, the script just reads STDIN and counts up events but it |
@@ -212,12 +212,12 @@ also can do more such as | |||
212 | processes, the parent process responsible for creating all the helpers | 212 | processes, the parent process responsible for creating all the helpers |
213 | can be identified | 213 | can be identified |
214 | 214 | ||
215 | 6. Lower-Level Analysis with PCL | 215 | 7. Lower-Level Analysis with PCL |
216 | ================================ | 216 | ================================ |
217 | 217 | ||
218 | There may also be a requirement to identify what functions with a program | 218 | There may also be a requirement to identify what functions within a program |
219 | were generating events within the kernel. To begin this sort of analysis, the | 219 | were generating events within the kernel. To begin this sort of analysis, the |
220 | data must be recorded. At the time of writing, this required root | 220 | data must be recorded. At the time of writing, this required root: |
221 | 221 | ||
222 | $ perf record -c 1 \ | 222 | $ perf record -c 1 \ |
223 | -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ | 223 | -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ |
@@ -253,11 +253,11 @@ perf report. | |||
253 | # (For more details, try: perf report --sort comm,dso,symbol) | 253 | # (For more details, try: perf report --sort comm,dso,symbol) |
254 | # | 254 | # |
255 | 255 | ||
256 | According to this, the vast majority of events occured triggered on events | 256 | According to this, the vast majority of events triggered on events |
257 | within the VDSO. With simple binaries, this will often be the case so lets | 257 | within the VDSO. With simple binaries, this will often be the case so let's |
258 | take a slightly different example. In the course of writing this, it was | 258 | take a slightly different example. In the course of writing this, it was |
259 | noticed that X was generating an insane amount of page allocations so lets look | 259 | noticed that X was generating an insane amount of page allocations so let's look |
260 | at it | 260 | at it: |
261 | 261 | ||
262 | $ perf record -c 1 -f \ | 262 | $ perf record -c 1 -f \ |
263 | -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ | 263 | -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ |
@@ -280,8 +280,8 @@ This was interrupted after a few seconds and | |||
280 | # (For more details, try: perf report --sort comm,dso,symbol) | 280 | # (For more details, try: perf report --sort comm,dso,symbol) |
281 | # | 281 | # |
282 | 282 | ||
283 | So, almost half of the events are occuring in a library. To get an idea which | 283 | So, almost half of the events are occurring in a library. To get an idea which |
284 | symbol. | 284 | symbol: |
285 | 285 | ||
286 | $ perf report --sort comm,dso,symbol | 286 | $ perf report --sort comm,dso,symbol |
287 | # Samples: 27666 | 287 | # Samples: 27666 |
@@ -297,7 +297,7 @@ symbol. | |||
297 | 0.01% Xorg /opt/gfx-test/lib/libpixman-1.so.0.13.1 [.] get_fast_path | 297 | 0.01% Xorg /opt/gfx-test/lib/libpixman-1.so.0.13.1 [.] get_fast_path |
298 | 0.00% Xorg [kernel] [k] ftrace_trace_userstack | 298 | 0.00% Xorg [kernel] [k] ftrace_trace_userstack |
299 | 299 | ||
300 | To see where within the function pixmanFillsse2 things are going wrong | 300 | To see where within the function pixmanFillsse2 things are going wrong: |
301 | 301 | ||
302 | $ perf annotate pixmanFillsse2 | 302 | $ perf annotate pixmanFillsse2 |
303 | [ ... ] | 303 | [ ... ] |