-rw-r--r-- | Documentation/trace/ftrace-uses.rst | 293 |
1 files changed, 293 insertions, 0 deletions
diff --git a/Documentation/trace/ftrace-uses.rst b/Documentation/trace/ftrace-uses.rst new file mode 100644 index 000000000000..8494a801d341 --- /dev/null +++ b/Documentation/trace/ftrace-uses.rst | |||
@@ -0,0 +1,293 @@ | |||
1 | ================================= | ||
2 | Using ftrace to hook to functions | ||
3 | ================================= | ||
4 | |||
5 | .. Copyright 2017 VMware Inc. | ||
6 | .. Author: Steven Rostedt <srostedt@goodmis.org> | ||
7 | .. License: The GNU Free Documentation License, Version 1.2 | ||
8 | .. (dual licensed under the GPL v2) | ||
9 | |||
10 | Written for: 4.14 | ||
11 | |||
12 | Introduction | ||
13 | ============ | ||
14 | |||
15 | The ftrace infrastructure was originally created to attach callbacks to the | ||
16 | beginning of functions in order to record and trace the flow of the kernel. | ||
17 | But callbacks to the start of a function have other use cases as well, such | ||
18 | as live kernel patching or security monitoring. This document describes | ||
19 | how to use ftrace to implement your own function callbacks. | ||
20 | |||
21 | |||
22 | The ftrace context | ||
23 | ================== | ||
24 | |||
25 | WARNING: The ability to add a callback to almost any function within the | ||
26 | kernel comes with risks. A callback can be called from any context | ||
27 | (normal, softirq, irq, and NMI). Callbacks can also be called just before | ||
28 | going to idle, during CPU bring up and takedown, or going to user space. | ||
29 | This requires extra care about what can be done inside a callback. A callback | ||
30 | can be called outside the protective scope of RCU. | ||
31 | |||
32 | The ftrace infrastructure has some protections against recursion and RCU, | ||
33 | but one must still be very careful how they use the callbacks. | ||
34 | |||
35 | |||
36 | The ftrace_ops structure | ||
37 | ======================== | ||
38 | |||
39 | To register a function callback, an ftrace_ops is required. This structure | ||
40 | is used to tell ftrace what function should be called as the callback | ||
41 | as well as what protections the callback will perform and not require | ||
42 | ftrace to handle. | ||
43 | |||
44 | Only one field must be set when registering | ||
45 | an ftrace_ops with ftrace: | ||
46 | |||
47 | .. code-block:: c | ||
48 | |||
49 |  struct ftrace_ops ops = { | ||
50 |        .func = my_callback_func, | ||
51 |        .flags = MY_FTRACE_FLAGS, | ||
52 |        .private = any_private_data_structure, | ||
53 |  }; | ||
54 | |||
55 | Both .flags and .private are optional. Only .func is required. | ||
56 | |||
57 | To enable tracing call: | ||
58 | |||
59 | .. c:function:: register_ftrace_function(&ops); | ||
60 | |||
61 | To disable tracing call: | ||
62 | |||
63 | .. c:function:: unregister_ftrace_function(&ops); | ||
64 | |||
65 | The above is defined by including the header: | ||
66 | |||
67 | .. c:function:: #include <linux/ftrace.h> | ||
68 | |||
69 | The registered callback will start being called some time after | ||
70 | register_ftrace_function() is called and before it returns. The exact time | ||
71 | that callbacks start being called depends on the architecture and scheduling | ||
72 | of services. The callback itself will have to handle any synchronization if it | ||
73 | must begin at an exact moment. | ||
74 | |||
75 | unregister_ftrace_function() guarantees that the callback is | ||
76 | no longer being called by functions after unregister_ftrace_function() | ||
77 | returns. Note that to provide this guarantee, unregister_ftrace_function() | ||
78 | may take some time to finish. | ||
79 | |||
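The registration life cycle described above can be sketched as follows. This is
only a user-space model, not kernel code: the ``struct ftrace_ops`` layout and the
two register functions are simplified stand-ins for the definitions in
``<linux/ftrace.h>`` so that the sketch compiles outside the kernel, and
``my_stats``, ``my_tracer_start()`` and ``my_tracer_stop()`` are hypothetical names.

```c
#include <stddef.h>

/* --- Stand-ins for <linux/ftrace.h> (assumption: simplified layout) --- */
struct pt_regs;
struct ftrace_ops;
typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
			      struct ftrace_ops *op, struct pt_regs *regs);

struct ftrace_ops {
	ftrace_func_t func;
	unsigned long flags;
	void *private;
};

static struct ftrace_ops *registered;	/* stub state, not a kernel symbol */

static int register_ftrace_function(struct ftrace_ops *ops)
{
	if (!ops || !ops->func)
		return -1;		/* stub behavior: .func is required */
	registered = ops;
	return 0;
}

static int unregister_ftrace_function(struct ftrace_ops *ops)
{
	if (registered != ops)
		return -1;
	registered = NULL;
	return 0;
}

/* --- Hypothetical client code --- */
struct my_stats {
	unsigned long hits;
};

static struct my_stats stats;

static void my_callback_func(unsigned long ip, unsigned long parent_ip,
			     struct ftrace_ops *op, struct pt_regs *regs)
{
	struct my_stats *s = op->private;	/* data passed via .private */

	(void)ip;
	(void)parent_ip;
	(void)regs;
	s->hits++;
}

static struct ftrace_ops ops = {
	.func = my_callback_func,
	.private = &stats,
};

int my_tracer_start(void)
{
	/* Callbacks may begin before this call returns. */
	return register_ftrace_function(&ops);
}

void my_tracer_stop(void)
{
	/* Once this returns, the callback is no longer being called. */
	unregister_ftrace_function(&ops);
}
```

In a real module only the client half would remain. Because a callback can run on
several CPUs at once, a real counter would likely need to be atomic or per-CPU;
the plain increment here is kept only for brevity.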
80 | |||
81 | The callback function | ||
82 | ===================== | ||
83 | |||
84 | The prototype of the callback function is as follows (as of v4.14): | ||
85 | |||
86 | .. code-block:: c | ||
87 | |||
88 |  void callback_func(unsigned long ip, unsigned long parent_ip, | ||
89 |                     struct ftrace_ops *op, struct pt_regs *regs); | ||
90 | |||
91 | @ip | ||
92 | This is the instruction pointer of the function that is being traced. | ||
93 | (where the fentry or mcount is within the function) | ||
94 | |||
95 | @parent_ip | ||
96 | This is the instruction pointer of the function that called the | ||
97 | function being traced (where the call of the function occurred). | ||
98 | |||
99 | @op | ||
100 | This is a pointer to ftrace_ops that was used to register the callback. | ||
101 | This can be used to pass data to the callback via the private pointer. | ||
102 | |||
103 | @regs | ||
104 | If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED | ||
105 | flag is set in the ftrace_ops structure, then this will be pointing | ||
106 | to the pt_regs structure like it would be if a breakpoint was placed | ||
107 | at the start of the function where ftrace was tracing. Otherwise it | ||
108 | is either NULL or contains garbage. | ||
109 | |||
110 | |||
111 | The ftrace FLAGS | ||
112 | ================ | ||
113 | |||
114 | The ftrace_ops flags are all defined and documented in include/linux/ftrace.h. | ||
115 | Some of the flags are used for internal infrastructure of ftrace, but the | ||
116 | ones that users should be aware of are the following: | ||
117 | |||
118 | FTRACE_OPS_FL_SAVE_REGS | ||
119 | If the callback requires reading or modifying the pt_regs | ||
120 | passed to the callback, then it must set this flag. Registering | ||
121 | an ftrace_ops with this flag set on an architecture that does not | ||
122 | support passing of pt_regs to the callback will fail. | ||
123 | |||
124 | FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED | ||
125 | Similar to SAVE_REGS but the registering of an | ||
126 | ftrace_ops on an architecture that does not support passing of regs | ||
127 | will not fail with this flag set. But the callback must check if | ||
128 | regs is NULL or not to determine if the architecture supports it. | ||
129 | |||
130 | FTRACE_OPS_FL_RECURSION_SAFE | ||
131 | By default, a wrapper is added around the callback to | ||
132 | make sure that recursion of the function does not occur. That is, | ||
133 | if a function that is called as a result of the callback's execution | ||
134 | is also traced, ftrace will prevent the callback from being called | ||
135 | again. But this wrapper adds some overhead, and if the callback is | ||
136 | safe from recursion, it can set this flag to disable the ftrace | ||
137 | protection. | ||
138 | |||
139 | Note, if this flag is set, and recursion does occur, it could cause | ||
140 | the system to crash, and possibly reboot via a triple fault. | ||
141 | |||
142 | It is OK if another callback traces a function that is called by a | ||
143 | callback that is marked recursion safe. Recursion safe callbacks | ||
144 | must never trace any functions that are called by the callback | ||
145 | itself or any nested functions that those functions call. | ||
146 | |||
147 | If this flag is set, it is possible that the callback will also | ||
148 | be called with preemption enabled (when CONFIG_PREEMPT is set), | ||
149 | but this is not guaranteed. | ||
150 | |||
151 | FTRACE_OPS_FL_IPMODIFY | ||
152 | Requires FTRACE_OPS_FL_SAVE_REGS set. If the callback is to "hijack" | ||
153 | the traced function (have another function called instead of the | ||
154 | traced function), it requires setting this flag. This is what live | ||
155 | kernel patching uses. Without this flag the pt_regs->ip cannot be | ||
156 | modified. | ||
157 | |||
158 | Note, only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be | ||
159 | registered to any given function at a time. | ||
160 | |||
161 | FTRACE_OPS_FL_RCU | ||
162 | If this is set, then the callback will only be called by functions | ||
163 | where RCU is "watching". This is required if the callback function | ||
164 | performs any rcu_read_lock() operation. | ||
165 | |||
166 | RCU stops watching when the system goes idle, the time when a CPU | ||
167 | is taken down and comes back online, and when entering from kernel | ||
168 | to user space and back to kernel space. During these transitions, | ||
169 | a callback may be executed and RCU synchronization will not protect | ||
170 | it. | ||
171 | |||
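The recursion protection that applies when FTRACE_OPS_FL_RECURSION_SAFE is not set
can be pictured with a small user-space model. It is not the kernel's
implementation: the kernel tracks recursion per context, while this sketch uses a
single flag, and every name in it (``guarded_call()``, ``trace_hook()``,
``traced_helper()``) is hypothetical.

```c
#include <stdbool.h>

static bool in_callback;	/* model only: one flag instead of per-context state */
static int callback_runs;

static void traced_helper(void);

/* A callback that itself calls a traced function. */
static void my_callback(void)
{
	callback_runs++;
	traced_helper();
}

/* Model of the wrapper that guards a callback against recursion. */
static void guarded_call(void (*cb)(void))
{
	if (in_callback)
		return;		/* already inside the callback: do not re-enter */
	in_callback = true;
	cb();
	in_callback = false;
}

/* Stand-in for the fentry/mcount hook at the start of a traced function. */
static void trace_hook(void)
{
	guarded_call(my_callback);
}

static void traced_helper(void)
{
	trace_hook();
	/* the body of the traced function would follow here */
}

/* Returns how many times the callback ran for one traced call. */
int run_guarded(void)
{
	callback_runs = 0;
	traced_helper();
	return callback_runs;
}
```

Calling ``my_callback()`` directly from ``trace_hook()``, without the guard, would
recurse without bound in this model, which is the situation the flag description
warns about.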
172 | |||
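As an illustration of how FTRACE_OPS_FL_SAVE_REGS and FTRACE_OPS_FL_IPMODIFY fit
together, here is a minimal user-space sketch of a callback that redirects the
traced function. The flag values, the one-field ``struct pt_regs`` (an x86-style
``ip`` member is assumed) and ``struct redirect_target`` are stand-ins chosen for
this example; the real definitions are in the kernel headers.

```c
#include <stddef.h>

/* Stand-in flag values (arbitrary here; see include/linux/ftrace.h). */
#define FTRACE_OPS_FL_SAVE_REGS		(1UL << 0)
#define FTRACE_OPS_FL_IPMODIFY		(1UL << 1)

/* Stand-in register state; assumes an x86-style "ip" member. */
struct pt_regs {
	unsigned long ip;
};

struct ftrace_ops;
typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
			      struct ftrace_ops *op, struct pt_regs *regs);

struct ftrace_ops {
	ftrace_func_t func;
	unsigned long flags;
	void *private;
};

/* Hypothetical private data: the address to run instead. */
struct redirect_target {
	unsigned long new_ip;
};

static struct redirect_target target;

static void redirect_callback(unsigned long ip, unsigned long parent_ip,
			      struct ftrace_ops *op, struct pt_regs *regs)
{
	struct redirect_target *t = op->private;

	(void)ip;
	(void)parent_ip;

	/*
	 * Defensive check. It matters when only
	 * FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED is used, because regs
	 * is then NULL on architectures without support.
	 */
	if (!regs)
		return;

	regs->ip = t->new_ip;	/* continue at the replacement function */
}

static struct ftrace_ops redirect_ops = {
	.func = redirect_callback,
	.flags = FTRACE_OPS_FL_SAVE_REGS | FTRACE_OPS_FL_IPMODIFY,
	.private = &target,
};
```

The sketch only shows the shape of such a callback. Whether a given architecture
supports this, and how the new address is obtained, are outside what it models.
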
173 | Filtering which functions to trace | ||
174 | ================================== | ||
175 | |||
176 | If a callback is only to be called from specific functions, a filter must be | ||
177 | set up. The filters are added by name, or ip if it is known. | ||
178 | |||
179 | .. code-block:: c | ||
180 | |||
181 |  int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf, | ||
182 |                        int len, int reset); | ||
183 | |||
184 | @ops | ||
185 | The ops to set the filter with | ||
186 | |||
187 | @buf | ||
188 | The string that holds the function filter text. | ||
189 | @len | ||
190 | The length of the string. | ||
191 | |||
192 | @reset | ||
193 | Non-zero to reset all filters before applying this filter. | ||
194 | |||
195 | Filters denote which functions should be enabled when tracing is enabled. | ||
196 | If @buf is NULL and @reset is non-zero, all functions will be enabled for tracing. | ||
197 | |||
198 | The @buf can also be a glob expression to enable all functions that | ||
199 | match a specific pattern. | ||
200 | |||
201 | See Filter Commands in :file:`Documentation/trace/ftrace.txt`. | ||
202 | |||
203 | To just trace the schedule function: | ||
204 | |||
205 | .. code-block:: c | ||
206 | |||
207 |  ret = ftrace_set_filter(&ops, "schedule", strlen("schedule"), 0); | ||
208 | |||
209 | To add more functions, call the ftrace_set_filter() more than once with the | ||
210 | @reset parameter set to zero. To remove the current filter set and replace it | ||
211 | with new functions defined by @buf, have @reset be non-zero. | ||
212 | |||
213 | To remove all the filtered functions and trace all functions: | ||
214 | |||
215 | .. code-block:: c | ||
216 | |||
217 |  ret = ftrace_set_filter(&ops, NULL, 0, 1); | ||
218 | |||
219 | |||
220 | Sometimes more than one function has the same name. To trace just a specific | ||
221 | function in this case, ftrace_set_filter_ip() can be used. | ||
222 | |||
223 | .. code-block:: c | ||
224 | |||
225 |  ret = ftrace_set_filter_ip(&ops, ip, 0, 0); | ||
226 | |||
227 | Note that the ip must be the address where the call to fentry or mcount is | ||
228 | located in the function. This function is used by perf and kprobes, which | ||
229 | get the ip address from the user (usually using debug info from the kernel). | ||
230 | |||
231 | If a glob is used to set the filter, functions can be added to a "notrace" | ||
232 | list that will prevent those functions from calling the callback. | ||
233 | The "notrace" list takes precedence over the "filter" list. If the | ||
234 | two lists are non-empty and contain the same functions, the callback will not | ||
235 | be called by any function. | ||
236 | |||
237 | An empty "notrace" list means to allow all functions defined by the filter | ||
238 | to be traced. | ||
239 | |||
240 | .. code-block:: c | ||
241 | |||
242 |  int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf, | ||
243 |                         int len, int reset); | ||
244 | |||
245 | This takes the same parameters as ftrace_set_filter() but will add the | ||
246 | functions it finds to not be traced. This is a separate list from the | ||
247 | filter list, and this function does not modify the filter list. | ||
248 | |||
249 | A non-zero @reset will clear the "notrace" list before adding functions | ||
250 | that match @buf to it. | ||
251 | |||
252 | Clearing the "notrace" list is the same as clearing the filter list: | ||
253 | |||
254 | .. code-block:: c | ||
255 | |||
256 |  ret = ftrace_set_notrace(&ops, NULL, 0, 1); | ||
257 | |||
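The way the "filter" and "notrace" lists combine can be modeled in user space. The
sketch below is an assumption-laden illustration, not kernel code: it uses POSIX
``fnmatch()`` as a stand-in for ftrace's own glob matching, and
``struct pattern_list`` and ``would_trace()`` are hypothetical names.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PATTERNS 8

/* Hypothetical model of one list (either "filter" or "notrace"). */
struct pattern_list {
	const char *pat[MAX_PATTERNS];
	size_t n;
};

/* True if any pattern in the list matches the function name. */
static bool list_matches(const struct pattern_list *l, const char *func)
{
	for (size_t i = 0; i < l->n; i++)
		if (fnmatch(l->pat[i], func, 0) == 0)
			return true;
	return false;
}

/*
 * A function calls the callback if the filter list is empty or
 * matches it, and the notrace list does not match it. The notrace
 * list therefore takes precedence over the filter list.
 */
bool would_trace(const struct pattern_list *filter,
		 const struct pattern_list *notrace, const char *func)
{
	if (filter->n && !list_matches(filter, func))
		return false;
	return !list_matches(notrace, func);
}
```

With a filter of ``sched*`` and a notrace entry of ``schedule_timeout``, this model
traces ``schedule`` but not ``schedule_timeout``. Using the same list for both
arguments leaves nothing traced, which matches the description above.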
258 | The filter and notrace lists may be changed at any time. If only a set of | ||
259 | functions should call the callback, it is best to set the filters before | ||
260 | registering the callback. But the changes may also happen after the callback | ||
261 | has been registered. | ||
262 | |||
263 | If a filter is in place, and the @reset is non-zero, and @buf contains a | ||
264 | matching glob to functions, the switch will happen during the time of | ||
265 | the ftrace_set_filter() call. At no time will all functions call the callback. | ||
266 | |||
267 | .. code-block:: c | ||
268 | |||
269 |  ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1); | ||
270 | |||
271 |  register_ftrace_function(&ops); | ||
272 | |||
273 |  msleep(10); | ||
274 | |||
275 |  ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 1); | ||
276 | |||
277 | is not the same as: | ||
278 | |||
279 | .. code-block:: c | ||
280 | |||
281 |  ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1); | ||
282 | |||
283 |  register_ftrace_function(&ops); | ||
284 | |||
285 |  msleep(10); | ||
286 | |||
287 |  ftrace_set_filter(&ops, NULL, 0, 1); | ||
288 | |||
289 |  ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 0); | ||
290 | |||
291 | As the latter will have a short time where all functions will call | ||
292 | the callback, between the time of the reset, and the time of the | ||
293 | new setting of the filter. | ||
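
The difference between the two sequences can be checked with a small user-space
model. It assumes only what is stated above: an empty filter list means every
function calls the callback, and a non-zero reset replaces the list in one step.
``struct filter_model`` and the ``model_*`` helpers are hypothetical, and
``vfs_read`` is just an arbitrary function that is in neither filter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_FUNCS 8

/* Hypothetical model of one ops' filter list; empty means "trace all". */
struct filter_model {
	const char *names[MAX_FUNCS];
	size_t n;
};

/* Models ftrace_set_filter(): a NULL name with reset just clears the list. */
static void model_set_filter(struct filter_model *f, const char *name, int reset)
{
	if (reset)
		f->n = 0;
	if (name && f->n < MAX_FUNCS)
		f->names[f->n++] = name;
}

static bool model_traces(const struct filter_model *f, const char *func)
{
	if (f->n == 0)
		return true;
	for (size_t i = 0; i < f->n; i++)
		if (strcmp(f->names[i], func) == 0)
			return true;
	return false;
}

/* Replace the filter in a single call; reports if vfs_read was ever traced. */
bool switch_in_one_call(void)
{
	struct filter_model f = { .n = 0 };
	bool leaked = false;

	model_set_filter(&f, "schedule", 1);
	leaked |= model_traces(&f, "vfs_read");
	model_set_filter(&f, "try_to_wake_up", 1);
	leaked |= model_traces(&f, "vfs_read");
	return leaked;
}

/* Reset first, then add; the list is empty between the two calls. */
bool switch_in_two_calls(void)
{
	struct filter_model f = { .n = 0 };
	bool leaked = false;

	model_set_filter(&f, "schedule", 1);
	leaked |= model_traces(&f, "vfs_read");
	model_set_filter(&f, NULL, 1);		/* window opens here */
	leaked |= model_traces(&f, "vfs_read");
	model_set_filter(&f, "try_to_wake_up", 0);
	leaked |= model_traces(&f, "vfs_read");
	return leaked;
}
```

In this model ``switch_in_one_call()`` never reports the unrelated function as
traced, while ``switch_in_two_calls()`` does, because of the empty list between
the reset and the second call.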