Diffstat (limited to 'Documentation/livepatch/livepatch.rst')
| -rw-r--r-- | Documentation/livepatch/livepatch.rst | 461 |
1 files changed, 461 insertions, 0 deletions
diff --git a/Documentation/livepatch/livepatch.rst b/Documentation/livepatch/livepatch.rst
new file mode 100644
index 000000000000..c2c598c4ead8
--- /dev/null
+++ b/Documentation/livepatch/livepatch.rst
| @@ -0,0 +1,461 @@ | |||
| 1 | ========= | ||
| 2 | Livepatch | ||
| 3 | ========= | ||
| 4 | |||
| 5 | This document outlines basic information about kernel livepatching. | ||
| 6 | |||
| 7 | .. Table of Contents: | ||
| 8 | |||
| 9 | 1. Motivation | ||
| 10 | 2. Kprobes, Ftrace, Livepatching | ||
| 11 | 3. Consistency model | ||
| 12 | 4. Livepatch module | ||
| 13 | 4.1. New functions | ||
| 14 | 4.2. Metadata | ||
| 15 | 5. Livepatch life-cycle | ||
| 16 | 5.1. Loading | ||
| 17 | 5.2. Enabling | ||
| 18 | 5.3. Replacing | ||
| 19 | 5.4. Disabling | ||
| 20 | 5.5. Removing | ||
| 21 | 6. Sysfs | ||
| 22 | 7. Limitations | ||
| 23 | |||
| 24 | |||
| 25 | 1. Motivation | ||
| 26 | ============= | ||
| 27 | |||
| 28 | There are many situations where users are reluctant to reboot a system. It may | ||
| 29 | be because their system is performing complex scientific computations or under | ||
| 30 | heavy load during peak usage. In addition to keeping systems up and running, | ||
| 31 | users also want a stable and secure system. Livepatching gives users | |
| 32 | both by allowing function calls to be redirected, thus fixing critical | |
| 33 | functions without a system reboot. | |
| 34 | |||
| 35 | |||
| 36 | 2. Kprobes, Ftrace, Livepatching | ||
| 37 | ================================ | ||
| 38 | |||
| 39 | There are multiple mechanisms in the Linux kernel that are directly related | ||
| 40 | to redirection of code execution; namely: kernel probes, function tracing, | ||
| 41 | and livepatching: | ||
| 42 | |||
| 43 | - Kernel probes are the most generic. The code can be redirected by | |
| 44 | replacing any instruction with a breakpoint instruction. | |
| 45 | |||
| 46 | - The function tracer calls the code from a predefined location that is | ||
| 47 | close to the function entry point. This location is generated by the | ||
| 48 | compiler using the '-pg' gcc option. | ||
| 49 | |||
| 50 | - Livepatching typically needs to redirect the code at the very beginning | ||
| 51 | of the function entry before the function parameters or the stack | ||
| 52 | are in any way modified. | ||
| 53 | |||
| 54 | All three approaches need to modify the existing code at runtime. Therefore | ||
| 55 | they need to be aware of each other and not step on each other's toes. | |
| 56 | Most of these problems are solved by using the dynamic ftrace framework as | |
| 57 | a base. A Kprobe is registered as an ftrace handler when the function entry | |
| 58 | is probed, see CONFIG_KPROBES_ON_FTRACE. Likewise, an alternative function from | |
| 59 | a live patch is called with the help of a custom ftrace handler. But there are | |
| 60 | some limitations, see below. | |
| 61 | |||
| 62 | |||
| 63 | 3. Consistency model | ||
| 64 | ==================== | ||
| 65 | |||
| 66 | Functions are there for a reason. They take some input parameters, get or | |
| 67 | release locks, read, process, and even write some data in a defined way, and | |
| 68 | have return values. In other words, each function has defined semantics. | |
| 69 | |||
| 70 | Many fixes do not change the semantics of the modified functions. For | |
| 71 | example, they add a NULL pointer or a boundary check, fix a race by adding | |
| 72 | a missing memory barrier, or add some locking around a critical section. | |
| 73 | Most of these changes are self-contained and the function presents itself | |
| 74 | the same way to the rest of the system. In this case, the functions might | |
| 75 | be updated independently one by one. | |
| 76 | |||
| 77 | But there are more complex fixes. For example, a patch might change | |
| 78 | the ordering of locking in multiple functions at the same time. Or a patch | |
| 79 | might exchange the meaning of some temporary structures and update | |
| 80 | all the relevant functions. In this case, the affected unit | |
| 81 | (thread, whole kernel) needs to start using all new versions of | |
| 82 | the functions at the same time. Also the switch must happen only | |
| 83 | when it is safe to do so, e.g. when the affected locks are released | |
| 84 | or no data are stored in the modified structures at the moment. | |
| 85 | |||
| 86 | The theory about how to apply functions in a safe way is rather complex. | |
| 87 | The aim is to define a so-called consistency model. It attempts to define | ||
| 88 | conditions when the new implementation could be used so that the system | ||
| 89 | stays consistent. | ||
| 90 | |||
| 91 | Livepatch has a consistency model which is a hybrid of kGraft and | ||
| 92 | kpatch: it uses kGraft's per-task consistency and syscall barrier | ||
| 93 | switching combined with kpatch's stack trace switching. There are also | ||
| 94 | a number of fallback options which make it quite flexible. | ||
| 95 | |||
| 96 | Patches are applied on a per-task basis, when the task is deemed safe to | ||
| 97 | switch over. When a patch is enabled, livepatch enters into a | ||
| 98 | transition state where tasks are converging to the patched state. | ||
| 99 | Usually this transition state can complete in a few seconds. The same | ||
| 100 | sequence occurs when a patch is disabled, except the tasks converge from | ||
| 101 | the patched state to the unpatched state. | ||
| 102 | |||
| 103 | An interrupt handler inherits the patched state of the task it | ||
| 104 | interrupts. The same is true for forked tasks: the child inherits the | ||
| 105 | patched state of the parent. | ||
| 106 | |||
| 107 | Livepatch uses several complementary approaches to determine when it's | ||
| 108 | safe to patch tasks: | ||
| 109 | |||
| 110 | 1. The first and most effective approach is stack checking of sleeping | ||
| 111 | tasks. If no affected functions are on the stack of a given task, | ||
| 112 | the task is patched. In most cases this will patch most or all of | ||
| 113 | the tasks on the first try. Otherwise it'll keep trying | ||
| 114 | periodically. This option is only available if the architecture has | ||
| 115 | reliable stacks (HAVE_RELIABLE_STACKTRACE). | ||
| 116 | |||
| 117 | 2. The second approach, if needed, is kernel exit switching. A | ||
| 118 | task is switched when it returns to user space from a system call, a | ||
| 119 | user space IRQ, or a signal. It's useful in the following cases: | ||
| 120 | |||
| 121 | a) Patching I/O-bound user tasks which are sleeping on an affected | ||
| 122 | function. In this case you have to send SIGSTOP and SIGCONT to | ||
| 123 | force it to exit the kernel and be patched. | ||
| 124 | b) Patching CPU-bound user tasks. If the task is highly CPU-bound | ||
| 125 | then it will get patched the next time it gets interrupted by an | ||
| 126 | IRQ. | ||
| 127 | |||
| 128 | 3. For idle "swapper" tasks, since they don't ever exit the kernel, they | ||
| 129 | instead have a klp_update_patch_state() call in the idle loop which | ||
| 130 | allows them to be patched before the CPU enters the idle state. | ||
| 131 | |||
| 132 | (Note there's not yet such an approach for kthreads.) | ||
| 133 | |||
| 134 | Architectures which don't have HAVE_RELIABLE_STACKTRACE solely rely on | ||
| 135 | the second approach. It's highly likely that some tasks may still be | ||
| 136 | running with an old version of the function, until that function | ||
| 137 | returns. In this case you would have to signal the tasks. This | ||
| 138 | especially applies to kthreads. They may not be woken up and would need | ||
| 139 | to be forced. See below for more information. | ||
| 140 | |||
| 141 | Unless we can come up with another way to patch kthreads, architectures | ||
| 142 | without HAVE_RELIABLE_STACKTRACE are not considered fully supported by | ||
| 143 | the kernel livepatching. | ||
| 144 | |||
| 145 | The /sys/kernel/livepatch/<patch>/transition file shows whether a patch | ||
| 146 | is in transition. Only a single patch can be in transition at a given | ||
| 147 | time. A patch can remain in transition indefinitely, if any of the tasks | ||
| 148 | are stuck in the initial patch state. | ||
| 149 | |||
| 150 | A transition can be reversed and effectively canceled by writing the | ||
| 151 | opposite value to the /sys/kernel/livepatch/<patch>/enabled file while | ||
| 152 | the transition is in progress. Then all the tasks will attempt to | ||
| 153 | converge back to the original patch state. | ||
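
As a minimal sketch of the reversal described above, the following helper reads the ``transition`` file and, only if a transition is in progress, writes the opposite of the current ``enabled`` value. The function name ``reverse_transition`` and the ``root`` parameter are hypothetical and exist only for this illustration; the sysfs file names come from the text. It assumes the ``enabled`` file reads back as ``0`` or ``1`` and that the caller has permission to write to sysfs.

```python
from pathlib import Path

# Default sysfs location named in the text.
LIVEPATCH_ROOT = Path("/sys/kernel/livepatch")


def reverse_transition(patch, root=LIVEPATCH_ROOT):
    """Reverse an in-progress transition of the given patch.

    Returns True if the opposite value was written to 'enabled',
    False if the patch was not in transition (nothing is written).
    """
    patch_dir = Path(root) / patch
    if (patch_dir / "transition").read_text().strip() != "1":
        return False
    enabled = (patch_dir / "enabled").read_text().strip()
    # Assumption: 'enabled' holds "0" or "1"; write the other one.
    opposite = "0" if enabled == "1" else "1"
    (patch_dir / "enabled").write_text(opposite)
    return True
```

The ``root`` parameter is there so the helper can be pointed at a scratch directory for experiments instead of the live sysfs tree.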
| 154 | |||
| 155 | There's also a /proc/<pid>/patch_state file which can be used to | ||
| 156 | determine which tasks are blocking completion of a patching operation. | ||
| 157 | If a patch is in transition, this file shows 0 to indicate the task is | ||
| 158 | unpatched and 1 to indicate it's patched. Otherwise, if no patch is in | ||
| 159 | transition, it shows -1. Any tasks which are blocking the transition | ||
| 160 | can be signaled with SIGSTOP and SIGCONT to force them to change their | ||
| 161 | patched state. This may be harmful to the system though. Sending a fake signal | ||
| 162 | to all remaining blocking tasks is a better alternative. No proper signal is | ||
| 163 | actually delivered (there is no data in signal pending structures). Tasks are | ||
| 164 | interrupted or woken up, and forced to change their patched state. The fake | ||
| 165 | signal is automatically sent every 15 seconds. | ||
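
The ``patch_state`` values described above can be used to list the tasks that still block a transition. The sketch below is illustrative only: the helper name ``blocking_tasks`` and its parameters are hypothetical, and it scans only the top-level ``/proc/<pid>`` entries named in the text. The caller passes the state the transition is converging to (1 when enabling, 0 when disabling).

```python
from pathlib import Path


def blocking_tasks(target_state, proc_root=Path("/proc")):
    """Return sorted PIDs whose patch_state differs from target_state.

    Tasks reporting -1 (no patch in transition) are never reported.
    """
    blocked = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a process directory
        try:
            state = int((entry / "patch_state").read_text().strip())
        except (OSError, ValueError):
            continue  # task exited, or the file is missing or unreadable
        if state == -1:
            continue  # no patch is in transition
        if state != target_state:
            blocked.append(int(entry.name))
    return sorted(blocked)
```

For example, while a patch is being enabled, ``blocking_tasks(1)`` would return the PIDs that are still in the unpatched state.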
| 166 | |||
| 167 | The administrator can also affect a transition through the | |
| 168 | /sys/kernel/livepatch/<patch>/force attribute. Writing 1 there clears the | |
| 169 | TIF_PATCH_PENDING flag of all tasks and thus forces the tasks to the patched | |
| 170 | state. Important note! The force attribute is intended for cases when the | |
| 171 | transition gets stuck for a long time because of a blocking task. The administrator | |
| 172 | is expected to collect all necessary data (namely stack traces of such blocking | |
| 173 | tasks) and request a clearance from a patch distributor to force the transition. | |
| 174 | Unauthorized usage may cause harm to the system. It depends on the nature of the | |
| 175 | patch, which functions are (un)patched, and which functions the blocking tasks | |
| 176 | are sleeping in (/proc/<pid>/stack may help here). Removal (rmmod) of patch | |
| 177 | modules is permanently disabled when the force feature is used. It cannot be | |
| 178 | guaranteed that no task is sleeping in such a module. It implies an unbounded | |
| 179 | reference count if a patch module is disabled and enabled in a loop. | |
| 180 | |||
| 181 | Moreover, the usage of force may also affect future applications of live | |
| 182 | patches and cause even more harm to the system. The administrator should first | |
| 183 | consider simply canceling the transition (see above). If force is used, a reboot | |
| 184 | should be planned and no more live patches applied. | |
| 185 | |||
| 186 | 3.1 Adding consistency model support to new architectures | ||
| 187 | --------------------------------------------------------- | ||
| 188 | |||
| 189 | For adding consistency model support to new architectures, there are a | ||
| 190 | few options: | ||
| 191 | |||
| 192 | 1) Add CONFIG_HAVE_RELIABLE_STACKTRACE. This means porting objtool, and | ||
| 193 | for non-DWARF unwinders, also making sure there's a way for the stack | ||
| 194 | tracing code to detect interrupts on the stack. | ||
| 195 | |||
| 196 | 2) Alternatively, ensure that every kthread has a call to | ||
| 197 | klp_update_patch_state() in a safe location. Kthreads are typically | ||
| 198 | in an infinite loop which does some action repeatedly. The safe | ||
| 199 | location to switch the kthread's patch state would be at a designated | ||
| 200 | point in the loop where there are no locks taken and all data | ||
| 201 | structures are in a well-defined state. | ||
| 202 | |||
| 203 | The location is clear when using workqueues or the kthread worker | ||
| 204 | API. These kthreads process independent actions in a generic loop. | ||
| 205 | |||
| 206 | It's much more complicated with kthreads which have a custom loop. | ||
| 207 | There the safe location must be carefully selected on a case-by-case | ||
| 208 | basis. | ||
| 209 | |||
| 210 | In that case, arches without HAVE_RELIABLE_STACKTRACE would still be | ||
| 211 | able to use the non-stack-checking parts of the consistency model: | ||
| 212 | |||
| 213 | a) patching user tasks when they cross the kernel/user space | ||
| 214 | boundary; and | ||
| 215 | |||
| 216 | b) patching kthreads and idle tasks at their designated patch points. | ||
| 217 | |||
| 218 | This option isn't as good as option 1 because it requires signaling | ||
| 219 | user tasks and waking kthreads to patch them. But it could still be | ||
| 220 | a good backup option for those architectures which don't have | ||
| 221 | reliable stack traces yet. | ||
| 222 | |||
| 223 | |||
| 224 | 4. Livepatch module | ||
| 225 | =================== | ||
| 226 | |||
| 227 | Livepatches are distributed using kernel modules, see | ||
| 228 | samples/livepatch/livepatch-sample.c. | ||
| 229 | |||
| 230 | The module includes a new implementation of functions that we want | ||
| 231 | to replace. In addition, it defines some structures describing the | ||
| 232 | relation between the original and the new implementation. Then there | ||
| 233 | is code that makes the kernel start using the new code when the livepatch | ||
| 234 | module is loaded. Also there is code that cleans up before the | ||
| 235 | livepatch module is removed. All this is explained in more detail in | |
| 236 | the next sections. | ||
| 237 | |||
| 238 | |||
| 239 | 4.1. New functions | ||
| 240 | ------------------ | ||
| 241 | |||
| 242 | New versions of functions are typically just copied from the original | ||
| 243 | sources. A good practice is to add a prefix to the names so that they | ||
| 244 | can be distinguished from the original ones, e.g. in a backtrace. Also | ||
| 245 | they can be declared as static because they are not called directly | ||
| 246 | and do not need the global visibility. | ||
| 247 | |||
| 248 | The patch contains only functions that are really modified. But they | ||
| 249 | might want to access functions or data from the original source file | ||
| 250 | that may only be locally accessible. This can be solved by a special | ||
| 251 | relocation section in the generated livepatch module, see | ||
| 252 | Documentation/livepatch/module-elf-format.rst for more details. | ||
| 253 | |||
| 254 | |||
| 255 | 4.2. Metadata | ||
| 256 | ------------- | ||
| 257 | |||
| 258 | The patch is described by several structures that split the information | ||
| 259 | into three levels: | ||
| 260 | |||
| 261 | - struct klp_func is defined for each patched function. It describes | ||
| 262 | the relation between the original and the new implementation of a | ||
| 263 | particular function. | ||
| 264 | |||
| 265 | The structure includes the name, as a string, of the original function. | ||
| 266 | The function address is found via kallsyms at runtime. | ||
| 267 | |||
| 268 | Then it includes the address of the new function. It is defined | ||
| 269 | directly by assigning the function pointer. Note that the new | ||
| 270 | function is typically defined in the same source file. | ||
| 271 | |||
| 272 | As an optional parameter, the symbol position in the kallsyms database can | |
| 273 | be used to disambiguate functions of the same name. This is not the | |
| 274 | absolute position in the database, but rather the order in which it is found | |
| 275 | within a particular object (vmlinux or a kernel module). Note that | |
| 276 | kallsyms allows for searching symbols according to the object name. | |
| 277 | |||
| 278 | - struct klp_object defines an array of patched functions (struct | |
| 279 | klp_func) in the same object, where the object is either vmlinux | |
| 280 | (NULL) or a module name. | |
| 281 | |||
| 282 | The structure helps to group and handle functions for each object | ||
| 283 | together. Note that patched modules might be loaded later than | ||
| 284 | the patch itself and the relevant functions might be patched | ||
| 285 | only when they are available. | ||
| 286 | |||
| 287 | |||
| 288 | - struct klp_patch defines an array of patched objects (struct | ||
| 289 | klp_object). | ||
| 290 | |||
| 291 | This structure handles all patched functions consistently and, eventually, | |
| 292 | synchronously. The whole patch is applied only when all patched | |
| 293 | symbols are found. The only exceptions are symbols from objects | |
| 294 | (kernel modules) that have not been loaded yet. | |
| 295 | |||
| 296 | For more details on how the patch is applied on a per-task basis, | ||
| 297 | see the "Consistency model" section. | ||
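
The three metadata levels and the "all symbols found" rule can be modeled in a few lines. This is a toy model in Python, not kernel code: the class names mirror the structures named above, the field names ``old_name``, ``new_func`` and ``old_sympos`` are assumed to follow the kernel header, and ``can_apply`` with its ``symbols`` and ``loaded_modules`` arguments is a hypothetical stand-in for the kallsyms lookup.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KlpFunc:
    old_name: str        # name of the original function, as a string
    new_func: str        # stand-in for the address of the new function
    old_sympos: int = 0  # optional position used to disambiguate names


@dataclass
class KlpObject:
    name: Optional[str]  # None models vmlinux, otherwise a module name
    funcs: list = field(default_factory=list)


@dataclass
class KlpPatch:
    objs: list = field(default_factory=list)


def can_apply(patch, symbols, loaded_modules):
    """Model of the rule: every symbol must be found, except for
    objects (kernel modules) that have not been loaded yet.

    symbols maps an object name (None for vmlinux) to a set of names.
    """
    for obj in patch.objs:
        if obj.name is not None and obj.name not in loaded_modules:
            continue  # patched later, once the module is available
        known = symbols.get(obj.name, set())
        if any(f.old_name not in known for f in obj.funcs):
            return False
    return True
```

A patch touching one vmlinux function and one function in a not-yet-loaded module would pass this check as long as the vmlinux symbol is found.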
| 298 | |||
| 299 | |||
| 300 | 5. Livepatch life-cycle | ||
| 301 | ======================= | ||
| 302 | |||
| 303 | Livepatching can be described by five basic operations: | ||
| 304 | loading, enabling, replacing, disabling, removing. | ||
| 305 | |||
| 306 | The replacing and the disabling operations are mutually | |
| 307 | exclusive. They have the same result for the given patch but | |
| 308 | not for the system. | |
| 309 | |||
| 310 | |||
| 311 | 5.1. Loading | ||
| 312 | ------------ | ||
| 313 | |||
| 314 | The only reasonable way is to enable the patch when the livepatch kernel | ||
| 315 | module is being loaded. For this, klp_enable_patch() has to be called | ||
| 316 | in the module_init() callback. There are two main reasons: | ||
| 317 | |||
| 318 | First, only the module has easy access to the related struct klp_patch. | |
| 319 | |||
| 320 | Second, the error code might be used to refuse loading the module when | ||
| 321 | the patch cannot get enabled. | ||
| 322 | |||
| 323 | |||
| 324 | 5.2. Enabling | ||
| 325 | ------------- | ||
| 326 | |||
| 327 | The livepatch gets enabled by calling klp_enable_patch() from | ||
| 328 | the module_init() callback. The system will start using the new | ||
| 329 | implementation of the patched functions at this stage. | ||
| 330 | |||
| 331 | First, the addresses of the patched functions are found according to their | ||
| 332 | names. The special relocations, mentioned in the section "New functions", | ||
| 333 | are applied. The relevant entries are created under | ||
| 334 | /sys/kernel/livepatch/<name>. The patch is rejected when any above | ||
| 335 | operation fails. | ||
| 336 | |||
| 337 | Second, livepatch enters into a transition state where tasks are converging | ||
| 338 | to the patched state. If an original function is patched for the first | ||
| 339 | time, a function-specific struct klp_ops is created and a universal | |
| 340 | ftrace handler is registered\ [#]_. This stage is indicated by a value of '1' | ||
| 341 | in /sys/kernel/livepatch/<name>/transition. For more information about | ||
| 342 | this process, see the "Consistency model" section. | ||
| 343 | |||
| 344 | Finally, once all tasks have been patched, the 'transition' value changes | ||
| 345 | to '0'. | ||
| 346 | |||
| 347 | .. [#] | ||
| 348 | |||
| 349 | Note that functions might be patched multiple times. The ftrace handler | ||
| 350 | is registered only once for a given function. Further patches just add | ||
| 351 | an entry to the list (see field `func_stack`) of the struct klp_ops. | ||
| 352 | The right implementation is selected by the ftrace handler, see | ||
| 353 | the "Consistency model" section. | ||
| 354 | |||
| 355 | That said, it is highly recommended to use cumulative livepatches | |
| 356 | because they help keep all changes consistent. In this case, | |
| 357 | functions might be patched two times only during the transition period. | |
| 358 | |||
| 359 | |||
| 360 | 5.3. Replacing | ||
| 361 | -------------- | ||
| 362 | |||
| 363 | All enabled patches might get replaced by a cumulative patch that | ||
| 364 | has the .replace flag set. | ||
| 365 | |||
| 366 | Once the new patch is enabled and the 'transition' finishes then | ||
| 367 | all the functions (struct klp_func) associated with the replaced | ||
| 368 | patches are removed from the corresponding struct klp_ops. Also | ||
| 369 | the ftrace handler is unregistered and the struct klp_ops is | ||
| 370 | freed when the related function is not modified by the new patch | ||
| 371 | and func_stack list becomes empty. | ||
| 372 | |||
| 373 | See Documentation/livepatch/cumulative-patches.rst for more details. | ||
| 374 | |||
| 375 | |||
| 376 | 5.4. Disabling | ||
| 377 | -------------- | ||
| 378 | |||
| 379 | Enabled patches might get disabled by writing '0' to | ||
| 380 | /sys/kernel/livepatch/<name>/enabled. | ||
| 381 | |||
| 382 | First, livepatch enters into a transition state where tasks are converging | ||
| 383 | to the unpatched state. The system starts using either the code from | ||
| 384 | the previously enabled patch or even the original one. This stage is | ||
| 385 | indicated by a value of '1' in /sys/kernel/livepatch/<name>/transition. | ||
| 386 | For more information about this process, see the "Consistency model" | ||
| 387 | section. | ||
| 388 | |||
| 389 | Second, once all tasks have been unpatched, the 'transition' value changes | ||
| 390 | to '0'. All the functions (struct klp_func) associated with the to-be-disabled | ||
| 391 | patch are removed from the corresponding struct klp_ops. The ftrace handler | ||
| 392 | is unregistered and the struct klp_ops is freed when the func_stack list | ||
| 393 | becomes empty. | ||
| 394 | |||
| 395 | Third, the sysfs interface is destroyed. | ||
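
The bookkeeping around struct klp_ops described in the enabling and disabling sections can be illustrated with a toy model. This is a simplified sketch, not the kernel implementation: the helper names are hypothetical, the stack is assumed to keep the most recently enabled patch first, and the per-task selection done by the real ftrace handler (see the "Consistency model" section) is ignored.

```python
class KlpOps:
    """Toy model of the per-function struct klp_ops."""

    def __init__(self):
        self.func_stack = []            # assumed order: newest first
        self.handler_registered = False


def patch_func(registry, old_name, new_func):
    """Enable a replacement; the handler is registered only once."""
    ops = registry.get(old_name)
    if ops is None:
        ops = KlpOps()
        ops.handler_registered = True   # first patch of this function
        registry[old_name] = ops
    ops.func_stack.insert(0, new_func)


def unpatch_func(registry, old_name, new_func):
    """Disable a replacement; free the ops when the stack is empty."""
    ops = registry[old_name]
    ops.func_stack.remove(new_func)
    if not ops.func_stack:
        ops.handler_registered = False  # handler is unregistered
        del registry[old_name]          # struct klp_ops is freed


def current_impl(registry, old_name):
    """Simplification: the top of the stack wins, else the original."""
    ops = registry.get(old_name)
    return ops.func_stack[0] if ops else old_name
```

Disabling the newest patch in this model falls back to the previously enabled replacement, and disabling the last one falls back to the original function.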
| 396 | |||
| 397 | |||
| 398 | 5.5. Removing | ||
| 399 | ------------- | ||
| 400 | |||
| 401 | Module removal is only safe when there are no users of functions provided | ||
| 402 | by the module. This is the reason why the force feature permanently | ||
| 403 | disables the removal. Only when the system has successfully transitioned | |
| 404 | to a new patch state (patched/unpatched) without being forced is it | |
| 405 | guaranteed that no task sleeps or runs in the old code. | |
| 406 | |||
| 407 | |||
| 408 | 6. Sysfs | ||
| 409 | ======== | ||
| 410 | |||
| 411 | Information about the registered patches can be found under | |
| 412 | /sys/kernel/livepatch. The patches can be enabled and disabled | |
| 413 | by writing there. | |
| 414 | |||
| 415 | The /sys/kernel/livepatch/<patch>/force attribute allows the administrator to affect a | |
| 416 | patching operation. | |
| 417 | |||
| 418 | See Documentation/ABI/testing/sysfs-kernel-livepatch for more details. | ||
| 419 | |||
| 420 | |||
| 421 | 7. Limitations | ||
| 422 | ============== | ||
| 423 | |||
| 424 | The current Livepatch implementation has several limitations: | ||
| 425 | |||
| 426 | - Only functions that can be traced can be patched. | |
| 427 | |||
| 428 | Livepatch is based on the dynamic ftrace. In particular, functions | |
| 429 | implementing ftrace or the livepatch ftrace handler cannot be | |
| 430 | patched. Otherwise, the code would end up in an infinite loop. A | |
| 431 | potential mistake is prevented by marking the problematic functions | |
| 432 | with "notrace". | |
| 433 | |||
| 434 | |||
| 435 | |||
| 436 | - Livepatch works reliably only when the dynamic ftrace is located at | ||
| 437 | the very beginning of the function. | ||
| 438 | |||
| 439 | The function needs to be redirected before the stack or the function | |
| 440 | parameters are modified in any way. For example, livepatch requires | |
| 441 | using the -fentry gcc compiler option on x86_64. | |
| 442 | |||
| 443 | One exception is the PPC port. It uses relative addressing and TOC. | ||
| 444 | Each function has to handle TOC and save LR before it could call | ||
| 445 | the ftrace handler. This operation has to be reverted on return. | ||
| 446 | Fortunately, the generic ftrace code has the same problem and all | ||
| 447 | this is handled on the ftrace level. | ||
| 448 | |||
| 449 | |||
| 450 | - Kretprobes using the ftrace framework conflict with the patched | ||
| 451 | functions. | ||
| 452 | |||
| 453 | Both kretprobes and livepatches use an ftrace handler that modifies | |
| 454 | the return address. The first user wins. Either the probe or the patch | ||
| 455 | is rejected when the handler is already in use by the other. | ||
| 456 | |||
| 457 | |||
| 458 | - Kprobes in the original function are ignored when the code is | ||
| 459 | redirected to the new implementation. | ||
| 460 | |||
| 461 | There is work in progress to add warnings about this situation. | |
