| -rw-r--r-- | Documentation/static-keys.txt | 207 |
1 files changed, 108 insertions, 99 deletions
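Reviewer note: the document touched by this patch describes keys whose branch direction is switched through reference counts (static_branch_inc() / static_branch_dec()), starting from an initial true or false state. The sketch below is a userspace model of that counting behaviour only, assuming the branch reads as true while the count is above zero. All `sk_` names are hypothetical and are not kernel APIs. The model tests a variable in memory, which is exactly the load/test/jump sequence that real static keys avoid by patching the branch site.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; NOT the kernel's struct static_key. */
struct sk_model {
	int enabled;	/* reference count; branch reads true while > 0 */
};

/* Stand-ins for DEFINE_STATIC_KEY_TRUE() / DEFINE_STATIC_KEY_FALSE(). */
#define SK_MODEL_INIT_TRUE	{ .enabled = 1 }
#define SK_MODEL_INIT_FALSE	{ .enabled = 0 }

/* Models the value seen by static_branch_likely()/static_branch_unlikely(). */
static inline bool sk_branch(const struct sk_model *key)
{
	return key->enabled > 0;
}

/* Models static_branch_inc(): take one reference. */
static inline void sk_inc(struct sk_model *key)
{
	key->enabled++;
}

/* Models static_branch_dec(): drop one reference (assumed never to go below 0). */
static inline void sk_dec(struct sk_model *key)
{
	assert(key->enabled > 0);
	key->enabled--;
}
```

Under this model a key initialized true becomes false after one sk_dec() and true again after a following sk_inc(), and a key initialized false behaves the opposite way. That matches the behaviour the patched text describes for static_branch_dec() and static_branch_inc().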
diff --git a/Documentation/static-keys.txt b/Documentation/static-keys.txt
index ef419fd0897f..b83dfa1c0602 100644
--- a/Documentation/static-keys.txt
+++ b/Documentation/static-keys.txt
| @@ -1,30 +1,34 @@ | |||
| 1 | Static Keys | 1 | =========== |
| 2 | ----------- | 2 | Static Keys |
| 3 | =========== | ||
| 3 | 4 | ||
| 4 | DEPRECATED API: | 5 | .. warning:: |
| 5 | 6 | ||
| 6 | The use of 'struct static_key' directly, is now DEPRECATED. In addition | 7 | DEPRECATED API: |
| 7 | static_key_{true,false}() is also DEPRECATED. IE DO NOT use the following: | ||
| 8 | 8 | ||
| 9 | struct static_key false = STATIC_KEY_INIT_FALSE; | 9 | The use of 'struct static_key' directly, is now DEPRECATED. In addition |
| 10 | struct static_key true = STATIC_KEY_INIT_TRUE; | 10 | static_key_{true,false}() is also DEPRECATED. IE DO NOT use the following:: |
| 11 | static_key_true() | ||
| 12 | static_key_false() | ||
| 13 | 11 | ||
| 14 | The updated API replacements are: | 12 | struct static_key false = STATIC_KEY_INIT_FALSE; |
| 13 | struct static_key true = STATIC_KEY_INIT_TRUE; | ||
| 14 | static_key_true() | ||
| 15 | static_key_false() | ||
| 15 | 16 | ||
| 16 | DEFINE_STATIC_KEY_TRUE(key); | 17 | The updated API replacements are:: |
| 17 | DEFINE_STATIC_KEY_FALSE(key); | ||
| 18 | DEFINE_STATIC_KEY_ARRAY_TRUE(keys, count); | ||
| 19 | DEFINE_STATIC_KEY_ARRAY_FALSE(keys, count); | ||
| 20 | static_branch_likely() | ||
| 21 | static_branch_unlikely() | ||
| 22 | 18 | ||
| 23 | 0) Abstract | 19 | DEFINE_STATIC_KEY_TRUE(key); |
| 20 | DEFINE_STATIC_KEY_FALSE(key); | ||
| 21 | DEFINE_STATIC_KEY_ARRAY_TRUE(keys, count); | ||
| 22 | DEFINE_STATIC_KEY_ARRAY_FALSE(keys, count); | ||
| 23 | static_branch_likely() | ||
| 24 | static_branch_unlikely() | ||
| 25 | |||
| 26 | Abstract | ||
| 27 | ======== | ||
| 24 | 28 | ||
| 25 | Static keys allows the inclusion of seldom used features in | 29 | Static keys allows the inclusion of seldom used features in |
| 26 | performance-sensitive fast-path kernel code, via a GCC feature and a code | 30 | performance-sensitive fast-path kernel code, via a GCC feature and a code |
| 27 | patching technique. A quick example: | 31 | patching technique. A quick example:: |
| 28 | 32 | ||
| 29 | DEFINE_STATIC_KEY_FALSE(key); | 33 | DEFINE_STATIC_KEY_FALSE(key); |
| 30 | 34 | ||
| @@ -45,7 +49,8 @@ The static_branch_unlikely() branch will be generated into the code with as litt | |||
| 45 | impact to the likely code path as possible. | 49 | impact to the likely code path as possible. |
| 46 | 50 | ||
| 47 | 51 | ||
| 48 | 1) Motivation | 52 | Motivation |
| 53 | ========== | ||
| 49 | 54 | ||
| 50 | 55 | ||
| 51 | Currently, tracepoints are implemented using a conditional branch. The | 56 | Currently, tracepoints are implemented using a conditional branch. The |
| @@ -60,7 +65,8 @@ possible. Although tracepoints are the original motivation for this work, other | |||
| 60 | kernel code paths should be able to make use of the static keys facility. | 65 | kernel code paths should be able to make use of the static keys facility. |
| 61 | 66 | ||
| 62 | 67 | ||
| 63 | 2) Solution | 68 | Solution |
| 69 | ======== | ||
| 64 | 70 | ||
| 65 | 71 | ||
| 66 | gcc (v4.5) adds a new 'asm goto' statement that allows branching to a label: | 72 | gcc (v4.5) adds a new 'asm goto' statement that allows branching to a label: |
| @@ -71,7 +77,7 @@ Using the 'asm goto', we can create branches that are either taken or not taken | |||
| 71 | by default, without the need to check memory. Then, at run-time, we can patch | 77 | by default, without the need to check memory. Then, at run-time, we can patch |
| 72 | the branch site to change the branch direction. | 78 | the branch site to change the branch direction. |
| 73 | 79 | ||
| 74 | For example, if we have a simple branch that is disabled by default: | 80 | For example, if we have a simple branch that is disabled by default:: |
| 75 | 81 | ||
| 76 | if (static_branch_unlikely(&key)) | 82 | if (static_branch_unlikely(&key)) |
| 77 | printk("I am the true branch\n"); | 83 | printk("I am the true branch\n"); |
| @@ -87,14 +93,15 @@ optimization. | |||
| 87 | This lowlevel patching mechanism is called 'jump label patching', and it gives | 93 | This lowlevel patching mechanism is called 'jump label patching', and it gives |
| 88 | the basis for the static keys facility. | 94 | the basis for the static keys facility. |
| 89 | 95 | ||
| 90 | 3) Static key label API, usage and examples: | 96 | Static key label API, usage and examples |
| 97 | ======================================== | ||
| 91 | 98 | ||
| 92 | 99 | ||
| 93 | In order to make use of this optimization you must first define a key: | 100 | In order to make use of this optimization you must first define a key:: |
| 94 | 101 | ||
| 95 | DEFINE_STATIC_KEY_TRUE(key); | 102 | DEFINE_STATIC_KEY_TRUE(key); |
| 96 | 103 | ||
| 97 | or: | 104 | or:: |
| 98 | 105 | ||
| 99 | DEFINE_STATIC_KEY_FALSE(key); | 106 | DEFINE_STATIC_KEY_FALSE(key); |
| 100 | 107 | ||
| @@ -102,14 +109,14 @@ or: | |||
| 102 | The key must be global, that is, it can't be allocated on the stack or dynamically | 109 | The key must be global, that is, it can't be allocated on the stack or dynamically |
| 103 | allocated at run-time. | 110 | allocated at run-time. |
| 104 | 111 | ||
| 105 | The key is then used in code as: | 112 | The key is then used in code as:: |
| 106 | 113 | ||
| 107 | if (static_branch_unlikely(&key)) | 114 | if (static_branch_unlikely(&key)) |
| 108 | do unlikely code | 115 | do unlikely code |
| 109 | else | 116 | else |
| 110 | do likely code | 117 | do likely code |
| 111 | 118 | ||
| 112 | Or: | 119 | Or:: |
| 113 | 120 | ||
| 114 | if (static_branch_likely(&key)) | 121 | if (static_branch_likely(&key)) |
| 115 | do likely code | 122 | do likely code |
| @@ -120,15 +127,15 @@ Keys defined via DEFINE_STATIC_KEY_TRUE(), or DEFINE_STATIC_KEY_FALSE, may | |||
| 120 | be used in either static_branch_likely() or static_branch_unlikely() | 127 | be used in either static_branch_likely() or static_branch_unlikely() |
| 121 | statements. | 128 | statements. |
| 122 | 129 | ||
| 123 | Branch(es) can be set true via: | 130 | Branch(es) can be set true via:: |
| 124 | 131 | ||
| 125 | static_branch_enable(&key); | 132 | static_branch_enable(&key); |
| 126 | 133 | ||
| 127 | or false via: | 134 | or false via:: |
| 128 | 135 | ||
| 129 | static_branch_disable(&key); | 136 | static_branch_disable(&key); |
| 130 | 137 | ||
| 131 | The branch(es) can then be switched via reference counts: | 138 | The branch(es) can then be switched via reference counts:: |
| 132 | 139 | ||
| 133 | static_branch_inc(&key); | 140 | static_branch_inc(&key); |
| 134 | ... | 141 | ... |
| @@ -142,11 +149,11 @@ static_branch_inc(), will change the branch back to true. Likewise, if the | |||
| 142 | key is initialized false, a 'static_branch_inc()', will change the branch to | 149 | key is initialized false, a 'static_branch_inc()', will change the branch to |
| 143 | true. And then a 'static_branch_dec()', will again make the branch false. | 150 | true. And then a 'static_branch_dec()', will again make the branch false. |
| 144 | 151 | ||
| 145 | Where an array of keys is required, it can be defined as: | 152 | Where an array of keys is required, it can be defined as:: |
| 146 | 153 | ||
| 147 | DEFINE_STATIC_KEY_ARRAY_TRUE(keys, count); | 154 | DEFINE_STATIC_KEY_ARRAY_TRUE(keys, count); |
| 148 | 155 | ||
| 149 | or: | 156 | or:: |
| 150 | 157 | ||
| 151 | DEFINE_STATIC_KEY_ARRAY_FALSE(keys, count); | 158 | DEFINE_STATIC_KEY_ARRAY_FALSE(keys, count); |
| 152 | 159 | ||
| @@ -159,96 +166,98 @@ simply fall back to a traditional, load, test, and jump sequence. Also, the | |||
| 159 | struct jump_entry table must be at least 4-byte aligned because the | 166 | struct jump_entry table must be at least 4-byte aligned because the |
| 160 | static_key->entry field makes use of the two least significant bits. | 167 | static_key->entry field makes use of the two least significant bits. |
| 161 | 168 | ||
| 162 | * select HAVE_ARCH_JUMP_LABEL, see: arch/x86/Kconfig | 169 | * ``select HAVE_ARCH_JUMP_LABEL``, |
| 163 | 170 | see: arch/x86/Kconfig | |
| 164 | * #define JUMP_LABEL_NOP_SIZE, see: arch/x86/include/asm/jump_label.h | ||
| 165 | 171 | ||
| 166 | * __always_inline bool arch_static_branch(struct static_key *key, bool branch), see: | 172 | * ``#define JUMP_LABEL_NOP_SIZE``, |
| 167 | arch/x86/include/asm/jump_label.h | 173 | see: arch/x86/include/asm/jump_label.h |
| 168 | 174 | ||
| 169 | * __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch), | 175 | * ``__always_inline bool arch_static_branch(struct static_key *key, bool branch)``, |
| 170 | see: arch/x86/include/asm/jump_label.h | 176 | see: arch/x86/include/asm/jump_label.h |
| 171 | 177 | ||
| 172 | * void arch_jump_label_transform(struct jump_entry *entry, enum jump_label_type type), | 178 | * ``__always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)``, |
| 173 | see: arch/x86/kernel/jump_label.c | 179 | see: arch/x86/include/asm/jump_label.h |
| 174 | 180 | ||
| 175 | * __init_or_module void arch_jump_label_transform_static(struct jump_entry *entry, enum jump_label_type type), | 181 | * ``void arch_jump_label_transform(struct jump_entry *entry, enum jump_label_type type)``, |
| 176 | see: arch/x86/kernel/jump_label.c | 182 | see: arch/x86/kernel/jump_label.c |
| 177 | 183 | ||
| 184 | * ``__init_or_module void arch_jump_label_transform_static(struct jump_entry *entry, enum jump_label_type type)``, | ||
| 185 | see: arch/x86/kernel/jump_label.c | ||
| 178 | 186 | ||
| 179 | * struct jump_entry, see: arch/x86/include/asm/jump_label.h | 187 | * ``struct jump_entry``, |
| 188 | see: arch/x86/include/asm/jump_label.h | ||
| 180 | 189 | ||
| 181 | 190 | ||
| 182 | 5) Static keys / jump label analysis, results (x86_64): | 191 | 5) Static keys / jump label analysis, results (x86_64): |
| 183 | 192 | ||
| 184 | 193 | ||
| 185 | As an example, let's add the following branch to 'getppid()', such that the | 194 | As an example, let's add the following branch to 'getppid()', such that the |
| 186 | system call now looks like: | 195 | system call now looks like:: |
| 187 | 196 | ||
| 188 | SYSCALL_DEFINE0(getppid) | 197 | SYSCALL_DEFINE0(getppid) |
| 189 | { | 198 | { |
| 190 | int pid; | 199 | int pid; |
| 191 | 200 | ||
| 192 | + if (static_branch_unlikely(&key)) | 201 | + if (static_branch_unlikely(&key)) |
| 193 | + printk("I am the true branch\n"); | 202 | + printk("I am the true branch\n"); |
| 194 | 203 | ||
| 195 | rcu_read_lock(); | 204 | rcu_read_lock(); |
| 196 | pid = task_tgid_vnr(rcu_dereference(current->real_parent)); | 205 | pid = task_tgid_vnr(rcu_dereference(current->real_parent)); |
| 197 | rcu_read_unlock(); | 206 | rcu_read_unlock(); |
| 198 | 207 | ||
| 199 | return pid; | 208 | return pid; |
| 200 | } | 209 | } |
| 201 | 210 | ||
| 202 | The resulting instructions with jump labels generated by GCC is: | 211 | The resulting instructions with jump labels generated by GCC is:: |
| 203 | 212 | ||
| 204 | ffffffff81044290 <sys_getppid>: | 213 | ffffffff81044290 <sys_getppid>: |
| 205 | ffffffff81044290: 55 push %rbp | 214 | ffffffff81044290: 55 push %rbp |
| 206 | ffffffff81044291: 48 89 e5 mov %rsp,%rbp | 215 | ffffffff81044291: 48 89 e5 mov %rsp,%rbp |
| 207 | ffffffff81044294: e9 00 00 00 00 jmpq ffffffff81044299 <sys_getppid+0x9> | 216 | ffffffff81044294: e9 00 00 00 00 jmpq ffffffff81044299 <sys_getppid+0x9> |
| 208 | ffffffff81044299: 65 48 8b 04 25 c0 b6 mov %gs:0xb6c0,%rax | 217 | ffffffff81044299: 65 48 8b 04 25 c0 b6 mov %gs:0xb6c0,%rax |
| 209 | ffffffff810442a0: 00 00 | 218 | ffffffff810442a0: 00 00 |
| 210 | ffffffff810442a2: 48 8b 80 80 02 00 00 mov 0x280(%rax),%rax | 219 | ffffffff810442a2: 48 8b 80 80 02 00 00 mov 0x280(%rax),%rax |
| 211 | ffffffff810442a9: 48 8b 80 b0 02 00 00 mov 0x2b0(%rax),%rax | 220 | ffffffff810442a9: 48 8b 80 b0 02 00 00 mov 0x2b0(%rax),%rax |
| 212 | ffffffff810442b0: 48 8b b8 e8 02 00 00 mov 0x2e8(%rax),%rdi | 221 | ffffffff810442b0: 48 8b b8 e8 02 00 00 mov 0x2e8(%rax),%rdi |
| 213 | ffffffff810442b7: e8 f4 d9 00 00 callq ffffffff81051cb0 <pid_vnr> | 222 | ffffffff810442b7: e8 f4 d9 00 00 callq ffffffff81051cb0 <pid_vnr> |
| 214 | ffffffff810442bc: 5d pop %rbp | 223 | ffffffff810442bc: 5d pop %rbp |
| 215 | ffffffff810442bd: 48 98 cltq | 224 | ffffffff810442bd: 48 98 cltq |
| 216 | ffffffff810442bf: c3 retq | 225 | ffffffff810442bf: c3 retq |
| 217 | ffffffff810442c0: 48 c7 c7 e3 54 98 81 mov $0xffffffff819854e3,%rdi | 226 | ffffffff810442c0: 48 c7 c7 e3 54 98 81 mov $0xffffffff819854e3,%rdi |
| 218 | ffffffff810442c7: 31 c0 xor %eax,%eax | 227 | ffffffff810442c7: 31 c0 xor %eax,%eax |
| 219 | ffffffff810442c9: e8 71 13 6d 00 callq ffffffff8171563f <printk> | 228 | ffffffff810442c9: e8 71 13 6d 00 callq ffffffff8171563f <printk> |
| 220 | ffffffff810442ce: eb c9 jmp ffffffff81044299 <sys_getppid+0x9> | 229 | ffffffff810442ce: eb c9 jmp ffffffff81044299 <sys_getppid+0x9> |
| 221 | 230 | ||
| 222 | Without the jump label optimization it looks like: | 231 | Without the jump label optimization it looks like:: |
| 223 | 232 | ||
| 224 | ffffffff810441f0 <sys_getppid>: | 233 | ffffffff810441f0 <sys_getppid>: |
| 225 | ffffffff810441f0: 8b 05 8a 52 d8 00 mov 0xd8528a(%rip),%eax # ffffffff81dc9480 <key> | 234 | ffffffff810441f0: 8b 05 8a 52 d8 00 mov 0xd8528a(%rip),%eax # ffffffff81dc9480 <key> |
| 226 | ffffffff810441f6: 55 push %rbp | 235 | ffffffff810441f6: 55 push %rbp |
| 227 | ffffffff810441f7: 48 89 e5 mov %rsp,%rbp | 236 | ffffffff810441f7: 48 89 e5 mov %rsp,%rbp |
| 228 | ffffffff810441fa: 85 c0 test %eax,%eax | 237 | ffffffff810441fa: 85 c0 test %eax,%eax |
| 229 | ffffffff810441fc: 75 27 jne ffffffff81044225 <sys_getppid+0x35> | 238 | ffffffff810441fc: 75 27 jne ffffffff81044225 <sys_getppid+0x35> |
| 230 | ffffffff810441fe: 65 48 8b 04 25 c0 b6 mov %gs:0xb6c0,%rax | 239 | ffffffff810441fe: 65 48 8b 04 25 c0 b6 mov %gs:0xb6c0,%rax |
| 231 | ffffffff81044205: 00 00 | 240 | ffffffff81044205: 00 00 |
| 232 | ffffffff81044207: 48 8b 80 80 02 00 00 mov 0x280(%rax),%rax | 241 | ffffffff81044207: 48 8b 80 80 02 00 00 mov 0x280(%rax),%rax |
| 233 | ffffffff8104420e: 48 8b 80 b0 02 00 00 mov 0x2b0(%rax),%rax | 242 | ffffffff8104420e: 48 8b 80 b0 02 00 00 mov 0x2b0(%rax),%rax |
| 234 | ffffffff81044215: 48 8b b8 e8 02 00 00 mov 0x2e8(%rax),%rdi | 243 | ffffffff81044215: 48 8b b8 e8 02 00 00 mov 0x2e8(%rax),%rdi |
| 235 | ffffffff8104421c: e8 2f da 00 00 callq ffffffff81051c50 <pid_vnr> | 244 | ffffffff8104421c: e8 2f da 00 00 callq ffffffff81051c50 <pid_vnr> |
| 236 | ffffffff81044221: 5d pop %rbp | 245 | ffffffff81044221: 5d pop %rbp |
| 237 | ffffffff81044222: 48 98 cltq | 246 | ffffffff81044222: 48 98 cltq |
| 238 | ffffffff81044224: c3 retq | 247 | ffffffff81044224: c3 retq |
| 239 | ffffffff81044225: 48 c7 c7 13 53 98 81 mov $0xffffffff81985313,%rdi | 248 | ffffffff81044225: 48 c7 c7 13 53 98 81 mov $0xffffffff81985313,%rdi |
| 240 | ffffffff8104422c: 31 c0 xor %eax,%eax | 249 | ffffffff8104422c: 31 c0 xor %eax,%eax |
| 241 | ffffffff8104422e: e8 60 0f 6d 00 callq ffffffff81715193 <printk> | 250 | ffffffff8104422e: e8 60 0f 6d 00 callq ffffffff81715193 <printk> |
| 242 | ffffffff81044233: eb c9 jmp ffffffff810441fe <sys_getppid+0xe> | 251 | ffffffff81044233: eb c9 jmp ffffffff810441fe <sys_getppid+0xe> |
| 243 | ffffffff81044235: 66 66 2e 0f 1f 84 00 data32 nopw %cs:0x0(%rax,%rax,1) | 252 | ffffffff81044235: 66 66 2e 0f 1f 84 00 data32 nopw %cs:0x0(%rax,%rax,1) |
| 244 | ffffffff8104423c: 00 00 00 00 | 253 | ffffffff8104423c: 00 00 00 00 |
| 245 | 254 | ||
| 246 | Thus, the disable jump label case adds a 'mov', 'test' and 'jne' instruction | 255 | Thus, the disable jump label case adds a 'mov', 'test' and 'jne' instruction |
| 247 | vs. the jump label case just has a 'no-op' or 'jmp 0'. (The jmp 0, is patched | 256 | vs. the jump label case just has a 'no-op' or 'jmp 0'. (The jmp 0, is patched |
| 248 | to a 5 byte atomic no-op instruction at boot-time.) Thus, the disabled jump | 257 | to a 5 byte atomic no-op instruction at boot-time.) Thus, the disabled jump |
| 249 | label case adds: | 258 | label case adds:: |
| 250 | 259 | ||
| 251 | 6 (mov) + 2 (test) + 2 (jne) = 10 - 5 (5 byte jump 0) = 5 addition bytes. | 260 | 6 (mov) + 2 (test) + 2 (jne) = 10 - 5 (5 byte jump 0) = 5 addition bytes. |
| 252 | 261 | ||
| 253 | If we then include the padding bytes, the jump label code saves, 16 total bytes | 262 | If we then include the padding bytes, the jump label code saves, 16 total bytes |
| 254 | of instruction memory for this small function. In this case the non-jump label | 263 | of instruction memory for this small function. In this case the non-jump label |
| @@ -262,7 +271,7 @@ Since there are a number of static key API uses in the scheduler paths, | |||
| 262 | 'pipe-test' (also known as 'perf bench sched pipe') can be used to show the | 271 | 'pipe-test' (also known as 'perf bench sched pipe') can be used to show the |
| 263 | performance improvement. Testing done on 3.3.0-rc2: | 272 | performance improvement. Testing done on 3.3.0-rc2: |
| 264 | 273 | ||
| 265 | jump label disabled: | 274 | jump label disabled:: |
| 266 | 275 | ||
| 267 | Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs): | 276 | Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs): |
| 268 | 277 | ||
| @@ -279,7 +288,7 @@ jump label disabled: | |||
| 279 | 288 | ||
| 280 | 1.601607384 seconds time elapsed ( +- 0.07% ) | 289 | 1.601607384 seconds time elapsed ( +- 0.07% ) |
| 281 | 290 | ||
| 282 | jump label enabled: | 291 | jump label enabled:: |
| 283 | 292 | ||
| 284 | Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs): | 293 | Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs): |
| 285 | 294 | ||
