Diffstat (limited to 'Documentation/RCU/checklist.txt')
 -rw-r--r--	Documentation/RCU/checklist.txt | 215
 1 file changed, 130 insertions(+), 85 deletions(-)
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index 51525a30e8b4..790d1a812376 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -8,13 +8,12 @@ would cause.  This list is based on experiences reviewing such patches
 over a rather long period of time, but improvements are always welcome!
 
 0.	Is RCU being applied to a read-mostly situation?  If the data
-	structure is updated more than about 10% of the time, then
-	you should strongly consider some other approach, unless
-	detailed performance measurements show that RCU is nonetheless
-	the right tool for the job.  Yes, you might think of RCU
-	as simply cutting overhead off of the readers and imposing it
-	on the writers.  That is exactly why normal uses of RCU will
-	do much more reading than updating.
+	structure is updated more than about 10% of the time, then you
+	should strongly consider some other approach, unless detailed
+	performance measurements show that RCU is nonetheless the right
+	tool for the job.  Yes, RCU does reduce read-side overhead by
+	increasing write-side overhead, which is exactly why normal uses
+	of RCU will do much more reading than updating.
 
 	Another exception is where performance is not an issue, and RCU
 	provides a simpler implementation.  An example of this situation
@@ -35,13 +34,13 @@ over a rather long period of time, but improvements are always welcome!
 
 	If you choose #b, be prepared to describe how you have handled
 	memory barriers on weakly ordered machines (pretty much all of
-	them -- even x86 allows reads to be reordered), and be prepared
-	to explain why this added complexity is worthwhile.  If you
-	choose #c, be prepared to explain how this single task does not
-	become a major bottleneck on big multiprocessor machines (for
-	example, if the task is updating information relating to itself
-	that other tasks can read, there by definition can be no
-	bottleneck).
+	them -- even x86 allows later loads to be reordered to precede
+	earlier stores), and be prepared to explain why this added
+	complexity is worthwhile.  If you choose #c, be prepared to
+	explain how this single task does not become a major bottleneck on
+	big multiprocessor machines (for example, if the task is updating
+	information relating to itself that other tasks can read, there
+	by definition can be no bottleneck).
 
 2.	Do the RCU read-side critical sections make proper use of
 	rcu_read_lock() and friends?  These primitives are needed
@@ -51,8 +50,10 @@ over a rather long period of time, but improvements are always welcome!
 	actuarial risk of your kernel.
 
 	As a rough rule of thumb, any dereference of an RCU-protected
-	pointer must be covered by rcu_read_lock() or rcu_read_lock_bh()
-	or by the appropriate update-side lock.
+	pointer must be covered by rcu_read_lock(), rcu_read_lock_bh(),
+	rcu_read_lock_sched(), or by the appropriate update-side lock.
+	Disabling of preemption can serve as rcu_read_lock_sched(), but
+	is less readable.
 
 3.	Does the update code tolerate concurrent accesses?
 
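
As an illustration of the rule of thumb above, here is a minimal
read-side sketch; struct foo, the global pointer gp, and read_foo_a()
are hypothetical names chosen for illustration, not part of
checklist.txt:

	#include <linux/rcupdate.h>

	struct foo {
		int a;
	};
	static struct foo *gp;	/* RCU-protected; written by updaters. */

	int read_foo_a(void)
	{
		struct foo *p;
		int ret = -1;

		rcu_read_lock();	/* Covers every dereference of gp. */
		p = rcu_dereference(gp);
		if (p)
			ret = p->a;	/* *p cannot be freed before rcu_read_unlock(). */
		rcu_read_unlock();
		return ret;
	}
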
@@ -62,25 +63,27 @@ over a rather long period of time, but improvements are always welcome!
 	of ways to handle this concurrency, depending on the situation:
 
 	a.	Use the RCU variants of the list and hlist update
-		primitives to add, remove, and replace elements on an
-		RCU-protected list.  Alternatively, use the RCU-protected
-		trees that have been added to the Linux kernel.
+		primitives to add, remove, and replace elements on
+		an RCU-protected list.  Alternatively, use the other
+		RCU-protected data structures that have been added to
+		the Linux kernel.
 
 		This is almost always the best approach.
 
 	b.	Proceed as in (a) above, but also maintain per-element
 		locks (that are acquired by both readers and writers)
 		that guard per-element state.  Of course, fields that
-		the readers refrain from accessing can be guarded by the
-		update-side lock.
+		the readers refrain from accessing can be guarded by
+		some other lock acquired only by updaters, if desired.
 
 		This works quite well, also.
 
 	c.	Make updates appear atomic to readers.  For example,
-		pointer updates to properly aligned fields will appear
-		atomic, as will individual atomic primitives.  Operations
-		performed under a lock and sequences of multiple atomic
-		primitives will -not- appear to be atomic.
+		pointer updates to properly aligned fields will
+		appear atomic, as will individual atomic primitives.
+		Sequences of operations performed under a lock will -not-
+		appear to be atomic to RCU readers, nor will sequences
+		of multiple atomic primitives.
 
 		This can work, but is starting to get a bit tricky.
 
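
A minimal updater sketch of approach (a); foo_list, foo_lock, and
add_foo() are hypothetical. Updaters exclude each other with a
spinlock while readers traverse the list concurrently:

	#include <linux/rculist.h>
	#include <linux/slab.h>
	#include <linux/spinlock.h>

	struct foo {
		struct list_head list;
		int key;
	};
	static LIST_HEAD(foo_list);		/* RCU-protected list. */
	static DEFINE_SPINLOCK(foo_lock);	/* Serializes updaters only. */

	int add_foo(int key)
	{
		struct foo *p = kmalloc(sizeof(*p), GFP_KERNEL);

		if (!p)
			return -ENOMEM;
		p->key = key;
		spin_lock(&foo_lock);
		list_add_rcu(&p->list, &foo_list);	/* Safe against concurrent readers. */
		spin_unlock(&foo_lock);
		return 0;
	}
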
@@ -98,9 +101,9 @@ over a rather long period of time, but improvements are always welcome!
 	a new structure containing updated values.
 
 4.	Weakly ordered CPUs pose special challenges.  Almost all CPUs
-	are weakly ordered -- even i386 CPUs allow reads to be reordered.
-	RCU code must take all of the following measures to prevent
-	memory-corruption problems:
+	are weakly ordered -- even x86 CPUs allow later loads to be
+	reordered to precede earlier stores.  RCU code must take all of
+	the following measures to prevent memory-corruption problems:
 
 	a.	Readers must maintain proper ordering of their memory
 		accesses.  The rcu_dereference() primitive ensures that
@@ -113,14 +116,25 @@ over a rather long period of time, but improvements are always welcome!
 		The rcu_dereference() primitive is also an excellent
 		documentation aid, letting the person reading the code
 		know exactly which pointers are protected by RCU.
-
-		The rcu_dereference() primitive is used by the various
-		"_rcu()" list-traversal primitives, such as the
-		list_for_each_entry_rcu().  Note that it is perfectly
-		legal (if redundant) for update-side code to use
-		rcu_dereference() and the "_rcu()" list-traversal
-		primitives.  This is particularly useful in code
-		that is common to readers and updaters.
+		Please note that compilers can also reorder code, and
+		they are becoming increasingly aggressive about doing
+		just that.  The rcu_dereference() primitive therefore
+		also prevents destructive compiler optimizations.
+
+		The rcu_dereference() primitive is used by the
+		various "_rcu()" list-traversal primitives, such
+		as the list_for_each_entry_rcu().  Note that it is
+		perfectly legal (if redundant) for update-side code to
+		use rcu_dereference() and the "_rcu()" list-traversal
+		primitives.  This is particularly useful in code that
+		is common to readers and updaters.  However, lockdep
+		will complain if you use rcu_dereference() outside
+		of an RCU read-side critical section.  See lockdep.txt
+		to learn what to do about this.
+
+		Of course, neither rcu_dereference() nor the "_rcu()"
+		list-traversal primitives can substitute for a good
+		concurrency design coordinating among multiple updaters.
 
 	b.	If the list macros are being used, the list_add_tail_rcu()
 		and list_add_rcu() primitives must be used in order
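
A matching read-side traversal sketch, reusing the hypothetical
foo_list from the earlier sketch. Note that anything learned from *p
must be used before rcu_read_unlock() is reached:

	int foo_key_present(int key)
	{
		struct foo *p;
		int found = 0;

		rcu_read_lock();
		list_for_each_entry_rcu(p, &foo_list, list) {
			if (p->key == key) {
				found = 1;	/* Use *p only inside the critical section. */
				break;
			}
		}
		rcu_read_unlock();
		return found;
	}
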
@@ -135,11 +149,14 @@ over a rather long period of time, but improvements are always welcome!
 		readers.  Similarly, if the hlist macros are being used,
 		the hlist_del_rcu() primitive is required.
 
-		The list_replace_rcu() primitive may be used to
-		replace an old structure with a new one in an
-		RCU-protected list.
+		The list_replace_rcu() and hlist_replace_rcu() primitives
+		may be used to replace an old structure with a new one
+		in their respective types of RCU-protected lists.
+
+	d.	Rules similar to (4b) and (4c) apply to the "hlist_nulls"
+		type of RCU-protected linked lists.
 
-	d.	Updates must ensure that initialization of a given
+	e.	Updates must ensure that initialization of a given
 		structure happens before pointers to that structure are
 		publicized.  Use the rcu_assign_pointer() primitive
 		when publicizing a pointer to a structure that can
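
A publication sketch for this rule, reusing the hypothetical struct
foo and gp from the first sketch. The point is that initialization
strictly precedes the pointer assignment, and rcu_assign_pointer()
supplies the needed memory barrier:

	int publish_foo(int a)
	{
		struct foo *p = kmalloc(sizeof(*p), GFP_KERNEL);

		if (!p)
			return -ENOMEM;
		p->a = a;			/* Fully initialize first... */
		rcu_assign_pointer(gp, p);	/* ...then publish the pointer. */
		return 0;
	}
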
@@ -151,16 +168,31 @@ over a rather long period of time, but improvements are always welcome!
 	it cannot block.
 
 6.	Since synchronize_rcu() can block, it cannot be called from
-	any sort of irq context.  Ditto for synchronize_sched() and
-	synchronize_srcu().
-
-7.	If the updater uses call_rcu(), then the corresponding readers
-	must use rcu_read_lock() and rcu_read_unlock().  If the updater
-	uses call_rcu_bh(), then the corresponding readers must use
-	rcu_read_lock_bh() and rcu_read_unlock_bh().  If the updater
-	uses call_rcu_sched(), then the corresponding readers must
-	disable preemption.  Mixing things up will result in confusion
-	and broken kernels.
+	any sort of irq context.  The same rule applies for
+	synchronize_rcu_bh(), synchronize_sched(), synchronize_srcu(),
+	synchronize_rcu_expedited(), synchronize_rcu_bh_expedited(),
+	synchronize_sched_expedited(), and synchronize_srcu_expedited().
+
+	The expedited forms of these primitives have the same semantics
+	as the non-expedited forms, but expediting is both expensive
+	and unfriendly to real-time workloads.  Use of the expedited
+	primitives should be restricted to rare configuration-change
+	operations that would not normally be undertaken while a real-time
+	workload is running.
+
+7.	If the updater uses call_rcu() or synchronize_rcu(), then the
+	corresponding readers must use rcu_read_lock() and
+	rcu_read_unlock().  If the updater uses call_rcu_bh() or
+	synchronize_rcu_bh(), then the corresponding readers must
+	use rcu_read_lock_bh() and rcu_read_unlock_bh().  If the
+	updater uses call_rcu_sched() or synchronize_sched(), then
+	the corresponding readers must disable preemption, possibly
+	by calling rcu_read_lock_sched() and rcu_read_unlock_sched().
+	If the updater uses synchronize_srcu(), then the corresponding
+	readers must use srcu_read_lock() and srcu_read_unlock(),
+	with the same srcu_struct.  The rules for the expedited
+	primitives are the same as for their non-expedited counterparts.
+	Mixing things up will result in confusion and broken kernels.
 
 	One exception to this rule: rcu_read_lock() and rcu_read_unlock()
 	may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh()
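
A sketch of flavor matching, assuming a hypothetical struct foo that
embeds a struct rcu_head. Because the updater uses call_rcu_bh(), the
readers must use the _bh read-side primitives:

	struct foo {
		struct rcu_head rcu;
		int a;
	};
	static struct foo *gp;

	static void free_foo_cb(struct rcu_head *head)
	{
		kfree(container_of(head, struct foo, rcu));
	}

	void retire_foo(struct foo *p)	/* Updater: caller has already unlinked p from gp. */
	{
		call_rcu_bh(&p->rcu, free_foo_cb);	/* Waits for _bh readers. */
	}

	int read_foo(void)		/* Reader: must also use the _bh flavor. */
	{
		struct foo *p;
		int ret = -1;

		rcu_read_lock_bh();
		p = rcu_dereference_bh(gp);
		if (p)
			ret = p->a;
		rcu_read_unlock_bh();
		return ret;
	}
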
@@ -212,6 +244,8 @@ over a rather long period of time, but improvements are always welcome!
 	e.	Periodically invoke synchronize_rcu(), permitting a limited
 		number of updates per grace period.
 
+	The same cautions apply to call_rcu_bh() and call_rcu_sched().
+
 9.	All RCU list-traversal primitives, which include
 	rcu_dereference(), list_for_each_entry_rcu(),
 	list_for_each_continue_rcu(), and list_for_each_safe_rcu(),
@@ -219,17 +253,21 @@ over a rather long period of time, but improvements are always welcome!
 	must be protected by appropriate update-side locks.  RCU
 	read-side critical sections are delimited by rcu_read_lock()
 	and rcu_read_unlock(), or by similar primitives such as
-	rcu_read_lock_bh() and rcu_read_unlock_bh().
+	rcu_read_lock_bh() and rcu_read_unlock_bh(), in which case
+	the matching rcu_dereference() primitive must be used in order
+	to keep lockdep happy; in this case, rcu_dereference_bh().
 
 	The reason that it is permissible to use RCU list-traversal
 	primitives when the update-side lock is held is that doing so
 	can be quite helpful in reducing code bloat when common code is
-	shared between readers and updaters.
+	shared between readers and updaters.  Additional primitives
+	are provided for this case, as discussed in lockdep.txt.
 
 10.	Conversely, if you are in an RCU read-side critical section,
 	and you don't hold the appropriate update-side lock, you -must-
 	use the "_rcu()" variants of the list macros.  Failing to do so
-	will break Alpha and confuse people reading your code.
+	will break Alpha, cause aggressive compilers to generate bad code,
+	and confuse people trying to read your code.
 
 11.	Note that synchronize_rcu() -only- guarantees to wait until
 	all currently executing rcu_read_lock()-protected RCU read-side
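
A sketch of one such primitive, rcu_dereference_check(), in common
code reached both by readers and by updaters holding a hypothetical
foo_lock; see lockdep.txt for the kernel's own examples:

	/*
	 * Plain rcu_dereference() would make lockdep complain when this
	 * is called with foo_lock held but outside an RCU read-side
	 * critical section.
	 */
	static struct foo *get_gp(void)
	{
		return rcu_dereference_check(gp,
					     rcu_read_lock_held() ||
					     lockdep_is_held(&foo_lock));
	}
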
@@ -239,15 +277,21 @@ over a rather long period of time, but improvements are always welcome!
 	rcu_read_lock()-protected read-side critical sections, do -not-
 	use synchronize_rcu().
 
-	If you want to wait for some of these other things, you might
-	instead need to use synchronize_irq() or synchronize_sched().
+	Similarly, disabling preemption is not an acceptable substitute
+	for rcu_read_lock().  Code that attempts to use preemption
+	disabling where it should be using rcu_read_lock() will break
+	in real-time kernel builds.
+
+	If you want to wait for interrupt handlers, NMI handlers, and
+	code under the influence of preempt_disable(), you instead
+	need to use synchronize_irq() or synchronize_sched().
 
 12.	Any lock acquired by an RCU callback must be acquired elsewhere
 	with softirq disabled, e.g., via spin_lock_irqsave(),
 	spin_lock_bh(), etc.  Failing to disable irq on a given
-	acquisition of that lock will result in deadlock as soon as the
-	RCU callback happens to interrupt that acquisition's critical
-	section.
+	acquisition of that lock will result in deadlock as soon as
+	the RCU softirq handler happens to run your RCU callback while
+	interrupting that acquisition's critical section.
 
 13.	RCU callbacks can be and are executed in parallel.  In many cases,
 	the callback code simply wraps around kfree(), so that this
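
A sketch of rule 12, with a hypothetical stats_lock acquired both from
process context and from an RCU callback. The process-context
acquisition must block softirqs:

	static DEFINE_SPINLOCK(stats_lock);
	static unsigned long nfreed;

	static void count_and_free_cb(struct rcu_head *head)
	{
		spin_lock(&stats_lock);		/* Callback runs in softirq context. */
		nfreed++;
		spin_unlock(&stats_lock);
		kfree(container_of(head, struct foo, rcu));
	}

	void stats_report(void)			/* Process context. */
	{
		spin_lock_bh(&stats_lock);	/* _bh: otherwise the RCU softirq
						 * could interrupt here, run the
						 * callback, and deadlock. */
		pr_info("freed %lu\n", nfreed);
		spin_unlock_bh(&stats_lock);
	}
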
@@ -265,29 +309,30 @@ over a rather long period of time, but improvements are always welcome!
 	not the case, a self-spawning RCU callback would prevent the
 	victim CPU from ever going offline.)
 
-14.	SRCU (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu())
-	may only be invoked from process context.  Unlike other forms of
-	RCU, it -is- permissible to block in an SRCU read-side critical
-	section (demarked by srcu_read_lock() and srcu_read_unlock()),
-	hence the "SRCU": "sleepable RCU".  Please note that if you
-	don't need to sleep in read-side critical sections, you should
-	be using RCU rather than SRCU, because RCU is almost always
-	faster and easier to use than is SRCU.
+14.	SRCU (srcu_read_lock(), srcu_read_unlock(), srcu_dereference(),
+	synchronize_srcu(), and synchronize_srcu_expedited()) may only
+	be invoked from process context.  Unlike other forms of RCU, it
+	-is- permissible to block in an SRCU read-side critical section
+	(demarked by srcu_read_lock() and srcu_read_unlock()), hence the
+	"SRCU": "sleepable RCU".  Please note that if you don't need
+	to sleep in read-side critical sections, you should be using
+	RCU rather than SRCU, because RCU is almost always faster and
+	easier to use than is SRCU.
 
 	Also unlike other forms of RCU, explicit initialization
 	and cleanup is required via init_srcu_struct() and
 	cleanup_srcu_struct().  These are passed a "struct srcu_struct"
 	that defines the scope of a given SRCU domain.  Once initialized,
 	the srcu_struct is passed to srcu_read_lock(), srcu_read_unlock()
-	and synchronize_srcu().  A given synchronize_srcu() waits only
-	for SRCU read-side critical sections governed by srcu_read_lock()
-	and srcu_read_unlock() calls that have been passd the same
-	srcu_struct.  This property is what makes sleeping read-side
-	critical sections tolerable -- a given subsystem delays only
-	its own updates, not those of other subsystems using SRCU.
-	Therefore, SRCU is less prone to OOM the system than RCU would
-	be if RCU's read-side critical sections were permitted to
-	sleep.
+	synchronize_srcu(), and synchronize_srcu_expedited().  A given
+	synchronize_srcu() waits only for SRCU read-side critical
+	sections governed by srcu_read_lock() and srcu_read_unlock()
+	calls that have been passed the same srcu_struct.  This property
+	is what makes sleeping read-side critical sections tolerable --
+	a given subsystem delays only its own updates, not those of other
+	subsystems using SRCU.  Therefore, SRCU is less prone to OOM the
+	system than RCU would be if RCU's read-side critical sections
+	were permitted to sleep.
 
 	The ability to sleep in read-side critical sections does not
 	come for free.  First, corresponding srcu_read_lock() and
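
A minimal SRCU usage sketch; foo_srcu, gp, and do_sleepable_work()
are hypothetical, and updaters are assumed to be serialized elsewhere:

	#include <linux/srcu.h>

	static struct srcu_struct foo_srcu;	/* init_srcu_struct() at init time. */
	static struct foo *gp;

	void foo_read(void)
	{
		struct foo *p;
		int idx;

		idx = srcu_read_lock(&foo_srcu);
		p = srcu_dereference(gp, &foo_srcu);
		if (p)
			do_sleepable_work(p);	/* Blocking is legal here. */
		srcu_read_unlock(&foo_srcu, idx);
	}

	void foo_update(struct foo *newp)	/* Updaters serialized by caller. */
	{
		struct foo *oldp = gp;

		rcu_assign_pointer(gp, newp);
		synchronize_srcu(&foo_srcu);	/* Waits only for foo_srcu readers. */
		kfree(oldp);
	}
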
@@ -300,8 +345,8 @@ over a rather long period of time, but improvements are always welcome!
 	requiring SRCU's read-side deadlock immunity or low read-side
 	realtime latency.
 
-	Note that, rcu_assign_pointer() and rcu_dereference() relate to
-	SRCU just as they do to other forms of RCU.
+	Note that rcu_assign_pointer() relates to SRCU just as it does
+	to other forms of RCU.
 
 15.	The whole point of call_rcu(), synchronize_rcu(), and friends
 	is to wait until all pre-existing readers have finished before
@@ -311,12 +356,12 @@ over a rather long period of time, but improvements are always welcome!
 	destructive operation, and -only- -then- invoke call_rcu(),
 	synchronize_rcu(), or friends.
 
-	Because these primitives only wait for pre-existing readers,
-	it is the caller's responsibility to guarantee safety to
-	any subsequent readers.
+	Because these primitives only wait for pre-existing readers, it
+	is the caller's responsibility to guarantee that any subsequent
+	readers will execute safely.
 
-16.	The various RCU read-side primitives do -not- contain memory
-	barriers.  The CPU (and in some cases, the compiler) is free
-	to reorder code into and out of RCU read-side critical sections.
-	It is the responsibility of the RCU update-side primitives to
-	deal with this.
+16.	The various RCU read-side primitives do -not- necessarily contain
+	memory barriers.  You should therefore plan for the CPU
+	and the compiler to freely reorder code into and out of RCU
+	read-side critical sections.  It is the responsibility of the
+	RCU update-side primitives to deal with this.
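
A sketch of the ordering that rule 15 requires, reusing the
hypothetical foo_list and foo_lock: first remove, then wait for
pre-existing readers, and only then destroy:

	void delete_foo(struct foo *p)
	{
		spin_lock(&foo_lock);
		list_del_rcu(&p->list);	/* 1. New readers can no longer find p. */
		spin_unlock(&foo_lock);
		synchronize_rcu();	/* 2. Wait for pre-existing readers. */
		kfree(p);		/* 3. Only now is destruction safe. */
	}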