author     Paul E. McKenney <paulmck@linux.vnet.ibm.com>   2013-12-11 16:59:04 -0500
committer  Ingo Molnar <mingo@kernel.org>                  2013-12-16 05:36:08 -0500
commit     2ecf810121c7eae34473b8fa108112036bc61127 (patch)
tree       db2f48fec611baab7b9fa15f45c9be8c5587670c
parent     962d9c5757b2ebd30678a88a4f475ae997f4d5dd (diff)
Documentation/memory-barriers.txt: Add needed ACCESS_ONCE() calls to memory-barriers.txt
The Documentation/memory-barriers.txt file was written before
the need for ACCESS_ONCE() was fully appreciated. It therefore
contains no ACCESS_ONCE() calls, which can be a problem when
people lift examples from it. This commit therefore adds
ACCESS_ONCE() calls.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1386799151-2219-1-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-rw-r--r--  Documentation/memory-barriers.txt | 206
1 file changed, 126 insertions(+), 80 deletions(-)
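For context when reading the hunks below: at this point in the kernel's history, ACCESS_ONCE() is a simple volatile cast defined in include/linux/compiler.h. A minimal standalone sketch of the idiom (the macro body matches the kernel's definition, spelled __typeof__ so it builds as plain C; flag and main() are made-up scaffolding):

        /*
         * Standalone sketch of ACCESS_ONCE().  The volatile cast forces
         * the compiler to emit exactly one load or store per source-level
         * access: nothing cached, repeated, merged, or torn.
         */
        #include <stdio.h>

        #define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

        static int flag;

        int main(void)
        {
                int v;

                ACCESS_ONCE(flag) = 1;          /* exactly one store */
                v = ACCESS_ONCE(flag);          /* exactly one load  */
                printf("flag = %d\n", v);
                return 0;
        }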
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 020cccdbdd0c..1d067235b0bc 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -194,18 +194,22 @@ There are some minimal guarantees that may be expected of a CPU:
  (*) On any given CPU, dependent memory accesses will be issued in order, with
      respect to itself.  This means that for:
 
-        Q = P; D = *Q;
+        ACCESS_ONCE(Q) = P; smp_read_barrier_depends(); D = ACCESS_ONCE(*Q);
 
      the CPU will issue the following memory operations:
 
         Q = LOAD P, D = LOAD *Q
 
-     and always in that order.
+     and always in that order.  On most systems, smp_read_barrier_depends()
+     does nothing, but it is required for DEC Alpha.  The ACCESS_ONCE()
+     is required to prevent compiler mischief.  Please note that you
+     should normally use something like rcu_dereference() instead of
+     open-coding smp_read_barrier_depends().
 
  (*) Overlapping loads and stores within a particular CPU will appear to be
      ordered within that CPU.  This means that for:
 
-        a = *X; *X = b;
+        a = ACCESS_ONCE(*X); ACCESS_ONCE(*X) = b;
 
      the CPU will only issue the following sequence of memory operations:
 
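As the new text advises, kernel code should normally obtain both the ACCESS_ONCE() semantics and the dependency barrier from rcu_dereference() rather than open-coding them. A hedged kernel-context sketch of the same dependent-load pattern (struct foo, gp, and reader() are made-up names, not anything from the patch):

        #include <linux/rcupdate.h>

        struct foo {
                int a;
        };

        struct foo __rcu *gp;           /* RCU-protected pointer */

        int reader(void)
        {
                struct foo *q;
                int d = -1;

                rcu_read_lock();
                q = rcu_dereference(gp);        /* one careful load of gp,
                                                   including the dependency
                                                   barrier on DEC Alpha */
                if (q)
                        d = q->a;               /* dependent load */
                rcu_read_unlock();
                return d;
        }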
@@ -213,7 +217,7 @@ There are some minimal guarantees that may be expected of a CPU:
 
      And for:
 
-        *X = c; d = *X;
+        ACCESS_ONCE(*X) = c; d = ACCESS_ONCE(*X);
 
      the CPU will only issue:
 
@@ -224,6 +228,41 @@ There are some minimal guarantees that may be expected of a CPU:
 
 And there are a number of things that _must_ or _must_not_ be assumed:
 
+ (*) It _must_not_ be assumed that the compiler will do what you want with
+     memory references that are not protected by ACCESS_ONCE().  Without
+     ACCESS_ONCE(), the compiler is within its rights to do all sorts
+     of "creative" transformations:
+
+     (-) Repeat the load, possibly getting a different value on the second
+         and subsequent loads.  This is especially prone to happen when
+         register pressure is high.
+
+     (-) Merge adjacent loads and stores to the same location.  The most
+         familiar example is the transformation from:
+
+                while (a)
+                        do_something();
+
+         to something like:
+
+                if (a)
+                        for (;;)
+                                do_something();
+
+         Using ACCESS_ONCE() as follows prevents this sort of optimization:
+
+                while (ACCESS_ONCE(a))
+                        do_something();
+
+     (-) "Store tearing", where a single store in the source code is split
+         into smaller stores in the object code.  Note that gcc really
+         will do this on some architectures when storing certain constants.
+         It can be cheaper to do a series of immediate stores than to
+         form the constant in a register and then to store that register.
+
+     (-) "Load tearing", which splits loads in a manner analogous to
+         store tearing.
+
  (*) It _must_not_ be assumed that independent loads and stores will be issued
      in the order given.  This means that for:
 
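To make the tearing items concrete, here is a standalone sketch (shared_val is a made-up name). Per the text above, gcc on some architectures really may emit the plain store of this constant as two 32-bit immediate stores, so a concurrent reader could observe half of the old value and half of the new:

        #include <stdint.h>

        #define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

        uint64_t shared_val;            /* read concurrently elsewhere */

        void plain_store(void)
        {
                /* May be torn into two 32-bit immediate stores. */
                shared_val = 0x0000000100000002ULL;
        }

        void once_store(void)
        {
                /* Emitted as a single 64-bit store wherever the
                   architecture can do one. */
                ACCESS_ONCE(shared_val) = 0x0000000100000002ULL;
        }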
@@ -450,14 +489,14 @@ The usage requirements of data dependency barriers are a little subtle, and
 it's not always obvious that they're needed.  To illustrate, consider the
 following sequence of events:
 
         CPU 1                   CPU 2
         ===============         ===============
         { A == 1, B == 2, C = 3, P == &A, Q == &C }
         B = 4;
         <write barrier>
-        P = &B
-                                Q = P;
+        ACCESS_ONCE(P) = &B
+                                Q = ACCESS_ONCE(P);
                                 D = *Q;
 
 There's a clear data dependency here, and it would seem that by the end of the
 sequence, Q must be either &A or &B, and that:
@@ -477,15 +516,15 @@ Alpha).
 To deal with this, a data dependency barrier or better must be inserted
 between the address load and the data load:
 
         CPU 1                   CPU 2
         ===============         ===============
         { A == 1, B == 2, C = 3, P == &A, Q == &C }
         B = 4;
         <write barrier>
-        P = &B
-                                Q = P;
+        ACCESS_ONCE(P) = &B
+                                Q = ACCESS_ONCE(P);
                                 <data dependency barrier>
                                 D = *Q;
 
 This enforces the occurrence of one of the two implications, and prevents the
 third possibility from arising.
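Written out as kernel-style C, the corrected two-CPU sequence looks roughly like the sketch below, with smp_wmb() standing in for <write barrier> (cpu1()/cpu2() are made-up wrappers; in real code rcu_assign_pointer() and rcu_dereference() would normally replace the open-coded barriers):

        #include <linux/compiler.h>
        #include <asm/barrier.h>

        int A = 1, B = 2;
        int *P = &A;

        void cpu1(void)                         /* publisher */
        {
                B = 4;
                smp_wmb();                      /* <write barrier> */
                ACCESS_ONCE(P) = &B;
        }

        int cpu2(void)                          /* subscriber */
        {
                int *Q, D;

                Q = ACCESS_ONCE(P);
                smp_read_barrier_depends();     /* <data dependency barrier> */
                D = *Q;                         /* if Q == &B, D must be 4 */
                return D;
        }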
@@ -504,21 +543,22 @@ Another example of where data dependency barriers might be required is where a
 number is read from memory and then used to calculate the index for an array
 access:
 
         CPU 1                   CPU 2
         ===============         ===============
         { M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
         M[1] = 4;
         <write barrier>
-        P = 1
-                                Q = P;
+        ACCESS_ONCE(P) = 1
+                                Q = ACCESS_ONCE(P);
                                 <data dependency barrier>
                                 D = M[Q];
 
 
-The data dependency barrier is very important to the RCU system, for example.
-See rcu_dereference() in include/linux/rcupdate.h.  This permits the current
-target of an RCU'd pointer to be replaced with a new modified target, without
-the replacement target appearing to be incompletely initialised.
+The data dependency barrier is very important to the RCU system,
+for example.  See rcu_assign_pointer() and rcu_dereference() in
+include/linux/rcupdate.h.  This permits the current target of an RCU'd
+pointer to be replaced with a new modified target, without the replacement
+target appearing to be incompletely initialised.
 
 See also the subsection on "Cache Coherency" for a more thorough example.
 
@@ -530,22 +570,23 @@ A control dependency requires a full read memory barrier, not simply a data
 dependency barrier to make it work correctly.  Consider the following bit of
 code:
 
-        q = &a;
+        q = ACCESS_ONCE(a);
         if (p) {
                 <data dependency barrier>
-                q = &b;
+                q = ACCESS_ONCE(b);
         }
         x = *q;
 
 This will not have the desired effect because there is no actual data
-dependency, but rather a control dependency that the CPU may short-circuit by
-attempting to predict the outcome in advance.  In such a case what's actually
-required is:
+dependency, but rather a control dependency that the CPU may short-circuit
+by attempting to predict the outcome in advance, so that other CPUs see
+the load from b as having happened before the load from a.  In such a
+case what's actually required is:
 
-        q = &a;
+        q = ACCESS_ONCE(a);
         if (p) {
                 <read barrier>
-                q = &b;
+                q = ACCESS_ONCE(b);
         }
         x = *q;
 
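The corrected control-dependency code, sketched as kernel-style C with smp_rmb() as the explicit <read barrier> (a and b are declared as pointers here so that the final dereference is type-correct; all names are made up):

        #include <linux/compiler.h>
        #include <asm/barrier.h>

        int *a, *b;
        int p;

        int reader(void)
        {
                int *q, x;

                q = ACCESS_ONCE(a);
                if (p) {
                        smp_rmb();      /* a control dependency alone would
                                           not order the loads from a and b */
                        q = ACCESS_ONCE(b);
                }
                x = *q;
                return x;
        }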
@@ -561,23 +602,23 @@ barrier, though a general barrier would also be viable.  Similarly a read
 barrier or a data dependency barrier should always be paired with at least an
 write barrier, though, again, a general barrier is viable:
 
         CPU 1                   CPU 2
         ===============         ===============
-        a = 1;
+        ACCESS_ONCE(a) = 1;
         <write barrier>
-        b = 2;                  x = b;
+        ACCESS_ONCE(b) = 2;     x = ACCESS_ONCE(b);
                                 <read barrier>
-                                y = a;
+                                y = ACCESS_ONCE(a);
 
 Or:
 
         CPU 1                   CPU 2
         ===============         ===============================
         a = 1;
         <write barrier>
-        b = &a;                 x = b;
+        ACCESS_ONCE(b) = &a;    x = ACCESS_ONCE(b);
                                 <data dependency barrier>
                                 y = *x;
 
 Basically, the read barrier always has to be there, even though it can be of
 the "weaker" type.
@@ -586,13 +627,13 @@ the "weaker" type.
 match the loads after the read barrier or the data dependency barrier, and vice
 versa:
 
         CPU 1                               CPU 2
-        ===============                     ===============
-        a = 1;               }----   --->{  v = c
-        b = 2;               }    \ /    {  w = d
+        ===================                 ===================
+        ACCESS_ONCE(a) = 1;  }----   --->{  v = ACCESS_ONCE(c);
+        ACCESS_ONCE(b) = 2;  }    \ /    {  w = ACCESS_ONCE(d);
         <write barrier>            \        <read barrier>
-        c = 3;               }    / \    {  x = a;
-        d = 4;               }----   --->{  y = b;
+        ACCESS_ONCE(c) = 3;  }    / \    {  x = ACCESS_ONCE(a);
+        ACCESS_ONCE(d) = 4;  }----   --->{  y = ACCESS_ONCE(b);
 
 
 EXAMPLES OF MEMORY BARRIER SEQUENCES
@@ -1435,12 +1476,12 @@ three CPUs; then should the following sequence of events occur:
 
         CPU 1                           CPU 2
         =============================== ===============================
-        *A = a;                         *E = e;
+        ACCESS_ONCE(*A) = a;            ACCESS_ONCE(*E) = e;
         LOCK M                          LOCK Q
-        *B = b;                         *F = f;
-        *C = c;                         *G = g;
+        ACCESS_ONCE(*B) = b;            ACCESS_ONCE(*F) = f;
+        ACCESS_ONCE(*C) = c;            ACCESS_ONCE(*G) = g;
         UNLOCK M                        UNLOCK Q
-        *D = d;                         *H = h;
+        ACCESS_ONCE(*D) = d;            ACCESS_ONCE(*H) = h;
 
 Then there is no guarantee as to what order CPU 3 will see the accesses to *A
 through *H occur in, other than the constraints imposed by the separate locks
@@ -1460,17 +1501,17 @@ However, if the following occurs:
 
         CPU 1                           CPU 2
         =============================== ===============================
-        *A = a;
+        ACCESS_ONCE(*A) = a;
         LOCK M [1]
-        *B = b;
-        *C = c;
+        ACCESS_ONCE(*B) = b;
+        ACCESS_ONCE(*C) = c;
         UNLOCK M [1]
-        *D = d;                         *E = e;
+        ACCESS_ONCE(*D) = d;            ACCESS_ONCE(*E) = e;
                                         LOCK M [2]
-                                        *F = f;
-                                        *G = g;
+                                        ACCESS_ONCE(*F) = f;
+                                        ACCESS_ONCE(*G) = g;
                                         UNLOCK M [2]
-                                        *H = h;
+                                        ACCESS_ONCE(*H) = h;
 
 CPU 3 might see:
 
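For readers mapping the pseudo-ops onto real primitives, a hedged sketch of the single-lock sequence above, with a spinlock standing in for mutex M (all names are made up; this shows the shape of the code, not a recommended idiom):

        #include <linux/spinlock.h>
        #include <linux/compiler.h>

        static DEFINE_SPINLOCK(m);              /* plays the role of "M" */
        int A, B, C, D, E, F, G, H;

        void cpu1(void)
        {
                ACCESS_ONCE(A) = 1;
                spin_lock(&m);                  /* LOCK M [1] */
                ACCESS_ONCE(B) = 2;
                ACCESS_ONCE(C) = 3;
                spin_unlock(&m);                /* UNLOCK M [1] */
                ACCESS_ONCE(D) = 4;
        }

        void cpu2(void)
        {
                ACCESS_ONCE(E) = 5;
                spin_lock(&m);                  /* LOCK M [2] */
                ACCESS_ONCE(F) = 6;
                ACCESS_ONCE(G) = 7;
                spin_unlock(&m);                /* UNLOCK M [2] */
                ACCESS_ONCE(H) = 8;
        }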
@@ -2177,11 +2218,11 @@ A programmer might take it for granted that the CPU will perform memory
 operations in exactly the order specified, so that if the CPU is, for example,
 given the following piece of code to execute:
 
-        a = *A;
-        *B = b;
-        c = *C;
-        d = *D;
-        *E = e;
+        a = ACCESS_ONCE(*A);
+        ACCESS_ONCE(*B) = b;
+        c = ACCESS_ONCE(*C);
+        d = ACCESS_ONCE(*D);
+        ACCESS_ONCE(*E) = e;
 
 they would then expect that the CPU will complete the memory operation for each
 instruction before moving on to the next one, leading to a definite sequence of
@@ -2228,12 +2269,12 @@ However, it is guaranteed that a CPU will be self-consistent: it will see its
 _own_ accesses appear to be correctly ordered, without the need for a memory
 barrier.  For instance with the following code:
 
-        U = *A;
-        *A = V;
-        *A = W;
-        X = *A;
-        *A = Y;
-        Z = *A;
+        U = ACCESS_ONCE(*A);
+        ACCESS_ONCE(*A) = V;
+        ACCESS_ONCE(*A) = W;
+        X = ACCESS_ONCE(*A);
+        ACCESS_ONCE(*A) = Y;
+        Z = ACCESS_ONCE(*A);
 
 and assuming no intervention by an external influence, it can be assumed that
 the final result will appear to be:
@@ -2250,7 +2291,12 @@ accesses:
 
 in that order, but, without intervention, the sequence may have almost any
 combination of elements combined or discarded, provided the program's view of
-the world remains consistent.
+the world remains consistent.  Note that ACCESS_ONCE() is -not- optional
+in the above example, as there are architectures where a given CPU might
+interchange successive loads to the same location.  On such architectures,
+ACCESS_ONCE() does whatever is necessary to prevent this, for example, on
+Itanium the volatile casts used by ACCESS_ONCE() cause GCC to emit the
+special ld.acq and st.rel instructions that prevent such reordering.
 
 The compiler may also combine, discard or defer elements of the sequence before
 the CPU even sees them.
@@ -2264,13 +2310,13 @@ may be reduced to:
 
         *A = W;
 
-since, without a write barrier, it can be assumed that the effect of the
-storage of V to *A is lost.  Similarly:
+since, without either a write barrier or an ACCESS_ONCE(), it can be
+assumed that the effect of the storage of V to *A is lost.  Similarly:
 
         *A = Y;
         Z = *A;
 
-may, without a memory barrier, be reduced to:
+may, without a memory barrier or an ACCESS_ONCE(), be reduced to:
 
         *A = Y;
         Z = Y;
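A standalone sketch of that reduction and its cure; compile both functions with -O2 and compare the emitted stores (the variable A stands in for *A; all names are made up):

        #define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

        int A;

        void plain(int V, int W)
        {
                A = V;                  /* the compiler may discard this... */
                A = W;                  /* ...keeping only this store       */
        }

        void once(int V, int W)
        {
                ACCESS_ONCE(A) = V;     /* both stores must be emitted */
                ACCESS_ONCE(A) = W;
        }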