author	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2013-12-11 16:59:04 -0500
committer	Ingo Molnar <mingo@kernel.org>	2013-12-16 05:36:08 -0500
commit	2ecf810121c7eae34473b8fa108112036bc61127 (patch)
tree	db2f48fec611baab7b9fa15f45c9be8c5587670c
parent	962d9c5757b2ebd30678a88a4f475ae997f4d5dd (diff)

Documentation/memory-barriers.txt: Add needed ACCESS_ONCE() calls to memory-barriers.txt

The Documentation/memory-barriers.txt file was written before the need
for ACCESS_ONCE() was fully appreciated.  It therefore contains no
ACCESS_ONCE() calls, which can be a problem when people lift examples
from it.  This commit therefore adds ACCESS_ONCE() calls.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1386799151-2219-1-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 Documentation/memory-barriers.txt | 206 ++++++++++++++++++++++---------------
 1 files changed, 126 insertions, 80 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 020cccdbdd0c..1d067235b0bc 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -194,18 +194,22 @@ There are some minimal guarantees that may be expected of a CPU:
  (*) On any given CPU, dependent memory accesses will be issued in order, with
      respect to itself.  This means that for:
 
-	Q = P; D = *Q;
+	ACCESS_ONCE(Q) = P; smp_read_barrier_depends(); D = ACCESS_ONCE(*Q);
 
      the CPU will issue the following memory operations:
 
 	Q = LOAD P, D = LOAD *Q
 
-     and always in that order.
+     and always in that order.  On most systems, smp_read_barrier_depends()
+     does nothing, but it is required for DEC Alpha.  The ACCESS_ONCE()
+     is required to prevent compiler mischief.  Please note that you
+     should normally use something like rcu_dereference() instead of
+     open-coding smp_read_barrier_depends().
 
  (*) Overlapping loads and stores within a particular CPU will appear to be
      ordered within that CPU.  This means that for:
 
-	a = *X; *X = b;
+	a = ACCESS_ONCE(*X); ACCESS_ONCE(*X) = b;
 
      the CPU will only issue the following sequence of memory operations:
@@ -213,7 +217,7 @@ There are some minimal guarantees that may be expected of a CPU:
 
     And for:
 
-	*X = c; d = *X;
+	ACCESS_ONCE(*X) = c; d = ACCESS_ONCE(*X);
 
     the CPU will only issue:
 
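The kernel's ACCESS_ONCE() of this era was simply a volatile cast (see
include/linux/compiler.h).  As a minimal user-space sketch, assuming GCC or
Clang for the `__typeof__` extension, the accessor behaves like this:

```c
#include <assert.h>

/* Kernel-style definition: force exactly one load or store per use,
 * which the compiler may not merge, repeat, or elide. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

static int shared;

/* Exactly one load of 'shared' per call. */
int read_shared_once(void)
{
	return ACCESS_ONCE(shared);
}

/* Exactly one store to 'shared' per call. */
void write_shared_once(int v)
{
	ACCESS_ONCE(shared) = v;
}
```

The volatile qualifier is what constrains the compiler; it does not by itself
order accesses across CPUs, which is why the barriers in the text are still
needed.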
@@ -224,6 +228,41 @@ There are some minimal guarantees that may be expected of a CPU:
 
 And there are a number of things that _must_ or _must_not_ be assumed:
 
+ (*) It _must_not_ be assumed that the compiler will do what you want with
+     memory references that are not protected by ACCESS_ONCE().  Without
+     ACCESS_ONCE(), the compiler is within its rights to do all sorts
+     of "creative" transformations:
+
+     (-) Repeat the load, possibly getting a different value on the second
+         and subsequent loads.  This is especially prone to happen when
+         register pressure is high.
+
+     (-) Merge adjacent loads and stores to the same location.  The most
+         familiar example is the transformation from:
+
+		while (a)
+			do_something();
+
+         to something like:
+
+		if (a)
+			for (;;)
+				do_something();
+
+         Using ACCESS_ONCE() as follows prevents this sort of optimization:
+
+		while (ACCESS_ONCE(a))
+			do_something();
+
+     (-) "Store tearing", where a single store in the source code is split
+         into smaller stores in the object code.  Note that gcc really
+         will do this on some architectures when storing certain constants.
+         It can be cheaper to do a series of immediate stores than to
+         form the constant in a register and then to store that register.
+
+     (-) "Load tearing", which splits loads in a manner analogous to
+         store tearing.
+
  (*) It _must_not_ be assumed that independent loads and stores will be issued
      in the order given.  This means that for:
 
@@ -450,14 +489,14 @@ The usage requirements of data dependency barriers are a little subtle, and
 it's not always obvious that they're needed.  To illustrate, consider the
 following sequence of events:
 
 	CPU 1		CPU 2
 	===============	===============
 	{ A == 1, B == 2, C = 3, P == &A, Q == &C }
 	B = 4;
 	<write barrier>
-	P = &B
-			Q = P;
+	ACCESS_ONCE(P) = &B
+			Q = ACCESS_ONCE(P);
 			D = *Q;
 
 There's a clear data dependency here, and it would seem that by the end of the
 sequence, Q must be either &A or &B, and that:
@@ -477,15 +516,15 @@ Alpha).
 To deal with this, a data dependency barrier or better must be inserted
 between the address load and the data load:
 
 	CPU 1		CPU 2
 	===============	===============
 	{ A == 1, B == 2, C = 3, P == &A, Q == &C }
 	B = 4;
 	<write barrier>
-	P = &B
-			Q = P;
+	ACCESS_ONCE(P) = &B
+			Q = ACCESS_ONCE(P);
 			<data dependency barrier>
 			D = *Q;
 
 This enforces the occurrence of one of the two implications, and prevents the
 third possibility from arising.
@@ -504,21 +543,22 @@ Another example of where data dependency barriers might be required is where a
 number is read from memory and then used to calculate the index for an array
 access:
 
 	CPU 1		CPU 2
 	===============	===============
 	{ M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
 	M[1] = 4;
 	<write barrier>
-	P = 1
-			Q = P;
+	ACCESS_ONCE(P) = 1
+			Q = ACCESS_ONCE(P);
 			<data dependency barrier>
 			D = M[Q];
 
 
-The data dependency barrier is very important to the RCU system, for example.
-See rcu_dereference() in include/linux/rcupdate.h.  This permits the current
-target of an RCU'd pointer to be replaced with a new modified target, without
-the replacement target appearing to be incompletely initialised.
+The data dependency barrier is very important to the RCU system,
+for example.  See rcu_assign_pointer() and rcu_dereference() in
+include/linux/rcupdate.h.  This permits the current target of an RCU'd
+pointer to be replaced with a new modified target, without the replacement
+target appearing to be incompletely initialised.
 
 See also the subsection on "Cache Coherency" for a more thorough example.
 
@@ -530,22 +570,23 @@ A control dependency requires a full read memory barrier, not simply a data
 dependency barrier to make it work correctly.  Consider the following bit of
 code:
 
-	q = &a;
+	q = ACCESS_ONCE(a);
 	if (p) {
 		<data dependency barrier>
-		q = &b;
+		q = ACCESS_ONCE(b);
 	}
 	x = *q;
 
 This will not have the desired effect because there is no actual data
-dependency, but rather a control dependency that the CPU may short-circuit by
-attempting to predict the outcome in advance.  In such a case what's actually
-required is:
+dependency, but rather a control dependency that the CPU may short-circuit
+by attempting to predict the outcome in advance, so that other CPUs see
+the load from b as having happened before the load from a.  In such a
+case what's actually required is:
 
-	q = &a;
+	q = ACCESS_ONCE(a);
 	if (p) {
 		<read barrier>
-		q = &b;
+		q = ACCESS_ONCE(b);
 	}
 	x = *q;
 
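In C11 terms, the <read barrier> in the corrected version can be approximated
by giving the branch condition acquire semantics, so the later load is ordered
after the load of the condition despite the dependency being only a control
one.  A hypothetical sketch (the helper name and shapes are illustrative, not
from the document):

```c
#include <assert.h>
#include <stdatomic.h>

static int a = 1, b = 2;

/* Load 'p' with acquire semantics before the branch; the acquire plays
 * the role of the <read barrier> inside the if-body, preventing the load
 * of 'b' from being observed ahead of the load of 'p'. */
int read_with_control_dep(atomic_int *p)
{
	int q = a;

	if (atomic_load_explicit(p, memory_order_acquire)) {
		/* acquire above supplies the read-barrier ordering */
		q = b;
	}
	return q;
}
```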
@@ -561,23 +602,23 @@ barrier, though a general barrier would also be viable. Similarly a read
 barrier or a data dependency barrier should always be paired with at least an
 write barrier, though, again, a general barrier is viable:
 
 	CPU 1		CPU 2
 	===============	===============
-	a = 1;
+	ACCESS_ONCE(a) = 1;
 	<write barrier>
-	b = 2;		x = b;
+	ACCESS_ONCE(b) = 2;	x = ACCESS_ONCE(b);
 			<read barrier>
-			y = a;
+			y = ACCESS_ONCE(a);
 
 Or:
 
 	CPU 1		CPU 2
 	===============	===============================
 	a = 1;
 	<write barrier>
-	b = &a;		x = b;
+	ACCESS_ONCE(b) = &a;	x = ACCESS_ONCE(b);
 			<data dependency barrier>
 			y = *x;
 
 Basically, the read barrier always has to be there, even though it can be of
 the "weaker" type.
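The pairing rule can be exercised in user space with C11 atomics, where a
release store on one thread pairs with an acquire load on another, standing in
for the <write barrier>/<read barrier> pair above.  A sketch assuming POSIX
threads (link with -pthread); the thread names mirror the CPU columns:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int a;
static atomic_int b;

/* CPU 1: plain store to 'a', then "write barrier + store" to 'b'. */
static void *cpu1(void *arg)
{
	(void)arg;
	a = 1;
	atomic_store_explicit(&b, 2, memory_order_release);
	return NULL;
}

/* CPU 2: spin until the "read barrier + load" of 'b' sees 2, then read
 * 'a'.  The release/acquire pairing guarantees y == 1 here. */
int cpu2(void)
{
	pthread_t t;
	int y;

	pthread_create(&t, NULL, cpu1, NULL);
	while (atomic_load_explicit(&b, memory_order_acquire) != 2)
		;
	y = a;
	pthread_join(t, NULL);
	return y;
}
```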
@@ -586,13 +627,13 @@ the "weaker" type.
 match the loads after the read barrier or the data dependency barrier, and vice
 versa:
 
-	CPU 1			CPU 2
-	===============		===============
-	a = 1;		}----	--->{	v = c
-	b = 2;		}    \	    {	w = d
-	<write barrier>	      \	<read barrier>
-	c = 3;		}    /	    {	x = a;
-	d = 4;		}----	--->{	y = b;
+	CPU 1				CPU 2
+	===================		===================
+	ACCESS_ONCE(a) = 1;  }----	--->{  v = ACCESS_ONCE(c);
+	ACCESS_ONCE(b) = 2;  }    \	    {  w = ACCESS_ONCE(d);
+	<write barrier>	          \	      <read barrier>
+	ACCESS_ONCE(c) = 3;  }    /	    {  x = ACCESS_ONCE(a);
+	ACCESS_ONCE(d) = 4;  }----	--->{  y = ACCESS_ONCE(b);
 
 
 EXAMPLES OF MEMORY BARRIER SEQUENCES
@@ -1435,12 +1476,12 @@ three CPUs; then should the following sequence of events occur:
 
 	CPU 1				CPU 2
 	===============================	===============================
-	*A = a;				*E = e;
+	ACCESS_ONCE(*A) = a;		ACCESS_ONCE(*E) = e;
 	LOCK M				LOCK Q
-	*B = b;				*F = f;
-	*C = c;				*G = g;
+	ACCESS_ONCE(*B) = b;		ACCESS_ONCE(*F) = f;
+	ACCESS_ONCE(*C) = c;		ACCESS_ONCE(*G) = g;
 	UNLOCK M			UNLOCK Q
-	*D = d;				*H = h;
+	ACCESS_ONCE(*D) = d;		ACCESS_ONCE(*H) = h;
 
 Then there is no guarantee as to what order CPU 3 will see the accesses to *A
 through *H occur in, other than the constraints imposed by the separate locks
@@ -1460,17 +1501,17 @@ However, if the following occurs:
 
 	CPU 1				CPU 2
 	===============================	===============================
-	*A = a;
+	ACCESS_ONCE(*A) = a;
 	LOCK M	     [1]
-	*B = b;
-	*C = c;
+	ACCESS_ONCE(*B) = b;
+	ACCESS_ONCE(*C) = c;
 	UNLOCK M [1]
-	*D = d;				*E = e;
+	ACCESS_ONCE(*D) = d;		ACCESS_ONCE(*E) = e;
 					LOCK M [2]
-					*F = f;
-					*G = g;
+					ACCESS_ONCE(*F) = f;
+					ACCESS_ONCE(*G) = g;
 					UNLOCK M [2]
-					*H = h;
+					ACCESS_ONCE(*H) = h;
 
 CPU 3 might see:
 
@@ -2177,11 +2218,11 @@ A programmer might take it for granted that the CPU will perform memory
 operations in exactly the order specified, so that if the CPU is, for example,
 given the following piece of code to execute:
 
-	a = *A;
-	*B = b;
-	c = *C;
-	d = *D;
-	*E = e;
+	a = ACCESS_ONCE(*A);
+	ACCESS_ONCE(*B) = b;
+	c = ACCESS_ONCE(*C);
+	d = ACCESS_ONCE(*D);
+	ACCESS_ONCE(*E) = e;
 
 they would then expect that the CPU will complete the memory operation for each
 instruction before moving on to the next one, leading to a definite sequence of
@@ -2228,12 +2269,12 @@ However, it is guaranteed that a CPU will be self-consistent: it will see its
 _own_ accesses appear to be correctly ordered, without the need for a memory
 barrier.  For instance with the following code:
 
-	U = *A;
-	*A = V;
-	*A = W;
-	X = *A;
-	*A = Y;
-	Z = *A;
+	U = ACCESS_ONCE(*A);
+	ACCESS_ONCE(*A) = V;
+	ACCESS_ONCE(*A) = W;
+	X = ACCESS_ONCE(*A);
+	ACCESS_ONCE(*A) = Y;
+	Z = ACCESS_ONCE(*A);
 
 and assuming no intervention by an external influence, it can be assumed that
 the final result will appear to be:
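The self-consistency guarantee is easy to check in user space: with the
accesses forced by the volatile cast, program order alone fixes the results
X == W and Z == Y.  A minimal sketch assuming the GCC/Clang `__typeof__`
extension:

```c
#include <assert.h>

#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Run the six-access sequence from the text on *A and verify the
 * guaranteed final results: X picks up W, Z picks up Y. */
int sequence(int *A, int V, int W, int Y)
{
	int U, X, Z;

	U = ACCESS_ONCE(*A);
	ACCESS_ONCE(*A) = V;
	ACCESS_ONCE(*A) = W;
	X = ACCESS_ONCE(*A);
	ACCESS_ONCE(*A) = Y;
	Z = ACCESS_ONCE(*A);

	(void)U;                       /* U is the original *A */
	return (X == W) && (Z == Y);   /* the "final result" in the text */
}
```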
@@ -2250,7 +2291,12 @@ accesses:
 
 in that order, but, without intervention, the sequence may have almost any
 combination of elements combined or discarded, provided the program's view of
-the world remains consistent.
+the world remains consistent.  Note that ACCESS_ONCE() is -not- optional
+in the above example, as there are architectures where a given CPU might
+interchange successive loads to the same location.  On such architectures,
+ACCESS_ONCE() does whatever is necessary to prevent this, for example, on
+Itanium the volatile casts used by ACCESS_ONCE() cause GCC to emit the
+special ld.acq and st.rel instructions that prevent such reordering.
 
 The compiler may also combine, discard or defer elements of the sequence before
 the CPU even sees them.
@@ -2264,13 +2310,13 @@ may be reduced to:
 
 	*A = W;
 
-since, without a write barrier, it can be assumed that the effect of the
-storage of V to *A is lost.  Similarly:
+since, without either a write barrier or an ACCESS_ONCE(), it can be
+assumed that the effect of the storage of V to *A is lost.  Similarly:
 
 	*A = Y;
 	Z = *A;
 
-may, without a memory barrier, be reduced to:
+may, without a memory barrier or an ACCESS_ONCE(), be reduced to:
 
 	*A = Y;
 	Z = Y;