author	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-06-18 17:33:24 -0400
committer	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-15 17:43:13 -0400
commit	9af194cefc3c40e75a59df4cbb06e1c1064bee7f (patch)
tree	b9a2d049506997ad053262df177b00143bec611d /Documentation/memory-barriers.txt
parent	57aecae950c55ef50934640794160cd118e73256 (diff)

documentation: Replace ACCESS_ONCE() by READ_ONCE() and WRITE_ONCE()

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Diffstat (limited to 'Documentation/memory-barriers.txt'):
-rw-r--r--	Documentation/memory-barriers.txt	346
1 file changed, 177 insertions(+), 169 deletions(-)
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 3d06f98b2ff2..470c07c868e4 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -194,22 +194,22 @@ There are some minimal guarantees that may be expected of a CPU:
 (*) On any given CPU, dependent memory accesses will be issued in order, with
     respect to itself.  This means that for:
 
-	ACCESS_ONCE(Q) = P; smp_read_barrier_depends(); D = ACCESS_ONCE(*Q);
+	WRITE_ONCE(Q, P); smp_read_barrier_depends(); D = READ_ONCE(*Q);
 
     the CPU will issue the following memory operations:
 
	Q = LOAD P, D = LOAD *Q
 
     and always in that order.  On most systems, smp_read_barrier_depends()
-    does nothing, but it is required for DEC Alpha.  The ACCESS_ONCE()
-    is required to prevent compiler mischief.  Please note that you
-    should normally use something like rcu_dereference() instead of
-    open-coding smp_read_barrier_depends().
+    does nothing, but it is required for DEC Alpha.  The READ_ONCE()
+    and WRITE_ONCE() are required to prevent compiler mischief.  Please
+    note that you should normally use something like rcu_dereference()
+    instead of open-coding smp_read_barrier_depends().
 
 (*) Overlapping loads and stores within a particular CPU will appear to be
     ordered within that CPU.  This means that for:
 
-	a = ACCESS_ONCE(*X); ACCESS_ONCE(*X) = b;
+	a = READ_ONCE(*X); WRITE_ONCE(*X, b);
 
     the CPU will only issue the following sequence of memory operations:
 
@@ -217,7 +217,7 @@ There are some minimal guarantees that may be expected of a CPU:
 
     And for:
 
-	ACCESS_ONCE(*X) = c; d = ACCESS_ONCE(*X);
+	WRITE_ONCE(*X, c); d = READ_ONCE(*X);
 
     the CPU will only issue:
 
@@ -228,11 +228,11 @@ There are some minimal guarantees that may be expected of a CPU:
 
 And there are a number of things that _must_ or _must_not_ be assumed:
 
-(*) It _must_not_ be assumed that the compiler will do what you want with
-    memory references that are not protected by ACCESS_ONCE().  Without
-    ACCESS_ONCE(), the compiler is within its rights to do all sorts
-    of "creative" transformations, which are covered in the Compiler
-    Barrier section.
+(*) It _must_not_ be assumed that the compiler will do what you want
+    with memory references that are not protected by READ_ONCE() and
+    WRITE_ONCE().  Without them, the compiler is within its rights to
+    do all sorts of "creative" transformations, which are covered in
+    the Compiler Barrier section.
 
 (*) It _must_not_ be assumed that independent loads and stores will be issued
     in the order given.  This means that for:
@@ -520,8 +520,8 @@ following sequence of events:
	{ A == 1, B == 2, C = 3, P == &A, Q == &C }
	B = 4;
	<write barrier>
-	ACCESS_ONCE(P) = &B
-	Q = ACCESS_ONCE(P);
+	WRITE_ONCE(P, &B)
+	Q = READ_ONCE(P);
	D = *Q;
 
 There's a clear data dependency here, and it would seem that by the end of the
@@ -547,8 +547,8 @@ between the address load and the data load:
	{ A == 1, B == 2, C = 3, P == &A, Q == &C }
	B = 4;
	<write barrier>
-	ACCESS_ONCE(P) = &B
-	Q = ACCESS_ONCE(P);
+	WRITE_ONCE(P, &B);
+	Q = READ_ONCE(P);
	<data dependency barrier>
	D = *Q;
 
@@ -574,8 +574,8 @@ access:
	{ M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
	M[1] = 4;
	<write barrier>
-	ACCESS_ONCE(P) = 1
-	Q = ACCESS_ONCE(P);
+	WRITE_ONCE(P, 1);
+	Q = READ_ONCE(P);
	<data dependency barrier>
	D = M[Q];
 
@@ -596,10 +596,10 @@ A load-load control dependency requires a full read memory barrier, not
 simply a data dependency barrier to make it work correctly.  Consider the
 following bit of code:
 
-	q = ACCESS_ONCE(a);
+	q = READ_ONCE(a);
	if (q) {
		<data dependency barrier>  /* BUG: No data dependency!!! */
-		p = ACCESS_ONCE(b);
+		p = READ_ONCE(b);
	}
 
 This will not have the desired effect because there is no actual data
@@ -608,10 +608,10 @@ by attempting to predict the outcome in advance, so that other CPUs see
 the load from b as having happened before the load from a.  In such a
 case what's actually required is:
 
-	q = ACCESS_ONCE(a);
+	q = READ_ONCE(a);
	if (q) {
		<read barrier>
-		p = ACCESS_ONCE(b);
+		p = READ_ONCE(b);
	}
 
 However, stores are not speculated.  This means that ordering -is- provided
@@ -619,7 +619,7 @@ for load-store control dependencies, as in the following example:
 
	q = READ_ONCE_CTRL(a);
	if (q) {
-		ACCESS_ONCE(b) = p;
+		WRITE_ONCE(b, p);
	}
 
 Control dependencies pair normally with other types of barriers.  That
@@ -647,11 +647,11 @@ branches of the "if" statement as follows:
	q = READ_ONCE_CTRL(a);
	if (q) {
		barrier();
-		ACCESS_ONCE(b) = p;
+		WRITE_ONCE(b, p);
		do_something();
	} else {
		barrier();
-		ACCESS_ONCE(b) = p;
+		WRITE_ONCE(b, p);
		do_something_else();
	}
 
@@ -660,12 +660,12 @@ optimization levels:
 
	q = READ_ONCE_CTRL(a);
	barrier();
-	ACCESS_ONCE(b) = p;  /* BUG: No ordering vs. load from a!!! */
+	WRITE_ONCE(b, p);  /* BUG: No ordering vs. load from a!!! */
	if (q) {
-		/* ACCESS_ONCE(b) = p; -- moved up, BUG!!! */
+		/* WRITE_ONCE(b, p); -- moved up, BUG!!! */
		do_something();
	} else {
-		/* ACCESS_ONCE(b) = p; -- moved up, BUG!!! */
+		/* WRITE_ONCE(b, p); -- moved up, BUG!!! */
		do_something_else();
	}
 
@@ -676,7 +676,7 @@ assembly code even after all compiler optimizations have been applied.
 Therefore, if you need ordering in this example, you need explicit
 memory barriers, for example, smp_store_release():
 
-	q = ACCESS_ONCE(a);
+	q = READ_ONCE(a);
	if (q) {
		smp_store_release(&b, p);
		do_something();
@@ -690,10 +690,10 @@ ordering is guaranteed only when the stores differ, for example:
 
	q = READ_ONCE_CTRL(a);
	if (q) {
-		ACCESS_ONCE(b) = p;
+		WRITE_ONCE(b, p);
		do_something();
	} else {
-		ACCESS_ONCE(b) = r;
+		WRITE_ONCE(b, r);
		do_something_else();
	}
 
@@ -706,10 +706,10 @@ the needed conditional.  For example:
 
	q = READ_ONCE_CTRL(a);
	if (q % MAX) {
-		ACCESS_ONCE(b) = p;
+		WRITE_ONCE(b, p);
		do_something();
	} else {
-		ACCESS_ONCE(b) = r;
+		WRITE_ONCE(b, r);
		do_something_else();
	}
 
@@ -718,7 +718,7 @@ equal to zero, in which case the compiler is within its rights to
 transform the above code into the following:
 
	q = READ_ONCE_CTRL(a);
-	ACCESS_ONCE(b) = p;
+	WRITE_ONCE(b, p);
	do_something_else();
 
 Given this transformation, the CPU is not required to respect the ordering
@@ -731,10 +731,10 @@ one, perhaps as follows:
	q = READ_ONCE_CTRL(a);
	BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
	if (q % MAX) {
-		ACCESS_ONCE(b) = p;
+		WRITE_ONCE(b, p);
		do_something();
	} else {
-		ACCESS_ONCE(b) = r;
+		WRITE_ONCE(b, r);
		do_something_else();
	}
 
@@ -747,17 +747,17 @@ evaluation.  Consider this example:
 
	q = READ_ONCE_CTRL(a);
	if (q || 1 > 0)
-		ACCESS_ONCE(b) = 1;
+		WRITE_ONCE(b, 1);
 
 Because the first condition cannot fault and the second condition is
 always true, the compiler can transform this example as following,
 defeating control dependency:
 
	q = READ_ONCE_CTRL(a);
-	ACCESS_ONCE(b) = 1;
+	WRITE_ONCE(b, 1);
 
 This example underscores the need to ensure that the compiler cannot
-out-guess your code.  More generally, although ACCESS_ONCE() does force
+out-guess your code.  More generally, although READ_ONCE() does force
 the compiler to actually emit code for a given load, it does not force
 the compiler to use the results.
 
@@ -769,7 +769,7 @@ x and y both being zero:
	=======================	=======================
	r1 = READ_ONCE_CTRL(x);	r2 = READ_ONCE_CTRL(y);
	if (r1 > 0)		if (r2 > 0)
-	  ACCESS_ONCE(y) = 1;	  ACCESS_ONCE(x) = 1;
+	  WRITE_ONCE(y, 1);	  WRITE_ONCE(x, 1);
 
	assert(!(r1 == 1 && r2 == 1));
 
@@ -779,7 +779,7 @@ then adding the following CPU would guarantee a related assertion:
 
	CPU 2
	=====================
-	ACCESS_ONCE(x) = 2;
+	WRITE_ONCE(x, 2);
 
	assert(!(r1 == 2 && r2 == 1 && x == 2)); /* FAILS!!! */
 
@@ -798,8 +798,7 @@ In summary:
 
 (*) Control dependencies must be headed by READ_ONCE_CTRL().
     Or, as a much less preferable alternative, interpose
-    be headed by READ_ONCE() or an ACCESS_ONCE() read and must
-    have smp_read_barrier_depends() between this read and the
+    smp_read_barrier_depends() between a READ_ONCE() and the
     control-dependent write.
 
 (*) Control dependencies can order prior loads against later stores.
@@ -815,15 +814,16 @@ In summary:
 
 (*) Control dependencies require at least one run-time conditional
     between the prior load and the subsequent store, and this
-    conditional must involve the prior load.  If the compiler
-    is able to optimize the conditional away, it will have also
-    optimized away the ordering.  Careful use of ACCESS_ONCE() can
-    help to preserve the needed conditional.
+    conditional must involve the prior load.  If the compiler is able
+    to optimize the conditional away, it will have also optimized
+    away the ordering.  Careful use of READ_ONCE_CTRL(), READ_ONCE(),
+    and WRITE_ONCE() can help to preserve the needed conditional.
 
 (*) Control dependencies require that the compiler avoid reordering the
-    dependency into nonexistence.  Careful use of ACCESS_ONCE() or
-    barrier() can help to preserve your control dependency.  Please
-    see the Compiler Barrier section for more information.
+    dependency into nonexistence.  Careful use of READ_ONCE_CTRL()
+    or smp_read_barrier_depends() can help to preserve your control
+    dependency.  Please see the Compiler Barrier section for more
+    information.
 
 (*) Control dependencies pair normally with other types of barriers.
 
@@ -848,11 +848,11 @@ barrier, an acquire barrier, a release barrier, or a general barrier:
 
	CPU 1			CPU 2
	===============	===============
-	ACCESS_ONCE(a) = 1;
+	WRITE_ONCE(a, 1);
	<write barrier>
-	ACCESS_ONCE(b) = 2;	x = ACCESS_ONCE(b);
+	WRITE_ONCE(b, 2);	x = READ_ONCE(b);
				<read barrier>
-				y = ACCESS_ONCE(a);
+				y = READ_ONCE(a);
 
 Or:
 
@@ -860,7 +860,7 @@ Or:
	===============	===============================
	a = 1;
	<write barrier>
-	ACCESS_ONCE(b) = &a;	x = ACCESS_ONCE(b);
+	WRITE_ONCE(b, &a);	x = READ_ONCE(b);
				<data dependency barrier>
				y = *x;
 
@@ -868,11 +868,11 @@ Or even:
 
	CPU 1			CPU 2
	===============	===============================
-	r1 = ACCESS_ONCE(y);
+	r1 = READ_ONCE(y);
	<general barrier>
-	ACCESS_ONCE(y) = 1;	if (r2 = ACCESS_ONCE(x)) {
+	WRITE_ONCE(y, 1);	if (r2 = READ_ONCE(x)) {
				   <implicit control dependency>
-				   ACCESS_ONCE(y) = 1;
+				   WRITE_ONCE(y, 1);
				}
 
	assert(r1 == 0 || r2 == 0);
@@ -886,11 +886,11 @@ versa:
 
	CPU 1			CPU 2
	===================	===================
-	ACCESS_ONCE(a) = 1;  }----   --->{ v = ACCESS_ONCE(c);
-	ACCESS_ONCE(b) = 2;  }    \ /    { w = ACCESS_ONCE(d);
+	WRITE_ONCE(a, 1);    }----   --->{ v = READ_ONCE(c);
+	WRITE_ONCE(b, 2);    }    \ /    { w = READ_ONCE(d);
	<write barrier>            \        <read barrier>
-	ACCESS_ONCE(c) = 3;  }    / \    { x = ACCESS_ONCE(a);
-	ACCESS_ONCE(d) = 4;  }----   --->{ y = ACCESS_ONCE(b);
+	WRITE_ONCE(c, 3);    }    / \    { x = READ_ONCE(a);
+	WRITE_ONCE(d, 4);    }----   --->{ y = READ_ONCE(b);
 
 
 EXAMPLES OF MEMORY BARRIER SEQUENCES
@@ -1340,10 +1340,10 @@ compiler from moving the memory accesses either side of it to the other side:
 
	barrier();
 
-This is a general barrier -- there are no read-read or write-write variants
-of barrier().  However, ACCESS_ONCE() can be thought of as a weak form
-for barrier() that affects only the specific accesses flagged by the
-ACCESS_ONCE().
+This is a general barrier -- there are no read-read or write-write
+variants of barrier().  However, READ_ONCE() and WRITE_ONCE() can be
+thought of as weak forms of barrier() that affect only the specific
+accesses flagged by the READ_ONCE() or WRITE_ONCE().
 
 The barrier() function has the following effects:
 
@@ -1355,9 +1355,10 @@ The barrier() function has the following effects:
 (*) Within a loop, forces the compiler to load the variables used
     in that loop's conditional on each pass through that loop.
 
-The ACCESS_ONCE() function can prevent any number of optimizations that,
-while perfectly safe in single-threaded code, can be fatal in concurrent
-code.  Here are some examples of these sorts of optimizations:
+The READ_ONCE() and WRITE_ONCE() functions can prevent any number of
+optimizations that, while perfectly safe in single-threaded code, can
+be fatal in concurrent code.  Here are some examples of these sorts
+of optimizations:
 
 (*) The compiler is within its rights to reorder loads and stores
     to the same variable, and in some cases, the CPU is within its
@@ -1370,11 +1371,11 @@ code.  Here are some examples of these sorts of optimizations:
     Might result in an older value of x stored in a[1] than in a[0].
     Prevent both the compiler and the CPU from doing this as follows:
 
-	a[0] = ACCESS_ONCE(x);
-	a[1] = ACCESS_ONCE(x);
+	a[0] = READ_ONCE(x);
+	a[1] = READ_ONCE(x);
 
-    In short, ACCESS_ONCE() provides cache coherence for accesses from
-    multiple CPUs to a single variable.
+    In short, READ_ONCE() and WRITE_ONCE() provide cache coherence for
+    accesses from multiple CPUs to a single variable.
 
 (*) The compiler is within its rights to merge successive loads from
     the same variable.  Such merging can cause the compiler to "optimize"
@@ -1391,9 +1392,9 @@ code.  Here are some examples of these sorts of optimizations:
	for (;;)
		do_something_with(tmp);
 
-    Use ACCESS_ONCE() to prevent the compiler from doing this to you:
+    Use READ_ONCE() to prevent the compiler from doing this to you:
 
-	while (tmp = ACCESS_ONCE(a))
+	while (tmp = READ_ONCE(a))
		do_something_with(tmp);
 
 (*) The compiler is within its rights to reload a variable, for example,
@@ -1415,9 +1416,9 @@ code.  Here are some examples of these sorts of optimizations:
     a was modified by some other CPU between the "while" statement and
     the call to do_something_with().
 
-    Again, use ACCESS_ONCE() to prevent the compiler from doing this:
+    Again, use READ_ONCE() to prevent the compiler from doing this:
 
-	while (tmp = ACCESS_ONCE(a))
+	while (tmp = READ_ONCE(a))
		do_something_with(tmp);
 
     Note that if the compiler runs short of registers, it might save
@@ -1437,21 +1438,21 @@ code.  Here are some examples of these sorts of optimizations:
 
	do { } while (0);
 
-    This transformation is a win for single-threaded code because it gets
-    rid of a load and a branch.  The problem is that the compiler will
-    carry out its proof assuming that the current CPU is the only one
-    updating variable 'a'.  If variable 'a' is shared, then the compiler's
-    proof will be erroneous.  Use ACCESS_ONCE() to tell the compiler
-    that it doesn't know as much as it thinks it does:
+    This transformation is a win for single-threaded code because it
+    gets rid of a load and a branch.  The problem is that the compiler
+    will carry out its proof assuming that the current CPU is the only
+    one updating variable 'a'.  If variable 'a' is shared, then the
+    compiler's proof will be erroneous.  Use READ_ONCE() to tell the
+    compiler that it doesn't know as much as it thinks it does:
 
-	while (tmp = ACCESS_ONCE(a))
+	while (tmp = READ_ONCE(a))
		do_something_with(tmp);
 
     But please note that the compiler is also closely watching what you
-    do with the value after the ACCESS_ONCE().  For example, suppose you
+    do with the value after the READ_ONCE().  For example, suppose you
     do the following and MAX is a preprocessor macro with the value 1:
 
-	while ((tmp = ACCESS_ONCE(a)) % MAX)
+	while ((tmp = READ_ONCE(a)) % MAX)
		do_something_with(tmp);
 
     Then the compiler knows that the result of the "%" operator applied
@@ -1475,12 +1476,12 @@ code.  Here are some examples of these sorts of optimizations:
     surprise if some other CPU might have stored to variable 'a' in the
     meantime.
 
-    Use ACCESS_ONCE() to prevent the compiler from making this sort of
+    Use WRITE_ONCE() to prevent the compiler from making this sort of
     wrong guess:
 
-	ACCESS_ONCE(a) = 0;
+	WRITE_ONCE(a, 0);
	/* Code that does not store to variable a. */
-	ACCESS_ONCE(a) = 0;
+	WRITE_ONCE(a, 0);
 
 (*) The compiler is within its rights to reorder memory accesses unless
     you tell it not to.  For example, consider the following interaction
@@ -1509,40 +1510,43 @@ code.  Here are some examples of these sorts of optimizations:
	}
 
     If the interrupt occurs between these two statement, then
-    interrupt_handler() might be passed a garbled msg.  Use ACCESS_ONCE()
+    interrupt_handler() might be passed a garbled msg.  Use WRITE_ONCE()
     to prevent this as follows:
 
	void process_level(void)
	{
-		ACCESS_ONCE(msg) = get_message();
-		ACCESS_ONCE(flag) = true;
+		WRITE_ONCE(msg, get_message());
+		WRITE_ONCE(flag, true);
	}
 
	void interrupt_handler(void)
	{
-		if (ACCESS_ONCE(flag))
-			process_message(ACCESS_ONCE(msg));
+		if (READ_ONCE(flag))
+			process_message(READ_ONCE(msg));
	}
 
-    Note that the ACCESS_ONCE() wrappers in interrupt_handler()
-    are needed if this interrupt handler can itself be interrupted
-    by something that also accesses 'flag' and 'msg', for example,
-    a nested interrupt or an NMI.  Otherwise, ACCESS_ONCE() is not
-    needed in interrupt_handler() other than for documentation purposes.
-    (Note also that nested interrupts do not typically occur in modern
-    Linux kernels, in fact, if an interrupt handler returns with
-    interrupts enabled, you will get a WARN_ONCE() splat.)
-
-    You should assume that the compiler can move ACCESS_ONCE() past
-    code not containing ACCESS_ONCE(), barrier(), or similar primitives.
-
-    This effect could also be achieved using barrier(), but ACCESS_ONCE()
-    is more selective: With ACCESS_ONCE(), the compiler need only forget
-    the contents of the indicated memory locations, while with barrier()
-    the compiler must discard the value of all memory locations that
-    it has currented cached in any machine registers.  Of course,
-    the compiler must also respect the order in which the ACCESS_ONCE()s
-    occur, though the CPU of course need not do so.
+    Note that the READ_ONCE() and WRITE_ONCE() wrappers in
+    interrupt_handler() are needed if this interrupt handler can itself
+    be interrupted by something that also accesses 'flag' and 'msg',
+    for example, a nested interrupt or an NMI.  Otherwise, READ_ONCE()
+    and WRITE_ONCE() are not needed in interrupt_handler() other than
+    for documentation purposes.  (Note also that nested interrupts
+    do not typically occur in modern Linux kernels, in fact, if an
+    interrupt handler returns with interrupts enabled, you will get a
+    WARN_ONCE() splat.)
+
+    You should assume that the compiler can move READ_ONCE() and
+    WRITE_ONCE() past code not containing READ_ONCE(), WRITE_ONCE(),
+    barrier(), or similar primitives.
+
+    This effect could also be achieved using barrier(), but READ_ONCE()
+    and WRITE_ONCE() are more selective: With READ_ONCE() and
+    WRITE_ONCE(), the compiler need only forget the contents of the
+    indicated memory locations, while with barrier() the compiler must
+    discard the value of all memory locations that it has currently
+    cached in any machine registers.  Of course, the compiler must also
+    respect the order in which the READ_ONCE()s and WRITE_ONCE()s occur,
+    though the CPU of course need not do so.
 
 (*) The compiler is within its rights to invent stores to a variable,
     as in the following example:
@@ -1562,16 +1566,16 @@ code.  Here are some examples of these sorts of optimizations:
     a branch.  Unfortunately, in concurrent code, this optimization
     could cause some other CPU to see a spurious value of 42 -- even
     if variable 'a' was never zero -- when loading variable 'b'.
-    Use ACCESS_ONCE() to prevent this as follows:
+    Use WRITE_ONCE() to prevent this as follows:
 
	if (a)
-		ACCESS_ONCE(b) = a;
+		WRITE_ONCE(b, a);
	else
-		ACCESS_ONCE(b) = 42;
+		WRITE_ONCE(b, 42);
 
     The compiler can also invent loads.  These are usually less
     damaging, but they can result in cache-line bouncing and thus in
-    poor performance and scalability.  Use ACCESS_ONCE() to prevent
+    poor performance and scalability.  Use READ_ONCE() to prevent
     invented loads.
 
 (*) For aligned memory locations whose size allows them to be accessed
@@ -1590,9 +1594,9 @@ code.  Here are some examples of these sorts of optimizations:
     This optimization can therefore be a win in single-threaded code.
     In fact, a recent bug (since fixed) caused GCC to incorrectly use
     this optimization in a volatile store.  In the absence of such bugs,
-    use of ACCESS_ONCE() prevents store tearing in the following example:
+    use of WRITE_ONCE() prevents store tearing in the following example:
 
-	ACCESS_ONCE(p) = 0x00010002;
+	WRITE_ONCE(p, 0x00010002);
 
     Use of packed structures can also result in load and store tearing,
     as in this example:
@@ -1609,22 +1613,23 @@ code.  Here are some examples of these sorts of optimizations:
	foo2.b = foo1.b;
	foo2.c = foo1.c;
 
-    Because there are no ACCESS_ONCE() wrappers and no volatile markings,
-    the compiler would be well within its rights to implement these three
-    assignment statements as a pair of 32-bit loads followed by a pair
-    of 32-bit stores.  This would result in load tearing on 'foo1.b'
-    and store tearing on 'foo2.b'.  ACCESS_ONCE() again prevents tearing
-    in this example:
+    Because there are no READ_ONCE() or WRITE_ONCE() wrappers and no
+    volatile markings, the compiler would be well within its rights to
+    implement these three assignment statements as a pair of 32-bit
+    loads followed by a pair of 32-bit stores.  This would result in
+    load tearing on 'foo1.b' and store tearing on 'foo2.b'.  READ_ONCE()
+    and WRITE_ONCE() again prevent tearing in this example:
 
	foo2.a = foo1.a;
-	ACCESS_ONCE(foo2.b) = ACCESS_ONCE(foo1.b);
+	WRITE_ONCE(foo2.b, READ_ONCE(foo1.b));
	foo2.c = foo1.c;
 
-All that aside, it is never necessary to use ACCESS_ONCE() on a variable
-that has been marked volatile.  For example, because 'jiffies' is marked
-volatile, it is never necessary to say ACCESS_ONCE(jiffies).  The reason
-for this is that ACCESS_ONCE() is implemented as a volatile cast, which
-has no effect when its argument is already marked volatile.
+All that aside, it is never necessary to use READ_ONCE() and
+WRITE_ONCE() on a variable that has been marked volatile.  For example,
+because 'jiffies' is marked volatile, it is never necessary to
+say READ_ONCE(jiffies).  The reason for this is that READ_ONCE() and
+WRITE_ONCE() are implemented as volatile casts, which has no effect when
+its argument is already marked volatile.
 
 Please note that these compiler barriers have no direct effect on the CPU,
 which may then reorder things however it wishes.
@@ -1646,14 +1651,15 @@ The Linux kernel has eight basic CPU memory barriers:
 All memory barriers except the data dependency barriers imply a compiler
 barrier. Data dependencies do not impose any additional compiler ordering.
 
-Aside: In the case of data dependencies, the compiler would be expected to
-issue the loads in the correct order (eg. `a[b]` would have to load the value
-of b before loading a[b]), however there is no guarantee in the C specification
-that the compiler may not speculate the value of b (eg. is equal to 1) and load
-a before b (eg. tmp = a[1]; if (b != 1) tmp = a[b]; ). There is also the
-problem of a compiler reloading b after having loaded a[b], thus having a newer
-copy of b than a[b]. A consensus has not yet been reached about these problems,
-however the ACCESS_ONCE macro is a good place to start looking.
+Aside: In the case of data dependencies, the compiler would be expected
+to issue the loads in the correct order (eg. `a[b]` would have to load
+the value of b before loading a[b]), however there is no guarantee in
+the C specification that the compiler may not speculate the value of b
+(eg. is equal to 1) and load a before b (eg. tmp = a[1]; if (b != 1)
+tmp = a[b]; ). There is also the problem of a compiler reloading b after
+having loaded a[b], thus having a newer copy of b than a[b]. A consensus
+has not yet been reached about these problems, however the READ_ONCE()
+macro is a good place to start looking.
 
 SMP memory barriers are reduced to compiler barriers on uniprocessor compiled
 systems because it is assumed that a CPU will appear to be self-consistent,
@@ -2126,12 +2132,12 @@ three CPUs; then should the following sequence of events occur:
 
 	CPU 1				CPU 2
 	===============================	===============================
-	ACCESS_ONCE(*A) = a;		ACCESS_ONCE(*E) = e;
+	WRITE_ONCE(*A, a);		WRITE_ONCE(*E, e);
 	ACQUIRE M			ACQUIRE Q
-	ACCESS_ONCE(*B) = b;		ACCESS_ONCE(*F) = f;
-	ACCESS_ONCE(*C) = c;		ACCESS_ONCE(*G) = g;
+	WRITE_ONCE(*B, b);		WRITE_ONCE(*F, f);
+	WRITE_ONCE(*C, c);		WRITE_ONCE(*G, g);
 	RELEASE M			RELEASE Q
-	ACCESS_ONCE(*D) = d;		ACCESS_ONCE(*H) = h;
+	WRITE_ONCE(*D, d);		WRITE_ONCE(*H, h);
 
 Then there is no guarantee as to what order CPU 3 will see the accesses to *A
 through *H occur in, other than the constraints imposed by the separate locks
@@ -2151,18 +2157,18 @@ However, if the following occurs:
 
 	CPU 1				CPU 2
 	===============================	===============================
-	ACCESS_ONCE(*A) = a;
+	WRITE_ONCE(*A, a);
 	ACQUIRE M		     [1]
-	ACCESS_ONCE(*B) = b;
-	ACCESS_ONCE(*C) = c;
+	WRITE_ONCE(*B, b);
+	WRITE_ONCE(*C, c);
 	RELEASE M		     [1]
-	ACCESS_ONCE(*D) = d;		ACCESS_ONCE(*E) = e;
+	WRITE_ONCE(*D, d);		WRITE_ONCE(*E, e);
 					ACQUIRE M		     [2]
 					smp_mb__after_unlock_lock();
-					ACCESS_ONCE(*F) = f;
-					ACCESS_ONCE(*G) = g;
+					WRITE_ONCE(*F, f);
+					WRITE_ONCE(*G, g);
 					RELEASE M		     [2]
-					ACCESS_ONCE(*H) = h;
+					WRITE_ONCE(*H, h);
 
 CPU 3 might see:
 
@@ -2881,11 +2887,11 @@ A programmer might take it for granted that the CPU will perform memory
 operations in exactly the order specified, so that if the CPU is, for example,
 given the following piece of code to execute:
 
-	a = ACCESS_ONCE(*A);
-	ACCESS_ONCE(*B) = b;
-	c = ACCESS_ONCE(*C);
-	d = ACCESS_ONCE(*D);
-	ACCESS_ONCE(*E) = e;
+	a = READ_ONCE(*A);
+	WRITE_ONCE(*B, b);
+	c = READ_ONCE(*C);
+	d = READ_ONCE(*D);
+	WRITE_ONCE(*E, e);
 
 they would then expect that the CPU will complete the memory operation for each
 instruction before moving on to the next one, leading to a definite sequence of
@@ -2932,12 +2938,12 @@ However, it is guaranteed that a CPU will be self-consistent: it will see its
 _own_ accesses appear to be correctly ordered, without the need for a memory
 barrier. For instance with the following code:
 
-	U = ACCESS_ONCE(*A);
-	ACCESS_ONCE(*A) = V;
-	ACCESS_ONCE(*A) = W;
-	X = ACCESS_ONCE(*A);
-	ACCESS_ONCE(*A) = Y;
-	Z = ACCESS_ONCE(*A);
+	U = READ_ONCE(*A);
+	WRITE_ONCE(*A, V);
+	WRITE_ONCE(*A, W);
+	X = READ_ONCE(*A);
+	WRITE_ONCE(*A, Y);
+	Z = READ_ONCE(*A);
 
 and assuming no intervention by an external influence, it can be assumed that
 the final result will appear to be:
@@ -2953,13 +2959,14 @@ accesses:
 	U=LOAD *A, STORE *A=V, STORE *A=W, X=LOAD *A, STORE *A=Y, Z=LOAD *A
 
 in that order, but, without intervention, the sequence may have almost any
-combination of elements combined or discarded, provided the program's view of
-the world remains consistent. Note that ACCESS_ONCE() is -not- optional
-in the above example, as there are architectures where a given CPU might
-reorder successive loads to the same location. On such architectures,
-ACCESS_ONCE() does whatever is necessary to prevent this, for example, on
-Itanium the volatile casts used by ACCESS_ONCE() cause GCC to emit the
-special ld.acq and st.rel instructions that prevent such reordering.
+combination of elements combined or discarded, provided the program's view
+of the world remains consistent. Note that READ_ONCE() and WRITE_ONCE()
+are -not- optional in the above example, as there are architectures
+where a given CPU might reorder successive loads to the same location.
+On such architectures, READ_ONCE() and WRITE_ONCE() do whatever is
+necessary to prevent this, for example, on Itanium the volatile casts
+used by READ_ONCE() and WRITE_ONCE() cause GCC to emit the special ld.acq
+and st.rel instructions (respectively) that prevent such reordering.
 
 The compiler may also combine, discard or defer elements of the sequence before
 the CPU even sees them.
@@ -2973,13 +2980,14 @@ may be reduced to:
 
 	*A = W;
 
-since, without either a write barrier or an ACCESS_ONCE(), it can be
+since, without either a write barrier or a WRITE_ONCE(), it can be
 assumed that the effect of the storage of V to *A is lost. Similarly:
 
 	*A = Y;
 	Z = *A;
 
-may, without a memory barrier or an ACCESS_ONCE(), be reduced to:
+may, without a memory barrier or a READ_ONCE() and WRITE_ONCE(), be
+reduced to:
 
 	*A = Y;
 	Z = Y;