aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorPaul E. McKenney <paulmck@linux.vnet.ibm.com>2017-08-29 18:49:21 -0400
committerPaul E. McKenney <paulmck@linux.vnet.ibm.com>2017-10-09 17:23:36 -0400
commitf1ab25a30ce81f4e9be3cb33cd9bb9fb2db64b28 (patch)
tree5bb7cb22d9e51dba5acd7b64bd2c00b015bf0149
parentd3cf5176d0b17610b7c7f1562e496ec401fe01f8 (diff)
memory-barriers: Replace uses of "transitive"
The current version of memory-barriers.txt misuses the term "transitive", so this commit replaces it with multi-copy atomic, also adding a definition of this term. Reported-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-rw-r--r--Documentation/memory-barriers.txt185
1 files changed, 91 insertions, 94 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index b759a60624fd..b6882680247e 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -53,7 +53,7 @@ CONTENTS
53 - SMP barrier pairing. 53 - SMP barrier pairing.
54 - Examples of memory barrier sequences. 54 - Examples of memory barrier sequences.
55 - Read memory barriers vs load speculation. 55 - Read memory barriers vs load speculation.
56 - Transitivity 56 - Multicopy atomicity.
57 57
58 (*) Explicit kernel barriers. 58 (*) Explicit kernel barriers.
59 59
@@ -635,6 +635,11 @@ can be used to record rare error conditions and the like, and the CPUs'
635naturally occurring ordering prevents such records from being lost. 635naturally occurring ordering prevents such records from being lost.
636 636
637 637
638Note well that the ordering provided by a data dependency is local to
639the CPU containing it. See the section on "Multicopy atomicity" for
640more information.
641
642
638The data dependency barrier is very important to the RCU system, 643The data dependency barrier is very important to the RCU system,
639for example. See rcu_assign_pointer() and rcu_dereference() in 644for example. See rcu_assign_pointer() and rcu_dereference() in
640include/linux/rcupdate.h. This permits the current target of an RCU'd 645include/linux/rcupdate.h. This permits the current target of an RCU'd
@@ -851,38 +856,11 @@ In short, control dependencies apply only to the stores in the then-clause
851and else-clause of the if-statement in question (including functions 856and else-clause of the if-statement in question (including functions
852invoked by those two clauses), not to code following that if-statement. 857invoked by those two clauses), not to code following that if-statement.
853 858
854Finally, control dependencies do -not- provide transitivity. This is
855demonstrated by two related examples, with the initial values of
856'x' and 'y' both being zero:
857
858 CPU 0 CPU 1
859 ======================= =======================
860 r1 = READ_ONCE(x); r2 = READ_ONCE(y);
861 if (r1 > 0) if (r2 > 0)
862 WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
863
864 assert(!(r1 == 1 && r2 == 1));
865
866The above two-CPU example will never trigger the assert(). However,
867if control dependencies guaranteed transitivity (which they do not),
868then adding the following CPU would guarantee a related assertion:
869
870 CPU 2
871 =====================
872 WRITE_ONCE(x, 2);
873
874 assert(!(r1 == 2 && r2 == 1 && x == 2)); /* FAILS!!! */
875 859
876But because control dependencies do -not- provide transitivity, the above 860Note well that the ordering provided by a control dependency is local
877assertion can fail after the combined three-CPU example completes. If you 861to the CPU containing it. See the section on "Multicopy atomicity"
878need the three-CPU example to provide ordering, you will need smp_mb() 862for more information.
879between the loads and stores in the CPU 0 and CPU 1 code fragments,
880that is, just before or just after the "if" statements. Furthermore,
881the original two-CPU example is very fragile and should be avoided.
882 863
883These two examples are the LB and WWC litmus tests from this paper:
884http://www.cl.cam.ac.uk/users/pes20/ppc-supplemental/test6.pdf and this
885site: https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html.
886 864
887In summary: 865In summary:
888 866
@@ -922,8 +900,8 @@ In summary:
922 900
923 (*) Control dependencies pair normally with other types of barriers. 901 (*) Control dependencies pair normally with other types of barriers.
924 902
925 (*) Control dependencies do -not- provide transitivity. If you 903 (*) Control dependencies do -not- provide multicopy atomicity. If you
926 need transitivity, use smp_mb(). 904 need all the CPUs to see a given store at the same time, use smp_mb().
927 905
928 (*) Compilers do not understand control dependencies. It is therefore 906 (*) Compilers do not understand control dependencies. It is therefore
929 your job to ensure that they do not break your code. 907 your job to ensure that they do not break your code.
@@ -936,13 +914,14 @@ When dealing with CPU-CPU interactions, certain types of memory barrier should
936always be paired. A lack of appropriate pairing is almost certainly an error. 914always be paired. A lack of appropriate pairing is almost certainly an error.
937 915
938General barriers pair with each other, though they also pair with most 916General barriers pair with each other, though they also pair with most
939other types of barriers, albeit without transitivity. An acquire barrier 917other types of barriers, albeit without multicopy atomicity. An acquire
940pairs with a release barrier, but both may also pair with other barriers, 918barrier pairs with a release barrier, but both may also pair with other
941including of course general barriers. A write barrier pairs with a data 919barriers, including of course general barriers. A write barrier pairs
942dependency barrier, a control dependency, an acquire barrier, a release 920with a data dependency barrier, a control dependency, an acquire barrier,
943barrier, a read barrier, or a general barrier. Similarly a read barrier, 921a release barrier, a read barrier, or a general barrier. Similarly a
944control dependency, or a data dependency barrier pairs with a write 922read barrier, control dependency, or a data dependency barrier pairs
945barrier, an acquire barrier, a release barrier, or a general barrier: 923with a write barrier, an acquire barrier, a release barrier, or a
924general barrier:
946 925
947 CPU 1 CPU 2 926 CPU 1 CPU 2
948 =============== =============== 927 =============== ===============
@@ -1359,64 +1338,77 @@ the speculation will be cancelled and the value reloaded:
1359 retrieved : : +-------+ 1338 retrieved : : +-------+
1360 1339
1361 1340
1362TRANSITIVITY 1341MULTICOPY ATOMICITY
1363------------ 1342--------------------
1343
1344Multicopy atomicity is a deeply intuitive notion about ordering that is
1345not always provided by real computer systems, namely that a given store
1346is visible at the same time to all CPUs, or, alternatively, that all
1347CPUs agree on the order in which all stores took place. However, use of
1348full multicopy atomicity would rule out valuable hardware optimizations,
1349so a weaker form called ``other multicopy atomicity'' instead guarantees
1350that a given store is observed at the same time by all -other- CPUs. The
1351remainder of this document discusses this weaker form, but for brevity
1352will call it simply ``multicopy atomicity''.
1364 1353
1365Transitivity is a deeply intuitive notion about ordering that is not 1354The following example demonstrates multicopy atomicity:
1366always provided by real computer systems. The following example
1367demonstrates transitivity:
1368 1355
1369 CPU 1 CPU 2 CPU 3 1356 CPU 1 CPU 2 CPU 3
1370 ======================= ======================= ======================= 1357 ======================= ======================= =======================
1371 { X = 0, Y = 0 } 1358 { X = 0, Y = 0 }
1372 STORE X=1 LOAD X STORE Y=1 1359 STORE X=1 r1=LOAD X (reads 1) LOAD Y (reads 1)
1373 <general barrier> <general barrier> 1360 <general barrier> <read barrier>
1374 LOAD Y LOAD X 1361 STORE Y=r1 LOAD X
1375 1362
1376Suppose that CPU 2's load from X returns 1 and its load from Y returns 0. 1363Suppose that CPU 2's load from X returns 1 which it then stores to Y and
1377This indicates that CPU 2's load from X in some sense follows CPU 1's 1364that CPU 3's load from Y returns 1. This indicates that CPU 2's load
1378store to X and that CPU 2's load from Y in some sense preceded CPU 3's 1365from X in some sense follows CPU 1's store to X and that CPU 2's store
1379store to Y. The question is then "Can CPU 3's load from X return 0?" 1366to Y in some sense preceded CPU 3's load from Y. The question is then
1367"Can CPU 3's load from X return 0?"
1380 1368
1381Because CPU 2's load from X in some sense came after CPU 1's store, it 1369Because CPU 3's load from X in some sense came after CPU 2's load, it
1382is natural to expect that CPU 3's load from X must therefore return 1. 1370is natural to expect that CPU 3's load from X must therefore return 1.
1383This expectation is an example of transitivity: if a load executing on 1371This expectation is an example of multicopy atomicity: if a load executing
1384CPU A follows a load from the same variable executing on CPU B, then 1372on CPU A follows a load from the same variable executing on CPU B, then
1385CPU A's load must either return the same value that CPU B's load did, 1373an understandable but incorrect expectation is that CPU A's load must
1386or must return some later value. 1374either return the same value that CPU B's load did, or must return some
1387 1375later value.
1388In the Linux kernel, use of general memory barriers guarantees 1376
1389transitivity. Therefore, in the above example, if CPU 2's load from X 1377In the Linux kernel, the above use of a general memory barrier compensates
1390returns 1 and its load from Y returns 0, then CPU 3's load from X must 1378for any lack of multicopy atomicity. Therefore, in the above example,
1391also return 1. 1379if CPU 2's load from X returns 1 and its load from Y returns 0, and CPU 3's
1392 1380load from Y returns 1, then CPU 3's load from X must also return 1.
1393However, transitivity is -not- guaranteed for read or write barriers. 1381
1394For example, suppose that CPU 2's general barrier in the above example 1382However, dependencies, read barriers, and write barriers are not always
1395is changed to a read barrier as shown below: 1383able to compensate for non-multicopy atomicity. For example, suppose
1384that CPU 2's general barrier is removed from the above example, leaving
1385only the data dependency shown below:
1396 1386
1397 CPU 1 CPU 2 CPU 3 1387 CPU 1 CPU 2 CPU 3
1398 ======================= ======================= ======================= 1388 ======================= ======================= =======================
1399 { X = 0, Y = 0 } 1389 { X = 0, Y = 0 }
1400 STORE X=1 LOAD X STORE Y=1 1390 STORE X=1 r1=LOAD X (reads 1) LOAD Y (reads 1)
1401 <read barrier> <general barrier> 1391 <data dependency> <read barrier>
1402 LOAD Y LOAD X 1392 STORE Y=r1 LOAD X (reads 0)
1403 1393
1404This substitution destroys transitivity: in this example, it is perfectly 1394This substitution allows non-multicopy atomicity to run rampant: in
1405legal for CPU 2's load from X to return 1, its load from Y to return 0, 1395this example, it is perfectly legal for CPU 2's load from X to return 1,
1406and CPU 3's load from X to return 0. 1396CPU 3's load from Y to return 1, and its load from X to return 0.
1407 1397
1408The key point is that although CPU 2's read barrier orders its pair 1398The key point is that although CPU 2's data dependency orders its load
1409of loads, it does not guarantee to order CPU 1's store. Therefore, if 1399and store, it does not guarantee to order CPU 1's store. Therefore,
1410this example runs on a system where CPUs 1 and 2 share a store buffer 1400if this example runs on a non-multicopy-atomic system where CPUs 1 and 2
1411or a level of cache, CPU 2 might have early access to CPU 1's writes. 1401share a store buffer or a level of cache, CPU 2 might have early access
1412General barriers are therefore required to ensure that all CPUs agree 1402to CPU 1's writes. A general barrier is therefore required to ensure
1413on the combined order of CPU 1's and CPU 2's accesses. 1403that all CPUs agree on the combined order of CPU 1's and CPU 2's accesses.
1414 1404
1415General barriers provide "global transitivity", so that all CPUs will 1405General barriers can compensate not only for non-multicopy atomicity,
1416agree on the order of operations. In contrast, a chain of release-acquire 1406but can also generate additional ordering that can ensure that -all-
1417pairs provides only "local transitivity", so that only those CPUs on 1407CPUs will perceive the same order of -all- operations. In contrast, a
1418the chain are guaranteed to agree on the combined order of the accesses. 1408chain of release-acquire pairs do not provide this additional ordering,
1419For example, switching to C code in deference to Herman Hollerith: 1409which means that only those CPUs on the chain are guaranteed to agree
1410on the combined order of the accesses. For example, switching to C code
1411in deference to the ghost of Herman Hollerith:
1420 1412
1421 int u, v, x, y, z; 1413 int u, v, x, y, z;
1422 1414
@@ -1448,9 +1440,9 @@ For example, switching to C code in deference to Herman Hollerith:
1448 r3 = READ_ONCE(u); 1440 r3 = READ_ONCE(u);
1449 } 1441 }
1450 1442
1451Because cpu0(), cpu1(), and cpu2() participate in a local transitive 1443Because cpu0(), cpu1(), and cpu2() participate in a chain of
1452chain of smp_store_release()/smp_load_acquire() pairs, the following 1444smp_store_release()/smp_load_acquire() pairs, the following outcome
1453outcome is prohibited: 1445is prohibited:
1454 1446
1455 r0 == 1 && r1 == 1 && r2 == 1 1447 r0 == 1 && r1 == 1 && r2 == 1
1456 1448
@@ -1460,9 +1452,9 @@ outcome is prohibited:
1460 1452
1461 r1 == 1 && r5 == 0 1453 r1 == 1 && r5 == 0
1462 1454
1463However, the transitivity of release-acquire is local to the participating 1455However, the ordering provided by a release-acquire chain is local
1464CPUs and does not apply to cpu3(). Therefore, the following outcome 1456to the CPUs participating in that chain and does not apply to cpu3(),
1465is possible: 1457at least aside from stores. Therefore, the following outcome is possible:
1466 1458
1467 r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0 1459 r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0
1468 1460
@@ -1490,8 +1482,8 @@ following outcome is possible:
1490Note that this outcome can happen even on a mythical sequentially 1482Note that this outcome can happen even on a mythical sequentially
1491consistent system where nothing is ever reordered. 1483consistent system where nothing is ever reordered.
1492 1484
1493To reiterate, if your code requires global transitivity, use general 1485To reiterate, if your code requires full ordering of all operations,
1494barriers throughout. 1486use general barriers throughout.
1495 1487
1496 1488
1497======================== 1489========================
@@ -3101,6 +3093,9 @@ AMD64 Architecture Programmer's Manual Volume 2: System Programming
3101 Chapter 7.1: Memory-Access Ordering 3093 Chapter 7.1: Memory-Access Ordering
3102 Chapter 7.4: Buffering and Combining Memory Writes 3094 Chapter 7.4: Buffering and Combining Memory Writes
3103 3095
3096ARM Architecture Reference Manual (ARMv8, for ARMv8-A architecture profile)
3097 Chapter B2: The AArch64 Application Level Memory Model
3098
3104IA-32 Intel Architecture Software Developer's Manual, Volume 3: 3099IA-32 Intel Architecture Software Developer's Manual, Volume 3:
3105System Programming Guide 3100System Programming Guide
3106 Chapter 7.1: Locked Atomic Operations 3101 Chapter 7.1: Locked Atomic Operations
@@ -3112,6 +3107,8 @@ The SPARC Architecture Manual, Version 9
3112 Appendix D: Formal Specification of the Memory Models 3107 Appendix D: Formal Specification of the Memory Models
3113 Appendix J: Programming with the Memory Models 3108 Appendix J: Programming with the Memory Models
3114 3109
3110Storage in the PowerPC (Stone and Fitzgerald)
3111
3115UltraSPARC Programmer Reference Manual 3112UltraSPARC Programmer Reference Manual
3116 Chapter 5: Memory Accesses and Cacheability 3113 Chapter 5: Memory Accesses and Cacheability
3117 Chapter 15: Sparc-V9 Memory Models 3114 Chapter 15: Sparc-V9 Memory Models