memory-barriers: Replace uses of "transitive"

The current version of memory-barriers.txt misuses the term "transitive", so this commit replaces it with multi-copy atomic, also adding a definition of this term. Reported-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> 2017-08-29 18:49:21 -0400
committer: Paul E. McKenney <paulmck@linux.vnet.ibm.com> 2017-10-09 17:23:36 -0400
commit: f1ab25a30ce81f4e9be3cb33cd9bb9fb2db64b28 (patch)
tree: 5bb7cb22d9e51dba5acd7b64bd2c00b015bf0149
parent: d3cf5176d0b17610b7c7f1562e496ec401fe01f8 (diff)
1 files changed, 91 insertions, 94 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index b759a60624fd..b6882680247e 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -53,7 +53,7 @@ CONTENTS
     - SMP barrier pairing.
     - Examples of memory barrier sequences.
     - Read memory barriers vs load speculation.
-     - Transitivity
+     - Multicopy atomicity.
 (*) Explicit kernel barriers.
@@ -635,6 +635,11 @@ can be used to record rare error conditions and the like, and the CPUs'
 naturally occurring ordering prevents such records from being lost.
+Note well that the ordering provided by a data dependency is local to
+the CPU containing it.  See the section on "Multicopy atomicity" for
+more information.
 The data dependency barrier is very important to the RCU system,
 for example.  See rcu_assign_pointer() and rcu_dereference() in
 include/linux/rcupdate.h.  This permits the current target of an RCU'd
@@ -851,38 +856,11 @@ In short, control dependencies apply only to the stores in the then-clause
 and else-clause of the if-statement in question (including functions
 invoked by those two clauses), not to code following that if-statement.
-Finally, control dependencies do -not- provide transitivity.  This is
-demonstrated by two related examples, with the initial values of
-'x' and 'y' both being zero:
-        CPU 0                     CPU 1
-        =======================   =======================
-        r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
-        if (r1 > 0)               if (r2 > 0)
-          WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
-        assert(!(r1 == 1 && r2 == 1));
-The above two-CPU example will never trigger the assert().  However,
-if control dependencies guaranteed transitivity (which they do not),
-then adding the following CPU would guarantee a related assertion:
-        CPU 2
-        =====================
-        WRITE_ONCE(x, 2);
-        assert(!(r1 == 2 && r2 == 1 && x == 2)); /* FAILS!!! */
-But because control dependencies do -not- provide transitivity, the above
+Note well that the ordering provided by a control dependency is local
-assertion can fail after the combined three-CPU example completes.  If you
+to the CPU containing it.  See the section on "Multicopy atomicity"
-need the three-CPU example to provide ordering, you will need smp_mb()
+for more information.
-between the loads and stores in the CPU 0 and CPU 1 code fragments,
-that is, just before or just after the "if" statements.  Furthermore,
-the original two-CPU example is very fragile and should be avoided.
-These two examples are the LB and WWC litmus tests from this paper:
-http://www.cl.cam.ac.uk/users/pes20/ppc-supplemental/test6.pdf and this
-site: https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html.
 In summary:
@@ -922,8 +900,8 @@ In summary:
  (*) Control dependencies pair normally with other types of barriers.
-  (*) Control dependencies do -not- provide transitivity.  If you
+  (*) Control dependencies do -not- provide multicopy atomicity.  If you
-      need transitivity, use smp_mb().
+      need all the CPUs to see a given store at the same time, use smp_mb().
  (*) Compilers do not understand control dependencies.  It is therefore
      your job to ensure that they do not break your code.
@@ -936,13 +914,14 @@ When dealing with CPU-CPU interactions, certain types of memory barrier should
 always be paired.  A lack of appropriate pairing is almost certainly an error.
 General barriers pair with each other, though they also pair with most
-other types of barriers, albeit without transitivity.  An acquire barrier
+other types of barriers, albeit without multicopy atomicity.  An acquire
-pairs with a release barrier, but both may also pair with other barriers,
+barrier pairs with a release barrier, but both may also pair with other
-including of course general barriers.  A write barrier pairs with a data
+barriers, including of course general barriers.  A write barrier pairs
-dependency barrier, a control dependency, an acquire barrier, a release
+with a data dependency barrier, a control dependency, an acquire barrier,
-barrier, a read barrier, or a general barrier.  Similarly a read barrier,
+a release barrier, a read barrier, or a general barrier.  Similarly a
-control dependency, or a data dependency barrier pairs with a write
+read barrier, control dependency, or a data dependency barrier pairs
-barrier, an acquire barrier, a release barrier, or a general barrier:
+with a write barrier, an acquire barrier, a release barrier, or a
+general barrier:
        CPU 1                 CPU 2
        ===============       ===============
@@ -1359,64 +1338,77 @@ the speculation will be cancelled and the value reloaded:
        retrieved                               :       :       +-------+
-TRANSITIVITY
+MULTICOPY ATOMICITY
------------
+--------------------
+Multicopy atomicity is a deeply intuitive notion about ordering that is
+not always provided by real computer systems, namely that a given store
+is visible at the same time to all CPUs, or, alternatively, that all
+CPUs agree on the order in which all stores took place.  However, use of
+full multicopy atomicity would rule out valuable hardware optimizations,
+so a weaker form called ``other multicopy atomicity'' instead guarantees
+that a given store is observed at the same time by all -other- CPUs.  The
+remainder of this document discusses this weaker form, but for brevity
+will call it simply ``multicopy atomicity''.
-Transitivity is a deeply intuitive notion about ordering that is not
+The following example demonstrates multicopy atomicity:
-always provided by real computer systems.  The following example
-demonstrates transitivity:
        CPU 1                   CPU 2                   CPU 3
        ======================= ======================= =======================
                { X = 0, Y = 0 }
-        STORE X=1               LOAD X                  STORE Y=1
+        STORE X=1               r1=LOAD X (reads 1)     LOAD Y (reads 1)
-                                <general barrier>       <general barrier>
+                                <general barrier>       <read barrier>
-                                LOAD Y                  LOAD X
+                                STORE Y=r1              LOAD X
-Suppose that CPU 2's load from X returns 1 and its load from Y returns 0.
+Suppose that CPU 2's load from X returns 1 which it then stores to Y and
-This indicates that CPU 2's load from X in some sense follows CPU 1's
+that CPU 3's load from Y returns 1.  This indicates that CPU 2's load
-store to X and that CPU 2's load from Y in some sense preceded CPU 3's
+from X in some sense follows CPU 1's store to X and that CPU 2's store
-store to Y.  The question is then "Can CPU 3's load from X return 0?"
+to Y in some sense preceded CPU 3's load from Y.  The question is then
+"Can CPU 3's load from X return 0?"
-Because CPU 2's load from X in some sense came after CPU 1's store, it
+Because CPU 3's load from X in some sense came after CPU 2's load, it
 is natural to expect that CPU 3's load from X must therefore return 1.
-This expectation is an example of transitivity: if a load executing on
+This expectation is an example of multicopy atomicity: if a load executing
-CPU A follows a load from the same variable executing on CPU B, then
+on CPU A follows a load from the same variable executing on CPU B, then
-CPU A's load must either return the same value that CPU B's load did,
+an understandable but incorrect expectation is that CPU A's load must
-or must return some later value.
+either return the same value that CPU B's load did, or must return some
+later value.
-In the Linux kernel, use of general memory barriers guarantees
-transitivity.  Therefore, in the above example, if CPU 2's load from X
+In the Linux kernel, the above use of a general memory barrier compensates
-returns 1 and its load from Y returns 0, then CPU 3's load from X must
+for any lack of multicopy atomicity.  Therefore, in the above example,
-also return 1.
+if CPU 2's load from X returns 1 and its load from Y returns 0, and CPU 3's
+load from Y returns 1, then CPU 3's load from X must also return 1.
-However, transitivity is -not- guaranteed for read or write barriers.
-For example, suppose that CPU 2's general barrier in the above example
+However, dependencies, read barriers, and write barriers are not always
-is changed to a read barrier as shown below:
+able to compensate for non-multicopy atomicity.  For example, suppose
+that CPU 2's general barrier is removed from the above example, leaving
+only the data dependency shown below:
        CPU 1                   CPU 2                   CPU 3
        ======================= ======================= =======================
                { X = 0, Y = 0 }
-        STORE X=1               LOAD X                  STORE Y=1
+        STORE X=1               r1=LOAD X (reads 1)     LOAD Y (reads 1)
-                                <read barrier>          <general barrier>
+                                <data dependency>       <read barrier>
-                                LOAD Y                  LOAD X
+                                STORE Y=r1              LOAD X (reads 0)
-This substitution destroys transitivity: in this example, it is perfectly
+This substitution allows non-multicopy atomicity to run rampant: in
-legal for CPU 2's load from X to return 1, its load from Y to return 0,
+this example, it is perfectly legal for CPU 2's load from X to return 1,
-and CPU 3's load from X to return 0.
+CPU 3's load from Y to return 1, and its load from X to return 0.
-The key point is that although CPU 2's read barrier orders its pair
+The key point is that although CPU 2's data dependency orders its load
-of loads, it does not guarantee to order CPU 1's store.  Therefore, if
+and store, it does not guarantee to order CPU 1's store.  Therefore,
-this example runs on a system where CPUs 1 and 2 share a store buffer
+if this example runs on a non-multicopy-atomic system where CPUs 1 and 2
-or a level of cache, CPU 2 might have early access to CPU 1's writes.
+share a store buffer or a level of cache, CPU 2 might have early access
-General barriers are therefore required to ensure that all CPUs agree
+to CPU 1's writes.  A general barrier is therefore required to ensure
-on the combined order of CPU 1's and CPU 2's accesses.
+that all CPUs agree on the combined order of CPU 1's and CPU 2's accesses.
-General barriers provide "global transitivity", so that all CPUs will
+General barriers can compensate not only for non-multicopy atomicity,
-agree on the order of operations.  In contrast, a chain of release-acquire
+but can also generate additional ordering that can ensure that -all-
-pairs provides only "local transitivity", so that only those CPUs on
+CPUs will perceive the same order of -all- operations.  In contrast, a
-the chain are guaranteed to agree on the combined order of the accesses.
+chain of release-acquire pairs do not provide this additional ordering,
-For example, switching to C code in deference to Herman Hollerith:
+which means that only those CPUs on the chain are guaranteed to agree
+on the combined order of the accesses.  For example, switching to C code
+in deference to the ghost of Herman Hollerith:
        int u, v, x, y, z;
@@ -1448,9 +1440,9 @@ For example, switching to C code in deference to Herman Hollerith:
                r3 = READ_ONCE(u);
        }
-Because cpu0(), cpu1(), and cpu2() participate in a local transitive
+Because cpu0(), cpu1(), and cpu2() participate in a chain of
-chain of smp_store_release()/smp_load_acquire() pairs, the following
+smp_store_release()/smp_load_acquire() pairs, the following outcome
-outcome is prohibited:
+is prohibited:
        r0 == 1 && r1 == 1 && r2 == 1
@@ -1460,9 +1452,9 @@ outcome is prohibited:
        r1 == 1 && r5 == 0
-However, the transitivity of release-acquire is local to the participating
+However, the ordering provided by a release-acquire chain is local
-CPUs and does not apply to cpu3().  Therefore, the following outcome
+to the CPUs participating in that chain and does not apply to cpu3(),
-is possible:
+at least aside from stores.  Therefore, the following outcome is possible:
        r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0
@@ -1490,8 +1482,8 @@ following outcome is possible:
 Note that this outcome can happen even on a mythical sequentially
 consistent system where nothing is ever reordered.
-To reiterate, if your code requires global transitivity, use general
+To reiterate, if your code requires full ordering of all operations,
-barriers throughout.
+use general barriers throughout.
 ========================
@@ -3101,6 +3093,9 @@ AMD64 Architecture Programmer's Manual Volume 2: System Programming
        Chapter 7.1: Memory-Access Ordering
        Chapter 7.4: Buffering and Combining Memory Writes
+ARM Architecture Reference Manual (ARMv8, for ARMv8-A architecture profile)
+        Chapter B2: The AArch64 Application Level Memory Model
 IA-32 Intel Architecture Software Developer's Manual, Volume 3:
 System Programming Guide
        Chapter 7.1: Locked Atomic Operations
@@ -3112,6 +3107,8 @@ The SPARC Architecture Manual, Version 9
        Appendix D: Formal Specification of the Memory Models
        Appendix J: Programming with the Memory Models
+Storage in the PowerPC (Stone and Fitzgerald)
 UltraSPARC Programmer Reference Manual
        Chapter 5: Memory Accesses and Cacheability
        Chapter 15: Sparc-V9 Memory Models
author	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-08-29 18:49:21 -0400
committer	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-10-09 17:23:36 -0400
commit	f1ab25a30ce81f4e9be3cb33cd9bb9fb2db64b28 (patch)
tree	5bb7cb22d9e51dba5acd7b64bd2c00b015bf0149
parent	d3cf5176d0b17610b7c7f1562e496ec401fe01f8 (diff)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index b759a60624fd..b6882680247e 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt
@@ -53,7 +53,7 @@ CONTENTS
53	- SMP barrier pairing.	53	- SMP barrier pairing.
54	- Examples of memory barrier sequences.	54	- Examples of memory barrier sequences.
55	- Read memory barriers vs load speculation.	55	- Read memory barriers vs load speculation.
56	- Transitivity	56	- Multicopy atomicity.
57		57
58	(*) Explicit kernel barriers.	58	(*) Explicit kernel barriers.
59		59
@@ -635,6 +635,11 @@ can be used to record rare error conditions and the like, and the CPUs'
635	naturally occurring ordering prevents such records from being lost.	635	naturally occurring ordering prevents such records from being lost.
636		636
637		637
		638	Note well that the ordering provided by a data dependency is local to
		639	the CPU containing it. See the section on "Multicopy atomicity" for
		640	more information.
		641
		642
638	The data dependency barrier is very important to the RCU system,	643	The data dependency barrier is very important to the RCU system,
639	for example. See rcu_assign_pointer() and rcu_dereference() in	644	for example. See rcu_assign_pointer() and rcu_dereference() in
640	include/linux/rcupdate.h. This permits the current target of an RCU'd	645	include/linux/rcupdate.h. This permits the current target of an RCU'd
@@ -851,38 +856,11 @@ In short, control dependencies apply only to the stores in the then-clause
851	and else-clause of the if-statement in question (including functions	856	and else-clause of the if-statement in question (including functions
852	invoked by those two clauses), not to code following that if-statement.	857	invoked by those two clauses), not to code following that if-statement.
853		858
854	Finally, control dependencies do -not- provide transitivity. This is
855	demonstrated by two related examples, with the initial values of
856	'x' and 'y' both being zero:
857
858	CPU 0 CPU 1
859	======================= =======================
860	r1 = READ_ONCE(x); r2 = READ_ONCE(y);
861	if (r1 > 0) if (r2 > 0)
862	WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
863
864	assert(!(r1 == 1 && r2 == 1));
865
866	The above two-CPU example will never trigger the assert(). However,
867	if control dependencies guaranteed transitivity (which they do not),
868	then adding the following CPU would guarantee a related assertion:
869
870	CPU 2
871	=====================
872	WRITE_ONCE(x, 2);
873
874	assert(!(r1 == 2 && r2 == 1 && x == 2)); /* FAILS!!! */
875		859
876	But because control dependencies do -not- provide transitivity, the above	860	Note well that the ordering provided by a control dependency is local
877	assertion can fail after the combined three-CPU example completes. If you	861	to the CPU containing it. See the section on "Multicopy atomicity"
878	need the three-CPU example to provide ordering, you will need smp_mb()	862	for more information.
879	between the loads and stores in the CPU 0 and CPU 1 code fragments,
880	that is, just before or just after the "if" statements. Furthermore,
881	the original two-CPU example is very fragile and should be avoided.
882		863
883	These two examples are the LB and WWC litmus tests from this paper:
884	http://www.cl.cam.ac.uk/users/pes20/ppc-supplemental/test6.pdf and this
885	site: https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html.
886		864
887	In summary:	865	In summary:
888		866
@@ -922,8 +900,8 @@ In summary:
922		900
923	(*) Control dependencies pair normally with other types of barriers.	901	(*) Control dependencies pair normally with other types of barriers.
924		902
925	(*) Control dependencies do -not- provide transitivity. If you	903	(*) Control dependencies do -not- provide multicopy atomicity. If you
926	need transitivity, use smp_mb().	904	need all the CPUs to see a given store at the same time, use smp_mb().
927		905
928	(*) Compilers do not understand control dependencies. It is therefore	906	(*) Compilers do not understand control dependencies. It is therefore
929	your job to ensure that they do not break your code.	907	your job to ensure that they do not break your code.
@@ -936,13 +914,14 @@ When dealing with CPU-CPU interactions, certain types of memory barrier should
936	always be paired. A lack of appropriate pairing is almost certainly an error.	914	always be paired. A lack of appropriate pairing is almost certainly an error.
937		915
938	General barriers pair with each other, though they also pair with most	916	General barriers pair with each other, though they also pair with most
939	other types of barriers, albeit without transitivity. An acquire barrier	917	other types of barriers, albeit without multicopy atomicity. An acquire
940	pairs with a release barrier, but both may also pair with other barriers,	918	barrier pairs with a release barrier, but both may also pair with other
941	including of course general barriers. A write barrier pairs with a data	919	barriers, including of course general barriers. A write barrier pairs
942	dependency barrier, a control dependency, an acquire barrier, a release	920	with a data dependency barrier, a control dependency, an acquire barrier,
943	barrier, a read barrier, or a general barrier. Similarly a read barrier,	921	a release barrier, a read barrier, or a general barrier. Similarly a
944	control dependency, or a data dependency barrier pairs with a write	922	read barrier, control dependency, or a data dependency barrier pairs
945	barrier, an acquire barrier, a release barrier, or a general barrier:	923	with a write barrier, an acquire barrier, a release barrier, or a
		924	general barrier:
946		925
947	CPU 1 CPU 2	926	CPU 1 CPU 2
948	=============== ===============	927	=============== ===============
@@ -1359,64 +1338,77 @@ the speculation will be cancelled and the value reloaded:
1359	retrieved : : +-------+	1338	retrieved : : +-------+
1360		1339
1361		1340
1362	TRANSITIVITY	1341	MULTICOPY ATOMICITY
1363	------------	1342	--------------------
		1343
		1344	Multicopy atomicity is a deeply intuitive notion about ordering that is
		1345	not always provided by real computer systems, namely that a given store
		1346	is visible at the same time to all CPUs, or, alternatively, that all
		1347	CPUs agree on the order in which all stores took place. However, use of
		1348	full multicopy atomicity would rule out valuable hardware optimizations,
		1349	so a weaker form called ``other multicopy atomicity'' instead guarantees
		1350	that a given store is observed at the same time by all -other- CPUs. The
		1351	remainder of this document discusses this weaker form, but for brevity
		1352	will call it simply ``multicopy atomicity''.
1364		1353
1365	Transitivity is a deeply intuitive notion about ordering that is not	1354	The following example demonstrates multicopy atomicity:
1366	always provided by real computer systems. The following example
1367	demonstrates transitivity:
1368		1355
1369	CPU 1 CPU 2 CPU 3	1356	CPU 1 CPU 2 CPU 3
1370	======================= ======================= =======================	1357	======================= ======================= =======================
1371	{ X = 0, Y = 0 }	1358	{ X = 0, Y = 0 }
1372	STORE X=1 LOAD X STORE Y=1	1359	STORE X=1 r1=LOAD X (reads 1) LOAD Y (reads 1)
1373	<general barrier> <general barrier>	1360	<general barrier> <read barrier>
1374	LOAD Y LOAD X	1361	STORE Y=r1 LOAD X
1375		1362
1376	Suppose that CPU 2's load from X returns 1 and its load from Y returns 0.	1363	Suppose that CPU 2's load from X returns 1 which it then stores to Y and
1377	This indicates that CPU 2's load from X in some sense follows CPU 1's	1364	that CPU 3's load from Y returns 1. This indicates that CPU 2's load
1378	store to X and that CPU 2's load from Y in some sense preceded CPU 3's	1365	from X in some sense follows CPU 1's store to X and that CPU 2's store
1379	store to Y. The question is then "Can CPU 3's load from X return 0?"	1366	to Y in some sense preceded CPU 3's load from Y. The question is then
		1367	"Can CPU 3's load from X return 0?"
1380		1368
1381	Because CPU 2's load from X in some sense came after CPU 1's store, it	1369	Because CPU 3's load from X in some sense came after CPU 2's load, it
1382	is natural to expect that CPU 3's load from X must therefore return 1.	1370	is natural to expect that CPU 3's load from X must therefore return 1.
1383	This expectation is an example of transitivity: if a load executing on	1371	This expectation is an example of multicopy atomicity: if a load executing
1384	CPU A follows a load from the same variable executing on CPU B, then	1372	on CPU A follows a load from the same variable executing on CPU B, then
1385	CPU A's load must either return the same value that CPU B's load did,	1373	an understandable but incorrect expectation is that CPU A's load must
1386	or must return some later value.	1374	either return the same value that CPU B's load did, or must return some
1387		1375	later value.
1388	In the Linux kernel, use of general memory barriers guarantees	1376
1389	transitivity. Therefore, in the above example, if CPU 2's load from X	1377	In the Linux kernel, the above use of a general memory barrier compensates
1390	returns 1 and its load from Y returns 0, then CPU 3's load from X must	1378	for any lack of multicopy atomicity. Therefore, in the above example,
1391	also return 1.	1379	if CPU 2's load from X returns 1 and its load from Y returns 0, and CPU 3's
1392		1380	load from Y returns 1, then CPU 3's load from X must also return 1.
1393	However, transitivity is -not- guaranteed for read or write barriers.	1381
1394	For example, suppose that CPU 2's general barrier in the above example	1382	However, dependencies, read barriers, and write barriers are not always
1395	is changed to a read barrier as shown below:	1383	able to compensate for non-multicopy atomicity. For example, suppose
		1384	that CPU 2's general barrier is removed from the above example, leaving
		1385	only the data dependency shown below:
1396		1386
1397	CPU 1 CPU 2 CPU 3	1387	CPU 1 CPU 2 CPU 3
1398	======================= ======================= =======================	1388	======================= ======================= =======================
1399	{ X = 0, Y = 0 }	1389	{ X = 0, Y = 0 }
1400	STORE X=1 LOAD X STORE Y=1	1390	STORE X=1 r1=LOAD X (reads 1) LOAD Y (reads 1)
1401	<read barrier> <general barrier>	1391	<data dependency> <read barrier>
1402	LOAD Y LOAD X	1392	STORE Y=r1 LOAD X (reads 0)
1403		1393
1404	This substitution destroys transitivity: in this example, it is perfectly	1394	This substitution allows non-multicopy atomicity to run rampant: in
1405	legal for CPU 2's load from X to return 1, its load from Y to return 0,	1395	this example, it is perfectly legal for CPU 2's load from X to return 1,
1406	and CPU 3's load from X to return 0.	1396	CPU 3's load from Y to return 1, and its load from X to return 0.
1407		1397
1408	The key point is that although CPU 2's read barrier orders its pair	1398	The key point is that although CPU 2's data dependency orders its load
1409	of loads, it does not guarantee to order CPU 1's store. Therefore, if	1399	and store, it does not guarantee to order CPU 1's store. Therefore,
1410	this example runs on a system where CPUs 1 and 2 share a store buffer	1400	if this example runs on a non-multicopy-atomic system where CPUs 1 and 2
1411	or a level of cache, CPU 2 might have early access to CPU 1's writes.	1401	share a store buffer or a level of cache, CPU 2 might have early access
1412	General barriers are therefore required to ensure that all CPUs agree	1402	to CPU 1's writes. A general barrier is therefore required to ensure
1413	on the combined order of CPU 1's and CPU 2's accesses.	1403	that all CPUs agree on the combined order of CPU 1's and CPU 2's accesses.
1414		1404
1415	General barriers provide "global transitivity", so that all CPUs will	1405	General barriers can compensate not only for non-multicopy atomicity,
1416	agree on the order of operations. In contrast, a chain of release-acquire	1406	but can also generate additional ordering that can ensure that -all-
1417	pairs provides only "local transitivity", so that only those CPUs on	1407	CPUs will perceive the same order of -all- operations. In contrast, a
1418	the chain are guaranteed to agree on the combined order of the accesses.	1408	chain of release-acquire pairs do not provide this additional ordering,
1419	For example, switching to C code in deference to Herman Hollerith:	1409	which means that only those CPUs on the chain are guaranteed to agree
		1410	on the combined order of the accesses. For example, switching to C code
		1411	in deference to the ghost of Herman Hollerith:
1420		1412
1421	int u, v, x, y, z;	1413	int u, v, x, y, z;
1422		1414
@@ -1448,9 +1440,9 @@ For example, switching to C code in deference to Herman Hollerith:
1448	r3 = READ_ONCE(u);	1440	r3 = READ_ONCE(u);
1449	}	1441	}
1450		1442
1451	Because cpu0(), cpu1(), and cpu2() participate in a local transitive	1443	Because cpu0(), cpu1(), and cpu2() participate in a chain of
1452	chain of smp_store_release()/smp_load_acquire() pairs, the following	1444	smp_store_release()/smp_load_acquire() pairs, the following outcome
1453	outcome is prohibited:	1445	is prohibited:
1454		1446
1455	r0 == 1 && r1 == 1 && r2 == 1	1447	r0 == 1 && r1 == 1 && r2 == 1
1456		1448
@@ -1460,9 +1452,9 @@ outcome is prohibited:
1460		1452
1461	r1 == 1 && r5 == 0	1453	r1 == 1 && r5 == 0
1462		1454
1463	However, the transitivity of release-acquire is local to the participating	1455	However, the ordering provided by a release-acquire chain is local
1464	CPUs and does not apply to cpu3(). Therefore, the following outcome	1456	to the CPUs participating in that chain and does not apply to cpu3(),
1465	is possible:	1457	at least aside from stores. Therefore, the following outcome is possible:
1466		1458
1467	r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0	1459	r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0
1468		1460
@@ -1490,8 +1482,8 @@ following outcome is possible:
1490	Note that this outcome can happen even on a mythical sequentially	1482	Note that this outcome can happen even on a mythical sequentially
1491	consistent system where nothing is ever reordered.	1483	consistent system where nothing is ever reordered.
1492		1484
1493	To reiterate, if your code requires global transitivity, use general	1485	To reiterate, if your code requires full ordering of all operations,
1494	barriers throughout.	1486	use general barriers throughout.
1495		1487
1496		1488
1497	========================	1489	========================
@@ -3101,6 +3093,9 @@ AMD64 Architecture Programmer's Manual Volume 2: System Programming
3101	Chapter 7.1: Memory-Access Ordering	3093	Chapter 7.1: Memory-Access Ordering
3102	Chapter 7.4: Buffering and Combining Memory Writes	3094	Chapter 7.4: Buffering and Combining Memory Writes
3103		3095
		3096	ARM Architecture Reference Manual (ARMv8, for ARMv8-A architecture profile)
		3097	Chapter B2: The AArch64 Application Level Memory Model
		3098
3104	IA-32 Intel Architecture Software Developer's Manual, Volume 3:	3099	IA-32 Intel Architecture Software Developer's Manual, Volume 3:
3105	System Programming Guide	3100	System Programming Guide
3106	Chapter 7.1: Locked Atomic Operations	3101	Chapter 7.1: Locked Atomic Operations
@@ -3112,6 +3107,8 @@ The SPARC Architecture Manual, Version 9
3112	Appendix D: Formal Specification of the Memory Models	3107	Appendix D: Formal Specification of the Memory Models
3113	Appendix J: Programming with the Memory Models	3108	Appendix J: Programming with the Memory Models
3114		3109
		3110	Storage in the PowerPC (Stone and Fitzgerald)
		3111
3115	UltraSPARC Programmer Reference Manual	3112	UltraSPARC Programmer Reference Manual
3116	Chapter 5: Memory Accesses and Cacheability	3113	Chapter 5: Memory Accesses and Cacheability
3117	Chapter 15: Sparc-V9 Memory Models	3114	Chapter 15: Sparc-V9 Memory Models