documentation: Distinguish between local and global transitivity

The introduction of smp_load_acquire() and smp_store_release() had
the side effect of introducing a weaker notion of transitivity:
The transitivity of full smp_mb() barriers is global, but that
of smp_store_release()/smp_load_acquire() chains is local.  This
commit therefore introduces the notion of local transitivity and
gives an example.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> 2016-01-15 12:30:42 -0500
committer: Paul E. McKenney <paulmck@linux.vnet.ibm.com> 2016-03-14 18:52:18 -0400
commit: c535cc92924baf68e238bd1b5ff8d74883f88b9b (patch)
tree: 7659bf6bc6c43ba024700015f89a4b7b77b3b758 /Documentation/memory-barriers.txt
parent: 92a84dd210b8263f765882d3ee1a1d5cd348c16a (diff)
1 files changed, 76 insertions, 2 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index e9ebeb3b1077..ae9d306725ba 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1318,8 +1318,82 @@ or a level of cache, CPU 2 might have early access to CPU 1's writes.
 General barriers are therefore required to ensure that all CPUs agree
 on the combined order of CPU 1's and CPU 2's accesses.
 
-To reiterate, if your code requires transitivity, use general barriers
-throughout.
+General barriers provide "global transitivity", so that all CPUs will
+agree on the order of operations.  In contrast, a chain of release-acquire
+pairs provides only "local transitivity", so that only those CPUs on
+the chain are guaranteed to agree on the combined order of the accesses.
+For example, switching to C code in deference to Herman Hollerith:
+
+        int u, v, x, y, z;
+
+        void cpu0(void)
+        {
+                r0 = smp_load_acquire(&x);
+                WRITE_ONCE(u, 1);
+                smp_store_release(&y, 1);
+        }
+
+        void cpu1(void)
+        {
+                r1 = smp_load_acquire(&y);
+                r4 = READ_ONCE(v);
+                r5 = READ_ONCE(u);
+                smp_store_release(&z, 1);
+        }
+
+        void cpu2(void)
+        {
+                r2 = smp_load_acquire(&z);
+                smp_store_release(&x, 1);
+        }
+
+        void cpu3(void)
+        {
+                WRITE_ONCE(v, 1);
+                smp_mb();
+                r3 = READ_ONCE(u);
+        }
+
+Because cpu0(), cpu1(), and cpu2() participate in a local transitive
+chain of smp_store_release()/smp_load_acquire() pairs, the following
+outcome is prohibited:
+
+        r0 == 1 && r1 == 1 && r2 == 1
+
+Furthermore, because of the release-acquire relationship between cpu0()
+and cpu1(), cpu1() must see cpu0()'s writes, so that the following
+outcome is prohibited:
+
+        r1 == 1 && r5 == 0
+
+However, the transitivity of release-acquire is local to the participating
+CPUs and does not apply to cpu3().  Therefore, the following outcome
+is possible:
+
+        r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0
+
+Although cpu0(), cpu1(), and cpu2() will see their respective reads and
+writes in order, CPUs not involved in the release-acquire chain might
+well disagree on the order.  This disagreement stems from the fact that
+the weak memory-barrier instructions used to implement smp_load_acquire()
+and smp_store_release() are not required to order prior stores against
+subsequent loads in all cases.  This means that cpu3() can see cpu0()'s
+store to u as happening -after- cpu1()'s load from v, even though
+both cpu0() and cpu1() agree that these two operations occurred in the
+intended order.
+
+However, please keep in mind that smp_load_acquire() is not magic.
+In particular, it simply reads from its argument with ordering.  It does
+-not- ensure that any particular value will be read.  Therefore, the
+following outcome is possible:
+
+        r0 == 0 && r1 == 0 && r2 == 0 && r5 == 0
+
+Note that this outcome can happen even on a mythical sequentially
+consistent system where nothing is ever reordered.
+
+To reiterate, if your code requires global transitivity, use general
+barriers throughout.
 
 
 ========================
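
The four outcomes discussed in the patch can be sanity-checked against a
sequentially consistent (SC) model.  What follows is only a minimal
userspace sketch and is not part of the patch: struct state, struct
outcomes, step(), explore(), and explore_sc() are hypothetical names made
up for illustration, and every kernel primitive is modeled as a plain
load or store executed in program order, with smp_mb() dropped because
it is a no-op under SC.  The sketch enumerates every SC interleaving of
cpu0() through cpu3() and records which outcomes are reachable.  Because
it models SC only, it cannot exhibit weak-memory behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Shared variables, registers r0..r5, and one program counter per CPU. */
struct state {
	int u, v, x, y, z;
	int r[6];
	int pc[4];
};

/* Which of the outcomes discussed in the text turned out to be reachable. */
struct outcomes {
	bool cycle;	/* r0 == 1 && r1 == 1 && r2 == 1 */
	bool stale_u;	/* r1 == 1 && r5 == 0 */
	bool nonlocal;	/* r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0 */
	bool all_zero;	/* r0 == 0 && r1 == 0 && r2 == 0 && r5 == 0 */
};

/* Operations per CPU.  Assumption: smp_mb() is omitted (no-op under SC). */
static const int nops[4] = { 3, 4, 2, 2 };

/* Execute the next operation of the given CPU in program order. */
static void step(struct state *s, int cpu)
{
	switch (cpu * 10 + s->pc[cpu]++) {
	case  0: s->r[0] = s->x; break;	/* r0 = smp_load_acquire(&x) */
	case  1: s->u = 1; break;	/* WRITE_ONCE(u, 1) */
	case  2: s->y = 1; break;	/* smp_store_release(&y, 1) */
	case 10: s->r[1] = s->y; break;	/* r1 = smp_load_acquire(&y) */
	case 11: s->r[4] = s->v; break;	/* r4 = READ_ONCE(v) */
	case 12: s->r[5] = s->u; break;	/* r5 = READ_ONCE(u) */
	case 13: s->z = 1; break;	/* smp_store_release(&z, 1) */
	case 20: s->r[2] = s->z; break;	/* r2 = smp_load_acquire(&z) */
	case 21: s->x = 1; break;	/* smp_store_release(&x, 1) */
	case 30: s->v = 1; break;	/* WRITE_ONCE(v, 1) */
	case 31: s->r[3] = s->u; break;	/* r3 = READ_ONCE(u) */
	default: assert(0);		/* no such operation */
	}
}

/* Depth-first search over all interleavings; s is copied per branch. */
static void explore(struct state s, struct outcomes *o)
{
	bool finished = true;

	for (int cpu = 0; cpu < 4; cpu++) {
		if (s.pc[cpu] < nops[cpu]) {
			struct state next = s;

			step(&next, cpu);
			explore(next, o);
			finished = false;
		}
	}
	if (!finished)
		return;

	/* All CPUs have run to completion: classify the final registers. */
	const int *r = s.r;

	if (r[0] == 1 && r[1] == 1 && r[2] == 1)
		o->cycle = true;
	if (r[1] == 1 && r[5] == 0)
		o->stale_u = true;
	if (r[0] == 0 && r[1] == 1 && r[2] == 1 && r[3] == 0 && r[4] == 0)
		o->nonlocal = true;
	if (r[0] == 0 && r[1] == 0 && r[2] == 0 && r[5] == 0)
		o->all_zero = true;
}

/* Run the search from the all-zero initial state. */
struct outcomes explore_sc(void)
{
	struct state init = { 0 };
	struct outcomes o = { false, false, false, false };

	explore(init, &o);
	return o;
}
```

Calling explore_sc() from a main() of your choosing and inspecting the
returned flags should show that only all_zero is reachable under SC.
The two prohibited outcomes stay unreachable, as expected, and so does
the outcome involving r3 and r4, which is consistent with the patch's
point that this outcome arises only because release-acquire chains do
not provide global transitivity on weakly ordered systems.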