author Paul E. McKenney <paulmck@linux.vnet.ibm.com> 2016-01-15 12:30:42 -0500
committer Paul E. McKenney <paulmck@linux.vnet.ibm.com> 2016-03-14 18:52:18 -0400
commit c535cc92924baf68e238bd1b5ff8d74883f88b9b
tree 7659bf6bc6c43ba024700015f89a4b7b77b3b758 /Documentation/memory-barriers.txt
parent 92a84dd210b8263f765882d3ee1a1d5cd348c16a
documentation: Distinguish between local and global transitivity

The introduction of smp_load_acquire() and smp_store_release() had
the side effect of introducing a weaker notion of transitivity:
The transitivity of full smp_mb() barriers is global, but that
of smp_store_release()/smp_load_acquire() chains is local. This
commit therefore introduces the notion of local transitivity and
gives an example.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Diffstat (limited to 'Documentation/memory-barriers.txt')
-rw-r--r-- Documentation/memory-barriers.txt | 78
1 files changed, 76 insertions, 2 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index e9ebeb3b1077..ae9d306725ba 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1318,8 +1318,82 @@ or a level of cache, CPU 2 might have early access to CPU 1's writes.
 General barriers are therefore required to ensure that all CPUs agree
 on the combined order of CPU 1's and CPU 2's accesses.
 
-To reiterate, if your code requires transitivity, use general barriers
-throughout.
+General barriers provide "global transitivity", so that all CPUs will
+agree on the order of operations. In contrast, a chain of release-acquire
+pairs provides only "local transitivity", so that only those CPUs on
+the chain are guaranteed to agree on the combined order of the accesses.
+For example, switching to C code in deference to Herman Hollerith:
+
+	int u, v, x, y, z;
+
+	void cpu0(void)
+	{
+		r0 = smp_load_acquire(&x);
+		WRITE_ONCE(u, 1);
+		smp_store_release(&y, 1);
+	}
+
+	void cpu1(void)
+	{
+		r1 = smp_load_acquire(&y);
+		r4 = READ_ONCE(v);
+		r5 = READ_ONCE(u);
+		smp_store_release(&z, 1);
+	}
+
+	void cpu2(void)
+	{
+		r2 = smp_load_acquire(&z);
+		smp_store_release(&x, 1);
+	}
+
+	void cpu3(void)
+	{
+		WRITE_ONCE(v, 1);
+		smp_mb();
+		r3 = READ_ONCE(u);
+	}
+
+Because cpu0(), cpu1(), and cpu2() participate in a local transitive
+chain of smp_store_release()/smp_load_acquire() pairs, the following
+outcome is prohibited:
+
+	r0 == 1 && r1 == 1 && r2 == 1
+
+Furthermore, because of the release-acquire relationship between cpu0()
+and cpu1(), cpu1() must see cpu0()'s writes, so that the following
+outcome is prohibited:
+
+	r1 == 1 && r5 == 0
+
+However, the transitivity of release-acquire is local to the participating
+CPUs and does not apply to cpu3(). Therefore, the following outcome
+is possible:
+
+	r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0
+
+Although cpu0(), cpu1(), and cpu2() will see their respective reads and
+writes in order, CPUs not involved in the release-acquire chain might
+well disagree on the order. This disagreement stems from the fact that
+the weak memory-barrier instructions used to implement smp_load_acquire()
+and smp_store_release() are not required to order prior stores against
+subsequent loads in all cases. This means that cpu3() can see cpu0()'s
+store to u as happening -after- cpu1()'s load from v, even though
+both cpu0() and cpu1() agree that these two operations occurred in the
+intended order.
+
+However, please keep in mind that smp_load_acquire() is not magic.
+In particular, it simply reads from its argument with ordering. It does
+-not- ensure that any particular value will be read. Therefore, the
+following outcome is possible:
+
+	r0 == 0 && r1 == 0 && r2 == 0 && r5 == 0
+
+Note that this outcome can happen even on a mythical sequentially
+consistent system where nothing is ever reordered.
+
+To reiterate, if your code requires global transitivity, use general
+barriers throughout.
 
 
 ========================
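
Note: the four-CPU example added by this patch can be cross-checked against
a sequentially consistent (SC) baseline. The sketch below is a hypothetical
user-space model, not kernel code: it replaces every smp_load_acquire(),
smp_store_release(), READ_ONCE(), and WRITE_ONCE() with a plain atomic step,
omits smp_mb() (which does nothing extra under SC), and walks every
interleaving that respects program order. The names prog, explore(),
sc_reachable(), BIT(), LOAD(), and STORE() are assumptions made for this
illustration only. Because the model is SC-only, it can show which outcomes
need no reordering at all, but it says nothing about what weakly ordered
hardware permits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not kernel code: every access is one atomic SC step. */
enum { X, Y, Z, U, V, NVARS };           /* shared variables, all initially 0 */
enum { R0, R1, R2, R3, R4, R5, NREGS };  /* registers from the patch's example */

#define BIT(r)      (1u << (r))
#define LOAD(v, r)  { 0, (v), (r) }
#define STORE(v)    { 1, (v), 0 }        /* the example only ever stores 1 */

struct op { int is_store, var, reg; };

/* Program order of cpu0()..cpu3(); smp_mb() adds nothing under SC, so it is omitted. */
static const struct op prog[4][4] = {
	{ LOAD(X, R0), STORE(U), STORE(Y) },
	{ LOAD(Y, R1), LOAD(V, R4), LOAD(U, R5), STORE(Z) },
	{ LOAD(Z, R2), STORE(X) },
	{ STORE(V), LOAD(U, R3) },
};
static const int len[4] = { 3, 4, 2, 2 };

static uint64_t seen;  /* bit n set: register valuation n occurs in some interleaving */

/* Depth-first walk over every interleaving that respects program order. */
static void explore(int pc[4], int mem[NVARS], int reg[NREGS])
{
	int cpu, i, done = 1;

	for (cpu = 0; cpu < 4; cpu++) {
		const struct op *o;
		int *slot, old;

		if (pc[cpu] == len[cpu])
			continue;               /* this CPU has finished */
		done = 0;
		o = &prog[cpu][pc[cpu]];
		slot = o->is_store ? &mem[o->var] : &reg[o->reg];
		old = *slot;
		*slot = o->is_store ? 1 : mem[o->var];
		pc[cpu]++;
		explore(pc, mem, reg);
		pc[cpu]--;
		*slot = old;                    /* undo the step before trying the next CPU */
	}
	if (done) {
		unsigned int bits = 0;

		for (i = 0; i < NREGS; i++)
			bits |= (unsigned int)reg[i] << i;
		assert(bits < 64);              /* valuation must fit in the 64-bit set */
		seen |= UINT64_C(1) << bits;
	}
}

/*
 * Return 1 if some SC interleaving ends with the registers selected by
 * "mask" holding exactly the values encoded in "want", else 0.
 */
int sc_reachable(unsigned int mask, unsigned int want)
{
	unsigned int o;

	if (!seen) {                            /* enumerate once, on first use */
		int pc[4] = { 0 }, mem[NVARS] = { 0 }, reg[NREGS] = { 0 };

		explore(pc, mem, reg);
	}
	for (o = 0; o < 64; o++)
		if (((seen >> o) & 1) && (o & mask) == want)
			return 1;
	return 0;
}
```

For example, sc_reachable(BIT(R1) | BIT(R5), BIT(R1)) asks whether
r1 == 1 && r5 == 0 can occur. In this model the two outcomes the patch calls
prohibited are unreachable, and the all-zeros outcome
r0 == 0 && r1 == 0 && r2 == 0 && r5 == 0 is reachable, matching the patch's
remark about a sequentially consistent system. The cpu3() outcome
r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0 is also unreachable
here, which is consistent with the patch's point: that outcome is possible
only because release-acquire chains are weaker than SC for CPUs outside
the chain.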