author		Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-01-15 12:30:42 -0500
committer	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-14 18:52:18 -0400
commit		c535cc92924baf68e238bd1b5ff8d74883f88b9b (patch)
tree		7659bf6bc6c43ba024700015f89a4b7b77b3b758	/Documentation/memory-barriers.txt
parent		92a84dd210b8263f765882d3ee1a1d5cd348c16a (diff)
documentation: Distinguish between local and global transitivity
The introduction of smp_load_acquire() and smp_store_release() had
the side effect of introducing a weaker notion of transitivity:
The transitivity of full smp_mb() barriers is global, but that
of smp_store_release()/smp_load_acquire() chains is local. This
commit therefore introduces the notion of local transitivity and
gives an example.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Diffstat (limited to 'Documentation/memory-barriers.txt')
-rw-r--r--	Documentation/memory-barriers.txt	78
1 file changed, 76 insertions(+), 2 deletions(-)
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index e9ebeb3b1077..ae9d306725ba 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1318,8 +1318,82 @@ or a level of cache, CPU 2 might have early access to CPU 1's writes.
 General barriers are therefore required to ensure that all CPUs agree
 on the combined order of CPU 1's and CPU 2's accesses.
 
-To reiterate, if your code requires transitivity, use general barriers
-throughout.
+General barriers provide "global transitivity", so that all CPUs will
+agree on the order of operations.  In contrast, a chain of release-acquire
+pairs provides only "local transitivity", so that only those CPUs on
+the chain are guaranteed to agree on the combined order of the accesses.
+For example, switching to C code in deference to Herman Hollerith:
+
+	int u, v, x, y, z;
+
+	void cpu0(void)
+	{
+		r0 = smp_load_acquire(&x);
+		WRITE_ONCE(u, 1);
+		smp_store_release(&y, 1);
+	}
+
+	void cpu1(void)
+	{
+		r1 = smp_load_acquire(&y);
+		r4 = READ_ONCE(v);
+		r5 = READ_ONCE(u);
+		smp_store_release(&z, 1);
+	}
+
+	void cpu2(void)
+	{
+		r2 = smp_load_acquire(&z);
+		smp_store_release(&x, 1);
+	}
+
+	void cpu3(void)
+	{
+		WRITE_ONCE(v, 1);
+		smp_mb();
+		r3 = READ_ONCE(u);
+	}
+
+Because cpu0(), cpu1(), and cpu2() participate in a local transitive
+chain of smp_store_release()/smp_load_acquire() pairs, the following
+outcome is prohibited:
+
+	r0 == 1 && r1 == 1 && r2 == 1
+
+Furthermore, because of the release-acquire relationship between cpu0()
+and cpu1(), cpu1() must see cpu0()'s writes, so that the following
+outcome is prohibited:
+
+	r1 == 1 && r5 == 0
+
+However, the transitivity of release-acquire is local to the participating
+CPUs and does not apply to cpu3().  Therefore, the following outcome
+is possible:
+
+	r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0
+
+Although cpu0(), cpu1(), and cpu2() will see their respective reads and
+writes in order, CPUs not involved in the release-acquire chain might
+well disagree on the order.  This disagreement stems from the fact that
+the weak memory-barrier instructions used to implement smp_load_acquire()
+and smp_store_release() are not required to order prior stores against
+subsequent loads in all cases.  This means that cpu3() can see cpu0()'s
+store to u as happening -after- cpu1()'s load from v, even though
+both cpu0() and cpu1() agree that these two operations occurred in the
+intended order.
+
+However, please keep in mind that smp_load_acquire() is not magic.
+In particular, it simply reads from its argument with ordering.  It does
+-not- ensure that any particular value will be read.  Therefore, the
+following outcome is possible:
+
+	r0 == 0 && r1 == 0 && r2 == 0 && r5 == 0
+
+Note that this outcome can happen even on a mythical sequentially
+consistent system where nothing is ever reordered.
+
+To reiterate, if your code requires global transitivity, use general
+barriers throughout.
 
 
 ========================