author    Alan Stern <stern@rowland.harvard.edu>  2017-09-01 10:53:34 -0400
committer Paul E. McKenney <paulmck@linux.vnet.ibm.com>  2017-10-09 17:23:37 -0400
commit    0902b1f44a72558aece92f074154044861681f84
tree      eb6505ea836b4248a6effb1788ee29a5b87e18b8
parent    f1ab25a30ce81f4e9be3cb33cd9bb9fb2db64b28

memory-barriers: Rework multicopy-atomicity section

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

-rw-r--r-- Documentation/memory-barriers.txt | 58
1 files changed, 30 insertions, 28 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index b6882680247e..7deee1441640 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1343,13 +1343,13 @@ MULTICOPY ATOMICITY
 
 Multicopy atomicity is a deeply intuitive notion about ordering that is
 not always provided by real computer systems, namely that a given store
-is visible at the same time to all CPUs, or, alternatively, that all
-CPUs agree on the order in which all stores took place. However, use of
-full multicopy atomicity would rule out valuable hardware optimizations,
-so a weaker form called ``other multicopy atomicity'' instead guarantees
-that a given store is observed at the same time by all -other- CPUs. The
-remainder of this document discusses this weaker form, but for brevity
-will call it simply ``multicopy atomicity''.
+becomes visible at the same time to all CPUs, or, alternatively, that all
+CPUs agree on the order in which all stores become visible. However,
+support of full multicopy atomicity would rule out valuable hardware
+optimizations, so a weaker form called ``other multicopy atomicity''
+instead guarantees only that a given store becomes visible at the same
+time to all -other- CPUs. The remainder of this document discusses this
+weaker form, but for brevity will call it simply ``multicopy atomicity''.
 
 The following example demonstrates multicopy atomicity:
 
@@ -1360,24 +1360,26 @@ The following example demonstrates multicopy atomicity:
 				<general barrier>	<read barrier>
 				STORE Y=r1		LOAD X
 
-Suppose that CPU 2's load from X returns 1 which it then stores to Y and
-that CPU 3's load from Y returns 1. This indicates that CPU 2's load
-from X in some sense follows CPU 1's store to X and that CPU 2's store
-to Y in some sense preceded CPU 3's load from Y. The question is then
-"Can CPU 3's load from X return 0?"
+Suppose that CPU 2's load from X returns 1, which it then stores to Y,
+and CPU 3's load from Y returns 1. This indicates that CPU 1's store
+to X precedes CPU 2's load from X and that CPU 2's store to Y precedes
+CPU 3's load from Y. In addition, the memory barriers guarantee that
+CPU 2 executes its load before its store, and CPU 3 loads from Y before
+it loads from X. The question is then "Can CPU 3's load from X return 0?"
 
-Because CPU 3's load from X in some sense came after CPU 2's load, it
+Because CPU 3's load from X in some sense comes after CPU 2's load, it
 is natural to expect that CPU 3's load from X must therefore return 1.
-This expectation is an example of multicopy atomicity: if a load executing
-on CPU A follows a load from the same variable executing on CPU B, then
-an understandable but incorrect expectation is that CPU A's load must
-either return the same value that CPU B's load did, or must return some
-later value.
-
-In the Linux kernel, the above use of a general memory barrier compensates
-for any lack of multicopy atomicity. Therefore, in the above example,
-if CPU 2's load from X returns 1 and its load from Y returns 0, and CPU 3's
-load from Y returns 1, then CPU 3's load from X must also return 1.
+This expectation follows from multicopy atomicity: if a load executing
+on CPU B follows a load from the same variable executing on CPU A (and
+CPU A did not originally store the value which it read), then on
+multicopy-atomic systems, CPU B's load must return either the same value
+that CPU A's load did or some later value. However, the Linux kernel
+does not require systems to be multicopy atomic.
+
+The use of a general memory barrier in the example above compensates
+for any lack of multicopy atomicity. In the example, if CPU 2's load
+from X returns 1 and CPU 3's load from Y returns 1, then CPU 3's load
+from X must indeed also return 1.
 
 However, dependencies, read barriers, and write barriers are not always
 able to compensate for non-multicopy atomicity. For example, suppose
@@ -1396,11 +1398,11 @@ this example, it is perfectly legal for CPU 2's load from X to return 1,
 CPU 3's load from Y to return 1, and its load from X to return 0.
 
 The key point is that although CPU 2's data dependency orders its load
-and store, it does not guarantee to order CPU 1's store. Therefore,
-if this example runs on a non-multicopy-atomic system where CPUs 1 and 2
-share a store buffer or a level of cache, CPU 2 might have early access
-to CPU 1's writes. A general barrier is therefore required to ensure
-that all CPUs agree on the combined order of CPU 1's and CPU 2's accesses.
+and store, it does not guarantee to order CPU 1's store. Thus, if this
+example runs on a non-multicopy-atomic system where CPUs 1 and 2 share a
+store buffer or a level of cache, CPU 2 might have early access to CPU 1's
+writes. General barriers are therefore required to ensure that all CPUs
+agree on the combined order of multiple accesses.
 
 General barriers can compensate not only for non-multicopy atomicity,
 but can also generate additional ordering that can ensure that -all-
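
Illustration (not part of the patch). The two three-CPU examples discussed
in the reworked text can be explored with a small exhaustive search over a
toy propagation model. This is only a sketch under stated assumptions, not
the kernel's memory model: each CPU keeps its own view of X and Y, a store
reaches every other CPU at an independent time (non-multicopy atomicity),
CPU 3's read barrier is modeled by running its loads in program order, and
CPU 2's general barrier is modeled by delaying its store to Y until every
store CPU 2 has already observed is visible to all CPUs. The names
freeze(), store(), and explore() are hypothetical helpers made up for
this sketch.

```python
CPUS = (1, 2, 3)


def freeze(view):
    """Turn a view dict into a hashable tuple so states can be deduplicated."""
    return tuple(sorted(view.items()))


def store(view, pending, cpu, var, val):
    """Apply a store locally and queue its delivery to every other CPU."""
    new_view = dict(view)
    new_view[(cpu, var)] = val
    queued = frozenset((dest, var, val) for dest in CPUS if dest != cpu)
    return new_view, pending | queued


def explore(general_barrier):
    """Return every reachable (r1, r2, r3) outcome of the three-CPU example.

    r1 is CPU 2's load from X, r2 is CPU 3's load from Y, and r3 is
    CPU 3's load from X.  With general_barrier=False, CPU 2 has only the
    data dependency between its load and its store.
    """
    view0 = {(cpu, var): 0 for cpu in CPUS for var in "XY"}
    stack = [(False, 0, 0, freeze(view0), frozenset(), (None, None, None))]
    seen, outcomes = set(), set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        done1, pc2, pc3, fview, pending, regs = state
        r1, r2, r3 = regs
        view = dict(fview)
        if done1 and pc2 == 2 and pc3 == 2:
            outcomes.add(regs)
        # Any queued store may reach its destination at any time (non-MCA).
        for msg in pending:
            dest, var, val = msg
            delivered = dict(view)
            delivered[(dest, var)] = val
            stack.append((done1, pc2, pc3, freeze(delivered),
                          pending - {msg}, regs))
        # CPU 1: STORE X=1
        if not done1:
            new_view, new_pending = store(view, pending, 1, "X", 1)
            stack.append((True, pc2, pc3, freeze(new_view), new_pending, regs))
        # CPU 2: r1=LOAD X, then (barrier or dependency), then STORE Y=r1
        if pc2 == 0:
            stack.append((done1, 1, pc3, fview, pending,
                          (view[(2, "X")], r2, r3)))
        elif pc2 == 1:
            # Assumed barrier model: wait until no store that CPU 2 has
            # already observed is still in flight to some other CPU.
            blocked = general_barrier and any(
                view[(2, var)] == val for _, var, val in pending)
            if not blocked:
                new_view, new_pending = store(view, pending, 2, "Y", r1)
                stack.append((done1, 2, pc3, freeze(new_view),
                              new_pending, regs))
        # CPU 3: LOAD Y, <read barrier>, LOAD X (always in program order)
        if pc3 == 0:
            stack.append((done1, pc2, 1, fview, pending,
                          (r1, view[(3, "Y")], r3)))
        elif pc3 == 1:
            stack.append((done1, pc2, 2, fview, pending,
                          (r1, r2, view[(3, "X")])))
    return outcomes
```

In this toy model, (1, 1, 0) is among the outcomes of explore(False) but
not among those of explore(True), matching the text's conclusion that the
data dependency alone allows CPU 3's load from X to return 0 while the
general barrier forbids it. The model is deliberately cruder than real
hardware or the kernel's formal memory model.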