| field | value | date |
|---|---|---|
| author | Alan Stern <stern@rowland.harvard.edu> | 2017-09-01 10:53:34 -0400 |
| committer | Paul E. McKenney <paulmck@linux.vnet.ibm.com> | 2017-10-09 17:23:37 -0400 |
| commit | 0902b1f44a72558aece92f074154044861681f84 | |
| tree | eb6505ea836b4248a6effb1788ee29a5b87e18b8 | |
| parent | f1ab25a30ce81f4e9be3cb33cd9bb9fb2db64b28 | |
memory-barriers: Rework multicopy-atomicity section
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-rw-r--r-- | Documentation/memory-barriers.txt | 58 |
1 files changed, 30 insertions, 28 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index b6882680247e..7deee1441640 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1343,13 +1343,13 @@ MULTICOPY ATOMICITY
1343 | 1343 | ||
1344 | Multicopy atomicity is a deeply intuitive notion about ordering that is | 1344 | Multicopy atomicity is a deeply intuitive notion about ordering that is |
1345 | not always provided by real computer systems, namely that a given store | 1345 | not always provided by real computer systems, namely that a given store |
1346 | is visible at the same time to all CPUs, or, alternatively, that all | 1346 | becomes visible at the same time to all CPUs, or, alternatively, that all |
1347 | CPUs agree on the order in which all stores took place. However, use of | 1347 | CPUs agree on the order in which all stores become visible. However, |
1348 | full multicopy atomicity would rule out valuable hardware optimizations, | 1348 | support of full multicopy atomicity would rule out valuable hardware |
1349 | so a weaker form called ``other multicopy atomicity'' instead guarantees | 1349 | optimizations, so a weaker form called ``other multicopy atomicity'' |
1350 | that a given store is observed at the same time by all -other- CPUs. The | 1350 | instead guarantees only that a given store becomes visible at the same |
1351 | remainder of this document discusses this weaker form, but for brevity | 1351 | time to all -other- CPUs. The remainder of this document discusses this |
1352 | will call it simply ``multicopy atomicity''. | 1352 | weaker form, but for brevity will call it simply ``multicopy atomicity''. |
1353 | 1353 | ||
1354 | The following example demonstrates multicopy atomicity: | 1354 | The following example demonstrates multicopy atomicity: |
1355 | 1355 | ||
@@ -1360,24 +1360,26 @@ The following example demonstrates multicopy atomicity:
1360 | <general barrier> <read barrier> | 1360 | <general barrier> <read barrier> |
1361 | STORE Y=r1 LOAD X | 1361 | STORE Y=r1 LOAD X |
1362 | 1362 | ||
1363 | Suppose that CPU 2's load from X returns 1 which it then stores to Y and | 1363 | Suppose that CPU 2's load from X returns 1, which it then stores to Y, |
1364 | that CPU 3's load from Y returns 1. This indicates that CPU 2's load | 1364 | and CPU 3's load from Y returns 1. This indicates that CPU 1's store |
1365 | from X in some sense follows CPU 1's store to X and that CPU 2's store | 1365 | to X precedes CPU 2's load from X and that CPU 2's store to Y precedes |
1366 | to Y in some sense preceded CPU 3's load from Y. The question is then | 1366 | CPU 3's load from Y. In addition, the memory barriers guarantee that |
1367 | "Can CPU 3's load from X return 0?" | 1367 | CPU 2 executes its load before its store, and CPU 3 loads from Y before |
| | 1368 | it loads from X. The question is then "Can CPU 3's load from X return 0?" |
1368 | 1369 | ||
1369 | Because CPU 3's load from X in some sense came after CPU 2's load, it | 1370 | Because CPU 3's load from X in some sense comes after CPU 2's load, it |
1370 | is natural to expect that CPU 3's load from X must therefore return 1. | 1371 | is natural to expect that CPU 3's load from X must therefore return 1. |
1371 | This expectation is an example of multicopy atomicity: if a load executing | 1372 | This expectation follows from multicopy atomicity: if a load executing |
1372 | on CPU A follows a load from the same variable executing on CPU B, then | 1373 | on CPU B follows a load from the same variable executing on CPU A (and |
1373 | an understandable but incorrect expectation is that CPU A's load must | 1374 | CPU A did not originally store the value which it read), then on |
1374 | either return the same value that CPU B's load did, or must return some | 1375 | multicopy-atomic systems, CPU B's load must return either the same value |
1375 | later value. | 1376 | that CPU A's load did or some later value. However, the Linux kernel |
1376 | | 1377 | does not require systems to be multicopy atomic. |
1377 | In the Linux kernel, the above use of a general memory barrier compensates | 1378 | |
1378 | for any lack of multicopy atomicity. Therefore, in the above example, | 1379 | The use of a general memory barrier in the example above compensates |
1379 | if CPU 2's load from X returns 1 and its load from Y returns 0, and CPU 3's | 1380 | for any lack of multicopy atomicity. In the example, if CPU 2's load |
1380 | load from Y returns 1, then CPU 3's load from X must also return 1. | 1381 | from X returns 1 and CPU 3's load from Y returns 1, then CPU 3's load |
| | 1382 | from X must indeed also return 1. |
1381 | 1383 | ||
1382 | However, dependencies, read barriers, and write barriers are not always | 1384 | However, dependencies, read barriers, and write barriers are not always |
1383 | able to compensate for non-multicopy atomicity. For example, suppose | 1385 | able to compensate for non-multicopy atomicity. For example, suppose |
@@ -1396,11 +1398,11 @@ this example, it is perfectly legal for CPU 2's load from X to return 1,
1396 | CPU 3's load from Y to return 1, and its load from X to return 0. | 1398 | CPU 3's load from Y to return 1, and its load from X to return 0. |
1397 | 1399 | ||
1398 | The key point is that although CPU 2's data dependency orders its load | 1400 | The key point is that although CPU 2's data dependency orders its load |
1399 | and store, it does not guarantee to order CPU 1's store. Therefore, | 1401 | and store, it does not guarantee to order CPU 1's store. Thus, if this |
1400 | if this example runs on a non-multicopy-atomic system where CPUs 1 and 2 | 1402 | example runs on a non-multicopy-atomic system where CPUs 1 and 2 share a |
1401 | share a store buffer or a level of cache, CPU 2 might have early access | 1403 | store buffer or a level of cache, CPU 2 might have early access to CPU 1's |
1402 | to CPU 1's writes. A general barrier is therefore required to ensure | 1404 | writes. General barriers are therefore required to ensure that all CPUs |
1403 | that all CPUs agree on the combined order of CPU 1's and CPU 2's accesses. | 1405 | agree on the combined order of multiple accesses. |
1404 | 1406 | ||
1405 | General barriers can compensate not only for non-multicopy atomicity, | 1407 | General barriers can compensate not only for non-multicopy atomicity, |
1406 | but can also generate additional ordering that can ensure that -all- | 1408 | but can also generate additional ordering that can ensure that -all- |
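As a rough illustration of the general-barrier example in the second hunk, the following sketch recasts the three-CPU test with C11 `<stdatomic.h>` and POSIX threads instead of kernel primitives. The mapping is an assumption of this sketch, not something the commit states: relaxed atomic loads and stores stand in for the plain accesses, `atomic_thread_fence(memory_order_seq_cst)` stands in for CPU 2's general barrier, and `atomic_thread_fence(memory_order_acquire)`, which is somewhat stronger than a read barrier, stands in for CPU 3's read barrier. The names `run_wrc`, `cpu1`, `cpu2`, `cpu3`, and `wait_for_start` are hypothetical. Under the C11 model, the fence pair makes CPU 2's load of X happen before CPU 3's load of X, and read-read coherence then rules out the outcome r1 == 1, r2 == 1, r3 == 0.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared locations from the example; reset to 0 before each run. */
static atomic_int x, y;
/* Start flag so that the three threads race within one run. */
static atomic_int go;
/* r1: CPU 2's load of X; r2, r3: CPU 3's loads of Y and X. */
static int r1, r2, r3;

static void wait_for_start(void)
{
	while (!atomic_load_explicit(&go, memory_order_acquire))
		;	/* spin until the driver releases all threads */
}

static void *cpu1(void *arg)
{
	(void)arg;
	wait_for_start();
	atomic_store_explicit(&x, 1, memory_order_relaxed);	/* STORE X=1 */
	return NULL;
}

static void *cpu2(void *arg)
{
	(void)arg;
	wait_for_start();
	r1 = atomic_load_explicit(&x, memory_order_relaxed);	/* r1=LOAD X */
	atomic_thread_fence(memory_order_seq_cst);		/* assumed <general barrier> */
	atomic_store_explicit(&y, r1, memory_order_relaxed);	/* STORE Y=r1 */
	return NULL;
}

static void *cpu3(void *arg)
{
	(void)arg;
	wait_for_start();
	r2 = atomic_load_explicit(&y, memory_order_relaxed);	/* LOAD Y */
	atomic_thread_fence(memory_order_acquire);		/* assumed <read barrier> */
	r3 = atomic_load_explicit(&x, memory_order_relaxed);	/* LOAD X */
	return NULL;
}

/*
 * Run the test `iterations` times and return how many runs produced the
 * outcome the text rules out (CPU 2 read X == 1, CPU 3 read Y == 1 and
 * then X == 0), or -1 if the threads for a run could not all be started.
 */
int run_wrc(int iterations)
{
	static void *(*const fn[3])(void *) = { cpu1, cpu2, cpu3 };
	int forbidden = 0;

	assert(iterations >= 0);
	for (int i = 0; i < iterations; i++) {
		pthread_t t[3];
		int created;

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		atomic_store(&go, 0);
		r1 = r2 = r3 = -1;

		for (created = 0; created < 3; created++)
			if (pthread_create(&t[created], NULL, fn[created], NULL) != 0)
				break;

		/* Release whichever threads started, then wait for them. */
		atomic_store_explicit(&go, 1, memory_order_release);
		for (int k = 0; k < created; k++)
			pthread_join(t[k], NULL);
		if (created < 3)
			return -1;

		if (r1 == 1 && r2 == 1 && r3 == 0)
			forbidden++;
	}
	return forbidden;
}
```

A driver whose `main()` calls `run_wrc(100000)` should get 0 back on a conforming C11 implementation (built with something like `cc -std=c11 -pthread`). A zero count is consistent with the guarantee but does not prove it. Deleting the fence in `cpu2`, so that only the data dependency from `r1` to the store remains, removes the C11 guarantee, in line with the third hunk's point that dependencies alone cannot compensate for non-multicopy atomicity.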