diff options
| author | Paul E. McKenney <paulmck@linux.vnet.ibm.com> | 2017-08-29 18:49:21 -0400 |
|---|---|---|
| committer | Paul E. McKenney <paulmck@linux.vnet.ibm.com> | 2017-10-09 17:23:36 -0400 |
| commit | f1ab25a30ce81f4e9be3cb33cd9bb9fb2db64b28 (patch) | |
| tree | 5bb7cb22d9e51dba5acd7b64bd2c00b015bf0149 | |
| parent | d3cf5176d0b17610b7c7f1562e496ec401fe01f8 (diff) | |
memory-barriers: Replace uses of "transitive"
The current version of memory-barriers.txt misuses the term "transitive",
so this commit replaces it with multi-copy atomic, also adding a
definition of this term.
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| -rw-r--r-- | Documentation/memory-barriers.txt | 185 |
1 files changed, 91 insertions, 94 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index b759a60624fd..b6882680247e 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt | |||
| @@ -53,7 +53,7 @@ CONTENTS | |||
| 53 | - SMP barrier pairing. | 53 | - SMP barrier pairing. |
| 54 | - Examples of memory barrier sequences. | 54 | - Examples of memory barrier sequences. |
| 55 | - Read memory barriers vs load speculation. | 55 | - Read memory barriers vs load speculation. |
| 56 | - Transitivity | 56 | - Multicopy atomicity. |
| 57 | 57 | ||
| 58 | (*) Explicit kernel barriers. | 58 | (*) Explicit kernel barriers. |
| 59 | 59 | ||
| @@ -635,6 +635,11 @@ can be used to record rare error conditions and the like, and the CPUs' | |||
| 635 | naturally occurring ordering prevents such records from being lost. | 635 | naturally occurring ordering prevents such records from being lost. |
| 636 | 636 | ||
| 637 | 637 | ||
| 638 | Note well that the ordering provided by a data dependency is local to | ||
| 639 | the CPU containing it. See the section on "Multicopy atomicity" for | ||
| 640 | more information. | ||
| 641 | |||
| 642 | |||
| 638 | The data dependency barrier is very important to the RCU system, | 643 | The data dependency barrier is very important to the RCU system, |
| 639 | for example. See rcu_assign_pointer() and rcu_dereference() in | 644 | for example. See rcu_assign_pointer() and rcu_dereference() in |
| 640 | include/linux/rcupdate.h. This permits the current target of an RCU'd | 645 | include/linux/rcupdate.h. This permits the current target of an RCU'd |
| @@ -851,38 +856,11 @@ In short, control dependencies apply only to the stores in the then-clause | |||
| 851 | and else-clause of the if-statement in question (including functions | 856 | and else-clause of the if-statement in question (including functions |
| 852 | invoked by those two clauses), not to code following that if-statement. | 857 | invoked by those two clauses), not to code following that if-statement. |
| 853 | 858 | ||
| 854 | Finally, control dependencies do -not- provide transitivity. This is | ||
| 855 | demonstrated by two related examples, with the initial values of | ||
| 856 | 'x' and 'y' both being zero: | ||
| 857 | |||
| 858 | CPU 0 CPU 1 | ||
| 859 | ======================= ======================= | ||
| 860 | r1 = READ_ONCE(x); r2 = READ_ONCE(y); | ||
| 861 | if (r1 > 0) if (r2 > 0) | ||
| 862 | WRITE_ONCE(y, 1); WRITE_ONCE(x, 1); | ||
| 863 | |||
| 864 | assert(!(r1 == 1 && r2 == 1)); | ||
| 865 | |||
| 866 | The above two-CPU example will never trigger the assert(). However, | ||
| 867 | if control dependencies guaranteed transitivity (which they do not), | ||
| 868 | then adding the following CPU would guarantee a related assertion: | ||
| 869 | |||
| 870 | CPU 2 | ||
| 871 | ===================== | ||
| 872 | WRITE_ONCE(x, 2); | ||
| 873 | |||
| 874 | assert(!(r1 == 2 && r2 == 1 && x == 2)); /* FAILS!!! */ | ||
| 875 | 859 | ||
| 876 | But because control dependencies do -not- provide transitivity, the above | 860 | Note well that the ordering provided by a control dependency is local |
| 877 | assertion can fail after the combined three-CPU example completes. If you | 861 | to the CPU containing it. See the section on "Multicopy atomicity" |
| 878 | need the three-CPU example to provide ordering, you will need smp_mb() | 862 | for more information. |
| 879 | between the loads and stores in the CPU 0 and CPU 1 code fragments, | ||
| 880 | that is, just before or just after the "if" statements. Furthermore, | ||
| 881 | the original two-CPU example is very fragile and should be avoided. | ||
| 882 | 863 | ||
| 883 | These two examples are the LB and WWC litmus tests from this paper: | ||
| 884 | http://www.cl.cam.ac.uk/users/pes20/ppc-supplemental/test6.pdf and this | ||
| 885 | site: https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html. | ||
| 886 | 864 | ||
| 887 | In summary: | 865 | In summary: |
| 888 | 866 | ||
| @@ -922,8 +900,8 @@ In summary: | |||
| 922 | 900 | ||
| 923 | (*) Control dependencies pair normally with other types of barriers. | 901 | (*) Control dependencies pair normally with other types of barriers. |
| 924 | 902 | ||
| 925 | (*) Control dependencies do -not- provide transitivity. If you | 903 | (*) Control dependencies do -not- provide multicopy atomicity. If you |
| 926 | need transitivity, use smp_mb(). | 904 | need all the CPUs to see a given store at the same time, use smp_mb(). |
| 927 | 905 | ||
| 928 | (*) Compilers do not understand control dependencies. It is therefore | 906 | (*) Compilers do not understand control dependencies. It is therefore |
| 929 | your job to ensure that they do not break your code. | 907 | your job to ensure that they do not break your code. |
| @@ -936,13 +914,14 @@ When dealing with CPU-CPU interactions, certain types of memory barrier should | |||
| 936 | always be paired. A lack of appropriate pairing is almost certainly an error. | 914 | always be paired. A lack of appropriate pairing is almost certainly an error. |
| 937 | 915 | ||
| 938 | General barriers pair with each other, though they also pair with most | 916 | General barriers pair with each other, though they also pair with most |
| 939 | other types of barriers, albeit without transitivity. An acquire barrier | 917 | other types of barriers, albeit without multicopy atomicity. An acquire |
| 940 | pairs with a release barrier, but both may also pair with other barriers, | 918 | barrier pairs with a release barrier, but both may also pair with other |
| 941 | including of course general barriers. A write barrier pairs with a data | 919 | barriers, including of course general barriers. A write barrier pairs |
| 942 | dependency barrier, a control dependency, an acquire barrier, a release | 920 | with a data dependency barrier, a control dependency, an acquire barrier, |
| 943 | barrier, a read barrier, or a general barrier. Similarly a read barrier, | 921 | a release barrier, a read barrier, or a general barrier. Similarly a |
| 944 | control dependency, or a data dependency barrier pairs with a write | 922 | read barrier, control dependency, or a data dependency barrier pairs |
| 945 | barrier, an acquire barrier, a release barrier, or a general barrier: | 923 | with a write barrier, an acquire barrier, a release barrier, or a |
| 924 | general barrier: | ||
| 946 | 925 | ||
| 947 | CPU 1 CPU 2 | 926 | CPU 1 CPU 2 |
| 948 | =============== =============== | 927 | =============== =============== |
| @@ -1359,64 +1338,77 @@ the speculation will be cancelled and the value reloaded: | |||
| 1359 | retrieved : : +-------+ | 1338 | retrieved : : +-------+ |
| 1360 | 1339 | ||
| 1361 | 1340 | ||
| 1362 | TRANSITIVITY | 1341 | MULTICOPY ATOMICITY |
| 1363 | ------------ | 1342 | -------------------- |
| 1343 | |||
| 1344 | Multicopy atomicity is a deeply intuitive notion about ordering that is | ||
| 1345 | not always provided by real computer systems, namely that a given store | ||
| 1346 | is visible at the same time to all CPUs, or, alternatively, that all | ||
| 1347 | CPUs agree on the order in which all stores took place. However, use of | ||
| 1348 | full multicopy atomicity would rule out valuable hardware optimizations, | ||
| 1349 | so a weaker form called ``other multicopy atomicity'' instead guarantees | ||
| 1350 | that a given store is observed at the same time by all -other- CPUs. The | ||
| 1351 | remainder of this document discusses this weaker form, but for brevity | ||
| 1352 | will call it simply ``multicopy atomicity''. | ||
| 1364 | 1353 | ||
| 1365 | Transitivity is a deeply intuitive notion about ordering that is not | 1354 | The following example demonstrates multicopy atomicity: |
| 1366 | always provided by real computer systems. The following example | ||
| 1367 | demonstrates transitivity: | ||
| 1368 | 1355 | ||
| 1369 | CPU 1 CPU 2 CPU 3 | 1356 | CPU 1 CPU 2 CPU 3 |
| 1370 | ======================= ======================= ======================= | 1357 | ======================= ======================= ======================= |
| 1371 | { X = 0, Y = 0 } | 1358 | { X = 0, Y = 0 } |
| 1372 | STORE X=1 LOAD X STORE Y=1 | 1359 | STORE X=1 r1=LOAD X (reads 1) LOAD Y (reads 1) |
| 1373 | <general barrier> <general barrier> | 1360 | <general barrier> <read barrier> |
| 1374 | LOAD Y LOAD X | 1361 | STORE Y=r1 LOAD X |
| 1375 | 1362 | ||
| 1376 | Suppose that CPU 2's load from X returns 1 and its load from Y returns 0. | 1363 | Suppose that CPU 2's load from X returns 1 which it then stores to Y and |
| 1377 | This indicates that CPU 2's load from X in some sense follows CPU 1's | 1364 | that CPU 3's load from Y returns 1. This indicates that CPU 2's load |
| 1378 | store to X and that CPU 2's load from Y in some sense preceded CPU 3's | 1365 | from X in some sense follows CPU 1's store to X and that CPU 2's store |
| 1379 | store to Y. The question is then "Can CPU 3's load from X return 0?" | 1366 | to Y in some sense preceded CPU 3's load from Y. The question is then |
| 1367 | "Can CPU 3's load from X return 0?" | ||
| 1380 | 1368 | ||
| 1381 | Because CPU 2's load from X in some sense came after CPU 1's store, it | 1369 | Because CPU 3's load from X in some sense came after CPU 2's load, it |
| 1382 | is natural to expect that CPU 3's load from X must therefore return 1. | 1370 | is natural to expect that CPU 3's load from X must therefore return 1. |
| 1383 | This expectation is an example of transitivity: if a load executing on | 1371 | This expectation is an example of multicopy atomicity: if a load executing |
| 1384 | CPU A follows a load from the same variable executing on CPU B, then | 1372 | on CPU A follows a load from the same variable executing on CPU B, then |
| 1385 | CPU A's load must either return the same value that CPU B's load did, | 1373 | an understandable but incorrect expectation is that CPU A's load must |
| 1386 | or must return some later value. | 1374 | either return the same value that CPU B's load did, or must return some |
| 1387 | 1375 | later value. | |
| 1388 | In the Linux kernel, use of general memory barriers guarantees | 1376 | |
| 1389 | transitivity. Therefore, in the above example, if CPU 2's load from X | 1377 | In the Linux kernel, the above use of a general memory barrier compensates |
| 1390 | returns 1 and its load from Y returns 0, then CPU 3's load from X must | 1378 | for any lack of multicopy atomicity. Therefore, in the above example, |
| 1391 | also return 1. | 1379 | if CPU 2's load from X returns 1 and its load from Y returns 0, and CPU 3's |
| 1392 | 1380 | load from Y returns 1, then CPU 3's load from X must also return 1. | |
| 1393 | However, transitivity is -not- guaranteed for read or write barriers. | 1381 | |
| 1394 | For example, suppose that CPU 2's general barrier in the above example | 1382 | However, dependencies, read barriers, and write barriers are not always |
| 1395 | is changed to a read barrier as shown below: | 1383 | able to compensate for non-multicopy atomicity. For example, suppose |
| 1384 | that CPU 2's general barrier is removed from the above example, leaving | ||
| 1385 | only the data dependency shown below: | ||
| 1396 | 1386 | ||
| 1397 | CPU 1 CPU 2 CPU 3 | 1387 | CPU 1 CPU 2 CPU 3 |
| 1398 | ======================= ======================= ======================= | 1388 | ======================= ======================= ======================= |
| 1399 | { X = 0, Y = 0 } | 1389 | { X = 0, Y = 0 } |
| 1400 | STORE X=1 LOAD X STORE Y=1 | 1390 | STORE X=1 r1=LOAD X (reads 1) LOAD Y (reads 1) |
| 1401 | <read barrier> <general barrier> | 1391 | <data dependency> <read barrier> |
| 1402 | LOAD Y LOAD X | 1392 | STORE Y=r1 LOAD X (reads 0) |
| 1403 | 1393 | ||
| 1404 | This substitution destroys transitivity: in this example, it is perfectly | 1394 | This substitution allows non-multicopy atomicity to run rampant: in |
| 1405 | legal for CPU 2's load from X to return 1, its load from Y to return 0, | 1395 | this example, it is perfectly legal for CPU 2's load from X to return 1, |
| 1406 | and CPU 3's load from X to return 0. | 1396 | CPU 3's load from Y to return 1, and its load from X to return 0. |
| 1407 | 1397 | ||
| 1408 | The key point is that although CPU 2's read barrier orders its pair | 1398 | The key point is that although CPU 2's data dependency orders its load |
| 1409 | of loads, it does not guarantee to order CPU 1's store. Therefore, if | 1399 | and store, it does not guarantee to order CPU 1's store. Therefore, |
| 1410 | this example runs on a system where CPUs 1 and 2 share a store buffer | 1400 | if this example runs on a non-multicopy-atomic system where CPUs 1 and 2 |
| 1411 | or a level of cache, CPU 2 might have early access to CPU 1's writes. | 1401 | share a store buffer or a level of cache, CPU 2 might have early access |
| 1412 | General barriers are therefore required to ensure that all CPUs agree | 1402 | to CPU 1's writes. A general barrier is therefore required to ensure |
| 1413 | on the combined order of CPU 1's and CPU 2's accesses. | 1403 | that all CPUs agree on the combined order of CPU 1's and CPU 2's accesses. |
| 1414 | 1404 | ||
| 1415 | General barriers provide "global transitivity", so that all CPUs will | 1405 | General barriers can compensate not only for non-multicopy atomicity, |
| 1416 | agree on the order of operations. In contrast, a chain of release-acquire | 1406 | but can also generate additional ordering that can ensure that -all- |
| 1417 | pairs provides only "local transitivity", so that only those CPUs on | 1407 | CPUs will perceive the same order of -all- operations. In contrast, a |
| 1418 | the chain are guaranteed to agree on the combined order of the accesses. | 1408 | chain of release-acquire pairs do not provide this additional ordering, |
| 1419 | For example, switching to C code in deference to Herman Hollerith: | 1409 | which means that only those CPUs on the chain are guaranteed to agree |
| 1410 | on the combined order of the accesses. For example, switching to C code | ||
| 1411 | in deference to the ghost of Herman Hollerith: | ||
| 1420 | 1412 | ||
| 1421 | int u, v, x, y, z; | 1413 | int u, v, x, y, z; |
| 1422 | 1414 | ||
| @@ -1448,9 +1440,9 @@ For example, switching to C code in deference to Herman Hollerith: | |||
| 1448 | r3 = READ_ONCE(u); | 1440 | r3 = READ_ONCE(u); |
| 1449 | } | 1441 | } |
| 1450 | 1442 | ||
| 1451 | Because cpu0(), cpu1(), and cpu2() participate in a local transitive | 1443 | Because cpu0(), cpu1(), and cpu2() participate in a chain of |
| 1452 | chain of smp_store_release()/smp_load_acquire() pairs, the following | 1444 | smp_store_release()/smp_load_acquire() pairs, the following outcome |
| 1453 | outcome is prohibited: | 1445 | is prohibited: |
| 1454 | 1446 | ||
| 1455 | r0 == 1 && r1 == 1 && r2 == 1 | 1447 | r0 == 1 && r1 == 1 && r2 == 1 |
| 1456 | 1448 | ||
| @@ -1460,9 +1452,9 @@ outcome is prohibited: | |||
| 1460 | 1452 | ||
| 1461 | r1 == 1 && r5 == 0 | 1453 | r1 == 1 && r5 == 0 |
| 1462 | 1454 | ||
| 1463 | However, the transitivity of release-acquire is local to the participating | 1455 | However, the ordering provided by a release-acquire chain is local |
| 1464 | CPUs and does not apply to cpu3(). Therefore, the following outcome | 1456 | to the CPUs participating in that chain and does not apply to cpu3(), |
| 1465 | is possible: | 1457 | at least aside from stores. Therefore, the following outcome is possible: |
| 1466 | 1458 | ||
| 1467 | r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0 | 1459 | r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0 |
| 1468 | 1460 | ||
| @@ -1490,8 +1482,8 @@ following outcome is possible: | |||
| 1490 | Note that this outcome can happen even on a mythical sequentially | 1482 | Note that this outcome can happen even on a mythical sequentially |
| 1491 | consistent system where nothing is ever reordered. | 1483 | consistent system where nothing is ever reordered. |
| 1492 | 1484 | ||
| 1493 | To reiterate, if your code requires global transitivity, use general | 1485 | To reiterate, if your code requires full ordering of all operations, |
| 1494 | barriers throughout. | 1486 | use general barriers throughout. |
| 1495 | 1487 | ||
| 1496 | 1488 | ||
| 1497 | ======================== | 1489 | ======================== |
| @@ -3101,6 +3093,9 @@ AMD64 Architecture Programmer's Manual Volume 2: System Programming | |||
| 3101 | Chapter 7.1: Memory-Access Ordering | 3093 | Chapter 7.1: Memory-Access Ordering |
| 3102 | Chapter 7.4: Buffering and Combining Memory Writes | 3094 | Chapter 7.4: Buffering and Combining Memory Writes |
| 3103 | 3095 | ||
| 3096 | ARM Architecture Reference Manual (ARMv8, for ARMv8-A architecture profile) | ||
| 3097 | Chapter B2: The AArch64 Application Level Memory Model | ||
| 3098 | |||
| 3104 | IA-32 Intel Architecture Software Developer's Manual, Volume 3: | 3099 | IA-32 Intel Architecture Software Developer's Manual, Volume 3: |
| 3105 | System Programming Guide | 3100 | System Programming Guide |
| 3106 | Chapter 7.1: Locked Atomic Operations | 3101 | Chapter 7.1: Locked Atomic Operations |
| @@ -3112,6 +3107,8 @@ The SPARC Architecture Manual, Version 9 | |||
| 3112 | Appendix D: Formal Specification of the Memory Models | 3107 | Appendix D: Formal Specification of the Memory Models |
| 3113 | Appendix J: Programming with the Memory Models | 3108 | Appendix J: Programming with the Memory Models |
| 3114 | 3109 | ||
| 3110 | Storage in the PowerPC (Stone and Fitzgerald) | ||
| 3111 | |||
| 3115 | UltraSPARC Programmer Reference Manual | 3112 | UltraSPARC Programmer Reference Manual |
| 3116 | Chapter 5: Memory Accesses and Cacheability | 3113 | Chapter 5: Memory Accesses and Cacheability |
| 3117 | Chapter 15: Sparc-V9 Memory Models | 3114 | Chapter 15: Sparc-V9 Memory Models |
