diff options
author | Paul E. McKenney <paulmck@linux.vnet.ibm.com> | 2011-02-10 19:54:50 -0500 |
---|---|---|
committer | Paul E. McKenney <paulmck@linux.vnet.ibm.com> | 2011-03-04 11:05:49 -0500 |
commit | 241e6663b5151733294d1a230a3fd8a4d32e187f (patch) | |
tree | 088c44c088d13df65851a33f02f533ee3207335a | |
parent | e611eecd6f9f91d74beda8cfbb3b5e02abdeb5a1 (diff) |
smp: Document transitivity for memory barriers.
Transitivity is guaranteed only for full memory barriers (smp_mb()).
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-rw-r--r-- | Documentation/memory-barriers.txt | 58 |
1 files changed, 58 insertions, 0 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 631ad2f1b229..f0d3a8026a56 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt | |||
@@ -21,6 +21,7 @@ Contents: | |||
21 | - SMP barrier pairing. | 21 | - SMP barrier pairing. |
22 | - Examples of memory barrier sequences. | 22 | - Examples of memory barrier sequences. |
23 | - Read memory barriers vs load speculation. | 23 | - Read memory barriers vs load speculation. |
24 | - Transitivity | ||
24 | 25 | ||
25 | (*) Explicit kernel barriers. | 26 | (*) Explicit kernel barriers. |
26 | 27 | ||
@@ -959,6 +960,63 @@ the speculation will be cancelled and the value reloaded: | |||
959 | retrieved : : +-------+ | 960 | retrieved : : +-------+ |
960 | 961 | ||
961 | 962 | ||
963 | TRANSITIVITY | ||
964 | ------------ | ||
965 | |||
966 | Transitivity is a deeply intuitive notion about ordering that is not | ||
967 | always provided by real computer systems. The following example | ||
968 | demonstrates transitivity (also called "cumulativity"): | ||
969 | |||
970 | CPU 1 CPU 2 CPU 3 | ||
971 | ======================= ======================= ======================= | ||
972 | { X = 0, Y = 0 } | ||
973 | STORE X=1 LOAD X STORE Y=1 | ||
974 | <general barrier> <general barrier> | ||
975 | LOAD Y LOAD X | ||
976 | |||
977 | Suppose that CPU 2's load from X returns 1 and its load from Y returns 0. | ||
978 | This indicates that CPU 2's load from X in some sense follows CPU 1's | ||
979 | store to X and that CPU 2's load from Y in some sense preceded CPU 3's | ||
980 | store to Y. The question is then "Can CPU 3's load from X return 0?" | ||
981 | |||
982 | Because CPU 2's load from X in some sense came after CPU 1's store, it | ||
983 | is natural to expect that CPU 3's load from X must therefore return 1. | ||
984 | This expectation is an example of transitivity: if a load executing on | ||
985 | CPU A follows a load from the same variable executing on CPU B, then | ||
986 | CPU A's load must either return the same value that CPU B's load did, | ||
987 | or must return some later value. | ||
988 | |||
989 | In the Linux kernel, use of general memory barriers guarantees | ||
990 | transitivity. Therefore, in the above example, if CPU 2's load from X | ||
991 | returns 1 and its load from Y returns 0, then CPU 3's load from X must | ||
992 | also return 1. | ||
993 | |||
994 | However, transitivity is -not- guaranteed for read or write barriers. | ||
995 | For example, suppose that CPU 2's general barrier in the above example | ||
996 | is changed to a read barrier as shown below: | ||
997 | |||
998 | CPU 1 CPU 2 CPU 3 | ||
999 | ======================= ======================= ======================= | ||
1000 | { X = 0, Y = 0 } | ||
1001 | STORE X=1 LOAD X STORE Y=1 | ||
1002 | <read barrier> <general barrier> | ||
1003 | LOAD Y LOAD X | ||
1004 | |||
1005 | This substitution destroys transitivity: in this example, it is perfectly | ||
1006 | legal for CPU 2's load from X to return 1, its load from Y to return 0, | ||
1007 | and CPU 3's load from X to return 0. | ||
1008 | |||
1009 | The key point is that although CPU 2's read barrier orders its pair | ||
1010 | of loads, it does not guarantee to order CPU 1's store. Therefore, if | ||
1011 | this example runs on a system where CPUs 1 and 2 share a store buffer | ||
1012 | or a level of cache, CPU 2 might have early access to CPU 1's writes. | ||
1013 | General barriers are therefore required to ensure that all CPUs agree | ||
1014 | on the combined order of CPU 1's and CPU 2's accesses. | ||
1015 | |||
1016 | To reiterate, if your code requires transitivity, use general barriers | ||
1017 | throughout. | ||
1018 | |||
1019 | |||
962 | ======================== | 1020 | ======================== |
963 | EXPLICIT KERNEL BARRIERS | 1021 | EXPLICIT KERNEL BARRIERS |
964 | ======================== | 1022 | ======================== |