Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/memory-barriers.txt | 98
1 files changed, 49 insertions, 49 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 58408dd023c7..650657c54733 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -24,7 +24,7 @@ Contents:
  (*) Explicit kernel barriers.
 
      - Compiler barrier.
-     - The CPU memory barriers.
+     - CPU memory barriers.
      - MMIO write barrier.
 
  (*) Implicit kernel memory barriers.
@@ -265,7 +265,7 @@ Memory barriers are such interventions. They impose a perceived partial
 ordering over the memory operations on either side of the barrier.
 
 Such enforcement is important because the CPUs and other devices in a system
-can use a variety of tricks to improve performance - including reordering,
+can use a variety of tricks to improve performance, including reordering,
 deferral and combination of memory operations; speculative loads; speculative
 branch prediction and various types of caching. Memory barriers are used to
 override or suppress these tricks, allowing the code to sanely control the
@@ -457,7 +457,7 @@ sequence, Q must be either &A or &B, and that:
         (Q == &A) implies (D == 1)
         (Q == &B) implies (D == 4)
 
-But! CPU 2's perception of P may be updated _before_ its perception of B, thus
+But! CPU 2's perception of P may be updated _before_ its perception of B, thus
 leading to the following situation:
 
         (Q == &B) and (D == 2) ????
@@ -573,7 +573,7 @@ Basically, the read barrier always has to be there, even though it can be of
 the "weaker" type.
 
 [!] Note that the stores before the write barrier would normally be expected to
-match the loads after the read barrier or data dependency barrier, and vice
+match the loads after the read barrier or the data dependency barrier, and vice
 versa:
 
         CPU 1                           CPU 2
@@ -588,7 +588,7 @@ versa:
 EXAMPLES OF MEMORY BARRIER SEQUENCES
 ------------------------------------
 
-Firstly, write barriers act as a partial orderings on store operations.
+Firstly, write barriers act as partial orderings on store operations.
 Consider the following sequence of events:
 
         CPU 1
@@ -608,15 +608,15 @@ STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E
         +-------+       :      :
         |       |       +------+
         |       |------>| C=3  |     }     /\
-        |       |  :    +------+     }-----  \  -----> Events perceptible
-        |       |  :    | A=1  |     }        \/       to rest of system
+        |       |  :    +------+     }-----  \  -----> Events perceptible to
+        |       |  :    | A=1  |     }        \/       the rest of the system
         |       |  :    +------+     }
         | CPU 1 |  :    | B=2  |     }
         |       |       +------+     }
         |       |   wwwwwwwwwwwwwwww }   <--- At this point the write barrier
         |       |       +------+     }        requires all stores prior to the
         |       |  :    | E=5  |     }        barrier to be committed before
-        |       |  :    +------+     }        further stores may be take place.
+        |       |  :    +------+     }        further stores may take place
         |       |------>| D=4  |     }
         |       |       +------+
         +-------+       :      :
@@ -626,7 +626,7 @@ STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E
                            V
 
 
-Secondly, data dependency barriers act as a partial orderings on data-dependent
+Secondly, data dependency barriers act as partial orderings on data-dependent
 loads. Consider the following sequence of events:
 
         CPU 1                           CPU 2
@@ -975,7 +975,7 @@ compiler from moving the memory accesses either side of it to the other side:
 
         barrier();
 
-This a general barrier - lesser varieties of compiler barrier do not exist.
+This is a general barrier - lesser varieties of compiler barrier do not exist.
 
 The compiler barrier has no direct effect on the CPU, which may then reorder
 things however it wishes.
@@ -997,7 +997,7 @@ The Linux kernel has eight basic CPU memory barriers:
 All CPU memory barriers unconditionally imply compiler barriers.
 
 SMP memory barriers are reduced to compiler barriers on uniprocessor compiled
-systems because it is assumed that a CPU will be appear to be self-consistent,
+systems because it is assumed that a CPU will appear to be self-consistent,
 and will order overlapping accesses correctly with respect to itself.
 
 [!] Note that SMP memory barriers _must_ be used to control the ordering of
@@ -1146,9 +1146,9 @@ for each construct. These operations all imply certain barriers:
 Therefore, from (1), (2) and (4) an UNLOCK followed by an unconditional LOCK is
 equivalent to a full barrier, but a LOCK followed by an UNLOCK is not.
 
-[!] Note: one of the consequence of LOCKs and UNLOCKs being only one-way
-    barriers is that the effects instructions outside of a critical section may
-    seep into the inside of the critical section.
+[!] Note: one of the consequences of LOCKs and UNLOCKs being only one-way
+    barriers is that the effects of instructions outside of a critical section
+    may seep into the inside of the critical section.
 
 A LOCK followed by an UNLOCK may not be assumed to be full memory barrier
 because it is possible for an access preceding the LOCK to happen after the
@@ -1239,7 +1239,7 @@ three CPUs; then should the following sequence of events occur:
         UNLOCK M                UNLOCK Q
         *D = d;                 *H = h;
 
-Then there is no guarantee as to what order CPU #3 will see the accesses to *A
+Then there is no guarantee as to what order CPU 3 will see the accesses to *A
 through *H occur in, other than the constraints imposed by the separate locks
 on the separate CPUs. It might, for example, see:
 
@@ -1269,12 +1269,12 @@ However, if the following occurs:
                                         UNLOCK M [2]
                                         *H = h;
 
-CPU #3 might see:
+CPU 3 might see:
 
         *E, LOCK M [1], *C, *B, *A, UNLOCK M [1],
                 LOCK M [2], *H, *F, *G, UNLOCK M [2], *D
 
-But assuming CPU #1 gets the lock first, it won't see any of:
+But assuming CPU 1 gets the lock first, CPU 3 won't see any of:
 
         *B, *C, *D, *F, *G or *H preceding LOCK M [1]
         *A, *B or *C following UNLOCK M [1]
@@ -1327,12 +1327,12 @@ spinlock, for example:
                                         mmiowb();
                                         spin_unlock(Q);
 
-this will ensure that the two stores issued on CPU #1 appear at the PCI bridge
-before either of the stores issued on CPU #2.
+this will ensure that the two stores issued on CPU 1 appear at the PCI bridge
+before either of the stores issued on CPU 2.
 
 
-Furthermore, following a store by a load to the same device obviates the need
-for an mmiowb(), because the load forces the store to complete before the load
+Furthermore, following a store by a load from the same device obviates the need
+for the mmiowb(), because the load forces the store to complete before the load
 is performed:
 
         CPU 1                           CPU 2
@@ -1363,7 +1363,7 @@ circumstances in which reordering definitely _could_ be a problem:
 
  (*) Atomic operations.
 
- (*) Accessing devices (I/O).
+ (*) Accessing devices.
 
  (*) Interrupts.
 
@@ -1399,7 +1399,7 @@ To wake up a particular waiter, the up_read() or up_write() functions have to:
  (1) read the next pointer from this waiter's record to know as to where the
      next waiter record is;
 
- (4) read the pointer to the waiter's task structure;
+ (2) read the pointer to the waiter's task structure;
 
  (3) clear the task pointer to tell the waiter it has been given the semaphore;
 
@@ -1407,7 +1407,7 @@ To wake up a particular waiter, the up_read() or up_write() functions have to:
 
  (5) release the reference held on the waiter's task struct.
 
-In otherwords, it has to perform this sequence of events:
+In other words, it has to perform this sequence of events:
 
         LOAD waiter->list.next;
         LOAD waiter->task;
@@ -1502,7 +1502,7 @@ operations and adjusting reference counters towards object destruction, and as
 such the implicit memory barrier effects are necessary.
 
 
-The following operation are potential problems as they do _not_ imply memory
+The following operations are potential problems as they do _not_ imply memory
 barriers, but might be used for implementing such things as UNLOCK-class
 operations:
 
@@ -1517,7 +1517,7 @@ With these the appropriate explicit memory barrier should be used if necessary
 
 The following also do _not_ imply memory barriers, and so may require explicit
 memory barriers under some circumstances (smp_mb__before_atomic_dec() for
-instance)):
+instance):
 
         atomic_add();
         atomic_sub();
@@ -1641,8 +1641,8 @@ functions:
      indeed have special I/O space access cycles and instructions, but many
      CPUs don't have such a concept.
 
-     The PCI bus, amongst others, defines an I/O space concept - which on such
-     CPUs as i386 and x86_64 cpus readily maps to the CPU's concept of I/O
+     The PCI bus, amongst others, defines an I/O space concept which - on such
+     CPUs as i386 and x86_64 - readily maps to the CPU's concept of I/O
      space. However, it may also be mapped as a virtual I/O space in the CPU's
      memory map, particularly on those CPUs that don't support alternate I/O
      spaces.
@@ -1664,7 +1664,7 @@ functions:
      i386 architecture machines, for example, this is controlled by way of the
      MTRR registers.
 
-     Ordinarily, these will be guaranteed to be fully ordered and uncombined,,
+     Ordinarily, these will be guaranteed to be fully ordered and uncombined,
      provided they're not accessing a prefetchable device.
 
      However, intermediary hardware (such as a PCI bridge) may indulge in
@@ -1689,7 +1689,7 @@ functions:
 
  (*) ioreadX(), iowriteX()
 
-     These will perform as appropriate for the type of access they're actually
+     These will perform appropriately for the type of access they're actually
      doing, be it inX()/outX() or readX()/writeX().
 
 
@@ -1705,7 +1705,7 @@ of arch-specific code.
 
 This means that it must be considered that the CPU will execute its instruction
 stream in any order it feels like - or even in parallel - provided that if an
-instruction in the stream depends on the an earlier instruction, then that
+instruction in the stream depends on an earlier instruction, then that
 earlier instruction must be sufficiently complete[*] before the later
 instruction may proceed; in other words: provided that the appearance of
 causality is maintained.
@@ -1795,8 +1795,8 @@ eventually become visible on all CPUs, there's no guarantee that they will
 become apparent in the same order on those other CPUs.
 
 
-Consider dealing with a system that has pair of CPUs (1 & 2), each of which has
-a pair of parallel data caches (CPU 1 has A/B, and CPU 2 has C/D):
+Consider dealing with a system that has a pair of CPUs (1 & 2), each of which
+has a pair of parallel data caches (CPU 1 has A/B, and CPU 2 has C/D):
 
                      :
                      :                          +--------+
@@ -1835,7 +1835,7 @@ Imagine the system has the following properties:
 
  (*) the coherency queue is not flushed by normal loads to lines already
      present in the cache, even though the contents of the queue may
-     potentially effect those loads.
+     potentially affect those loads.
 
 Imagine, then, that two writes are made on the first CPU, with a write barrier
 between them to guarantee that they will appear to reach that CPU's caches in
@@ -1845,7 +1845,7 @@ the requisite order:
         =============== =============== =======================================
                                         u == 0, v == 1 and p == &u, q == &u
         v = 2;
-        smp_wmb();                      Make sure change to v visible before
+        smp_wmb();                      Make sure change to v is visible before
                                          change to p
         <A:modify v=2>                  v is now in cache A exclusively
         p = &v;
@@ -1853,7 +1853,7 @@ the requisite order:
 
 The write memory barrier forces the other CPUs in the system to perceive that
 the local CPU's caches have apparently been updated in the correct order. But
-now imagine that the second CPU that wants to read those values:
+now imagine that the second CPU wants to read those values:
 
         CPU 1           CPU 2           COMMENT
         =============== =============== =======================================
@@ -1861,7 +1861,7 @@ now imagine that the second CPU that wants to read those values:
                         q = p;
                         x = *q;
 
-The above pair of reads may then fail to happen in expected order, as the
+The above pair of reads may then fail to happen in the expected order, as the
 cacheline holding p may get updated in one of the second CPU's caches whilst
 the update to the cacheline holding v is delayed in the other of the second
 CPU's caches by some other cache event:
@@ -1916,7 +1916,7 @@ access depends on a read, not all do, so it may not be relied on.
 
 Other CPUs may also have split caches, but must coordinate between the various
 cachelets for normal memory accesses. The semantics of the Alpha removes the
-need for coordination in absence of memory barriers.
+need for coordination in the absence of memory barriers.
 
 
 CACHE COHERENCY VS DMA
@@ -1931,10 +1931,10 @@ invalidate them as well).
 
 In addition, the data DMA'd to RAM by a device may be overwritten by dirty
 cache lines being written back to RAM from a CPU's cache after the device has
-installed its own data, or cache lines simply present in a CPUs cache may
-simply obscure the fact that RAM has been updated, until at such time as the
-cacheline is discarded from the CPU's cache and reloaded. To deal with this,
-the appropriate part of the kernel must invalidate the overlapping bits of the
+installed its own data, or cache lines present in the CPU's cache may simply
+obscure the fact that RAM has been updated, until at such time as the cacheline
+is discarded from the CPU's cache and reloaded. To deal with this, the
+appropriate part of the kernel must invalidate the overlapping bits of the
 cache on each CPU.
 
 See Documentation/cachetlb.txt for more information on cache management.
@@ -1944,7 +1944,7 @@ CACHE COHERENCY VS MMIO
 -----------------------
 
 Memory mapped I/O usually takes place through memory locations that are part of
-a window in the CPU's memory space that have different properties assigned than
+a window in the CPU's memory space that has different properties assigned than
 the usual RAM directed window.
 
 Amongst these properties is usually the fact that such accesses bypass the
@@ -1960,7 +1960,7 @@ THE THINGS CPUS GET UP TO
 =========================
 
 A programmer might take it for granted that the CPU will perform memory
-operations in exactly the order specified, so that if a CPU is, for example,
+operations in exactly the order specified, so that if the CPU is, for example,
 given the following piece of code to execute:
 
         a = *A;
@@ -1969,7 +1969,7 @@ given the following piece of code to execute:
         d = *D;
         *E = e;
 
-They would then expect that the CPU will complete the memory operation for each
+they would then expect that the CPU will complete the memory operation for each
 instruction before moving on to the next one, leading to a definite sequence of
 operations as seen by external observers in the system:
 
@@ -1986,8 +1986,8 @@ assumption doesn't hold because:
  (*) loads may be done speculatively, and the result discarded should it prove
      to have been unnecessary;
 
- (*) loads may be done speculatively, leading to the result having being
-     fetched at the wrong time in the expected sequence of events;
+ (*) loads may be done speculatively, leading to the result having been fetched
+     at the wrong time in the expected sequence of events;
 
  (*) the order of the memory accesses may be rearranged to promote better use
      of the CPU buses and caches;
@@ -2069,12 +2069,12 @@ AND THEN THERE'S THE ALPHA
 
 The DEC Alpha CPU is one of the most relaxed CPUs there is. Not only that,
 some versions of the Alpha CPU have a split data cache, permitting them to have
-two semantically related cache lines updating at separate times. This is where
+two semantically-related cache lines updated at separate times. This is where
 the data dependency barrier really becomes necessary as this synchronises both
 caches with the memory coherence system, thus making it seem like pointer
 changes vs new data occur in the right order.
 
-The Alpha defines the Linux's kernel's memory barrier model.
+The Alpha defines the Linux kernel's memory barrier model.
 
 See the subsection on "Cache Coherency" above.
 
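
For readers following the pointer-publication scenario in the hunks around line 1845 (v = 2; smp_wmb(); p = &v on CPU 1, then q = p; x = *q on CPU 2), the pairing can be approximated in userspace with C11 atomics. This is only a sketch under stated assumptions: a release store stands in for the write barrier plus the store to p, and an acquire load stands in for the load of p plus the read-side barrier. Neither is an exact equivalent of the kernel primitives, and the function names writer() and reader() are hypothetical.

```c
#include <stdatomic.h>

static int u = 0;
static int v = 1;
static _Atomic(int *) p = &u;   /* initially p == &u, as in the example */

/* CPU 1 side: v = 2; <write barrier>; p = &v; */
static void writer(void)
{
        v = 2;
        /* Release: the store to v is ordered before the store to p. */
        atomic_store_explicit(&p, &v, memory_order_release);
}

/* CPU 2 side: q = p; <read-side barrier>; x = *q; */
static int reader(void)
{
        /* Acquire: the dereference below cannot be satisfied before this load. */
        int *q = atomic_load_explicit(&p, memory_order_acquire);

        return *q;
}
```

Under these assumptions, a reader that observes q == &v also observes v == 2, which is the outcome the write barrier and the read-side barrier are paired to guarantee. Without the read-side ordering, a split-cache CPU such as the Alpha could return the stale value of v.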