Diffstat (limited to 'Documentation/memory-barriers.txt')
-rw-r--r--	Documentation/memory-barriers.txt	98
1 files changed, 49 insertions, 49 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 58408dd023c7..650657c54733 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -24,7 +24,7 @@ Contents:
  (*) Explicit kernel barriers.
 
      - Compiler barrier.
-     - The CPU memory barriers.
+     - CPU memory barriers.
      - MMIO write barrier.
 
  (*) Implicit kernel memory barriers.
@@ -265,7 +265,7 @@ Memory barriers are such interventions.  They impose a perceived partial
 ordering over the memory operations on either side of the barrier.
 
 Such enforcement is important because the CPUs and other devices in a system
-can use a variety of tricks to improve performance - including reordering,
+can use a variety of tricks to improve performance, including reordering,
 deferral and combination of memory operations; speculative loads; speculative
 branch prediction and various types of caching.  Memory barriers are used to
 override or suppress these tricks, allowing the code to sanely control the
@@ -457,7 +457,7 @@ sequence, Q must be either &A or &B, and that:
	(Q == &A) implies (D == 1)
	(Q == &B) implies (D == 4)
 
-But! CPU 2's perception of P may be updated _before_ its perception of B, thus
+But!  CPU 2's perception of P may be updated _before_ its perception of B, thus
 leading to the following situation:
 
	(Q == &B) and (D == 2) ????
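
The cure the document prescribes for this case is a data dependency barrier
between obtaining Q and reading *Q.  As a minimal kernel-style sketch, using
the smp_wmb() and smp_read_barrier_depends() primitives this document defines
(variable names follow the example above):

	/* CPU 1: publish the new value of B through P */
	B = 4;
	smp_wmb();			/* commit B before exposing P */
	P = &B;

	/* CPU 2: the data dependency barrier keeps D = *Q from reading a
	 * stale target even when Q itself is seen as up to date */
	Q = P;
	smp_read_barrier_depends();
	D = *Q;
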
@@ -573,7 +573,7 @@ Basically, the read barrier always has to be there, even though it can be of
 the "weaker" type.
 
 [!] Note that the stores before the write barrier would normally be expected to
-match the loads after the read barrier or data dependency barrier, and vice
+match the loads after the read barrier or the data dependency barrier, and vice
 versa:
 
	CPU 1				CPU 2
@@ -588,7 +588,7 @@ versa:
 EXAMPLES OF MEMORY BARRIER SEQUENCES
 ------------------------------------
 
-Firstly, write barriers act as a partial orderings on store operations.
+Firstly, write barriers act as partial orderings on store operations.
 Consider the following sequence of events:
 
	CPU 1
@@ -608,15 +608,15 @@ STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E
	+-------+       :      :
	|       |       +------+
	|       |------>| C=3  |     }     /\
-	|       |  :    +------+     }-----  \  -----> Events perceptible
-	|       |  :    | A=1  |     }        \/       to rest of system
+	|       |  :    +------+     }-----  \  -----> Events perceptible to
+	|       |  :    | A=1  |     }        \/       the rest of the system
	|       |  :    +------+     }
	| CPU 1 |  :    | B=2  |     }
	|       |       +------+     }
	|       |   wwwwwwwwwwwwwwww }   <--- At this point the write barrier
	|       |       +------+     }        requires all stores prior to the
	|       |  :    | E=5  |     }        barrier to be committed before
-	|       |  :    +------+     }        further stores may be take place.
+	|       |  :    +------+     }        further stores may take place
	|       |------>| D=4  |     }
	|       |       +------+
	+-------+       :      :
@@ -626,7 +626,7 @@ STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E
	                           V
 
 
-Secondly, data dependency barriers act as a partial orderings on data-dependent
+Secondly, data dependency barriers act as partial orderings on data-dependent
 loads.  Consider the following sequence of events:
 
	CPU 1			CPU 2
@@ -975,7 +975,7 @@ compiler from moving the memory accesses either side of it to the other side:
 
	barrier();
 
-This a general barrier - lesser varieties of compiler barrier do not exist.
+This is a general barrier - lesser varieties of compiler barrier do not exist.
 
 The compiler barrier has no direct effect on the CPU, which may then reorder
 things however it wishes.
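
One hedged illustration of barrier() in use - an assumed busy-wait loop, not
taken from this patch - where the compiler barrier forces the flag to be
reloaded on every pass but leaves the CPU free to reorder as it sees fit:

	while (!flag)
		barrier();	/* compiler must re-read flag each iteration */
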
@@ -997,7 +997,7 @@ The Linux kernel has eight basic CPU memory barriers:
 All CPU memory barriers unconditionally imply compiler barriers.
 
 SMP memory barriers are reduced to compiler barriers on uniprocessor compiled
-systems because it is assumed that a CPU will be appear to be self-consistent,
+systems because it is assumed that a CPU will appear to be self-consistent,
 and will order overlapping accesses correctly with respect to itself.
 
 [!] Note that SMP memory barriers _must_ be used to control the ordering of
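
A sketch of the kind of pairing these SMP barriers exist for (an assumed
producer/consumer example; in a uniprocessor build both barriers reduce to
barrier() as described above):

	/* CPU 1 (producer) */
	data = 42;
	smp_wmb();		/* commit data before the flag */
	ready = 1;

	/* CPU 2 (consumer) */
	while (!ready)
		cpu_relax();
	smp_rmb();		/* flag observed; loads of data can't be stale */
	val = data;
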
@@ -1146,9 +1146,9 @@ for each construct.  These operations all imply certain barriers:
 Therefore, from (1), (2) and (4) an UNLOCK followed by an unconditional LOCK is
 equivalent to a full barrier, but a LOCK followed by an UNLOCK is not.
 
-[!] Note: one of the consequence of LOCKs and UNLOCKs being only one-way
-barriers is that the effects instructions outside of a critical section may
-seep into the inside of the critical section.
+[!] Note: one of the consequences of LOCKs and UNLOCKs being only one-way
+barriers is that the effects of instructions outside of a critical section
+may seep into the inside of the critical section.
 
 A LOCK followed by an UNLOCK may not be assumed to be full memory barrier
 because it is possible for an access preceding the LOCK to happen after the
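
To make the seepage concrete, a hedged sketch in the LOCK/UNLOCK notation
used throughout this document, with the accesses to *A and *C placed outside
the critical section:

	*A = a;
	LOCK
	*B = b;
	UNLOCK
	*C = c;

The store to *A may be perceived after the LOCK and the store to *C before the
UNLOCK; only the access to *B is guaranteed to stay between them.
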
@@ -1239,7 +1239,7 @@ three CPUs; then should the following sequence of events occur:
	UNLOCK M			UNLOCK Q
	*D = d;				*H = h;
 
-Then there is no guarantee as to what order CPU #3 will see the accesses to *A
+Then there is no guarantee as to what order CPU 3 will see the accesses to *A
 through *H occur in, other than the constraints imposed by the separate locks
 on the separate CPUs.  It might, for example, see:
 
@@ -1269,12 +1269,12 @@ However, if the following occurs:
	UNLOCK M [2]
				*H = h;
 
-CPU #3 might see:
+CPU 3 might see:
 
	*E, LOCK M [1], *C, *B, *A, UNLOCK M [1],
		LOCK M [2], *H, *F, *G, UNLOCK M [2], *D
 
-But assuming CPU #1 gets the lock first, it won't see any of:
+But assuming CPU 1 gets the lock first, CPU 3 won't see any of:
 
	*B, *C, *D, *F, *G or *H preceding LOCK M [1]
	*A, *B or *C following UNLOCK M [1]
@@ -1327,12 +1327,12 @@ spinlock, for example:
	mmiowb();
	spin_unlock(Q);
 
-this will ensure that the two stores issued on CPU #1 appear at the PCI bridge
-before either of the stores issued on CPU #2.
+this will ensure that the two stores issued on CPU 1 appear at the PCI bridge
+before either of the stores issued on CPU 2.
 
 
-Furthermore, following a store by a load to the same device obviates the need
-for an mmiowb(), because the load forces the store to complete before the load
+Furthermore, following a store by a load from the same device obviates the need
+for the mmiowb(), because the load forces the store to complete before the load
 is performed:
 
	CPU 1				CPU 2
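
A hedged sketch of the cross-CPU pattern described above (the device mapping,
register offset and lock are assumed for illustration):

	spin_lock_irqsave(&dev_lock, flags);
	writel(val, dev_regs + REG_CTRL);	/* MMIO store to the device */
	mmiowb();		/* order the MMIO store before the unlock */
	spin_unlock_irqrestore(&dev_lock, flags);
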
@@ -1363,7 +1363,7 @@ circumstances in which reordering definitely _could_ be a problem:
 
  (*) Atomic operations.
 
- (*) Accessing devices (I/O).
+ (*) Accessing devices.
 
  (*) Interrupts.
 
@@ -1399,7 +1399,7 @@ To wake up a particular waiter, the up_read() or up_write() functions have to:
  (1) read the next pointer from this waiter's record to know as to where the
      next waiter record is;
 
- (4) read the pointer to the waiter's task structure;
+ (2) read the pointer to the waiter's task structure;
 
  (3) clear the task pointer to tell the waiter it has been given the semaphore;
 
@@ -1407,7 +1407,7 @@ To wake up a particular waiter, the up_read() or up_write() functions have to:
 
  (5) release the reference held on the waiter's task struct.
 
-In otherwords, it has to perform this sequence of events:
+In other words, it has to perform this sequence of events:
 
	LOAD waiter->list.next;
	LOAD waiter->task;
@@ -1502,7 +1502,7 @@ operations and adjusting reference counters towards object destruction, and as
 such the implicit memory barrier effects are necessary.
 
 
-The following operation are potential problems as they do _not_ imply memory
+The following operations are potential problems as they do _not_ imply memory
 barriers, but might be used for implementing such things as UNLOCK-class
 operations:
 
@@ -1517,7 +1517,7 @@ With these the appropriate explicit memory barrier should be used if necessary
 
 The following also do _not_ imply memory barriers, and so may require explicit
 memory barriers under some circumstances (smp_mb__before_atomic_dec() for
-instance)):
+instance):
 
	atomic_add();
	atomic_sub();
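
For instance, a hedged refcount sketch pairing one of these with the explicit
barrier mentioned above (the object and its fields are assumed):

	obj->dead = 1;
	smp_mb__before_atomic_dec();	/* commit ->dead before the decrement */
	atomic_dec(&obj->ref_count);
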
@@ -1641,8 +1641,8 @@ functions:
      indeed have special I/O space access cycles and instructions, but many
      CPUs don't have such a concept.
 
-     The PCI bus, amongst others, defines an I/O space concept - which on such
-     CPUs as i386 and x86_64 cpus readily maps to the CPU's concept of I/O
+     The PCI bus, amongst others, defines an I/O space concept which - on such
+     CPUs as i386 and x86_64 - readily maps to the CPU's concept of I/O
      space.  However, it may also be mapped as a virtual I/O space in the CPU's
      memory map, particularly on those CPUs that don't support alternate I/O
      spaces.
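
By way of a hedged example of port-space access (the conventional PC COM1
base address, purely illustrative):

	outb(c, 0x3f8);		/* write a byte to the UART data port */
	lsr = inb(0x3f8 + 5);	/* read the line status register back */
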
@@ -1664,7 +1664,7 @@ functions:
      i386 architecture machines, for example, this is controlled by way of the
      MTRR registers.
 
-     Ordinarily, these will be guaranteed to be fully ordered and uncombined,,
+     Ordinarily, these will be guaranteed to be fully ordered and uncombined,
      provided they're not accessing a prefetchable device.
 
      However, intermediary hardware (such as a PCI bridge) may indulge in
@@ -1689,7 +1689,7 @@ functions:
 
  (*) ioreadX(), iowriteX()
 
-     These will perform as appropriate for the type of access they're actually
+     These will perform appropriately for the type of access they're actually
      doing, be it inX()/outX() or readX()/writeX().
 
 
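
A hedged sketch of the iomap style these helpers serve (the BAR number and
register offsets are assumed):

	void __iomem *regs = pci_iomap(pdev, 0, 0);

	iowrite32(val, regs + CTRL_OFFSET);	/* port or MMIO, as mapped */
	status = ioread32(regs + STATUS_OFFSET);
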
@@ -1705,7 +1705,7 @@ of arch-specific code.
 
 This means that it must be considered that the CPU will execute its instruction
 stream in any order it feels like - or even in parallel - provided that if an
-instruction in the stream depends on the an earlier instruction, then that
+instruction in the stream depends on an earlier instruction, then that
 earlier instruction must be sufficiently complete[*] before the later
 instruction may proceed; in other words: provided that the appearance of
 causality is maintained.
@@ -1795,8 +1795,8 @@ eventually become visible on all CPUs, there's no guarantee that they will
 become apparent in the same order on those other CPUs.
 
 
-Consider dealing with a system that has pair of CPUs (1 & 2), each of which has
-a pair of parallel data caches (CPU 1 has A/B, and CPU 2 has C/D):
+Consider dealing with a system that has a pair of CPUs (1 & 2), each of which
+has a pair of parallel data caches (CPU 1 has A/B, and CPU 2 has C/D):
 
	            :
	            :                          +--------+
@@ -1835,7 +1835,7 @@ Imagine the system has the following properties:
 
  (*) the coherency queue is not flushed by normal loads to lines already
      present in the cache, even though the contents of the queue may
-     potentially effect those loads.
+     potentially affect those loads.
 
 Imagine, then, that two writes are made on the first CPU, with a write barrier
 between them to guarantee that they will appear to reach that CPU's caches in
@@ -1845,7 +1845,7 @@ the requisite order:
	=============== =============== =======================================
	u == 0, v == 1 and p == &u, q == &u
	v = 2;
-	smp_wmb();			Make sure change to v visible before
+	smp_wmb();			Make sure change to v is visible before
					change to p
	<A:modify v=2>			v is now in cache A exclusively
	p = &v;
@@ -1853,7 +1853,7 @@ the requisite order:
 
 The write memory barrier forces the other CPUs in the system to perceive that
 the local CPU's caches have apparently been updated in the correct order.  But
-now imagine that the second CPU that wants to read those values:
+now imagine that the second CPU wants to read those values:
 
	CPU 1			CPU 2		COMMENT
	=============== =============== =======================================
@@ -1861,7 +1861,7 @@ now imagine that the second CPU that wants to read those values:
		q = p;
		x = *q;
 
-The above pair of reads may then fail to happen in expected order, as the
+The above pair of reads may then fail to happen in the expected order, as the
 cacheline holding p may get updated in one of the second CPU's caches whilst
 the update to the cacheline holding v is delayed in the other of the second
 CPU's caches by some other cache event:
@@ -1916,7 +1916,7 @@ access depends on a read, not all do, so it may not be relied on.
 
 Other CPUs may also have split caches, but must coordinate between the various
 cachelets for normal memory accesses.  The semantics of the Alpha removes the
-need for coordination in absence of memory barriers.
+need for coordination in the absence of memory barriers.
 
 
 CACHE COHERENCY VS DMA
@@ -1931,10 +1931,10 @@ invalidate them as well).
 
 In addition, the data DMA'd to RAM by a device may be overwritten by dirty
 cache lines being written back to RAM from a CPU's cache after the device has
-installed its own data, or cache lines simply present in a CPUs cache may
-simply obscure the fact that RAM has been updated, until at such time as the
-cacheline is discarded from the CPU's cache and reloaded.  To deal with this,
-the appropriate part of the kernel must invalidate the overlapping bits of the
+installed its own data, or cache lines present in the CPU's cache may simply
+obscure the fact that RAM has been updated, until at such time as the cacheline
+is discarded from the CPU's cache and reloaded.  To deal with this, the
+appropriate part of the kernel must invalidate the overlapping bits of the
 cache on each CPU.
 
 See Documentation/cachetlb.txt for more information on cache management.
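
A hedged sketch of the CPU-side management this implies, using the streaming
DMA API (the device, DMA handle and buffer size are assumed):

	/* hand the buffer to the CPU, invalidating stale cachelines */
	dma_sync_single_for_cpu(dev, dma_handle, size, DMA_FROM_DEVICE);
	/* ... CPU reads the freshly DMA'd data ... */
	dma_sync_single_for_device(dev, dma_handle, size, DMA_FROM_DEVICE);
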
@@ -1944,7 +1944,7 @@ CACHE COHERENCY VS MMIO
 -----------------------
 
 Memory mapped I/O usually takes place through memory locations that are part of
-a window in the CPU's memory space that have different properties assigned than
+a window in the CPU's memory space that has different properties assigned than
 the usual RAM directed window.
 
 Amongst these properties is usually the fact that such accesses bypass the
@@ -1960,7 +1960,7 @@ THE THINGS CPUS GET UP TO
 =========================
 
 A programmer might take it for granted that the CPU will perform memory
-operations in exactly the order specified, so that if a CPU is, for example,
+operations in exactly the order specified, so that if the CPU is, for example,
 given the following piece of code to execute:
 
	a = *A;
@@ -1969,7 +1969,7 @@ given the following piece of code to execute:
	d = *D;
	*E = e;
 
-They would then expect that the CPU will complete the memory operation for each
+they would then expect that the CPU will complete the memory operation for each
 instruction before moving on to the next one, leading to a definite sequence of
 operations as seen by external observers in the system:
 
@@ -1986,8 +1986,8 @@ assumption doesn't hold because:
  (*) loads may be done speculatively, and the result discarded should it prove
      to have been unnecessary;
 
- (*) loads may be done speculatively, leading to the result having being
-     fetched at the wrong time in the expected sequence of events;
+ (*) loads may be done speculatively, leading to the result having been fetched
+     at the wrong time in the expected sequence of events;
 
  (*) the order of the memory accesses may be rearranged to promote better use
      of the CPU buses and caches;
@@ -2069,12 +2069,12 @@ AND THEN THERE'S THE ALPHA
 
 The DEC Alpha CPU is one of the most relaxed CPUs there is.  Not only that,
 some versions of the Alpha CPU have a split data cache, permitting them to have
-two semantically related cache lines updating at separate times.  This is where
+two semantically-related cache lines updated at separate times.  This is where
 the data dependency barrier really becomes necessary as this synchronises both
 caches with the memory coherence system, thus making it seem like pointer
 changes vs new data occur in the right order.
 
-The Alpha defines the Linux's kernel's memory barrier model.
+The Alpha defines the Linux kernel's memory barrier model.
 
 See the subsection on "Cache Coherency" above.
 