Diffstat (limited to 'Documentation/memory-barriers.txt')
-rw-r--r-- | Documentation/memory-barriers.txt | 36 |
1 files changed, 20 insertions, 16 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 4710845dbac4..28d1bc3edb1c 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -262,9 +262,14 @@ What is required is some way of intervening to instruct the compiler and the | |||
262 | CPU to restrict the order. | 262 | CPU to restrict the order. |
263 | 263 | ||
264 | Memory barriers are such interventions. They impose a perceived partial | 264 | Memory barriers are such interventions. They impose a perceived partial |
265 | ordering between the memory operations specified on either side of the barrier. | 265 | ordering over the memory operations on either side of the barrier. |
266 | They request that the sequence of memory events generated appears to other | 266 | |
267 | parts of the system as if the barrier is effective on that CPU. | 267 | Such enforcement is important because the CPUs and other devices in a system |
268 | can use a variety of tricks to improve performance - including reordering, | ||
269 | deferral and combination of memory operations; speculative loads; speculative | ||
270 | branch prediction and various types of caching. Memory barriers are used to | ||
271 | override or suppress these tricks, allowing the code to sanely control the | ||
272 | interaction of multiple CPUs and/or devices. | ||
268 | 273 | ||
269 | 274 | ||
270 | VARIETIES OF MEMORY BARRIER | 275 | VARIETIES OF MEMORY BARRIER |
@@ -282,7 +287,7 @@ Memory barriers come in four basic varieties: | |||
282 | A write barrier is a partial ordering on stores only; it is not required | 287 | A write barrier is a partial ordering on stores only; it is not required |
283 | to have any effect on loads. | 288 | to have any effect on loads. |
284 | 289 | ||
285 | A CPU can be viewed as as commiting a sequence of store operations to the | 290 | A CPU can be viewed as committing a sequence of store operations to the |
286 | memory system as time progresses. All stores before a write barrier will | 291 | memory system as time progresses. All stores before a write barrier will |
287 | occur in the sequence _before_ all the stores after the write barrier. | 292 | occur in the sequence _before_ all the stores after the write barrier. |
288 | 293 | ||
@@ -413,7 +418,7 @@ There are certain things that the Linux kernel memory barriers do not guarantee: | |||
413 | indirect effect will be the order in which the second CPU sees the effects | 418 | indirect effect will be the order in which the second CPU sees the effects |
414 | of the first CPU's accesses occur, but see the next point: | 419 | of the first CPU's accesses occur, but see the next point: |
415 | 420 | ||
416 | (*) There is no guarantee that the a CPU will see the correct order of effects | 421 | (*) There is no guarantee that a CPU will see the correct order of effects |
417 | from a second CPU's accesses, even _if_ the second CPU uses a memory | 422 | from a second CPU's accesses, even _if_ the second CPU uses a memory |
418 | barrier, unless the first CPU _also_ uses a matching memory barrier (see | 423 | barrier, unless the first CPU _also_ uses a matching memory barrier (see |
419 | the subsection on "SMP Barrier Pairing"). | 424 | the subsection on "SMP Barrier Pairing"). |
@@ -461,8 +466,8 @@ Whilst this may seem like a failure of coherency or causality maintenance, it | |||
461 | isn't, and this behaviour can be observed on certain real CPUs (such as the DEC | 466 | isn't, and this behaviour can be observed on certain real CPUs (such as the DEC |
462 | Alpha). | 467 | Alpha). |
463 | 468 | ||
464 | To deal with this, a data dependency barrier must be inserted between the | 469 | To deal with this, a data dependency barrier or better must be inserted |
465 | address load and the data load: | 470 | between the address load and the data load: |
466 | 471 | ||
467 | CPU 1 CPU 2 | 472 | CPU 1 CPU 2 |
468 | =============== =============== | 473 | =============== =============== |
@@ -484,7 +489,7 @@ lines. The pointer P might be stored in an odd-numbered cache line, and the | |||
484 | variable B might be stored in an even-numbered cache line. Then, if the | 489 | variable B might be stored in an even-numbered cache line. Then, if the |
485 | even-numbered bank of the reading CPU's cache is extremely busy while the | 490 | even-numbered bank of the reading CPU's cache is extremely busy while the |
486 | odd-numbered bank is idle, one can see the new value of the pointer P (&B), | 491 | odd-numbered bank is idle, one can see the new value of the pointer P (&B), |
487 | but the old value of the variable B (1). | 492 | but the old value of the variable B (2). |
488 | 493 | ||
489 | 494 | ||
490 | Another example of where data dependency barriers might by required is where a | 495 | Another example of where data dependency barriers might by required is where a |
@@ -597,7 +602,7 @@ Consider the following sequence of events: | |||
597 | 602 | ||
598 | This sequence of events is committed to the memory coherence system in an order | 603 | This sequence of events is committed to the memory coherence system in an order |
599 | that the rest of the system might perceive as the unordered set of { STORE A, | 604 | that the rest of the system might perceive as the unordered set of { STORE A, |
600 | STORE B, STORE C } all occuring before the unordered set of { STORE D, STORE E | 605 | STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E |
601 | }: | 606 | }: |
602 | 607 | ||
603 | +-------+ : : | 608 | +-------+ : : |
@@ -744,7 +749,7 @@ some effectively random order, despite the write barrier issued by CPU 1: | |||
744 | : : | 749 | : : |
745 | 750 | ||
746 | 751 | ||
747 | If, however, a read barrier were to be placed between the load of E and the | 752 | If, however, a read barrier were to be placed between the load of B and the |
748 | load of A on CPU 2: | 753 | load of A on CPU 2: |
749 | 754 | ||
750 | CPU 1 CPU 2 | 755 | CPU 1 CPU 2 |
@@ -1461,9 +1466,8 @@ instruction itself is complete. | |||
1461 | 1466 | ||
1462 | On a UP system - where this wouldn't be a problem - the smp_mb() is just a | 1467 | On a UP system - where this wouldn't be a problem - the smp_mb() is just a |
1463 | compiler barrier, thus making sure the compiler emits the instructions in the | 1468 | compiler barrier, thus making sure the compiler emits the instructions in the |
1464 | right order without actually intervening in the CPU. Since there there's only | 1469 | right order without actually intervening in the CPU. Since there's only one |
1465 | one CPU, that CPU's dependency ordering logic will take care of everything | 1470 | CPU, that CPU's dependency ordering logic will take care of everything else. |
1466 | else. | ||
1467 | 1471 | ||
1468 | 1472 | ||
1469 | ATOMIC OPERATIONS | 1473 | ATOMIC OPERATIONS |
@@ -1640,9 +1644,9 @@ functions: | |||
1640 | 1644 | ||
1641 | The PCI bus, amongst others, defines an I/O space concept - which on such | 1645 | The PCI bus, amongst others, defines an I/O space concept - which on such |
1642 | CPUs as i386 and x86_64 cpus readily maps to the CPU's concept of I/O | 1646 | CPUs as i386 and x86_64 cpus readily maps to the CPU's concept of I/O |
1643 | space. However, it may also mapped as a virtual I/O space in the CPU's | 1647 | space. However, it may also be mapped as a virtual I/O space in the CPU's |
1644 | memory map, particularly on those CPUs that don't support alternate | 1648 | memory map, particularly on those CPUs that don't support alternate I/O |
1645 | I/O spaces. | 1649 | spaces. |
1646 | 1650 | ||
1647 | Accesses to this space may be fully synchronous (as on i386), but | 1651 | Accesses to this space may be fully synchronous (as on i386), but |
1648 | intermediary bridges (such as the PCI host bridge) may not fully honour | 1652 | intermediary bridges (such as the PCI host bridge) may not fully honour |