Diffstat (limited to 'Documentation/memory-barriers.txt')
-rw-r--r-- Documentation/memory-barriers.txt | 36
1 files changed, 20 insertions, 16 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 4710845dbac4..28d1bc3edb1c 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -262,9 +262,14 @@ What is required is some way of intervening to instruct the compiler and the
 CPU to restrict the order.
 
 Memory barriers are such interventions. They impose a perceived partial
-ordering between the memory operations specified on either side of the barrier.
-They request that the sequence of memory events generated appears to other
-parts of the system as if the barrier is effective on that CPU.
+ordering over the memory operations on either side of the barrier.
+
+Such enforcement is important because the CPUs and other devices in a system
+can use a variety of tricks to improve performance - including reordering,
+deferral and combination of memory operations; speculative loads; speculative
+branch prediction and various types of caching. Memory barriers are used to
+override or suppress these tricks, allowing the code to sanely control the
+interaction of multiple CPUs and/or devices.
 
 
 VARIETIES OF MEMORY BARRIER
@@ -282,7 +287,7 @@ Memory barriers come in four basic varieties:
  A write barrier is a partial ordering on stores only; it is not required
  to have any effect on loads.
 
- A CPU can be viewed as as commiting a sequence of store operations to the
+ A CPU can be viewed as committing a sequence of store operations to the
  memory system as time progresses. All stores before a write barrier will
  occur in the sequence _before_ all the stores after the write barrier.
 
@@ -413,7 +418,7 @@ There are certain things that the Linux kernel memory barriers do not guarantee:
  indirect effect will be the order in which the second CPU sees the effects
  of the first CPU's accesses occur, but see the next point:
 
- (*) There is no guarantee that the a CPU will see the correct order of effects
+ (*) There is no guarantee that a CPU will see the correct order of effects
  from a second CPU's accesses, even _if_ the second CPU uses a memory
  barrier, unless the first CPU _also_ uses a matching memory barrier (see
  the subsection on "SMP Barrier Pairing").
@@ -461,8 +466,8 @@ Whilst this may seem like a failure of coherency or causality maintenance, it
 isn't, and this behaviour can be observed on certain real CPUs (such as the DEC
 Alpha).
 
-To deal with this, a data dependency barrier must be inserted between the
-address load and the data load:
+To deal with this, a data dependency barrier or better must be inserted
+between the address load and the data load:
 
  CPU 1 CPU 2
  =============== ===============
@@ -484,7 +489,7 @@ lines. The pointer P might be stored in an odd-numbered cache line, and the
 variable B might be stored in an even-numbered cache line. Then, if the
 even-numbered bank of the reading CPU's cache is extremely busy while the
 odd-numbered bank is idle, one can see the new value of the pointer P (&B),
-but the old value of the variable B (1).
+but the old value of the variable B (2).
 
 
 Another example of where data dependency barriers might by required is where a
@@ -597,7 +602,7 @@ Consider the following sequence of events:
 
 This sequence of events is committed to the memory coherence system in an order
 that the rest of the system might perceive as the unordered set of { STORE A,
-STORE B, STORE C } all occuring before the unordered set of { STORE D, STORE E
+STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E
 }:
 
  +-------+ : :
@@ -744,7 +749,7 @@ some effectively random order, despite the write barrier issued by CPU 1:
  : :
 
 
-If, however, a read barrier were to be placed between the load of E and the
+If, however, a read barrier were to be placed between the load of B and the
 load of A on CPU 2:
 
  CPU 1 CPU 2
@@ -1461,9 +1466,8 @@ instruction itself is complete.
 
 On a UP system - where this wouldn't be a problem - the smp_mb() is just a
 compiler barrier, thus making sure the compiler emits the instructions in the
-right order without actually intervening in the CPU. Since there there's only
-one CPU, that CPU's dependency ordering logic will take care of everything
-else.
+right order without actually intervening in the CPU. Since there's only one
+CPU, that CPU's dependency ordering logic will take care of everything else.
 
 
 ATOMIC OPERATIONS
@@ -1640,9 +1644,9 @@ functions:
 
  The PCI bus, amongst others, defines an I/O space concept - which on such
  CPUs as i386 and x86_64 cpus readily maps to the CPU's concept of I/O
- space. However, it may also mapped as a virtual I/O space in the CPU's
- memory map, particularly on those CPUs that don't support alternate
- I/O spaces.
+ space. However, it may also be mapped as a virtual I/O space in the CPU's
+ memory map, particularly on those CPUs that don't support alternate I/O
+ spaces.
 
  Accesses to this space may be fully synchronous (as on i386), but
  intermediary bridges (such as the PCI host bridge) may not fully honour
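
Illustration (not part of the patch): the hunk at old line 413 notes that a
barrier on one CPU guarantees nothing unless the observing CPU uses a matching
barrier. The sketch below shows that pairing in portable C11, assuming
userspace <stdatomic.h> fences as a rough stand-in for the kernel's write and
read barriers; they are similar in role but not identical in strength. The
names payload, ready, publish() and try_consume() are hypothetical and exist
only for this example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared state, named only for this illustration. */
static int payload;        /* ordinary data published by the writer */
static atomic_int ready;   /* flag announcing the payload; starts at 0 */

/* Writer side ("CPU 1"): store the data, then the flag.  The release
 * fence keeps the payload store ordered before the flag store, which is
 * roughly the job a write barrier does in the document's examples.
 * This sketch supports a single publication only. */
void publish(int value)
{
    assert(!atomic_load_explicit(&ready, memory_order_relaxed));
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader side ("CPU 2"): load the flag, then the data.  The acquire
 * fence is the matching half of the pair; without it the writer's fence
 * alone would not guarantee that the new payload is observed.
 * Returns 1 and fills *out if the payload was published, else 0. */
int try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}
```

In a real program publish() and try_consume() would run on different threads.
Called one after the other on a single thread they show only the calling
convention, since no reordering can be observed there.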