1 files changed, 20 insertions, 16 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 4710845dbac4..28d1bc3edb1c 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -262,9 +262,14 @@ What is required is some way of intervening to instruct the compiler and the
 CPU to restrict the order.
 Memory barriers are such interventions.  They impose a perceived partial
-ordering between the memory operations specified on either side of the barrier.
+ordering over the memory operations on either side of the barrier.
-They request that the sequence of memory events generated appears to other
-parts of the system as if the barrier is effective on that CPU.
+Such enforcement is important because the CPUs and other devices in a system
+can use a variety of tricks to improve performance - including reordering,
+deferral and combination of memory operations; speculative loads; speculative
+branch prediction and various types of caching.  Memory barriers are used to
+override or suppress these tricks, allowing the code to sanely control the
+interaction of multiple CPUs and/or devices.
 VARIETIES OF MEMORY BARRIER
@@ -282,7 +287,7 @@ Memory barriers come in four basic varieties:
     A write barrier is a partial ordering on stores only; it is not required
     to have any effect on loads.
-     A CPU can be viewed as as commiting a sequence of store operations to the
+     A CPU can be viewed as committing a sequence of store operations to the
     memory system as time progresses.  All stores before a write barrier will
     occur in the sequence _before_ all the stores after the write barrier.
@@ -413,7 +418,7 @@ There are certain things that the Linux kernel memory barriers do not guarantee:
     indirect effect will be the order in which the second CPU sees the effects
     of the first CPU's accesses occur, but see the next point:
- (*) There is no guarantee that the a CPU will see the correct order of effects
+ (*) There is no guarantee that a CPU will see the correct order of effects
     from a second CPU's accesses, even _if_ the second CPU uses a memory
     barrier, unless the first CPU _also_ uses a matching memory barrier (see
     the subsection on "SMP Barrier Pairing").
@@ -461,8 +466,8 @@ Whilst this may seem like a failure of coherency or causality maintenance, it
 isn't, and this behaviour can be observed on certain real CPUs (such as the DEC
 Alpha).
-To deal with this, a data dependency barrier must be inserted between the
+To deal with this, a data dependency barrier or better must be inserted
-address load and the data load:
+between the address load and the data load:
        CPU 1           CPU 2
        =============== ===============
@@ -484,7 +489,7 @@ lines.  The pointer P might be stored in an odd-numbered cache line, and the
 variable B might be stored in an even-numbered cache line.  Then, if the
 even-numbered bank of the reading CPU's cache is extremely busy while the
 odd-numbered bank is idle, one can see the new value of the pointer P (&B),
-but the old value of the variable B (1).
+but the old value of the variable B (2).
 Another example of where data dependency barriers might by required is where a
@@ -597,7 +602,7 @@ Consider the following sequence of events:
 This sequence of events is committed to the memory coherence system in an order
 that the rest of the system might perceive as the unordered set of { STORE A,
-STORE B, STORE C } all occuring before the unordered set of { STORE D, STORE E
+STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E
 }:
        +-------+       :      :
@@ -744,7 +749,7 @@ some effectively random order, despite the write barrier issued by CPU 1:
                                                :       :
-If, however, a read barrier were to be placed between the load of E and the
+If, however, a read barrier were to be placed between the load of B and the
 load of A on CPU 2:
        CPU 1                   CPU 2
@@ -1461,9 +1466,8 @@ instruction itself is complete.
 On a UP system - where this wouldn't be a problem - the smp_mb() is just a
 compiler barrier, thus making sure the compiler emits the instructions in the
-right order without actually intervening in the CPU.  Since there there's only
+right order without actually intervening in the CPU.  Since there's only one
-one CPU, that CPU's dependency ordering logic will take care of everything
+CPU, that CPU's dependency ordering logic will take care of everything else.
-else.
 ATOMIC OPERATIONS
@@ -1640,9 +1644,9 @@ functions:
     The PCI bus, amongst others, defines an I/O space concept - which on such
     CPUs as i386 and x86_64 cpus readily maps to the CPU's concept of I/O
-     space.  However, it may also mapped as a virtual I/O space in the CPU's
+     space.  However, it may also be mapped as a virtual I/O space in the CPU's
-     memory map, particularly on those CPUs that don't support alternate
+     memory map, particularly on those CPUs that don't support alternate I/O
-     I/O spaces.
+     spaces.
     Accesses to this space may be fully synchronous (as on i386), but
     intermediary bridges (such as the PCI host bridge) may not fully honour
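
Note on the @@ -461,8 +466,8 @@ and @@ -484,7 +489,7 @@ hunks: the pattern they describe can be sketched in userspace C11. This is only an illustration, not kernel code. The initial values (A = 1, B = 2, P = &A) are assumed from the document's example, and the function names publish_b() and read_through_p() are hypothetical. The release store stands in for the write barrier on CPU 1, and the acquire load is one instance of "a data dependency barrier or better" on CPU 2.

```c
#include <stdatomic.h>

/* Assumed initial state, following the document's example. */
static int A = 1;
static int B = 2;                 /* the "old" value of B is 2, not 1 */
static _Atomic(int *) P = &A;

/* CPU 1 side (hypothetical name): update B, then publish its address. */
void publish_b(void)
{
	B = 4;
	/* Release ordering: the store to B is visible before the store to P. */
	atomic_store_explicit(&P, &B, memory_order_release);
}

/* CPU 2 side (hypothetical name): load the address, then load the data. */
int read_through_p(void)
{
	/* Acquire ordering is stronger than a data dependency barrier. */
	int *q = atomic_load_explicit(&P, memory_order_acquire);

	return *q;
}
```

If the two functions run on different threads, the reader returns either 1 (it still saw P == &A) or 4 (it saw P == &B and therefore the new B). The outcome the corrected text warns about, the new pointer together with the old value 2, is ruled out by the release/acquire pairing.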
