aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorDavid Howells <dhowells@redhat.com>2006-06-10 12:54:12 -0400
committerLinus Torvalds <torvalds@g5.osdl.org>2006-06-10 14:02:05 -0400
commit670bd95e0413c43f878b73a4a3919d1f452a4157 (patch)
treedb7b05810c5cc61c89b856996174e31147611cba
parentd90d2c385d4d832428d1e51c2a7edeef39c822f5 (diff)
[PATCH] Further alterations for memory barrier document
From: David Howells <dhowells@redhat.com> Apply some alterations to the memory barrier document that I worked out with Paul McKenney of IBM, plus some of the alterations suggested by Alan Stern. The following changes were made: (*) One of the examples given for what can happen with overlapping memory barriers was wrong. (*) The description of general memory barriers said that a general barrier is a combination of a read barrier and a write barrier. This isn't entirely true: it implies both, but is more than a combination of both. (*) The first example in the "SMP Barrier Pairing" section was wrong: the loads around the read barrier need to touch the memory locations in the opposite order to the stores around the write barrier. (*) Added a note to make explicit that the loads should be in reverse order to the stores. (*) Adjusted the diagrams in the "Examples Of Memory Barrier Sequences" section to make them clearer. Added a couple of diagrams to make it more clear as to how it could go wrong without the barrier. (*) Added a section on memory speculation. (*) Dropped any references to memory allocation routines doing memory barriers. They may do sometimes, but it can't be relied on. This may be worthy of further documentation later. (*) Made the fact that a LOCK followed by an UNLOCK should not be considered a full memory barrier more explicit and gave an example. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-rw-r--r--Documentation/memory-barriers.txt348
1 files changed, 270 insertions, 78 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index c61d8b876fdb..4710845dbac4 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -19,6 +19,7 @@ Contents:
19 - Control dependencies. 19 - Control dependencies.
20 - SMP barrier pairing. 20 - SMP barrier pairing.
21 - Examples of memory barrier sequences. 21 - Examples of memory barrier sequences.
22 - Read memory barriers vs load speculation.
22 23
23 (*) Explicit kernel barriers. 24 (*) Explicit kernel barriers.
24 25
@@ -248,7 +249,7 @@ And there are a number of things that _must_ or _must_not_ be assumed:
248 we may get either of: 249 we may get either of:
249 250
250 STORE *A = X; Y = LOAD *A; 251 STORE *A = X; Y = LOAD *A;
251 STORE *A = Y; 252 STORE *A = Y = X;
252 253
253 254
254========================= 255=========================
@@ -344,9 +345,12 @@ Memory barriers come in four basic varieties:
344 345
345 (4) General memory barriers. 346 (4) General memory barriers.
346 347
347 A general memory barrier is a combination of both a read memory barrier 348 A general memory barrier gives a guarantee that all the LOAD and STORE
348 and a write memory barrier. It is a partial ordering over both loads and 349 operations specified before the barrier will appear to happen before all
349 stores. 350 the LOAD and STORE operations specified after the barrier with respect to
351 the other components of the system.
352
353 A general memory barrier is a partial ordering over both loads and stores.
350 354
351 General memory barriers imply both read and write memory barriers, and so 355 General memory barriers imply both read and write memory barriers, and so
352 can substitute for either. 356 can substitute for either.
@@ -546,9 +550,9 @@ write barrier, though, again, a general barrier is viable:
546 =============== =============== 550 =============== ===============
547 a = 1; 551 a = 1;
548 <write barrier> 552 <write barrier>
549 b = 2; x = a; 553 b = 2; x = b;
550 <read barrier> 554 <read barrier>
551 y = b; 555 y = a;
552 556
553Or: 557Or:
554 558
@@ -563,6 +567,18 @@ Or:
563Basically, the read barrier always has to be there, even though it can be of 567Basically, the read barrier always has to be there, even though it can be of
564the "weaker" type. 568the "weaker" type.
565 569
570[!] Note that the stores before the write barrier would normally be expected to
571match the loads after the read barrier or data dependency barrier, and vice
572versa:
573
574 CPU 1 CPU 2
575 =============== ===============
576 a = 1; }---- --->{ v = c
577 b = 2; } \ / { w = d
578 <write barrier> \ <read barrier>
579 c = 3; } / \ { x = a;
580 d = 4; }---- --->{ y = b;
581
566 582
567EXAMPLES OF MEMORY BARRIER SEQUENCES 583EXAMPLES OF MEMORY BARRIER SEQUENCES
568------------------------------------ 584------------------------------------
@@ -600,8 +616,8 @@ STORE B, STORE C } all occuring before the unordered set of { STORE D, STORE E
600 | | +------+ 616 | | +------+
601 +-------+ : : 617 +-------+ : :
602 | 618 |
603 | Sequence in which stores committed to memory system 619 | Sequence in which stores are committed to the
604 | by CPU 1 620 | memory system by CPU 1
605 V 621 V
606 622
607 623
@@ -683,14 +699,12 @@ then the following will occur:
683 | : : | | 699 | : : | |
684 | : : | CPU 2 | 700 | : : | CPU 2 |
685 | +-------+ | | 701 | +-------+ | |
686 \ | X->9 |------>| | 702 | | X->9 |------>| |
687 \ +-------+ | | 703 | +-------+ | |
688 ----->| B->2 | | | 704 Makes sure all effects ---> \ ddddddddddddddddd | |
689 +-------+ | | 705 prior to the store of C \ +-------+ | |
690 Makes sure all effects ---> ddddddddddddddddd | | 706 are perceptible to ----->| B->2 |------>| |
691 prior to the store of C +-------+ | | 707 subsequent loads +-------+ | |
692 are perceptible to | B->2 |------>| |
693 successive loads +-------+ | |
694 : : +-------+ 708 : : +-------+
695 709
696 710
@@ -699,73 +713,239 @@ following sequence of events:
699 713
700 CPU 1 CPU 2 714 CPU 1 CPU 2
701 ======================= ======================= 715 ======================= =======================
716 { A = 0, B = 9 }
702 STORE A=1 717 STORE A=1
703 STORE B=2
704 STORE C=3
705 <write barrier> 718 <write barrier>
706 STORE D=4 719 STORE B=2
707 STORE E=5
708 LOAD A
709 LOAD B 720 LOAD B
710 LOAD C 721 LOAD A
711 LOAD D
712 LOAD E
713 722
714Without intervention, CPU 2 may then choose to perceive the events on CPU 1 in 723Without intervention, CPU 2 may then choose to perceive the events on CPU 1 in
715some effectively random order, despite the write barrier issued by CPU 1: 724some effectively random order, despite the write barrier issued by CPU 1:
716 725
717 +-------+ : : 726 +-------+ : : : :
718 | | +------+ 727 | | +------+ +-------+
719 | |------>| C=3 | } 728 | |------>| A=1 |------ --->| A->0 |
720 | | : +------+ } 729 | | +------+ \ +-------+
721 | | : | A=1 | } 730 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
722 | | : +------+ } 731 | | +------+ | +-------+
723 | CPU 1 | : | B=2 | }--- 732 | |------>| B=2 |--- | : :
724 | | +------+ } \ 733 | | +------+ \ | : : +-------+
725 | | wwwwwwwwwwwww} \ 734 +-------+ : : \ | +-------+ | |
726 | | +------+ } \ : : +-------+ 735 ---------->| B->2 |------>| |
727 | | : | E=5 | } \ +-------+ | | 736 | +-------+ | CPU 2 |
728 | | : +------+ } \ { | C->3 |------>| | 737 | | A->0 |------>| |
729 | |------>| D=4 | } \ { +-------+ : | | 738 | +-------+ | |
730 | | +------+ \ { | E->5 | : | | 739 | : : +-------+
731 +-------+ : : \ { +-------+ : | | 740 \ : :
732 Transfer -->{ | A->1 | : | CPU 2 | 741 \ +-------+
733 from CPU 1 { +-------+ : | | 742 ---->| A->1 |
734 to CPU 2 { | D->4 | : | | 743 +-------+
735 { +-------+ : | | 744 : :
736 { | B->2 |------>| |
737 +-------+ | |
738 : : +-------+
739
740
741If, however, a read barrier were to be placed between the load of C and the
742load of D on CPU 2, then the partial ordering imposed by CPU 1 will be
743perceived correctly by CPU 2.
744 745
745 +-------+ : : 746
746 | | +------+ 747If, however, a read barrier were to be placed between the load of E and the
747 | |------>| C=3 | } 748load of A on CPU 2:
748 | | : +------+ } 749
749 | | : | A=1 | }--- 750 CPU 1 CPU 2
750 | | : +------+ } \ 751 ======================= =======================
751 | CPU 1 | : | B=2 | } \ 752 { A = 0, B = 9 }
752 | | +------+ \ 753 STORE A=1
753 | | wwwwwwwwwwwwwwww \ 754 <write barrier>
754 | | +------+ \ : : +-------+ 755 STORE B=2
755 | | : | E=5 | } \ +-------+ | | 756 LOAD B
756 | | : +------+ }--- \ { | C->3 |------>| | 757 <read barrier>
757 | |------>| D=4 | } \ \ { +-------+ : | | 758 LOAD A
758 | | +------+ \ -->{ | B->2 | : | | 759
759 +-------+ : : \ { +-------+ : | | 760then the partial ordering imposed by CPU 1 will be perceived correctly by CPU
760 \ { | A->1 | : | CPU 2 | 7612:
761 \ +-------+ | | 762
762 At this point the read ----> \ rrrrrrrrrrrrrrrrr | | 763 +-------+ : : : :
763 barrier causes all effects \ +-------+ | | 764 | | +------+ +-------+
764 prior to the storage of C \ { | E->5 | : | | 765 | |------>| A=1 |------ --->| A->0 |
765 to be perceptible to CPU 2 -->{ +-------+ : | | 766 | | +------+ \ +-------+
766 { | D->4 |------>| | 767 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
767 +-------+ | | 768 | | +------+ | +-------+
768 : : +-------+ 769 | |------>| B=2 |--- | : :
770 | | +------+ \ | : : +-------+
771 +-------+ : : \ | +-------+ | |
772 ---------->| B->2 |------>| |
773 | +-------+ | CPU 2 |
774 | : : | |
775 | : : | |
776 At this point the read ----> \ rrrrrrrrrrrrrrrrr | |
777 barrier causes all effects \ +-------+ | |
778 prior to the storage of B ---->| A->1 |------>| |
779 to be perceptible to CPU 2 +-------+ | |
780 : : +-------+
781
782
783To illustrate this more completely, consider what could happen if the code
784contained a load of A either side of the read barrier:
785
786 CPU 1 CPU 2
787 ======================= =======================
788 { A = 0, B = 9 }
789 STORE A=1
790 <write barrier>
791 STORE B=2
792 LOAD B
793 LOAD A [first load of A]
794 <read barrier>
795 LOAD A [second load of A]
796
797Even though the two loads of A both occur after the load of B, they may both
798come up with different values:
799
800 +-------+ : : : :
801 | | +------+ +-------+
802 | |------>| A=1 |------ --->| A->0 |
803 | | +------+ \ +-------+
804 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
805 | | +------+ | +-------+
806 | |------>| B=2 |--- | : :
807 | | +------+ \ | : : +-------+
808 +-------+ : : \ | +-------+ | |
809 ---------->| B->2 |------>| |
810 | +-------+ | CPU 2 |
811 | : : | |
812 | : : | |
813 | +-------+ | |
814 | | A->0 |------>| 1st |
815 | +-------+ | |
816 At this point the read ----> \ rrrrrrrrrrrrrrrrr | |
817 barrier causes all effects \ +-------+ | |
818 prior to the storage of B ---->| A->1 |------>| 2nd |
819 to be perceptible to CPU 2 +-------+ | |
820 : : +-------+
821
822
823But it may be that the update to A from CPU 1 becomes perceptible to CPU 2
824before the read barrier completes anyway:
825
826 +-------+ : : : :
827 | | +------+ +-------+
828 | |------>| A=1 |------ --->| A->0 |
829 | | +------+ \ +-------+
830 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
831 | | +------+ | +-------+
832 | |------>| B=2 |--- | : :
833 | | +------+ \ | : : +-------+
834 +-------+ : : \ | +-------+ | |
835 ---------->| B->2 |------>| |
836 | +-------+ | CPU 2 |
837 | : : | |
838 \ : : | |
839 \ +-------+ | |
840 ---->| A->1 |------>| 1st |
841 +-------+ | |
842 rrrrrrrrrrrrrrrrr | |
843 +-------+ | |
844 | A->1 |------>| 2nd |
845 +-------+ | |
846 : : +-------+
847
848
849The guarantee is that the second load will always come up with A == 1 if the
850load of B came up with B == 2. No such guarantee exists for the first load of
851A; that may come up with either A == 0 or A == 1.
852
853
854READ MEMORY BARRIERS VS LOAD SPECULATION
855----------------------------------------
856
857Many CPUs speculate with loads: that is they see that they will need to load an
858item from memory, and they find a time where they're not using the bus for any
859other loads, and so do the load in advance - even though they haven't actually
860got to that point in the instruction execution flow yet. This permits the
861actual load instruction to potentially complete immediately because the CPU
862already has the value to hand.
863
864It may turn out that the CPU didn't actually need the value - perhaps because a
865branch circumvented the load - in which case it can discard the value or just
866cache it for later use.
867
868Consider:
869
870 CPU 1 CPU 2
871 ======================= =======================
872 LOAD B
873 DIVIDE } Divide instructions generally
874 DIVIDE } take a long time to perform
875 LOAD A
876
877Which might appear as this:
878
879 : : +-------+
880 +-------+ | |
881 --->| B->2 |------>| |
882 +-------+ | CPU 2 |
883 : :DIVIDE | |
884 +-------+ | |
885 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
886 division speculates on the +-------+ ~ | |
887 LOAD of A : : ~ | |
888 : :DIVIDE | |
889 : : ~ | |
890 Once the divisions are complete --> : : ~-->| |
891 the CPU can then perform the : : | |
892 LOAD with immediate effect : : +-------+
893
894
895Placing a read barrier or a data dependency barrier just before the second
896load:
897
898 CPU 1 CPU 2
899 ======================= =======================
900 LOAD B
901 DIVIDE
902 DIVIDE
903 <read barrier>
904 LOAD A
905
906will force any value speculatively obtained to be reconsidered to an extent
907dependent on the type of barrier used. If there was no change made to the
908speculated memory location, then the speculated value will just be used:
909
910 : : +-------+
911 +-------+ | |
912 --->| B->2 |------>| |
913 +-------+ | CPU 2 |
914 : :DIVIDE | |
915 +-------+ | |
916 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
917 division speculates on the +-------+ ~ | |
918 LOAD of A : : ~ | |
919 : :DIVIDE | |
920 : : ~ | |
921 : : ~ | |
922 rrrrrrrrrrrrrrrr~ | |
923 : : ~ | |
924 : : ~-->| |
925 : : | |
926 : : +-------+
927
928
929but if there was an update or an invalidation from another CPU pending, then
930the speculation will be cancelled and the value reloaded:
931
932 : : +-------+
933 +-------+ | |
934 --->| B->2 |------>| |
935 +-------+ | CPU 2 |
936 : :DIVIDE | |
937 +-------+ | |
938 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
939 division speculates on the +-------+ ~ | |
940 LOAD of A : : ~ | |
941 : :DIVIDE | |
942 : : ~ | |
943 : : ~ | |
944 rrrrrrrrrrrrrrrrr | |
945 +-------+ | |
946 The speculation is discarded ---> --->| A->1 |------>| |
947 and an updated value is +-------+ | |
948 retrieved : : +-------+
769 949
770 950
771======================== 951========================
@@ -901,7 +1081,7 @@ IMPLICIT KERNEL MEMORY BARRIERS
901=============================== 1081===============================
902 1082
903Some of the other functions in the linux kernel imply memory barriers, amongst 1083Some of the other functions in the linux kernel imply memory barriers, amongst
904which are locking, scheduling and memory allocation functions. 1084which are locking and scheduling functions.
905 1085
906This specification is a _minimum_ guarantee; any particular architecture may 1086This specification is a _minimum_ guarantee; any particular architecture may
907provide more substantial guarantees, but these may not be relied upon outside 1087provide more substantial guarantees, but these may not be relied upon outside
@@ -966,6 +1146,20 @@ equivalent to a full barrier, but a LOCK followed by an UNLOCK is not.
966 barriers is that the effects instructions outside of a critical section may 1146 barriers is that the effects instructions outside of a critical section may
967 seep into the inside of the critical section. 1147 seep into the inside of the critical section.
968 1148
1149A LOCK followed by an UNLOCK may not be assumed to be full memory barrier
1150because it is possible for an access preceding the LOCK to happen after the
1151LOCK, and an access following the UNLOCK to happen before the UNLOCK, and the
1152two accesses can themselves then cross:
1153
1154 *A = a;
1155 LOCK
1156 UNLOCK
1157 *B = b;
1158
1159may occur as:
1160
1161 LOCK, STORE *B, STORE *A, UNLOCK
1162
969Locks and semaphores may not provide any guarantee of ordering on UP compiled 1163Locks and semaphores may not provide any guarantee of ordering on UP compiled
970systems, and so cannot be counted on in such a situation to actually achieve 1164systems, and so cannot be counted on in such a situation to actually achieve
971anything at all - especially with respect to I/O accesses - unless combined 1165anything at all - especially with respect to I/O accesses - unless combined
@@ -1016,8 +1210,6 @@ Other functions that imply barriers:
1016 1210
1017 (*) schedule() and similar imply full memory barriers. 1211 (*) schedule() and similar imply full memory barriers.
1018 1212
1019 (*) Memory allocation and release functions imply full memory barriers.
1020
1021 1213
1022================================= 1214=================================
1023INTER-CPU LOCKING BARRIER EFFECTS 1215INTER-CPU LOCKING BARRIER EFFECTS