author | Peter Zijlstra <peterz@infradead.org> | 2013-12-11 16:59:06 -0500 |
---|---|---|
committer | Ingo Molnar <mingo@kernel.org> | 2013-12-16 05:36:11 -0500 |
commit | 18c03c61444a211237f3d4782353cb38dba795df (patch) | |
tree | 32f92af0726cfdd576370f2965418f5a859e603b | |
parent | fb2b581968db140586e8d7db38ff278f60872313 (diff) |
Documentation/memory-barriers.txt: Prohibit speculative writes
No SMP architecture currently supporting Linux allows
speculative writes, so this commit updates
Documentation/memory-barriers.txt to prohibit them in Linux core
code. It also records restrictions on their use.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1386799151-2219-3-git-send-email-paulmck@linux.vnet.ibm.com
[ Paul modified the original patch from Peter. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-rw-r--r-- | Documentation/memory-barriers.txt | 183 |
1 files changed, 175 insertions, 8 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 2d22da095a60..deafa36aeea1 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -571,11 +571,10 @@ dependency barrier to make it work correctly. Consider the following bit of | |||
571 | code: | 571 | code: |
572 | 572 | ||
573 | q = ACCESS_ONCE(a); | 573 | q = ACCESS_ONCE(a); |
574 | if (p) { | 574 | if (q) { |
575 | <data dependency barrier> | 575 | <data dependency barrier> /* BUG: No data dependency!!! */ |
576 | q = ACCESS_ONCE(b); | 576 | p = ACCESS_ONCE(b); |
577 | } | 577 | } |
578 | x = *q; | ||
579 | 578 | ||
580 | This will not have the desired effect because there is no actual data | 579 | This will not have the desired effect because there is no actual data |
581 | dependency, but rather a control dependency that the CPU may short-circuit | 580 | dependency, but rather a control dependency that the CPU may short-circuit |
@@ -584,11 +583,176 @@ the load from b as having happened before the load from a. In such a | |||
584 | case what's actually required is: | 583 | case what's actually required is: |
585 | 584 | ||
586 | q = ACCESS_ONCE(a); | 585 | q = ACCESS_ONCE(a); |
587 | if (p) { | 586 | if (q) { |
588 | <read barrier> | 587 | <read barrier> |
589 | q = ACCESS_ONCE(b); | 588 | p = ACCESS_ONCE(b); |
590 | } | 589 | } |
591 | x = *q; | 590 | |
591 | However, stores are not speculated. This means that ordering -is- provided | ||
592 | in the following example: | ||
593 | |||
594 | q = ACCESS_ONCE(a); | ||
595 | if (q) { | ||
596 | ACCESS_ONCE(b) = p; | ||
597 | } | ||
598 | |||
599 | Please note that ACCESS_ONCE() is not optional! Without the ACCESS_ONCE(), | ||
600 | the compiler is within its rights to transform this example: | ||
601 | |||
602 | q = a; | ||
603 | if (q) { | ||
604 | b = p; /* BUG: Compiler can reorder!!! */ | ||
605 | do_something(); | ||
606 | } else { | ||
607 | b = p; /* BUG: Compiler can reorder!!! */ | ||
608 | do_something_else(); | ||
609 | } | ||
610 | |||
611 | into this, which of course defeats the ordering: | ||
612 | |||
613 | b = p; | ||
614 | q = a; | ||
615 | if (q) | ||
616 | do_something(); | ||
617 | else | ||
618 | do_something_else(); | ||
619 | |||
620 | Worse yet, if the compiler is able to prove (say) that the value of | ||
621 | variable 'a' is always non-zero, it would be well within its rights | ||
622 | to optimize the original example by eliminating the "if" statement | ||
623 | as follows: | ||
624 | |||
625 | q = a; | ||
626 | b = p; /* BUG: Compiler can reorder!!! */ | ||
627 | do_something(); | ||
628 | |||
629 | The solution is again ACCESS_ONCE(), which preserves the ordering between | ||
630 | the load from variable 'a' and the store to variable 'b': | ||
631 | |||
632 | q = ACCESS_ONCE(a); | ||
633 | if (q) { | ||
634 | ACCESS_ONCE(b) = p; | ||
635 | do_something(); | ||
636 | } else { | ||
637 | ACCESS_ONCE(b) = p; | ||
638 | do_something_else(); | ||
639 | } | ||
640 | |||
641 | You could also use barrier() to prevent the compiler from moving | ||
642 | the stores to variable 'b', but barrier() would not prevent the | ||
643 | compiler from proving to itself that a==1 always, so ACCESS_ONCE() | ||
644 | is also needed. | ||
645 | |||
646 | It is important to note that control dependencies absolutely require a | ||
647 | conditional. For example, the following "optimized" version of | ||
648 | the above example breaks ordering: | ||
649 | |||
650 | q = ACCESS_ONCE(a); | ||
651 | ACCESS_ONCE(b) = p; /* BUG: No ordering vs. load from a!!! */ | ||
652 | if (q) { | ||
653 | /* ACCESS_ONCE(b) = p; -- moved up, BUG!!! */ | ||
654 | do_something(); | ||
655 | } else { | ||
656 | /* ACCESS_ONCE(b) = p; -- moved up, BUG!!! */ | ||
657 | do_something_else(); | ||
658 | } | ||
659 | |||
660 | It is of course legal for the prior load to be part of the conditional, | ||
661 | for example, as follows: | ||
662 | |||
663 | if (ACCESS_ONCE(a) > 0) { | ||
664 | ACCESS_ONCE(b) = q / 2; | ||
665 | do_something(); | ||
666 | } else { | ||
667 | ACCESS_ONCE(b) = q / 3; | ||
668 | do_something_else(); | ||
669 | } | ||
670 | |||
671 | This will again ensure that the load from variable 'a' is ordered before the | ||
672 | stores to variable 'b'. | ||
673 | |||
674 | In addition, you need to be careful what you do with the local variable 'q', | ||
675 | otherwise the compiler might be able to guess the value and again remove | ||
676 | the needed conditional. For example: | ||
677 | |||
678 | q = ACCESS_ONCE(a); | ||
679 | if (q % MAX) { | ||
680 | ACCESS_ONCE(b) = p; | ||
681 | do_something(); | ||
682 | } else { | ||
683 | ACCESS_ONCE(b) = p; | ||
684 | do_something_else(); | ||
685 | } | ||
686 | |||
687 | If MAX is defined to be 1, then the compiler knows that (q % MAX) is | ||
688 | equal to zero, in which case the compiler is within its rights to | ||
689 | transform the above code into the following: | ||
690 | |||
691 | q = ACCESS_ONCE(a); | ||
692 | ACCESS_ONCE(b) = p; | ||
693 | do_something_else(); | ||
694 | |||
695 | This transformation loses the ordering between the load from variable 'a' | ||
696 | and the store to variable 'b'. If you are relying on this ordering, you | ||
697 | should do something like the following: | ||
698 | |||
699 | q = ACCESS_ONCE(a); | ||
700 | BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */ | ||
701 | if (q % MAX) { | ||
702 | ACCESS_ONCE(b) = p; | ||
703 | do_something(); | ||
704 | } else { | ||
705 | ACCESS_ONCE(b) = p; | ||
706 | do_something_else(); | ||
707 | } | ||
708 | |||
709 | Finally, control dependencies do -not- provide transitivity. This is | ||
710 | demonstrated by two related examples: | ||
711 | |||
712 | CPU 0 CPU 1 | ||
713 | ===================== ===================== | ||
714 | r1 = ACCESS_ONCE(x); r2 = ACCESS_ONCE(y); | ||
715 | if (r1 >= 0) if (r2 >= 0) | ||
716 | ACCESS_ONCE(y) = 1; ACCESS_ONCE(x) = 1; | ||
717 | |||
718 | assert(!(r1 == 1 && r2 == 1)); | ||
719 | |||
720 | The above two-CPU example will never trigger the assert(). However, | ||
721 | if control dependencies guaranteed transitivity (which they do not), | ||
722 | then adding the following two CPUs would guarantee a related assertion: | ||
723 | |||
724 | CPU 2 CPU 3 | ||
725 | ===================== ===================== | ||
726 | ACCESS_ONCE(x) = 2; ACCESS_ONCE(y) = 2; | ||
727 | |||
728 | assert(!(r1 == 2 && r2 == 2 && x == 1 && y == 1)); /* FAILS!!! */ | ||
729 | |||
730 | But because control dependencies do -not- provide transitivity, the | ||
731 | above assertion can fail after the combined four-CPU example completes. | ||
732 | If you need the four-CPU example to provide ordering, you will need | ||
733 | smp_mb() between the loads and stores in the CPU 0 and CPU 1 code fragments. | ||
734 | |||
735 | In summary: | ||
736 | |||
737 | (*) Control dependencies can order prior loads against later stores. | ||
738 | However, they do -not- guarantee any other sort of ordering: | ||
739 | Not prior loads against later loads, nor prior stores against | ||
740 | later anything. If you need these other forms of ordering, | ||
741 | use smp_rmb(), smp_wmb(), or, in the case of prior stores and | ||
742 | later loads, smp_mb(). | ||
743 | |||
744 | (*) Control dependencies require at least one run-time conditional | ||
745 | between the prior load and the subsequent store. If the compiler | ||
746 | is able to optimize the conditional away, it will have also | ||
747 | optimized away the ordering. Careful use of ACCESS_ONCE() can | ||
748 | help to preserve the needed conditional. | ||
749 | |||
750 | (*) Control dependencies require that the compiler avoid reordering the | ||
751 | dependency into nonexistence. Careful use of ACCESS_ONCE() or | ||
752 | barrier() can help to preserve your control dependency. | ||
753 | |||
754 | (*) Control dependencies do -not- provide transitivity. If you | ||
755 | need transitivity, use smp_mb(). | ||
592 | 756 | ||
593 | 757 | ||
594 | SMP BARRIER PAIRING | 758 | SMP BARRIER PAIRING |
@@ -1083,7 +1247,10 @@ compiler from moving the memory accesses either side of it to the other side: | |||
1083 | 1247 | ||
1084 | barrier(); | 1248 | barrier(); |
1085 | 1249 | ||
1086 | This is a general barrier - lesser varieties of compiler barrier do not exist. | 1250 | This is a general barrier -- there are no read-read or write-write variants |
1251 | of barrier(). However, ACCESS_ONCE() can be thought of as a weak form | ||
1252 | of barrier() that affects only the specific accesses flagged by the | ||
1253 | ACCESS_ONCE(). | ||
1087 | 1254 | ||
1088 | The compiler barrier has no direct effect on the CPU, which may then reorder | 1255 | The compiler barrier has no direct effect on the CPU, which may then reorder |
1089 | things however it wishes. | 1256 | things however it wishes. |
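
The load-to-store pattern that this patch documents can be tried out in userspace with a small sketch like the one below. It is an illustration under stated assumptions, not kernel code: ACCESS_ONCE() is redefined locally as a volatile cast (believed to match the kernel macro of that era, but treat it as an assumption), _Static_assert stands in for BUILD_BUG_ON(), and the names shared_a, shared_b, store_if_flag_set(), store_by_remainder(), and the value chosen for MAX are all hypothetical. The sketch only shows the compiler-facing side of the rules, namely that the load and the conditional stores are volatile accesses and that the condition cannot be folded to a constant. It does not demonstrate any CPU-level ordering.

```c
/* Userspace sketch only: not kernel code, and no proof of CPU ordering. */

/* Assumed stand-in for the kernel macro: force exactly one volatile access. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/*
 * Hypothetical modulus. The static assertion plays the role of
 * BUILD_BUG_ON(MAX <= 1): with MAX == 1, (q % MAX) is always zero and
 * the compiler could discard the conditional.
 */
#define MAX 4
_Static_assert(MAX > 1, "MAX must exceed 1 or (q % MAX) is a constant");

/* Hypothetical shared variables standing in for 'a' and 'b'. */
static int shared_a;
static int shared_b;

/*
 * Load shared_a once, then store p to shared_b only when the loaded
 * value is non-zero. Returns the value that was loaded.
 */
static int store_if_flag_set(int p)
{
	int q = ACCESS_ONCE(shared_a);

	if (q)
		ACCESS_ONCE(shared_b) = p;
	return q;
}

/*
 * Both branches store to shared_b, but the stored values differ and the
 * condition depends on a run-time value, so the conditional must stay.
 * Returns the value that was loaded.
 */
static int store_by_remainder(int p)
{
	int q = ACCESS_ONCE(shared_a);

	if (q % MAX)
		ACCESS_ONCE(shared_b) = p / 2;
	else
		ACCESS_ONCE(shared_b) = p / 3;
	return q;
}
```

A caller would set shared_a from one thread and call either function from another. In a real kernel context, the load-to-load, store-to-anything, and transitivity cases called out in the summary would still need smp_rmb(), smp_wmb(), or smp_mb(), as the patch text states.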