field | value | date |
---|---|---|
author | Paul E. McKenney <paulmck@linux.vnet.ibm.com> | 2016-12-13 19:42:32 -0500 |
committer | Paul E. McKenney <paulmck@linux.vnet.ibm.com> | 2017-01-15 00:29:15 -0500 |
commit | c8241f8553e888199283c75ab89ef6b092b2afd2 | |
tree | d9ec5fc28d706ed249c9e91c5eda0b5890fe8011 /Documentation/memory-barriers.txt | |
parent | 526914a0aee7c5cfc5b27e6cc2fe1fae15248d79 | |
doc: Update control-dependencies section of memory-barriers.txt
This commit adds consistency to examples, formatting, and a couple of
additional warnings.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Diffstat (limited to 'Documentation/memory-barriers.txt')
-rw-r--r-- | Documentation/memory-barriers.txt | 70 |
1 files changed, 38 insertions, 32 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index ba818ecce6f9..d2b0a8d81258 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -640,6 +640,10 @@ See also the subsection on "Cache Coherency" for a more thorough example.
640 | CONTROL DEPENDENCIES | 640 | CONTROL DEPENDENCIES |
641 | -------------------- | 641 | -------------------- |
642 | 642 | ||
| | 643 | Control dependencies can be a bit tricky because current compilers do |
| | 644 | not understand them. The purpose of this section is to help you prevent |
| | 645 | the compiler's ignorance from breaking your code. |
| | 646 | |
643 | A load-load control dependency requires a full read memory barrier, not | 647 | A load-load control dependency requires a full read memory barrier, not |
644 | simply a data dependency barrier to make it work correctly. Consider the | 648 | simply a data dependency barrier to make it work correctly. Consider the |
645 | following bit of code: | 649 | following bit of code: |
@@ -667,14 +671,15 @@ for load-store control dependencies, as in the following example:
667 | 671 | ||
668 | q = READ_ONCE(a); | 672 | q = READ_ONCE(a); |
669 | if (q) { | 673 | if (q) { |
670 | WRITE_ONCE(b, p); | 674 | WRITE_ONCE(b, 1); |
671 | } | 675 | } |
672 | 676 | ||
673 | Control dependencies pair normally with other types of barriers. That | 677 | Control dependencies pair normally with other types of barriers. |
674 | said, please note that READ_ONCE() is not optional! Without the | 678 | That said, please note that neither READ_ONCE() nor WRITE_ONCE() |
675 | READ_ONCE(), the compiler might combine the load from 'a' with other | 679 | are optional! Without the READ_ONCE(), the compiler might combine the |
676 | loads from 'a', and the store to 'b' with other stores to 'b', with | 680 | load from 'a' with other loads from 'a'. Without the WRITE_ONCE(), |
677 | possible highly counterintuitive effects on ordering. | 681 | the compiler might combine the store to 'b' with other stores to 'b'. |
| | 682 | Either can result in highly counterintuitive effects on ordering. |
678 | 683 | ||
679 | Worse yet, if the compiler is able to prove (say) that the value of | 684 | Worse yet, if the compiler is able to prove (say) that the value of |
680 | variable 'a' is always non-zero, it would be well within its rights | 685 | variable 'a' is always non-zero, it would be well within its rights |
@@ -682,7 +687,7 @@ to optimize the original example by eliminating the "if" statement
682 | as follows: | 687 | as follows: |
683 | 688 | ||
684 | q = a; | 689 | q = a; |
685 | b = p; /* BUG: Compiler and CPU can both reorder!!! */ | 690 | b = 1; /* BUG: Compiler and CPU can both reorder!!! */ |
686 | 691 | ||
687 | So don't leave out the READ_ONCE(). | 692 | So don't leave out the READ_ONCE(). |
688 | 693 | ||
@@ -692,11 +697,11 @@ branches of the "if" statement as follows:
692 | q = READ_ONCE(a); | 697 | q = READ_ONCE(a); |
693 | if (q) { | 698 | if (q) { |
694 | barrier(); | 699 | barrier(); |
695 | WRITE_ONCE(b, p); | 700 | WRITE_ONCE(b, 1); |
696 | do_something(); | 701 | do_something(); |
697 | } else { | 702 | } else { |
698 | barrier(); | 703 | barrier(); |
699 | WRITE_ONCE(b, p); | 704 | WRITE_ONCE(b, 1); |
700 | do_something_else(); | 705 | do_something_else(); |
701 | } | 706 | } |
702 | 707 | ||
@@ -705,12 +710,12 @@ optimization levels:
705 | 710 | ||
706 | q = READ_ONCE(a); | 711 | q = READ_ONCE(a); |
707 | barrier(); | 712 | barrier(); |
708 | WRITE_ONCE(b, p); /* BUG: No ordering vs. load from a!!! */ | 713 | WRITE_ONCE(b, 1); /* BUG: No ordering vs. load from a!!! */ |
709 | if (q) { | 714 | if (q) { |
710 | /* WRITE_ONCE(b, p); -- moved up, BUG!!! */ | 715 | /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */ |
711 | do_something(); | 716 | do_something(); |
712 | } else { | 717 | } else { |
713 | /* WRITE_ONCE(b, p); -- moved up, BUG!!! */ | 718 | /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */ |
714 | do_something_else(); | 719 | do_something_else(); |
715 | } | 720 | } |
716 | 721 | ||
@@ -723,10 +728,10 @@ memory barriers, for example, smp_store_release():
723 | 728 | ||
724 | q = READ_ONCE(a); | 729 | q = READ_ONCE(a); |
725 | if (q) { | 730 | if (q) { |
726 | smp_store_release(&b, p); | 731 | smp_store_release(&b, 1); |
727 | do_something(); | 732 | do_something(); |
728 | } else { | 733 | } else { |
729 | smp_store_release(&b, p); | 734 | smp_store_release(&b, 1); |
730 | do_something_else(); | 735 | do_something_else(); |
731 | } | 736 | } |
732 | 737 | ||
@@ -735,10 +740,10 @@ ordering is guaranteed only when the stores differ, for example:
735 | 740 | ||
736 | q = READ_ONCE(a); | 741 | q = READ_ONCE(a); |
737 | if (q) { | 742 | if (q) { |
738 | WRITE_ONCE(b, p); | 743 | WRITE_ONCE(b, 1); |
739 | do_something(); | 744 | do_something(); |
740 | } else { | 745 | } else { |
741 | WRITE_ONCE(b, r); | 746 | WRITE_ONCE(b, 2); |
742 | do_something_else(); | 747 | do_something_else(); |
743 | } | 748 | } |
744 | 749 | ||
@@ -751,10 +756,10 @@ the needed conditional. For example:
751 | 756 | ||
752 | q = READ_ONCE(a); | 757 | q = READ_ONCE(a); |
753 | if (q % MAX) { | 758 | if (q % MAX) { |
754 | WRITE_ONCE(b, p); | 759 | WRITE_ONCE(b, 1); |
755 | do_something(); | 760 | do_something(); |
756 | } else { | 761 | } else { |
757 | WRITE_ONCE(b, r); | 762 | WRITE_ONCE(b, 2); |
758 | do_something_else(); | 763 | do_something_else(); |
759 | } | 764 | } |
760 | 765 | ||
@@ -763,7 +768,7 @@ equal to zero, in which case the compiler is within its rights to
763 | transform the above code into the following: | 768 | transform the above code into the following: |
764 | 769 | ||
765 | q = READ_ONCE(a); | 770 | q = READ_ONCE(a); |
766 | WRITE_ONCE(b, p); | 771 | WRITE_ONCE(b, 1); |
767 | do_something_else(); | 772 | do_something_else(); |
768 | 773 | ||
769 | Given this transformation, the CPU is not required to respect the ordering | 774 | Given this transformation, the CPU is not required to respect the ordering |
@@ -776,10 +781,10 @@ one, perhaps as follows:
776 | q = READ_ONCE(a); | 781 | q = READ_ONCE(a); |
777 | BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */ | 782 | BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */ |
778 | if (q % MAX) { | 783 | if (q % MAX) { |
779 | WRITE_ONCE(b, p); | 784 | WRITE_ONCE(b, 1); |
780 | do_something(); | 785 | do_something(); |
781 | } else { | 786 | } else { |
782 | WRITE_ONCE(b, r); | 787 | WRITE_ONCE(b, 2); |
783 | do_something_else(); | 788 | do_something_else(); |
784 | } | 789 | } |
785 | 790 | ||
@@ -812,30 +817,28 @@ not necessarily apply to code following the if-statement:
812 | 817 | ||
813 | q = READ_ONCE(a); | 818 | q = READ_ONCE(a); |
814 | if (q) { | 819 | if (q) { |
815 | WRITE_ONCE(b, p); | 820 | WRITE_ONCE(b, 1); |
816 | } else { | 821 | } else { |
817 | WRITE_ONCE(b, r); | 822 | WRITE_ONCE(b, 2); |
818 | } | 823 | } |
819 | WRITE_ONCE(c, 1); /* BUG: No ordering against the read from "a". */ | 824 | WRITE_ONCE(c, 1); /* BUG: No ordering against the read from 'a'. */ |
820 | 825 | ||
821 | It is tempting to argue that there in fact is ordering because the | 826 | It is tempting to argue that there in fact is ordering because the |
822 | compiler cannot reorder volatile accesses and also cannot reorder | 827 | compiler cannot reorder volatile accesses and also cannot reorder |
823 | the writes to "b" with the condition. Unfortunately for this line | 828 | the writes to 'b' with the condition. Unfortunately for this line |
824 | of reasoning, the compiler might compile the two writes to "b" as | 829 | of reasoning, the compiler might compile the two writes to 'b' as |
825 | conditional-move instructions, as in this fanciful pseudo-assembly | 830 | conditional-move instructions, as in this fanciful pseudo-assembly |
826 | language: | 831 | language: |
827 | 832 | ||
828 | ld r1,a | 833 | ld r1,a |
829 | ld r2,p | | |
830 | ld r3,r | | |
831 | cmp r1,$0 | 834 | cmp r1,$0 |
832 | cmov,ne r4,r2 | 835 | cmov,ne r4,$1 |
833 | cmov,eq r4,r3 | 836 | cmov,eq r4,$2 |
834 | st r4,b | 837 | st r4,b |
835 | st $1,c | 838 | st $1,c |
836 | 839 | ||
837 | A weakly ordered CPU would have no dependency of any sort between the load | 840 | A weakly ordered CPU would have no dependency of any sort between the load |
838 | from "a" and the store to "c". The control dependencies would extend | 841 | from 'a' and the store to 'c'. The control dependencies would extend |
839 | only to the pair of cmov instructions and the store depending on them. | 842 | only to the pair of cmov instructions and the store depending on them. |
840 | In short, control dependencies apply only to the stores in the then-clause | 843 | In short, control dependencies apply only to the stores in the then-clause |
841 | and else-clause of the if-statement in question (including functions | 844 | and else-clause of the if-statement in question (including functions |
@@ -843,7 +846,7 @@ invoked by those two clauses), not to code following that if-statement.
843 | 846 | ||
844 | Finally, control dependencies do -not- provide transitivity. This is | 847 | Finally, control dependencies do -not- provide transitivity. This is |
845 | demonstrated by two related examples, with the initial values of | 848 | demonstrated by two related examples, with the initial values of |
846 | x and y both being zero: | 849 | 'x' and 'y' both being zero: |
847 | 850 | ||
848 | CPU 0 CPU 1 | 851 | CPU 0 CPU 1 |
849 | ======================= ======================= | 852 | ======================= ======================= |
@@ -915,6 +918,9 @@ In summary:
915 | (*) Control dependencies do -not- provide transitivity. If you | 918 | (*) Control dependencies do -not- provide transitivity. If you |
916 | need transitivity, use smp_mb(). | 919 | need transitivity, use smp_mb(). |
917 | 920 | ||
| | 921 | (*) Compilers do not understand control dependencies. It is therefore |
| | 922 | your job to ensure that they do not break your code. |
| | 923 | |
918 | 924 | ||
919 | SMP BARRIER PAIRING | 925 | SMP BARRIER PAIRING |
920 | ------------------- | 926 | ------------------- |