author     Paul E. McKenney <paulmck@linux.vnet.ibm.com>   2014-02-23 11:34:24 -0500
committer  Paul E. McKenney <paulmck@linux.vnet.ibm.com>   2014-02-24 11:37:29 -0500
commit     8dd853d7b6efcabba631a590dad3ed55bba7f0f2
tree       f45c0ba0f5fb37120d100f159858bf40d89da054
parent     e4696a1d3b1125d427de685531a44258ea6263df

Documentation/memory-barriers.txt: Clarify release/acquire ordering
This commit fixes a couple of typos and clarifies what happens when
the CPU chooses to execute a later lock acquisition before a prior
lock release, in particular, why deadlock is avoided.
Reported-by: Peter Hurley <peter@hurleysoftware.com>
Reported-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-rw-r--r--  Documentation/memory-barriers.txt | 91
1 file changed, 61 insertions(+), 30 deletions(-)
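
Before the diff itself, a minimal kernel-style sketch of the situation the
commit message describes: a later lock acquisition that the CPU may begin
executing before a prior lock release has completed. The locks M and N, the
variables A and B, and the function release_then_acquire() are hypothetical
illustrations, not code taken from the patch.

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(M);          /* hypothetical locks */
    static DEFINE_SPINLOCK(N);
    static int A, B;                    /* hypothetical shared variables */

    static void release_then_acquire(void)
    {
            spin_lock(&M);              /* ACQUIRE M */
            A = 1;                      /* store inside M's critical section */
            spin_unlock(&M);            /* RELEASE M */
            spin_lock(&N);              /* ACQUIRE N: the CPU may start this
                                         * before the RELEASE of M completes,
                                         * since M and N are different locks */
            B = 1;                      /* store inside N's critical section */
            spin_unlock(&N);            /* RELEASE N */
    }

If the early acquisition of N would deadlock, that acquisition simply spins
until the preceding release of M completes, which is the argument the patch
spells out in its "Why does this work?" discussion below.
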
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 9dde54c55b24..11c1d2049662 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1674,12 +1674,12 @@ for each construct. These operations all imply certain barriers:
 Memory operations issued after the ACQUIRE will be completed after the
 ACQUIRE operation has completed.
 
-Memory operations issued before the ACQUIRE may be completed after the
-ACQUIRE operation has completed. An smp_mb__before_spinlock(), combined
-with a following ACQUIRE, orders prior loads against subsequent stores and
-stores and prior stores against subsequent stores. Note that this is
-weaker than smp_mb()! The smp_mb__before_spinlock() primitive is free on
-many architectures.
+Memory operations issued before the ACQUIRE may be completed after
+the ACQUIRE operation has completed. An smp_mb__before_spinlock(),
+combined with a following ACQUIRE, orders prior loads against
+subsequent loads and stores and also orders prior stores against
+subsequent stores. Note that this is weaker than smp_mb()! The
+smp_mb__before_spinlock() primitive is free on many architectures.
 
 (2) RELEASE operation implication:
 
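
As a reading aid for the hunk above, a sketch of how smp_mb__before_spinlock()
might be used; the lock L, the variables W, X, Y, and Z, and the function are
hypothetical, and only the ordering properties stated in the new text are
assumed.

    #include <linux/compiler.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(L);
    static int W, X, Y, Z;              /* hypothetical shared variables */

    static void before_spinlock_example(void)
    {
            int x, y;

            W = 1;                      /* prior store */
            x = ACCESS_ONCE(X);         /* prior load */
            smp_mb__before_spinlock();
            spin_lock(&L);              /* ACQUIRE */
            y = ACCESS_ONCE(Y);         /* subsequent load: ordered after the
                                         * load from X */
            Z = x + y;                  /* subsequent store: ordered after the
                                         * load from X and the store to W */
            spin_unlock(&L);
    }

The one combination not covered, prior stores against subsequent loads, is
what a full smp_mb() would additionally order, which is why the text notes
that this is weaker than smp_mb().
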
@@ -1724,24 +1724,21 @@ may occur as:
 
 ACQUIRE M, STORE *B, STORE *A, RELEASE M
 
-This same reordering can of course occur if the lock's ACQUIRE and RELEASE are
-to the same lock variable, but only from the perspective of another CPU not
-holding that lock.
-
-In short, a RELEASE followed by an ACQUIRE may -not- be assumed to be a full
-memory barrier because it is possible for a preceding RELEASE to pass a
-later ACQUIRE from the viewpoint of the CPU, but not from the viewpoint
-of the compiler. Note that deadlocks cannot be introduced by this
-interchange because if such a deadlock threatened, the RELEASE would
-simply complete.
-
-If it is necessary for a RELEASE-ACQUIRE pair to produce a full barrier, the
-ACQUIRE can be followed by an smp_mb__after_unlock_lock() invocation. This
-will produce a full barrier if either (a) the RELEASE and the ACQUIRE are
-executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on the
-same variable. The smp_mb__after_unlock_lock() primitive is free on many
-architectures. Without smp_mb__after_unlock_lock(), the critical sections
-corresponding to the RELEASE and the ACQUIRE can cross:
+When the ACQUIRE and RELEASE are a lock acquisition and release,
+respectively, this same reordering can occur if the lock's ACQUIRE and
+RELEASE are to the same lock variable, but only from the perspective of
+another CPU not holding that lock. In short, an ACQUIRE followed by a
+RELEASE may -not- be assumed to be a full memory barrier.
+
+Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not
+imply a full memory barrier. If it is necessary for a RELEASE-ACQUIRE
+pair to produce a full barrier, the ACQUIRE can be followed by an
+smp_mb__after_unlock_lock() invocation. This will produce a full barrier
+if either (a) the RELEASE and the ACQUIRE are executed by the same
+CPU or task, or (b) the RELEASE and ACQUIRE act on the same variable.
+The smp_mb__after_unlock_lock() primitive is free on many architectures.
+Without smp_mb__after_unlock_lock(), the CPU's execution of the critical
+sections corresponding to the RELEASE and the ACQUIRE can cross, so that:
 
 *A = a;
 RELEASE M
@@ -1752,7 +1749,36 @@ could occur as:
 
 ACQUIRE N, STORE *B, STORE *A, RELEASE M
 
-With smp_mb__after_unlock_lock(), they cannot, so that:
+It might appear that this reordering could introduce a deadlock.
+However, this cannot happen because if such a deadlock threatened,
+the RELEASE would simply complete, thereby avoiding the deadlock.
+
+Why does this work?
+
+One key point is that we are only talking about the CPU doing
+the reordering, not the compiler. If the compiler (or, for
+that matter, the developer) switched the operations, deadlock
+-could- occur.
+
+But suppose the CPU reordered the operations. In this case,
+the unlock precedes the lock in the assembly code. The CPU
+simply elected to try executing the later lock operation first.
+If there is a deadlock, this lock operation will simply spin (or
+try to sleep, but more on that later). The CPU will eventually
+execute the unlock operation (which preceded the lock operation
+in the assembly code), which will unravel the potential deadlock,
+allowing the lock operation to succeed.
+
+But what if the lock is a sleeplock? In that case, the code will
+try to enter the scheduler, where it will eventually encounter
+a memory barrier, which will force the earlier unlock operation
+to complete, again unraveling the deadlock. There might be
+a sleep-unlock race, but the locking primitive needs to resolve
+such races properly in any case.
+
+With smp_mb__after_unlock_lock(), the two critical sections cannot overlap.
+For example, with the following code, the store to *A will always be
+seen by other CPUs before the store to *B:
 
 *A = a;
 RELEASE M
@@ -1760,13 +1786,18 @@ With smp_mb__after_unlock_lock(), they cannot, so that:
 smp_mb__after_unlock_lock();
 *B = b;
 
-will always occur as either of the following:
+The operations will always occur in one of the following orders:
 
-STORE *A, RELEASE, ACQUIRE, STORE *B
-STORE *A, ACQUIRE, RELEASE, STORE *B
+STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B
+STORE *A, ACQUIRE, RELEASE, smp_mb__after_unlock_lock(), STORE *B
+ACQUIRE, STORE *A, RELEASE, smp_mb__after_unlock_lock(), STORE *B
 
 If the RELEASE and ACQUIRE were instead both operating on the same lock
-variable, only the first of these two alternatives can occur.
+variable, only the first of these alternatives can occur. In addition,
+the more strongly ordered systems may rule out some of the above orders.
+But in any case, as noted earlier, the smp_mb__after_unlock_lock()
+ensures that the store to *A will always be seen as happening before
+the store to *B.
 
 Locks and semaphores may not provide any guarantee of ordering on UP compiled
 systems, and so cannot be counted on in such a situation to actually achieve
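
To make the hunks above concrete, a sketch of the RELEASE-ACQUIRE sequence
with smp_mb__after_unlock_lock(); the locks M and N, the variables A and B,
and the function are hypothetical.

    #include <linux/compiler.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(M);          /* hypothetical locks */
    static DEFINE_SPINLOCK(N);
    static int A, B;                    /* hypothetical shared variables */

    static void unlock_lock_example(void)
    {
            spin_lock(&M);                  /* ACQUIRE M */
            ACCESS_ONCE(A) = 1;             /* the "*A = a" store */
            spin_unlock(&M);                /* RELEASE M */
            spin_lock(&N);                  /* ACQUIRE N (different lock variable) */
            smp_mb__after_unlock_lock();    /* upgrade RELEASE M + ACQUIRE N to a
                                             * full memory barrier */
            ACCESS_ONCE(B) = 1;             /* the "*B = b" store: now guaranteed to
                                             * be seen after the store to A */
            spin_unlock(&N);                /* RELEASE N */
    }

Without the smp_mb__after_unlock_lock() invocation, the two critical sections
could cross from the viewpoint of other CPUs, giving the
ACQUIRE N, STORE *B, STORE *A, RELEASE M order shown in the diff.
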
@@ -2787,7 +2818,7 @@ in that order, but, without intervention, the sequence may have almost any
 combination of elements combined or discarded, provided the program's view of
 the world remains consistent. Note that ACCESS_ONCE() is -not- optional
 in the above example, as there are architectures where a given CPU might
-interchange successive loads to the same location. On such architectures,
+reorder successive loads to the same location. On such architectures,
 ACCESS_ONCE() does whatever is necessary to prevent this, for example, on
 Itanium the volatile casts used by ACCESS_ONCE() cause GCC to emit the
 special ld.acq and st.rel instructions that prevent such reordering.
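
Finally, a sketch of the ACCESS_ONCE() point made in the last hunk; the
variable shared_flag and the function read_twice() are hypothetical.

    #include <linux/compiler.h>

    static int shared_flag;             /* hypothetical shared variable */

    static int read_twice(void)
    {
            int first  = ACCESS_ONCE(shared_flag);  /* first load */
            int second = ACCESS_ONCE(shared_flag);  /* second load: the volatile
                                                     * access keeps the compiler from
                                                     * merging the loads, and on
                                                     * Itanium the resulting ld.acq
                                                     * keeps the CPU from reordering
                                                     * them */
            return second - first;
    }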