author		Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2013-12-11 16:59:09 -0500
committer	Ingo Molnar <mingo@kernel.org>	2013-12-16 05:36:15 -0500
commit		17eb88e068430014deb709e5af34197cdf2390c9 (patch)
tree		869d7c1e27ff7eeb2b0b846b8f844d32ac375222 /Documentation/memory-barriers.txt
parent		01352fb81658cbf78c55844de8e3d1d606bbf3f8 (diff)
Documentation/memory-barriers.txt: Downgrade UNLOCK+LOCK
Historically, an UNLOCK+LOCK pair executed by one CPU, by one task, or
on a given lock variable has implied a full memory barrier.  In a recent
LKML thread, the wisdom of this historical approach was called into
question: http://www.spinics.net/lists/linux-mm/msg65653.html, in part
due to the memory-order complexities of low-handoff-overhead queued
locks on x86 systems.

This patch therefore removes this guarantee from the documentation, and
further documents how to restore it via a new
smp_mb__after_unlock_lock() primitive.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1386799151-2219-6-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
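As a minimal sketch of the behavior being documented (not part of the
patch; "mylock" and the variables x and y are hypothetical, and the
sketch assumes the smp_mb__after_unlock_lock() primitive introduced by
the parent commit):

	/*
	 * Hypothetical illustration only.  Entered with mylock held.
	 */
	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(mylock);
	static int x, y;

	static void unlock_lock_pair(void)
	{
		x = 1;
		spin_unlock(&mylock);		/* UNLOCK ... */
		spin_lock(&mylock);		/* ... + LOCK: no longer a full barrier */
		smp_mb__after_unlock_lock();	/* restores full-barrier semantics */
		y = 1;				/* other CPUs cannot observe this store
						   before the store to x */
	}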
Diffstat (limited to 'Documentation/memory-barriers.txt')
-rw-r--r--	Documentation/memory-barriers.txt	84
1 file changed, 69 insertions(+), 15 deletions(-)
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 919fd604969d..cb753c8158f2 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -402,12 +402,18 @@ And a couple of implicit varieties:
 Memory operations that occur after an UNLOCK operation may appear to
 happen before it completes.
 
-LOCK and UNLOCK operations are guaranteed to appear with respect to each
-other strictly in the order specified.
-
 The use of LOCK and UNLOCK operations generally precludes the need for
 other sorts of memory barrier (but note the exceptions mentioned in the
-subsection "MMIO write barrier").
+subsection "MMIO write barrier"). In addition, an UNLOCK+LOCK pair
+is -not- guaranteed to act as a full memory barrier. However,
+after a LOCK on a given lock variable, all memory accesses preceding any
+prior UNLOCK on that same variable are guaranteed to be visible.
+In other words, within a given lock variable's critical section,
+all accesses of all previous critical sections for that lock variable
+are guaranteed to have completed.
+
+This means that LOCK acts as a minimal "acquire" operation and
+UNLOCK acts as a minimal "release" operation.
 
 
 Memory barriers are only required where there's a possibility of interaction
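A hedged illustration of the acquire/release wording added in the hunk
above (the lock, the functions, and the variables below are invented for
this sketch and do not appear in the patch): once a later LOCK of a given
lock variable is acquired, all accesses from earlier critical sections on
that variable are visible.

	/* Illustrative only: data_lock, producer() and consumer() are made up. */
	#include <linux/kernel.h>
	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(data_lock);
	static int data, data_ready;

	static void producer(void)
	{
		spin_lock(&data_lock);		/* LOCK: minimal "acquire" */
		data = 42;
		data_ready = 1;
		spin_unlock(&data_lock);	/* UNLOCK: minimal "release" */
	}

	static void consumer(void)
	{
		spin_lock(&data_lock);		/* everything from producer()'s
						   critical section is visible here */
		if (data_ready)
			pr_info("data = %d\n", data);
		spin_unlock(&data_lock);
	}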
@@ -1633,8 +1639,12 @@ for each construct. These operations all imply certain barriers:
 Memory operations issued after the LOCK will be completed after the LOCK
 operation has completed.
 
-Memory operations issued before the LOCK may be completed after the LOCK
-operation has completed.
+Memory operations issued before the LOCK may be completed after the
+LOCK operation has completed. An smp_mb__before_spinlock(), combined
+with a following LOCK, orders prior loads against subsequent stores
+and stores and prior stores against subsequent stores. Note that
+this is weaker than smp_mb()! The smp_mb__before_spinlock()
+primitive is free on many architectures.
 
 (2) UNLOCK operation implication:
 
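A hedged sketch of the smp_mb__before_spinlock() pattern described in
implication (1) above; the lock and the flags are invented, only the
primitives are real kernel interfaces.

	/* Illustrative only: q_lock, flag and queued are made up. */
	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(q_lock);
	static int flag, queued;

	static void set_flag_then_queue(void)
	{
		flag = 1;			/* prior store ... */
		smp_mb__before_spinlock();	/* ... ordered against stores issued
						   after the following LOCK, which a
						   plain spin_lock() alone would not
						   guarantee */
		spin_lock(&q_lock);
		queued = 1;			/* subsequent store */
		spin_unlock(&q_lock);
	}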
@@ -1654,9 +1664,6 @@ for each construct. These operations all imply certain barriers:
 All LOCK operations issued before an UNLOCK operation will be completed
 before the UNLOCK operation.
 
-All UNLOCK operations issued before a LOCK operation will be completed
-before the LOCK operation.
-
 (5) Failed conditional LOCK implication:
 
 Certain variants of the LOCK operation may fail, either due to being
@@ -1664,9 +1671,6 @@ for each construct. These operations all imply certain barriers:
 signal whilst asleep waiting for the lock to become available. Failed
 locks do not imply any sort of barrier.
 
-Therefore, from (1), (2) and (4) an UNLOCK followed by an unconditional LOCK is
-equivalent to a full barrier, but a LOCK followed by an UNLOCK is not.
-
 [!] Note: one of the consequences of LOCKs and UNLOCKs being only one-way
 barriers is that the effects of instructions outside of a critical section
 may seep into the inside of the critical section.
@@ -1677,13 +1681,57 @@ LOCK, and an access following the UNLOCK to happen before the UNLOCK, and the
 two accesses can themselves then cross:
 
 	*A = a;
-	LOCK
-	UNLOCK
+	LOCK M
+	UNLOCK M
 	*B = b;
 
 may occur as:
 
-	LOCK, STORE *B, STORE *A, UNLOCK
+	LOCK M, STORE *B, STORE *A, UNLOCK M
+
+This same reordering can of course occur if the LOCK and UNLOCK are
+to the same lock variable, but only from the perspective of another
+CPU not holding that lock.
+
+In short, an UNLOCK followed by a LOCK may -not- be assumed to be a full
+memory barrier because it is possible for a preceding UNLOCK to pass a
+later LOCK from the viewpoint of the CPU, but not from the viewpoint
+of the compiler. Note that deadlocks cannot be introduced by this
+interchange because if such a deadlock threatened, the UNLOCK would
+simply complete.
+
+If it is necessary for an UNLOCK-LOCK pair to produce a full barrier,
+the LOCK can be followed by an smp_mb__after_unlock_lock() invocation.
+This will produce a full barrier if either (a) the UNLOCK and the LOCK
+are executed by the same CPU or task, or (b) the UNLOCK and LOCK act
+on the same lock variable. The smp_mb__after_unlock_lock() primitive
+is free on many architectures. Without smp_mb__after_unlock_lock(),
+the critical sections corresponding to the UNLOCK and the LOCK can cross:
+
+	*A = a;
+	UNLOCK M
+	LOCK N
+	*B = b;
+
+could occur as:
+
+	LOCK N, STORE *B, STORE *A, UNLOCK M
+
+With smp_mb__after_unlock_lock(), they cannot, so that:
+
+	*A = a;
+	UNLOCK M
+	LOCK N
+	smp_mb__after_unlock_lock();
+	*B = b;
+
+will always occur as either of the following:
+
+	STORE *A, UNLOCK, LOCK, STORE *B
+	STORE *A, LOCK, UNLOCK, STORE *B
+
+If the UNLOCK and LOCK were instead both operating on the same lock
+variable, only the first of these two alternatives can occur.
 
 Locks and semaphores may not provide any guarantee of ordering on UP compiled
 systems, and so cannot be counted on in such a situation to actually achieve
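The UNLOCK M / LOCK N sequence added by the hunk above, rendered as a
hedged spinlock sketch; the two locks and the variables a and b are
illustrative only and are not taken from the patch.

	/* Illustrative only.  Entered with M held. */
	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(M);
	static DEFINE_SPINLOCK(N);
	static int a, b;

	static void handoff(void)
	{
		a = 1;				/* *A = a; */
		spin_unlock(&M);		/* UNLOCK M */
		spin_lock(&N);			/* LOCK N */
		smp_mb__after_unlock_lock();	/* the same CPU executes the UNLOCK
						   and the LOCK, so the pair now acts
						   as a full memory barrier */
		b = 1;				/* *B = b; cannot be observed before
						   the store to a */
		spin_unlock(&N);
	}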
@@ -1911,6 +1959,7 @@ However, if the following occurs:
 	UNLOCK M	[1]
 	ACCESS_ONCE(*D) = d;		ACCESS_ONCE(*E) = e;
 					LOCK M		[2]
+					smp_mb__after_unlock_lock();
 					ACCESS_ONCE(*F) = f;
 					ACCESS_ONCE(*G) = g;
 					UNLOCK M	[2]
@@ -1928,6 +1977,11 @@ But assuming CPU 1 gets the lock first, CPU 3 won't see any of:
 	*F, *G or *H preceding LOCK M [2]
 	*A, *B, *C, *E, *F or *G following UNLOCK M [2]
 
+Note that the smp_mb__after_unlock_lock() is critically important
+here: Without it CPU 3 might see some of the above orderings.
+Without smp_mb__after_unlock_lock(), the accesses are not guaranteed
+to be seen in order unless CPU 3 holds lock M.
+
 
 LOCKS VS I/O ACCESSES
 ---------------------
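For reference, a hedged rendering of the CPU 1 / CPU 2 columns from the
example above; ACCESS_ONCE() was the idiom of this era, and the int
variables and the cpu1()/cpu2() wrappers are invented for the sketch.

	/* Illustrative only: the variables and function names are made up. */
	#include <linux/compiler.h>
	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(M);
	static int A, B, C, D, E, F, G, H;

	static void cpu1(void)
	{
		ACCESS_ONCE(A) = 1;
		spin_lock(&M);			/* LOCK M [1] */
		ACCESS_ONCE(B) = 2;
		ACCESS_ONCE(C) = 3;
		spin_unlock(&M);		/* UNLOCK M [1] */
		ACCESS_ONCE(D) = 4;
	}

	static void cpu2(void)
	{
		ACCESS_ONCE(E) = 5;
		spin_lock(&M);			/* LOCK M [2] */
		smp_mb__after_unlock_lock();	/* without this, a third CPU not
						   holding M could observe the
						   orderings ruled out above */
		ACCESS_ONCE(F) = 6;
		ACCESS_ONCE(G) = 7;
		spin_unlock(&M);		/* UNLOCK M [2] */
		ACCESS_ONCE(H) = 8;
	}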