-rw-r--r-- Documentation/memory-barriers.txt 141
1 files changed, 116 insertions, 25 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 904ee42d078e..3729cbe60e41 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -232,7 +232,7 @@ And there are a number of things that _must_ or _must_not_ be assumed:
      with memory references that are not protected by READ_ONCE() and
      WRITE_ONCE(). Without them, the compiler is within its rights to
      do all sorts of "creative" transformations, which are covered in
-     the Compiler Barrier section.
+     the COMPILER BARRIER section.
 
  (*) It _must_not_ be assumed that independent loads and stores will be issued
      in the order given. This means that for:
@@ -555,6 +555,30 @@ between the address load and the data load:
 This enforces the occurrence of one of the two implications, and prevents the
 third possibility from arising.
 
+A data-dependency barrier must also order against dependent writes:
+
+        CPU 1                 CPU 2
+        ===============       ===============
+        { A == 1, B == 2, C == 3, P == &A, Q == &C }
+        B = 4;
+        <write barrier>
+        WRITE_ONCE(P, &B);
+                              Q = READ_ONCE(P);
+                              <data dependency barrier>
+                              *Q = 5;
+
+The data-dependency barrier must order the read into Q with the store
+into *Q. This prohibits this outcome:
+
+        (Q == &B) && (B == 4)
+
+Please note that this pattern should be rare. After all, the whole point
+of dependency ordering is to -prevent- writes to the data structure, along
+with the expensive cache misses associated with those writes. This pattern
+can be used to record rare error conditions and the like, and the ordering
+prevents such records from being lost.
+
+
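Reviewer's note, not part of the patch: the dependent-write pattern above can
be tried in user space with a minimal sketch like the one below. It assumes
C11 <stdatomic.h>, and the function names cpu1_publish() and cpu2_record() are
hypothetical. A release store stands in for <write barrier> plus WRITE_ONCE(),
and an acquire load stands in for READ_ONCE() plus <data dependency barrier>.
Acquire is stronger than the dependency ordering the text describes, so this
illustrates the guarantee rather than the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>

/* Initial state from the example: { A == 1, B == 2, P == &A }. */
static int A = 1, B = 2;
static _Atomic(int *) P = &A;

/* CPU 1: initialize B, then publish a pointer to it. */
void cpu1_publish(void)
{
	B = 4;
	/* Assumption: release store models <write barrier> + WRITE_ONCE(P, &B). */
	atomic_store_explicit(&P, &B, memory_order_release);
}

/* CPU 2: load the pointer, then store through it (the dependent write). */
int *cpu2_record(void)
{
	/* Assumption: acquire load models READ_ONCE(P) + <data dependency barrier>. */
	int *q = atomic_load_explicit(&P, memory_order_acquire);

	*q = 5;		/* Ordered after the load of P, hence after B = 4 if q == &B. */
	return q;
}
```

If the two functions run on different threads, cpu2_record() either sees the
old pointer and writes to A, or sees &B and writes to B after the store B = 4
has become visible to it. In neither case can B = 4 overwrite the recorded 5,
which is the (Q == &B) && (B == 4) outcome that the text prohibits.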
 [!] Note that this extremely counterintuitive situation arises most easily on
 machines with split caches, so that, for example, one cache bank processes
 even-numbered cache lines and the other bank processes odd-numbered cache
@@ -565,21 +589,6 @@ odd-numbered bank is idle, one can see the new value of the pointer P (&B),
 but the old value of the variable B (2).
 
 
-Another example of where data dependency barriers might be required is where a
-number is read from memory and then used to calculate the index for an array
-access:
-
-        CPU 1                 CPU 2
-        ===============       ===============
-        { M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
-        M[1] = 4;
-        <write barrier>
-        WRITE_ONCE(P, 1);
-                              Q = READ_ONCE(P);
-                              <data dependency barrier>
-                              D = M[Q];
-
-
 The data dependency barrier is very important to the RCU system,
 for example. See rcu_assign_pointer() and rcu_dereference() in
 include/linux/rcupdate.h. This permits the current target of an RCU'd
@@ -800,9 +809,13 @@ In summary:
       use smp_rmb(), smp_wmb(), or, in the case of prior stores and
       later loads, smp_mb().
 
-  (*) If both legs of the "if" statement begin with identical stores
-      to the same variable, a barrier() statement is required at the
-      beginning of each leg of the "if" statement.
+  (*) If both legs of the "if" statement begin with identical stores to
+      the same variable, then those stores must be ordered, either by
+      preceding both of them with smp_mb() or by using smp_store_release()
+      to carry out the stores. Please note that it is -not- sufficient
+      to use barrier() at the beginning of each leg of the "if" statement,
+      as optimizing compilers do not necessarily respect barrier()
+      in this case.
 
   (*) Control dependencies require at least one run-time conditional
       between the prior load and the subsequent store, and this
@@ -814,7 +827,7 @@ In summary:
   (*) Control dependencies require that the compiler avoid reordering the
       dependency into nonexistence. Careful use of READ_ONCE() or
       atomic{,64}_read() can help to preserve your control dependency.
-      Please see the Compiler Barrier section for more information.
+      Please see the COMPILER BARRIER section for more information.
 
   (*) Control dependencies pair normally with other types of barriers.
 
@@ -1257,7 +1270,7 @@ TRANSITIVITY
 
 Transitivity is a deeply intuitive notion about ordering that is not
 always provided by real computer systems. The following example
-demonstrates transitivity (also called "cumulativity"):
+demonstrates transitivity:
 
         CPU 1                   CPU 2                   CPU 3
         ======================= ======================= =======================
@@ -1305,8 +1318,86 @@ or a level of cache, CPU 2 might have early access to CPU 1's writes.
 General barriers are therefore required to ensure that all CPUs agree
 on the combined order of CPU 1's and CPU 2's accesses.
 
-To reiterate, if your code requires transitivity, use general barriers
-throughout.
+General barriers provide "global transitivity", so that all CPUs will
+agree on the order of operations. In contrast, a chain of release-acquire
+pairs provides only "local transitivity", so that only those CPUs on
+the chain are guaranteed to agree on the combined order of the accesses.
+For example, switching to C code in deference to Herman Hollerith:
+
+        int u, v, x, y, z;
+
+        void cpu0(void)
+        {
+                r0 = smp_load_acquire(&x);
+                WRITE_ONCE(u, 1);
+                smp_store_release(&y, 1);
+        }
+
+        void cpu1(void)
+        {
+                r1 = smp_load_acquire(&y);
+                r4 = READ_ONCE(v);
+                r5 = READ_ONCE(u);
+                smp_store_release(&z, 1);
+        }
+
+        void cpu2(void)
+        {
+                r2 = smp_load_acquire(&z);
+                smp_store_release(&x, 1);
+        }
+
+        void cpu3(void)
+        {
+                WRITE_ONCE(v, 1);
+                smp_mb();
+                r3 = READ_ONCE(u);
+        }
+
+Because cpu0(), cpu1(), and cpu2() participate in a local transitive
+chain of smp_store_release()/smp_load_acquire() pairs, the following
+outcome is prohibited:
+
+        r0 == 1 && r1 == 1 && r2 == 1
+
+Furthermore, because of the release-acquire relationship between cpu0()
+and cpu1(), cpu1() must see cpu0()'s writes, so that the following
+outcome is prohibited:
+
+        r1 == 1 && r5 == 0
+
+However, the transitivity of release-acquire is local to the participating
+CPUs and does not apply to cpu3(). Therefore, the following outcome
+is possible:
+
+        r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0
+
+As an aside, the following outcome is also possible:
+
+        r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0 && r5 == 1
+
+Although cpu0(), cpu1(), and cpu2() will see their respective reads and
+writes in order, CPUs not involved in the release-acquire chain might
+well disagree on the order. This disagreement stems from the fact that
+the weak memory-barrier instructions used to implement smp_load_acquire()
+and smp_store_release() are not required to order prior stores against
+subsequent loads in all cases. This means that cpu3() can see cpu0()'s
+store to u as happening -after- cpu1()'s load from v, even though
+both cpu0() and cpu1() agree that these two operations occurred in the
+intended order.
+
+However, please keep in mind that smp_load_acquire() is not magic.
+In particular, it simply reads from its argument with ordering. It does
+-not- ensure that any particular value will be read. Therefore, the
+following outcome is possible:
+
+        r0 == 0 && r1 == 0 && r2 == 0 && r5 == 0
+
+Note that this outcome can happen even on a mythical sequentially
+consistent system where nothing is ever reordered.
+
+To reiterate, if your code requires global transitivity, use general
+barriers throughout.
 
 
 ========================
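Reviewer's note, not part of the patch: the claim that the all-zero outcome
can happen even under sequential consistency can be checked by brute force.
The sketch below is a hypothetical user-space checker; step(), reachable(),
sc_reachable() and the outcome predicates are illustrative names, not kernel
APIs. It enumerates every interleaving of cpu0() through cpu3(), treating each
access as one atomic step and smp_mb() as a no-op, which is what sequential
consistency amounts to. It says nothing about what weakly ordered hardware
may do.

```c
#include <assert.h>
#include <stdbool.h>

enum { NCPU = 4 };

struct state {
	int u, v, x, y, z;	/* shared variables, all initially zero */
	int r[6];		/* r0..r5 from the example */
	int pc[NCPU];		/* next operation index for each CPU */
};

/* Number of memory accesses per CPU; smp_mb() is a no-op under SC. */
static const int nops[NCPU] = { 3, 4, 2, 2 };

/* Execute the next memory access of the given CPU. */
void step(struct state *s, int cpu)
{
	int pc = s->pc[cpu]++;

	switch (cpu) {
	case 0:
		if (pc == 0) s->r[0] = s->x;
		else if (pc == 1) s->u = 1;
		else s->y = 1;
		break;
	case 1:
		if (pc == 0) s->r[1] = s->y;
		else if (pc == 1) s->r[4] = s->v;
		else if (pc == 2) s->r[5] = s->u;
		else s->z = 1;
		break;
	case 2:
		if (pc == 0) s->r[2] = s->z;
		else s->x = 1;
		break;
	case 3:
		if (pc == 0) s->v = 1;
		else s->r[3] = s->u;
		break;
	}
}

typedef bool (*outcome_fn)(const int r[6]);

/* Depth-first search over all interleavings from state s. */
bool reachable(struct state s, outcome_fn match)
{
	bool done = true;

	for (int cpu = 0; cpu < NCPU; cpu++) {
		if (s.pc[cpu] < nops[cpu]) {
			struct state next = s;

			done = false;
			step(&next, cpu);
			if (reachable(next, match))
				return true;
		}
	}
	return done && match(s.r);
}

bool sc_reachable(outcome_fn match)
{
	struct state s = { 0 };

	return reachable(s, match);
}

/* The four outcomes discussed in the text. */
bool all_zero(const int r[6])
{
	return r[0] == 0 && r[1] == 0 && r[2] == 0 && r[5] == 0;
}

bool chain_cycle(const int r[6])
{
	return r[0] == 1 && r[1] == 1 && r[2] == 1;
}

bool stale_u(const int r[6])
{
	return r[1] == 1 && r[5] == 0;
}

bool cpu3_disagrees(const int r[6])
{
	return r[0] == 0 && r[1] == 1 && r[2] == 1 && r[3] == 0 && r[4] == 0;
}
```

Under these assumptions sc_reachable(all_zero) is true, while the other three
predicates are unreachable. The last of them is the outcome that the
release-acquire chain permits, so observing it requires a system weaker than
sequential consistency.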
@@ -1459,7 +1550,7 @@ of optimizations:
      the following:
 
         a = 0;
-        /* Code that does not store to variable a. */
+        ... Code that does not store to variable a ...
         a = 0;
 
      The compiler sees that the value of variable 'a' is already zero, so
@@ -1471,7 +1562,7 @@ of optimizations:
      wrong guess:
 
         WRITE_ONCE(a, 0);
-        /* Code that does not store to variable a. */
+        ... Code that does not store to variable a ...
         WRITE_ONCE(a, 0);
 
  (*) The compiler is within its rights to reorder memory accesses unless