author Paul E. McKenney <paulmck@linux.vnet.ibm.com> 2013-12-11 16:59:07 -0500
committer Ingo Molnar <mingo@kernel.org> 2013-12-16 05:36:12 -0500
commit 692118dac47e65f5131686b1103ebfebf0cbfa8e
tree 1bba0afd9809bec501d5ec65d43ad3bcbb2e0574 /Documentation/memory-barriers.txt
parent 18c03c61444a211237f3d4782353cb38dba795df

Documentation/memory-barriers.txt: Document ACCESS_ONCE()

The situations in which ACCESS_ONCE() is required are not well documented,
so this commit adds some verbiage to memory-barriers.txt.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1386799151-2219-4-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Diffstat (limited to 'Documentation/memory-barriers.txt')
-rw-r--r-- Documentation/memory-barriers.txt 306
1 files changed, 271 insertions, 35 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index deafa36aeea1..919fd604969d 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -231,37 +231,8 @@ And there are a number of things that _must_ or _must_not_ be assumed:
  (*) It _must_not_ be assumed that the compiler will do what you want with
      memory references that are not protected by ACCESS_ONCE(). Without
      ACCESS_ONCE(), the compiler is within its rights to do all sorts
-     of "creative" transformations:
-
-     (-) Repeat the load, possibly getting a different value on the second
-         and subsequent loads. This is especially prone to happen when
-         register pressure is high.
-
-     (-) Merge adjacent loads and stores to the same location. The most
-         familiar example is the transformation from:
-
-                while (a)
-                        do_something();
-
-         to something like:
-
-                if (a)
-                        for (;;)
-                                do_something();
-
-         Using ACCESS_ONCE() as follows prevents this sort of optimization:
-
-                while (ACCESS_ONCE(a))
-                        do_something();
-
-     (-) "Store tearing", where a single store in the source code is split
-         into smaller stores in the object code. Note that gcc really
-         will do this on some architectures when storing certain constants.
-         It can be cheaper to do a series of immediate stores than to
-         form the constant in a register and then to store that register.
-
-     (-) "Load tearing", which splits loads in a manner analogous to
-         store tearing.
+     of "creative" transformations, which are covered in the Compiler
+     Barrier section.
 
  (*) It _must_not_ be assumed that independent loads and stores will be issued
      in the order given. This means that for:
@@ -749,7 +720,8 @@ In summary:
 
  (*) Control dependencies require that the compiler avoid reordering the
      dependency into nonexistence. Careful use of ACCESS_ONCE() or
-     barrier() can help to preserve your control dependency.
+     barrier() can help to preserve your control dependency. Please
+     see the Compiler Barrier section for more information.
 
  (*) Control dependencies do -not- provide transitivity. If you
      need transitivity, use smp_mb().
@@ -1248,12 +1220,276 @@ compiler from moving the memory accesses either side of it to the other side:
         barrier();
 
 This is a general barrier -- there are no read-read or write-write variants
-of barrier(). Howevever, ACCESS_ONCE() can be thought of as a weak form
+of barrier(). However, ACCESS_ONCE() can be thought of as a weak form
 for barrier() that affects only the specific accesses flagged by the
 ACCESS_ONCE().
 
-The compiler barrier has no direct effect on the CPU, which may then reorder
-things however it wishes.
+The barrier() function has the following effects:
+
+ (*) Prevents the compiler from reordering accesses following the
+     barrier() to precede any accesses preceding the barrier().
+     One example use for this property is to ease communication between
+     interrupt-handler code and the code that was interrupted.
+
+ (*) Within a loop, forces the compiler to load the variables used
+     in that loop's conditional on each pass through that loop.
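
As a hedged illustration of the second barrier() effect listed above, the
following user-space sketch defines a stand-in for barrier() using the
empty-asm-with-"memory"-clobber idiom accepted by GCC and Clang. The names
stop and poll_until_stopped() are hypothetical and are not part of this
patch; this is a sketch under those assumptions, not the kernel's code.

```c
/*
 * Compiler-only barrier (assumed GCC/Clang spelling): an empty asm
 * statement with a "memory" clobber. It emits no instructions.
 */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical flag, set by an interrupt handler or another thread. */
static int stop;

/*
 * Polls 'stop' for at most max_polls passes and returns the number of
 * passes during which 'stop' was still zero.
 */
static int poll_until_stopped(int max_polls)
{
        int polls = 0;

        while (!stop && polls < max_polls) {
                polls++;
                barrier();      /* 'stop' must be reloaded on the next pass */
        }
        return polls;
}
```

Without the barrier(), the compiler would be free to load stop once before
the loop and never look at it again.
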
+
+The ACCESS_ONCE() function can prevent any number of optimizations that,
+while perfectly safe in single-threaded code, can be fatal in concurrent
+code. Here are some examples of these sorts of optimizations:
+
+ (*) The compiler is within its rights to merge successive loads from
+     the same variable. Such merging can cause the compiler to "optimize"
+     the following code:
+
+        while (tmp = a)
+                do_something_with(tmp);
+
+     into the following code, which, although in some sense legitimate
+     for single-threaded code, is almost certainly not what the developer
+     intended:
+
+        if (tmp = a)
+                for (;;)
+                        do_something_with(tmp);
+
+     Use ACCESS_ONCE() to prevent the compiler from doing this to you:
+
+        while (tmp = ACCESS_ONCE(a))
+                do_something_with(tmp);
+
+ (*) The compiler is within its rights to reload a variable, for example,
+     in cases where high register pressure prevents the compiler from
+     keeping all data of interest in registers. The compiler might
+     therefore optimize the variable 'tmp' out of our previous example:
+
+        while (tmp = a)
+                do_something_with(tmp);
+
+     This could result in the following code, which is perfectly safe in
+     single-threaded code, but can be fatal in concurrent code:
+
+        while (a)
+                do_something_with(a);
+
+     For example, the optimized version of this code could result in
+     passing a zero to do_something_with() in the case where the variable
+     a was modified by some other CPU between the "while" statement and
+     the call to do_something_with().
+
+     Again, use ACCESS_ONCE() to prevent the compiler from doing this:
+
+        while (tmp = ACCESS_ONCE(a))
+                do_something_with(tmp);
+
+     Note that if the compiler runs short of registers, it might save
+     tmp onto the stack. The overhead of this saving and later restoring
+     is why compilers reload variables. Doing so is perfectly safe for
+     single-threaded code, so you need to tell the compiler about cases
+     where it is not safe.
+
+ (*) The compiler is within its rights to omit a load entirely if it knows
+     what the value will be. For example, if the compiler can prove that
+     the value of variable 'a' is always zero, it can optimize this code:
+
+        while (tmp = a)
+                do_something_with(tmp);
+
+     Into this:
+
+        do { } while (0);
+
+     This transformation is a win for single-threaded code because it gets
+     rid of a load and a branch. The problem is that the compiler will
+     carry out its proof assuming that the current CPU is the only one
+     updating variable 'a'. If variable 'a' is shared, then the compiler's
+     proof will be erroneous. Use ACCESS_ONCE() to tell the compiler
+     that it doesn't know as much as it thinks it does:
+
+        while (tmp = ACCESS_ONCE(a))
+                do_something_with(tmp);
+
+     But please note that the compiler is also closely watching what you
+     do with the value after the ACCESS_ONCE(). For example, suppose you
+     do the following and MAX is a preprocessor macro with the value 1:
+
+        while ((tmp = ACCESS_ONCE(a)) % MAX)
+                do_something_with(tmp);
+
+     Then the compiler knows that the result of the "%" operator applied
+     to MAX will always be zero, again allowing the compiler to optimize
+     the code into near-nonexistence. (It will still load from the
+     variable 'a'.)
+
+ (*) Similarly, the compiler is within its rights to omit a store entirely
+     if it knows that the variable already has the value being stored.
+     Again, the compiler assumes that the current CPU is the only one
+     storing into the variable, which can cause the compiler to do the
+     wrong thing for shared variables. For example, suppose you have
+     the following:
+
+        a = 0;
+        /* Code that does not store to variable a. */
+        a = 0;
+
+     The compiler sees that the value of variable 'a' is already zero, so
+     it might well omit the second store. This would come as a fatal
+     surprise if some other CPU might have stored to variable 'a' in the
+     meantime.
+
+     Use ACCESS_ONCE() to prevent the compiler from making this sort of
+     wrong guess:
+
+        ACCESS_ONCE(a) = 0;
+        /* Code that does not store to variable a. */
+        ACCESS_ONCE(a) = 0;
+
+ (*) The compiler is within its rights to reorder memory accesses unless
+     you tell it not to. For example, consider the following interaction
+     between process-level code and an interrupt handler:
+
+        void process_level(void)
+        {
+                msg = get_message();
+                flag = true;
+        }
+
+        void interrupt_handler(void)
+        {
+                if (flag)
+                        process_message(msg);
+        }
+
+     There is nothing to prevent the compiler from transforming
+     process_level() to the following, in fact, this might well be a
+     win for single-threaded code:
+
+        void process_level(void)
+        {
+                flag = true;
+                msg = get_message();
+        }
+
+     If the interrupt occurs between these two statements, then
+     interrupt_handler() might be passed a garbled msg. Use ACCESS_ONCE()
+     to prevent this as follows:
+
+        void process_level(void)
+        {
+                ACCESS_ONCE(msg) = get_message();
+                ACCESS_ONCE(flag) = true;
+        }
+
+        void interrupt_handler(void)
+        {
+                if (ACCESS_ONCE(flag))
+                        process_message(ACCESS_ONCE(msg));
+        }
+
+     Note that the ACCESS_ONCE() wrappers in interrupt_handler()
+     are needed if this interrupt handler can itself be interrupted
+     by something that also accesses 'flag' and 'msg', for example,
+     a nested interrupt or an NMI. Otherwise, ACCESS_ONCE() is not
+     needed in interrupt_handler() other than for documentation purposes.
+     (Note also that nested interrupts do not typically occur in modern
+     Linux kernels, in fact, if an interrupt handler returns with
+     interrupts enabled, you will get a WARN_ONCE() splat.)
+
+     You should assume that the compiler can move ACCESS_ONCE() past
+     code not containing ACCESS_ONCE(), barrier(), or similar primitives.
+
+     This effect could also be achieved using barrier(), but ACCESS_ONCE()
+     is more selective: With ACCESS_ONCE(), the compiler need only forget
+     the contents of the indicated memory locations, while with barrier()
+     the compiler must discard the value of all memory locations that
+     it has currently cached in any machine registers. Of course,
+     the compiler must also respect the order in which the ACCESS_ONCE()s
+     occur, though the CPU of course need not do so.
+
+ (*) The compiler is within its rights to invent stores to a variable,
+     as in the following example:
+
+        if (a)
+                b = a;
+        else
+                b = 42;
+
+     The compiler might save a branch by optimizing this as follows:
+
+        b = 42;
+        if (a)
+                b = a;
+
+     In single-threaded code, this is not only safe, but also saves
+     a branch. Unfortunately, in concurrent code, this optimization
+     could cause some other CPU to see a spurious value of 42 -- even
+     if variable 'a' was never zero -- when loading variable 'b'.
+     Use ACCESS_ONCE() to prevent this as follows:
+
+        if (a)
+                ACCESS_ONCE(b) = a;
+        else
+                ACCESS_ONCE(b) = 42;
+
+     The compiler can also invent loads. These are usually less
+     damaging, but they can result in cache-line bouncing and thus in
+     poor performance and scalability. Use ACCESS_ONCE() to prevent
+     invented loads.
+
+ (*) For aligned memory locations whose size allows them to be accessed
+     with a single memory-reference instruction, prevents "load tearing"
+     and "store tearing," in which a single large access is replaced by
+     multiple smaller accesses. For example, given an architecture having
+     16-bit store instructions with 7-bit immediate fields, the compiler
+     might be tempted to use two 16-bit store-immediate instructions to
+     implement the following 32-bit store:
+
+        p = 0x00010002;
+
+     Please note that GCC really does use this sort of optimization,
+     which is not surprising given that it would likely take more
+     than two instructions to build the constant and then store it.
+     This optimization can therefore be a win in single-threaded code.
+     In fact, a recent bug (since fixed) caused GCC to incorrectly use
+     this optimization in a volatile store. In the absence of such bugs,
+     use of ACCESS_ONCE() prevents store tearing in the following example:
+
+        ACCESS_ONCE(p) = 0x00010002;
+
+     Use of packed structures can also result in load and store tearing,
+     as in this example:
+
+        struct __attribute__((__packed__)) foo {
+                short a;
+                int b;
+                short c;
+        };
+        struct foo foo1, foo2;
+        ...
+
+        foo2.a = foo1.a;
+        foo2.b = foo1.b;
+        foo2.c = foo1.c;
+
+     Because there are no ACCESS_ONCE() wrappers and no volatile markings,
+     the compiler would be well within its rights to implement these three
+     assignment statements as a pair of 32-bit loads followed by a pair
+     of 32-bit stores. This would result in load tearing on 'foo1.b'
+     and store tearing on 'foo2.b'. ACCESS_ONCE() again prevents tearing
+     in this example:
+
+        foo2.a = foo1.a;
+        ACCESS_ONCE(foo2.b) = ACCESS_ONCE(foo1.b);
+        foo2.c = foo1.c;
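
The process_level()/interrupt_handler() item above notes that the same
ordering could be obtained with barrier() instead of ACCESS_ONCE(), without
showing it. A minimal user-space sketch of that variant might look as
follows. The barrier() definition is the GCC/Clang empty-asm idiom, and the
simplified signatures (an int message passed in, -1 returned when nothing
is pending) are assumptions made only to keep the example self-contained.

```c
#include <stdbool.h>

/* Compiler-only barrier (assumed GCC/Clang spelling). */
#define barrier() __asm__ __volatile__("" ::: "memory")

static int msg;         /* payload handed from process level to the handler */
static bool flag;       /* set once msg is valid */

/* Publishes a message: the msg store must come before the flag store. */
static void process_level(int new_msg)
{
        msg = new_msg;
        barrier();      /* compiler may not move the msg store below here */
        flag = true;
}

/* Returns the pending message, or -1 if none has been published. */
static int interrupt_handler(void)
{
        if (!flag)
                return -1;
        barrier();      /* compiler may not hoist the msg load above the check */
        return msg;
}
```

Because barrier() makes the compiler discard every memory value it has
cached in registers, this form is heavier than marking only msg and flag,
and like ACCESS_ONCE() it constrains the compiler but not the CPU.
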
+
+All that aside, it is never necessary to use ACCESS_ONCE() on a variable
+that has been marked volatile. For example, because 'jiffies' is marked
+volatile, it is never necessary to say ACCESS_ONCE(jiffies). The reason
+for this is that ACCESS_ONCE() is implemented as a volatile cast, which
+has no effect when its argument is already marked volatile.
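
To make the volatile-cast point concrete, here is a user-space sketch. The
macro body is one way to write such a cast using the GNU __typeof__
extension and is shown as an assumption, not quoted from this patch. The
variable ticks is a hypothetical stand-in for something declared volatile
(as jiffies is), and plain_counter is a hypothetical ordinary variable.

```c
/* Volatile cast, assuming the GNU __typeof__ extension is available. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

static volatile unsigned long ticks;    /* declared volatile, like jiffies */
static unsigned long plain_counter;     /* ordinary shared variable */

/* A plain read suffices: every access to 'ticks' is already volatile. */
static unsigned long read_ticks(void)
{
        return ticks;
}

/* The cast turns this one load into a volatile access. */
static unsigned long read_counter(void)
{
        return ACCESS_ONCE(plain_counter);
}

/* One marked load and one marked store; note that this is not atomic. */
static void bump_counter(void)
{
        ACCESS_ONCE(plain_counter) = ACCESS_ONCE(plain_counter) + 1;
}
```

Wrapping ticks in ACCESS_ONCE() would compile, but the cast would add
nothing because the variable's type is already volatile-qualified.
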
+
+Please note that these compiler barriers have no direct effect on the CPU,
+which may then reorder things however it wishes.
 
 
 CPU MEMORY BARRIERS