author | Paul E. McKenney <paulmck@linux.vnet.ibm.com> | 2013-12-11 16:59:07 -0500 |
---|---|---|
committer | Ingo Molnar <mingo@kernel.org> | 2013-12-16 05:36:12 -0500 |
commit | 692118dac47e65f5131686b1103ebfebf0cbfa8e (patch) | |
tree | 1bba0afd9809bec501d5ec65d43ad3bcbb2e0574 /Documentation/memory-barriers.txt | |
parent | 18c03c61444a211237f3d4782353cb38dba795df (diff) |
Documentation/memory-barriers.txt: Document ACCESS_ONCE()
The situations in which ACCESS_ONCE() is required are not well
documented, so this commit adds some verbiage to
memory-barriers.txt.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1386799151-2219-4-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Diffstat (limited to 'Documentation/memory-barriers.txt')
-rw-r--r-- | Documentation/memory-barriers.txt | 306 |
1 files changed, 271 insertions, 35 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index deafa36aeea1..919fd604969d 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -231,37 +231,8 @@ And there are a number of things that _must_ or _must_not_ be assumed: | |||
231 | (*) It _must_not_ be assumed that the compiler will do what you want with | 231 | (*) It _must_not_ be assumed that the compiler will do what you want with |
232 | memory references that are not protected by ACCESS_ONCE(). Without | 232 | memory references that are not protected by ACCESS_ONCE(). Without |
233 | ACCESS_ONCE(), the compiler is within its rights to do all sorts | 233 | ACCESS_ONCE(), the compiler is within its rights to do all sorts |
234 | of "creative" transformations: | 234 | of "creative" transformations, which are covered in the Compiler |
235 | 235 | Barrier section. | |
236 | (-) Repeat the load, possibly getting a different value on the second | ||
237 | and subsequent loads. This is especially prone to happen when | ||
238 | register pressure is high. | ||
239 | |||
240 | (-) Merge adjacent loads and stores to the same location. The most | ||
241 | familiar example is the transformation from: | ||
242 | |||
243 | while (a) | ||
244 | do_something(); | ||
245 | |||
246 | to something like: | ||
247 | |||
248 | if (a) | ||
249 | for (;;) | ||
250 | do_something(); | ||
251 | |||
252 | Using ACCESS_ONCE() as follows prevents this sort of optimization: | ||
253 | |||
254 | while (ACCESS_ONCE(a)) | ||
255 | do_something(); | ||
256 | |||
257 | (-) "Store tearing", where a single store in the source code is split | ||
258 | into smaller stores in the object code. Note that gcc really | ||
259 | will do this on some architectures when storing certain constants. | ||
260 | It can be cheaper to do a series of immediate stores than to | ||
261 | form the constant in a register and then to store that register. | ||
262 | |||
263 | (-) "Load tearing", which splits loads in a manner analogous to | ||
264 | store tearing. | ||
265 | 236 | ||
266 | (*) It _must_not_ be assumed that independent loads and stores will be issued | 237 | (*) It _must_not_ be assumed that independent loads and stores will be issued |
267 | in the order given. This means that for: | 238 | in the order given. This means that for: |
@@ -749,7 +720,8 @@ In summary: | |||
749 | 720 | ||
750 | (*) Control dependencies require that the compiler avoid reordering the | 721 | (*) Control dependencies require that the compiler avoid reordering the |
751 | dependency into nonexistence. Careful use of ACCESS_ONCE() or | 722 | dependency into nonexistence. Careful use of ACCESS_ONCE() or |
752 | barrier() can help to preserve your control dependency. | 723 | barrier() can help to preserve your control dependency. Please |
724 | see the Compiler Barrier section for more information. | ||
753 | 725 | ||
754 | (*) Control dependencies do -not- provide transitivity. If you | 726 | (*) Control dependencies do -not- provide transitivity. If you |
755 | need transitivity, use smp_mb(). | 727 | need transitivity, use smp_mb(). |
@@ -1248,12 +1220,276 @@ compiler from moving the memory accesses either side of it to the other side: | |||
1248 | barrier(); | 1220 | barrier(); |
1249 | 1221 | ||
1250 | This is a general barrier -- there are no read-read or write-write variants | 1222 | This is a general barrier -- there are no read-read or write-write variants |
1251 | of barrier(). Howevever, ACCESS_ONCE() can be thought of as a weak form | 1223 | of barrier(). However, ACCESS_ONCE() can be thought of as a weak form |
1252 | for barrier() that affects only the specific accesses flagged by the | 1224 | for barrier() that affects only the specific accesses flagged by the |
1253 | ACCESS_ONCE(). | 1225 | ACCESS_ONCE(). |
1254 | 1226 | ||
1255 | The compiler barrier has no direct effect on the CPU, which may then reorder | 1227 | The barrier() function has the following effects: |
1256 | things however it wishes. | 1228 | |
1229 | (*) Prevents the compiler from reordering accesses following the | ||
1230 | barrier() to precede any accesses preceding the barrier(). | ||
1231 | One example use for this property is to ease communication between | ||
1232 | interrupt-handler code and the code that was interrupted. | ||
1233 | |||
1234 | (*) Within a loop, forces the compiler to load the variables used | ||
1235 | in that loop's conditional on each pass through that loop. | ||
1236 | |||
1237 | The ACCESS_ONCE() function can prevent any number of optimizations that, | ||
1238 | while perfectly safe in single-threaded code, can be fatal in concurrent | ||
1239 | code. Here are some examples of these sorts of optimizations: | ||
1240 | |||
1241 | (*) The compiler is within its rights to merge successive loads from | ||
1242 | the same variable. Such merging can cause the compiler to "optimize" | ||
1243 | the following code: | ||
1244 | |||
1245 | while (tmp = a) | ||
1246 | do_something_with(tmp); | ||
1247 | |||
1248 | into the following code, which, although in some sense legitimate | ||
1249 | for single-threaded code, is almost certainly not what the developer | ||
1250 | intended: | ||
1251 | |||
1252 | if (tmp = a) | ||
1253 | for (;;) | ||
1254 | do_something_with(tmp); | ||
1255 | |||
1256 | Use ACCESS_ONCE() to prevent the compiler from doing this to you: | ||
1257 | |||
1258 | while (tmp = ACCESS_ONCE(a)) | ||
1259 | do_something_with(tmp); | ||
1260 | |||
1261 | (*) The compiler is within its rights to reload a variable, for example, | ||
1262 | in cases where high register pressure prevents the compiler from | ||
1263 | keeping all data of interest in registers. The compiler might | ||
1264 | therefore optimize the variable 'tmp' out of our previous example: | ||
1265 | |||
1266 | while (tmp = a) | ||
1267 | do_something_with(tmp); | ||
1268 | |||
1269 | This could result in the following code, which is perfectly safe in | ||
1270 | single-threaded code, but can be fatal in concurrent code: | ||
1271 | |||
1272 | while (a) | ||
1273 | do_something_with(a); | ||
1274 | |||
1275 | For example, the optimized version of this code could result in | ||
1276 | passing a zero to do_something_with() in the case where the variable | ||
1277 | a was modified by some other CPU between the "while" statement and | ||
1278 | the call to do_something_with(). | ||
1279 | |||
1280 | Again, use ACCESS_ONCE() to prevent the compiler from doing this: | ||
1281 | |||
1282 | while (tmp = ACCESS_ONCE(a)) | ||
1283 | do_something_with(tmp); | ||
1284 | |||
1285 | Note that if the compiler runs short of registers, it might save | ||
1286 | tmp onto the stack. The overhead of this saving and later restoring | ||
1287 | is why compilers reload variables. Doing so is perfectly safe for | ||
1288 | single-threaded code, so you need to tell the compiler about cases | ||
1289 | where it is not safe. | ||
1290 | |||
1291 | (*) The compiler is within its rights to omit a load entirely if it knows | ||
1292 | what the value will be. For example, if the compiler can prove that | ||
1293 | the value of variable 'a' is always zero, it can optimize this code: | ||
1294 | |||
1295 | while (tmp = a) | ||
1296 | do_something_with(tmp); | ||
1297 | |||
1298 | Into this: | ||
1299 | |||
1300 | do { } while (0); | ||
1301 | |||
1302 | This transformation is a win for single-threaded code because it gets | ||
1303 | rid of a load and a branch. The problem is that the compiler will | ||
1304 | carry out its proof assuming that the current CPU is the only one | ||
1305 | updating variable 'a'. If variable 'a' is shared, then the compiler's | ||
1306 | proof will be erroneous. Use ACCESS_ONCE() to tell the compiler | ||
1307 | that it doesn't know as much as it thinks it does: | ||
1308 | |||
1309 | while (tmp = ACCESS_ONCE(a)) | ||
1310 | do_something_with(tmp); | ||
1311 | |||
1312 | But please note that the compiler is also closely watching what you | ||
1313 | do with the value after the ACCESS_ONCE(). For example, suppose you | ||
1314 | do the following and MAX is a preprocessor macro with the value 1: | ||
1315 | |||
1316 | while ((tmp = ACCESS_ONCE(a)) % MAX) | ||
1317 | do_something_with(tmp); | ||
1318 | |||
1319 | Then the compiler knows that the result of the "%" operator applied | ||
1320 | to MAX will always be zero, again allowing the compiler to optimize | ||
1321 | the code into near-nonexistence. (It will still load from the | ||
1322 | variable 'a'.) | ||
1323 | |||
1324 | (*) Similarly, the compiler is within its rights to omit a store entirely | ||
1325 | if it knows that the variable already has the value being stored. | ||
1326 | Again, the compiler assumes that the current CPU is the only one | ||
1327 | storing into the variable, which can cause the compiler to do the | ||
1328 | wrong thing for shared variables. For example, suppose you have | ||
1329 | the following: | ||
1330 | |||
1331 | a = 0; | ||
1332 | /* Code that does not store to variable a. */ | ||
1333 | a = 0; | ||
1334 | |||
1335 | The compiler sees that the value of variable 'a' is already zero, so | ||
1336 | it might well omit the second store. This would come as a fatal | ||
1337 | surprise if some other CPU might have stored to variable 'a' in the | ||
1338 | meantime. | ||
1339 | |||
1340 | Use ACCESS_ONCE() to prevent the compiler from making this sort of | ||
1341 | wrong guess: | ||
1342 | |||
1343 | ACCESS_ONCE(a) = 0; | ||
1344 | /* Code that does not store to variable a. */ | ||
1345 | ACCESS_ONCE(a) = 0; | ||
1346 | |||
1347 | (*) The compiler is within its rights to reorder memory accesses unless | ||
1348 | you tell it not to. For example, consider the following interaction | ||
1349 | between process-level code and an interrupt handler: | ||
1350 | |||
1351 | void process_level(void) | ||
1352 | { | ||
1353 | msg = get_message(); | ||
1354 | flag = true; | ||
1355 | } | ||
1356 | |||
1357 | void interrupt_handler(void) | ||
1358 | { | ||
1359 | if (flag) | ||
1360 | process_message(msg); | ||
1361 | } | ||
1362 | |||
1363 | There is nothing to prevent the compiler from transforming | ||
1364 | process_level() to the following, in fact, this might well be a | ||
1365 | win for single-threaded code: | ||
1366 | |||
1367 | void process_level(void) | ||
1368 | { | ||
1369 | flag = true; | ||
1370 | msg = get_message(); | ||
1371 | } | ||
1372 | |||
1373 | If the interrupt occurs between these two statements, then | ||
1374 | interrupt_handler() might be passed a garbled msg. Use ACCESS_ONCE() | ||
1375 | to prevent this as follows: | ||
1376 | |||
1377 | void process_level(void) | ||
1378 | { | ||
1379 | ACCESS_ONCE(msg) = get_message(); | ||
1380 | ACCESS_ONCE(flag) = true; | ||
1381 | } | ||
1382 | |||
1383 | void interrupt_handler(void) | ||
1384 | { | ||
1385 | if (ACCESS_ONCE(flag)) | ||
1386 | process_message(ACCESS_ONCE(msg)); | ||
1387 | } | ||
1388 | |||
1389 | Note that the ACCESS_ONCE() wrappers in interrupt_handler() | ||
1390 | are needed if this interrupt handler can itself be interrupted | ||
1391 | by something that also accesses 'flag' and 'msg', for example, | ||
1392 | a nested interrupt or an NMI. Otherwise, ACCESS_ONCE() is not | ||
1393 | needed in interrupt_handler() other than for documentation purposes. | ||
1394 | (Note also that nested interrupts do not typically occur in modern | ||
1395 | Linux kernels, in fact, if an interrupt handler returns with | ||
1396 | interrupts enabled, you will get a WARN_ONCE() splat.) | ||
1397 | |||
1398 | You should assume that the compiler can move ACCESS_ONCE() past | ||
1399 | code not containing ACCESS_ONCE(), barrier(), or similar primitives. | ||
1400 | |||
1401 | This effect could also be achieved using barrier(), but ACCESS_ONCE() | ||
1402 | is more selective: With ACCESS_ONCE(), the compiler need only forget | ||
1403 | the contents of the indicated memory locations, while with barrier() | ||
1404 | the compiler must discard the value of all memory locations that | ||
1405 | it has currently cached in any machine registers. Of course, | ||
1406 | the compiler must also respect the order in which the ACCESS_ONCE()s | ||
1407 | occur, though the CPU of course need not do so. | ||
1408 | |||
1409 | (*) The compiler is within its rights to invent stores to a variable, | ||
1410 | as in the following example: | ||
1411 | |||
1412 | if (a) | ||
1413 | b = a; | ||
1414 | else | ||
1415 | b = 42; | ||
1416 | |||
1417 | The compiler might save a branch by optimizing this as follows: | ||
1418 | |||
1419 | b = 42; | ||
1420 | if (a) | ||
1421 | b = a; | ||
1422 | |||
1423 | In single-threaded code, this is not only safe, but also saves | ||
1424 | a branch. Unfortunately, in concurrent code, this optimization | ||
1425 | could cause some other CPU to see a spurious value of 42 -- even | ||
1426 | if variable 'a' was never zero -- when loading variable 'b'. | ||
1427 | Use ACCESS_ONCE() to prevent this as follows: | ||
1428 | |||
1429 | if (a) | ||
1430 | ACCESS_ONCE(b) = a; | ||
1431 | else | ||
1432 | ACCESS_ONCE(b) = 42; | ||
1433 | |||
1434 | The compiler can also invent loads. These are usually less | ||
1435 | damaging, but they can result in cache-line bouncing and thus in | ||
1436 | poor performance and scalability. Use ACCESS_ONCE() to prevent | ||
1437 | invented loads. | ||
1438 | |||
1439 | (*) For aligned memory locations whose size allows them to be accessed | ||
1440 | with a single memory-reference instruction, prevents "load tearing" | ||
1441 | and "store tearing," in which a single large access is replaced by | ||
1442 | multiple smaller accesses. For example, given an architecture having | ||
1443 | 16-bit store instructions with 7-bit immediate fields, the compiler | ||
1444 | might be tempted to use two 16-bit store-immediate instructions to | ||
1445 | implement the following 32-bit store: | ||
1446 | |||
1447 | p = 0x00010002; | ||
1448 | |||
1449 | Please note that GCC really does use this sort of optimization, | ||
1450 | which is not surprising given that it would likely take more | ||
1451 | than two instructions to build the constant and then store it. | ||
1452 | This optimization can therefore be a win in single-threaded code. | ||
1453 | In fact, a recent bug (since fixed) caused GCC to incorrectly use | ||
1454 | this optimization in a volatile store. In the absence of such bugs, | ||
1455 | use of ACCESS_ONCE() prevents store tearing in the following example: | ||
1456 | |||
1457 | ACCESS_ONCE(p) = 0x00010002; | ||
1458 | |||
1459 | Use of packed structures can also result in load and store tearing, | ||
1460 | as in this example: | ||
1461 | |||
1462 | struct __attribute__((__packed__)) foo { | ||
1463 | short a; | ||
1464 | int b; | ||
1465 | short c; | ||
1466 | }; | ||
1467 | struct foo foo1, foo2; | ||
1468 | ... | ||
1469 | |||
1470 | foo2.a = foo1.a; | ||
1471 | foo2.b = foo1.b; | ||
1472 | foo2.c = foo1.c; | ||
1473 | |||
1474 | Because there are no ACCESS_ONCE() wrappers and no volatile markings, | ||
1475 | the compiler would be well within its rights to implement these three | ||
1476 | assignment statements as a pair of 32-bit loads followed by a pair | ||
1477 | of 32-bit stores. This would result in load tearing on 'foo1.b' | ||
1478 | and store tearing on 'foo2.b'. ACCESS_ONCE() again prevents tearing | ||
1479 | in this example: | ||
1480 | |||
1481 | foo2.a = foo1.a; | ||
1482 | ACCESS_ONCE(foo2.b) = ACCESS_ONCE(foo1.b); | ||
1483 | foo2.c = foo1.c; | ||
1484 | |||
1485 | All that aside, it is never necessary to use ACCESS_ONCE() on a variable | ||
1486 | that has been marked volatile. For example, because 'jiffies' is marked | ||
1487 | volatile, it is never necessary to say ACCESS_ONCE(jiffies). The reason | ||
1488 | for this is that ACCESS_ONCE() is implemented as a volatile cast, which | ||
1489 | has no effect when its argument is already marked volatile. | ||
1490 | |||
1491 | Please note that these compiler barriers have no direct effect on the CPU, | ||
1492 | which may then reorder things however it wishes. | ||
1257 | 1493 | ||
1258 | 1494 | ||
1259 | CPU MEMORY BARRIERS | 1495 | CPU MEMORY BARRIERS |
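The patch text above describes ACCESS_ONCE() as a volatile cast and applies it to two recurring patterns: the message/flag handoff between process-level code and an interrupt handler, and the "while (tmp = a)" snapshot loop. A minimal user-space sketch of both follows. It assumes GCC or Clang (for __typeof__), defines its own stand-in for the kernel macro, and uses hypothetical helper bodies and a hypothetical drain() wrapper that are not part of the commit. It only shows the shape of the code; it does not demonstrate the miscompilations themselves, which depend on compiler and optimization level.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel macro: a volatile cast (assumes GCC/Clang __typeof__). */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Message/flag handoff, following the names used in the patch text. */
static int msg;                   /* payload; the int type is an assumption */
static bool flag;                 /* set once msg is valid */
static int last_processed = -1;   /* hypothetical: records what the handler saw */

static int get_message(void) { return 42; }               /* hypothetical body */
static void process_message(int m) { last_processed = m; } /* hypothetical body */

void process_level(void)
{
	/* The compiler must emit these two volatile stores in program order. */
	ACCESS_ONCE(msg) = get_message();
	ACCESS_ONCE(flag) = true;
}

void interrupt_handler(void)
{
	if (ACCESS_ONCE(flag))
		process_message(ACCESS_ONCE(msg));
}

/* Snapshot loop: 'a' is loaded exactly once per pass, into tmp. */
static int a = 3;
static int sum;

static void do_something_with(int v)
{
	assert(v != 0);   /* v is the value that was tested, never a reload of 'a' */
	sum += v;
	a = v - 1;        /* hypothetical update so that the loop terminates */
}

void drain(void)
{
	int tmp;

	while ((tmp = ACCESS_ONCE(a)) != 0)
		do_something_with(tmp);
}
```

Calling interrupt_handler() before process_level() leaves last_processed untouched, and calling it afterwards delivers the stored message. As the patch text notes, these wrappers constrain only the compiler; the CPU may still reorder the accesses, so cross-CPU use would need the CPU memory barriers described in the following section.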