Documentation/memory-barriers.txt: Document ACCESS_ONCE()

The situations in which ACCESS_ONCE() is required are not well documented,
so this commit adds some verbiage to memory-barriers.txt.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1386799151-2219-4-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> 2013-12-11 16:59:07 -0500
committer: Ingo Molnar <mingo@kernel.org> 2013-12-16 05:36:12 -0500
commit: 692118dac47e65f5131686b1103ebfebf0cbfa8e (patch)
tree: 1bba0afd9809bec501d5ec65d43ad3bcbb2e0574 /Documentation/memory-barriers.txt
parent: 18c03c61444a211237f3d4782353cb38dba795df (diff)
1 files changed, 271 insertions, 35 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index deafa36aeea1..919fd604969d 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -231,37 +231,8 @@ And there are a number of things that _must_ or _must_not_ be assumed:
 (*) It _must_not_ be assumed that the compiler will do what you want with
     memory references that are not protected by ACCESS_ONCE().  Without
     ACCESS_ONCE(), the compiler is within its rights to do all sorts
-     of "creative" transformations:
+     of "creative" transformations, which are covered in the Compiler
+     Barrier section.
-     (-) Repeat the load, possibly getting a different value on the second
-         and subsequent loads.  This is especially prone to happen when
-         register pressure is high.
-     (-) Merge adjacent loads and stores to the same location.  The most
-         familiar example is the transformation from:
-                while (a)
-                        do_something();
-         to something like:
-                if (a)
-                        for (;;)
-                                do_something();
-         Using ACCESS_ONCE() as follows prevents this sort of optimization:
-                while (ACCESS_ONCE(a))
-                        do_something();
-     (-) "Store tearing", where a single store in the source code is split
-         into smaller stores in the object code.  Note that gcc really
-         will do this on some architectures when storing certain constants.
-         It can be cheaper to do a series of immediate stores than to
-         form the constant in a register and then to store that register.
-     (-) "Load tearing", which splits loads in a manner analogous to
-         store tearing.
 (*) It _must_not_ be assumed that independent loads and stores will be issued
     in the order given.  This means that for:
@@ -749,7 +720,8 @@ In summary:
  (*) Control dependencies require that the compiler avoid reordering the
      dependency into nonexistence.  Careful use of ACCESS_ONCE() or
-      barrier() can help to preserve your control dependency.
+      barrier() can help to preserve your control dependency.  Please
+      see the Compiler Barrier section for more information.
  (*) Control dependencies do -not- provide transitivity.  If you
      need transitivity, use smp_mb().
@@ -1248,12 +1220,276 @@ compiler from moving the memory accesses either side of it to the other side:
        barrier();
 This is a general barrier -- there are no read-read or write-write variants
-of barrier().  Howevever, ACCESS_ONCE() can be thought of as a weak form
+of barrier().  However, ACCESS_ONCE() can be thought of as a weak form
 for barrier() that affects only the specific accesses flagged by the
 ACCESS_ONCE().
-The compiler barrier has no direct effect on the CPU, which may then reorder
-things however it wishes.
+The barrier() function has the following effects:
+ (*) Prevents the compiler from reordering accesses following the
+     barrier() to precede any accesses preceding the barrier().
+     One example use for this property is to ease communication between
+     interrupt-handler code and the code that was interrupted.
+ (*) Within a loop, forces the compiler to load the variables used
+     in that loop's conditional on each pass through that loop.
+The ACCESS_ONCE() function can prevent any number of optimizations that,
+while perfectly safe in single-threaded code, can be fatal in concurrent
+code.  Here are some examples of these sorts of optimizations:
+ (*) The compiler is within its rights to merge successive loads from
+     the same variable.  Such merging can cause the compiler to "optimize"
+     the following code:
+        while (tmp = a)
+                do_something_with(tmp);
+     into the following code, which, although in some sense legitimate
+     for single-threaded code, is almost certainly not what the developer
+     intended:
+        if (tmp = a)
+                for (;;)
+                        do_something_with(tmp);
+     Use ACCESS_ONCE() to prevent the compiler from doing this to you:
+        while (tmp = ACCESS_ONCE(a))
+                do_something_with(tmp);
+ (*) The compiler is within its rights to reload a variable, for example,
+     in cases where high register pressure prevents the compiler from
+     keeping all data of interest in registers.  The compiler might
+     therefore optimize the variable 'tmp' out of our previous example:
+        while (tmp = a)
+                do_something_with(tmp);
+     This could result in the following code, which is perfectly safe in
+     single-threaded code, but can be fatal in concurrent code:
+        while (a)
+                do_something_with(a);
+     For example, the optimized version of this code could result in
+     passing a zero to do_something_with() in the case where the variable
+     a was modified by some other CPU between the "while" statement and
+     the call to do_something_with().
+     Again, use ACCESS_ONCE() to prevent the compiler from doing this:
+        while (tmp = ACCESS_ONCE(a))
+                do_something_with(tmp);
+     Note that if the compiler runs short of registers, it might save
+     tmp onto the stack.  The overhead of this saving and later restoring
+     is why compilers reload variables.  Doing so is perfectly safe for
+     single-threaded code, so you need to tell the compiler about cases
+     where it is not safe.
+ (*) The compiler is within its rights to omit a load entirely if it knows
+     what the value will be.  For example, if the compiler can prove that
+     the value of variable 'a' is always zero, it can optimize this code:
+        while (tmp = a)
+                do_something_with(tmp);
+     Into this:
+        do { } while (0);
+     This transformation is a win for single-threaded code because it gets
+     rid of a load and a branch.  The problem is that the compiler will
+     carry out its proof assuming that the current CPU is the only one
+     updating variable 'a'.  If variable 'a' is shared, then the compiler's
+     proof will be erroneous.  Use ACCESS_ONCE() to tell the compiler
+     that it doesn't know as much as it thinks it does:
+        while (tmp = ACCESS_ONCE(a))
+                do_something_with(tmp);
+     But please note that the compiler is also closely watching what you
+     do with the value after the ACCESS_ONCE().  For example, suppose you
+     do the following and MAX is a preprocessor macro with the value 1:
+        while ((tmp = ACCESS_ONCE(a)) % MAX)
+                do_something_with(tmp);
+     Then the compiler knows that the result of the "%" operator applied
+     to MAX will always be zero, again allowing the compiler to optimize
+     the code into near-nonexistence.  (It will still load from the
+     variable 'a'.)
+ (*) Similarly, the compiler is within its rights to omit a store entirely
+     if it knows that the variable already has the value being stored.
+     Again, the compiler assumes that the current CPU is the only one
+     storing into the variable, which can cause the compiler to do the
+     wrong thing for shared variables.  For example, suppose you have
+     the following:
+        a = 0;
+        /* Code that does not store to variable a. */
+        a = 0;
+     The compiler sees that the value of variable 'a' is already zero, so
+     it might well omit the second store.  This would come as a fatal
+     surprise if some other CPU might have stored to variable 'a' in the
+     meantime.
+     Use ACCESS_ONCE() to prevent the compiler from making this sort of
+     wrong guess:
+        ACCESS_ONCE(a) = 0;
+        /* Code that does not store to variable a. */
+        ACCESS_ONCE(a) = 0;
+ (*) The compiler is within its rights to reorder memory accesses unless
+     you tell it not to.  For example, consider the following interaction
+     between process-level code and an interrupt handler:
+        void process_level(void)
+        {
+                msg = get_message();
+                flag = true;
+        }
+        void interrupt_handler(void)
+        {
+                if (flag)
+                        process_message(msg);
+        }
+     There is nothing to prevent the compiler from transforming
+     process_level() to the following.  In fact, this might well be a
+     win for single-threaded code:
+        void process_level(void)
+        {
+                flag = true;
+                msg = get_message();
+        }
+     If the interrupt occurs between these two statements, then
+     interrupt_handler() might be passed a garbled msg.  Use ACCESS_ONCE()
+     to prevent this as follows:
+        void process_level(void)
+        {
+                ACCESS_ONCE(msg) = get_message();
+                ACCESS_ONCE(flag) = true;
+        }
+        void interrupt_handler(void)
+        {
+                if (ACCESS_ONCE(flag))
+                        process_message(ACCESS_ONCE(msg));
+        }
+     Note that the ACCESS_ONCE() wrappers in interrupt_handler()
+     are needed if this interrupt handler can itself be interrupted
+     by something that also accesses 'flag' and 'msg', for example,
+     a nested interrupt or an NMI.  Otherwise, ACCESS_ONCE() is not
+     needed in interrupt_handler() other than for documentation purposes.
+     (Note also that nested interrupts do not typically occur in modern
+     Linux kernels, in fact, if an interrupt handler returns with
+     interrupts enabled, you will get a WARN_ONCE() splat.)
+     You should assume that the compiler can move ACCESS_ONCE() past
+     code not containing ACCESS_ONCE(), barrier(), or similar primitives.
+     This effect could also be achieved using barrier(), but ACCESS_ONCE()
+     is more selective:  With ACCESS_ONCE(), the compiler need only forget
+     the contents of the indicated memory locations, while with barrier()
+     the compiler must discard the value of all memory locations that
+     it has currently cached in any machine registers.  Of course,
+     the compiler must also respect the order in which the ACCESS_ONCE()s
+     occur, though the CPU of course need not do so.
+ (*) The compiler is within its rights to invent stores to a variable,
+     as in the following example:
+        if (a)
+                b = a;
+        else
+                b = 42;
+     The compiler might save a branch by optimizing this as follows:
+        b = 42;
+        if (a)
+                b = a;
+     In single-threaded code, this is not only safe, but also saves
+     a branch.  Unfortunately, in concurrent code, this optimization
+     could cause some other CPU to see a spurious value of 42 -- even
+     if variable 'a' was never zero -- when loading variable 'b'.
+     Use ACCESS_ONCE() to prevent this as follows:
+        if (a)
+                ACCESS_ONCE(b) = a;
+        else
+                ACCESS_ONCE(b) = 42;
+     The compiler can also invent loads.  These are usually less
+     damaging, but they can result in cache-line bouncing and thus in
+     poor performance and scalability.  Use ACCESS_ONCE() to prevent
+     invented loads.
+ (*) For aligned memory locations whose size allows them to be accessed
+     with a single memory-reference instruction, ACCESS_ONCE() prevents
+     "load tearing" and "store tearing," in which a single large access is
+     replaced by multiple smaller accesses.  For example, given an architecture having
+     16-bit store instructions with 7-bit immediate fields, the compiler
+     might be tempted to use two 16-bit store-immediate instructions to
+     implement the following 32-bit store:
+        p = 0x00010002;
+     Please note that GCC really does use this sort of optimization,
+     which is not surprising given that it would likely take more
+     than two instructions to build the constant and then store it.
+     This optimization can therefore be a win in single-threaded code.
+     In fact, a recent bug (since fixed) caused GCC to incorrectly use
+     this optimization in a volatile store.  In the absence of such bugs,
+     use of ACCESS_ONCE() prevents store tearing in the following example:
+        ACCESS_ONCE(p) = 0x00010002;
+     Use of packed structures can also result in load and store tearing,
+     as in this example:
+        struct __attribute__((__packed__)) foo {
+                short a;
+                int b;
+                short c;
+        };
+        struct foo foo1, foo2;
+        ...
+        foo2.a = foo1.a;
+        foo2.b = foo1.b;
+        foo2.c = foo1.c;
+     Because there are no ACCESS_ONCE() wrappers and no volatile markings,
+     the compiler would be well within its rights to implement these three
+     assignment statements as a pair of 32-bit loads followed by a pair
+     of 32-bit stores.  This would result in load tearing on 'foo1.b'
+     and store tearing on 'foo2.b'.  ACCESS_ONCE() again prevents tearing
+     in this example:
+        foo2.a = foo1.a;
+        ACCESS_ONCE(foo2.b) = ACCESS_ONCE(foo1.b);
+        foo2.c = foo1.c;
+All that aside, it is never necessary to use ACCESS_ONCE() on a variable
+that has been marked volatile.  For example, because 'jiffies' is marked
+volatile, it is never necessary to say ACCESS_ONCE(jiffies).  The reason
+for this is that ACCESS_ONCE() is implemented as a volatile cast, which
+has no effect when its argument is already marked volatile.
+Please note that these compiler barriers have no direct effect on the CPU,
+which may then reorder things however it wishes.
 CPU MEMORY BARRIERS
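
The following is a minimal userspace sketch of the patterns the added text describes. It is not part of the patch. It assumes GCC or Clang, since __typeof__ is a compiler extension, and it defines ACCESS_ONCE() locally as the volatile cast that the new text says it is. The variables msg, flag and b, the int payload, the -1 "nothing ready" return value, and the helpers store_b() and demo() are all hypothetical names chosen for illustration. The sketch constrains only the compiler; as the patch notes, the CPU may still reorder the accesses.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Local stand-in for the kernel macro (assumption: GCC/Clang __typeof__).
 * The volatile cast forces exactly one access per use and keeps the
 * compiler from reordering these accesses relative to one another.
 */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

static int msg;		/* hypothetical shared payload */
static bool flag;	/* hypothetical "msg is valid" indicator */
static int b;		/* hypothetical shared result variable */

/* Publish a message: the store to msg is emitted before the store to flag. */
void process_level(int value)
{
	ACCESS_ONCE(msg) = value;
	ACCESS_ONCE(flag) = true;
}

/* Consume a message if one was published; -1 is a made-up "none" value. */
int interrupt_handler(void)
{
	if (ACCESS_ONCE(flag))
		return ACCESS_ONCE(msg);
	return -1;
}

/*
 * Load *a once into a local, then store to b exactly once on each path,
 * so the compiler can neither reload *a nor invent a store of 42.
 */
void store_b(int *a)
{
	int tmp = ACCESS_ONCE(*a);

	if (tmp)
		ACCESS_ONCE(b) = tmp;
	else
		ACCESS_ONCE(b) = 42;
}

/* Single-threaded walk-through of the helpers above. */
void demo(void)
{
	int a = 0;

	assert(interrupt_handler() == -1);	/* nothing published yet */
	process_level(7);
	assert(interrupt_handler() == 7);
	store_b(&a);
	assert(b == 42);
}
```

Run single-threaded, this only shows that the wrappers leave the program's meaning unchanged. What they change is the code the compiler may generate, which is best checked by comparing the generated assembly with and without the wrappers.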