-rw-r--r--  Documentation/RCU/rcu_dereference.txt | 2
-rw-r--r--  Documentation/RCU/stallwarn.txt | 29
-rw-r--r--  Documentation/RCU/trace.txt | 36
-rw-r--r--  Documentation/RCU/whatisRCU.txt | 2
-rw-r--r--  Documentation/kernel-parameters.txt | 35
-rw-r--r--  Documentation/memory-barriers.txt | 359
-rw-r--r--  MAINTAINERS | 6
-rw-r--r--  arch/x86/kernel/cpu/mcheck/mce.c | 6
-rw-r--r--  arch/x86/kernel/traps.c | 2
-rw-r--r--  drivers/base/power/opp.c | 4
-rw-r--r--  include/linux/fdtable.h | 4
-rw-r--r--  include/linux/rcupdate.h | 144
-rw-r--r--  include/linux/rcutiny.h | 10
-rw-r--r--  include/linux/rcutree.h | 2
-rw-r--r--  include/linux/types.h | 3
-rw-r--r--  include/trace/events/rcu.h | 1
-rw-r--r--  init/Kconfig | 10
-rw-r--r--  kernel/cgroup.c | 4
-rw-r--r--  kernel/cpu.c | 10
-rw-r--r--  kernel/pid.c | 5
-rw-r--r--  kernel/rcu/rcutorture.c | 42
-rw-r--r--  kernel/rcu/srcu.c | 15
-rw-r--r--  kernel/rcu/tiny.c | 8
-rw-r--r--  kernel/rcu/tree.c | 681
-rw-r--r--  kernel/rcu/tree.h | 84
-rw-r--r--  kernel/rcu/tree_plugin.h | 130
-rw-r--r--  kernel/rcu/tree_trace.c | 19
-rw-r--r--  kernel/rcu/update.c | 90
-rw-r--r--  kernel/sched/core.c | 8
-rw-r--r--  kernel/time/Kconfig | 2
-rw-r--r--  kernel/workqueue.c | 20
-rw-r--r--  lib/Kconfig.debug | 16
-rwxr-xr-x  scripts/checkpatch.pl | 19
-rw-r--r--  security/device_cgroup.c | 6
-rw-r--r--  tools/testing/selftests/rcutorture/configs/rcu/TASKS01 | 4
-rw-r--r--  tools/testing/selftests/rcutorture/configs/rcu/TREE01 | 1
-rw-r--r--  tools/testing/selftests/rcutorture/configs/rcu/TREE02 | 1
-rw-r--r--  tools/testing/selftests/rcutorture/configs/rcu/TREE02-T | 1
-rw-r--r--  tools/testing/selftests/rcutorture/configs/rcu/TREE03 | 1
-rw-r--r--  tools/testing/selftests/rcutorture/configs/rcu/TREE04 | 1
-rw-r--r--  tools/testing/selftests/rcutorture/configs/rcu/TREE05 | 1
-rw-r--r--  tools/testing/selftests/rcutorture/configs/rcu/TREE06 | 1
-rw-r--r--  tools/testing/selftests/rcutorture/configs/rcu/TREE07 | 1
-rw-r--r--  tools/testing/selftests/rcutorture/configs/rcu/TREE08 | 1
-rw-r--r--  tools/testing/selftests/rcutorture/configs/rcu/TREE08-T | 1
-rw-r--r--  tools/testing/selftests/rcutorture/configs/rcu/TREE09 | 1
-rw-r--r--  tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt | 1
47 files changed, 989 insertions, 841 deletions
diff --git a/Documentation/RCU/rcu_dereference.txt b/Documentation/RCU/rcu_dereference.txt
index 1e6c0da994f5..c0bf2441a2ba 100644
--- a/Documentation/RCU/rcu_dereference.txt
+++ b/Documentation/RCU/rcu_dereference.txt
@@ -28,7 +28,7 @@ o You must use one of the rcu_dereference() family of primitives
28o Avoid cancellation when using the "+" and "-" infix arithmetic 28o Avoid cancellation when using the "+" and "-" infix arithmetic
29 operators. For example, for a given variable "x", avoid 29 operators. For example, for a given variable "x", avoid
30 "(x-x)". There are similar arithmetic pitfalls from other 30 "(x-x)". There are similar arithmetic pitfalls from other
31 arithmetic operatiors, such as "(x*0)", "(x/(x+1))" or "(x%1)". 31 arithmetic operators, such as "(x*0)", "(x/(x+1))" or "(x%1)".
32 The compiler is within its rights to substitute zero for all of 32 The compiler is within its rights to substitute zero for all of
33 these expressions, so that subsequent accesses no longer depend 33 these expressions, so that subsequent accesses no longer depend
34 on the rcu_dereference(), again possibly resulting in bugs due 34 on the rcu_dereference(), again possibly resulting in bugs due
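
The pitfall this hunk's surrounding text describes is easiest to see in code.
Below is a minimal, hypothetical sketch (names are illustrative and the usual
kernel headers such as <linux/rcupdate.h> are assumed): arithmetic that cancels
to a constant discards the address dependency that rcu_dereference() provides.

	struct foo {
		int a;
	};

	struct foo __rcu *gp;		/* assumed RCU-protected pointer */
	int table[16];

	int broken_reader(void)
	{
		struct foo *p;
		unsigned long idx;
		int val;

		rcu_read_lock();
		p = rcu_dereference(gp);
		/*
		 * BUG: (p - p) cancels to zero, so this access no longer
		 * depends on the rcu_dereference() above, and the compiler
		 * (or DEC Alpha) is free to reorder it.
		 */
		idx = (unsigned long)p - (unsigned long)p;
		val = table[idx];
		rcu_read_unlock();
		return val;
	}
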
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index b57c0c1cdac6..efb9454875ab 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -26,12 +26,6 @@ CONFIG_RCU_CPU_STALL_TIMEOUT
26 Stall-warning messages may be enabled and disabled completely via 26 Stall-warning messages may be enabled and disabled completely via
27 /sys/module/rcupdate/parameters/rcu_cpu_stall_suppress. 27 /sys/module/rcupdate/parameters/rcu_cpu_stall_suppress.
28 28
29CONFIG_RCU_CPU_STALL_INFO
30
31 This kernel configuration parameter causes the stall warning to
32 print out additional per-CPU diagnostic information, including
33 information on scheduling-clock ticks and RCU's idle-CPU tracking.
34
35RCU_STALL_DELAY_DELTA 29RCU_STALL_DELAY_DELTA
36 30
37 Although the lockdep facility is extremely useful, it does add 31 Although the lockdep facility is extremely useful, it does add
@@ -101,15 +95,13 @@ interact. Please note that it is not possible to entirely eliminate this
101sort of false positive without resorting to things like stop_machine(), 95sort of false positive without resorting to things like stop_machine(),
102which is overkill for this sort of problem. 96which is overkill for this sort of problem.
103 97
104If the CONFIG_RCU_CPU_STALL_INFO kernel configuration parameter is set, 98Recent kernels will print a long form of the stall-warning message:
105more information is printed with the stall-warning message, for example:
106 99
107 INFO: rcu_preempt detected stall on CPU 100 INFO: rcu_preempt detected stall on CPU
108 0: (63959 ticks this GP) idle=241/3fffffffffffffff/0 softirq=82/543 101 0: (63959 ticks this GP) idle=241/3fffffffffffffff/0 softirq=82/543
109 (t=65000 jiffies) 102 (t=65000 jiffies)
110 103
111In kernels with CONFIG_RCU_FAST_NO_HZ, even more information is 104In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed:
112printed:
113 105
114 INFO: rcu_preempt detected stall on CPU 106 INFO: rcu_preempt detected stall on CPU
115 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 nonlazy_posted: 25 .D 107 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 nonlazy_posted: 25 .D
@@ -171,6 +163,23 @@ message will be about three times the interval between the beginning
171of the stall and the first message. 163of the stall and the first message.
172 164
173 165
166Stall Warnings for Expedited Grace Periods
167
168If an expedited grace period detects a stall, it will place a message
169like the following in dmesg:
170
171 INFO: rcu_sched detected expedited stalls on CPUs: { 1 2 6 } 26009 jiffies s: 1043
172
173This indicates that CPUs 1, 2, and 6 have failed to respond to a
174reschedule IPI, that the expedited grace period has been going on for
17526,009 jiffies, and that the expedited grace-period sequence counter is
1761043. The fact that this last value is odd indicates that an expedited
177grace period is in flight.
178
179It is entirely possible to see stall warnings from normal and from
180expedited grace periods at about the same time from the same run.
181
182
174What Causes RCU CPU Stall Warnings? 183What Causes RCU CPU Stall Warnings?
175 184
176So your kernel printed an RCU CPU stall warning. The next question is 185So your kernel printed an RCU CPU stall warning. The next question is
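
The new stallwarn.txt text above keys on the parity of the expedited
grace-period sequence counter. A minimal sketch of that interpretation
(the helper name is illustrative, not a kernel API):

	static inline bool expedited_gp_in_flight(unsigned long s)
	{
		return (s & 0x1) != 0;	/* odd => expedited GP in flight */
	}

For the example message quoted above, "s: 1043" is odd, so an expedited grace
period was still in progress when the stall was reported.
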
diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index 08651da15448..97f17e9decda 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -237,42 +237,26 @@ o "ktl" is the low-order 16 bits (in hexadecimal) of the count of
237 237
238The output of "cat rcu/rcu_preempt/rcuexp" looks as follows: 238The output of "cat rcu/rcu_preempt/rcuexp" looks as follows:
239 239
240s=21872 d=21872 w=0 tf=0 wd1=0 wd2=0 n=0 sc=21872 dt=21872 dl=0 dx=21872 240s=21872 wd0=0 wd1=0 wd2=0 wd3=5 n=0 enq=0 sc=21872
241 241
242These fields are as follows: 242These fields are as follows:
243 243
244o "s" is the starting sequence number. 244o "s" is the sequence number, with an odd number indicating that
245 an expedited grace period is in progress.
245 246
246o "d" is the ending sequence number. When the starting and ending 247o "wd0", "wd1", "wd2", and "wd3" are the number of times that an
247 numbers differ, there is an expedited grace period in progress. 248 attempt to start an expedited grace period found that someone
248 249 else had completed an expedited grace period that satisfies the
249o "w" is the number of times that the sequence numbers have been
250 in danger of wrapping.
251
252o "tf" is the number of times that contention has resulted in a
253 failure to begin an expedited grace period.
254
255o "wd1" and "wd2" are the number of times that an attempt to
256 start an expedited grace period found that someone else had
257 completed an expedited grace period that satisfies the
258 attempted request. "Our work is done." 250 attempted request. "Our work is done."
259 251
260o "n" is number of times that contention was so great that 252o "n" is number of times that a concurrent CPU-hotplug operation
261 the request was demoted from an expedited grace period to 253 forced a fallback to a normal grace period.
262 a normal grace period. 254
255o "enq" is the number of quiescent states still outstanding.
263 256
264o "sc" is the number of times that the attempt to start a 257o "sc" is the number of times that the attempt to start a
265 new expedited grace period succeeded. 258 new expedited grace period succeeded.
266 259
267o "dt" is the number of times that we attempted to update
268 the "d" counter.
269
270o "dl" is the number of times that we failed to update the "d"
271 counter.
272
273o "dx" is the number of times that we succeeded in updating
274 the "d" counter.
275
276 260
277The output of "cat rcu/rcu_preempt/rcugp" looks as follows: 261The output of "cat rcu/rcu_preempt/rcugp" looks as follows:
278 262
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 5746b0c77f3e..adc2184009c5 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -883,7 +883,7 @@ All: lockdep-checked RCU-protected pointer access
883 883
884 rcu_access_pointer 884 rcu_access_pointer
885 rcu_dereference_raw 885 rcu_dereference_raw
886 rcu_lockdep_assert 886 RCU_LOCKDEP_WARN
887 rcu_sleep_check 887 rcu_sleep_check
888 RCU_NONIDLE 888 RCU_NONIDLE
889 889
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1d6f0459cd7b..01b5b68a237a 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3135,22 +3135,35 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
3135 in a given burst of a callback-flood test. 3135 in a given burst of a callback-flood test.
3136 3136
3137 rcutorture.fqs_duration= [KNL] 3137 rcutorture.fqs_duration= [KNL]
3138 Set duration of force_quiescent_state bursts. 3138 Set duration of force_quiescent_state bursts
3139 in microseconds.
3139 3140
3140 rcutorture.fqs_holdoff= [KNL] 3141 rcutorture.fqs_holdoff= [KNL]
3141 Set holdoff time within force_quiescent_state bursts. 3142 Set holdoff time within force_quiescent_state bursts
3143 in microseconds.
3142 3144
3143 rcutorture.fqs_stutter= [KNL] 3145 rcutorture.fqs_stutter= [KNL]
3144 Set wait time between force_quiescent_state bursts. 3146 Set wait time between force_quiescent_state bursts
3147 in seconds.
3148
3149 rcutorture.gp_cond= [KNL]
3150 Use conditional/asynchronous update-side
3151 primitives, if available.
3145 3152
3146 rcutorture.gp_exp= [KNL] 3153 rcutorture.gp_exp= [KNL]
3147 Use expedited update-side primitives. 3154 Use expedited update-side primitives, if available.
3148 3155
3149 rcutorture.gp_normal= [KNL] 3156 rcutorture.gp_normal= [KNL]
3150 Use normal (non-expedited) update-side primitives. 3157 Use normal (non-expedited) asynchronous
3151 If both gp_exp and gp_normal are set, do both. 3158 update-side primitives, if available.
3152 If neither gp_exp nor gp_normal are set, still 3159
3153 do both. 3160 rcutorture.gp_sync= [KNL]
3161 Use normal (non-expedited) synchronous
3162 update-side primitives, if available. If all
3163 of rcutorture.gp_cond=, rcutorture.gp_exp=,
3164 rcutorture.gp_normal=, and rcutorture.gp_sync=
3165 are zero, rcutorture acts as if is interpreted
3166 they are all non-zero.
3154 3167
3155 rcutorture.n_barrier_cbs= [KNL] 3168 rcutorture.n_barrier_cbs= [KNL]
3156 Set callbacks/threads for rcu_barrier() testing. 3169 Set callbacks/threads for rcu_barrier() testing.
@@ -3177,9 +3190,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
3177 Set time (s) between CPU-hotplug operations, or 3190 Set time (s) between CPU-hotplug operations, or
3178 zero to disable CPU-hotplug testing. 3191 zero to disable CPU-hotplug testing.
3179 3192
3180 rcutorture.torture_runnable= [BOOT]
3181 Start rcutorture running at boot time.
3182
3183 rcutorture.shuffle_interval= [KNL] 3193 rcutorture.shuffle_interval= [KNL]
3184 Set task-shuffle interval (s). Shuffling tasks 3194 Set task-shuffle interval (s). Shuffling tasks
3185 allows some CPUs to go into dyntick-idle mode 3195 allows some CPUs to go into dyntick-idle mode
@@ -3220,6 +3230,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
3220 Test RCU's dyntick-idle handling. See also the 3230 Test RCU's dyntick-idle handling. See also the
3221 rcutorture.shuffle_interval parameter. 3231 rcutorture.shuffle_interval parameter.
3222 3232
3233 rcutorture.torture_runnable= [BOOT]
3234 Start rcutorture running at boot time.
3235
3223 rcutorture.torture_type= [KNL] 3236 rcutorture.torture_type= [KNL]
3224 Specify the RCU implementation to test. 3237 Specify the RCU implementation to test.
3225 3238
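
The four rcutorture grace-period parameters documented above interact: if none
of them is set, rcutorture behaves as though all of them were. A hedged sketch
of that selection logic, with illustrative variable and helper names rather
than the actual rcutorture internals:

	static int gp_cond, gp_exp, gp_normal, gp_sync;	/* boot/module parameters */

	static void choose_gp_primitives(bool *use_cond, bool *use_exp,
					 bool *use_normal, bool *use_sync)
	{
		bool none_set = !gp_cond && !gp_exp && !gp_normal && !gp_sync;

		/* Each flavor is exercised if requested, or if nothing was requested. */
		*use_cond   = gp_cond || none_set;
		*use_exp    = gp_exp || none_set;
		*use_normal = gp_normal || none_set;
		*use_sync   = gp_sync || none_set;
	}
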
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 13feb697271f..318523872db5 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -194,22 +194,22 @@ There are some minimal guarantees that may be expected of a CPU:
194 (*) On any given CPU, dependent memory accesses will be issued in order, with 194 (*) On any given CPU, dependent memory accesses will be issued in order, with
195 respect to itself. This means that for: 195 respect to itself. This means that for:
196 196
197 ACCESS_ONCE(Q) = P; smp_read_barrier_depends(); D = ACCESS_ONCE(*Q); 197 WRITE_ONCE(Q, P); smp_read_barrier_depends(); D = READ_ONCE(*Q);
198 198
199 the CPU will issue the following memory operations: 199 the CPU will issue the following memory operations:
200 200
201 Q = LOAD P, D = LOAD *Q 201 Q = LOAD P, D = LOAD *Q
202 202
203 and always in that order. On most systems, smp_read_barrier_depends() 203 and always in that order. On most systems, smp_read_barrier_depends()
204 does nothing, but it is required for DEC Alpha. The ACCESS_ONCE() 204 does nothing, but it is required for DEC Alpha. The READ_ONCE()
205 is required to prevent compiler mischief. Please note that you 205 and WRITE_ONCE() are required to prevent compiler mischief. Please
206 should normally use something like rcu_dereference() instead of 206 note that you should normally use something like rcu_dereference()
207 open-coding smp_read_barrier_depends(). 207 instead of open-coding smp_read_barrier_depends().
208 208
209 (*) Overlapping loads and stores within a particular CPU will appear to be 209 (*) Overlapping loads and stores within a particular CPU will appear to be
210 ordered within that CPU. This means that for: 210 ordered within that CPU. This means that for:
211 211
212 a = ACCESS_ONCE(*X); ACCESS_ONCE(*X) = b; 212 a = READ_ONCE(*X); WRITE_ONCE(*X, b);
213 213
214 the CPU will only issue the following sequence of memory operations: 214 the CPU will only issue the following sequence of memory operations:
215 215
@@ -217,7 +217,7 @@ There are some minimal guarantees that may be expected of a CPU:
217 217
218 And for: 218 And for:
219 219
220 ACCESS_ONCE(*X) = c; d = ACCESS_ONCE(*X); 220 WRITE_ONCE(*X, c); d = READ_ONCE(*X);
221 221
222 the CPU will only issue: 222 the CPU will only issue:
223 223
@@ -228,11 +228,11 @@ There are some minimal guarantees that may be expected of a CPU:
228 228
229And there are a number of things that _must_ or _must_not_ be assumed: 229And there are a number of things that _must_ or _must_not_ be assumed:
230 230
231 (*) It _must_not_ be assumed that the compiler will do what you want with 231 (*) It _must_not_ be assumed that the compiler will do what you want
232 memory references that are not protected by ACCESS_ONCE(). Without 232 with memory references that are not protected by READ_ONCE() and
233 ACCESS_ONCE(), the compiler is within its rights to do all sorts 233 WRITE_ONCE(). Without them, the compiler is within its rights to
234 of "creative" transformations, which are covered in the Compiler 234 do all sorts of "creative" transformations, which are covered in
235 Barrier section. 235 the Compiler Barrier section.
236 236
237 (*) It _must_not_ be assumed that independent loads and stores will be issued 237 (*) It _must_not_ be assumed that independent loads and stores will be issued
238 in the order given. This means that for: 238 in the order given. This means that for:
@@ -520,8 +520,8 @@ following sequence of events:
520 { A == 1, B == 2, C = 3, P == &A, Q == &C } 520 { A == 1, B == 2, C = 3, P == &A, Q == &C }
521 B = 4; 521 B = 4;
522 <write barrier> 522 <write barrier>
523 ACCESS_ONCE(P) = &B 523 WRITE_ONCE(P, &B)
524 Q = ACCESS_ONCE(P); 524 Q = READ_ONCE(P);
525 D = *Q; 525 D = *Q;
526 526
527There's a clear data dependency here, and it would seem that by the end of the 527There's a clear data dependency here, and it would seem that by the end of the
@@ -547,8 +547,8 @@ between the address load and the data load:
547 { A == 1, B == 2, C = 3, P == &A, Q == &C } 547 { A == 1, B == 2, C = 3, P == &A, Q == &C }
548 B = 4; 548 B = 4;
549 <write barrier> 549 <write barrier>
550 ACCESS_ONCE(P) = &B 550 WRITE_ONCE(P, &B);
551 Q = ACCESS_ONCE(P); 551 Q = READ_ONCE(P);
552 <data dependency barrier> 552 <data dependency barrier>
553 D = *Q; 553 D = *Q;
554 554
@@ -574,8 +574,8 @@ access:
574 { M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 } 574 { M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
575 M[1] = 4; 575 M[1] = 4;
576 <write barrier> 576 <write barrier>
577 ACCESS_ONCE(P) = 1 577 WRITE_ONCE(P, 1);
578 Q = ACCESS_ONCE(P); 578 Q = READ_ONCE(P);
579 <data dependency barrier> 579 <data dependency barrier>
580 D = M[Q]; 580 D = M[Q];
581 581
@@ -596,10 +596,10 @@ A load-load control dependency requires a full read memory barrier, not
596simply a data dependency barrier to make it work correctly. Consider the 596simply a data dependency barrier to make it work correctly. Consider the
597following bit of code: 597following bit of code:
598 598
599 q = ACCESS_ONCE(a); 599 q = READ_ONCE(a);
600 if (q) { 600 if (q) {
601 <data dependency barrier> /* BUG: No data dependency!!! */ 601 <data dependency barrier> /* BUG: No data dependency!!! */
602 p = ACCESS_ONCE(b); 602 p = READ_ONCE(b);
603 } 603 }
604 604
605This will not have the desired effect because there is no actual data 605This will not have the desired effect because there is no actual data
@@ -608,10 +608,10 @@ by attempting to predict the outcome in advance, so that other CPUs see
608the load from b as having happened before the load from a. In such a 608the load from b as having happened before the load from a. In such a
609case what's actually required is: 609case what's actually required is:
610 610
611 q = ACCESS_ONCE(a); 611 q = READ_ONCE(a);
612 if (q) { 612 if (q) {
613 <read barrier> 613 <read barrier>
614 p = ACCESS_ONCE(b); 614 p = READ_ONCE(b);
615 } 615 }
616 616
617However, stores are not speculated. This means that ordering -is- provided 617However, stores are not speculated. This means that ordering -is- provided
@@ -619,7 +619,7 @@ for load-store control dependencies, as in the following example:
619 619
620 q = READ_ONCE_CTRL(a); 620 q = READ_ONCE_CTRL(a);
621 if (q) { 621 if (q) {
622 ACCESS_ONCE(b) = p; 622 WRITE_ONCE(b, p);
623 } 623 }
624 624
625Control dependencies pair normally with other types of barriers. That 625Control dependencies pair normally with other types of barriers. That
@@ -647,11 +647,11 @@ branches of the "if" statement as follows:
647 q = READ_ONCE_CTRL(a); 647 q = READ_ONCE_CTRL(a);
648 if (q) { 648 if (q) {
649 barrier(); 649 barrier();
650 ACCESS_ONCE(b) = p; 650 WRITE_ONCE(b, p);
651 do_something(); 651 do_something();
652 } else { 652 } else {
653 barrier(); 653 barrier();
654 ACCESS_ONCE(b) = p; 654 WRITE_ONCE(b, p);
655 do_something_else(); 655 do_something_else();
656 } 656 }
657 657
@@ -660,12 +660,12 @@ optimization levels:
660 660
661 q = READ_ONCE_CTRL(a); 661 q = READ_ONCE_CTRL(a);
662 barrier(); 662 barrier();
663 ACCESS_ONCE(b) = p; /* BUG: No ordering vs. load from a!!! */ 663 WRITE_ONCE(b, p); /* BUG: No ordering vs. load from a!!! */
664 if (q) { 664 if (q) {
665 /* ACCESS_ONCE(b) = p; -- moved up, BUG!!! */ 665 /* WRITE_ONCE(b, p); -- moved up, BUG!!! */
666 do_something(); 666 do_something();
667 } else { 667 } else {
668 /* ACCESS_ONCE(b) = p; -- moved up, BUG!!! */ 668 /* WRITE_ONCE(b, p); -- moved up, BUG!!! */
669 do_something_else(); 669 do_something_else();
670 } 670 }
671 671
@@ -676,7 +676,7 @@ assembly code even after all compiler optimizations have been applied.
676Therefore, if you need ordering in this example, you need explicit 676Therefore, if you need ordering in this example, you need explicit
677memory barriers, for example, smp_store_release(): 677memory barriers, for example, smp_store_release():
678 678
679 q = ACCESS_ONCE(a); 679 q = READ_ONCE(a);
680 if (q) { 680 if (q) {
681 smp_store_release(&b, p); 681 smp_store_release(&b, p);
682 do_something(); 682 do_something();
@@ -690,10 +690,10 @@ ordering is guaranteed only when the stores differ, for example:
690 690
691 q = READ_ONCE_CTRL(a); 691 q = READ_ONCE_CTRL(a);
692 if (q) { 692 if (q) {
693 ACCESS_ONCE(b) = p; 693 WRITE_ONCE(b, p);
694 do_something(); 694 do_something();
695 } else { 695 } else {
696 ACCESS_ONCE(b) = r; 696 WRITE_ONCE(b, r);
697 do_something_else(); 697 do_something_else();
698 } 698 }
699 699
@@ -706,10 +706,10 @@ the needed conditional. For example:
706 706
707 q = READ_ONCE_CTRL(a); 707 q = READ_ONCE_CTRL(a);
708 if (q % MAX) { 708 if (q % MAX) {
709 ACCESS_ONCE(b) = p; 709 WRITE_ONCE(b, p);
710 do_something(); 710 do_something();
711 } else { 711 } else {
712 ACCESS_ONCE(b) = r; 712 WRITE_ONCE(b, r);
713 do_something_else(); 713 do_something_else();
714 } 714 }
715 715
@@ -718,7 +718,7 @@ equal to zero, in which case the compiler is within its rights to
718transform the above code into the following: 718transform the above code into the following:
719 719
720 q = READ_ONCE_CTRL(a); 720 q = READ_ONCE_CTRL(a);
721 ACCESS_ONCE(b) = p; 721 WRITE_ONCE(b, p);
722 do_something_else(); 722 do_something_else();
723 723
724Given this transformation, the CPU is not required to respect the ordering 724Given this transformation, the CPU is not required to respect the ordering
@@ -731,10 +731,10 @@ one, perhaps as follows:
731 q = READ_ONCE_CTRL(a); 731 q = READ_ONCE_CTRL(a);
732 BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */ 732 BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
733 if (q % MAX) { 733 if (q % MAX) {
734 ACCESS_ONCE(b) = p; 734 WRITE_ONCE(b, p);
735 do_something(); 735 do_something();
736 } else { 736 } else {
737 ACCESS_ONCE(b) = r; 737 WRITE_ONCE(b, r);
738 do_something_else(); 738 do_something_else();
739 } 739 }
740 740
@@ -746,18 +746,18 @@ You must also be careful not to rely too much on boolean short-circuit
746evaluation. Consider this example: 746evaluation. Consider this example:
747 747
748 q = READ_ONCE_CTRL(a); 748 q = READ_ONCE_CTRL(a);
749 if (a || 1 > 0) 749 if (q || 1 > 0)
750 ACCESS_ONCE(b) = 1; 750 WRITE_ONCE(b, 1);
751 751
752Because the first condition cannot fault and the second condition is 752Because the first condition cannot fault and the second condition is
753always true, the compiler can transform this example as following, 753always true, the compiler can transform this example as following,
754defeating control dependency: 754defeating control dependency:
755 755
756 q = READ_ONCE_CTRL(a); 756 q = READ_ONCE_CTRL(a);
757 ACCESS_ONCE(b) = 1; 757 WRITE_ONCE(b, 1);
758 758
759This example underscores the need to ensure that the compiler cannot 759This example underscores the need to ensure that the compiler cannot
760out-guess your code. More generally, although ACCESS_ONCE() does force 760out-guess your code. More generally, although READ_ONCE() does force
761the compiler to actually emit code for a given load, it does not force 761the compiler to actually emit code for a given load, it does not force
762the compiler to use the results. 762the compiler to use the results.
763 763
@@ -769,7 +769,7 @@ x and y both being zero:
769 ======================= ======================= 769 ======================= =======================
770 r1 = READ_ONCE_CTRL(x); r2 = READ_ONCE_CTRL(y); 770 r1 = READ_ONCE_CTRL(x); r2 = READ_ONCE_CTRL(y);
771 if (r1 > 0) if (r2 > 0) 771 if (r1 > 0) if (r2 > 0)
772 ACCESS_ONCE(y) = 1; ACCESS_ONCE(x) = 1; 772 WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
773 773
774 assert(!(r1 == 1 && r2 == 1)); 774 assert(!(r1 == 1 && r2 == 1));
775 775
@@ -779,7 +779,7 @@ then adding the following CPU would guarantee a related assertion:
779 779
780 CPU 2 780 CPU 2
781 ===================== 781 =====================
782 ACCESS_ONCE(x) = 2; 782 WRITE_ONCE(x, 2);
783 783
784 assert(!(r1 == 2 && r2 == 1 && x == 2)); /* FAILS!!! */ 784 assert(!(r1 == 2 && r2 == 1 && x == 2)); /* FAILS!!! */
785 785
@@ -798,8 +798,7 @@ In summary:
798 798
799 (*) Control dependencies must be headed by READ_ONCE_CTRL(). 799 (*) Control dependencies must be headed by READ_ONCE_CTRL().
800 Or, as a much less preferable alternative, interpose 800 Or, as a much less preferable alternative, interpose
801 be headed by READ_ONCE() or an ACCESS_ONCE() read and must 801 smp_read_barrier_depends() between a READ_ONCE() and the
802 have smp_read_barrier_depends() between this read and the
803 control-dependent write. 802 control-dependent write.
804 803
805 (*) Control dependencies can order prior loads against later stores. 804 (*) Control dependencies can order prior loads against later stores.
@@ -815,15 +814,16 @@ In summary:
815 814
816 (*) Control dependencies require at least one run-time conditional 815 (*) Control dependencies require at least one run-time conditional
817 between the prior load and the subsequent store, and this 816 between the prior load and the subsequent store, and this
818 conditional must involve the prior load. If the compiler 817 conditional must involve the prior load. If the compiler is able
819 is able to optimize the conditional away, it will have also 818 to optimize the conditional away, it will have also optimized
820 optimized away the ordering. Careful use of ACCESS_ONCE() can 819 away the ordering. Careful use of READ_ONCE_CTRL() READ_ONCE(),
821 help to preserve the needed conditional. 820 and WRITE_ONCE() can help to preserve the needed conditional.
822 821
823 (*) Control dependencies require that the compiler avoid reordering the 822 (*) Control dependencies require that the compiler avoid reordering the
824 dependency into nonexistence. Careful use of ACCESS_ONCE() or 823 dependency into nonexistence. Careful use of READ_ONCE_CTRL()
825 barrier() can help to preserve your control dependency. Please 824 or smp_read_barrier_depends() can help to preserve your control
826 see the Compiler Barrier section for more information. 825 dependency. Please see the Compiler Barrier section for more
826 information.
827 827
828 (*) Control dependencies pair normally with other types of barriers. 828 (*) Control dependencies pair normally with other types of barriers.
829 829
@@ -848,11 +848,11 @@ barrier, an acquire barrier, a release barrier, or a general barrier:
848 848
849 CPU 1 CPU 2 849 CPU 1 CPU 2
850 =============== =============== 850 =============== ===============
851 ACCESS_ONCE(a) = 1; 851 WRITE_ONCE(a, 1);
852 <write barrier> 852 <write barrier>
853 ACCESS_ONCE(b) = 2; x = ACCESS_ONCE(b); 853 WRITE_ONCE(b, 2); x = READ_ONCE(b);
854 <read barrier> 854 <read barrier>
855 y = ACCESS_ONCE(a); 855 y = READ_ONCE(a);
856 856
857Or: 857Or:
858 858
@@ -860,7 +860,7 @@ Or:
860 =============== =============================== 860 =============== ===============================
861 a = 1; 861 a = 1;
862 <write barrier> 862 <write barrier>
863 ACCESS_ONCE(b) = &a; x = ACCESS_ONCE(b); 863 WRITE_ONCE(b, &a); x = READ_ONCE(b);
864 <data dependency barrier> 864 <data dependency barrier>
865 y = *x; 865 y = *x;
866 866
@@ -868,11 +868,11 @@ Or even:
868 868
869 CPU 1 CPU 2 869 CPU 1 CPU 2
870 =============== =============================== 870 =============== ===============================
871 r1 = ACCESS_ONCE(y); 871 r1 = READ_ONCE(y);
872 <general barrier> 872 <general barrier>
873 ACCESS_ONCE(y) = 1; if (r2 = ACCESS_ONCE(x)) { 873 WRITE_ONCE(y, 1); if (r2 = READ_ONCE(x)) {
874 <implicit control dependency> 874 <implicit control dependency>
875 ACCESS_ONCE(y) = 1; 875 WRITE_ONCE(y, 1);
876 } 876 }
877 877
878 assert(r1 == 0 || r2 == 0); 878 assert(r1 == 0 || r2 == 0);
@@ -886,11 +886,11 @@ versa:
886 886
887 CPU 1 CPU 2 887 CPU 1 CPU 2
888 =================== =================== 888 =================== ===================
889 ACCESS_ONCE(a) = 1; }---- --->{ v = ACCESS_ONCE(c); 889 WRITE_ONCE(a, 1); }---- --->{ v = READ_ONCE(c);
890 ACCESS_ONCE(b) = 2; } \ / { w = ACCESS_ONCE(d); 890 WRITE_ONCE(b, 2); } \ / { w = READ_ONCE(d);
891 <write barrier> \ <read barrier> 891 <write barrier> \ <read barrier>
892 ACCESS_ONCE(c) = 3; } / \ { x = ACCESS_ONCE(a); 892 WRITE_ONCE(c, 3); } / \ { x = READ_ONCE(a);
893 ACCESS_ONCE(d) = 4; }---- --->{ y = ACCESS_ONCE(b); 893 WRITE_ONCE(d, 4); }---- --->{ y = READ_ONCE(b);
894 894
895 895
896EXAMPLES OF MEMORY BARRIER SEQUENCES 896EXAMPLES OF MEMORY BARRIER SEQUENCES
@@ -1340,10 +1340,10 @@ compiler from moving the memory accesses either side of it to the other side:
1340 1340
1341 barrier(); 1341 barrier();
1342 1342
1343This is a general barrier -- there are no read-read or write-write variants 1343This is a general barrier -- there are no read-read or write-write
1344of barrier(). However, ACCESS_ONCE() can be thought of as a weak form 1344variants of barrier(). However, READ_ONCE() and WRITE_ONCE() can be
1345for barrier() that affects only the specific accesses flagged by the 1345thought of as weak forms of barrier() that affect only the specific
1346ACCESS_ONCE(). 1346accesses flagged by the READ_ONCE() or WRITE_ONCE().
1347 1347
1348The barrier() function has the following effects: 1348The barrier() function has the following effects:
1349 1349
@@ -1355,9 +1355,10 @@ The barrier() function has the following effects:
1355 (*) Within a loop, forces the compiler to load the variables used 1355 (*) Within a loop, forces the compiler to load the variables used
1356 in that loop's conditional on each pass through that loop. 1356 in that loop's conditional on each pass through that loop.
1357 1357
1358The ACCESS_ONCE() function can prevent any number of optimizations that, 1358The READ_ONCE() and WRITE_ONCE() functions can prevent any number of
1359while perfectly safe in single-threaded code, can be fatal in concurrent 1359optimizations that, while perfectly safe in single-threaded code, can
1360code. Here are some examples of these sorts of optimizations: 1360be fatal in concurrent code. Here are some examples of these sorts
1361of optimizations:
1361 1362
1362 (*) The compiler is within its rights to reorder loads and stores 1363 (*) The compiler is within its rights to reorder loads and stores
1363 to the same variable, and in some cases, the CPU is within its 1364 to the same variable, and in some cases, the CPU is within its
@@ -1370,11 +1371,11 @@ code. Here are some examples of these sorts of optimizations:
1370 Might result in an older value of x stored in a[1] than in a[0]. 1371 Might result in an older value of x stored in a[1] than in a[0].
1371 Prevent both the compiler and the CPU from doing this as follows: 1372 Prevent both the compiler and the CPU from doing this as follows:
1372 1373
1373 a[0] = ACCESS_ONCE(x); 1374 a[0] = READ_ONCE(x);
1374 a[1] = ACCESS_ONCE(x); 1375 a[1] = READ_ONCE(x);
1375 1376
1376 In short, ACCESS_ONCE() provides cache coherence for accesses from 1377 In short, READ_ONCE() and WRITE_ONCE() provide cache coherence for
1377 multiple CPUs to a single variable. 1378 accesses from multiple CPUs to a single variable.
1378 1379
1379 (*) The compiler is within its rights to merge successive loads from 1380 (*) The compiler is within its rights to merge successive loads from
1380 the same variable. Such merging can cause the compiler to "optimize" 1381 the same variable. Such merging can cause the compiler to "optimize"
@@ -1391,9 +1392,9 @@ code. Here are some examples of these sorts of optimizations:
1391 for (;;) 1392 for (;;)
1392 do_something_with(tmp); 1393 do_something_with(tmp);
1393 1394
1394 Use ACCESS_ONCE() to prevent the compiler from doing this to you: 1395 Use READ_ONCE() to prevent the compiler from doing this to you:
1395 1396
1396 while (tmp = ACCESS_ONCE(a)) 1397 while (tmp = READ_ONCE(a))
1397 do_something_with(tmp); 1398 do_something_with(tmp);
1398 1399
1399 (*) The compiler is within its rights to reload a variable, for example, 1400 (*) The compiler is within its rights to reload a variable, for example,
@@ -1415,9 +1416,9 @@ code. Here are some examples of these sorts of optimizations:
1415 a was modified by some other CPU between the "while" statement and 1416 a was modified by some other CPU between the "while" statement and
1416 the call to do_something_with(). 1417 the call to do_something_with().
1417 1418
1418 Again, use ACCESS_ONCE() to prevent the compiler from doing this: 1419 Again, use READ_ONCE() to prevent the compiler from doing this:
1419 1420
1420 while (tmp = ACCESS_ONCE(a)) 1421 while (tmp = READ_ONCE(a))
1421 do_something_with(tmp); 1422 do_something_with(tmp);
1422 1423
1423 Note that if the compiler runs short of registers, it might save 1424 Note that if the compiler runs short of registers, it might save
@@ -1437,21 +1438,21 @@ code. Here are some examples of these sorts of optimizations:
1437 1438
1438 do { } while (0); 1439 do { } while (0);
1439 1440
1440 This transformation is a win for single-threaded code because it gets 1441 This transformation is a win for single-threaded code because it
1441 rid of a load and a branch. The problem is that the compiler will 1442 gets rid of a load and a branch. The problem is that the compiler
1442 carry out its proof assuming that the current CPU is the only one 1443 will carry out its proof assuming that the current CPU is the only
1443 updating variable 'a'. If variable 'a' is shared, then the compiler's 1444 one updating variable 'a'. If variable 'a' is shared, then the
1444 proof will be erroneous. Use ACCESS_ONCE() to tell the compiler 1445 compiler's proof will be erroneous. Use READ_ONCE() to tell the
1445 that it doesn't know as much as it thinks it does: 1446 compiler that it doesn't know as much as it thinks it does:
1446 1447
1447 while (tmp = ACCESS_ONCE(a)) 1448 while (tmp = READ_ONCE(a))
1448 do_something_with(tmp); 1449 do_something_with(tmp);
1449 1450
1450 But please note that the compiler is also closely watching what you 1451 But please note that the compiler is also closely watching what you
1451 do with the value after the ACCESS_ONCE(). For example, suppose you 1452 do with the value after the READ_ONCE(). For example, suppose you
1452 do the following and MAX is a preprocessor macro with the value 1: 1453 do the following and MAX is a preprocessor macro with the value 1:
1453 1454
1454 while ((tmp = ACCESS_ONCE(a)) % MAX) 1455 while ((tmp = READ_ONCE(a)) % MAX)
1455 do_something_with(tmp); 1456 do_something_with(tmp);
1456 1457
1457 Then the compiler knows that the result of the "%" operator applied 1458 Then the compiler knows that the result of the "%" operator applied
@@ -1475,12 +1476,12 @@ code. Here are some examples of these sorts of optimizations:
1475 surprise if some other CPU might have stored to variable 'a' in the 1476 surprise if some other CPU might have stored to variable 'a' in the
1476 meantime. 1477 meantime.
1477 1478
1478 Use ACCESS_ONCE() to prevent the compiler from making this sort of 1479 Use WRITE_ONCE() to prevent the compiler from making this sort of
1479 wrong guess: 1480 wrong guess:
1480 1481
1481 ACCESS_ONCE(a) = 0; 1482 WRITE_ONCE(a, 0);
1482 /* Code that does not store to variable a. */ 1483 /* Code that does not store to variable a. */
1483 ACCESS_ONCE(a) = 0; 1484 WRITE_ONCE(a, 0);
1484 1485
1485 (*) The compiler is within its rights to reorder memory accesses unless 1486 (*) The compiler is within its rights to reorder memory accesses unless
1486 you tell it not to. For example, consider the following interaction 1487 you tell it not to. For example, consider the following interaction
@@ -1509,40 +1510,43 @@ code. Here are some examples of these sorts of optimizations:
1509 } 1510 }
1510 1511
1511 If the interrupt occurs between these two statement, then 1512 If the interrupt occurs between these two statement, then
1512 interrupt_handler() might be passed a garbled msg. Use ACCESS_ONCE() 1513 interrupt_handler() might be passed a garbled msg. Use WRITE_ONCE()
1513 to prevent this as follows: 1514 to prevent this as follows:
1514 1515
1515 void process_level(void) 1516 void process_level(void)
1516 { 1517 {
1517 ACCESS_ONCE(msg) = get_message(); 1518 WRITE_ONCE(msg, get_message());
1518 ACCESS_ONCE(flag) = true; 1519 WRITE_ONCE(flag, true);
1519 } 1520 }
1520 1521
1521 void interrupt_handler(void) 1522 void interrupt_handler(void)
1522 { 1523 {
1523 if (ACCESS_ONCE(flag)) 1524 if (READ_ONCE(flag))
1524 process_message(ACCESS_ONCE(msg)); 1525 process_message(READ_ONCE(msg));
1525 } 1526 }
1526 1527
1527 Note that the ACCESS_ONCE() wrappers in interrupt_handler() 1528 Note that the READ_ONCE() and WRITE_ONCE() wrappers in
1528 are needed if this interrupt handler can itself be interrupted 1529 interrupt_handler() are needed if this interrupt handler can itself
1529 by something that also accesses 'flag' and 'msg', for example, 1530 be interrupted by something that also accesses 'flag' and 'msg',
1530 a nested interrupt or an NMI. Otherwise, ACCESS_ONCE() is not 1531 for example, a nested interrupt or an NMI. Otherwise, READ_ONCE()
1531 needed in interrupt_handler() other than for documentation purposes. 1532 and WRITE_ONCE() are not needed in interrupt_handler() other than
1532 (Note also that nested interrupts do not typically occur in modern 1533 for documentation purposes. (Note also that nested interrupts
1533 Linux kernels, in fact, if an interrupt handler returns with 1534 do not typically occur in modern Linux kernels, in fact, if an
1534 interrupts enabled, you will get a WARN_ONCE() splat.) 1535 interrupt handler returns with interrupts enabled, you will get a
1535 1536 WARN_ONCE() splat.)
1536 You should assume that the compiler can move ACCESS_ONCE() past 1537
1537 code not containing ACCESS_ONCE(), barrier(), or similar primitives. 1538 You should assume that the compiler can move READ_ONCE() and
1538 1539 WRITE_ONCE() past code not containing READ_ONCE(), WRITE_ONCE(),
1539 This effect could also be achieved using barrier(), but ACCESS_ONCE() 1540 barrier(), or similar primitives.
1540 is more selective: With ACCESS_ONCE(), the compiler need only forget 1541
1541 the contents of the indicated memory locations, while with barrier() 1542 This effect could also be achieved using barrier(), but READ_ONCE()
1542 the compiler must discard the value of all memory locations that 1543 and WRITE_ONCE() are more selective: With READ_ONCE() and
1543 it has currented cached in any machine registers. Of course, 1544 WRITE_ONCE(), the compiler need only forget the contents of the
1544 the compiler must also respect the order in which the ACCESS_ONCE()s 1545 indicated memory locations, while with barrier() the compiler must
1545 occur, though the CPU of course need not do so. 1546 discard the value of all memory locations that it has currented
1547 cached in any machine registers. Of course, the compiler must also
1548 respect the order in which the READ_ONCE()s and WRITE_ONCE()s occur,
1549 though the CPU of course need not do so.
1546 1550
1547 (*) The compiler is within its rights to invent stores to a variable, 1551 (*) The compiler is within its rights to invent stores to a variable,
1548 as in the following example: 1552 as in the following example:
@@ -1562,16 +1566,16 @@ code. Here are some examples of these sorts of optimizations:
1562 a branch. Unfortunately, in concurrent code, this optimization 1566 a branch. Unfortunately, in concurrent code, this optimization
1563 could cause some other CPU to see a spurious value of 42 -- even 1567 could cause some other CPU to see a spurious value of 42 -- even
1564 if variable 'a' was never zero -- when loading variable 'b'. 1568 if variable 'a' was never zero -- when loading variable 'b'.
1565 Use ACCESS_ONCE() to prevent this as follows: 1569 Use WRITE_ONCE() to prevent this as follows:
1566 1570
1567 if (a) 1571 if (a)
1568 ACCESS_ONCE(b) = a; 1572 WRITE_ONCE(b, a);
1569 else 1573 else
1570 ACCESS_ONCE(b) = 42; 1574 WRITE_ONCE(b, 42);
1571 1575
1572 The compiler can also invent loads. These are usually less 1576 The compiler can also invent loads. These are usually less
1573 damaging, but they can result in cache-line bouncing and thus in 1577 damaging, but they can result in cache-line bouncing and thus in
1574 poor performance and scalability. Use ACCESS_ONCE() to prevent 1578 poor performance and scalability. Use READ_ONCE() to prevent
1575 invented loads. 1579 invented loads.
1576 1580
1577 (*) For aligned memory locations whose size allows them to be accessed 1581 (*) For aligned memory locations whose size allows them to be accessed
@@ -1590,9 +1594,9 @@ code. Here are some examples of these sorts of optimizations:
1590 This optimization can therefore be a win in single-threaded code. 1594 This optimization can therefore be a win in single-threaded code.
1591 In fact, a recent bug (since fixed) caused GCC to incorrectly use 1595 In fact, a recent bug (since fixed) caused GCC to incorrectly use
1592 this optimization in a volatile store. In the absence of such bugs, 1596 this optimization in a volatile store. In the absence of such bugs,
1593 use of ACCESS_ONCE() prevents store tearing in the following example: 1597 use of WRITE_ONCE() prevents store tearing in the following example:
1594 1598
1595 ACCESS_ONCE(p) = 0x00010002; 1599 WRITE_ONCE(p, 0x00010002);
1596 1600
1597 Use of packed structures can also result in load and store tearing, 1601 Use of packed structures can also result in load and store tearing,
1598 as in this example: 1602 as in this example:
@@ -1609,22 +1613,23 @@ code. Here are some examples of these sorts of optimizations:
1609 foo2.b = foo1.b; 1613 foo2.b = foo1.b;
1610 foo2.c = foo1.c; 1614 foo2.c = foo1.c;
1611 1615
1612 Because there are no ACCESS_ONCE() wrappers and no volatile markings, 1616 Because there are no READ_ONCE() or WRITE_ONCE() wrappers and no
1613 the compiler would be well within its rights to implement these three 1617 volatile markings, the compiler would be well within its rights to
1614 assignment statements as a pair of 32-bit loads followed by a pair 1618 implement these three assignment statements as a pair of 32-bit
1615 of 32-bit stores. This would result in load tearing on 'foo1.b' 1619 loads followed by a pair of 32-bit stores. This would result in
1616 and store tearing on 'foo2.b'. ACCESS_ONCE() again prevents tearing 1620 load tearing on 'foo1.b' and store tearing on 'foo2.b'. READ_ONCE()
1617 in this example: 1621 and WRITE_ONCE() again prevent tearing in this example:
1618 1622
1619 foo2.a = foo1.a; 1623 foo2.a = foo1.a;
1620 ACCESS_ONCE(foo2.b) = ACCESS_ONCE(foo1.b); 1624 WRITE_ONCE(foo2.b, READ_ONCE(foo1.b));
1621 foo2.c = foo1.c; 1625 foo2.c = foo1.c;
1622 1626
1623All that aside, it is never necessary to use ACCESS_ONCE() on a variable 1627All that aside, it is never necessary to use READ_ONCE() and
1624that has been marked volatile. For example, because 'jiffies' is marked 1628WRITE_ONCE() on a variable that has been marked volatile. For example,
1625volatile, it is never necessary to say ACCESS_ONCE(jiffies). The reason 1629because 'jiffies' is marked volatile, it is never necessary to
1626for this is that ACCESS_ONCE() is implemented as a volatile cast, which 1630say READ_ONCE(jiffies). The reason for this is that READ_ONCE() and
1627has no effect when its argument is already marked volatile. 1631WRITE_ONCE() are implemented as volatile casts, which has no effect when
1632its argument is already marked volatile.
1628 1633
1629Please note that these compiler barriers have no direct effect on the CPU, 1634Please note that these compiler barriers have no direct effect on the CPU,
1630which may then reorder things however it wishes. 1635which may then reorder things however it wishes.
@@ -1646,14 +1651,15 @@ The Linux kernel has eight basic CPU memory barriers:
1646All memory barriers except the data dependency barriers imply a compiler 1651All memory barriers except the data dependency barriers imply a compiler
1647barrier. Data dependencies do not impose any additional compiler ordering. 1652barrier. Data dependencies do not impose any additional compiler ordering.
1648 1653
1649Aside: In the case of data dependencies, the compiler would be expected to 1654Aside: In the case of data dependencies, the compiler would be expected
1650issue the loads in the correct order (eg. `a[b]` would have to load the value 1655to issue the loads in the correct order (eg. `a[b]` would have to load
1651of b before loading a[b]), however there is no guarantee in the C specification 1656the value of b before loading a[b]), however there is no guarantee in
1652that the compiler may not speculate the value of b (eg. is equal to 1) and load 1657the C specification that the compiler may not speculate the value of b
1653a before b (eg. tmp = a[1]; if (b != 1) tmp = a[b]; ). There is also the 1658(eg. is equal to 1) and load a before b (eg. tmp = a[1]; if (b != 1)
1654problem of a compiler reloading b after having loaded a[b], thus having a newer 1659tmp = a[b]; ). There is also the problem of a compiler reloading b after
1655copy of b than a[b]. A consensus has not yet been reached about these problems, 1660having loaded a[b], thus having a newer copy of b than a[b]. A consensus
1656however the ACCESS_ONCE macro is a good place to start looking. 1661has not yet been reached about these problems, however the READ_ONCE()
1662macro is a good place to start looking.
1657 1663
1658SMP memory barriers are reduced to compiler barriers on uniprocessor compiled 1664SMP memory barriers are reduced to compiler barriers on uniprocessor compiled
1659systems because it is assumed that a CPU will appear to be self-consistent, 1665systems because it is assumed that a CPU will appear to be self-consistent,
@@ -1852,11 +1858,12 @@ Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not
1852imply a full memory barrier. If it is necessary for a RELEASE-ACQUIRE 1858imply a full memory barrier. If it is necessary for a RELEASE-ACQUIRE
1853pair to produce a full barrier, the ACQUIRE can be followed by an 1859pair to produce a full barrier, the ACQUIRE can be followed by an
1854smp_mb__after_unlock_lock() invocation. This will produce a full barrier 1860smp_mb__after_unlock_lock() invocation. This will produce a full barrier
1855if either (a) the RELEASE and the ACQUIRE are executed by the same 1861(including transitivity) if either (a) the RELEASE and the ACQUIRE are
1856CPU or task, or (b) the RELEASE and ACQUIRE act on the same variable. 1862executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on
1857The smp_mb__after_unlock_lock() primitive is free on many architectures. 1863the same variable. The smp_mb__after_unlock_lock() primitive is free
1858Without smp_mb__after_unlock_lock(), the CPU's execution of the critical 1864on many architectures. Without smp_mb__after_unlock_lock(), the CPU's
1859sections corresponding to the RELEASE and the ACQUIRE can cross, so that: 1865execution of the critical sections corresponding to the RELEASE and the
1866ACQUIRE can cross, so that:
1860 1867
1861 *A = a; 1868 *A = a;
1862 RELEASE M 1869 RELEASE M
@@ -2126,12 +2133,12 @@ three CPUs; then should the following sequence of events occur:
2126 2133
2127 CPU 1 CPU 2 2134 CPU 1 CPU 2
2128 =============================== =============================== 2135 =============================== ===============================
2129 ACCESS_ONCE(*A) = a; ACCESS_ONCE(*E) = e; 2136 WRITE_ONCE(*A, a); WRITE_ONCE(*E, e);
2130 ACQUIRE M ACQUIRE Q 2137 ACQUIRE M ACQUIRE Q
2131 ACCESS_ONCE(*B) = b; ACCESS_ONCE(*F) = f; 2138 WRITE_ONCE(*B, b); WRITE_ONCE(*F, f);
2132 ACCESS_ONCE(*C) = c; ACCESS_ONCE(*G) = g; 2139 WRITE_ONCE(*C, c); WRITE_ONCE(*G, g);
2133 RELEASE M RELEASE Q 2140 RELEASE M RELEASE Q
2134 ACCESS_ONCE(*D) = d; ACCESS_ONCE(*H) = h; 2141 WRITE_ONCE(*D, d); WRITE_ONCE(*H, h);
2135 2142
2136Then there is no guarantee as to what order CPU 3 will see the accesses to *A 2143Then there is no guarantee as to what order CPU 3 will see the accesses to *A
2137through *H occur in, other than the constraints imposed by the separate locks 2144through *H occur in, other than the constraints imposed by the separate locks
@@ -2151,18 +2158,18 @@ However, if the following occurs:
2151 2158
2152 CPU 1 CPU 2 2159 CPU 1 CPU 2
2153 =============================== =============================== 2160 =============================== ===============================
2154 ACCESS_ONCE(*A) = a; 2161 WRITE_ONCE(*A, a);
2155 ACQUIRE M [1] 2162 ACQUIRE M [1]
2156 ACCESS_ONCE(*B) = b; 2163 WRITE_ONCE(*B, b);
2157 ACCESS_ONCE(*C) = c; 2164 WRITE_ONCE(*C, c);
2158 RELEASE M [1] 2165 RELEASE M [1]
2159 ACCESS_ONCE(*D) = d; ACCESS_ONCE(*E) = e; 2166 WRITE_ONCE(*D, d); WRITE_ONCE(*E, e);
2160 ACQUIRE M [2] 2167 ACQUIRE M [2]
2161 smp_mb__after_unlock_lock(); 2168 smp_mb__after_unlock_lock();
2162 ACCESS_ONCE(*F) = f; 2169 WRITE_ONCE(*F, f);
2163 ACCESS_ONCE(*G) = g; 2170 WRITE_ONCE(*G, g);
2164 RELEASE M [2] 2171 RELEASE M [2]
2165 ACCESS_ONCE(*H) = h; 2172 WRITE_ONCE(*H, h);
2166 2173
2167CPU 3 might see: 2174CPU 3 might see:
2168 2175
@@ -2881,11 +2888,11 @@ A programmer might take it for granted that the CPU will perform memory
2881operations in exactly the order specified, so that if the CPU is, for example, 2888operations in exactly the order specified, so that if the CPU is, for example,
2882given the following piece of code to execute: 2889given the following piece of code to execute:
2883 2890
2884 a = ACCESS_ONCE(*A); 2891 a = READ_ONCE(*A);
2885 ACCESS_ONCE(*B) = b; 2892 WRITE_ONCE(*B, b);
2886 c = ACCESS_ONCE(*C); 2893 c = READ_ONCE(*C);
2887 d = ACCESS_ONCE(*D); 2894 d = READ_ONCE(*D);
2888 ACCESS_ONCE(*E) = e; 2895 WRITE_ONCE(*E, e);
2889 2896
2890they would then expect that the CPU will complete the memory operation for each 2897they would then expect that the CPU will complete the memory operation for each
2891instruction before moving on to the next one, leading to a definite sequence of 2898instruction before moving on to the next one, leading to a definite sequence of
@@ -2932,12 +2939,12 @@ However, it is guaranteed that a CPU will be self-consistent: it will see its
2932_own_ accesses appear to be correctly ordered, without the need for a memory 2939_own_ accesses appear to be correctly ordered, without the need for a memory
2933barrier. For instance with the following code: 2940barrier. For instance with the following code:
2934 2941
2935 U = ACCESS_ONCE(*A); 2942 U = READ_ONCE(*A);
2936 ACCESS_ONCE(*A) = V; 2943 WRITE_ONCE(*A, V);
2937 ACCESS_ONCE(*A) = W; 2944 WRITE_ONCE(*A, W);
2938 X = ACCESS_ONCE(*A); 2945 X = READ_ONCE(*A);
2939 ACCESS_ONCE(*A) = Y; 2946 WRITE_ONCE(*A, Y);
2940 Z = ACCESS_ONCE(*A); 2947 Z = READ_ONCE(*A);
2941 2948
2942and assuming no intervention by an external influence, it can be assumed that 2949and assuming no intervention by an external influence, it can be assumed that
2943the final result will appear to be: 2950the final result will appear to be:
@@ -2953,13 +2960,14 @@ accesses:
2953 U=LOAD *A, STORE *A=V, STORE *A=W, X=LOAD *A, STORE *A=Y, Z=LOAD *A 2960 U=LOAD *A, STORE *A=V, STORE *A=W, X=LOAD *A, STORE *A=Y, Z=LOAD *A
2954 2961
2955in that order, but, without intervention, the sequence may have almost any 2962in that order, but, without intervention, the sequence may have almost any
2956combination of elements combined or discarded, provided the program's view of 2963combination of elements combined or discarded, provided the program's view
2957the world remains consistent. Note that ACCESS_ONCE() is -not- optional 2964of the world remains consistent. Note that READ_ONCE() and WRITE_ONCE()
2958in the above example, as there are architectures where a given CPU might 2965are -not- optional in the above example, as there are architectures
2959reorder successive loads to the same location. On such architectures, 2966where a given CPU might reorder successive loads to the same location.
2960ACCESS_ONCE() does whatever is necessary to prevent this, for example, on 2967On such architectures, READ_ONCE() and WRITE_ONCE() do whatever is
2961Itanium the volatile casts used by ACCESS_ONCE() cause GCC to emit the 2968necessary to prevent this, for example, on Itanium the volatile casts
2962special ld.acq and st.rel instructions that prevent such reordering. 2969used by READ_ONCE() and WRITE_ONCE() cause GCC to emit the special ld.acq
2970and st.rel instructions (respectively) that prevent such reordering.
2963 2971
2964The compiler may also combine, discard or defer elements of the sequence before 2972The compiler may also combine, discard or defer elements of the sequence before
2965the CPU even sees them. 2973the CPU even sees them.
@@ -2973,13 +2981,14 @@ may be reduced to:
2973 2981
2974 *A = W; 2982 *A = W;
2975 2983
2976since, without either a write barrier or an ACCESS_ONCE(), it can be 2984since, without either a write barrier or an WRITE_ONCE(), it can be
2977assumed that the effect of the storage of V to *A is lost. Similarly: 2985assumed that the effect of the storage of V to *A is lost. Similarly:
2978 2986
2979 *A = Y; 2987 *A = Y;
2980 Z = *A; 2988 Z = *A;
2981 2989
2982may, without a memory barrier or an ACCESS_ONCE(), be reduced to: 2990may, without a memory barrier or an READ_ONCE() and WRITE_ONCE(), be
2991reduced to:
2983 2992
2984 *A = Y; 2993 *A = Y;
2985 Z = Y; 2994 Z = Y;
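
The memory-barriers.txt changes above replace ACCESS_ONCE() with READ_ONCE()
for loads and WRITE_ONCE() for stores throughout. Here is a small,
self-contained sketch of publishing a message behind a flag in the new style
(names are illustrative; the explicit smp_wmb()/smp_rmb() are added for the
cross-CPU case, which the same-CPU interrupt example above does not need):

	static int shared_msg;
	static bool shared_flag;

	static void producer(int m)
	{
		WRITE_ONCE(shared_msg, m);	/* no tearing or compiler reordering */
		smp_wmb();			/* order the message before the flag */
		WRITE_ONCE(shared_flag, true);
	}

	static int consumer(void)
	{
		if (!READ_ONCE(shared_flag))
			return -1;		/* nothing published yet */
		smp_rmb();			/* order the flag before the message */
		return READ_ONCE(shared_msg);
	}
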
diff --git a/MAINTAINERS b/MAINTAINERS
index a9ae6c105520..20f3735fbda7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8472,7 +8472,7 @@ M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
8472M: Josh Triplett <josh@joshtriplett.org> 8472M: Josh Triplett <josh@joshtriplett.org>
8473R: Steven Rostedt <rostedt@goodmis.org> 8473R: Steven Rostedt <rostedt@goodmis.org>
8474R: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> 8474R: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
8475R: Lai Jiangshan <laijs@cn.fujitsu.com> 8475R: Lai Jiangshan <jiangshanlai@gmail.com>
8476L: linux-kernel@vger.kernel.org 8476L: linux-kernel@vger.kernel.org
8477S: Supported 8477S: Supported
8478T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git 8478T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
@@ -8499,7 +8499,7 @@ M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
8499M: Josh Triplett <josh@joshtriplett.org> 8499M: Josh Triplett <josh@joshtriplett.org>
8500R: Steven Rostedt <rostedt@goodmis.org> 8500R: Steven Rostedt <rostedt@goodmis.org>
8501R: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> 8501R: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
8502R: Lai Jiangshan <laijs@cn.fujitsu.com> 8502R: Lai Jiangshan <jiangshanlai@gmail.com>
8503L: linux-kernel@vger.kernel.org 8503L: linux-kernel@vger.kernel.org
8504W: http://www.rdrop.com/users/paulmck/RCU/ 8504W: http://www.rdrop.com/users/paulmck/RCU/
8505S: Supported 8505S: Supported
@@ -9367,7 +9367,7 @@ F: include/linux/sl?b*.h
9367F: mm/sl?b* 9367F: mm/sl?b*
9368 9368
9369SLEEPABLE READ-COPY UPDATE (SRCU) 9369SLEEPABLE READ-COPY UPDATE (SRCU)
9370M: Lai Jiangshan <laijs@cn.fujitsu.com> 9370M: Lai Jiangshan <jiangshanlai@gmail.com>
9371M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> 9371M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
9372M: Josh Triplett <josh@joshtriplett.org> 9372M: Josh Triplett <josh@joshtriplett.org>
9373R: Steven Rostedt <rostedt@goodmis.org> 9373R: Steven Rostedt <rostedt@goodmis.org>
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index df919ff103c3..3d6b5269fb2e 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -54,9 +54,9 @@ static DEFINE_MUTEX(mce_chrdev_read_mutex);
54 54
55#define rcu_dereference_check_mce(p) \ 55#define rcu_dereference_check_mce(p) \
56({ \ 56({ \
57 rcu_lockdep_assert(rcu_read_lock_sched_held() || \ 57 RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() && \
58 lockdep_is_held(&mce_chrdev_read_mutex), \ 58 !lockdep_is_held(&mce_chrdev_read_mutex), \
59 "suspicious rcu_dereference_check_mce() usage"); \ 59 "suspicious rcu_dereference_check_mce() usage"); \
60 smp_load_acquire(&(p)); \ 60 smp_load_acquire(&(p)); \
61}) 61})
62 62
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index f5791927aa64..c5a5231d1d11 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -136,7 +136,7 @@ enum ctx_state ist_enter(struct pt_regs *regs)
136 preempt_count_add(HARDIRQ_OFFSET); 136 preempt_count_add(HARDIRQ_OFFSET);
137 137
138 /* This code is a bit fragile. Test it. */ 138 /* This code is a bit fragile. Test it. */
139 rcu_lockdep_assert(rcu_is_watching(), "ist_enter didn't work"); 139 RCU_LOCKDEP_WARN(!rcu_is_watching(), "ist_enter didn't work");
140 140
141 return prev_state; 141 return prev_state;
142} 142}
diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c
index 677fb2843553..3b188f20b43f 100644
--- a/drivers/base/power/opp.c
+++ b/drivers/base/power/opp.c
@@ -110,8 +110,8 @@ static DEFINE_MUTEX(dev_opp_list_lock);
110 110
111#define opp_rcu_lockdep_assert() \ 111#define opp_rcu_lockdep_assert() \
112do { \ 112do { \
113 rcu_lockdep_assert(rcu_read_lock_held() || \ 113 RCU_LOCKDEP_WARN(!rcu_read_lock_held() && \
114 lockdep_is_held(&dev_opp_list_lock), \ 114 !lockdep_is_held(&dev_opp_list_lock), \
115 "Missing rcu_read_lock() or " \ 115 "Missing rcu_read_lock() or " \
116 "dev_opp_list_lock protection"); \ 116 "dev_opp_list_lock protection"); \
117} while (0) 117} while (0)
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index fbb88740634a..674e3e226465 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -86,8 +86,8 @@ static inline struct file *__fcheck_files(struct files_struct *files, unsigned i
86 86
87static inline struct file *fcheck_files(struct files_struct *files, unsigned int fd) 87static inline struct file *fcheck_files(struct files_struct *files, unsigned int fd)
88{ 88{
89 rcu_lockdep_assert(rcu_read_lock_held() || 89 RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
90 lockdep_is_held(&files->file_lock), 90 !lockdep_is_held(&files->file_lock),
91 "suspicious rcu_dereference_check() usage"); 91 "suspicious rcu_dereference_check() usage");
92 return __fcheck_files(files, fd); 92 return __fcheck_files(files, fd);
93} 93}
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 4cf5f51b4c9c..ff476515f716 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -226,6 +226,37 @@ struct rcu_synchronize {
226}; 226};
227void wakeme_after_rcu(struct rcu_head *head); 227void wakeme_after_rcu(struct rcu_head *head);
228 228
229void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t *crcu_array,
230 struct rcu_synchronize *rs_array);
231
232#define _wait_rcu_gp(checktiny, ...) \
233do { \
234 call_rcu_func_t __crcu_array[] = { __VA_ARGS__ }; \
235 const int __n = ARRAY_SIZE(__crcu_array); \
236 struct rcu_synchronize __rs_array[__n]; \
237 \
238 __wait_rcu_gp(checktiny, __n, __crcu_array, __rs_array); \
239} while (0)
240
241#define wait_rcu_gp(...) _wait_rcu_gp(false, __VA_ARGS__)
242
243/**
244 * synchronize_rcu_mult - Wait concurrently for multiple grace periods
245 * @...: List of call_rcu() functions for the flavors to wait on.
246 *
247 * This macro waits concurrently for multiple flavors of RCU grace periods.
248 * For example, synchronize_rcu_mult(call_rcu, call_rcu_bh) would wait
249 * on concurrent RCU and RCU-bh grace periods. Waiting on a given SRCU
250 * domain requires you to write a wrapper function for that SRCU domain's
251 * call_srcu() function, supplying the corresponding srcu_struct.
252 *
253 * If Tiny RCU, tell _wait_rcu_gp() not to bother waiting for RCU
254 * or RCU-bh, given that anywhere synchronize_rcu_mult() can be called
255 * is automatically a grace period.
256 */
257#define synchronize_rcu_mult(...) \
258 _wait_rcu_gp(IS_ENABLED(CONFIG_TINY_RCU), __VA_ARGS__)
259
229/** 260/**
230 * call_rcu_tasks() - Queue an RCU for invocation task-based grace period 261 * call_rcu_tasks() - Queue an RCU for invocation task-based grace period
231 * @head: structure to be used for queueing the RCU updates. 262 * @head: structure to be used for queueing the RCU updates.
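To illustrate the wrapper approach that the synchronize_rcu_mult() comment above describes, a caller needing to wait concurrently for an RCU grace period and an SRCU grace period might write something like the following sketch (the SRCU domain my_srcu and both function names are invented for the example, not part of this patch):

	DEFINE_STATIC_SRCU(my_srcu);

	/* Wrapper matching call_rcu_func_t, supplying the srcu_struct. */
	static void call_my_srcu(struct rcu_head *head, rcu_callback_t func)
	{
		call_srcu(&my_srcu, head, func);
	}

	static void wait_for_rcu_and_my_srcu(void)
	{
		/* Waits for both grace periods concurrently. */
		synchronize_rcu_mult(call_rcu, call_my_srcu);
	}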
@@ -309,7 +340,7 @@ static inline void rcu_sysrq_end(void)
309} 340}
310#endif /* #else #ifdef CONFIG_RCU_STALL_COMMON */ 341#endif /* #else #ifdef CONFIG_RCU_STALL_COMMON */
311 342
312#ifdef CONFIG_RCU_USER_QS 343#ifdef CONFIG_NO_HZ_FULL
313void rcu_user_enter(void); 344void rcu_user_enter(void);
314void rcu_user_exit(void); 345void rcu_user_exit(void);
315#else 346#else
@@ -317,7 +348,7 @@ static inline void rcu_user_enter(void) { }
317static inline void rcu_user_exit(void) { } 348static inline void rcu_user_exit(void) { }
318static inline void rcu_user_hooks_switch(struct task_struct *prev, 349static inline void rcu_user_hooks_switch(struct task_struct *prev,
319 struct task_struct *next) { } 350 struct task_struct *next) { }
320#endif /* CONFIG_RCU_USER_QS */ 351#endif /* CONFIG_NO_HZ_FULL */
321 352
322#ifdef CONFIG_RCU_NOCB_CPU 353#ifdef CONFIG_RCU_NOCB_CPU
323void rcu_init_nohz(void); 354void rcu_init_nohz(void);
@@ -392,10 +423,6 @@ bool __rcu_is_watching(void);
392 * TREE_RCU and rcu_barrier_() primitives in TINY_RCU. 423 * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
393 */ 424 */
394 425
395typedef void call_rcu_func_t(struct rcu_head *head,
396 void (*func)(struct rcu_head *head));
397void wait_rcu_gp(call_rcu_func_t crf);
398
399#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU) 426#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
400#include <linux/rcutree.h> 427#include <linux/rcutree.h>
401#elif defined(CONFIG_TINY_RCU) 428#elif defined(CONFIG_TINY_RCU)
@@ -469,46 +496,10 @@ int rcu_read_lock_bh_held(void);
469 * If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an 496 * If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an
470 * RCU-sched read-side critical section. In absence of 497 * RCU-sched read-side critical section. In absence of
471 * CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side 498 * CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side
472 * critical section unless it can prove otherwise. Note that disabling 499 * critical section unless it can prove otherwise.
473 * of preemption (including disabling irqs) counts as an RCU-sched
474 * read-side critical section. This is useful for debug checks in functions
475 * that required that they be called within an RCU-sched read-side
476 * critical section.
477 *
478 * Check debug_lockdep_rcu_enabled() to prevent false positives during boot
479 * and while lockdep is disabled.
480 *
481 * Note that if the CPU is in the idle loop from an RCU point of
482 * view (ie: that we are in the section between rcu_idle_enter() and
483 * rcu_idle_exit()) then rcu_read_lock_held() returns false even if the CPU
484 * did an rcu_read_lock(). The reason for this is that RCU ignores CPUs
485 * that are in such a section, considering these as in extended quiescent
486 * state, so such a CPU is effectively never in an RCU read-side critical
487 * section regardless of what RCU primitives it invokes. This state of
488 * affairs is required --- we need to keep an RCU-free window in idle
489 * where the CPU may possibly enter into low power mode. This way we can
490 * notice an extended quiescent state to other CPUs that started a grace
491 * period. Otherwise we would delay any grace period as long as we run in
492 * the idle task.
493 *
494 * Similarly, we avoid claiming an SRCU read lock held if the current
495 * CPU is offline.
496 */ 500 */
497#ifdef CONFIG_PREEMPT_COUNT 501#ifdef CONFIG_PREEMPT_COUNT
498static inline int rcu_read_lock_sched_held(void) 502int rcu_read_lock_sched_held(void);
499{
500 int lockdep_opinion = 0;
501
502 if (!debug_lockdep_rcu_enabled())
503 return 1;
504 if (!rcu_is_watching())
505 return 0;
506 if (!rcu_lockdep_current_cpu_online())
507 return 0;
508 if (debug_locks)
509 lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
510 return lockdep_opinion || preempt_count() != 0 || irqs_disabled();
511}
512#else /* #ifdef CONFIG_PREEMPT_COUNT */ 503#else /* #ifdef CONFIG_PREEMPT_COUNT */
513static inline int rcu_read_lock_sched_held(void) 504static inline int rcu_read_lock_sched_held(void)
514{ 505{
@@ -545,6 +536,11 @@ static inline int rcu_read_lock_sched_held(void)
545 536
546#endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */ 537#endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
547 538
539/* Deprecate rcu_lockdep_assert(): Use RCU_LOCKDEP_WARN() instead. */
540static inline void __attribute((deprecated)) deprecate_rcu_lockdep_assert(void)
541{
542}
543
548#ifdef CONFIG_PROVE_RCU 544#ifdef CONFIG_PROVE_RCU
549 545
550/** 546/**
@@ -555,17 +551,32 @@ static inline int rcu_read_lock_sched_held(void)
555#define rcu_lockdep_assert(c, s) \ 551#define rcu_lockdep_assert(c, s) \
556 do { \ 552 do { \
557 static bool __section(.data.unlikely) __warned; \ 553 static bool __section(.data.unlikely) __warned; \
554 deprecate_rcu_lockdep_assert(); \
558 if (debug_lockdep_rcu_enabled() && !__warned && !(c)) { \ 555 if (debug_lockdep_rcu_enabled() && !__warned && !(c)) { \
559 __warned = true; \ 556 __warned = true; \
560 lockdep_rcu_suspicious(__FILE__, __LINE__, s); \ 557 lockdep_rcu_suspicious(__FILE__, __LINE__, s); \
561 } \ 558 } \
562 } while (0) 559 } while (0)
563 560
561/**
562 * RCU_LOCKDEP_WARN - emit lockdep splat if specified condition is met
563 * @c: condition to check
564 * @s: informative message
565 */
566#define RCU_LOCKDEP_WARN(c, s) \
567 do { \
568 static bool __section(.data.unlikely) __warned; \
569 if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \
570 __warned = true; \
571 lockdep_rcu_suspicious(__FILE__, __LINE__, s); \
572 } \
573 } while (0)
574
564#if defined(CONFIG_PROVE_RCU) && !defined(CONFIG_PREEMPT_RCU) 575#if defined(CONFIG_PROVE_RCU) && !defined(CONFIG_PREEMPT_RCU)
565static inline void rcu_preempt_sleep_check(void) 576static inline void rcu_preempt_sleep_check(void)
566{ 577{
567 rcu_lockdep_assert(!lock_is_held(&rcu_lock_map), 578 RCU_LOCKDEP_WARN(lock_is_held(&rcu_lock_map),
568 "Illegal context switch in RCU read-side critical section"); 579 "Illegal context switch in RCU read-side critical section");
569} 580}
570#else /* #ifdef CONFIG_PROVE_RCU */ 581#else /* #ifdef CONFIG_PROVE_RCU */
571static inline void rcu_preempt_sleep_check(void) 582static inline void rcu_preempt_sleep_check(void)
@@ -576,15 +587,16 @@ static inline void rcu_preempt_sleep_check(void)
576#define rcu_sleep_check() \ 587#define rcu_sleep_check() \
577 do { \ 588 do { \
578 rcu_preempt_sleep_check(); \ 589 rcu_preempt_sleep_check(); \
579 rcu_lockdep_assert(!lock_is_held(&rcu_bh_lock_map), \ 590 RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map), \
580 "Illegal context switch in RCU-bh read-side critical section"); \ 591 "Illegal context switch in RCU-bh read-side critical section"); \
581 rcu_lockdep_assert(!lock_is_held(&rcu_sched_lock_map), \ 592 RCU_LOCKDEP_WARN(lock_is_held(&rcu_sched_lock_map), \
582 "Illegal context switch in RCU-sched read-side critical section"); \ 593 "Illegal context switch in RCU-sched read-side critical section"); \
583 } while (0) 594 } while (0)
584 595
585#else /* #ifdef CONFIG_PROVE_RCU */ 596#else /* #ifdef CONFIG_PROVE_RCU */
586 597
587#define rcu_lockdep_assert(c, s) do { } while (0) 598#define rcu_lockdep_assert(c, s) deprecate_rcu_lockdep_assert()
599#define RCU_LOCKDEP_WARN(c, s) do { } while (0)
588#define rcu_sleep_check() do { } while (0) 600#define rcu_sleep_check() do { } while (0)
589 601
590#endif /* #else #ifdef CONFIG_PROVE_RCU */ 602#endif /* #else #ifdef CONFIG_PROVE_RCU */
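The rcu_lockdep_assert() to RCU_LOCKDEP_WARN() conversions seen throughout this patch all follow one pattern: the old macro complained when its condition was false, the new one complains when its condition is true, so each condition is negated and any "||" between the old alternatives becomes "&&". An illustrative sketch (my_lock is a made-up lock used only for the example):

	static DEFINE_MUTEX(my_lock);

	static void check_my_protection(void)
	{
		/* Old style: warn unless some protection is held. */
		rcu_lockdep_assert(rcu_read_lock_held() ||
				   lockdep_is_held(&my_lock),
				   "need rcu_read_lock() or my_lock");

		/* New style: warn if no protection is held. */
		RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
				 !lockdep_is_held(&my_lock),
				 "need rcu_read_lock() or my_lock");
	}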
@@ -615,13 +627,13 @@ static inline void rcu_preempt_sleep_check(void)
615({ \ 627({ \
616 /* Dependency order vs. p above. */ \ 628 /* Dependency order vs. p above. */ \
617 typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \ 629 typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
618 rcu_lockdep_assert(c, "suspicious rcu_dereference_check() usage"); \ 630 RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_check() usage"); \
619 rcu_dereference_sparse(p, space); \ 631 rcu_dereference_sparse(p, space); \
620 ((typeof(*p) __force __kernel *)(________p1)); \ 632 ((typeof(*p) __force __kernel *)(________p1)); \
621}) 633})
622#define __rcu_dereference_protected(p, c, space) \ 634#define __rcu_dereference_protected(p, c, space) \
623({ \ 635({ \
624 rcu_lockdep_assert(c, "suspicious rcu_dereference_protected() usage"); \ 636 RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_protected() usage"); \
625 rcu_dereference_sparse(p, space); \ 637 rcu_dereference_sparse(p, space); \
626 ((typeof(*p) __force __kernel *)(p)); \ 638 ((typeof(*p) __force __kernel *)(p)); \
627}) 639})
@@ -845,8 +857,8 @@ static inline void rcu_read_lock(void)
845 __rcu_read_lock(); 857 __rcu_read_lock();
846 __acquire(RCU); 858 __acquire(RCU);
847 rcu_lock_acquire(&rcu_lock_map); 859 rcu_lock_acquire(&rcu_lock_map);
848 rcu_lockdep_assert(rcu_is_watching(), 860 RCU_LOCKDEP_WARN(!rcu_is_watching(),
849 "rcu_read_lock() used illegally while idle"); 861 "rcu_read_lock() used illegally while idle");
850} 862}
851 863
852/* 864/*
@@ -896,8 +908,8 @@ static inline void rcu_read_lock(void)
896 */ 908 */
897static inline void rcu_read_unlock(void) 909static inline void rcu_read_unlock(void)
898{ 910{
899 rcu_lockdep_assert(rcu_is_watching(), 911 RCU_LOCKDEP_WARN(!rcu_is_watching(),
900 "rcu_read_unlock() used illegally while idle"); 912 "rcu_read_unlock() used illegally while idle");
901 __release(RCU); 913 __release(RCU);
902 __rcu_read_unlock(); 914 __rcu_read_unlock();
903 rcu_lock_release(&rcu_lock_map); /* Keep acq info for rls diags. */ 915 rcu_lock_release(&rcu_lock_map); /* Keep acq info for rls diags. */
@@ -925,8 +937,8 @@ static inline void rcu_read_lock_bh(void)
925 local_bh_disable(); 937 local_bh_disable();
926 __acquire(RCU_BH); 938 __acquire(RCU_BH);
927 rcu_lock_acquire(&rcu_bh_lock_map); 939 rcu_lock_acquire(&rcu_bh_lock_map);
928 rcu_lockdep_assert(rcu_is_watching(), 940 RCU_LOCKDEP_WARN(!rcu_is_watching(),
929 "rcu_read_lock_bh() used illegally while idle"); 941 "rcu_read_lock_bh() used illegally while idle");
930} 942}
931 943
932/* 944/*
@@ -936,8 +948,8 @@ static inline void rcu_read_lock_bh(void)
936 */ 948 */
937static inline void rcu_read_unlock_bh(void) 949static inline void rcu_read_unlock_bh(void)
938{ 950{
939 rcu_lockdep_assert(rcu_is_watching(), 951 RCU_LOCKDEP_WARN(!rcu_is_watching(),
940 "rcu_read_unlock_bh() used illegally while idle"); 952 "rcu_read_unlock_bh() used illegally while idle");
941 rcu_lock_release(&rcu_bh_lock_map); 953 rcu_lock_release(&rcu_bh_lock_map);
942 __release(RCU_BH); 954 __release(RCU_BH);
943 local_bh_enable(); 955 local_bh_enable();
@@ -961,8 +973,8 @@ static inline void rcu_read_lock_sched(void)
961 preempt_disable(); 973 preempt_disable();
962 __acquire(RCU_SCHED); 974 __acquire(RCU_SCHED);
963 rcu_lock_acquire(&rcu_sched_lock_map); 975 rcu_lock_acquire(&rcu_sched_lock_map);
964 rcu_lockdep_assert(rcu_is_watching(), 976 RCU_LOCKDEP_WARN(!rcu_is_watching(),
965 "rcu_read_lock_sched() used illegally while idle"); 977 "rcu_read_lock_sched() used illegally while idle");
966} 978}
967 979
968/* Used by lockdep and tracing: cannot be traced, cannot call lockdep. */ 980/* Used by lockdep and tracing: cannot be traced, cannot call lockdep. */
@@ -979,8 +991,8 @@ static inline notrace void rcu_read_lock_sched_notrace(void)
979 */ 991 */
980static inline void rcu_read_unlock_sched(void) 992static inline void rcu_read_unlock_sched(void)
981{ 993{
982 rcu_lockdep_assert(rcu_is_watching(), 994 RCU_LOCKDEP_WARN(!rcu_is_watching(),
983 "rcu_read_unlock_sched() used illegally while idle"); 995 "rcu_read_unlock_sched() used illegally while idle");
984 rcu_lock_release(&rcu_sched_lock_map); 996 rcu_lock_release(&rcu_sched_lock_map);
985 __release(RCU_SCHED); 997 __release(RCU_SCHED);
986 preempt_enable(); 998 preempt_enable();
@@ -1031,7 +1043,7 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
1031#define RCU_INIT_POINTER(p, v) \ 1043#define RCU_INIT_POINTER(p, v) \
1032 do { \ 1044 do { \
1033 rcu_dereference_sparse(p, __rcu); \ 1045 rcu_dereference_sparse(p, __rcu); \
1034 p = RCU_INITIALIZER(v); \ 1046 WRITE_ONCE(p, RCU_INITIALIZER(v)); \
1035 } while (0) 1047 } while (0)
1036 1048
1037/** 1049/**
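The WRITE_ONCE() in RCU_INIT_POINTER() above changes only how the store is emitted (the compiler can no longer tear or fuse it); the usage rules are unchanged. A sketch, with struct foo and gp invented for the example:

	struct foo;
	static struct foo __rcu *gp;

	static void install_foo(struct foo *p, bool readers_may_run)
	{
		if (readers_may_run)
			rcu_assign_pointer(gp, p);	/* publication, needs ordering */
		else
			RCU_INIT_POINTER(gp, p);	/* no concurrent readers yet */
	}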
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 3df6c1ec4e25..ff968b7af3a4 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -37,6 +37,16 @@ static inline void cond_synchronize_rcu(unsigned long oldstate)
37 might_sleep(); 37 might_sleep();
38} 38}
39 39
40static inline unsigned long get_state_synchronize_sched(void)
41{
42 return 0;
43}
44
45static inline void cond_synchronize_sched(unsigned long oldstate)
46{
47 might_sleep();
48}
49
40static inline void rcu_barrier_bh(void) 50static inline void rcu_barrier_bh(void)
41{ 51{
42 wait_rcu_gp(call_rcu_bh); 52 wait_rcu_gp(call_rcu_bh);
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 456879143f89..5abec82f325e 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -76,6 +76,8 @@ void rcu_barrier_bh(void);
76void rcu_barrier_sched(void); 76void rcu_barrier_sched(void);
77unsigned long get_state_synchronize_rcu(void); 77unsigned long get_state_synchronize_rcu(void);
78void cond_synchronize_rcu(unsigned long oldstate); 78void cond_synchronize_rcu(unsigned long oldstate);
79unsigned long get_state_synchronize_sched(void);
80void cond_synchronize_sched(unsigned long oldstate);
79 81
80extern unsigned long rcutorture_testseq; 82extern unsigned long rcutorture_testseq;
81extern unsigned long rcutorture_vernum; 83extern unsigned long rcutorture_vernum;
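A sketch of the intended use of the new pair, mirroring the existing get_state_synchronize_rcu()/cond_synchronize_rcu() pattern (do_time_consuming_work() is a placeholder, not a real kernel function):

	static void update_with_piggybacked_gp(void)
	{
		unsigned long gp_snap;

		gp_snap = get_state_synchronize_sched();
		do_time_consuming_work();	/* ideally spans a grace period */
		/* Falls back to synchronize_sched() only if no full RCU-sched
		 * grace period has elapsed since the snapshot above. */
		cond_synchronize_sched(gp_snap);
	}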
diff --git a/include/linux/types.h b/include/linux/types.h
index 8715287c3b1f..c314989d9158 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -212,6 +212,9 @@ struct callback_head {
212}; 212};
213#define rcu_head callback_head 213#define rcu_head callback_head
214 214
215typedef void (*rcu_callback_t)(struct rcu_head *head);
216typedef void (*call_rcu_func_t)(struct rcu_head *head, rcu_callback_t func);
217
215/* clocksource cycle base type */ 218/* clocksource cycle base type */
216typedef u64 cycle_t; 219typedef u64 cycle_t;
217 220
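These typedefs simply name the signatures that call_rcu() and its callbacks already use, so code can be written against any flavor's queuing function. An illustrative sketch (struct foo and the helpers are hypothetical):

	struct foo {
		int val;
		struct rcu_head rh;
	};

	/* Matches rcu_callback_t. */
	static void foo_reclaim(struct rcu_head *head)
	{
		kfree(container_of(head, struct foo, rh));
	}

	/* Accepts any flavor's call_rcu()-style function. */
	static void foo_defer_free(struct foo *fp, call_rcu_func_t crf)
	{
		crf(&fp->rh, foo_reclaim);
	}

	/* For example: foo_defer_free(fp, call_rcu_sched); */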
diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index c78e88ce5ea3..ef72c4aada56 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -661,7 +661,6 @@ TRACE_EVENT(rcu_torture_read,
661 * Tracepoint for _rcu_barrier() execution. The string "s" describes 661 * Tracepoint for _rcu_barrier() execution. The string "s" describes
662 * the _rcu_barrier phase: 662 * the _rcu_barrier phase:
663 * "Begin": _rcu_barrier() started. 663 * "Begin": _rcu_barrier() started.
664 * "Check": _rcu_barrier() checking for piggybacking.
665 * "EarlyExit": _rcu_barrier() piggybacked, thus early exit. 664 * "EarlyExit": _rcu_barrier() piggybacked, thus early exit.
666 * "Inc1": _rcu_barrier() piggyback check counter incremented. 665 * "Inc1": _rcu_barrier() piggyback check counter incremented.
667 * "OfflineNoCB": _rcu_barrier() found callback on never-online CPU 666 * "OfflineNoCB": _rcu_barrier() found callback on never-online CPU
diff --git a/init/Kconfig b/init/Kconfig
index af09b4fb43d2..ba1e6eaf4c36 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -538,15 +538,6 @@ config RCU_STALL_COMMON
538config CONTEXT_TRACKING 538config CONTEXT_TRACKING
539 bool 539 bool
540 540
541config RCU_USER_QS
542 bool
543 help
544 This option sets hooks on kernel / userspace boundaries and
545 puts RCU in extended quiescent state when the CPU runs in
546 userspace. It means that when a CPU runs in userspace, it is
547 excluded from the global RCU state machine and thus doesn't
548 try to keep the timer tick on for RCU.
549
550config CONTEXT_TRACKING_FORCE 541config CONTEXT_TRACKING_FORCE
551 bool "Force context tracking" 542 bool "Force context tracking"
552 depends on CONTEXT_TRACKING 543 depends on CONTEXT_TRACKING
@@ -707,6 +698,7 @@ config RCU_BOOST_DELAY
707config RCU_NOCB_CPU 698config RCU_NOCB_CPU
708 bool "Offload RCU callback processing from boot-selected CPUs" 699 bool "Offload RCU callback processing from boot-selected CPUs"
709 depends on TREE_RCU || PREEMPT_RCU 700 depends on TREE_RCU || PREEMPT_RCU
701 depends on RCU_EXPERT || NO_HZ_FULL
710 default n 702 default n
711 help 703 help
712 Use this option to reduce OS jitter for aggressive HPC or 704 Use this option to reduce OS jitter for aggressive HPC or
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index f89d9292eee6..b89f3168411b 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -107,8 +107,8 @@ static DEFINE_SPINLOCK(release_agent_path_lock);
107struct percpu_rw_semaphore cgroup_threadgroup_rwsem; 107struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
108 108
109#define cgroup_assert_mutex_or_rcu_locked() \ 109#define cgroup_assert_mutex_or_rcu_locked() \
110 rcu_lockdep_assert(rcu_read_lock_held() || \ 110 RCU_LOCKDEP_WARN(!rcu_read_lock_held() && \
111 lockdep_is_held(&cgroup_mutex), \ 111 !lockdep_is_held(&cgroup_mutex), \
112 "cgroup_mutex or RCU read lock required"); 112 "cgroup_mutex or RCU read lock required");
113 113
114/* 114/*
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 5644ec5582b9..910d709b578a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -381,14 +381,14 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
381 * will observe it. 381 * will observe it.
382 * 382 *
383 * For CONFIG_PREEMPT we have preemptible RCU and its sync_rcu() might 383 * For CONFIG_PREEMPT we have preemptible RCU and its sync_rcu() might
384 * not imply sync_sched(), so explicitly call both. 384 * not imply sync_sched(), so wait for both.
385 * 385 *
386 * Do sync before park smpboot threads to take care the rcu boost case. 386 * Do sync before park smpboot threads to take care the rcu boost case.
387 */ 387 */
388#ifdef CONFIG_PREEMPT 388 if (IS_ENABLED(CONFIG_PREEMPT))
389 synchronize_sched(); 389 synchronize_rcu_mult(call_rcu, call_rcu_sched);
390#endif 390 else
391 synchronize_rcu(); 391 synchronize_rcu();
392 392
393 smpboot_park_threads(cpu); 393 smpboot_park_threads(cpu);
394 394
diff --git a/kernel/pid.c b/kernel/pid.c
index 4fd07d5b7baf..ca368793808e 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -451,9 +451,8 @@ EXPORT_SYMBOL(pid_task);
451 */ 451 */
452struct task_struct *find_task_by_pid_ns(pid_t nr, struct pid_namespace *ns) 452struct task_struct *find_task_by_pid_ns(pid_t nr, struct pid_namespace *ns)
453{ 453{
454 rcu_lockdep_assert(rcu_read_lock_held(), 454 RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
455 "find_task_by_pid_ns() needs rcu_read_lock()" 455 "find_task_by_pid_ns() needs rcu_read_lock() protection");
456 " protection");
457 return pid_task(find_pid_ns(nr, ns), PIDTYPE_PID); 456 return pid_task(find_pid_ns(nr, ns), PIDTYPE_PID);
458} 457}
459 458
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 59e32684c23b..77192953dee5 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -635,6 +635,8 @@ static struct rcu_torture_ops sched_ops = {
635 .deferred_free = rcu_sched_torture_deferred_free, 635 .deferred_free = rcu_sched_torture_deferred_free,
636 .sync = synchronize_sched, 636 .sync = synchronize_sched,
637 .exp_sync = synchronize_sched_expedited, 637 .exp_sync = synchronize_sched_expedited,
638 .get_state = get_state_synchronize_sched,
639 .cond_sync = cond_synchronize_sched,
638 .call = call_rcu_sched, 640 .call = call_rcu_sched,
639 .cb_barrier = rcu_barrier_sched, 641 .cb_barrier = rcu_barrier_sched,
640 .fqs = rcu_sched_force_quiescent_state, 642 .fqs = rcu_sched_force_quiescent_state,
@@ -684,10 +686,20 @@ static struct rcu_torture_ops tasks_ops = {
684 686
685#define RCUTORTURE_TASKS_OPS &tasks_ops, 687#define RCUTORTURE_TASKS_OPS &tasks_ops,
686 688
689static bool __maybe_unused torturing_tasks(void)
690{
691 return cur_ops == &tasks_ops;
692}
693
687#else /* #ifdef CONFIG_TASKS_RCU */ 694#else /* #ifdef CONFIG_TASKS_RCU */
688 695
689#define RCUTORTURE_TASKS_OPS 696#define RCUTORTURE_TASKS_OPS
690 697
698static bool torturing_tasks(void)
699{
700 return false;
701}
702
691#endif /* #else #ifdef CONFIG_TASKS_RCU */ 703#endif /* #else #ifdef CONFIG_TASKS_RCU */
692 704
693/* 705/*
@@ -823,9 +835,7 @@ rcu_torture_cbflood(void *arg)
823 } 835 }
824 if (err) { 836 if (err) {
825 VERBOSE_TOROUT_STRING("rcu_torture_cbflood disabled: Bad args or OOM"); 837 VERBOSE_TOROUT_STRING("rcu_torture_cbflood disabled: Bad args or OOM");
826 while (!torture_must_stop()) 838 goto wait_for_stop;
827 schedule_timeout_interruptible(HZ);
828 return 0;
829 } 839 }
830 VERBOSE_TOROUT_STRING("rcu_torture_cbflood task started"); 840 VERBOSE_TOROUT_STRING("rcu_torture_cbflood task started");
831 do { 841 do {
@@ -844,6 +854,7 @@ rcu_torture_cbflood(void *arg)
844 stutter_wait("rcu_torture_cbflood"); 854 stutter_wait("rcu_torture_cbflood");
845 } while (!torture_must_stop()); 855 } while (!torture_must_stop());
846 vfree(rhp); 856 vfree(rhp);
857wait_for_stop:
847 torture_kthread_stopping("rcu_torture_cbflood"); 858 torture_kthread_stopping("rcu_torture_cbflood");
848 return 0; 859 return 0;
849} 860}
@@ -1088,7 +1099,8 @@ static void rcu_torture_timer(unsigned long unused)
1088 p = rcu_dereference_check(rcu_torture_current, 1099 p = rcu_dereference_check(rcu_torture_current,
1089 rcu_read_lock_bh_held() || 1100 rcu_read_lock_bh_held() ||
1090 rcu_read_lock_sched_held() || 1101 rcu_read_lock_sched_held() ||
1091 srcu_read_lock_held(srcu_ctlp)); 1102 srcu_read_lock_held(srcu_ctlp) ||
1103 torturing_tasks());
1092 if (p == NULL) { 1104 if (p == NULL) {
1093 /* Leave because rcu_torture_writer is not yet underway */ 1105 /* Leave because rcu_torture_writer is not yet underway */
1094 cur_ops->readunlock(idx); 1106 cur_ops->readunlock(idx);
@@ -1162,7 +1174,8 @@ rcu_torture_reader(void *arg)
1162 p = rcu_dereference_check(rcu_torture_current, 1174 p = rcu_dereference_check(rcu_torture_current,
1163 rcu_read_lock_bh_held() || 1175 rcu_read_lock_bh_held() ||
1164 rcu_read_lock_sched_held() || 1176 rcu_read_lock_sched_held() ||
1165 srcu_read_lock_held(srcu_ctlp)); 1177 srcu_read_lock_held(srcu_ctlp) ||
1178 torturing_tasks());
1166 if (p == NULL) { 1179 if (p == NULL) {
1167 /* Wait for rcu_torture_writer to get underway */ 1180 /* Wait for rcu_torture_writer to get underway */
1168 cur_ops->readunlock(idx); 1181 cur_ops->readunlock(idx);
@@ -1507,7 +1520,7 @@ static int rcu_torture_barrier_init(void)
1507 int i; 1520 int i;
1508 int ret; 1521 int ret;
1509 1522
1510 if (n_barrier_cbs == 0) 1523 if (n_barrier_cbs <= 0)
1511 return 0; 1524 return 0;
1512 if (cur_ops->call == NULL || cur_ops->cb_barrier == NULL) { 1525 if (cur_ops->call == NULL || cur_ops->cb_barrier == NULL) {
1513 pr_alert("%s" TORTURE_FLAG 1526 pr_alert("%s" TORTURE_FLAG
@@ -1786,12 +1799,15 @@ rcu_torture_init(void)
1786 writer_task); 1799 writer_task);
1787 if (firsterr) 1800 if (firsterr)
1788 goto unwind; 1801 goto unwind;
1789 fakewriter_tasks = kzalloc(nfakewriters * sizeof(fakewriter_tasks[0]), 1802 if (nfakewriters > 0) {
1790 GFP_KERNEL); 1803 fakewriter_tasks = kzalloc(nfakewriters *
1791 if (fakewriter_tasks == NULL) { 1804 sizeof(fakewriter_tasks[0]),
1792 VERBOSE_TOROUT_ERRSTRING("out of memory"); 1805 GFP_KERNEL);
1793 firsterr = -ENOMEM; 1806 if (fakewriter_tasks == NULL) {
1794 goto unwind; 1807 VERBOSE_TOROUT_ERRSTRING("out of memory");
1808 firsterr = -ENOMEM;
1809 goto unwind;
1810 }
1795 } 1811 }
1796 for (i = 0; i < nfakewriters; i++) { 1812 for (i = 0; i < nfakewriters; i++) {
1797 firsterr = torture_create_kthread(rcu_torture_fakewriter, 1813 firsterr = torture_create_kthread(rcu_torture_fakewriter,
@@ -1818,7 +1834,7 @@ rcu_torture_init(void)
1818 if (firsterr) 1834 if (firsterr)
1819 goto unwind; 1835 goto unwind;
1820 } 1836 }
1821 if (test_no_idle_hz) { 1837 if (test_no_idle_hz && shuffle_interval > 0) {
1822 firsterr = torture_shuffle_init(shuffle_interval * HZ); 1838 firsterr = torture_shuffle_init(shuffle_interval * HZ);
1823 if (firsterr) 1839 if (firsterr)
1824 goto unwind; 1840 goto unwind;
diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
index fb33d35ee0b7..d3fcb2ec8536 100644
--- a/kernel/rcu/srcu.c
+++ b/kernel/rcu/srcu.c
@@ -252,14 +252,15 @@ static bool srcu_readers_active_idx_check(struct srcu_struct *sp, int idx)
252} 252}
253 253
254/** 254/**
254 * srcu_readers_active - returns approximate number of readers. 255 * srcu_readers_active - returns true if there are readers, and false
256 * otherwise
256 * @sp: which srcu_struct to count active readers (holding srcu_read_lock). 257 * @sp: which srcu_struct to count active readers (holding srcu_read_lock).
257 * 258 *
258 * Note that this is not an atomic primitive, and can therefore suffer 259 * Note that this is not an atomic primitive, and can therefore suffer
259 * severe errors when invoked on an active srcu_struct. That said, it 260 * severe errors when invoked on an active srcu_struct. That said, it
260 * can be useful as an error check at cleanup time. 261 * can be useful as an error check at cleanup time.
261 */ 262 */
262static int srcu_readers_active(struct srcu_struct *sp) 263static bool srcu_readers_active(struct srcu_struct *sp)
263{ 264{
264 int cpu; 265 int cpu;
265 unsigned long sum = 0; 266 unsigned long sum = 0;
@@ -414,11 +415,11 @@ static void __synchronize_srcu(struct srcu_struct *sp, int trycount)
414 struct rcu_head *head = &rcu.head; 415 struct rcu_head *head = &rcu.head;
415 bool done = false; 416 bool done = false;
416 417
417 rcu_lockdep_assert(!lock_is_held(&sp->dep_map) && 418 RCU_LOCKDEP_WARN(lock_is_held(&sp->dep_map) ||
418 !lock_is_held(&rcu_bh_lock_map) && 419 lock_is_held(&rcu_bh_lock_map) ||
419 !lock_is_held(&rcu_lock_map) && 420 lock_is_held(&rcu_lock_map) ||
420 !lock_is_held(&rcu_sched_lock_map), 421 lock_is_held(&rcu_sched_lock_map),
421 "Illegal synchronize_srcu() in same-type SRCU (or RCU) read-side critical section"); 422 "Illegal synchronize_srcu() in same-type SRCU (or in RCU) read-side critical section");
422 423
423 might_sleep(); 424 might_sleep();
424 init_completion(&rcu.completion); 425 init_completion(&rcu.completion);
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index c291bd65d2cb..d0471056d0af 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -191,10 +191,10 @@ static void rcu_process_callbacks(struct softirq_action *unused)
191 */ 191 */
192void synchronize_sched(void) 192void synchronize_sched(void)
193{ 193{
194 rcu_lockdep_assert(!lock_is_held(&rcu_bh_lock_map) && 194 RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
195 !lock_is_held(&rcu_lock_map) && 195 lock_is_held(&rcu_lock_map) ||
196 !lock_is_held(&rcu_sched_lock_map), 196 lock_is_held(&rcu_sched_lock_map),
197 "Illegal synchronize_sched() in RCU read-side critical section"); 197 "Illegal synchronize_sched() in RCU read-side critical section");
198 cond_resched(); 198 cond_resched();
199} 199}
200EXPORT_SYMBOL_GPL(synchronize_sched); 200EXPORT_SYMBOL_GPL(synchronize_sched);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 65137bc28b2b..9f75f25cc5d9 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -70,6 +70,8 @@ MODULE_ALIAS("rcutree");
70 70
71static struct lock_class_key rcu_node_class[RCU_NUM_LVLS]; 71static struct lock_class_key rcu_node_class[RCU_NUM_LVLS];
72static struct lock_class_key rcu_fqs_class[RCU_NUM_LVLS]; 72static struct lock_class_key rcu_fqs_class[RCU_NUM_LVLS];
73static struct lock_class_key rcu_exp_class[RCU_NUM_LVLS];
74static struct lock_class_key rcu_exp_sched_class[RCU_NUM_LVLS];
73 75
74/* 76/*
75 * In order to export the rcu_state name to the tracing tools, it 77 * In order to export the rcu_state name to the tracing tools, it
@@ -124,13 +126,8 @@ module_param(rcu_fanout_exact, bool, 0444);
124static int rcu_fanout_leaf = RCU_FANOUT_LEAF; 126static int rcu_fanout_leaf = RCU_FANOUT_LEAF;
125module_param(rcu_fanout_leaf, int, 0444); 127module_param(rcu_fanout_leaf, int, 0444);
126int rcu_num_lvls __read_mostly = RCU_NUM_LVLS; 128int rcu_num_lvls __read_mostly = RCU_NUM_LVLS;
127static int num_rcu_lvl[] = { /* Number of rcu_nodes at specified level. */ 129/* Number of rcu_nodes at specified level. */
128 NUM_RCU_LVL_0, 130static int num_rcu_lvl[] = NUM_RCU_LVL_INIT;
129 NUM_RCU_LVL_1,
130 NUM_RCU_LVL_2,
131 NUM_RCU_LVL_3,
132 NUM_RCU_LVL_4,
133};
134int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */ 131int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
135 132
136/* 133/*
@@ -649,12 +646,12 @@ static void rcu_eqs_enter_common(long long oldval, bool user)
649 * It is illegal to enter an extended quiescent state while 646 * It is illegal to enter an extended quiescent state while
650 * in an RCU read-side critical section. 647 * in an RCU read-side critical section.
651 */ 648 */
652 rcu_lockdep_assert(!lock_is_held(&rcu_lock_map), 649 RCU_LOCKDEP_WARN(lock_is_held(&rcu_lock_map),
653 "Illegal idle entry in RCU read-side critical section."); 650 "Illegal idle entry in RCU read-side critical section.");
654 rcu_lockdep_assert(!lock_is_held(&rcu_bh_lock_map), 651 RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map),
655 "Illegal idle entry in RCU-bh read-side critical section."); 652 "Illegal idle entry in RCU-bh read-side critical section.");
656 rcu_lockdep_assert(!lock_is_held(&rcu_sched_lock_map), 653 RCU_LOCKDEP_WARN(lock_is_held(&rcu_sched_lock_map),
657 "Illegal idle entry in RCU-sched read-side critical section."); 654 "Illegal idle entry in RCU-sched read-side critical section.");
658} 655}
659 656
660/* 657/*
@@ -701,7 +698,7 @@ void rcu_idle_enter(void)
701} 698}
702EXPORT_SYMBOL_GPL(rcu_idle_enter); 699EXPORT_SYMBOL_GPL(rcu_idle_enter);
703 700
704#ifdef CONFIG_RCU_USER_QS 701#ifdef CONFIG_NO_HZ_FULL
705/** 702/**
706 * rcu_user_enter - inform RCU that we are resuming userspace. 703 * rcu_user_enter - inform RCU that we are resuming userspace.
707 * 704 *
@@ -714,7 +711,7 @@ void rcu_user_enter(void)
714{ 711{
715 rcu_eqs_enter(1); 712 rcu_eqs_enter(1);
716} 713}
717#endif /* CONFIG_RCU_USER_QS */ 714#endif /* CONFIG_NO_HZ_FULL */
718 715
719/** 716/**
720 * rcu_irq_exit - inform RCU that current CPU is exiting irq towards idle 717 * rcu_irq_exit - inform RCU that current CPU is exiting irq towards idle
@@ -828,7 +825,7 @@ void rcu_idle_exit(void)
828} 825}
829EXPORT_SYMBOL_GPL(rcu_idle_exit); 826EXPORT_SYMBOL_GPL(rcu_idle_exit);
830 827
831#ifdef CONFIG_RCU_USER_QS 828#ifdef CONFIG_NO_HZ_FULL
832/** 829/**
833 * rcu_user_exit - inform RCU that we are exiting userspace. 830 * rcu_user_exit - inform RCU that we are exiting userspace.
834 * 831 *
@@ -839,7 +836,7 @@ void rcu_user_exit(void)
839{ 836{
840 rcu_eqs_exit(1); 837 rcu_eqs_exit(1);
841} 838}
842#endif /* CONFIG_RCU_USER_QS */ 839#endif /* CONFIG_NO_HZ_FULL */
843 840
844/** 841/**
845 * rcu_irq_enter - inform RCU that current CPU is entering irq away from idle 842 * rcu_irq_enter - inform RCU that current CPU is entering irq away from idle
@@ -978,9 +975,9 @@ bool notrace rcu_is_watching(void)
978{ 975{
979 bool ret; 976 bool ret;
980 977
981 preempt_disable(); 978 preempt_disable_notrace();
982 ret = __rcu_is_watching(); 979 ret = __rcu_is_watching();
983 preempt_enable(); 980 preempt_enable_notrace();
984 return ret; 981 return ret;
985} 982}
986EXPORT_SYMBOL_GPL(rcu_is_watching); 983EXPORT_SYMBOL_GPL(rcu_is_watching);
@@ -1178,9 +1175,11 @@ static void rcu_check_gp_kthread_starvation(struct rcu_state *rsp)
1178 j = jiffies; 1175 j = jiffies;
1179 gpa = READ_ONCE(rsp->gp_activity); 1176 gpa = READ_ONCE(rsp->gp_activity);
1180 if (j - gpa > 2 * HZ) 1177 if (j - gpa > 2 * HZ)
1181 pr_err("%s kthread starved for %ld jiffies! g%lu c%lu f%#x\n", 1178 pr_err("%s kthread starved for %ld jiffies! g%lu c%lu f%#x s%d ->state=%#lx\n",
1182 rsp->name, j - gpa, 1179 rsp->name, j - gpa,
1183 rsp->gpnum, rsp->completed, rsp->gp_flags); 1180 rsp->gpnum, rsp->completed,
1181 rsp->gp_flags, rsp->gp_state,
1182 rsp->gp_kthread ? rsp->gp_kthread->state : 0);
1184} 1183}
1185 1184
1186/* 1185/*
@@ -1906,6 +1905,26 @@ static int rcu_gp_init(struct rcu_state *rsp)
1906} 1905}
1907 1906
1908/* 1907/*
1908 * Helper function for wait_event_interruptible_timeout() wakeup
1909 * at force-quiescent-state time.
1910 */
1911static bool rcu_gp_fqs_check_wake(struct rcu_state *rsp, int *gfp)
1912{
1913 struct rcu_node *rnp = rcu_get_root(rsp);
1914
1915 /* Someone like call_rcu() requested a force-quiescent-state scan. */
1916 *gfp = READ_ONCE(rsp->gp_flags);
1917 if (*gfp & RCU_GP_FLAG_FQS)
1918 return true;
1919
1920 /* The current grace period has completed. */
1921 if (!READ_ONCE(rnp->qsmask) && !rcu_preempt_blocked_readers_cgp(rnp))
1922 return true;
1923
1924 return false;
1925}
1926
1927/*
1909 * Do one round of quiescent-state forcing. 1928 * Do one round of quiescent-state forcing.
1910 */ 1929 */
1911static int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in) 1930static int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in)
@@ -2041,6 +2060,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
2041 wait_event_interruptible(rsp->gp_wq, 2060 wait_event_interruptible(rsp->gp_wq,
2042 READ_ONCE(rsp->gp_flags) & 2061 READ_ONCE(rsp->gp_flags) &
2043 RCU_GP_FLAG_INIT); 2062 RCU_GP_FLAG_INIT);
2063 rsp->gp_state = RCU_GP_DONE_GPS;
2044 /* Locking provides needed memory barrier. */ 2064 /* Locking provides needed memory barrier. */
2045 if (rcu_gp_init(rsp)) 2065 if (rcu_gp_init(rsp))
2046 break; 2066 break;
@@ -2068,11 +2088,8 @@ static int __noreturn rcu_gp_kthread(void *arg)
2068 TPS("fqswait")); 2088 TPS("fqswait"));
2069 rsp->gp_state = RCU_GP_WAIT_FQS; 2089 rsp->gp_state = RCU_GP_WAIT_FQS;
2070 ret = wait_event_interruptible_timeout(rsp->gp_wq, 2090 ret = wait_event_interruptible_timeout(rsp->gp_wq,
2071 ((gf = READ_ONCE(rsp->gp_flags)) & 2091 rcu_gp_fqs_check_wake(rsp, &gf), j);
2072 RCU_GP_FLAG_FQS) || 2092 rsp->gp_state = RCU_GP_DOING_FQS;
2073 (!READ_ONCE(rnp->qsmask) &&
2074 !rcu_preempt_blocked_readers_cgp(rnp)),
2075 j);
2076 /* Locking provides needed memory barriers. */ 2093 /* Locking provides needed memory barriers. */
2077 /* If grace period done, leave loop. */ 2094 /* If grace period done, leave loop. */
2078 if (!READ_ONCE(rnp->qsmask) && 2095 if (!READ_ONCE(rnp->qsmask) &&
@@ -2110,7 +2127,9 @@ static int __noreturn rcu_gp_kthread(void *arg)
2110 } 2127 }
2111 2128
2112 /* Handle grace-period end. */ 2129 /* Handle grace-period end. */
2130 rsp->gp_state = RCU_GP_CLEANUP;
2113 rcu_gp_cleanup(rsp); 2131 rcu_gp_cleanup(rsp);
2132 rsp->gp_state = RCU_GP_CLEANED;
2114 } 2133 }
2115} 2134}
2116 2135
@@ -3161,10 +3180,10 @@ static inline int rcu_blocking_is_gp(void)
3161 */ 3180 */
3162void synchronize_sched(void) 3181void synchronize_sched(void)
3163{ 3182{
3164 rcu_lockdep_assert(!lock_is_held(&rcu_bh_lock_map) && 3183 RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
3165 !lock_is_held(&rcu_lock_map) && 3184 lock_is_held(&rcu_lock_map) ||
3166 !lock_is_held(&rcu_sched_lock_map), 3185 lock_is_held(&rcu_sched_lock_map),
3167 "Illegal synchronize_sched() in RCU-sched read-side critical section"); 3186 "Illegal synchronize_sched() in RCU-sched read-side critical section");
3168 if (rcu_blocking_is_gp()) 3187 if (rcu_blocking_is_gp())
3169 return; 3188 return;
3170 if (rcu_gp_is_expedited()) 3189 if (rcu_gp_is_expedited())
@@ -3188,10 +3207,10 @@ EXPORT_SYMBOL_GPL(synchronize_sched);
3188 */ 3207 */
3189void synchronize_rcu_bh(void) 3208void synchronize_rcu_bh(void)
3190{ 3209{
3191 rcu_lockdep_assert(!lock_is_held(&rcu_bh_lock_map) && 3210 RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
3192 !lock_is_held(&rcu_lock_map) && 3211 lock_is_held(&rcu_lock_map) ||
3193 !lock_is_held(&rcu_sched_lock_map), 3212 lock_is_held(&rcu_sched_lock_map),
3194 "Illegal synchronize_rcu_bh() in RCU-bh read-side critical section"); 3213 "Illegal synchronize_rcu_bh() in RCU-bh read-side critical section");
3195 if (rcu_blocking_is_gp()) 3214 if (rcu_blocking_is_gp())
3196 return; 3215 return;
3197 if (rcu_gp_is_expedited()) 3216 if (rcu_gp_is_expedited())
@@ -3253,23 +3272,247 @@ void cond_synchronize_rcu(unsigned long oldstate)
3253} 3272}
3254EXPORT_SYMBOL_GPL(cond_synchronize_rcu); 3273EXPORT_SYMBOL_GPL(cond_synchronize_rcu);
3255 3274
3256static int synchronize_sched_expedited_cpu_stop(void *data) 3275/**
3276 * get_state_synchronize_sched - Snapshot current RCU-sched state
3277 *
3278 * Returns a cookie that is used by a later call to cond_synchronize_sched()
3279 * to determine whether or not a full grace period has elapsed in the
3280 * meantime.
3281 */
3282unsigned long get_state_synchronize_sched(void)
3257{ 3283{
3258 /* 3284 /*
3259 * There must be a full memory barrier on each affected CPU 3285 * Any prior manipulation of RCU-protected data must happen
3260 * between the time that try_stop_cpus() is called and the 3286 * before the load from ->gpnum.
3261 * time that it returns. 3287 */
3262 * 3288 smp_mb(); /* ^^^ */
3263 * In the current initial implementation of cpu_stop, the 3289
3264 * above condition is already met when the control reaches 3290 /*
3265 * this point and the following smp_mb() is not strictly 3291 * Make sure this load happens before the purportedly
3266 * necessary. Do smp_mb() anyway for documentation and 3292 * time-consuming work between get_state_synchronize_sched()
3267 * robustness against future implementation changes. 3293 * and cond_synchronize_sched().
3294 */
3295 return smp_load_acquire(&rcu_sched_state.gpnum);
3296}
3297EXPORT_SYMBOL_GPL(get_state_synchronize_sched);
3298
3299/**
3300 * cond_synchronize_sched - Conditionally wait for an RCU-sched grace period
3301 *
3302 * @oldstate: return value from earlier call to get_state_synchronize_sched()
3303 *
3304 * If a full RCU-sched grace period has elapsed since the earlier call to
3305 * get_state_synchronize_sched(), just return. Otherwise, invoke
3306 * synchronize_sched() to wait for a full grace period.
3307 *
3308 * Yes, this function does not take counter wrap into account. But
3309 * counter wrap is harmless. If the counter wraps, we have waited for
3310 * more than 2 billion grace periods (and way more on a 64-bit system!),
3311 * so waiting for one additional grace period should be just fine.
3312 */
3313void cond_synchronize_sched(unsigned long oldstate)
3314{
3315 unsigned long newstate;
3316
3317 /*
3318 * Ensure that this load happens before any RCU-destructive
3319 * actions the caller might carry out after we return.
3268 */ 3320 */
3269 smp_mb(); /* See above comment block. */ 3321 newstate = smp_load_acquire(&rcu_sched_state.completed);
3322 if (ULONG_CMP_GE(oldstate, newstate))
3323 synchronize_sched();
3324}
3325EXPORT_SYMBOL_GPL(cond_synchronize_sched);
3326
3327/* Adjust sequence number for start of update-side operation. */
3328static void rcu_seq_start(unsigned long *sp)
3329{
3330 WRITE_ONCE(*sp, *sp + 1);
3331 smp_mb(); /* Ensure update-side operation after counter increment. */
3332 WARN_ON_ONCE(!(*sp & 0x1));
3333}
3334
3335/* Adjust sequence number for end of update-side operation. */
3336static void rcu_seq_end(unsigned long *sp)
3337{
3338 smp_mb(); /* Ensure update-side operation before counter increment. */
3339 WRITE_ONCE(*sp, *sp + 1);
3340 WARN_ON_ONCE(*sp & 0x1);
3341}
3342
3343/* Take a snapshot of the update side's sequence number. */
3344static unsigned long rcu_seq_snap(unsigned long *sp)
3345{
3346 unsigned long s;
3347
3348 smp_mb(); /* Caller's modifications seen first by other CPUs. */
3349 s = (READ_ONCE(*sp) + 3) & ~0x1;
3350 smp_mb(); /* Above access must not bleed into critical section. */
3351 return s;
3352}
3353
3354/*
3355 * Given a snapshot from rcu_seq_snap(), determine whether or not a
3356 * full update-side operation has occurred.
3357 */
3358static bool rcu_seq_done(unsigned long *sp, unsigned long s)
3359{
3360 return ULONG_CMP_GE(READ_ONCE(*sp), s);
3361}
3362
3363/* Wrapper functions for expedited grace periods. */
3364static void rcu_exp_gp_seq_start(struct rcu_state *rsp)
3365{
3366 rcu_seq_start(&rsp->expedited_sequence);
3367}
3368static void rcu_exp_gp_seq_end(struct rcu_state *rsp)
3369{
3370 rcu_seq_end(&rsp->expedited_sequence);
3371 smp_mb(); /* Ensure that consecutive grace periods serialize. */
3372}
3373static unsigned long rcu_exp_gp_seq_snap(struct rcu_state *rsp)
3374{
3375 return rcu_seq_snap(&rsp->expedited_sequence);
3376}
3377static bool rcu_exp_gp_seq_done(struct rcu_state *rsp, unsigned long s)
3378{
3379 return rcu_seq_done(&rsp->expedited_sequence, s);
3380}
3381
3382/* Common code for synchronize_{rcu,sched}_expedited() work-done checking. */
3383static bool sync_exp_work_done(struct rcu_state *rsp, struct rcu_node *rnp,
3384 struct rcu_data *rdp,
3385 atomic_long_t *stat, unsigned long s)
3386{
3387 if (rcu_exp_gp_seq_done(rsp, s)) {
3388 if (rnp)
3389 mutex_unlock(&rnp->exp_funnel_mutex);
3390 else if (rdp)
3391 mutex_unlock(&rdp->exp_funnel_mutex);
3392 /* Ensure test happens before caller kfree(). */
3393 smp_mb__before_atomic(); /* ^^^ */
3394 atomic_long_inc(stat);
3395 return true;
3396 }
3397 return false;
3398}
3399
3400/*
3401 * Funnel-lock acquisition for expedited grace periods. Returns a
3402 * pointer to the root rcu_node structure, or NULL if some other
3403 * task did the expedited grace period for us.
3404 */
3405static struct rcu_node *exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
3406{
3407 struct rcu_data *rdp;
3408 struct rcu_node *rnp0;
3409 struct rcu_node *rnp1 = NULL;
3410
3411 /*
3412 * First try directly acquiring the root lock in order to reduce
3413 * latency in the common case where expedited grace periods are
3414 * rare. We check mutex_is_locked() to avoid pathological levels of
3415 * memory contention on ->exp_funnel_mutex in the heavy-load case.
3416 */
3417 rnp0 = rcu_get_root(rsp);
3418 if (!mutex_is_locked(&rnp0->exp_funnel_mutex)) {
3419 if (mutex_trylock(&rnp0->exp_funnel_mutex)) {
3420 if (sync_exp_work_done(rsp, rnp0, NULL,
3421 &rsp->expedited_workdone0, s))
3422 return NULL;
3423 return rnp0;
3424 }
3425 }
3426
3427 /*
3428 * Each pass through the following loop works its way
3429 * up the rcu_node tree, returning if others have done the
3430 * work or otherwise falls through holding the root rnp's
3431 * ->exp_funnel_mutex. The mapping from CPU to rcu_node structure
3432 * can be inexact, as it is just promoting locality and is not
3433 * strictly needed for correctness.
3434 */
3435 rdp = per_cpu_ptr(rsp->rda, raw_smp_processor_id());
3436 if (sync_exp_work_done(rsp, NULL, NULL, &rsp->expedited_workdone1, s))
3437 return NULL;
3438 mutex_lock(&rdp->exp_funnel_mutex);
3439 rnp0 = rdp->mynode;
3440 for (; rnp0 != NULL; rnp0 = rnp0->parent) {
3441 if (sync_exp_work_done(rsp, rnp1, rdp,
3442 &rsp->expedited_workdone2, s))
3443 return NULL;
3444 mutex_lock(&rnp0->exp_funnel_mutex);
3445 if (rnp1)
3446 mutex_unlock(&rnp1->exp_funnel_mutex);
3447 else
3448 mutex_unlock(&rdp->exp_funnel_mutex);
3449 rnp1 = rnp0;
3450 }
3451 if (sync_exp_work_done(rsp, rnp1, rdp,
3452 &rsp->expedited_workdone3, s))
3453 return NULL;
3454 return rnp1;
3455}
3456
3457/* Invoked on each online non-idle CPU for expedited quiescent state. */
3458static int synchronize_sched_expedited_cpu_stop(void *data)
3459{
3460 struct rcu_data *rdp = data;
3461 struct rcu_state *rsp = rdp->rsp;
3462
3463 /* We are here: If we are last, do the wakeup. */
3464 rdp->exp_done = true;
3465 if (atomic_dec_and_test(&rsp->expedited_need_qs))
3466 wake_up(&rsp->expedited_wq);
3270 return 0; 3467 return 0;
3271} 3468}
3272 3469
3470static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
3471{
3472 int cpu;
3473 unsigned long jiffies_stall;
3474 unsigned long jiffies_start;
3475 struct rcu_data *rdp;
3476 int ret;
3477
3478 jiffies_stall = rcu_jiffies_till_stall_check();
3479 jiffies_start = jiffies;
3480
3481 for (;;) {
3482 ret = wait_event_interruptible_timeout(
3483 rsp->expedited_wq,
3484 !atomic_read(&rsp->expedited_need_qs),
3485 jiffies_stall);
3486 if (ret > 0)
3487 return;
3488 if (ret < 0) {
3489 /* Hit a signal, disable CPU stall warnings. */
3490 wait_event(rsp->expedited_wq,
3491 !atomic_read(&rsp->expedited_need_qs));
3492 return;
3493 }
3494 pr_err("INFO: %s detected expedited stalls on CPUs: {",
3495 rsp->name);
3496 for_each_online_cpu(cpu) {
3497 rdp = per_cpu_ptr(rsp->rda, cpu);
3498
3499 if (rdp->exp_done)
3500 continue;
3501 pr_cont(" %d", cpu);
3502 }
3503 pr_cont(" } %lu jiffies s: %lu\n",
3504 jiffies - jiffies_start, rsp->expedited_sequence);
3505 for_each_online_cpu(cpu) {
3506 rdp = per_cpu_ptr(rsp->rda, cpu);
3507
3508 if (rdp->exp_done)
3509 continue;
3510 dump_cpu_task(cpu);
3511 }
3512 jiffies_stall = 3 * rcu_jiffies_till_stall_check() + 3;
3513 }
3514}
3515
3273/** 3516/**
3274 * synchronize_sched_expedited - Brute-force RCU-sched grace period 3517 * synchronize_sched_expedited - Brute-force RCU-sched grace period
3275 * 3518 *
@@ -3281,58 +3524,21 @@ static int synchronize_sched_expedited_cpu_stop(void *data)
3281 * restructure your code to batch your updates, and then use a single 3524 * restructure your code to batch your updates, and then use a single
3282 * synchronize_sched() instead. 3525 * synchronize_sched() instead.
3283 * 3526 *
3284 * This implementation can be thought of as an application of ticket 3527 * This implementation can be thought of as an application of sequence
3285 * locking to RCU, with sync_sched_expedited_started and 3528 * locking to expedited grace periods, but using the sequence counter to
3286 * sync_sched_expedited_done taking on the roles of the halves 3529 * determine when someone else has already done the work instead of for
3287 * of the ticket-lock word. Each task atomically increments 3530 * retrying readers.
3288 * sync_sched_expedited_started upon entry, snapshotting the old value,
3289 * then attempts to stop all the CPUs. If this succeeds, then each
3290 * CPU will have executed a context switch, resulting in an RCU-sched
3291 * grace period. We are then done, so we use atomic_cmpxchg() to
3292 * update sync_sched_expedited_done to match our snapshot -- but
3293 * only if someone else has not already advanced past our snapshot.
3294 *
3295 * On the other hand, if try_stop_cpus() fails, we check the value
3296 * of sync_sched_expedited_done. If it has advanced past our
3297 * initial snapshot, then someone else must have forced a grace period
3298 * some time after we took our snapshot. In this case, our work is
3299 * done for us, and we can simply return. Otherwise, we try again,
3300 * but keep our initial snapshot for purposes of checking for someone
3301 * doing our work for us.
3302 *
3303 * If we fail too many times in a row, we fall back to synchronize_sched().
3304 */ 3531 */
3305void synchronize_sched_expedited(void) 3532void synchronize_sched_expedited(void)
3306{ 3533{
3307 cpumask_var_t cm;
3308 bool cma = false;
3309 int cpu; 3534 int cpu;
3310 long firstsnap, s, snap; 3535 unsigned long s;
3311 int trycount = 0; 3536 struct rcu_node *rnp;
3312 struct rcu_state *rsp = &rcu_sched_state; 3537 struct rcu_state *rsp = &rcu_sched_state;
3313 3538
3314 /* 3539 /* Take a snapshot of the sequence number. */
3315 * If we are in danger of counter wrap, just do synchronize_sched(). 3540 s = rcu_exp_gp_seq_snap(rsp);
3316 * By allowing sync_sched_expedited_started to advance no more than
3317 * ULONG_MAX/8 ahead of sync_sched_expedited_done, we are ensuring
3318 * that more than 3.5 billion CPUs would be required to force a
3319 * counter wrap on a 32-bit system. Quite a few more CPUs would of
3320 * course be required on a 64-bit system.
3321 */
3322 if (ULONG_CMP_GE((ulong)atomic_long_read(&rsp->expedited_start),
3323 (ulong)atomic_long_read(&rsp->expedited_done) +
3324 ULONG_MAX / 8)) {
3325 wait_rcu_gp(call_rcu_sched);
3326 atomic_long_inc(&rsp->expedited_wrap);
3327 return;
3328 }
3329 3541
3330 /*
3331 * Take a ticket. Note that atomic_inc_return() implies a
3332 * full memory barrier.
3333 */
3334 snap = atomic_long_inc_return(&rsp->expedited_start);
3335 firstsnap = snap;
3336 if (!try_get_online_cpus()) { 3542 if (!try_get_online_cpus()) {
3337 /* CPU hotplug operation in flight, fall back to normal GP. */ 3543 /* CPU hotplug operation in flight, fall back to normal GP. */
3338 wait_rcu_gp(call_rcu_sched); 3544 wait_rcu_gp(call_rcu_sched);
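Condensed, the sequence-counter scheme referred to above works as follows: ->expedited_sequence is odd while an expedited grace period is in progress and even otherwise; rcu_exp_gp_seq_snap() returns an even value beyond any grace period already underway, and rcu_exp_gp_seq_done() later reports whether a full grace period has completed since that snapshot. A sketch of the flow, illustrative only and meaningful only inside kernel/rcu/tree.c where these helpers are static:

	static void expedited_flow_sketch(struct rcu_state *rsp)
	{
		unsigned long s = rcu_exp_gp_seq_snap(rsp);

		/* ... funnel-lock acquisition; at each level: ... */
		if (rcu_exp_gp_seq_done(rsp, s))
			return;		/* someone else's grace period covered us */

		rcu_exp_gp_seq_start(rsp);	/* ->expedited_sequence goes odd */
		/* ... force a quiescent state on each online, non-idle CPU ... */
		rcu_exp_gp_seq_end(rsp);	/* ->expedited_sequence goes even */
	}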
@@ -3341,100 +3547,38 @@ void synchronize_sched_expedited(void)
3341 } 3547 }
3342 WARN_ON_ONCE(cpu_is_offline(raw_smp_processor_id())); 3548 WARN_ON_ONCE(cpu_is_offline(raw_smp_processor_id()));
3343 3549
3344 /* Offline CPUs, idle CPUs, and any CPU we run on are quiescent. */ 3550 rnp = exp_funnel_lock(rsp, s);
3345 cma = zalloc_cpumask_var(&cm, GFP_KERNEL); 3551 if (rnp == NULL) {
3346 if (cma) { 3552 put_online_cpus();
3347 cpumask_copy(cm, cpu_online_mask); 3553 return; /* Someone else did our work for us. */
3348 cpumask_clear_cpu(raw_smp_processor_id(), cm);
3349 for_each_cpu(cpu, cm) {
3350 struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
3351
3352 if (!(atomic_add_return(0, &rdtp->dynticks) & 0x1))
3353 cpumask_clear_cpu(cpu, cm);
3354 }
3355 if (cpumask_weight(cm) == 0)
3356 goto all_cpus_idle;
3357 } 3554 }
3358 3555
3359 /* 3556 rcu_exp_gp_seq_start(rsp);
3360 * Each pass through the following loop attempts to force a
3361 * context switch on each CPU.
3362 */
3363 while (try_stop_cpus(cma ? cm : cpu_online_mask,
3364 synchronize_sched_expedited_cpu_stop,
3365 NULL) == -EAGAIN) {
3366 put_online_cpus();
3367 atomic_long_inc(&rsp->expedited_tryfail);
3368
3369 /* Check to see if someone else did our work for us. */
3370 s = atomic_long_read(&rsp->expedited_done);
3371 if (ULONG_CMP_GE((ulong)s, (ulong)firstsnap)) {
3372 /* ensure test happens before caller kfree */
3373 smp_mb__before_atomic(); /* ^^^ */
3374 atomic_long_inc(&rsp->expedited_workdone1);
3375 free_cpumask_var(cm);
3376 return;
3377 }
3378 3557
3379 /* No joy, try again later. Or just synchronize_sched(). */ 3558 /* Stop each CPU that is online, non-idle, and not us. */
3380 if (trycount++ < 10) { 3559 init_waitqueue_head(&rsp->expedited_wq);
3381 udelay(trycount * num_online_cpus()); 3560 atomic_set(&rsp->expedited_need_qs, 1); /* Extra count avoids race. */
3382 } else { 3561 for_each_online_cpu(cpu) {
3383 wait_rcu_gp(call_rcu_sched); 3562 struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
3384 atomic_long_inc(&rsp->expedited_normal); 3563 struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
3385 free_cpumask_var(cm);
3386 return;
3387 }
3388 3564
3389 /* Recheck to see if someone else did our work for us. */ 3565 rdp->exp_done = false;
3390 s = atomic_long_read(&rsp->expedited_done);
3391 if (ULONG_CMP_GE((ulong)s, (ulong)firstsnap)) {
3392 /* ensure test happens before caller kfree */
3393 smp_mb__before_atomic(); /* ^^^ */
3394 atomic_long_inc(&rsp->expedited_workdone2);
3395 free_cpumask_var(cm);
3396 return;
3397 }
3398 3566
3399 /* 3567 /* Skip our CPU and any idle CPUs. */
3400 * Refetching sync_sched_expedited_started allows later 3568 if (raw_smp_processor_id() == cpu ||
3401 * callers to piggyback on our grace period. We retry 3569 !(atomic_add_return(0, &rdtp->dynticks) & 0x1))
3402 * after they started, so our grace period works for them, 3570 continue;
3403 * and they started after our first try, so their grace 3571 atomic_inc(&rsp->expedited_need_qs);
3404 * period works for us. 3572 stop_one_cpu_nowait(cpu, synchronize_sched_expedited_cpu_stop,
3405 */ 3573 rdp, &rdp->exp_stop_work);
3406 if (!try_get_online_cpus()) {
3407 /* CPU hotplug operation in flight, use normal GP. */
3408 wait_rcu_gp(call_rcu_sched);
3409 atomic_long_inc(&rsp->expedited_normal);
3410 free_cpumask_var(cm);
3411 return;
3412 }
3413 snap = atomic_long_read(&rsp->expedited_start);
3414 smp_mb(); /* ensure read is before try_stop_cpus(). */
3415 } 3574 }
3416 atomic_long_inc(&rsp->expedited_stoppedcpus);
3417 3575
3418all_cpus_idle: 3576 /* Remove extra count and, if necessary, wait for CPUs to stop. */
3419 free_cpumask_var(cm); 3577 if (!atomic_dec_and_test(&rsp->expedited_need_qs))
3578 synchronize_sched_expedited_wait(rsp);
3420 3579
3421 /* 3580 rcu_exp_gp_seq_end(rsp);
3422 * Everyone up to our most recent fetch is covered by our grace 3581 mutex_unlock(&rnp->exp_funnel_mutex);
3423 * period. Update the counter, but only if our work is still
3424 * relevant -- which it won't be if someone who started later
3425 * than we did already did their update.
3426 */
3427 do {
3428 atomic_long_inc(&rsp->expedited_done_tries);
3429 s = atomic_long_read(&rsp->expedited_done);
3430 if (ULONG_CMP_GE((ulong)s, (ulong)snap)) {
3431 /* ensure test happens before caller kfree */
3432 smp_mb__before_atomic(); /* ^^^ */
3433 atomic_long_inc(&rsp->expedited_done_lost);
3434 break;
3435 }
3436 } while (atomic_long_cmpxchg(&rsp->expedited_done, s, snap) != s);
3437 atomic_long_inc(&rsp->expedited_done_exit);
3438 3582
3439 put_online_cpus(); 3583 put_online_cpus();
3440} 3584}
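
The rewritten synchronize_sched_expedited() above leans on two small idioms: a sequence counter bracketed by rcu_exp_gp_seq_start()/rcu_exp_gp_seq_end(), and the "extra count avoids race" trick on ->expedited_need_qs, which starts at 1 so that the final atomic_dec_and_test() cannot fire until every stop-work item has been queued. The user-space sketch below models only that counting idiom; the names (need_qs, report_qs, worker) are illustrative and the pthread machinery stands in for the stopper threads and the expedited wait queue.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define NWORKERS 4

    static atomic_int need_qs;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  all_done = PTHREAD_COND_INITIALIZER;

    static void report_qs(void)                 /* one "CPU" checking in */
    {
            if (atomic_fetch_sub(&need_qs, 1) == 1) {   /* last one out */
                    pthread_mutex_lock(&lock);
                    pthread_cond_broadcast(&all_done);
                    pthread_mutex_unlock(&lock);
            }
    }

    static void *worker(void *arg)
    {
            report_qs();
            return NULL;
    }

    int main(void)
    {
            pthread_t tid[NWORKERS];
            int i;

            atomic_store(&need_qs, 1);          /* extra count avoids race */
            for (i = 0; i < NWORKERS; i++) {
                    atomic_fetch_add(&need_qs, 1);      /* one per worker */
                    pthread_create(&tid[i], NULL, worker, NULL);
            }
            /* Drop the extra count; wait only if workers are still pending. */
            if (atomic_fetch_sub(&need_qs, 1) != 1) {
                    pthread_mutex_lock(&lock);
                    while (atomic_load(&need_qs) != 0)
                            pthread_cond_wait(&all_done, &lock);
                    pthread_mutex_unlock(&lock);
            }
            for (i = 0; i < NWORKERS; i++)
                    pthread_join(tid[i], NULL);
            puts("all workers checked in");
            return 0;
    }

Without the initial count of 1, a fast worker could drive the counter to zero while the dispatch loop was still queuing later workers, letting the waiter return early; the dispatcher's own reference makes that impossible.
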
@@ -3571,10 +3715,10 @@ static void rcu_barrier_callback(struct rcu_head *rhp)
3571 struct rcu_state *rsp = rdp->rsp; 3715 struct rcu_state *rsp = rdp->rsp;
3572 3716
3573 if (atomic_dec_and_test(&rsp->barrier_cpu_count)) { 3717 if (atomic_dec_and_test(&rsp->barrier_cpu_count)) {
3574 _rcu_barrier_trace(rsp, "LastCB", -1, rsp->n_barrier_done); 3718 _rcu_barrier_trace(rsp, "LastCB", -1, rsp->barrier_sequence);
3575 complete(&rsp->barrier_completion); 3719 complete(&rsp->barrier_completion);
3576 } else { 3720 } else {
3577 _rcu_barrier_trace(rsp, "CB", -1, rsp->n_barrier_done); 3721 _rcu_barrier_trace(rsp, "CB", -1, rsp->barrier_sequence);
3578 } 3722 }
3579} 3723}
3580 3724
@@ -3586,7 +3730,7 @@ static void rcu_barrier_func(void *type)
3586 struct rcu_state *rsp = type; 3730 struct rcu_state *rsp = type;
3587 struct rcu_data *rdp = raw_cpu_ptr(rsp->rda); 3731 struct rcu_data *rdp = raw_cpu_ptr(rsp->rda);
3588 3732
3589 _rcu_barrier_trace(rsp, "IRQ", -1, rsp->n_barrier_done); 3733 _rcu_barrier_trace(rsp, "IRQ", -1, rsp->barrier_sequence);
3590 atomic_inc(&rsp->barrier_cpu_count); 3734 atomic_inc(&rsp->barrier_cpu_count);
3591 rsp->call(&rdp->barrier_head, rcu_barrier_callback); 3735 rsp->call(&rdp->barrier_head, rcu_barrier_callback);
3592} 3736}
@@ -3599,55 +3743,24 @@ static void _rcu_barrier(struct rcu_state *rsp)
3599{ 3743{
3600 int cpu; 3744 int cpu;
3601 struct rcu_data *rdp; 3745 struct rcu_data *rdp;
3602 unsigned long snap = READ_ONCE(rsp->n_barrier_done); 3746 unsigned long s = rcu_seq_snap(&rsp->barrier_sequence);
3603 unsigned long snap_done;
3604 3747
3605 _rcu_barrier_trace(rsp, "Begin", -1, snap); 3748 _rcu_barrier_trace(rsp, "Begin", -1, s);
3606 3749
3607 /* Take mutex to serialize concurrent rcu_barrier() requests. */ 3750 /* Take mutex to serialize concurrent rcu_barrier() requests. */
3608 mutex_lock(&rsp->barrier_mutex); 3751 mutex_lock(&rsp->barrier_mutex);
3609 3752
3610 /* 3753 /* Did someone else do our work for us? */
3611 * Ensure that all prior references, including to ->n_barrier_done, 3754 if (rcu_seq_done(&rsp->barrier_sequence, s)) {
3612 * are ordered before the _rcu_barrier() machinery. 3755 _rcu_barrier_trace(rsp, "EarlyExit", -1, rsp->barrier_sequence);
3613 */
3614 smp_mb(); /* See above block comment. */
3615
3616 /*
3617 * Recheck ->n_barrier_done to see if others did our work for us.
3618 * This means checking ->n_barrier_done for an even-to-odd-to-even
3619 * transition. The "if" expression below therefore rounds the old
3620 * value up to the next even number and adds two before comparing.
3621 */
3622 snap_done = rsp->n_barrier_done;
3623 _rcu_barrier_trace(rsp, "Check", -1, snap_done);
3624
3625 /*
3626 * If the value in snap is odd, we needed to wait for the current
3627 * rcu_barrier() to complete, then wait for the next one, in other
3628 * words, we need the value of snap_done to be three larger than
3629 * the value of snap. On the other hand, if the value in snap is
3630 * even, we only had to wait for the next rcu_barrier() to complete,
3631 * in other words, we need the value of snap_done to be only two
3632 * greater than the value of snap. The "(snap + 3) & ~0x1" computes
3633 * this for us (thank you, Linus!).
3634 */
3635 if (ULONG_CMP_GE(snap_done, (snap + 3) & ~0x1)) {
3636 _rcu_barrier_trace(rsp, "EarlyExit", -1, snap_done);
3637 smp_mb(); /* caller's subsequent code after above check. */ 3756 smp_mb(); /* caller's subsequent code after above check. */
3638 mutex_unlock(&rsp->barrier_mutex); 3757 mutex_unlock(&rsp->barrier_mutex);
3639 return; 3758 return;
3640 } 3759 }
3641 3760
3642 /* 3761 /* Mark the start of the barrier operation. */
3643 * Increment ->n_barrier_done to avoid duplicate work. Use 3762 rcu_seq_start(&rsp->barrier_sequence);
3644 * WRITE_ONCE() to prevent the compiler from speculating 3763 _rcu_barrier_trace(rsp, "Inc1", -1, rsp->barrier_sequence);
3645 * the increment to precede the early-exit check.
3646 */
3647 WRITE_ONCE(rsp->n_barrier_done, rsp->n_barrier_done + 1);
3648 WARN_ON_ONCE((rsp->n_barrier_done & 0x1) != 1);
3649 _rcu_barrier_trace(rsp, "Inc1", -1, rsp->n_barrier_done);
3650 smp_mb(); /* Order ->n_barrier_done increment with below mechanism. */
3651 3764
3652 /* 3765 /*
3653 * Initialize the count to one rather than to zero in order to 3766 * Initialize the count to one rather than to zero in order to
@@ -3671,10 +3784,10 @@ static void _rcu_barrier(struct rcu_state *rsp)
3671 if (rcu_is_nocb_cpu(cpu)) { 3784 if (rcu_is_nocb_cpu(cpu)) {
3672 if (!rcu_nocb_cpu_needs_barrier(rsp, cpu)) { 3785 if (!rcu_nocb_cpu_needs_barrier(rsp, cpu)) {
3673 _rcu_barrier_trace(rsp, "OfflineNoCB", cpu, 3786 _rcu_barrier_trace(rsp, "OfflineNoCB", cpu,
3674 rsp->n_barrier_done); 3787 rsp->barrier_sequence);
3675 } else { 3788 } else {
3676 _rcu_barrier_trace(rsp, "OnlineNoCB", cpu, 3789 _rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
3677 rsp->n_barrier_done); 3790 rsp->barrier_sequence);
3678 smp_mb__before_atomic(); 3791 smp_mb__before_atomic();
3679 atomic_inc(&rsp->barrier_cpu_count); 3792 atomic_inc(&rsp->barrier_cpu_count);
3680 __call_rcu(&rdp->barrier_head, 3793 __call_rcu(&rdp->barrier_head,
@@ -3682,11 +3795,11 @@ static void _rcu_barrier(struct rcu_state *rsp)
3682 } 3795 }
3683 } else if (READ_ONCE(rdp->qlen)) { 3796 } else if (READ_ONCE(rdp->qlen)) {
3684 _rcu_barrier_trace(rsp, "OnlineQ", cpu, 3797 _rcu_barrier_trace(rsp, "OnlineQ", cpu,
3685 rsp->n_barrier_done); 3798 rsp->barrier_sequence);
3686 smp_call_function_single(cpu, rcu_barrier_func, rsp, 1); 3799 smp_call_function_single(cpu, rcu_barrier_func, rsp, 1);
3687 } else { 3800 } else {
3688 _rcu_barrier_trace(rsp, "OnlineNQ", cpu, 3801 _rcu_barrier_trace(rsp, "OnlineNQ", cpu,
3689 rsp->n_barrier_done); 3802 rsp->barrier_sequence);
3690 } 3803 }
3691 } 3804 }
3692 put_online_cpus(); 3805 put_online_cpus();
@@ -3698,16 +3811,13 @@ static void _rcu_barrier(struct rcu_state *rsp)
3698 if (atomic_dec_and_test(&rsp->barrier_cpu_count)) 3811 if (atomic_dec_and_test(&rsp->barrier_cpu_count))
3699 complete(&rsp->barrier_completion); 3812 complete(&rsp->barrier_completion);
3700 3813
3701 /* Increment ->n_barrier_done to prevent duplicate work. */
3702 smp_mb(); /* Keep increment after above mechanism. */
3703 WRITE_ONCE(rsp->n_barrier_done, rsp->n_barrier_done + 1);
3704 WARN_ON_ONCE((rsp->n_barrier_done & 0x1) != 0);
3705 _rcu_barrier_trace(rsp, "Inc2", -1, rsp->n_barrier_done);
3706 smp_mb(); /* Keep increment before caller's subsequent code. */
3707
3708 /* Wait for all rcu_barrier_callback() callbacks to be invoked. */ 3814 /* Wait for all rcu_barrier_callback() callbacks to be invoked. */
3709 wait_for_completion(&rsp->barrier_completion); 3815 wait_for_completion(&rsp->barrier_completion);
3710 3816
3817 /* Mark the end of the barrier operation. */
3818 _rcu_barrier_trace(rsp, "Inc2", -1, rsp->barrier_sequence);
3819 rcu_seq_end(&rsp->barrier_sequence);
3820
3711 /* Other rcu_barrier() invocations can now safely proceed. */ 3821 /* Other rcu_barrier() invocations can now safely proceed. */
3712 mutex_unlock(&rsp->barrier_mutex); 3822 mutex_unlock(&rsp->barrier_mutex);
3713} 3823}
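
The ->barrier_sequence counter that replaces ->n_barrier_done is driven by the rcu_seq_start()/rcu_seq_end()/rcu_seq_snap()/rcu_seq_done() helpers: the counter is odd while an operation is in flight and even when idle, a snapshot rounds up with the same "(s + 3) & ~0x1" arithmetic described in the comment removed above, and "done" is a wrap-safe greater-or-equal test. The sketch below is a user-space model of that idiom only; the in-kernel helpers additionally use READ_ONCE()/WRITE_ONCE() and full memory barriers.

    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>

    static unsigned long seq;       /* even: idle, odd: operation in flight */

    static void seq_start(void) { seq++; assert(seq & 0x1); }
    static void seq_end(void)   { seq++; assert(!(seq & 0x1)); }

    /* Ticket that, once reached, implies a full operation after the snapshot. */
    static unsigned long seq_snap(void) { return (seq + 3) & ~0x1UL; }
    static bool seq_done(unsigned long s) { return (long)(seq - s) >= 0; }

    int main(void)
    {
            unsigned long s;

            s = seq_snap();                 /* counter idle: s == seq + 2 */
            seq_start();
            seq_end();                      /* one full operation suffices */
            printf("idle-time snap %lu: done=%d\n", s, seq_done(s));

            seq_start();                    /* operation already in flight */
            s = seq_snap();                 /* s == seq + 3, rounded even */
            seq_end();
            printf("mid-flight snap %lu: done=%d\n", s, seq_done(s));
            seq_start();
            seq_end();                      /* needs one further full pass */
            printf("mid-flight snap %lu: done=%d\n", s, seq_done(s));
            return 0;
    }

An idle-time snapshot is satisfied by one complete operation, while a snapshot taken mid-operation needs the in-flight pass to finish plus one further full pass, which is exactly the even/odd bookkeeping the old hand-rolled ->n_barrier_done code implemented.
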
@@ -3770,6 +3880,7 @@ rcu_boot_init_percpu_data(int cpu, struct rcu_state *rsp)
3770 WARN_ON_ONCE(atomic_read(&rdp->dynticks->dynticks) != 1); 3880 WARN_ON_ONCE(atomic_read(&rdp->dynticks->dynticks) != 1);
3771 rdp->cpu = cpu; 3881 rdp->cpu = cpu;
3772 rdp->rsp = rsp; 3882 rdp->rsp = rsp;
3883 mutex_init(&rdp->exp_funnel_mutex);
3773 rcu_boot_init_nocb_percpu_data(rdp); 3884 rcu_boot_init_nocb_percpu_data(rdp);
3774 raw_spin_unlock_irqrestore(&rnp->lock, flags); 3885 raw_spin_unlock_irqrestore(&rnp->lock, flags);
3775} 3886}
@@ -3961,22 +4072,22 @@ void rcu_scheduler_starting(void)
3961 * Compute the per-level fanout, either using the exact fanout specified 4072 * Compute the per-level fanout, either using the exact fanout specified
3962 * or balancing the tree, depending on the rcu_fanout_exact boot parameter. 4073 * or balancing the tree, depending on the rcu_fanout_exact boot parameter.
3963 */ 4074 */
3964static void __init rcu_init_levelspread(struct rcu_state *rsp) 4075static void __init rcu_init_levelspread(int *levelspread, const int *levelcnt)
3965{ 4076{
3966 int i; 4077 int i;
3967 4078
3968 if (rcu_fanout_exact) { 4079 if (rcu_fanout_exact) {
3969 rsp->levelspread[rcu_num_lvls - 1] = rcu_fanout_leaf; 4080 levelspread[rcu_num_lvls - 1] = rcu_fanout_leaf;
3970 for (i = rcu_num_lvls - 2; i >= 0; i--) 4081 for (i = rcu_num_lvls - 2; i >= 0; i--)
3971 rsp->levelspread[i] = RCU_FANOUT; 4082 levelspread[i] = RCU_FANOUT;
3972 } else { 4083 } else {
3973 int ccur; 4084 int ccur;
3974 int cprv; 4085 int cprv;
3975 4086
3976 cprv = nr_cpu_ids; 4087 cprv = nr_cpu_ids;
3977 for (i = rcu_num_lvls - 1; i >= 0; i--) { 4088 for (i = rcu_num_lvls - 1; i >= 0; i--) {
3978 ccur = rsp->levelcnt[i]; 4089 ccur = levelcnt[i];
3979 rsp->levelspread[i] = (cprv + ccur - 1) / ccur; 4090 levelspread[i] = (cprv + ccur - 1) / ccur;
3980 cprv = ccur; 4091 cprv = ccur;
3981 } 4092 }
3982 } 4093 }
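
With rcu_init_levelspread() now taking its inputs and outputs as plain arrays, its balanced branch is easy to exercise outside the kernel. The stand-alone model below uses levelcnt = { 1, 6 } and 96 possible CPUs (the same illustrative configuration as the geometry example further down) and yields levelspread = { 6, 16 }: the root fans out to 6 leaf rcu_node structures and each leaf covers at most 16 CPUs.

    #include <stdio.h>

    int main(void)
    {
            int levelcnt[] = { 1, 6 };      /* nodes per level, root first */
            int levelspread[2];
            int rcu_num_lvls = 2, nr_cpu_ids = 96;
            int cprv = nr_cpu_ids;

            for (int i = rcu_num_lvls - 1; i >= 0; i--) {
                    int ccur = levelcnt[i];
                    levelspread[i] = (cprv + ccur - 1) / ccur;  /* ceiling div */
                    cprv = ccur;
            }
            /* Prints: levelspread = { 6, 16 } */
            printf("levelspread = { %d, %d }\n", levelspread[0], levelspread[1]);
            return 0;
    }
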
@@ -3988,23 +4099,20 @@ static void __init rcu_init_levelspread(struct rcu_state *rsp)
3988static void __init rcu_init_one(struct rcu_state *rsp, 4099static void __init rcu_init_one(struct rcu_state *rsp,
3989 struct rcu_data __percpu *rda) 4100 struct rcu_data __percpu *rda)
3990{ 4101{
3991 static const char * const buf[] = { 4102 static const char * const buf[] = RCU_NODE_NAME_INIT;
3992 "rcu_node_0", 4103 static const char * const fqs[] = RCU_FQS_NAME_INIT;
3993 "rcu_node_1", 4104 static const char * const exp[] = RCU_EXP_NAME_INIT;
3994 "rcu_node_2", 4105 static const char * const exp_sched[] = RCU_EXP_SCHED_NAME_INIT;
3995 "rcu_node_3" }; /* Match MAX_RCU_LVLS */
3996 static const char * const fqs[] = {
3997 "rcu_node_fqs_0",
3998 "rcu_node_fqs_1",
3999 "rcu_node_fqs_2",
4000 "rcu_node_fqs_3" }; /* Match MAX_RCU_LVLS */
4001 static u8 fl_mask = 0x1; 4106 static u8 fl_mask = 0x1;
4107
4108 int levelcnt[RCU_NUM_LVLS]; /* # nodes in each level. */
4109 int levelspread[RCU_NUM_LVLS]; /* kids/node in each level. */
4002 int cpustride = 1; 4110 int cpustride = 1;
4003 int i; 4111 int i;
4004 int j; 4112 int j;
4005 struct rcu_node *rnp; 4113 struct rcu_node *rnp;
4006 4114
4007 BUILD_BUG_ON(MAX_RCU_LVLS > ARRAY_SIZE(buf)); /* Fix buf[] init! */ 4115 BUILD_BUG_ON(RCU_NUM_LVLS > ARRAY_SIZE(buf)); /* Fix buf[] init! */
4008 4116
4009 /* Silence gcc 4.8 false positive about array index out of range. */ 4117 /* Silence gcc 4.8 false positive about array index out of range. */
4010 if (rcu_num_lvls <= 0 || rcu_num_lvls > RCU_NUM_LVLS) 4118 if (rcu_num_lvls <= 0 || rcu_num_lvls > RCU_NUM_LVLS)
@@ -4013,19 +4121,19 @@ static void __init rcu_init_one(struct rcu_state *rsp,
4013 /* Initialize the level-tracking arrays. */ 4121 /* Initialize the level-tracking arrays. */
4014 4122
4015 for (i = 0; i < rcu_num_lvls; i++) 4123 for (i = 0; i < rcu_num_lvls; i++)
4016 rsp->levelcnt[i] = num_rcu_lvl[i]; 4124 levelcnt[i] = num_rcu_lvl[i];
4017 for (i = 1; i < rcu_num_lvls; i++) 4125 for (i = 1; i < rcu_num_lvls; i++)
4018 rsp->level[i] = rsp->level[i - 1] + rsp->levelcnt[i - 1]; 4126 rsp->level[i] = rsp->level[i - 1] + levelcnt[i - 1];
4019 rcu_init_levelspread(rsp); 4127 rcu_init_levelspread(levelspread, levelcnt);
4020 rsp->flavor_mask = fl_mask; 4128 rsp->flavor_mask = fl_mask;
4021 fl_mask <<= 1; 4129 fl_mask <<= 1;
4022 4130
4023 /* Initialize the elements themselves, starting from the leaves. */ 4131 /* Initialize the elements themselves, starting from the leaves. */
4024 4132
4025 for (i = rcu_num_lvls - 1; i >= 0; i--) { 4133 for (i = rcu_num_lvls - 1; i >= 0; i--) {
4026 cpustride *= rsp->levelspread[i]; 4134 cpustride *= levelspread[i];
4027 rnp = rsp->level[i]; 4135 rnp = rsp->level[i];
4028 for (j = 0; j < rsp->levelcnt[i]; j++, rnp++) { 4136 for (j = 0; j < levelcnt[i]; j++, rnp++) {
4029 raw_spin_lock_init(&rnp->lock); 4137 raw_spin_lock_init(&rnp->lock);
4030 lockdep_set_class_and_name(&rnp->lock, 4138 lockdep_set_class_and_name(&rnp->lock,
4031 &rcu_node_class[i], buf[i]); 4139 &rcu_node_class[i], buf[i]);
@@ -4045,14 +4153,23 @@ static void __init rcu_init_one(struct rcu_state *rsp,
4045 rnp->grpmask = 0; 4153 rnp->grpmask = 0;
4046 rnp->parent = NULL; 4154 rnp->parent = NULL;
4047 } else { 4155 } else {
4048 rnp->grpnum = j % rsp->levelspread[i - 1]; 4156 rnp->grpnum = j % levelspread[i - 1];
4049 rnp->grpmask = 1UL << rnp->grpnum; 4157 rnp->grpmask = 1UL << rnp->grpnum;
4050 rnp->parent = rsp->level[i - 1] + 4158 rnp->parent = rsp->level[i - 1] +
4051 j / rsp->levelspread[i - 1]; 4159 j / levelspread[i - 1];
4052 } 4160 }
4053 rnp->level = i; 4161 rnp->level = i;
4054 INIT_LIST_HEAD(&rnp->blkd_tasks); 4162 INIT_LIST_HEAD(&rnp->blkd_tasks);
4055 rcu_init_one_nocb(rnp); 4163 rcu_init_one_nocb(rnp);
4164 mutex_init(&rnp->exp_funnel_mutex);
4165 if (rsp == &rcu_sched_state)
4166 lockdep_set_class_and_name(
4167 &rnp->exp_funnel_mutex,
4168 &rcu_exp_sched_class[i], exp_sched[i]);
4169 else
4170 lockdep_set_class_and_name(
4171 &rnp->exp_funnel_mutex,
4172 &rcu_exp_class[i], exp[i]);
4056 } 4173 }
4057 } 4174 }
4058 4175
@@ -4076,9 +4193,7 @@ static void __init rcu_init_geometry(void)
4076{ 4193{
4077 ulong d; 4194 ulong d;
4078 int i; 4195 int i;
4079 int j; 4196 int rcu_capacity[RCU_NUM_LVLS];
4080 int n = nr_cpu_ids;
4081 int rcu_capacity[MAX_RCU_LVLS + 1];
4082 4197
4083 /* 4198 /*
4084 * Initialize any unspecified boot parameters. 4199 * Initialize any unspecified boot parameters.
@@ -4101,47 +4216,49 @@ static void __init rcu_init_geometry(void)
4101 rcu_fanout_leaf, nr_cpu_ids); 4216 rcu_fanout_leaf, nr_cpu_ids);
4102 4217
4103 /* 4218 /*
4104 * Compute number of nodes that can be handled an rcu_node tree
4105 * with the given number of levels. Setting rcu_capacity[0] makes
4106 * some of the arithmetic easier.
4107 */
4108 rcu_capacity[0] = 1;
4109 rcu_capacity[1] = rcu_fanout_leaf;
4110 for (i = 2; i <= MAX_RCU_LVLS; i++)
4111 rcu_capacity[i] = rcu_capacity[i - 1] * RCU_FANOUT;
4112
4113 /*
4114 * The boot-time rcu_fanout_leaf parameter is only permitted 4219 * The boot-time rcu_fanout_leaf parameter is only permitted
4115 * to increase the leaf-level fanout, not decrease it. Of course, 4220 * to increase the leaf-level fanout, not decrease it. Of course,
4116 * the leaf-level fanout cannot exceed the number of bits in 4221 * the leaf-level fanout cannot exceed the number of bits in
4117 * the rcu_node masks. Finally, the tree must be able to accommodate 4222 * the rcu_node masks. Complain and fall back to the compile-
4118 * the configured number of CPUs. Complain and fall back to the 4223 * time values if these limits are exceeded.
4119 * compile-time values if these limits are exceeded.
4120 */ 4224 */
4121 if (rcu_fanout_leaf < RCU_FANOUT_LEAF || 4225 if (rcu_fanout_leaf < RCU_FANOUT_LEAF ||
4122 rcu_fanout_leaf > sizeof(unsigned long) * 8 || 4226 rcu_fanout_leaf > sizeof(unsigned long) * 8) {
4123 n > rcu_capacity[MAX_RCU_LVLS]) { 4227 rcu_fanout_leaf = RCU_FANOUT_LEAF;
4124 WARN_ON(1); 4228 WARN_ON(1);
4125 return; 4229 return;
4126 } 4230 }
4127 4231
4232 /*
 4233 * Compute number of nodes that can be handled by an rcu_node tree
4234 * with the given number of levels.
4235 */
4236 rcu_capacity[0] = rcu_fanout_leaf;
4237 for (i = 1; i < RCU_NUM_LVLS; i++)
4238 rcu_capacity[i] = rcu_capacity[i - 1] * RCU_FANOUT;
4239
4240 /*
4241 * The tree must be able to accommodate the configured number of CPUs.
 4242 * If this limit is exceeded then we have a serious problem elsewhere.
4243 */
4244 if (nr_cpu_ids > rcu_capacity[RCU_NUM_LVLS - 1])
4245 panic("rcu_init_geometry: rcu_capacity[] is too small");
4246
4247 /* Calculate the number of levels in the tree. */
4248 for (i = 0; nr_cpu_ids > rcu_capacity[i]; i++) {
4249 }
4250 rcu_num_lvls = i + 1;
4251
4128 /* Calculate the number of rcu_nodes at each level of the tree. */ 4252 /* Calculate the number of rcu_nodes at each level of the tree. */
4129 for (i = 1; i <= MAX_RCU_LVLS; i++) 4253 for (i = 0; i < rcu_num_lvls; i++) {
4130 if (n <= rcu_capacity[i]) { 4254 int cap = rcu_capacity[(rcu_num_lvls - 1) - i];
4131 for (j = 0; j <= i; j++) 4255 num_rcu_lvl[i] = DIV_ROUND_UP(nr_cpu_ids, cap);
4132 num_rcu_lvl[j] = 4256 }
4133 DIV_ROUND_UP(n, rcu_capacity[i - j]);
4134 rcu_num_lvls = i;
4135 for (j = i + 1; j <= MAX_RCU_LVLS; j++)
4136 num_rcu_lvl[j] = 0;
4137 break;
4138 }
4139 4257
4140 /* Calculate the total number of rcu_node structures. */ 4258 /* Calculate the total number of rcu_node structures. */
4141 rcu_num_nodes = 0; 4259 rcu_num_nodes = 0;
4142 for (i = 0; i <= MAX_RCU_LVLS; i++) 4260 for (i = 0; i < rcu_num_lvls; i++)
4143 rcu_num_nodes += num_rcu_lvl[i]; 4261 rcu_num_nodes += num_rcu_lvl[i];
4144 rcu_num_nodes -= n;
4145} 4262}
4146 4263
4147/* 4264/*
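
The reworked rcu_init_geometry() above proceeds in three steps: validate (and possibly reset) rcu_fanout_leaf, compute the capacity of each candidate tree depth, then pick the shallowest depth that covers nr_cpu_ids and fill num_rcu_lvl[] top-down with DIV_ROUND_UP(). The stand-alone model below reproduces those steps; the numbers (leaf fanout 16, interior fanout 64, 96 possible CPUs) are illustrative, since the real values come from Kconfig and the boot line.

    #include <stdio.h>

    #define RCU_FANOUT          64
    #define RCU_NUM_LVLS        4
    #define DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))

    int main(void)
    {
            int rcu_fanout_leaf = 16, nr_cpu_ids = 96;
            int rcu_capacity[RCU_NUM_LVLS];
            int num_rcu_lvl[RCU_NUM_LVLS];
            int rcu_num_lvls, rcu_num_nodes = 0, i;

            rcu_capacity[0] = rcu_fanout_leaf;
            for (i = 1; i < RCU_NUM_LVLS; i++)
                    rcu_capacity[i] = rcu_capacity[i - 1] * RCU_FANOUT;

            for (i = 0; nr_cpu_ids > rcu_capacity[i]; i++)
                    ;                               /* shallowest fit */
            rcu_num_lvls = i + 1;

            for (i = 0; i < rcu_num_lvls; i++) {
                    int cap = rcu_capacity[(rcu_num_lvls - 1) - i];

                    num_rcu_lvl[i] = DIV_ROUND_UP(nr_cpu_ids, cap);
                    rcu_num_nodes += num_rcu_lvl[i];
            }
            /* Prints: lvls=2 num_rcu_lvl={1,6} nodes=7 */
            printf("lvls=%d num_rcu_lvl={%d,%d} nodes=%d\n",
                   rcu_num_lvls, num_rcu_lvl[0], num_rcu_lvl[1], rcu_num_nodes);
            return 0;
    }

Note that num_rcu_lvl[] now counts only rcu_node structures, so the old "rcu_num_nodes -= n" correction for the per-CPU level is gone.
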
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4adb7ca0bf47..0412030ca882 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -27,6 +27,7 @@
27#include <linux/threads.h> 27#include <linux/threads.h>
28#include <linux/cpumask.h> 28#include <linux/cpumask.h>
29#include <linux/seqlock.h> 29#include <linux/seqlock.h>
30#include <linux/stop_machine.h>
30 31
31/* 32/*
32 * Define shape of hierarchy based on NR_CPUS, CONFIG_RCU_FANOUT, and 33 * Define shape of hierarchy based on NR_CPUS, CONFIG_RCU_FANOUT, and
@@ -36,8 +37,6 @@
36 * Of course, your mileage may vary. 37 * Of course, your mileage may vary.
37 */ 38 */
38 39
39#define MAX_RCU_LVLS 4
40
41#ifdef CONFIG_RCU_FANOUT 40#ifdef CONFIG_RCU_FANOUT
42#define RCU_FANOUT CONFIG_RCU_FANOUT 41#define RCU_FANOUT CONFIG_RCU_FANOUT
43#else /* #ifdef CONFIG_RCU_FANOUT */ 42#else /* #ifdef CONFIG_RCU_FANOUT */
@@ -66,38 +65,53 @@
66#if NR_CPUS <= RCU_FANOUT_1 65#if NR_CPUS <= RCU_FANOUT_1
67# define RCU_NUM_LVLS 1 66# define RCU_NUM_LVLS 1
68# define NUM_RCU_LVL_0 1 67# define NUM_RCU_LVL_0 1
69# define NUM_RCU_LVL_1 (NR_CPUS) 68# define NUM_RCU_NODES NUM_RCU_LVL_0
70# define NUM_RCU_LVL_2 0 69# define NUM_RCU_LVL_INIT { NUM_RCU_LVL_0 }
71# define NUM_RCU_LVL_3 0 70# define RCU_NODE_NAME_INIT { "rcu_node_0" }
72# define NUM_RCU_LVL_4 0 71# define RCU_FQS_NAME_INIT { "rcu_node_fqs_0" }
72# define RCU_EXP_NAME_INIT { "rcu_node_exp_0" }
73# define RCU_EXP_SCHED_NAME_INIT \
74 { "rcu_node_exp_sched_0" }
73#elif NR_CPUS <= RCU_FANOUT_2 75#elif NR_CPUS <= RCU_FANOUT_2
74# define RCU_NUM_LVLS 2 76# define RCU_NUM_LVLS 2
75# define NUM_RCU_LVL_0 1 77# define NUM_RCU_LVL_0 1
76# define NUM_RCU_LVL_1 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1) 78# define NUM_RCU_LVL_1 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1)
77# define NUM_RCU_LVL_2 (NR_CPUS) 79# define NUM_RCU_NODES (NUM_RCU_LVL_0 + NUM_RCU_LVL_1)
78# define NUM_RCU_LVL_3 0 80# define NUM_RCU_LVL_INIT { NUM_RCU_LVL_0, NUM_RCU_LVL_1 }
79# define NUM_RCU_LVL_4 0 81# define RCU_NODE_NAME_INIT { "rcu_node_0", "rcu_node_1" }
82# define RCU_FQS_NAME_INIT { "rcu_node_fqs_0", "rcu_node_fqs_1" }
83# define RCU_EXP_NAME_INIT { "rcu_node_exp_0", "rcu_node_exp_1" }
84# define RCU_EXP_SCHED_NAME_INIT \
85 { "rcu_node_exp_sched_0", "rcu_node_exp_sched_1" }
80#elif NR_CPUS <= RCU_FANOUT_3 86#elif NR_CPUS <= RCU_FANOUT_3
81# define RCU_NUM_LVLS 3 87# define RCU_NUM_LVLS 3
82# define NUM_RCU_LVL_0 1 88# define NUM_RCU_LVL_0 1
83# define NUM_RCU_LVL_1 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_2) 89# define NUM_RCU_LVL_1 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_2)
84# define NUM_RCU_LVL_2 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1) 90# define NUM_RCU_LVL_2 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1)
85# define NUM_RCU_LVL_3 (NR_CPUS) 91# define NUM_RCU_NODES (NUM_RCU_LVL_0 + NUM_RCU_LVL_1 + NUM_RCU_LVL_2)
86# define NUM_RCU_LVL_4 0 92# define NUM_RCU_LVL_INIT { NUM_RCU_LVL_0, NUM_RCU_LVL_1, NUM_RCU_LVL_2 }
93# define RCU_NODE_NAME_INIT { "rcu_node_0", "rcu_node_1", "rcu_node_2" }
94# define RCU_FQS_NAME_INIT { "rcu_node_fqs_0", "rcu_node_fqs_1", "rcu_node_fqs_2" }
95# define RCU_EXP_NAME_INIT { "rcu_node_exp_0", "rcu_node_exp_1", "rcu_node_exp_2" }
96# define RCU_EXP_SCHED_NAME_INIT \
97 { "rcu_node_exp_sched_0", "rcu_node_exp_sched_1", "rcu_node_exp_sched_2" }
87#elif NR_CPUS <= RCU_FANOUT_4 98#elif NR_CPUS <= RCU_FANOUT_4
88# define RCU_NUM_LVLS 4 99# define RCU_NUM_LVLS 4
89# define NUM_RCU_LVL_0 1 100# define NUM_RCU_LVL_0 1
90# define NUM_RCU_LVL_1 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_3) 101# define NUM_RCU_LVL_1 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_3)
91# define NUM_RCU_LVL_2 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_2) 102# define NUM_RCU_LVL_2 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_2)
92# define NUM_RCU_LVL_3 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1) 103# define NUM_RCU_LVL_3 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1)
93# define NUM_RCU_LVL_4 (NR_CPUS) 104# define NUM_RCU_NODES (NUM_RCU_LVL_0 + NUM_RCU_LVL_1 + NUM_RCU_LVL_2 + NUM_RCU_LVL_3)
105# define NUM_RCU_LVL_INIT { NUM_RCU_LVL_0, NUM_RCU_LVL_1, NUM_RCU_LVL_2, NUM_RCU_LVL_3 }
106# define RCU_NODE_NAME_INIT { "rcu_node_0", "rcu_node_1", "rcu_node_2", "rcu_node_3" }
107# define RCU_FQS_NAME_INIT { "rcu_node_fqs_0", "rcu_node_fqs_1", "rcu_node_fqs_2", "rcu_node_fqs_3" }
108# define RCU_EXP_NAME_INIT { "rcu_node_exp_0", "rcu_node_exp_1", "rcu_node_exp_2", "rcu_node_exp_3" }
109# define RCU_EXP_SCHED_NAME_INIT \
110 { "rcu_node_exp_sched_0", "rcu_node_exp_sched_1", "rcu_node_exp_sched_2", "rcu_node_exp_sched_3" }
94#else 111#else
95# error "CONFIG_RCU_FANOUT insufficient for NR_CPUS" 112# error "CONFIG_RCU_FANOUT insufficient for NR_CPUS"
96#endif /* #if (NR_CPUS) <= RCU_FANOUT_1 */ 113#endif /* #if (NR_CPUS) <= RCU_FANOUT_1 */
97 114
98#define RCU_SUM (NUM_RCU_LVL_0 + NUM_RCU_LVL_1 + NUM_RCU_LVL_2 + NUM_RCU_LVL_3 + NUM_RCU_LVL_4)
99#define NUM_RCU_NODES (RCU_SUM - NR_CPUS)
100
101extern int rcu_num_lvls; 115extern int rcu_num_lvls;
102extern int rcu_num_nodes; 116extern int rcu_num_nodes;
103 117
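
Because the per-level initializers above no longer include a per-CPU leaf level, NUM_RCU_NODES is now simply the sum of the NUM_RCU_LVL_* entries rather than RCU_SUM minus NR_CPUS. A purely illustrative expansion, assuming NR_CPUS=4096, CONFIG_RCU_FANOUT=64 and CONFIG_RCU_FANOUT_LEAF=16 (so RCU_FANOUT_1=16, RCU_FANOUT_2=1024, RCU_FANOUT_3=65536 and the three-level branch is selected):

    /* Illustrative expansion only; not taken from a real build. */
    #define RCU_NUM_LVLS     3
    #define NUM_RCU_LVL_0    1      /* root */
    #define NUM_RCU_LVL_1    4      /* DIV_ROUND_UP(4096, 1024) interior nodes */
    #define NUM_RCU_LVL_2    256    /* DIV_ROUND_UP(4096, 16) leaf nodes */
    #define NUM_RCU_NODES    261    /* 1 + 4 + 256 */
    #define NUM_RCU_LVL_INIT { 1, 4, 256 }

The old scheme arrived at the same 261 by summing 1 + 4 + 256 + 4096 and then subtracting NR_CPUS.
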
@@ -236,6 +250,8 @@ struct rcu_node {
236 int need_future_gp[2]; 250 int need_future_gp[2];
237 /* Counts of upcoming no-CB GP requests. */ 251 /* Counts of upcoming no-CB GP requests. */
238 raw_spinlock_t fqslock ____cacheline_internodealigned_in_smp; 252 raw_spinlock_t fqslock ____cacheline_internodealigned_in_smp;
253
254 struct mutex exp_funnel_mutex ____cacheline_internodealigned_in_smp;
239} ____cacheline_internodealigned_in_smp; 255} ____cacheline_internodealigned_in_smp;
240 256
241/* 257/*
@@ -287,12 +303,13 @@ struct rcu_data {
287 bool gpwrap; /* Possible gpnum/completed wrap. */ 303 bool gpwrap; /* Possible gpnum/completed wrap. */
288 struct rcu_node *mynode; /* This CPU's leaf of hierarchy */ 304 struct rcu_node *mynode; /* This CPU's leaf of hierarchy */
289 unsigned long grpmask; /* Mask to apply to leaf qsmask. */ 305 unsigned long grpmask; /* Mask to apply to leaf qsmask. */
290#ifdef CONFIG_RCU_CPU_STALL_INFO
291 unsigned long ticks_this_gp; /* The number of scheduling-clock */ 306 unsigned long ticks_this_gp; /* The number of scheduling-clock */
292 /* ticks this CPU has handled */ 307 /* ticks this CPU has handled */
293 /* during and after the last grace */ 308 /* during and after the last grace */
294 /* period it is aware of. */ 309 /* period it is aware of. */
295#endif /* #ifdef CONFIG_RCU_CPU_STALL_INFO */ 310 struct cpu_stop_work exp_stop_work;
311 /* Expedited grace-period control */
312 /* for CPU stopping. */
296 313
297 /* 2) batch handling */ 314 /* 2) batch handling */
298 /* 315 /*
@@ -355,11 +372,13 @@ struct rcu_data {
355 unsigned long n_rp_nocb_defer_wakeup; 372 unsigned long n_rp_nocb_defer_wakeup;
356 unsigned long n_rp_need_nothing; 373 unsigned long n_rp_need_nothing;
357 374
358 /* 6) _rcu_barrier() and OOM callbacks. */ 375 /* 6) _rcu_barrier(), OOM callbacks, and expediting. */
359 struct rcu_head barrier_head; 376 struct rcu_head barrier_head;
360#ifdef CONFIG_RCU_FAST_NO_HZ 377#ifdef CONFIG_RCU_FAST_NO_HZ
361 struct rcu_head oom_head; 378 struct rcu_head oom_head;
362#endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */ 379#endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
380 struct mutex exp_funnel_mutex;
381 bool exp_done; /* Expedited QS for this CPU? */
363 382
364 /* 7) Callback offloading. */ 383 /* 7) Callback offloading. */
365#ifdef CONFIG_RCU_NOCB_CPU 384#ifdef CONFIG_RCU_NOCB_CPU
@@ -387,9 +406,7 @@ struct rcu_data {
387#endif /* #ifdef CONFIG_RCU_NOCB_CPU */ 406#endif /* #ifdef CONFIG_RCU_NOCB_CPU */
388 407
389 /* 8) RCU CPU stall data. */ 408 /* 8) RCU CPU stall data. */
390#ifdef CONFIG_RCU_CPU_STALL_INFO
391 unsigned int softirq_snap; /* Snapshot of softirq activity. */ 409 unsigned int softirq_snap; /* Snapshot of softirq activity. */
392#endif /* #ifdef CONFIG_RCU_CPU_STALL_INFO */
393 410
394 int cpu; 411 int cpu;
395 struct rcu_state *rsp; 412 struct rcu_state *rsp;
@@ -442,9 +459,9 @@ do { \
442 */ 459 */
443struct rcu_state { 460struct rcu_state {
444 struct rcu_node node[NUM_RCU_NODES]; /* Hierarchy. */ 461 struct rcu_node node[NUM_RCU_NODES]; /* Hierarchy. */
445 struct rcu_node *level[RCU_NUM_LVLS]; /* Hierarchy levels. */ 462 struct rcu_node *level[RCU_NUM_LVLS + 1];
446 u32 levelcnt[MAX_RCU_LVLS + 1]; /* # nodes in each level. */ 463 /* Hierarchy levels (+1 to */
447 u8 levelspread[RCU_NUM_LVLS]; /* kids/node in each level. */ 464 /* shut bogus gcc warning) */
448 u8 flavor_mask; /* bit in flavor mask. */ 465 u8 flavor_mask; /* bit in flavor mask. */
449 struct rcu_data __percpu *rda; /* pointer of percu rcu_data. */ 466 struct rcu_data __percpu *rda; /* pointer of percu rcu_data. */
450 void (*call)(struct rcu_head *head, /* call_rcu() flavor. */ 467 void (*call)(struct rcu_head *head, /* call_rcu() flavor. */
@@ -479,21 +496,18 @@ struct rcu_state {
479 struct mutex barrier_mutex; /* Guards barrier fields. */ 496 struct mutex barrier_mutex; /* Guards barrier fields. */
480 atomic_t barrier_cpu_count; /* # CPUs waiting on. */ 497 atomic_t barrier_cpu_count; /* # CPUs waiting on. */
481 struct completion barrier_completion; /* Wake at barrier end. */ 498 struct completion barrier_completion; /* Wake at barrier end. */
482 unsigned long n_barrier_done; /* ++ at start and end of */ 499 unsigned long barrier_sequence; /* ++ at start and end of */
483 /* _rcu_barrier(). */ 500 /* _rcu_barrier(). */
484 /* End of fields guarded by barrier_mutex. */ 501 /* End of fields guarded by barrier_mutex. */
485 502
486 atomic_long_t expedited_start; /* Starting ticket. */ 503 unsigned long expedited_sequence; /* Take a ticket. */
487 atomic_long_t expedited_done; /* Done ticket. */ 504 atomic_long_t expedited_workdone0; /* # done by others #0. */
488 atomic_long_t expedited_wrap; /* # near-wrap incidents. */
489 atomic_long_t expedited_tryfail; /* # acquisition failures. */
490 atomic_long_t expedited_workdone1; /* # done by others #1. */ 505 atomic_long_t expedited_workdone1; /* # done by others #1. */
491 atomic_long_t expedited_workdone2; /* # done by others #2. */ 506 atomic_long_t expedited_workdone2; /* # done by others #2. */
507 atomic_long_t expedited_workdone3; /* # done by others #3. */
492 atomic_long_t expedited_normal; /* # fallbacks to normal. */ 508 atomic_long_t expedited_normal; /* # fallbacks to normal. */
493 atomic_long_t expedited_stoppedcpus; /* # successful stop_cpus. */ 509 atomic_t expedited_need_qs; /* # CPUs left to check in. */
494 atomic_long_t expedited_done_tries; /* # tries to update _done. */ 510 wait_queue_head_t expedited_wq; /* Wait for check-ins. */
495 atomic_long_t expedited_done_lost; /* # times beaten to _done. */
496 atomic_long_t expedited_done_exit; /* # times exited _done loop. */
497 511
498 unsigned long jiffies_force_qs; /* Time at which to invoke */ 512 unsigned long jiffies_force_qs; /* Time at which to invoke */
499 /* force_quiescent_state(). */ 513 /* force_quiescent_state(). */
@@ -527,7 +541,11 @@ struct rcu_state {
527/* Values for rcu_state structure's gp_flags field. */ 541/* Values for rcu_state structure's gp_flags field. */
528#define RCU_GP_WAIT_INIT 0 /* Initial state. */ 542#define RCU_GP_WAIT_INIT 0 /* Initial state. */
529#define RCU_GP_WAIT_GPS 1 /* Wait for grace-period start. */ 543#define RCU_GP_WAIT_GPS 1 /* Wait for grace-period start. */
530#define RCU_GP_WAIT_FQS 2 /* Wait for force-quiescent-state time. */ 544#define RCU_GP_DONE_GPS 2 /* Wait done for grace-period start. */
545#define RCU_GP_WAIT_FQS 3 /* Wait for force-quiescent-state time. */
546#define RCU_GP_DOING_FQS 4 /* Wait done for force-quiescent-state time. */
547#define RCU_GP_CLEANUP 5 /* Grace-period cleanup started. */
548#define RCU_GP_CLEANED 6 /* Grace-period cleanup complete. */
531 549
532extern struct list_head rcu_struct_flavors; 550extern struct list_head rcu_struct_flavors;
533 551
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 013485fb2b06..b2bf3963a0ae 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -82,10 +82,8 @@ static void __init rcu_bootup_announce_oddness(void)
82 pr_info("\tRCU lockdep checking is enabled.\n"); 82 pr_info("\tRCU lockdep checking is enabled.\n");
83 if (IS_ENABLED(CONFIG_RCU_TORTURE_TEST_RUNNABLE)) 83 if (IS_ENABLED(CONFIG_RCU_TORTURE_TEST_RUNNABLE))
84 pr_info("\tRCU torture testing starts during boot.\n"); 84 pr_info("\tRCU torture testing starts during boot.\n");
85 if (IS_ENABLED(CONFIG_RCU_CPU_STALL_INFO)) 85 if (RCU_NUM_LVLS >= 4)
86 pr_info("\tAdditional per-CPU info printed with stalls.\n"); 86 pr_info("\tFour(or more)-level hierarchy is enabled.\n");
87 if (NUM_RCU_LVL_4 != 0)
88 pr_info("\tFour-level hierarchy is enabled.\n");
89 if (RCU_FANOUT_LEAF != 16) 87 if (RCU_FANOUT_LEAF != 16)
90 pr_info("\tBuild-time adjustment of leaf fanout to %d.\n", 88 pr_info("\tBuild-time adjustment of leaf fanout to %d.\n",
91 RCU_FANOUT_LEAF); 89 RCU_FANOUT_LEAF);
@@ -418,8 +416,6 @@ static void rcu_print_detail_task_stall(struct rcu_state *rsp)
418 rcu_print_detail_task_stall_rnp(rnp); 416 rcu_print_detail_task_stall_rnp(rnp);
419} 417}
420 418
421#ifdef CONFIG_RCU_CPU_STALL_INFO
422
423static void rcu_print_task_stall_begin(struct rcu_node *rnp) 419static void rcu_print_task_stall_begin(struct rcu_node *rnp)
424{ 420{
425 pr_err("\tTasks blocked on level-%d rcu_node (CPUs %d-%d):", 421 pr_err("\tTasks blocked on level-%d rcu_node (CPUs %d-%d):",
@@ -431,18 +427,6 @@ static void rcu_print_task_stall_end(void)
431 pr_cont("\n"); 427 pr_cont("\n");
432} 428}
433 429
434#else /* #ifdef CONFIG_RCU_CPU_STALL_INFO */
435
436static void rcu_print_task_stall_begin(struct rcu_node *rnp)
437{
438}
439
440static void rcu_print_task_stall_end(void)
441{
442}
443
444#endif /* #else #ifdef CONFIG_RCU_CPU_STALL_INFO */
445
446/* 430/*
447 * Scan the current list of tasks blocked within RCU read-side critical 431 * Scan the current list of tasks blocked within RCU read-side critical
448 * sections, printing out the tid of each. 432 * sections, printing out the tid of each.
@@ -538,10 +522,10 @@ EXPORT_SYMBOL_GPL(call_rcu);
538 */ 522 */
539void synchronize_rcu(void) 523void synchronize_rcu(void)
540{ 524{
541 rcu_lockdep_assert(!lock_is_held(&rcu_bh_lock_map) && 525 RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
542 !lock_is_held(&rcu_lock_map) && 526 lock_is_held(&rcu_lock_map) ||
543 !lock_is_held(&rcu_sched_lock_map), 527 lock_is_held(&rcu_sched_lock_map),
544 "Illegal synchronize_rcu() in RCU read-side critical section"); 528 "Illegal synchronize_rcu() in RCU read-side critical section");
545 if (!rcu_scheduler_active) 529 if (!rcu_scheduler_active)
546 return; 530 return;
547 if (rcu_gp_is_expedited()) 531 if (rcu_gp_is_expedited())
@@ -552,8 +536,6 @@ void synchronize_rcu(void)
552EXPORT_SYMBOL_GPL(synchronize_rcu); 536EXPORT_SYMBOL_GPL(synchronize_rcu);
553 537
554static DECLARE_WAIT_QUEUE_HEAD(sync_rcu_preempt_exp_wq); 538static DECLARE_WAIT_QUEUE_HEAD(sync_rcu_preempt_exp_wq);
555static unsigned long sync_rcu_preempt_exp_count;
556static DEFINE_MUTEX(sync_rcu_preempt_exp_mutex);
557 539
558/* 540/*
559 * Return non-zero if there are any tasks in RCU read-side critical 541 * Return non-zero if there are any tasks in RCU read-side critical
@@ -573,7 +555,7 @@ static int rcu_preempted_readers_exp(struct rcu_node *rnp)
573 * for the current expedited grace period. Works only for preemptible 555 * for the current expedited grace period. Works only for preemptible
574 * RCU -- other RCU implementation use other means. 556 * RCU -- other RCU implementation use other means.
575 * 557 *
576 * Caller must hold sync_rcu_preempt_exp_mutex. 558 * Caller must hold the root rcu_node's exp_funnel_mutex.
577 */ 559 */
578static int sync_rcu_preempt_exp_done(struct rcu_node *rnp) 560static int sync_rcu_preempt_exp_done(struct rcu_node *rnp)
579{ 561{
@@ -589,7 +571,7 @@ static int sync_rcu_preempt_exp_done(struct rcu_node *rnp)
589 * recursively up the tree. (Calm down, calm down, we do the recursion 571 * recursively up the tree. (Calm down, calm down, we do the recursion
590 * iteratively!) 572 * iteratively!)
591 * 573 *
592 * Caller must hold sync_rcu_preempt_exp_mutex. 574 * Caller must hold the root rcu_node's exp_funnel_mutex.
593 */ 575 */
594static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp, 576static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
595 bool wake) 577 bool wake)
@@ -628,7 +610,7 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
628 * set the ->expmask bits on the leaf rcu_node structures to tell phase 2 610 * set the ->expmask bits on the leaf rcu_node structures to tell phase 2
629 * that work is needed here. 611 * that work is needed here.
630 * 612 *
631 * Caller must hold sync_rcu_preempt_exp_mutex. 613 * Caller must hold the root rcu_node's exp_funnel_mutex.
632 */ 614 */
633static void 615static void
634sync_rcu_preempt_exp_init1(struct rcu_state *rsp, struct rcu_node *rnp) 616sync_rcu_preempt_exp_init1(struct rcu_state *rsp, struct rcu_node *rnp)
@@ -671,7 +653,7 @@ sync_rcu_preempt_exp_init1(struct rcu_state *rsp, struct rcu_node *rnp)
671 * invoke rcu_report_exp_rnp() to clear out the upper-level ->expmask bits, 653 * invoke rcu_report_exp_rnp() to clear out the upper-level ->expmask bits,
672 * enabling rcu_read_unlock_special() to do the bit-clearing. 654 * enabling rcu_read_unlock_special() to do the bit-clearing.
673 * 655 *
674 * Caller must hold sync_rcu_preempt_exp_mutex. 656 * Caller must hold the root rcu_node's exp_funnel_mutex.
675 */ 657 */
676static void 658static void
677sync_rcu_preempt_exp_init2(struct rcu_state *rsp, struct rcu_node *rnp) 659sync_rcu_preempt_exp_init2(struct rcu_state *rsp, struct rcu_node *rnp)
@@ -719,51 +701,17 @@ sync_rcu_preempt_exp_init2(struct rcu_state *rsp, struct rcu_node *rnp)
719void synchronize_rcu_expedited(void) 701void synchronize_rcu_expedited(void)
720{ 702{
721 struct rcu_node *rnp; 703 struct rcu_node *rnp;
704 struct rcu_node *rnp_unlock;
722 struct rcu_state *rsp = rcu_state_p; 705 struct rcu_state *rsp = rcu_state_p;
723 unsigned long snap; 706 unsigned long s;
724 int trycount = 0;
725 707
726 smp_mb(); /* Caller's modifications seen first by other CPUs. */ 708 s = rcu_exp_gp_seq_snap(rsp);
727 snap = READ_ONCE(sync_rcu_preempt_exp_count) + 1;
728 smp_mb(); /* Above access cannot bleed into critical section. */
729 709
730 /* 710 rnp_unlock = exp_funnel_lock(rsp, s);
731 * Block CPU-hotplug operations. This means that any CPU-hotplug 711 if (rnp_unlock == NULL)
732 * operation that finds an rcu_node structure with tasks in the 712 return; /* Someone else did our work for us. */
733 * process of being boosted will know that all tasks blocking
734 * this expedited grace period will already be in the process of
735 * being boosted. This simplifies the process of moving tasks
736 * from leaf to root rcu_node structures.
737 */
738 if (!try_get_online_cpus()) {
739 /* CPU-hotplug operation in flight, fall back to normal GP. */
740 wait_rcu_gp(call_rcu);
741 return;
742 }
743 713
744 /* 714 rcu_exp_gp_seq_start(rsp);
745 * Acquire lock, falling back to synchronize_rcu() if too many
746 * lock-acquisition failures. Of course, if someone does the
747 * expedited grace period for us, just leave.
748 */
749 while (!mutex_trylock(&sync_rcu_preempt_exp_mutex)) {
750 if (ULONG_CMP_LT(snap,
751 READ_ONCE(sync_rcu_preempt_exp_count))) {
752 put_online_cpus();
753 goto mb_ret; /* Others did our work for us. */
754 }
755 if (trycount++ < 10) {
756 udelay(trycount * num_online_cpus());
757 } else {
758 put_online_cpus();
759 wait_rcu_gp(call_rcu);
760 return;
761 }
762 }
763 if (ULONG_CMP_LT(snap, READ_ONCE(sync_rcu_preempt_exp_count))) {
764 put_online_cpus();
765 goto unlock_mb_ret; /* Others did our work for us. */
766 }
767 715
768 /* force all RCU readers onto ->blkd_tasks lists. */ 716 /* force all RCU readers onto ->blkd_tasks lists. */
769 synchronize_sched_expedited(); 717 synchronize_sched_expedited();
@@ -779,20 +727,14 @@ void synchronize_rcu_expedited(void)
779 rcu_for_each_leaf_node(rsp, rnp) 727 rcu_for_each_leaf_node(rsp, rnp)
780 sync_rcu_preempt_exp_init2(rsp, rnp); 728 sync_rcu_preempt_exp_init2(rsp, rnp);
781 729
782 put_online_cpus();
783
784 /* Wait for snapshotted ->blkd_tasks lists to drain. */ 730 /* Wait for snapshotted ->blkd_tasks lists to drain. */
785 rnp = rcu_get_root(rsp); 731 rnp = rcu_get_root(rsp);
786 wait_event(sync_rcu_preempt_exp_wq, 732 wait_event(sync_rcu_preempt_exp_wq,
787 sync_rcu_preempt_exp_done(rnp)); 733 sync_rcu_preempt_exp_done(rnp));
788 734
789 /* Clean up and exit. */ 735 /* Clean up and exit. */
790 smp_mb(); /* ensure expedited GP seen before counter increment. */ 736 rcu_exp_gp_seq_end(rsp);
791 WRITE_ONCE(sync_rcu_preempt_exp_count, sync_rcu_preempt_exp_count + 1); 737 mutex_unlock(&rnp_unlock->exp_funnel_mutex);
792unlock_mb_ret:
793 mutex_unlock(&sync_rcu_preempt_exp_mutex);
794mb_ret:
795 smp_mb(); /* ensure subsequent action seen after grace period. */
796} 738}
797EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); 739EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
798 740
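
synchronize_rcu_expedited() now uses the same funnel-locking scheme as the sched flavor: take a sequence snapshot, walk rcu_node exp_funnel_mutexes from a leaf toward the root while always holding exactly one of them, and bail out as soon as the snapshot is already covered; whoever reaches the root runs the grace period for everyone funneled in behind it. The pthread sketch below models only that locking shape; the tree, the helper names and the bare sequence counter are illustrative, and the real exp_funnel_lock() also starts at a per-CPU mutex and rechecks several work-done counters along the way.

    #include <pthread.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct node {
            struct node *parent;
            pthread_mutex_t exp_funnel_mutex;
    };

    static unsigned long exp_seq;           /* grace-period sequence counter */

    static bool seq_done(unsigned long s)
    {
            return (long)(__atomic_load_n(&exp_seq, __ATOMIC_ACQUIRE) - s) >= 0;
    }

    /* Return the node whose mutex is held, or NULL if no GP is needed. */
    static struct node *funnel_lock(struct node *leaf, unsigned long s)
    {
            struct node *np = leaf;

            pthread_mutex_lock(&np->exp_funnel_mutex);
            while (np->parent) {
                    if (seq_done(s)) {      /* someone else covered us */
                            pthread_mutex_unlock(&np->exp_funnel_mutex);
                            return NULL;
                    }
                    pthread_mutex_lock(&np->parent->exp_funnel_mutex);
                    pthread_mutex_unlock(&np->exp_funnel_mutex);
                    np = np->parent;
            }
            return np;                      /* root mutex held: run the GP */
    }

    int main(void)
    {
            struct node root = { .parent = NULL,
                                 .exp_funnel_mutex = PTHREAD_MUTEX_INITIALIZER };
            struct node leaf = { .parent = &root,
                                 .exp_funnel_mutex = PTHREAD_MUTEX_INITIALIZER };
            struct node *locked;

            locked = funnel_lock(&leaf, exp_seq + 2);
            if (locked) {
                    /* ... run the expedited grace period here ... */
                    __atomic_add_fetch(&exp_seq, 2, __ATOMIC_RELEASE);
                    pthread_mutex_unlock(&locked->exp_funnel_mutex);
            }
            return 0;
    }

The point of the funnel is that contention collapses near the leaves: a herd of concurrent requests produces one winner per subtree, and most losers drop out long before touching the root mutex.
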
@@ -1061,8 +1003,7 @@ static int rcu_boost(struct rcu_node *rnp)
1061} 1003}
1062 1004
1063/* 1005/*
1064 * Priority-boosting kthread. One per leaf rcu_node and one for the 1006 * Priority-boosting kthread, one per leaf rcu_node.
1065 * root rcu_node.
1066 */ 1007 */
1067static int rcu_boost_kthread(void *arg) 1008static int rcu_boost_kthread(void *arg)
1068{ 1009{
@@ -1680,12 +1621,10 @@ static int rcu_oom_notify(struct notifier_block *self,
1680 */ 1621 */
1681 atomic_set(&oom_callback_count, 1); 1622 atomic_set(&oom_callback_count, 1);
1682 1623
1683 get_online_cpus();
1684 for_each_online_cpu(cpu) { 1624 for_each_online_cpu(cpu) {
1685 smp_call_function_single(cpu, rcu_oom_notify_cpu, NULL, 1); 1625 smp_call_function_single(cpu, rcu_oom_notify_cpu, NULL, 1);
1686 cond_resched_rcu_qs(); 1626 cond_resched_rcu_qs();
1687 } 1627 }
1688 put_online_cpus();
1689 1628
1690 /* Unconditionally decrement: no need to wake ourselves up. */ 1629 /* Unconditionally decrement: no need to wake ourselves up. */
1691 atomic_dec(&oom_callback_count); 1630 atomic_dec(&oom_callback_count);
@@ -1706,8 +1645,6 @@ early_initcall(rcu_register_oom_notifier);
1706 1645
1707#endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */ 1646#endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */
1708 1647
1709#ifdef CONFIG_RCU_CPU_STALL_INFO
1710
1711#ifdef CONFIG_RCU_FAST_NO_HZ 1648#ifdef CONFIG_RCU_FAST_NO_HZ
1712 1649
1713static void print_cpu_stall_fast_no_hz(char *cp, int cpu) 1650static void print_cpu_stall_fast_no_hz(char *cp, int cpu)
@@ -1796,33 +1733,6 @@ static void increment_cpu_stall_ticks(void)
1796 raw_cpu_inc(rsp->rda->ticks_this_gp); 1733 raw_cpu_inc(rsp->rda->ticks_this_gp);
1797} 1734}
1798 1735
1799#else /* #ifdef CONFIG_RCU_CPU_STALL_INFO */
1800
1801static void print_cpu_stall_info_begin(void)
1802{
1803 pr_cont(" {");
1804}
1805
1806static void print_cpu_stall_info(struct rcu_state *rsp, int cpu)
1807{
1808 pr_cont(" %d", cpu);
1809}
1810
1811static void print_cpu_stall_info_end(void)
1812{
1813 pr_cont("} ");
1814}
1815
1816static void zero_cpu_stall_ticks(struct rcu_data *rdp)
1817{
1818}
1819
1820static void increment_cpu_stall_ticks(void)
1821{
1822}
1823
1824#endif /* #else #ifdef CONFIG_RCU_CPU_STALL_INFO */
1825
1826#ifdef CONFIG_RCU_NOCB_CPU 1736#ifdef CONFIG_RCU_NOCB_CPU
1827 1737
1828/* 1738/*
diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c
index 3ea7ffc7d5c4..6fc4c5ff3bb5 100644
--- a/kernel/rcu/tree_trace.c
+++ b/kernel/rcu/tree_trace.c
@@ -81,9 +81,9 @@ static void r_stop(struct seq_file *m, void *v)
81static int show_rcubarrier(struct seq_file *m, void *v) 81static int show_rcubarrier(struct seq_file *m, void *v)
82{ 82{
83 struct rcu_state *rsp = (struct rcu_state *)m->private; 83 struct rcu_state *rsp = (struct rcu_state *)m->private;
84 seq_printf(m, "bcc: %d nbd: %lu\n", 84 seq_printf(m, "bcc: %d bseq: %lu\n",
85 atomic_read(&rsp->barrier_cpu_count), 85 atomic_read(&rsp->barrier_cpu_count),
86 rsp->n_barrier_done); 86 rsp->barrier_sequence);
87 return 0; 87 return 0;
88} 88}
89 89
@@ -185,18 +185,15 @@ static int show_rcuexp(struct seq_file *m, void *v)
185{ 185{
186 struct rcu_state *rsp = (struct rcu_state *)m->private; 186 struct rcu_state *rsp = (struct rcu_state *)m->private;
187 187
188 seq_printf(m, "s=%lu d=%lu w=%lu tf=%lu wd1=%lu wd2=%lu n=%lu sc=%lu dt=%lu dl=%lu dx=%lu\n", 188 seq_printf(m, "s=%lu wd0=%lu wd1=%lu wd2=%lu wd3=%lu n=%lu enq=%d sc=%lu\n",
189 atomic_long_read(&rsp->expedited_start), 189 rsp->expedited_sequence,
190 atomic_long_read(&rsp->expedited_done), 190 atomic_long_read(&rsp->expedited_workdone0),
191 atomic_long_read(&rsp->expedited_wrap),
192 atomic_long_read(&rsp->expedited_tryfail),
193 atomic_long_read(&rsp->expedited_workdone1), 191 atomic_long_read(&rsp->expedited_workdone1),
194 atomic_long_read(&rsp->expedited_workdone2), 192 atomic_long_read(&rsp->expedited_workdone2),
193 atomic_long_read(&rsp->expedited_workdone3),
195 atomic_long_read(&rsp->expedited_normal), 194 atomic_long_read(&rsp->expedited_normal),
196 atomic_long_read(&rsp->expedited_stoppedcpus), 195 atomic_read(&rsp->expedited_need_qs),
197 atomic_long_read(&rsp->expedited_done_tries), 196 rsp->expedited_sequence / 2);
198 atomic_long_read(&rsp->expedited_done_lost),
199 atomic_long_read(&rsp->expedited_done_exit));
200 return 0; 197 return 0;
201} 198}
202 199
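
For reference, with the reworked format string a line in the rcuexp debugfs file now looks something like the following (counter values invented for illustration; sc is expedited_sequence divided by two, i.e. the number of completed expedited grace periods):

    s=42 wd0=0 wd1=3 wd2=1 wd3=0 n=2 enq=0 sc=21
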
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index afaecb7a799a..7a0b3bc7c5ed 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -62,6 +62,55 @@ MODULE_ALIAS("rcupdate");
62 62
63module_param(rcu_expedited, int, 0); 63module_param(rcu_expedited, int, 0);
64 64
65#if defined(CONFIG_DEBUG_LOCK_ALLOC) && defined(CONFIG_PREEMPT_COUNT)
66/**
67 * rcu_read_lock_sched_held() - might we be in RCU-sched read-side critical section?
68 *
69 * If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an
70 * RCU-sched read-side critical section. In absence of
71 * CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side
72 * critical section unless it can prove otherwise. Note that disabling
73 * of preemption (including disabling irqs) counts as an RCU-sched
74 * read-side critical section. This is useful for debug checks in functions
 75 * that require that they be called within an RCU-sched read-side
76 * critical section.
77 *
78 * Check debug_lockdep_rcu_enabled() to prevent false positives during boot
79 * and while lockdep is disabled.
80 *
81 * Note that if the CPU is in the idle loop from an RCU point of
82 * view (ie: that we are in the section between rcu_idle_enter() and
83 * rcu_idle_exit()) then rcu_read_lock_held() returns false even if the CPU
84 * did an rcu_read_lock(). The reason for this is that RCU ignores CPUs
85 * that are in such a section, considering these as in extended quiescent
86 * state, so such a CPU is effectively never in an RCU read-side critical
87 * section regardless of what RCU primitives it invokes. This state of
88 * affairs is required --- we need to keep an RCU-free window in idle
89 * where the CPU may possibly enter into low power mode. This way we can
90 * notice an extended quiescent state to other CPUs that started a grace
91 * period. Otherwise we would delay any grace period as long as we run in
92 * the idle task.
93 *
94 * Similarly, we avoid claiming an SRCU read lock held if the current
95 * CPU is offline.
96 */
97int rcu_read_lock_sched_held(void)
98{
99 int lockdep_opinion = 0;
100
101 if (!debug_lockdep_rcu_enabled())
102 return 1;
103 if (!rcu_is_watching())
104 return 0;
105 if (!rcu_lockdep_current_cpu_online())
106 return 0;
107 if (debug_locks)
108 lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
109 return lockdep_opinion || preempt_count() != 0 || irqs_disabled();
110}
111EXPORT_SYMBOL(rcu_read_lock_sched_held);
112#endif
113
65#ifndef CONFIG_TINY_RCU 114#ifndef CONFIG_TINY_RCU
66 115
67static atomic_t rcu_expedited_nesting = 116static atomic_t rcu_expedited_nesting =
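
rcu_read_lock_sched_held() is typically consumed through lockdep-driven checks such as rcu_dereference_check(). A hedged example of that use follows; my_entry, my_table and my_update_lock are made-up names rather than kernel interfaces, and the snippet is meant to compile in kernel context, not stand alone.

    #include <linux/lockdep.h>
    #include <linux/rcupdate.h>
    #include <linux/spinlock.h>

    struct my_entry;                              /* hypothetical payload */

    static DEFINE_SPINLOCK(my_update_lock);       /* update-side lock */
    static struct my_entry __rcu *my_table[16];   /* RCU-protected slots */

    static struct my_entry *my_lookup(int key)
    {
            /*
             * Legal under rcu_read_lock_sched(), preempt_disable() or
             * local_irq_disable(), or while holding my_update_lock.
             */
            return rcu_dereference_check(my_table[key],
                                         rcu_read_lock_sched_held() ||
                                         lockdep_is_held(&my_update_lock));
    }
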
@@ -269,20 +318,37 @@ void wakeme_after_rcu(struct rcu_head *head)
269 rcu = container_of(head, struct rcu_synchronize, head); 318 rcu = container_of(head, struct rcu_synchronize, head);
270 complete(&rcu->completion); 319 complete(&rcu->completion);
271} 320}
321EXPORT_SYMBOL_GPL(wakeme_after_rcu);
272 322
273void wait_rcu_gp(call_rcu_func_t crf) 323void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t *crcu_array,
324 struct rcu_synchronize *rs_array)
274{ 325{
275 struct rcu_synchronize rcu; 326 int i;
276 327
277 init_rcu_head_on_stack(&rcu.head); 328 /* Initialize and register callbacks for each flavor specified. */
278 init_completion(&rcu.completion); 329 for (i = 0; i < n; i++) {
279 /* Will wake me after RCU finished. */ 330 if (checktiny &&
280 crf(&rcu.head, wakeme_after_rcu); 331 (crcu_array[i] == call_rcu ||
281 /* Wait for it. */ 332 crcu_array[i] == call_rcu_bh)) {
282 wait_for_completion(&rcu.completion); 333 might_sleep();
283 destroy_rcu_head_on_stack(&rcu.head); 334 continue;
335 }
336 init_rcu_head_on_stack(&rs_array[i].head);
337 init_completion(&rs_array[i].completion);
338 (crcu_array[i])(&rs_array[i].head, wakeme_after_rcu);
339 }
340
341 /* Wait for all callbacks to be invoked. */
342 for (i = 0; i < n; i++) {
343 if (checktiny &&
344 (crcu_array[i] == call_rcu ||
345 crcu_array[i] == call_rcu_bh))
346 continue;
347 wait_for_completion(&rs_array[i].completion);
348 destroy_rcu_head_on_stack(&rs_array[i].head);
349 }
284} 350}
285EXPORT_SYMBOL_GPL(wait_rcu_gp); 351EXPORT_SYMBOL_GPL(__wait_rcu_gp);
286 352
287#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD 353#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
288void init_rcu_head(struct rcu_head *head) 354void init_rcu_head(struct rcu_head *head)
@@ -523,8 +589,8 @@ EXPORT_SYMBOL_GPL(call_rcu_tasks);
523void synchronize_rcu_tasks(void) 589void synchronize_rcu_tasks(void)
524{ 590{
525 /* Complain if the scheduler has not started. */ 591 /* Complain if the scheduler has not started. */
526 rcu_lockdep_assert(!rcu_scheduler_active, 592 RCU_LOCKDEP_WARN(!rcu_scheduler_active,
527 "synchronize_rcu_tasks called too soon"); 593 "synchronize_rcu_tasks called too soon");
528 594
529 /* Wait for the grace period. */ 595 /* Wait for the grace period. */
530 wait_rcu_gp(call_rcu_tasks); 596 wait_rcu_gp(call_rcu_tasks);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 78b4bad10081..5e73c79fadd0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2200,8 +2200,8 @@ unsigned long to_ratio(u64 period, u64 runtime)
2200#ifdef CONFIG_SMP 2200#ifdef CONFIG_SMP
2201inline struct dl_bw *dl_bw_of(int i) 2201inline struct dl_bw *dl_bw_of(int i)
2202{ 2202{
2203 rcu_lockdep_assert(rcu_read_lock_sched_held(), 2203 RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
2204 "sched RCU must be held"); 2204 "sched RCU must be held");
2205 return &cpu_rq(i)->rd->dl_bw; 2205 return &cpu_rq(i)->rd->dl_bw;
2206} 2206}
2207 2207
@@ -2210,8 +2210,8 @@ static inline int dl_bw_cpus(int i)
2210 struct root_domain *rd = cpu_rq(i)->rd; 2210 struct root_domain *rd = cpu_rq(i)->rd;
2211 int cpus = 0; 2211 int cpus = 0;
2212 2212
2213 rcu_lockdep_assert(rcu_read_lock_sched_held(), 2213 RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
2214 "sched RCU must be held"); 2214 "sched RCU must be held");
2215 for_each_cpu_and(i, rd->span, cpu_active_mask) 2215 for_each_cpu_and(i, rd->span, cpu_active_mask)
2216 cpus++; 2216 cpus++;
2217 2217
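
These two hunks show the pattern used for every rcu_lockdep_assert() conversion in this series: RCU_LOCKDEP_WARN() takes the condition that indicates a problem, so the asserted condition is negated, and compound conditions are rewritten per De Morgan (the workqueue.c hunks below turn "a || b" into "!a && !b"). Schematically:

    /* Old interface: assert the condition that must hold. */
    rcu_lockdep_assert(rcu_read_lock_sched_held(),
                       "sched RCU must be held");

    /* New interface: warn when the condition describing the bug is true. */
    RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
                     "sched RCU must be held");
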
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 579ce1b929af..4008d9f95dd7 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -92,12 +92,10 @@ config NO_HZ_FULL
92 depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS 92 depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
93 # We need at least one periodic CPU for timekeeping 93 # We need at least one periodic CPU for timekeeping
94 depends on SMP 94 depends on SMP
95 # RCU_USER_QS dependency
96 depends on HAVE_CONTEXT_TRACKING 95 depends on HAVE_CONTEXT_TRACKING
97 # VIRT_CPU_ACCOUNTING_GEN dependency 96 # VIRT_CPU_ACCOUNTING_GEN dependency
98 depends on HAVE_VIRT_CPU_ACCOUNTING_GEN 97 depends on HAVE_VIRT_CPU_ACCOUNTING_GEN
99 select NO_HZ_COMMON 98 select NO_HZ_COMMON
100 select RCU_USER_QS
101 select RCU_NOCB_CPU 99 select RCU_NOCB_CPU
102 select VIRT_CPU_ACCOUNTING_GEN 100 select VIRT_CPU_ACCOUNTING_GEN
103 select IRQ_WORK 101 select IRQ_WORK
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4c4f06176f74..cb91c63b4f4a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -338,20 +338,20 @@ static void workqueue_sysfs_unregister(struct workqueue_struct *wq);
338#include <trace/events/workqueue.h> 338#include <trace/events/workqueue.h>
339 339
340#define assert_rcu_or_pool_mutex() \ 340#define assert_rcu_or_pool_mutex() \
341 rcu_lockdep_assert(rcu_read_lock_sched_held() || \ 341 RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() && \
342 lockdep_is_held(&wq_pool_mutex), \ 342 !lockdep_is_held(&wq_pool_mutex), \
343 "sched RCU or wq_pool_mutex should be held") 343 "sched RCU or wq_pool_mutex should be held")
344 344
345#define assert_rcu_or_wq_mutex(wq) \ 345#define assert_rcu_or_wq_mutex(wq) \
346 rcu_lockdep_assert(rcu_read_lock_sched_held() || \ 346 RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() && \
347 lockdep_is_held(&wq->mutex), \ 347 !lockdep_is_held(&wq->mutex), \
348 "sched RCU or wq->mutex should be held") 348 "sched RCU or wq->mutex should be held")
349 349
350#define assert_rcu_or_wq_mutex_or_pool_mutex(wq) \ 350#define assert_rcu_or_wq_mutex_or_pool_mutex(wq) \
351 rcu_lockdep_assert(rcu_read_lock_sched_held() || \ 351 RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() && \
352 lockdep_is_held(&wq->mutex) || \ 352 !lockdep_is_held(&wq->mutex) && \
353 lockdep_is_held(&wq_pool_mutex), \ 353 !lockdep_is_held(&wq_pool_mutex), \
354 "sched RCU, wq->mutex or wq_pool_mutex should be held") 354 "sched RCU, wq->mutex or wq_pool_mutex should be held")
355 355
356#define for_each_cpu_worker_pool(pool, cpu) \ 356#define for_each_cpu_worker_pool(pool, cpu) \
357 for ((pool) = &per_cpu(cpu_worker_pools, cpu)[0]; \ 357 for ((pool) = &per_cpu(cpu_worker_pools, cpu)[0]; \
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e2894b23efb6..3e0b662cae09 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1353,20 +1353,6 @@ config RCU_CPU_STALL_TIMEOUT
1353 RCU grace period persists, additional CPU stall warnings are 1353 RCU grace period persists, additional CPU stall warnings are
1354 printed at more widely spaced intervals. 1354 printed at more widely spaced intervals.
1355 1355
1356config RCU_CPU_STALL_INFO
1357 bool "Print additional diagnostics on RCU CPU stall"
1358 depends on (TREE_RCU || PREEMPT_RCU) && DEBUG_KERNEL
1359 default y
1360 help
1361 For each stalled CPU that is aware of the current RCU grace
1362 period, print out additional per-CPU diagnostic information
1363 regarding scheduling-clock ticks, idle state, and,
1364 for RCU_FAST_NO_HZ kernels, idle-entry state.
1365
1366 Say N if you are unsure.
1367
1368 Say Y if you want to enable such diagnostics.
1369
1370config RCU_TRACE 1356config RCU_TRACE
1371 bool "Enable tracing for RCU" 1357 bool "Enable tracing for RCU"
1372 depends on DEBUG_KERNEL 1358 depends on DEBUG_KERNEL
@@ -1379,7 +1365,7 @@ config RCU_TRACE
1379 Say N if you are unsure. 1365 Say N if you are unsure.
1380 1366
1381config RCU_EQS_DEBUG 1367config RCU_EQS_DEBUG
1382 bool "Use this when adding any sort of NO_HZ support to your arch" 1368 bool "Provide debugging asserts for adding NO_HZ support to an arch"
1383 depends on DEBUG_KERNEL 1369 depends on DEBUG_KERNEL
1384 help 1370 help
1385 This option provides consistency checks in RCU's handling of 1371 This option provides consistency checks in RCU's handling of
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index d5c8e9a3a73c..a51ca0e5beef 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5011,6 +5011,7 @@ sub process {
5011 "memory barrier without comment\n" . $herecurr); 5011 "memory barrier without comment\n" . $herecurr);
5012 } 5012 }
5013 } 5013 }
5014
5014# check for waitqueue_active without a comment. 5015# check for waitqueue_active without a comment.
5015 if ($line =~ /\bwaitqueue_active\s*\(/) { 5016 if ($line =~ /\bwaitqueue_active\s*\(/) {
5016 if (!ctx_has_comment($first_line, $linenr)) { 5017 if (!ctx_has_comment($first_line, $linenr)) {
@@ -5018,6 +5019,24 @@ sub process {
5018 "waitqueue_active without comment\n" . $herecurr); 5019 "waitqueue_active without comment\n" . $herecurr);
5019 } 5020 }
5020 } 5021 }
5022
5023# Check for expedited grace periods that interrupt non-idle non-nohz
5024# online CPUs. These expedited can therefore degrade real-time response
5025# if used carelessly, and should be avoided where not absolutely
5026# needed. It is always OK to use synchronize_rcu_expedited() and
5027# synchronize_sched_expedited() at boot time (before real-time applications
5028# start) and in error situations where real-time response is compromised in
5029# any case. Note that synchronize_srcu_expedited() does -not- interrupt
5030# other CPUs, so don't warn on uses of synchronize_srcu_expedited().
5031# Of course, nothing comes for free, and srcu_read_lock() and
5032# srcu_read_unlock() do contain full memory barriers in payment for
5033# synchronize_srcu_expedited() non-interruption properties.
5034 if ($line =~ /\b(synchronize_rcu_expedited|synchronize_sched_expedited)\(/) {
5035 WARN("EXPEDITED_RCU_GRACE_PERIOD",
5036 "expedited RCU grace periods should be avoided where they can degrade real-time response\n" . $herecurr);
5037
5038 }
5039
5021# check of hardware specific defines 5040# check of hardware specific defines
5022 if ($line =~ m@^.\s*\#\s*if.*\b(__i386__|__powerpc64__|__sun__|__s390x__)\b@ && $realfile !~ m@include/asm-@) { 5041 if ($line =~ m@^.\s*\#\s*if.*\b(__i386__|__powerpc64__|__sun__|__s390x__)\b@ && $realfile !~ m@include/asm-@) {
5023 CHK("ARCH_DEFINES", 5042 CHK("ARCH_DEFINES",
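
The intent of the new warning is easiest to see from a caller's perspective: the expedited primitives buy latency by disturbing every non-idle CPU, so they belong only where that disturbance is acceptable. A schematic example follows; my_object, my_remove_object() and the booting flag are illustrative names, not kernel interfaces.

    #include <linux/rculist.h>
    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct my_object {
            struct list_head node;
    };

    static void my_remove_object(struct my_object *obj, bool booting)
    {
            list_del_rcu(&obj->node);
            if (booting)
                    synchronize_rcu_expedited();    /* no RT apps running yet */
            else
                    synchronize_rcu();              /* avoid IPIing other CPUs */
            kfree(obj);
    }
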
diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index 188c1d26393b..73455089feef 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -400,9 +400,9 @@ static bool verify_new_ex(struct dev_cgroup *dev_cgroup,
400{ 400{
401 bool match = false; 401 bool match = false;
402 402
403 rcu_lockdep_assert(rcu_read_lock_held() || 403 RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
404 lockdep_is_held(&devcgroup_mutex), 404 lockdep_is_held(&devcgroup_mutex),
405 "device_cgroup:verify_new_ex called without proper synchronization"); 405 "device_cgroup:verify_new_ex called without proper synchronization");
406 406
407 if (dev_cgroup->behavior == DEVCG_DEFAULT_ALLOW) { 407 if (dev_cgroup->behavior == DEVCG_DEFAULT_ALLOW) {
408 if (behavior == DEVCG_DEFAULT_ALLOW) { 408 if (behavior == DEVCG_DEFAULT_ALLOW) {
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TASKS01 b/tools/testing/selftests/rcutorture/configs/rcu/TASKS01
index 2cc0e60eba6e..bafe94cbd739 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TASKS01
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TASKS01
@@ -5,6 +5,6 @@ CONFIG_PREEMPT_NONE=n
5CONFIG_PREEMPT_VOLUNTARY=n 5CONFIG_PREEMPT_VOLUNTARY=n
6CONFIG_PREEMPT=y 6CONFIG_PREEMPT=y
7CONFIG_DEBUG_LOCK_ALLOC=y 7CONFIG_DEBUG_LOCK_ALLOC=y
8CONFIG_PROVE_LOCKING=n 8CONFIG_PROVE_LOCKING=y
9#CHECK#CONFIG_PROVE_RCU=n 9#CHECK#CONFIG_PROVE_RCU=y
10CONFIG_RCU_EXPERT=y 10CONFIG_RCU_EXPERT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE01 b/tools/testing/selftests/rcutorture/configs/rcu/TREE01
index 8e9137f66831..f572b873c620 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE01
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE01
@@ -13,7 +13,6 @@ CONFIG_MAXSMP=y
13CONFIG_RCU_NOCB_CPU=y 13CONFIG_RCU_NOCB_CPU=y
14CONFIG_RCU_NOCB_CPU_ZERO=y 14CONFIG_RCU_NOCB_CPU_ZERO=y
15CONFIG_DEBUG_LOCK_ALLOC=n 15CONFIG_DEBUG_LOCK_ALLOC=n
16CONFIG_RCU_CPU_STALL_INFO=n
17CONFIG_RCU_BOOST=n 16CONFIG_RCU_BOOST=n
18CONFIG_DEBUG_OBJECTS_RCU_HEAD=n 17CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
19CONFIG_RCU_EXPERT=y 18CONFIG_RCU_EXPERT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE02 b/tools/testing/selftests/rcutorture/configs/rcu/TREE02
index aeea6a204d14..ef6a22c44dea 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE02
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE02
@@ -17,7 +17,6 @@ CONFIG_RCU_FANOUT_LEAF=3
17CONFIG_RCU_NOCB_CPU=n 17CONFIG_RCU_NOCB_CPU=n
18CONFIG_DEBUG_LOCK_ALLOC=y 18CONFIG_DEBUG_LOCK_ALLOC=y
19CONFIG_PROVE_LOCKING=n 19CONFIG_PROVE_LOCKING=n
20CONFIG_RCU_CPU_STALL_INFO=n
21CONFIG_RCU_BOOST=n 20CONFIG_RCU_BOOST=n
22CONFIG_DEBUG_OBJECTS_RCU_HEAD=n 21CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
23CONFIG_RCU_EXPERT=y 22CONFIG_RCU_EXPERT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE02-T b/tools/testing/selftests/rcutorture/configs/rcu/TREE02-T
index 2ac9e68ea3d1..917d2517b5b5 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE02-T
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE02-T
@@ -17,6 +17,5 @@ CONFIG_RCU_FANOUT_LEAF=3
17CONFIG_RCU_NOCB_CPU=n 17CONFIG_RCU_NOCB_CPU=n
18CONFIG_DEBUG_LOCK_ALLOC=y 18CONFIG_DEBUG_LOCK_ALLOC=y
19CONFIG_PROVE_LOCKING=n 19CONFIG_PROVE_LOCKING=n
20CONFIG_RCU_CPU_STALL_INFO=n
21CONFIG_RCU_BOOST=n 20CONFIG_RCU_BOOST=n
22CONFIG_DEBUG_OBJECTS_RCU_HEAD=n 21CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE03 b/tools/testing/selftests/rcutorture/configs/rcu/TREE03
index 72aa7d87ea99..7a17c503b382 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE03
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE03
@@ -13,7 +13,6 @@ CONFIG_RCU_FANOUT=2
13CONFIG_RCU_FANOUT_LEAF=2 13CONFIG_RCU_FANOUT_LEAF=2
14CONFIG_RCU_NOCB_CPU=n 14CONFIG_RCU_NOCB_CPU=n
15CONFIG_DEBUG_LOCK_ALLOC=n 15CONFIG_DEBUG_LOCK_ALLOC=n
16CONFIG_RCU_CPU_STALL_INFO=n
17CONFIG_RCU_BOOST=y 16CONFIG_RCU_BOOST=y
18CONFIG_RCU_KTHREAD_PRIO=2 17CONFIG_RCU_KTHREAD_PRIO=2
19CONFIG_DEBUG_OBJECTS_RCU_HEAD=n 18CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE04 b/tools/testing/selftests/rcutorture/configs/rcu/TREE04
index 3f5112751cda..39a2c6d7d7ec 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE04
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE04
@@ -17,6 +17,5 @@ CONFIG_RCU_FANOUT=4
17CONFIG_RCU_FANOUT_LEAF=4 17CONFIG_RCU_FANOUT_LEAF=4
18CONFIG_RCU_NOCB_CPU=n 18CONFIG_RCU_NOCB_CPU=n
19CONFIG_DEBUG_LOCK_ALLOC=n 19CONFIG_DEBUG_LOCK_ALLOC=n
20CONFIG_RCU_CPU_STALL_INFO=n
21CONFIG_DEBUG_OBJECTS_RCU_HEAD=n 20CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
22CONFIG_RCU_EXPERT=y 21CONFIG_RCU_EXPERT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE05 b/tools/testing/selftests/rcutorture/configs/rcu/TREE05
index c04dfea6fd21..1257d3227b1e 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE05
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE05
@@ -17,6 +17,5 @@ CONFIG_RCU_NOCB_CPU_NONE=y
17CONFIG_DEBUG_LOCK_ALLOC=y 17CONFIG_DEBUG_LOCK_ALLOC=y
18CONFIG_PROVE_LOCKING=y 18CONFIG_PROVE_LOCKING=y
19#CHECK#CONFIG_PROVE_RCU=y 19#CHECK#CONFIG_PROVE_RCU=y
20CONFIG_RCU_CPU_STALL_INFO=n
21CONFIG_DEBUG_OBJECTS_RCU_HEAD=n 20CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
22CONFIG_RCU_EXPERT=y 21CONFIG_RCU_EXPERT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE06 b/tools/testing/selftests/rcutorture/configs/rcu/TREE06
index f51d2c73a68e..d3e456b74cbe 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE06
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE06
@@ -18,6 +18,5 @@ CONFIG_RCU_NOCB_CPU=n
18CONFIG_DEBUG_LOCK_ALLOC=y 18CONFIG_DEBUG_LOCK_ALLOC=y
19CONFIG_PROVE_LOCKING=y 19CONFIG_PROVE_LOCKING=y
20#CHECK#CONFIG_PROVE_RCU=y 20#CHECK#CONFIG_PROVE_RCU=y
21CONFIG_RCU_CPU_STALL_INFO=n
22CONFIG_DEBUG_OBJECTS_RCU_HEAD=y 21CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
23CONFIG_RCU_EXPERT=y 22CONFIG_RCU_EXPERT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE07 b/tools/testing/selftests/rcutorture/configs/rcu/TREE07
index f422af4ff5a3..3956b4131f72 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE07
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE07
@@ -17,6 +17,5 @@ CONFIG_RCU_FANOUT=2
17CONFIG_RCU_FANOUT_LEAF=2 17CONFIG_RCU_FANOUT_LEAF=2
18CONFIG_RCU_NOCB_CPU=n 18CONFIG_RCU_NOCB_CPU=n
19CONFIG_DEBUG_LOCK_ALLOC=n 19CONFIG_DEBUG_LOCK_ALLOC=n
20CONFIG_RCU_CPU_STALL_INFO=n
21CONFIG_DEBUG_OBJECTS_RCU_HEAD=n 20CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
22CONFIG_RCU_EXPERT=y 21CONFIG_RCU_EXPERT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE08 b/tools/testing/selftests/rcutorture/configs/rcu/TREE08
index a24d2ca30646..bb9b0c1a23c2 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE08
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE08
@@ -19,7 +19,6 @@ CONFIG_RCU_NOCB_CPU_ALL=y
19CONFIG_DEBUG_LOCK_ALLOC=n 19CONFIG_DEBUG_LOCK_ALLOC=n
20CONFIG_PROVE_LOCKING=y 20CONFIG_PROVE_LOCKING=y
21#CHECK#CONFIG_PROVE_RCU=y 21#CHECK#CONFIG_PROVE_RCU=y
22CONFIG_RCU_CPU_STALL_INFO=n
23CONFIG_RCU_BOOST=n 22CONFIG_RCU_BOOST=n
24CONFIG_DEBUG_OBJECTS_RCU_HEAD=n 23CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
25CONFIG_RCU_EXPERT=y 24CONFIG_RCU_EXPERT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T b/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T
index b2b8cea69dc9..2ad13f0d29cc 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T
@@ -17,6 +17,5 @@ CONFIG_RCU_FANOUT_LEAF=2
17CONFIG_RCU_NOCB_CPU=y 17CONFIG_RCU_NOCB_CPU=y
18CONFIG_RCU_NOCB_CPU_ALL=y 18CONFIG_RCU_NOCB_CPU_ALL=y
19CONFIG_DEBUG_LOCK_ALLOC=n 19CONFIG_DEBUG_LOCK_ALLOC=n
20CONFIG_RCU_CPU_STALL_INFO=n
21CONFIG_RCU_BOOST=n 20CONFIG_RCU_BOOST=n
22CONFIG_DEBUG_OBJECTS_RCU_HEAD=n 21CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE09 b/tools/testing/selftests/rcutorture/configs/rcu/TREE09
index aa4ed08d999d..6710e749d9de 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE09
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE09
@@ -13,7 +13,6 @@ CONFIG_SUSPEND=n
13CONFIG_HIBERNATION=n 13CONFIG_HIBERNATION=n
14CONFIG_RCU_NOCB_CPU=n 14CONFIG_RCU_NOCB_CPU=n
15CONFIG_DEBUG_LOCK_ALLOC=n 15CONFIG_DEBUG_LOCK_ALLOC=n
16CONFIG_RCU_CPU_STALL_INFO=n
17CONFIG_RCU_BOOST=n 16CONFIG_RCU_BOOST=n
18CONFIG_DEBUG_OBJECTS_RCU_HEAD=n 17CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
19#CHECK#CONFIG_RCU_EXPERT=n 18#CHECK#CONFIG_RCU_EXPERT=n
diff --git a/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt b/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
index b24c0004fc49..657f3a035488 100644
--- a/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
+++ b/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
@@ -16,7 +16,6 @@ CONFIG_PROVE_LOCKING -- Do several, covering CONFIG_DEBUG_LOCK_ALLOC=y and not.
16CONFIG_PROVE_RCU -- Hardwired to CONFIG_PROVE_LOCKING. 16CONFIG_PROVE_RCU -- Hardwired to CONFIG_PROVE_LOCKING.
17CONFIG_RCU_BOOST -- one of PREEMPT_RCU. 17CONFIG_RCU_BOOST -- one of PREEMPT_RCU.
18CONFIG_RCU_KTHREAD_PRIO -- set to 2 for _BOOST testing. 18CONFIG_RCU_KTHREAD_PRIO -- set to 2 for _BOOST testing.
19CONFIG_RCU_CPU_STALL_INFO -- Now default, avoid at least twice.
20CONFIG_RCU_FANOUT -- Cover hierarchy, but overlap with others. 19CONFIG_RCU_FANOUT -- Cover hierarchy, but overlap with others.
21CONFIG_RCU_FANOUT_LEAF -- Do one non-default. 20CONFIG_RCU_FANOUT_LEAF -- Do one non-default.
22CONFIG_RCU_FAST_NO_HZ -- Do one, but not with CONFIG_RCU_NOCB_CPU_ALL. 21CONFIG_RCU_FAST_NO_HZ -- Do one, but not with CONFIG_RCU_NOCB_CPU_ALL.