author    | Ingo Molnar <mingo@kernel.org> | 2016-06-30 02:27:41 -0400
committer | Ingo Molnar <mingo@kernel.org> | 2016-06-30 02:27:41 -0400
commit    | 54d5f16e55a7cdd64e0f6bcadf2b5f871f94bb83 (patch)
tree      | 169537619e16c6a6d802585ba97aae405642233a
parent    | 4c2e07c6a29e0129e975727b9f57eede813eea85 (diff)
parent    | 4d03754f04247bc4d469b78b61cac942df37445d (diff)
Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney:
- Documentation updates. Just some simple changes, no design-level
additions.
- Miscellaneous fixes.
- Torture-test updates.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-rw-r--r-- | Documentation/RCU/Design/Requirements/Requirements.html | 35
-rw-r--r-- | Documentation/RCU/stallwarn.txt | 2
-rw-r--r-- | Documentation/RCU/whatisRCU.txt | 3
-rw-r--r-- | Documentation/sysctl/kernel.txt | 12
-rw-r--r-- | include/linux/kernel.h | 1
-rw-r--r-- | include/linux/rcupdate.h | 23
-rw-r--r-- | include/linux/torture.h | 4
-rw-r--r-- | init/Kconfig | 1
-rw-r--r-- | kernel/rcu/rcuperf.c | 25
-rw-r--r-- | kernel/rcu/rcutorture.c | 9
-rw-r--r-- | kernel/rcu/tree.c | 586
-rw-r--r-- | kernel/rcu/tree.h | 15
-rw-r--r-- | kernel/rcu/tree_exp.h | 656
-rw-r--r-- | kernel/rcu/tree_plugin.h | 95
-rw-r--r-- | kernel/rcu/update.c | 7
-rw-r--r-- | kernel/sysctl.c | 11
-rw-r--r-- | kernel/torture.c | 176
-rw-r--r-- | lib/Kconfig.debug | 33
-rw-r--r-- | tools/testing/selftests/rcutorture/bin/functions.sh | 12
-rwxr-xr-x | tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh | 34
-rwxr-xr-x | tools/testing/selftests/rcutorture/bin/kvm.sh | 2
-rwxr-xr-x | tools/testing/selftests/rcutorture/bin/parse-console.sh | 7
-rw-r--r-- | tools/testing/selftests/rcutorture/doc/initrd.txt | 22
23 files changed, 978 insertions, 793 deletions
diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html
index e7e24b3e86e2..ece410f40436 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.html
+++ b/Documentation/RCU/Design/Requirements/Requirements.html
@@ -2391,6 +2391,41 @@ and <tt>RCU_NONIDLE()</tt> on the other while inspecting | |||
2391 | idle-loop code. | 2391 | idle-loop code. |
2392 | Steven Rostedt supplied <tt>_rcuidle</tt> event tracing, | 2392 | Steven Rostedt supplied <tt>_rcuidle</tt> event tracing, |
2393 | which is used quite heavily in the idle loop. | 2393 | which is used quite heavily in the idle loop. |
2394 | However, there are some restrictions on the code placed within | ||
2395 | <tt>RCU_NONIDLE()</tt>: | ||
2396 | |||
2397 | <ol> | ||
2398 | <li> Blocking is prohibited. | ||
2399 | In practice, this is not a serious restriction given that idle | ||
2400 | tasks are prohibited from blocking to begin with. | ||
2401 | <li> Although nesting <tt>RCU_NONIDLE()</tt> is permitted, they cannot | ||
2402 | nest indefinitely deeply. | ||
2403 | However, given that they can be nested on the order of a million | ||
2404 | deep, even on 32-bit systems, this should not be a serious | ||
2405 | restriction. | ||
2406 | This nesting limit would probably be reached long after the | ||
2407 | compiler OOMed or the stack overflowed. | ||
2408 | <li> Any code path that enters <tt>RCU_NONIDLE()</tt> must sequence | ||
2409 | out of that same <tt>RCU_NONIDLE()</tt>. | ||
2410 | For example, the following is grossly illegal: | ||
2411 | |||
2412 | <blockquote> | ||
2413 | <pre> | ||
2414 | 1 RCU_NONIDLE({ | ||
2415 | 2 do_something(); | ||
2416 | 3 goto bad_idea; /* BUG!!! */ | ||
2417 | 4 do_something_else();}); | ||
2418 | 5 bad_idea: | ||
2419 | </pre> | ||
2420 | </blockquote> | ||
2421 | |||
2422 | <p> | ||
2423 | It is just as illegal to transfer control into the middle of | ||
2424 | <tt>RCU_NONIDLE()</tt>'s argument. | ||
2425 | Yes, in theory, you could transfer in as long as you also | ||
2426 | transferred out, but in practice you could also expect to get sharply | ||
2427 | worded review comments. | ||
2428 | </ol> | ||
2394 | 2429 | ||
2395 | <p> | 2430 | <p> |
2396 | It is similarly socially unacceptable to interrupt an | 2431 | It is similarly socially unacceptable to interrupt an |
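A minimal sketch of what legal usage looks like, for contrast with the goto example above (do_something_with_RCU() and do_something_else() are placeholder names, the first being the one used by the rcupdate.h comment later in this patch):

    #include <linux/rcupdate.h>

    /* Sketch only: each RCU_NONIDLE() is entered and exited in sequence,
     * nothing inside it blocks, and nesting stays far below the limit. */
    static void idle_side_event(void)
    {
            RCU_NONIDLE(do_something_with_RCU());
            RCU_NONIDLE(RCU_NONIDLE(do_something_else()));  /* nesting is fine */
    }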
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 0f7fb4298e7e..e93d04133fe7 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -49,7 +49,7 @@ rcupdate.rcu_task_stall_timeout | |||
49 | This boot/sysfs parameter controls the RCU-tasks stall warning | 49 | This boot/sysfs parameter controls the RCU-tasks stall warning |
50 | interval. A value of zero or less suppresses RCU-tasks stall | 50 | interval. A value of zero or less suppresses RCU-tasks stall |
51 | warnings. A positive value sets the stall-warning interval | 51 | warnings. A positive value sets the stall-warning interval |
52 | in jiffies. An RCU-tasks stall warning starts wtih the line: | 52 | in jiffies. An RCU-tasks stall warning starts with the line: |
53 | 53 | ||
54 | INFO: rcu_tasks detected stalls on tasks: | 54 | INFO: rcu_tasks detected stalls on tasks: |
55 | 55 | ||
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 111770ffa10e..204422719197 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -5,6 +5,9 @@ to start learning about RCU: | |||
5 | 2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/ | 5 | 2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/ |
6 | 3. RCU part 3: the RCU API http://lwn.net/Articles/264090/ | 6 | 3. RCU part 3: the RCU API http://lwn.net/Articles/264090/ |
7 | 4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/ | 7 | 4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/ |
8 | 2010 Big API Table http://lwn.net/Articles/419086/ | ||
9 | 5. The RCU API, 2014 Edition http://lwn.net/Articles/609904/ | ||
10 | 2014 Big API Table http://lwn.net/Articles/609973/ | ||
8 | 11 | ||
9 | 12 | ||
10 | What is RCU? | 13 | What is RCU? |
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index a3683ce2a2f3..33204604de6c 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -58,6 +58,7 @@ show up in /proc/sys/kernel: | |||
58 | - panic_on_stackoverflow | 58 | - panic_on_stackoverflow |
59 | - panic_on_unrecovered_nmi | 59 | - panic_on_unrecovered_nmi |
60 | - panic_on_warn | 60 | - panic_on_warn |
61 | - panic_on_rcu_stall | ||
61 | - perf_cpu_time_max_percent | 62 | - perf_cpu_time_max_percent |
62 | - perf_event_paranoid | 63 | - perf_event_paranoid |
63 | - perf_event_max_stack | 64 | - perf_event_max_stack |
@@ -618,6 +619,17 @@ a kernel rebuild when attempting to kdump at the location of a WARN(). | |||
618 | 619 | ||
619 | ============================================================== | 620 | ============================================================== |
620 | 621 | ||
622 | panic_on_rcu_stall: | ||
623 | |||
624 | When set to 1, calls panic() after RCU stall detection messages. This | ||
625 | is useful to determine the root cause of RCU stalls using a vmcore. | ||
626 | |||
627 | 0: do not panic() when RCU stall takes place, default behavior. | ||
628 | |||
629 | 1: panic() after printing RCU stall messages. | ||
630 | |||
631 | ============================================================== | ||
632 | |||
621 | perf_cpu_time_max_percent: | 633 | perf_cpu_time_max_percent: |
622 | 634 | ||
623 | Hints to the kernel how much CPU time it should be allowed to | 635 | Hints to the kernel how much CPU time it should be allowed to |
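The diffstat above also lists an 11-line change to kernel/sysctl.c that is not reproduced in this excerpt; a hedged sketch of how such a knob is typically registered there (the table name is illustrative, and the real patch presumably adds the entry to the existing kern_table[]):

    #include <linux/sysctl.h>

    /* Illustrative fragment: exposes the flag as /proc/sys/kernel/panic_on_rcu_stall. */
    static struct ctl_table panic_on_rcu_stall_table[] = {
            {
                    .procname       = "panic_on_rcu_stall",
                    .data           = &sysctl_panic_on_rcu_stall,
                    .maxlen         = sizeof(int),
                    .mode           = 0644,
                    .proc_handler   = proc_dointvec,
            },
            { }
    };

The flag itself is declared in include/linux/kernel.h and consumed by the panic_on_rcu_stall() helper added to kernel/rcu/tree.c below.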
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 94aa10ffe156..c42082112ec8 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -451,6 +451,7 @@ extern int panic_on_oops; | |||
451 | extern int panic_on_unrecovered_nmi; | 451 | extern int panic_on_unrecovered_nmi; |
452 | extern int panic_on_io_nmi; | 452 | extern int panic_on_io_nmi; |
453 | extern int panic_on_warn; | 453 | extern int panic_on_warn; |
454 | extern int sysctl_panic_on_rcu_stall; | ||
454 | extern int sysctl_panic_on_stackoverflow; | 455 | extern int sysctl_panic_on_stackoverflow; |
455 | 456 | ||
456 | extern bool crash_kexec_post_notifiers; | 457 | extern bool crash_kexec_post_notifiers; |
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 5f1533e3d032..3bc5de08c0b7 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -45,6 +45,7 @@ | |||
45 | #include <linux/bug.h> | 45 | #include <linux/bug.h> |
46 | #include <linux/compiler.h> | 46 | #include <linux/compiler.h> |
47 | #include <linux/ktime.h> | 47 | #include <linux/ktime.h> |
48 | #include <linux/irqflags.h> | ||
48 | 49 | ||
49 | #include <asm/barrier.h> | 50 | #include <asm/barrier.h> |
50 | 51 | ||
@@ -379,12 +380,13 @@ static inline void rcu_init_nohz(void) | |||
379 | * in the inner idle loop. | 380 | * in the inner idle loop. |
380 | * | 381 | * |
381 | * This macro provides the way out: RCU_NONIDLE(do_something_with_RCU()) | 382 | * This macro provides the way out: RCU_NONIDLE(do_something_with_RCU()) |
382 | * will tell RCU that it needs to pay attending, invoke its argument | 383 | * will tell RCU that it needs to pay attention, invoke its argument |
383 | * (in this example, a call to the do_something_with_RCU() function), | 384 | * (in this example, calling the do_something_with_RCU() function), |
384 | * and then tell RCU to go back to ignoring this CPU. It is permissible | 385 | * and then tell RCU to go back to ignoring this CPU. It is permissible |
385 | * to nest RCU_NONIDLE() wrappers, but the nesting level is currently | 386 | * to nest RCU_NONIDLE() wrappers, but not indefinitely (but the limit is |
386 | * quite limited. If deeper nesting is required, it will be necessary | 387 | * on the order of a million or so, even on 32-bit systems). It is |
387 | * to adjust DYNTICK_TASK_NESTING_VALUE accordingly. | 388 | * not legal to block within RCU_NONIDLE(), nor is it permissible to |
389 | * transfer control either into or out of RCU_NONIDLE()'s statement. | ||
388 | */ | 390 | */ |
389 | #define RCU_NONIDLE(a) \ | 391 | #define RCU_NONIDLE(a) \ |
390 | do { \ | 392 | do { \ |
@@ -649,7 +651,16 @@ static inline void rcu_preempt_sleep_check(void) | |||
649 | * please be careful when making changes to rcu_assign_pointer() and the | 651 | * please be careful when making changes to rcu_assign_pointer() and the |
650 | * other macros that it invokes. | 652 | * other macros that it invokes. |
651 | */ | 653 | */ |
652 | #define rcu_assign_pointer(p, v) smp_store_release(&p, RCU_INITIALIZER(v)) | 654 | #define rcu_assign_pointer(p, v) \ |
655 | ({ \ | ||
656 | uintptr_t _r_a_p__v = (uintptr_t)(v); \ | ||
657 | \ | ||
658 | if (__builtin_constant_p(v) && (_r_a_p__v) == (uintptr_t)NULL) \ | ||
659 | WRITE_ONCE((p), (typeof(p))(_r_a_p__v)); \ | ||
660 | else \ | ||
661 | smp_store_release(&p, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \ | ||
662 | _r_a_p__v; \ | ||
663 | }) | ||
653 | 664 | ||
654 | /** | 665 | /** |
655 | * rcu_access_pointer() - fetch RCU pointer with no dereferencing | 666 | * rcu_access_pointer() - fetch RCU pointer with no dereferencing |
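The rewritten rcu_assign_pointer() above keeps the usual publication semantics but adds a fast path for a compile-time-constant NULL, which needs no ordering against prior initialization. A minimal usage sketch (struct foo, gp, and the two helpers are illustrative, not taken from the patch):

    #include <linux/rcupdate.h>

    struct foo {
            int a;
    };
    static struct foo __rcu *gp;            /* illustrative RCU-protected pointer */

    static void publish_foo(struct foo *p)
    {
            p->a = 42;
            rcu_assign_pointer(gp, p);      /* smp_store_release(): init above is visible first */
    }

    static void retract_foo(void)
    {
            rcu_assign_pointer(gp, NULL);   /* constant NULL: takes the WRITE_ONCE() arm */
    }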
diff --git a/include/linux/torture.h b/include/linux/torture.h
index 7759fc3c622d..6685a73736a2 100644
--- a/include/linux/torture.h
+++ b/include/linux/torture.h
@@ -50,6 +50,10 @@ | |||
50 | do { if (verbose) pr_alert("%s" TORTURE_FLAG "!!! %s\n", torture_type, s); } while (0) | 50 | do { if (verbose) pr_alert("%s" TORTURE_FLAG "!!! %s\n", torture_type, s); } while (0) |
51 | 51 | ||
52 | /* Definitions for online/offline exerciser. */ | 52 | /* Definitions for online/offline exerciser. */ |
53 | bool torture_offline(int cpu, long *n_onl_attempts, long *n_onl_successes, | ||
54 | unsigned long *sum_offl, int *min_onl, int *max_onl); | ||
55 | bool torture_online(int cpu, long *n_onl_attempts, long *n_onl_successes, | ||
56 | unsigned long *sum_onl, int *min_onl, int *max_onl); | ||
53 | int torture_onoff_init(long ooholdoff, long oointerval); | 57 | int torture_onoff_init(long ooholdoff, long oointerval); |
54 | void torture_onoff_stats(void); | 58 | void torture_onoff_stats(void); |
55 | bool torture_onoff_failures(void); | 59 | bool torture_onoff_failures(void); |
diff --git a/init/Kconfig b/init/Kconfig
index f755a602d4a1..a068265fbcaf 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -517,6 +517,7 @@ config SRCU | |||
517 | config TASKS_RCU | 517 | config TASKS_RCU |
518 | bool | 518 | bool |
519 | default n | 519 | default n |
520 | depends on !UML | ||
520 | select SRCU | 521 | select SRCU |
521 | help | 522 | help |
522 | This option enables a task-based RCU implementation that uses | 523 | This option enables a task-based RCU implementation that uses |
diff --git a/kernel/rcu/rcuperf.c b/kernel/rcu/rcuperf.c
index 3cee0d8393ed..d38ab08a3fe7 100644
--- a/kernel/rcu/rcuperf.c
+++ b/kernel/rcu/rcuperf.c
@@ -58,7 +58,7 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.vnet.ibm.com>"); | |||
58 | #define VERBOSE_PERFOUT_ERRSTRING(s) \ | 58 | #define VERBOSE_PERFOUT_ERRSTRING(s) \ |
59 | do { if (verbose) pr_alert("%s" PERF_FLAG "!!! %s\n", perf_type, s); } while (0) | 59 | do { if (verbose) pr_alert("%s" PERF_FLAG "!!! %s\n", perf_type, s); } while (0) |
60 | 60 | ||
61 | torture_param(bool, gp_exp, true, "Use expedited GP wait primitives"); | 61 | torture_param(bool, gp_exp, false, "Use expedited GP wait primitives"); |
62 | torture_param(int, holdoff, 10, "Holdoff time before test start (s)"); | 62 | torture_param(int, holdoff, 10, "Holdoff time before test start (s)"); |
63 | torture_param(int, nreaders, -1, "Number of RCU reader threads"); | 63 | torture_param(int, nreaders, -1, "Number of RCU reader threads"); |
64 | torture_param(int, nwriters, -1, "Number of RCU updater threads"); | 64 | torture_param(int, nwriters, -1, "Number of RCU updater threads"); |
@@ -96,12 +96,7 @@ static int rcu_perf_writer_state; | |||
96 | #define MAX_MEAS 10000 | 96 | #define MAX_MEAS 10000 |
97 | #define MIN_MEAS 100 | 97 | #define MIN_MEAS 100 |
98 | 98 | ||
99 | #if defined(MODULE) || defined(CONFIG_RCU_PERF_TEST_RUNNABLE) | 99 | static int perf_runnable = IS_ENABLED(MODULE); |
100 | #define RCUPERF_RUNNABLE_INIT 1 | ||
101 | #else | ||
102 | #define RCUPERF_RUNNABLE_INIT 0 | ||
103 | #endif | ||
104 | static int perf_runnable = RCUPERF_RUNNABLE_INIT; | ||
105 | module_param(perf_runnable, int, 0444); | 100 | module_param(perf_runnable, int, 0444); |
106 | MODULE_PARM_DESC(perf_runnable, "Start rcuperf at boot"); | 101 | MODULE_PARM_DESC(perf_runnable, "Start rcuperf at boot"); |
107 | 102 | ||
@@ -363,8 +358,6 @@ rcu_perf_writer(void *arg) | |||
363 | u64 *wdpp = writer_durations[me]; | 358 | u64 *wdpp = writer_durations[me]; |
364 | 359 | ||
365 | VERBOSE_PERFOUT_STRING("rcu_perf_writer task started"); | 360 | VERBOSE_PERFOUT_STRING("rcu_perf_writer task started"); |
366 | WARN_ON(rcu_gp_is_expedited() && !rcu_gp_is_normal() && !gp_exp); | ||
367 | WARN_ON(rcu_gp_is_normal() && gp_exp); | ||
368 | WARN_ON(!wdpp); | 361 | WARN_ON(!wdpp); |
369 | set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids)); | 362 | set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids)); |
370 | sp.sched_priority = 1; | 363 | sp.sched_priority = 1; |
@@ -631,12 +624,24 @@ rcu_perf_init(void) | |||
631 | firsterr = -ENOMEM; | 624 | firsterr = -ENOMEM; |
632 | goto unwind; | 625 | goto unwind; |
633 | } | 626 | } |
627 | if (rcu_gp_is_expedited() && !rcu_gp_is_normal() && !gp_exp) { | ||
628 | VERBOSE_PERFOUT_ERRSTRING("All grace periods expedited, no normal ones to measure!"); | ||
629 | firsterr = -EINVAL; | ||
630 | goto unwind; | ||
631 | } | ||
632 | if (rcu_gp_is_normal() && gp_exp) { | ||
633 | VERBOSE_PERFOUT_ERRSTRING("All grace periods normal, no expedited ones to measure!"); | ||
634 | firsterr = -EINVAL; | ||
635 | goto unwind; | ||
636 | } | ||
634 | for (i = 0; i < nrealwriters; i++) { | 637 | for (i = 0; i < nrealwriters; i++) { |
635 | writer_durations[i] = | 638 | writer_durations[i] = |
636 | kcalloc(MAX_MEAS, sizeof(*writer_durations[i]), | 639 | kcalloc(MAX_MEAS, sizeof(*writer_durations[i]), |
637 | GFP_KERNEL); | 640 | GFP_KERNEL); |
638 | if (!writer_durations[i]) | 641 | if (!writer_durations[i]) { |
642 | firsterr = -ENOMEM; | ||
639 | goto unwind; | 643 | goto unwind; |
644 | } | ||
640 | firsterr = torture_create_kthread(rcu_perf_writer, (void *)i, | 645 | firsterr = torture_create_kthread(rcu_perf_writer, (void *)i, |
641 | writer_tasks[i]); | 646 | writer_tasks[i]); |
642 | if (firsterr) | 647 | if (firsterr) |
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 084a28a732eb..971e2b138063 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -182,12 +182,7 @@ static const char *rcu_torture_writer_state_getname(void) | |||
182 | return rcu_torture_writer_state_names[i]; | 182 | return rcu_torture_writer_state_names[i]; |
183 | } | 183 | } |
184 | 184 | ||
185 | #if defined(MODULE) || defined(CONFIG_RCU_TORTURE_TEST_RUNNABLE) | 185 | static int torture_runnable = IS_ENABLED(MODULE); |
186 | #define RCUTORTURE_RUNNABLE_INIT 1 | ||
187 | #else | ||
188 | #define RCUTORTURE_RUNNABLE_INIT 0 | ||
189 | #endif | ||
190 | static int torture_runnable = RCUTORTURE_RUNNABLE_INIT; | ||
191 | module_param(torture_runnable, int, 0444); | 186 | module_param(torture_runnable, int, 0444); |
192 | MODULE_PARM_DESC(torture_runnable, "Start rcutorture at boot"); | 187 | MODULE_PARM_DESC(torture_runnable, "Start rcutorture at boot"); |
193 | 188 | ||
@@ -1476,7 +1471,7 @@ static int rcu_torture_barrier_cbs(void *arg) | |||
1476 | break; | 1471 | break; |
1477 | /* | 1472 | /* |
1478 | * The above smp_load_acquire() ensures barrier_phase load | 1473 | * The above smp_load_acquire() ensures barrier_phase load |
1479 | * is ordered before the folloiwng ->call(). | 1474 | * is ordered before the following ->call(). |
1480 | */ | 1475 | */ |
1481 | local_irq_disable(); /* Just to test no-irq call_rcu(). */ | 1476 | local_irq_disable(); /* Just to test no-irq call_rcu(). */ |
1482 | cur_ops->call(&rcu, rcu_torture_barrier_cbf); | 1477 | cur_ops->call(&rcu, rcu_torture_barrier_cbf); |
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c7f1bc4f817c..f433959e9322 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -125,12 +125,14 @@ int rcu_num_lvls __read_mostly = RCU_NUM_LVLS; | |||
125 | /* Number of rcu_nodes at specified level. */ | 125 | /* Number of rcu_nodes at specified level. */ |
126 | static int num_rcu_lvl[] = NUM_RCU_LVL_INIT; | 126 | static int num_rcu_lvl[] = NUM_RCU_LVL_INIT; |
127 | int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */ | 127 | int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */ |
128 | /* panic() on RCU Stall sysctl. */ | ||
129 | int sysctl_panic_on_rcu_stall __read_mostly; | ||
128 | 130 | ||
129 | /* | 131 | /* |
130 | * The rcu_scheduler_active variable transitions from zero to one just | 132 | * The rcu_scheduler_active variable transitions from zero to one just |
131 | * before the first task is spawned. So when this variable is zero, RCU | 133 | * before the first task is spawned. So when this variable is zero, RCU |
132 | * can assume that there is but one task, allowing RCU to (for example) | 134 | * can assume that there is but one task, allowing RCU to (for example) |
133 | * optimize synchronize_sched() to a simple barrier(). When this variable | 135 | * optimize synchronize_rcu() to a simple barrier(). When this variable |
134 | * is one, RCU must actually do all the hard work required to detect real | 136 | * is one, RCU must actually do all the hard work required to detect real |
135 | * grace periods. This variable is also used to suppress boot-time false | 137 | * grace periods. This variable is also used to suppress boot-time false |
136 | * positives from lockdep-RCU error checking. | 138 | * positives from lockdep-RCU error checking. |
@@ -159,6 +161,7 @@ static void invoke_rcu_core(void); | |||
159 | static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp); | 161 | static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp); |
160 | static void rcu_report_exp_rdp(struct rcu_state *rsp, | 162 | static void rcu_report_exp_rdp(struct rcu_state *rsp, |
161 | struct rcu_data *rdp, bool wake); | 163 | struct rcu_data *rdp, bool wake); |
164 | static void sync_sched_exp_online_cleanup(int cpu); | ||
162 | 165 | ||
163 | /* rcuc/rcub kthread realtime priority */ | 166 | /* rcuc/rcub kthread realtime priority */ |
164 | #ifdef CONFIG_RCU_KTHREAD_PRIO | 167 | #ifdef CONFIG_RCU_KTHREAD_PRIO |
@@ -1284,9 +1287,9 @@ static void rcu_dump_cpu_stacks(struct rcu_state *rsp) | |||
1284 | rcu_for_each_leaf_node(rsp, rnp) { | 1287 | rcu_for_each_leaf_node(rsp, rnp) { |
1285 | raw_spin_lock_irqsave_rcu_node(rnp, flags); | 1288 | raw_spin_lock_irqsave_rcu_node(rnp, flags); |
1286 | if (rnp->qsmask != 0) { | 1289 | if (rnp->qsmask != 0) { |
1287 | for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++) | 1290 | for_each_leaf_node_possible_cpu(rnp, cpu) |
1288 | if (rnp->qsmask & (1UL << cpu)) | 1291 | if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) |
1289 | dump_cpu_task(rnp->grplo + cpu); | 1292 | dump_cpu_task(cpu); |
1290 | } | 1293 | } |
1291 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); | 1294 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); |
1292 | } | 1295 | } |
@@ -1311,6 +1314,12 @@ static void rcu_stall_kick_kthreads(struct rcu_state *rsp) | |||
1311 | } | 1314 | } |
1312 | } | 1315 | } |
1313 | 1316 | ||
1317 | static inline void panic_on_rcu_stall(void) | ||
1318 | { | ||
1319 | if (sysctl_panic_on_rcu_stall) | ||
1320 | panic("RCU Stall\n"); | ||
1321 | } | ||
1322 | |||
1314 | static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum) | 1323 | static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum) |
1315 | { | 1324 | { |
1316 | int cpu; | 1325 | int cpu; |
@@ -1351,10 +1360,9 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum) | |||
1351 | raw_spin_lock_irqsave_rcu_node(rnp, flags); | 1360 | raw_spin_lock_irqsave_rcu_node(rnp, flags); |
1352 | ndetected += rcu_print_task_stall(rnp); | 1361 | ndetected += rcu_print_task_stall(rnp); |
1353 | if (rnp->qsmask != 0) { | 1362 | if (rnp->qsmask != 0) { |
1354 | for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++) | 1363 | for_each_leaf_node_possible_cpu(rnp, cpu) |
1355 | if (rnp->qsmask & (1UL << cpu)) { | 1364 | if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) { |
1356 | print_cpu_stall_info(rsp, | 1365 | print_cpu_stall_info(rsp, cpu); |
1357 | rnp->grplo + cpu); | ||
1358 | ndetected++; | 1366 | ndetected++; |
1359 | } | 1367 | } |
1360 | } | 1368 | } |
@@ -1390,6 +1398,8 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum) | |||
1390 | 1398 | ||
1391 | rcu_check_gp_kthread_starvation(rsp); | 1399 | rcu_check_gp_kthread_starvation(rsp); |
1392 | 1400 | ||
1401 | panic_on_rcu_stall(); | ||
1402 | |||
1393 | force_quiescent_state(rsp); /* Kick them all. */ | 1403 | force_quiescent_state(rsp); /* Kick them all. */ |
1394 | } | 1404 | } |
1395 | 1405 | ||
@@ -1430,6 +1440,8 @@ static void print_cpu_stall(struct rcu_state *rsp) | |||
1430 | jiffies + 3 * rcu_jiffies_till_stall_check() + 3); | 1440 | jiffies + 3 * rcu_jiffies_till_stall_check() + 3); |
1431 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); | 1441 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); |
1432 | 1442 | ||
1443 | panic_on_rcu_stall(); | ||
1444 | |||
1433 | /* | 1445 | /* |
1434 | * Attempt to revive the RCU machinery by forcing a context switch. | 1446 | * Attempt to revive the RCU machinery by forcing a context switch. |
1435 | * | 1447 | * |
@@ -1989,8 +2001,7 @@ static bool rcu_gp_init(struct rcu_state *rsp) | |||
1989 | * of the tree within the rsp->node[] array. Note that other CPUs | 2001 | * of the tree within the rsp->node[] array. Note that other CPUs |
1990 | * will access only the leaves of the hierarchy, thus seeing that no | 2002 | * will access only the leaves of the hierarchy, thus seeing that no |
1991 | * grace period is in progress, at least until the corresponding | 2003 | * grace period is in progress, at least until the corresponding |
1992 | * leaf node has been initialized. In addition, we have excluded | 2004 | * leaf node has been initialized. |
1993 | * CPU-hotplug operations. | ||
1994 | * | 2005 | * |
1995 | * The grace period cannot complete until the initialization | 2006 | * The grace period cannot complete until the initialization |
1996 | * process finishes, because this kthread handles both. | 2007 | * process finishes, because this kthread handles both. |
@@ -2872,7 +2883,6 @@ static void force_qs_rnp(struct rcu_state *rsp, | |||
2872 | unsigned long *maxj), | 2883 | unsigned long *maxj), |
2873 | bool *isidle, unsigned long *maxj) | 2884 | bool *isidle, unsigned long *maxj) |
2874 | { | 2885 | { |
2875 | unsigned long bit; | ||
2876 | int cpu; | 2886 | int cpu; |
2877 | unsigned long flags; | 2887 | unsigned long flags; |
2878 | unsigned long mask; | 2888 | unsigned long mask; |
@@ -2907,9 +2917,8 @@ static void force_qs_rnp(struct rcu_state *rsp, | |||
2907 | continue; | 2917 | continue; |
2908 | } | 2918 | } |
2909 | } | 2919 | } |
2910 | cpu = rnp->grplo; | 2920 | for_each_leaf_node_possible_cpu(rnp, cpu) { |
2911 | bit = 1; | 2921 | unsigned long bit = leaf_node_cpu_bit(rnp, cpu); |
2912 | for (; cpu <= rnp->grphi; cpu++, bit <<= 1) { | ||
2913 | if ((rnp->qsmask & bit) != 0) { | 2922 | if ((rnp->qsmask & bit) != 0) { |
2914 | if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj)) | 2923 | if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj)) |
2915 | mask |= bit; | 2924 | mask |= bit; |
@@ -3448,549 +3457,6 @@ static bool rcu_seq_done(unsigned long *sp, unsigned long s) | |||
3448 | return ULONG_CMP_GE(READ_ONCE(*sp), s); | 3457 | return ULONG_CMP_GE(READ_ONCE(*sp), s); |
3449 | } | 3458 | } |
3450 | 3459 | ||
3451 | /* Wrapper functions for expedited grace periods. */ | ||
3452 | static void rcu_exp_gp_seq_start(struct rcu_state *rsp) | ||
3453 | { | ||
3454 | rcu_seq_start(&rsp->expedited_sequence); | ||
3455 | } | ||
3456 | static void rcu_exp_gp_seq_end(struct rcu_state *rsp) | ||
3457 | { | ||
3458 | rcu_seq_end(&rsp->expedited_sequence); | ||
3459 | smp_mb(); /* Ensure that consecutive grace periods serialize. */ | ||
3460 | } | ||
3461 | static unsigned long rcu_exp_gp_seq_snap(struct rcu_state *rsp) | ||
3462 | { | ||
3463 | unsigned long s; | ||
3464 | |||
3465 | smp_mb(); /* Caller's modifications seen first by other CPUs. */ | ||
3466 | s = rcu_seq_snap(&rsp->expedited_sequence); | ||
3467 | trace_rcu_exp_grace_period(rsp->name, s, TPS("snap")); | ||
3468 | return s; | ||
3469 | } | ||
3470 | static bool rcu_exp_gp_seq_done(struct rcu_state *rsp, unsigned long s) | ||
3471 | { | ||
3472 | return rcu_seq_done(&rsp->expedited_sequence, s); | ||
3473 | } | ||
3474 | |||
3475 | /* | ||
3476 | * Reset the ->expmaskinit values in the rcu_node tree to reflect any | ||
3477 | * recent CPU-online activity. Note that these masks are not cleared | ||
3478 | * when CPUs go offline, so they reflect the union of all CPUs that have | ||
3479 | * ever been online. This means that this function normally takes its | ||
3480 | * no-work-to-do fastpath. | ||
3481 | */ | ||
3482 | static void sync_exp_reset_tree_hotplug(struct rcu_state *rsp) | ||
3483 | { | ||
3484 | bool done; | ||
3485 | unsigned long flags; | ||
3486 | unsigned long mask; | ||
3487 | unsigned long oldmask; | ||
3488 | int ncpus = READ_ONCE(rsp->ncpus); | ||
3489 | struct rcu_node *rnp; | ||
3490 | struct rcu_node *rnp_up; | ||
3491 | |||
3492 | /* If no new CPUs onlined since last time, nothing to do. */ | ||
3493 | if (likely(ncpus == rsp->ncpus_snap)) | ||
3494 | return; | ||
3495 | rsp->ncpus_snap = ncpus; | ||
3496 | |||
3497 | /* | ||
3498 | * Each pass through the following loop propagates newly onlined | ||
3499 | * CPUs for the current rcu_node structure up the rcu_node tree. | ||
3500 | */ | ||
3501 | rcu_for_each_leaf_node(rsp, rnp) { | ||
3502 | raw_spin_lock_irqsave_rcu_node(rnp, flags); | ||
3503 | if (rnp->expmaskinit == rnp->expmaskinitnext) { | ||
3504 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); | ||
3505 | continue; /* No new CPUs, nothing to do. */ | ||
3506 | } | ||
3507 | |||
3508 | /* Update this node's mask, track old value for propagation. */ | ||
3509 | oldmask = rnp->expmaskinit; | ||
3510 | rnp->expmaskinit = rnp->expmaskinitnext; | ||
3511 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); | ||
3512 | |||
3513 | /* If was already nonzero, nothing to propagate. */ | ||
3514 | if (oldmask) | ||
3515 | continue; | ||
3516 | |||
3517 | /* Propagate the new CPU up the tree. */ | ||
3518 | mask = rnp->grpmask; | ||
3519 | rnp_up = rnp->parent; | ||
3520 | done = false; | ||
3521 | while (rnp_up) { | ||
3522 | raw_spin_lock_irqsave_rcu_node(rnp_up, flags); | ||
3523 | if (rnp_up->expmaskinit) | ||
3524 | done = true; | ||
3525 | rnp_up->expmaskinit |= mask; | ||
3526 | raw_spin_unlock_irqrestore_rcu_node(rnp_up, flags); | ||
3527 | if (done) | ||
3528 | break; | ||
3529 | mask = rnp_up->grpmask; | ||
3530 | rnp_up = rnp_up->parent; | ||
3531 | } | ||
3532 | } | ||
3533 | } | ||
3534 | |||
3535 | /* | ||
3536 | * Reset the ->expmask values in the rcu_node tree in preparation for | ||
3537 | * a new expedited grace period. | ||
3538 | */ | ||
3539 | static void __maybe_unused sync_exp_reset_tree(struct rcu_state *rsp) | ||
3540 | { | ||
3541 | unsigned long flags; | ||
3542 | struct rcu_node *rnp; | ||
3543 | |||
3544 | sync_exp_reset_tree_hotplug(rsp); | ||
3545 | rcu_for_each_node_breadth_first(rsp, rnp) { | ||
3546 | raw_spin_lock_irqsave_rcu_node(rnp, flags); | ||
3547 | WARN_ON_ONCE(rnp->expmask); | ||
3548 | rnp->expmask = rnp->expmaskinit; | ||
3549 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); | ||
3550 | } | ||
3551 | } | ||
3552 | |||
3553 | /* | ||
3554 | * Return non-zero if there is no RCU expedited grace period in progress | ||
3555 | * for the specified rcu_node structure, in other words, if all CPUs and | ||
3556 | * tasks covered by the specified rcu_node structure have done their bit | ||
3557 | * for the current expedited grace period. Works only for preemptible | ||
3558 | * RCU -- other RCU implementation use other means. | ||
3559 | * | ||
3560 | * Caller must hold the rcu_state's exp_mutex. | ||
3561 | */ | ||
3562 | static int sync_rcu_preempt_exp_done(struct rcu_node *rnp) | ||
3563 | { | ||
3564 | return rnp->exp_tasks == NULL && | ||
3565 | READ_ONCE(rnp->expmask) == 0; | ||
3566 | } | ||
3567 | |||
3568 | /* | ||
3569 | * Report the exit from RCU read-side critical section for the last task | ||
3570 | * that queued itself during or before the current expedited preemptible-RCU | ||
3571 | * grace period. This event is reported either to the rcu_node structure on | ||
3572 | * which the task was queued or to one of that rcu_node structure's ancestors, | ||
3573 | * recursively up the tree. (Calm down, calm down, we do the recursion | ||
3574 | * iteratively!) | ||
3575 | * | ||
3576 | * Caller must hold the rcu_state's exp_mutex and the specified rcu_node | ||
3577 | * structure's ->lock. | ||
3578 | */ | ||
3579 | static void __rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp, | ||
3580 | bool wake, unsigned long flags) | ||
3581 | __releases(rnp->lock) | ||
3582 | { | ||
3583 | unsigned long mask; | ||
3584 | |||
3585 | for (;;) { | ||
3586 | if (!sync_rcu_preempt_exp_done(rnp)) { | ||
3587 | if (!rnp->expmask) | ||
3588 | rcu_initiate_boost(rnp, flags); | ||
3589 | else | ||
3590 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); | ||
3591 | break; | ||
3592 | } | ||
3593 | if (rnp->parent == NULL) { | ||
3594 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); | ||
3595 | if (wake) { | ||
3596 | smp_mb(); /* EGP done before wake_up(). */ | ||
3597 | swake_up(&rsp->expedited_wq); | ||
3598 | } | ||
3599 | break; | ||
3600 | } | ||
3601 | mask = rnp->grpmask; | ||
3602 | raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled */ | ||
3603 | rnp = rnp->parent; | ||
3604 | raw_spin_lock_rcu_node(rnp); /* irqs already disabled */ | ||
3605 | WARN_ON_ONCE(!(rnp->expmask & mask)); | ||
3606 | rnp->expmask &= ~mask; | ||
3607 | } | ||
3608 | } | ||
3609 | |||
3610 | /* | ||
3611 | * Report expedited quiescent state for specified node. This is a | ||
3612 | * lock-acquisition wrapper function for __rcu_report_exp_rnp(). | ||
3613 | * | ||
3614 | * Caller must hold the rcu_state's exp_mutex. | ||
3615 | */ | ||
3616 | static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp, | ||
3617 | struct rcu_node *rnp, bool wake) | ||
3618 | { | ||
3619 | unsigned long flags; | ||
3620 | |||
3621 | raw_spin_lock_irqsave_rcu_node(rnp, flags); | ||
3622 | __rcu_report_exp_rnp(rsp, rnp, wake, flags); | ||
3623 | } | ||
3624 | |||
3625 | /* | ||
3626 | * Report expedited quiescent state for multiple CPUs, all covered by the | ||
3627 | * specified leaf rcu_node structure. Caller must hold the rcu_state's | ||
3628 | * exp_mutex. | ||
3629 | */ | ||
3630 | static void rcu_report_exp_cpu_mult(struct rcu_state *rsp, struct rcu_node *rnp, | ||
3631 | unsigned long mask, bool wake) | ||
3632 | { | ||
3633 | unsigned long flags; | ||
3634 | |||
3635 | raw_spin_lock_irqsave_rcu_node(rnp, flags); | ||
3636 | if (!(rnp->expmask & mask)) { | ||
3637 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); | ||
3638 | return; | ||
3639 | } | ||
3640 | rnp->expmask &= ~mask; | ||
3641 | __rcu_report_exp_rnp(rsp, rnp, wake, flags); /* Releases rnp->lock. */ | ||
3642 | } | ||
3643 | |||
3644 | /* | ||
3645 | * Report expedited quiescent state for specified rcu_data (CPU). | ||
3646 | */ | ||
3647 | static void rcu_report_exp_rdp(struct rcu_state *rsp, struct rcu_data *rdp, | ||
3648 | bool wake) | ||
3649 | { | ||
3650 | rcu_report_exp_cpu_mult(rsp, rdp->mynode, rdp->grpmask, wake); | ||
3651 | } | ||
3652 | |||
3653 | /* Common code for synchronize_{rcu,sched}_expedited() work-done checking. */ | ||
3654 | static bool sync_exp_work_done(struct rcu_state *rsp, atomic_long_t *stat, | ||
3655 | unsigned long s) | ||
3656 | { | ||
3657 | if (rcu_exp_gp_seq_done(rsp, s)) { | ||
3658 | trace_rcu_exp_grace_period(rsp->name, s, TPS("done")); | ||
3659 | /* Ensure test happens before caller kfree(). */ | ||
3660 | smp_mb__before_atomic(); /* ^^^ */ | ||
3661 | atomic_long_inc(stat); | ||
3662 | return true; | ||
3663 | } | ||
3664 | return false; | ||
3665 | } | ||
3666 | |||
3667 | /* | ||
3668 | * Funnel-lock acquisition for expedited grace periods. Returns true | ||
3669 | * if some other task completed an expedited grace period that this task | ||
3670 | * can piggy-back on, and with no mutex held. Otherwise, returns false | ||
3671 | * with the mutex held, indicating that the caller must actually do the | ||
3672 | * expedited grace period. | ||
3673 | */ | ||
3674 | static bool exp_funnel_lock(struct rcu_state *rsp, unsigned long s) | ||
3675 | { | ||
3676 | struct rcu_data *rdp = per_cpu_ptr(rsp->rda, raw_smp_processor_id()); | ||
3677 | struct rcu_node *rnp = rdp->mynode; | ||
3678 | struct rcu_node *rnp_root = rcu_get_root(rsp); | ||
3679 | |||
3680 | /* Low-contention fastpath. */ | ||
3681 | if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s) && | ||
3682 | (rnp == rnp_root || | ||
3683 | ULONG_CMP_LT(READ_ONCE(rnp_root->exp_seq_rq), s)) && | ||
3684 | !mutex_is_locked(&rsp->exp_mutex) && | ||
3685 | mutex_trylock(&rsp->exp_mutex)) | ||
3686 | goto fastpath; | ||
3687 | |||
3688 | /* | ||
3689 | * Each pass through the following loop works its way up | ||
3690 | * the rcu_node tree, returning if others have done the work or | ||
3691 | * otherwise falls through to acquire rsp->exp_mutex. The mapping | ||
3692 | * from CPU to rcu_node structure can be inexact, as it is just | ||
3693 | * promoting locality and is not strictly needed for correctness. | ||
3694 | */ | ||
3695 | for (; rnp != NULL; rnp = rnp->parent) { | ||
3696 | if (sync_exp_work_done(rsp, &rdp->exp_workdone1, s)) | ||
3697 | return true; | ||
3698 | |||
3699 | /* Work not done, either wait here or go up. */ | ||
3700 | spin_lock(&rnp->exp_lock); | ||
3701 | if (ULONG_CMP_GE(rnp->exp_seq_rq, s)) { | ||
3702 | |||
3703 | /* Someone else doing GP, so wait for them. */ | ||
3704 | spin_unlock(&rnp->exp_lock); | ||
3705 | trace_rcu_exp_funnel_lock(rsp->name, rnp->level, | ||
3706 | rnp->grplo, rnp->grphi, | ||
3707 | TPS("wait")); | ||
3708 | wait_event(rnp->exp_wq[(s >> 1) & 0x3], | ||
3709 | sync_exp_work_done(rsp, | ||
3710 | &rdp->exp_workdone2, s)); | ||
3711 | return true; | ||
3712 | } | ||
3713 | rnp->exp_seq_rq = s; /* Followers can wait on us. */ | ||
3714 | spin_unlock(&rnp->exp_lock); | ||
3715 | trace_rcu_exp_funnel_lock(rsp->name, rnp->level, rnp->grplo, | ||
3716 | rnp->grphi, TPS("nxtlvl")); | ||
3717 | } | ||
3718 | mutex_lock(&rsp->exp_mutex); | ||
3719 | fastpath: | ||
3720 | if (sync_exp_work_done(rsp, &rdp->exp_workdone3, s)) { | ||
3721 | mutex_unlock(&rsp->exp_mutex); | ||
3722 | return true; | ||
3723 | } | ||
3724 | rcu_exp_gp_seq_start(rsp); | ||
3725 | trace_rcu_exp_grace_period(rsp->name, s, TPS("start")); | ||
3726 | return false; | ||
3727 | } | ||
3728 | |||
3729 | /* Invoked on each online non-idle CPU for expedited quiescent state. */ | ||
3730 | static void sync_sched_exp_handler(void *data) | ||
3731 | { | ||
3732 | struct rcu_data *rdp; | ||
3733 | struct rcu_node *rnp; | ||
3734 | struct rcu_state *rsp = data; | ||
3735 | |||
3736 | rdp = this_cpu_ptr(rsp->rda); | ||
3737 | rnp = rdp->mynode; | ||
3738 | if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) || | ||
3739 | __this_cpu_read(rcu_sched_data.cpu_no_qs.b.exp)) | ||
3740 | return; | ||
3741 | if (rcu_is_cpu_rrupt_from_idle()) { | ||
3742 | rcu_report_exp_rdp(&rcu_sched_state, | ||
3743 | this_cpu_ptr(&rcu_sched_data), true); | ||
3744 | return; | ||
3745 | } | ||
3746 | __this_cpu_write(rcu_sched_data.cpu_no_qs.b.exp, true); | ||
3747 | resched_cpu(smp_processor_id()); | ||
3748 | } | ||
3749 | |||
3750 | /* Send IPI for expedited cleanup if needed at end of CPU-hotplug operation. */ | ||
3751 | static void sync_sched_exp_online_cleanup(int cpu) | ||
3752 | { | ||
3753 | struct rcu_data *rdp; | ||
3754 | int ret; | ||
3755 | struct rcu_node *rnp; | ||
3756 | struct rcu_state *rsp = &rcu_sched_state; | ||
3757 | |||
3758 | rdp = per_cpu_ptr(rsp->rda, cpu); | ||
3759 | rnp = rdp->mynode; | ||
3760 | if (!(READ_ONCE(rnp->expmask) & rdp->grpmask)) | ||
3761 | return; | ||
3762 | ret = smp_call_function_single(cpu, sync_sched_exp_handler, rsp, 0); | ||
3763 | WARN_ON_ONCE(ret); | ||
3764 | } | ||
3765 | |||
3766 | /* | ||
3767 | * Select the nodes that the upcoming expedited grace period needs | ||
3768 | * to wait for. | ||
3769 | */ | ||
3770 | static void sync_rcu_exp_select_cpus(struct rcu_state *rsp, | ||
3771 | smp_call_func_t func) | ||
3772 | { | ||
3773 | int cpu; | ||
3774 | unsigned long flags; | ||
3775 | unsigned long mask; | ||
3776 | unsigned long mask_ofl_test; | ||
3777 | unsigned long mask_ofl_ipi; | ||
3778 | int ret; | ||
3779 | struct rcu_node *rnp; | ||
3780 | |||
3781 | sync_exp_reset_tree(rsp); | ||
3782 | rcu_for_each_leaf_node(rsp, rnp) { | ||
3783 | raw_spin_lock_irqsave_rcu_node(rnp, flags); | ||
3784 | |||
3785 | /* Each pass checks a CPU for identity, offline, and idle. */ | ||
3786 | mask_ofl_test = 0; | ||
3787 | for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++) { | ||
3788 | struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu); | ||
3789 | struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu); | ||
3790 | |||
3791 | if (raw_smp_processor_id() == cpu || | ||
3792 | !(atomic_add_return(0, &rdtp->dynticks) & 0x1)) | ||
3793 | mask_ofl_test |= rdp->grpmask; | ||
3794 | } | ||
3795 | mask_ofl_ipi = rnp->expmask & ~mask_ofl_test; | ||
3796 | |||
3797 | /* | ||
3798 | * Need to wait for any blocked tasks as well. Note that | ||
3799 | * additional blocking tasks will also block the expedited | ||
3800 | * GP until such time as the ->expmask bits are cleared. | ||
3801 | */ | ||
3802 | if (rcu_preempt_has_tasks(rnp)) | ||
3803 | rnp->exp_tasks = rnp->blkd_tasks.next; | ||
3804 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); | ||
3805 | |||
3806 | /* IPI the remaining CPUs for expedited quiescent state. */ | ||
3807 | mask = 1; | ||
3808 | for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) { | ||
3809 | if (!(mask_ofl_ipi & mask)) | ||
3810 | continue; | ||
3811 | retry_ipi: | ||
3812 | ret = smp_call_function_single(cpu, func, rsp, 0); | ||
3813 | if (!ret) { | ||
3814 | mask_ofl_ipi &= ~mask; | ||
3815 | continue; | ||
3816 | } | ||
3817 | /* Failed, raced with offline. */ | ||
3818 | raw_spin_lock_irqsave_rcu_node(rnp, flags); | ||
3819 | if (cpu_online(cpu) && | ||
3820 | (rnp->expmask & mask)) { | ||
3821 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); | ||
3822 | schedule_timeout_uninterruptible(1); | ||
3823 | if (cpu_online(cpu) && | ||
3824 | (rnp->expmask & mask)) | ||
3825 | goto retry_ipi; | ||
3826 | raw_spin_lock_irqsave_rcu_node(rnp, flags); | ||
3827 | } | ||
3828 | if (!(rnp->expmask & mask)) | ||
3829 | mask_ofl_ipi &= ~mask; | ||
3830 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); | ||
3831 | } | ||
3832 | /* Report quiescent states for those that went offline. */ | ||
3833 | mask_ofl_test |= mask_ofl_ipi; | ||
3834 | if (mask_ofl_test) | ||
3835 | rcu_report_exp_cpu_mult(rsp, rnp, mask_ofl_test, false); | ||
3836 | } | ||
3837 | } | ||
3838 | |||
3839 | static void synchronize_sched_expedited_wait(struct rcu_state *rsp) | ||
3840 | { | ||
3841 | int cpu; | ||
3842 | unsigned long jiffies_stall; | ||
3843 | unsigned long jiffies_start; | ||
3844 | unsigned long mask; | ||
3845 | int ndetected; | ||
3846 | struct rcu_node *rnp; | ||
3847 | struct rcu_node *rnp_root = rcu_get_root(rsp); | ||
3848 | int ret; | ||
3849 | |||
3850 | jiffies_stall = rcu_jiffies_till_stall_check(); | ||
3851 | jiffies_start = jiffies; | ||
3852 | |||
3853 | for (;;) { | ||
3854 | ret = swait_event_timeout( | ||
3855 | rsp->expedited_wq, | ||
3856 | sync_rcu_preempt_exp_done(rnp_root), | ||
3857 | jiffies_stall); | ||
3858 | if (ret > 0 || sync_rcu_preempt_exp_done(rnp_root)) | ||
3859 | return; | ||
3860 | if (ret < 0) { | ||
3861 | /* Hit a signal, disable CPU stall warnings. */ | ||
3862 | swait_event(rsp->expedited_wq, | ||
3863 | sync_rcu_preempt_exp_done(rnp_root)); | ||
3864 | return; | ||
3865 | } | ||
3866 | pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {", | ||
3867 | rsp->name); | ||
3868 | ndetected = 0; | ||
3869 | rcu_for_each_leaf_node(rsp, rnp) { | ||
3870 | ndetected += rcu_print_task_exp_stall(rnp); | ||
3871 | mask = 1; | ||
3872 | for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) { | ||
3873 | struct rcu_data *rdp; | ||
3874 | |||
3875 | if (!(rnp->expmask & mask)) | ||
3876 | continue; | ||
3877 | ndetected++; | ||
3878 | rdp = per_cpu_ptr(rsp->rda, cpu); | ||
3879 | pr_cont(" %d-%c%c%c", cpu, | ||
3880 | "O."[!!cpu_online(cpu)], | ||
3881 | "o."[!!(rdp->grpmask & rnp->expmaskinit)], | ||
3882 | "N."[!!(rdp->grpmask & rnp->expmaskinitnext)]); | ||
3883 | } | ||
3884 | mask <<= 1; | ||
3885 | } | ||
3886 | pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n", | ||
3887 | jiffies - jiffies_start, rsp->expedited_sequence, | ||
3888 | rnp_root->expmask, ".T"[!!rnp_root->exp_tasks]); | ||
3889 | if (ndetected) { | ||
3890 | pr_err("blocking rcu_node structures:"); | ||
3891 | rcu_for_each_node_breadth_first(rsp, rnp) { | ||
3892 | if (rnp == rnp_root) | ||
3893 | continue; /* printed unconditionally */ | ||
3894 | if (sync_rcu_preempt_exp_done(rnp)) | ||
3895 | continue; | ||
3896 | pr_cont(" l=%u:%d-%d:%#lx/%c", | ||
3897 | rnp->level, rnp->grplo, rnp->grphi, | ||
3898 | rnp->expmask, | ||
3899 | ".T"[!!rnp->exp_tasks]); | ||
3900 | } | ||
3901 | pr_cont("\n"); | ||
3902 | } | ||
3903 | rcu_for_each_leaf_node(rsp, rnp) { | ||
3904 | mask = 1; | ||
3905 | for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) { | ||
3906 | if (!(rnp->expmask & mask)) | ||
3907 | continue; | ||
3908 | dump_cpu_task(cpu); | ||
3909 | } | ||
3910 | } | ||
3911 | jiffies_stall = 3 * rcu_jiffies_till_stall_check() + 3; | ||
3912 | } | ||
3913 | } | ||
3914 | |||
3915 | /* | ||
3916 | * Wait for the current expedited grace period to complete, and then | ||
3917 | * wake up everyone who piggybacked on the just-completed expedited | ||
3918 | * grace period. Also update all the ->exp_seq_rq counters as needed | ||
3919 | * in order to avoid counter-wrap problems. | ||
3920 | */ | ||
3921 | static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s) | ||
3922 | { | ||
3923 | struct rcu_node *rnp; | ||
3924 | |||
3925 | synchronize_sched_expedited_wait(rsp); | ||
3926 | rcu_exp_gp_seq_end(rsp); | ||
3927 | trace_rcu_exp_grace_period(rsp->name, s, TPS("end")); | ||
3928 | |||
3929 | /* | ||
3930 | * Switch over to wakeup mode, allowing the next GP, but -only- the | ||
3931 | * next GP, to proceed. | ||
3932 | */ | ||
3933 | mutex_lock(&rsp->exp_wake_mutex); | ||
3934 | mutex_unlock(&rsp->exp_mutex); | ||
3935 | |||
3936 | rcu_for_each_node_breadth_first(rsp, rnp) { | ||
3937 | if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s)) { | ||
3938 | spin_lock(&rnp->exp_lock); | ||
3939 | /* Recheck, avoid hang in case someone just arrived. */ | ||
3940 | if (ULONG_CMP_LT(rnp->exp_seq_rq, s)) | ||
3941 | rnp->exp_seq_rq = s; | ||
3942 | spin_unlock(&rnp->exp_lock); | ||
3943 | } | ||
3944 | wake_up_all(&rnp->exp_wq[(rsp->expedited_sequence >> 1) & 0x3]); | ||
3945 | } | ||
3946 | trace_rcu_exp_grace_period(rsp->name, s, TPS("endwake")); | ||
3947 | mutex_unlock(&rsp->exp_wake_mutex); | ||
3948 | } | ||
3949 | |||
3950 | /** | ||
3951 | * synchronize_sched_expedited - Brute-force RCU-sched grace period | ||
3952 | * | ||
3953 | * Wait for an RCU-sched grace period to elapse, but use a "big hammer" | ||
3954 | * approach to force the grace period to end quickly. This consumes | ||
3955 | * significant time on all CPUs and is unfriendly to real-time workloads, | ||
3956 | * so is thus not recommended for any sort of common-case code. In fact, | ||
3957 | * if you are using synchronize_sched_expedited() in a loop, please | ||
3958 | * restructure your code to batch your updates, and then use a single | ||
3959 | * synchronize_sched() instead. | ||
3960 | * | ||
3961 | * This implementation can be thought of as an application of sequence | ||
3962 | * locking to expedited grace periods, but using the sequence counter to | ||
3963 | * determine when someone else has already done the work instead of for | ||
3964 | * retrying readers. | ||
3965 | */ | ||
3966 | void synchronize_sched_expedited(void) | ||
3967 | { | ||
3968 | unsigned long s; | ||
3969 | struct rcu_state *rsp = &rcu_sched_state; | ||
3970 | |||
3971 | /* If only one CPU, this is automatically a grace period. */ | ||
3972 | if (rcu_blocking_is_gp()) | ||
3973 | return; | ||
3974 | |||
3975 | /* If expedited grace periods are prohibited, fall back to normal. */ | ||
3976 | if (rcu_gp_is_normal()) { | ||
3977 | wait_rcu_gp(call_rcu_sched); | ||
3978 | return; | ||
3979 | } | ||
3980 | |||
3981 | /* Take a snapshot of the sequence number. */ | ||
3982 | s = rcu_exp_gp_seq_snap(rsp); | ||
3983 | if (exp_funnel_lock(rsp, s)) | ||
3984 | return; /* Someone else did our work for us. */ | ||
3985 | |||
3986 | /* Initialize the rcu_node tree in preparation for the wait. */ | ||
3987 | sync_rcu_exp_select_cpus(rsp, sync_sched_exp_handler); | ||
3988 | |||
3989 | /* Wait and clean up, including waking everyone. */ | ||
3990 | rcu_exp_wait_wake(rsp, s); | ||
3991 | } | ||
3992 | EXPORT_SYMBOL_GPL(synchronize_sched_expedited); | ||
3993 | |||
3994 | /* | 3460 | /* |
3995 | * Check to see if there is any immediate RCU-related work to be done | 3461 | * Check to see if there is any immediate RCU-related work to be done |
3996 | * by the current CPU, for the specified type of RCU, returning 1 if so. | 3462 | * by the current CPU, for the specified type of RCU, returning 1 if so. |
@@ -4281,7 +3747,7 @@ rcu_boot_init_percpu_data(int cpu, struct rcu_state *rsp) | |||
4281 | 3747 | ||
4282 | /* Set up local state, ensuring consistent view of global state. */ | 3748 | /* Set up local state, ensuring consistent view of global state. */ |
4283 | raw_spin_lock_irqsave_rcu_node(rnp, flags); | 3749 | raw_spin_lock_irqsave_rcu_node(rnp, flags); |
4284 | rdp->grpmask = 1UL << (cpu - rdp->mynode->grplo); | 3750 | rdp->grpmask = leaf_node_cpu_bit(rdp->mynode, cpu); |
4285 | rdp->dynticks = &per_cpu(rcu_dynticks, cpu); | 3751 | rdp->dynticks = &per_cpu(rcu_dynticks, cpu); |
4286 | WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != DYNTICK_TASK_EXIT_IDLE); | 3752 | WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != DYNTICK_TASK_EXIT_IDLE); |
4287 | WARN_ON_ONCE(atomic_read(&rdp->dynticks->dynticks) != 1); | 3753 | WARN_ON_ONCE(atomic_read(&rdp->dynticks->dynticks) != 1); |
@@ -4364,9 +3830,6 @@ static void rcu_cleanup_dying_idle_cpu(int cpu, struct rcu_state *rsp) | |||
4364 | struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu); | 3830 | struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu); |
4365 | struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */ | 3831 | struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */ |
4366 | 3832 | ||
4367 | if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) | ||
4368 | return; | ||
4369 | |||
4370 | /* Remove outgoing CPU from mask in the leaf rcu_node structure. */ | 3833 | /* Remove outgoing CPU from mask in the leaf rcu_node structure. */ |
4371 | mask = rdp->grpmask; | 3834 | mask = rdp->grpmask; |
4372 | raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */ | 3835 | raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */ |
@@ -4751,4 +4214,5 @@ void __init rcu_init(void) | |||
4751 | rcu_cpu_notify(NULL, CPU_UP_PREPARE, (void *)(long)cpu); | 4214 | rcu_cpu_notify(NULL, CPU_UP_PREPARE, (void *)(long)cpu); |
4752 | } | 4215 | } |
4753 | 4216 | ||
4217 | #include "tree_exp.h" | ||
4754 | #include "tree_plugin.h" | 4218 | #include "tree_plugin.h" |
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index e3959f5e6ddf..f714f873bf9d 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -254,6 +254,13 @@ struct rcu_node { | |||
254 | } ____cacheline_internodealigned_in_smp; | 254 | } ____cacheline_internodealigned_in_smp; |
255 | 255 | ||
256 | /* | 256 | /* |
257 | * Bitmasks in an rcu_node cover the interval [grplo, grphi] of CPU IDs, and | ||
258 | * are indexed relative to this interval rather than the global CPU ID space. | ||
259 | * This generates the bit for a CPU in node-local masks. | ||
260 | */ | ||
261 | #define leaf_node_cpu_bit(rnp, cpu) (1UL << ((cpu) - (rnp)->grplo)) | ||
262 | |||
263 | /* | ||
257 | * Do a full breadth-first scan of the rcu_node structures for the | 264 | * Do a full breadth-first scan of the rcu_node structures for the |
258 | * specified rcu_state structure. | 265 | * specified rcu_state structure. |
259 | */ | 266 | */ |
@@ -281,6 +288,14 @@ struct rcu_node { | |||
281 | (rnp) < &(rsp)->node[rcu_num_nodes]; (rnp)++) | 288 | (rnp) < &(rsp)->node[rcu_num_nodes]; (rnp)++) |
282 | 289 | ||
283 | /* | 290 | /* |
291 | * Iterate over all possible CPUs in a leaf RCU node. | ||
292 | */ | ||
293 | #define for_each_leaf_node_possible_cpu(rnp, cpu) \ | ||
294 | for ((cpu) = cpumask_next(rnp->grplo - 1, cpu_possible_mask); \ | ||
295 | cpu <= rnp->grphi; \ | ||
296 | cpu = cpumask_next((cpu), cpu_possible_mask)) | ||
297 | |||
298 | /* | ||
284 | * Union to allow "aggregate OR" operation on the need for a quiescent | 299 | * Union to allow "aggregate OR" operation on the need for a quiescent |
285 | * state by the normal and expedited grace periods. | 300 | * state by the normal and expedited grace periods. |
286 | */ | 301 | */ |
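The two helpers added above are intended to be used together, as the kernel/rcu/tree.c hunks earlier in this patch do; a condensed sketch of that pattern as it would appear inside tree.c (function name illustrative, and the rcu_node ->lock that the real callers hold around the scan is omitted):

    static void scan_leaf_for_stalls(struct rcu_node *rnp)
    {
            int cpu;

            /* Walk the possible CPUs covered by this leaf, translating each
             * CPU number into its node-local qsmask bit. */
            for_each_leaf_node_possible_cpu(rnp, cpu)
                    if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu))
                            dump_cpu_task(cpu);
    }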
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
new file mode 100644
index 000000000000..d400434af6b2
--- /dev/null
+++ b/kernel/rcu/tree_exp.h
@@ -0,0 +1,656 @@ | |||
1 | /* | ||
2 | * RCU expedited grace periods | ||
3 | * | ||
4 | * This program is free software; you can redistribute it and/or modify | ||
5 | * it under the terms of the GNU General Public License as published by | ||
6 | * the Free Software Foundation; either version 2 of the License, or | ||
7 | * (at your option) any later version. | ||
8 | * | ||
9 | * This program is distributed in the hope that it will be useful, | ||
10 | * but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
11 | * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
12 | * GNU General Public License for more details. | ||
13 | * | ||
14 | * You should have received a copy of the GNU General Public License | ||
15 | * along with this program; if not, you can access it online at | ||
16 | * http://www.gnu.org/licenses/gpl-2.0.html. | ||
17 | * | ||
18 | * Copyright IBM Corporation, 2016 | ||
19 | * | ||
20 | * Authors: Paul E. McKenney <paulmck@linux.vnet.ibm.com> | ||
21 | */ | ||
22 | |||
23 | /* Wrapper functions for expedited grace periods. */ | ||
24 | static void rcu_exp_gp_seq_start(struct rcu_state *rsp) | ||
25 | { | ||
26 | rcu_seq_start(&rsp->expedited_sequence); | ||
27 | } | ||
28 | static void rcu_exp_gp_seq_end(struct rcu_state *rsp) | ||
29 | { | ||
30 | rcu_seq_end(&rsp->expedited_sequence); | ||
31 | smp_mb(); /* Ensure that consecutive grace periods serialize. */ | ||
32 | } | ||
33 | static unsigned long rcu_exp_gp_seq_snap(struct rcu_state *rsp) | ||
34 | { | ||
35 | unsigned long s; | ||
36 | |||
37 | smp_mb(); /* Caller's modifications seen first by other CPUs. */ | ||
38 | s = rcu_seq_snap(&rsp->expedited_sequence); | ||
39 | trace_rcu_exp_grace_period(rsp->name, s, TPS("snap")); | ||
40 | return s; | ||
41 | } | ||
42 | static bool rcu_exp_gp_seq_done(struct rcu_state *rsp, unsigned long s) | ||
43 | { | ||
44 | return rcu_seq_done(&rsp->expedited_sequence, s); | ||
45 | } | ||
46 | |||
47 | /* | ||
48 | * Reset the ->expmaskinit values in the rcu_node tree to reflect any | ||
49 | * recent CPU-online activity. Note that these masks are not cleared | ||
50 | * when CPUs go offline, so they reflect the union of all CPUs that have | ||
51 | * ever been online. This means that this function normally takes its | ||
52 | * no-work-to-do fastpath. | ||
53 | */ | ||
54 | static void sync_exp_reset_tree_hotplug(struct rcu_state *rsp) | ||
55 | { | ||
56 | bool done; | ||
57 | unsigned long flags; | ||
58 | unsigned long mask; | ||
59 | unsigned long oldmask; | ||
60 | int ncpus = READ_ONCE(rsp->ncpus); | ||
61 | struct rcu_node *rnp; | ||
62 | struct rcu_node *rnp_up; | ||
63 | |||
64 | /* If no new CPUs onlined since last time, nothing to do. */ | ||
65 | if (likely(ncpus == rsp->ncpus_snap)) | ||
66 | return; | ||
67 | rsp->ncpus_snap = ncpus; | ||
68 | |||
69 | /* | ||
70 | * Each pass through the following loop propagates newly onlined | ||
71 | * CPUs for the current rcu_node structure up the rcu_node tree. | ||
72 | */ | ||
73 | rcu_for_each_leaf_node(rsp, rnp) { | ||
74 | raw_spin_lock_irqsave_rcu_node(rnp, flags); | ||
75 | if (rnp->expmaskinit == rnp->expmaskinitnext) { | ||
76 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); | ||
77 | continue; /* No new CPUs, nothing to do. */ | ||
78 | } | ||
79 | |||
80 | /* Update this node's mask, track old value for propagation. */ | ||
81 | oldmask = rnp->expmaskinit; | ||
82 | rnp->expmaskinit = rnp->expmaskinitnext; | ||
83 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); | ||
84 | |||
85 | /* If was already nonzero, nothing to propagate. */ | ||
86 | if (oldmask) | ||
87 | continue; | ||
88 | |||
89 | /* Propagate the new CPU up the tree. */ | ||
90 | mask = rnp->grpmask; | ||
91 | rnp_up = rnp->parent; | ||
92 | done = false; | ||
93 | while (rnp_up) { | ||
94 | raw_spin_lock_irqsave_rcu_node(rnp_up, flags); | ||
95 | if (rnp_up->expmaskinit) | ||
96 | done = true; | ||
97 | rnp_up->expmaskinit |= mask; | ||
98 | raw_spin_unlock_irqrestore_rcu_node(rnp_up, flags); | ||
99 | if (done) | ||
100 | break; | ||
101 | mask = rnp_up->grpmask; | ||
102 | rnp_up = rnp_up->parent; | ||
103 | } | ||
104 | } | ||
105 | } | ||
106 | |||
107 | /* | ||
108 | * Reset the ->expmask values in the rcu_node tree in preparation for | ||
109 | * a new expedited grace period. | ||
110 | */ | ||
111 | static void __maybe_unused sync_exp_reset_tree(struct rcu_state *rsp) | ||
112 | { | ||
113 | unsigned long flags; | ||
114 | struct rcu_node *rnp; | ||
115 | |||
116 | sync_exp_reset_tree_hotplug(rsp); | ||
117 | rcu_for_each_node_breadth_first(rsp, rnp) { | ||
118 | raw_spin_lock_irqsave_rcu_node(rnp, flags); | ||
119 | WARN_ON_ONCE(rnp->expmask); | ||
120 | rnp->expmask = rnp->expmaskinit; | ||
121 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); | ||
122 | } | ||
123 | } | ||
124 | |||
125 | /* | ||
126 | * Return non-zero if there is no RCU expedited grace period in progress | ||
127 | * for the specified rcu_node structure, in other words, if all CPUs and | ||
128 | * tasks covered by the specified rcu_node structure have done their bit | ||
129 | * for the current expedited grace period. Works only for preemptible | ||
130 | * RCU -- other RCU implementations use other means. | ||
131 | * | ||
132 | * Caller must hold the rcu_state's exp_mutex. | ||
133 | */ | ||
134 | static int sync_rcu_preempt_exp_done(struct rcu_node *rnp) | ||
135 | { | ||
136 | return rnp->exp_tasks == NULL && | ||
137 | READ_ONCE(rnp->expmask) == 0; | ||
138 | } | ||
139 | |||
140 | /* | ||
141 | * Report the exit from RCU read-side critical section for the last task | ||
142 | * that queued itself during or before the current expedited preemptible-RCU | ||
143 | * grace period. This event is reported either to the rcu_node structure on | ||
144 | * which the task was queued or to one of that rcu_node structure's ancestors, | ||
145 | * recursively up the tree. (Calm down, calm down, we do the recursion | ||
146 | * iteratively!) | ||
147 | * | ||
148 | * Caller must hold the rcu_state's exp_mutex and the specified rcu_node | ||
149 | * structure's ->lock. | ||
150 | */ | ||
151 | static void __rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp, | ||
152 | bool wake, unsigned long flags) | ||
153 | __releases(rnp->lock) | ||
154 | { | ||
155 | unsigned long mask; | ||
156 | |||
157 | for (;;) { | ||
158 | if (!sync_rcu_preempt_exp_done(rnp)) { | ||
159 | if (!rnp->expmask) | ||
160 | rcu_initiate_boost(rnp, flags); | ||
161 | else | ||
162 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); | ||
163 | break; | ||
164 | } | ||
165 | if (rnp->parent == NULL) { | ||
166 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); | ||
167 | if (wake) { | ||
168 | smp_mb(); /* EGP done before wake_up(). */ | ||
169 | swake_up(&rsp->expedited_wq); | ||
170 | } | ||
171 | break; | ||
172 | } | ||
173 | mask = rnp->grpmask; | ||
174 | raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled */ | ||
175 | rnp = rnp->parent; | ||
176 | raw_spin_lock_rcu_node(rnp); /* irqs already disabled */ | ||
177 | WARN_ON_ONCE(!(rnp->expmask & mask)); | ||
178 | rnp->expmask &= ~mask; | ||
179 | } | ||
180 | } | ||
181 | |||
182 | /* | ||
183 | * Report expedited quiescent state for specified node. This is a | ||
184 | * lock-acquisition wrapper function for __rcu_report_exp_rnp(). | ||
185 | * | ||
186 | * Caller must hold the rcu_state's exp_mutex. | ||
187 | */ | ||
188 | static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp, | ||
189 | struct rcu_node *rnp, bool wake) | ||
190 | { | ||
191 | unsigned long flags; | ||
192 | |||
193 | raw_spin_lock_irqsave_rcu_node(rnp, flags); | ||
194 | __rcu_report_exp_rnp(rsp, rnp, wake, flags); | ||
195 | } | ||
196 | |||
197 | /* | ||
198 | * Report expedited quiescent state for multiple CPUs, all covered by the | ||
199 | * specified leaf rcu_node structure. Caller must hold the rcu_state's | ||
200 | * exp_mutex. | ||
201 | */ | ||
202 | static void rcu_report_exp_cpu_mult(struct rcu_state *rsp, struct rcu_node *rnp, | ||
203 | unsigned long mask, bool wake) | ||
204 | { | ||
205 | unsigned long flags; | ||
206 | |||
207 | raw_spin_lock_irqsave_rcu_node(rnp, flags); | ||
208 | if (!(rnp->expmask & mask)) { | ||
209 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); | ||
210 | return; | ||
211 | } | ||
212 | rnp->expmask &= ~mask; | ||
213 | __rcu_report_exp_rnp(rsp, rnp, wake, flags); /* Releases rnp->lock. */ | ||
214 | } | ||
215 | |||
216 | /* | ||
217 | * Report expedited quiescent state for specified rcu_data (CPU). | ||
218 | */ | ||
219 | static void rcu_report_exp_rdp(struct rcu_state *rsp, struct rcu_data *rdp, | ||
220 | bool wake) | ||
221 | { | ||
222 | rcu_report_exp_cpu_mult(rsp, rdp->mynode, rdp->grpmask, wake); | ||
223 | } | ||
224 | |||
225 | /* Common code for synchronize_{rcu,sched}_expedited() work-done checking. */ | ||
226 | static bool sync_exp_work_done(struct rcu_state *rsp, atomic_long_t *stat, | ||
227 | unsigned long s) | ||
228 | { | ||
229 | if (rcu_exp_gp_seq_done(rsp, s)) { | ||
230 | trace_rcu_exp_grace_period(rsp->name, s, TPS("done")); | ||
231 | /* Ensure test happens before caller kfree(). */ | ||
232 | smp_mb__before_atomic(); /* ^^^ */ | ||
233 | atomic_long_inc(stat); | ||
234 | return true; | ||
235 | } | ||
236 | return false; | ||
237 | } | ||
238 | |||
239 | /* | ||
240 | * Funnel-lock acquisition for expedited grace periods. Returns true | ||
241 | * if some other task completed an expedited grace period that this task | ||
242 | * can piggy-back on, and with no mutex held. Otherwise, returns false | ||
243 | * with the mutex held, indicating that the caller must actually do the | ||
244 | * expedited grace period. | ||
245 | */ | ||
246 | static bool exp_funnel_lock(struct rcu_state *rsp, unsigned long s) | ||
247 | { | ||
248 | struct rcu_data *rdp = per_cpu_ptr(rsp->rda, raw_smp_processor_id()); | ||
249 | struct rcu_node *rnp = rdp->mynode; | ||
250 | struct rcu_node *rnp_root = rcu_get_root(rsp); | ||
251 | |||
252 | /* Low-contention fastpath. */ | ||
253 | if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s) && | ||
254 | (rnp == rnp_root || | ||
255 | ULONG_CMP_LT(READ_ONCE(rnp_root->exp_seq_rq), s)) && | ||
256 | !mutex_is_locked(&rsp->exp_mutex) && | ||
257 | mutex_trylock(&rsp->exp_mutex)) | ||
258 | goto fastpath; | ||
259 | |||
260 | /* | ||
261 | * Each pass through the following loop works its way up | ||
262 | * the rcu_node tree, returning if others have done the work, or | ||
263 | * otherwise falling through to acquire rsp->exp_mutex. The mapping | ||
264 | * from CPU to rcu_node structure can be inexact, as it is just | ||
265 | * promoting locality and is not strictly needed for correctness. | ||
266 | */ | ||
267 | for (; rnp != NULL; rnp = rnp->parent) { | ||
268 | if (sync_exp_work_done(rsp, &rdp->exp_workdone1, s)) | ||
269 | return true; | ||
270 | |||
271 | /* Work not done, either wait here or go up. */ | ||
272 | spin_lock(&rnp->exp_lock); | ||
273 | if (ULONG_CMP_GE(rnp->exp_seq_rq, s)) { | ||
274 | |||
275 | /* Someone else doing GP, so wait for them. */ | ||
276 | spin_unlock(&rnp->exp_lock); | ||
277 | trace_rcu_exp_funnel_lock(rsp->name, rnp->level, | ||
278 | rnp->grplo, rnp->grphi, | ||
279 | TPS("wait")); | ||
280 | wait_event(rnp->exp_wq[(s >> 1) & 0x3], | ||
281 | sync_exp_work_done(rsp, | ||
282 | &rdp->exp_workdone2, s)); | ||
283 | return true; | ||
284 | } | ||
285 | rnp->exp_seq_rq = s; /* Followers can wait on us. */ | ||
286 | spin_unlock(&rnp->exp_lock); | ||
287 | trace_rcu_exp_funnel_lock(rsp->name, rnp->level, rnp->grplo, | ||
288 | rnp->grphi, TPS("nxtlvl")); | ||
289 | } | ||
290 | mutex_lock(&rsp->exp_mutex); | ||
291 | fastpath: | ||
292 | if (sync_exp_work_done(rsp, &rdp->exp_workdone3, s)) { | ||
293 | mutex_unlock(&rsp->exp_mutex); | ||
294 | return true; | ||
295 | } | ||
296 | rcu_exp_gp_seq_start(rsp); | ||
297 | trace_rcu_exp_grace_period(rsp->name, s, TPS("start")); | ||
298 | return false; | ||
299 | } | ||
300 | |||
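One detail worth spelling out is the wait-queue selection in the wait_event() call above. The sequence counter advances by two per expedited grace period, so (s >> 1) counts grace periods and its low two bits rotate through the four per-node exp_wq[] queues, spreading waiters for different recent grace periods across separate queues. A hedged sketch of that arithmetic (the helper name is illustrative):

	/* Which of the four per-rcu_node wait queues a snapshot maps to. */
	static unsigned int exp_wq_index(unsigned long s)
	{
		return (s >> 1) & 0x3;	/* s = 2 -> 1, 4 -> 2, 6 -> 3, 8 -> 0, ... */
	}

rcu_exp_wait_wake() below wakes exp_wq[(expedited_sequence >> 1) & 0x3], which matches the index computed here from the corresponding snapshot.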
301 | /* Invoked on each online non-idle CPU for expedited quiescent state. */ | ||
302 | static void sync_sched_exp_handler(void *data) | ||
303 | { | ||
304 | struct rcu_data *rdp; | ||
305 | struct rcu_node *rnp; | ||
306 | struct rcu_state *rsp = data; | ||
307 | |||
308 | rdp = this_cpu_ptr(rsp->rda); | ||
309 | rnp = rdp->mynode; | ||
310 | if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) || | ||
311 | __this_cpu_read(rcu_sched_data.cpu_no_qs.b.exp)) | ||
312 | return; | ||
313 | if (rcu_is_cpu_rrupt_from_idle()) { | ||
314 | rcu_report_exp_rdp(&rcu_sched_state, | ||
315 | this_cpu_ptr(&rcu_sched_data), true); | ||
316 | return; | ||
317 | } | ||
318 | __this_cpu_write(rcu_sched_data.cpu_no_qs.b.exp, true); | ||
319 | resched_cpu(smp_processor_id()); | ||
320 | } | ||
321 | |||
322 | /* Send IPI for expedited cleanup if needed at end of CPU-hotplug operation. */ | ||
323 | static void sync_sched_exp_online_cleanup(int cpu) | ||
324 | { | ||
325 | struct rcu_data *rdp; | ||
326 | int ret; | ||
327 | struct rcu_node *rnp; | ||
328 | struct rcu_state *rsp = &rcu_sched_state; | ||
329 | |||
330 | rdp = per_cpu_ptr(rsp->rda, cpu); | ||
331 | rnp = rdp->mynode; | ||
332 | if (!(READ_ONCE(rnp->expmask) & rdp->grpmask)) | ||
333 | return; | ||
334 | ret = smp_call_function_single(cpu, sync_sched_exp_handler, rsp, 0); | ||
335 | WARN_ON_ONCE(ret); | ||
336 | } | ||
337 | |||
338 | /* | ||
339 | * Select the nodes that the upcoming expedited grace period needs | ||
340 | * to wait for. | ||
341 | */ | ||
342 | static void sync_rcu_exp_select_cpus(struct rcu_state *rsp, | ||
343 | smp_call_func_t func) | ||
344 | { | ||
345 | int cpu; | ||
346 | unsigned long flags; | ||
347 | unsigned long mask_ofl_test; | ||
348 | unsigned long mask_ofl_ipi; | ||
349 | int ret; | ||
350 | struct rcu_node *rnp; | ||
351 | |||
352 | sync_exp_reset_tree(rsp); | ||
353 | rcu_for_each_leaf_node(rsp, rnp) { | ||
354 | raw_spin_lock_irqsave_rcu_node(rnp, flags); | ||
355 | |||
356 | /* Each pass checks a CPU for identity, offline, and idle. */ | ||
357 | mask_ofl_test = 0; | ||
358 | for_each_leaf_node_possible_cpu(rnp, cpu) { | ||
359 | struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu); | ||
360 | struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu); | ||
361 | |||
362 | if (raw_smp_processor_id() == cpu || | ||
363 | !(atomic_add_return(0, &rdtp->dynticks) & 0x1)) | ||
364 | mask_ofl_test |= rdp->grpmask; | ||
365 | } | ||
366 | mask_ofl_ipi = rnp->expmask & ~mask_ofl_test; | ||
367 | |||
368 | /* | ||
369 | * Need to wait for any blocked tasks as well. Note that | ||
370 | * additional blocking tasks will also block the expedited | ||
371 | * GP until such time as the ->expmask bits are cleared. | ||
372 | */ | ||
373 | if (rcu_preempt_has_tasks(rnp)) | ||
374 | rnp->exp_tasks = rnp->blkd_tasks.next; | ||
375 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); | ||
376 | |||
377 | /* IPI the remaining CPUs for expedited quiescent state. */ | ||
378 | for_each_leaf_node_possible_cpu(rnp, cpu) { | ||
379 | unsigned long mask = leaf_node_cpu_bit(rnp, cpu); | ||
380 | if (!(mask_ofl_ipi & mask)) | ||
381 | continue; | ||
382 | retry_ipi: | ||
383 | ret = smp_call_function_single(cpu, func, rsp, 0); | ||
384 | if (!ret) { | ||
385 | mask_ofl_ipi &= ~mask; | ||
386 | continue; | ||
387 | } | ||
388 | /* Failed, raced with offline. */ | ||
389 | raw_spin_lock_irqsave_rcu_node(rnp, flags); | ||
390 | if (cpu_online(cpu) && | ||
391 | (rnp->expmask & mask)) { | ||
392 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); | ||
393 | schedule_timeout_uninterruptible(1); | ||
394 | if (cpu_online(cpu) && | ||
395 | (rnp->expmask & mask)) | ||
396 | goto retry_ipi; | ||
397 | raw_spin_lock_irqsave_rcu_node(rnp, flags); | ||
398 | } | ||
399 | if (!(rnp->expmask & mask)) | ||
400 | mask_ofl_ipi &= ~mask; | ||
401 | raw_spin_unlock_irqrestore_rcu_node(rnp, flags); | ||
402 | } | ||
403 | /* Report quiescent states for those that went offline. */ | ||
404 | mask_ofl_test |= mask_ofl_ipi; | ||
405 | if (mask_ofl_test) | ||
406 | rcu_report_exp_cpu_mult(rsp, rnp, mask_ofl_test, false); | ||
407 | } | ||
408 | } | ||
409 | |||
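Restating the mask bookkeeping above as a stand-alone sketch (heavily simplified: no locking, and cpu_idle_now() is a stand-in for the dynticks check): CPUs that are already idle, or the CPU running the selection itself, count as quiescent immediately; the remaining bits of ->expmask are the CPUs that get IPIs.

	#include <stdbool.h>

	static bool cpu_idle_now(int cpu)
	{
		(void)cpu;
		return false;		/* stand-in for the dynticks test */
	}

	static void partition_expmask(unsigned long expmask, int this_cpu,
				      unsigned long *mask_test, unsigned long *mask_ipi)
	{
		int cpu;

		*mask_test = 0;
		for (cpu = 0; cpu < (int)(8 * sizeof(expmask)); cpu++) {
			unsigned long bit = 1UL << cpu;

			if (!(expmask & bit))
				continue;
			if (cpu == this_cpu || cpu_idle_now(cpu))
				*mask_test |= bit;	/* already quiescent */
		}
		*mask_ipi = expmask & ~*mask_test;	/* these get IPIs */
	}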
410 | static void synchronize_sched_expedited_wait(struct rcu_state *rsp) | ||
411 | { | ||
412 | int cpu; | ||
413 | unsigned long jiffies_stall; | ||
414 | unsigned long jiffies_start; | ||
415 | unsigned long mask; | ||
416 | int ndetected; | ||
417 | struct rcu_node *rnp; | ||
418 | struct rcu_node *rnp_root = rcu_get_root(rsp); | ||
419 | int ret; | ||
420 | |||
421 | jiffies_stall = rcu_jiffies_till_stall_check(); | ||
422 | jiffies_start = jiffies; | ||
423 | |||
424 | for (;;) { | ||
425 | ret = swait_event_timeout( | ||
426 | rsp->expedited_wq, | ||
427 | sync_rcu_preempt_exp_done(rnp_root), | ||
428 | jiffies_stall); | ||
429 | if (ret > 0 || sync_rcu_preempt_exp_done(rnp_root)) | ||
430 | return; | ||
431 | if (ret < 0) { | ||
432 | /* Hit a signal, disable CPU stall warnings. */ | ||
433 | swait_event(rsp->expedited_wq, | ||
434 | sync_rcu_preempt_exp_done(rnp_root)); | ||
435 | return; | ||
436 | } | ||
437 | pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {", | ||
438 | rsp->name); | ||
439 | ndetected = 0; | ||
440 | rcu_for_each_leaf_node(rsp, rnp) { | ||
441 | ndetected += rcu_print_task_exp_stall(rnp); | ||
442 | for_each_leaf_node_possible_cpu(rnp, cpu) { | ||
443 | struct rcu_data *rdp; | ||
444 | |||
445 | mask = leaf_node_cpu_bit(rnp, cpu); | ||
446 | if (!(rnp->expmask & mask)) | ||
447 | continue; | ||
448 | ndetected++; | ||
449 | rdp = per_cpu_ptr(rsp->rda, cpu); | ||
450 | pr_cont(" %d-%c%c%c", cpu, | ||
451 | "O."[!!cpu_online(cpu)], | ||
452 | "o."[!!(rdp->grpmask & rnp->expmaskinit)], | ||
453 | "N."[!!(rdp->grpmask & rnp->expmaskinitnext)]); | ||
454 | } | ||
455 | } | ||
456 | pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n", | ||
457 | jiffies - jiffies_start, rsp->expedited_sequence, | ||
458 | rnp_root->expmask, ".T"[!!rnp_root->exp_tasks]); | ||
459 | if (ndetected) { | ||
460 | pr_err("blocking rcu_node structures:"); | ||
461 | rcu_for_each_node_breadth_first(rsp, rnp) { | ||
462 | if (rnp == rnp_root) | ||
463 | continue; /* printed unconditionally */ | ||
464 | if (sync_rcu_preempt_exp_done(rnp)) | ||
465 | continue; | ||
466 | pr_cont(" l=%u:%d-%d:%#lx/%c", | ||
467 | rnp->level, rnp->grplo, rnp->grphi, | ||
468 | rnp->expmask, | ||
469 | ".T"[!!rnp->exp_tasks]); | ||
470 | } | ||
471 | pr_cont("\n"); | ||
472 | } | ||
473 | rcu_for_each_leaf_node(rsp, rnp) { | ||
474 | for_each_leaf_node_possible_cpu(rnp, cpu) { | ||
475 | mask = leaf_node_cpu_bit(rnp, cpu); | ||
476 | if (!(rnp->expmask & mask)) | ||
477 | continue; | ||
478 | dump_cpu_task(cpu); | ||
479 | } | ||
480 | } | ||
481 | jiffies_stall = 3 * rcu_jiffies_till_stall_check() + 3; | ||
482 | } | ||
483 | } | ||
484 | |||
485 | /* | ||
486 | * Wait for the current expedited grace period to complete, and then | ||
487 | * wake up everyone who piggybacked on the just-completed expedited | ||
488 | * grace period. Also update all the ->exp_seq_rq counters as needed | ||
489 | * in order to avoid counter-wrap problems. | ||
490 | */ | ||
491 | static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s) | ||
492 | { | ||
493 | struct rcu_node *rnp; | ||
494 | |||
495 | synchronize_sched_expedited_wait(rsp); | ||
496 | rcu_exp_gp_seq_end(rsp); | ||
497 | trace_rcu_exp_grace_period(rsp->name, s, TPS("end")); | ||
498 | |||
499 | /* | ||
500 | * Switch over to wakeup mode, allowing the next GP, but -only- the | ||
501 | * next GP, to proceed. | ||
502 | */ | ||
503 | mutex_lock(&rsp->exp_wake_mutex); | ||
504 | mutex_unlock(&rsp->exp_mutex); | ||
505 | |||
506 | rcu_for_each_node_breadth_first(rsp, rnp) { | ||
507 | if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s)) { | ||
508 | spin_lock(&rnp->exp_lock); | ||
509 | /* Recheck, avoid hang in case someone just arrived. */ | ||
510 | if (ULONG_CMP_LT(rnp->exp_seq_rq, s)) | ||
511 | rnp->exp_seq_rq = s; | ||
512 | spin_unlock(&rnp->exp_lock); | ||
513 | } | ||
514 | wake_up_all(&rnp->exp_wq[(rsp->expedited_sequence >> 1) & 0x3]); | ||
515 | } | ||
516 | trace_rcu_exp_grace_period(rsp->name, s, TPS("endwake")); | ||
517 | mutex_unlock(&rsp->exp_wake_mutex); | ||
518 | } | ||
519 | |||
520 | /** | ||
521 | * synchronize_sched_expedited - Brute-force RCU-sched grace period | ||
522 | * | ||
523 | * Wait for an RCU-sched grace period to elapse, but use a "big hammer" | ||
524 | * approach to force the grace period to end quickly. This consumes | ||
525 | * significant time on all CPUs and is unfriendly to real-time workloads, | ||
526 | * so is thus not recommended for any sort of common-case code. In fact, | ||
527 | * if you are using synchronize_sched_expedited() in a loop, please | ||
528 | * restructure your code to batch your updates, and then use a single | ||
529 | * synchronize_sched() instead. | ||
530 | * | ||
531 | * This implementation can be thought of as an application of sequence | ||
532 | * locking to expedited grace periods, but using the sequence counter to | ||
533 | * determine when someone else has already done the work instead of for | ||
534 | * retrying readers. | ||
535 | */ | ||
536 | void synchronize_sched_expedited(void) | ||
537 | { | ||
538 | unsigned long s; | ||
539 | struct rcu_state *rsp = &rcu_sched_state; | ||
540 | |||
541 | /* If only one CPU, this is automatically a grace period. */ | ||
542 | if (rcu_blocking_is_gp()) | ||
543 | return; | ||
544 | |||
545 | /* If expedited grace periods are prohibited, fall back to normal. */ | ||
546 | if (rcu_gp_is_normal()) { | ||
547 | wait_rcu_gp(call_rcu_sched); | ||
548 | return; | ||
549 | } | ||
550 | |||
551 | /* Take a snapshot of the sequence number. */ | ||
552 | s = rcu_exp_gp_seq_snap(rsp); | ||
553 | if (exp_funnel_lock(rsp, s)) | ||
554 | return; /* Someone else did our work for us. */ | ||
555 | |||
556 | /* Initialize the rcu_node tree in preparation for the wait. */ | ||
557 | sync_rcu_exp_select_cpus(rsp, sync_sched_exp_handler); | ||
558 | |||
559 | /* Wait and clean up, including waking everyone. */ | ||
560 | rcu_exp_wait_wake(rsp, s); | ||
561 | } | ||
562 | EXPORT_SYMBOL_GPL(synchronize_sched_expedited); | ||
563 | |||
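To make the batching advice in the header comment above concrete, here is a hedged updater sketch (the cfg_table layout, struct, and array size are hypothetical): perform every pointer update first, then pay for a single grace period instead of one per element.

	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	struct cfg { int val; };			/* hypothetical */
	static struct cfg __rcu *cfg_table[16];		/* hypothetical */

	static void replace_all_cfgs(struct cfg *newcfgs[16])
	{
		struct cfg *old[16];
		int i;

		for (i = 0; i < 16; i++) {
			old[i] = rcu_dereference_protected(cfg_table[i], 1);
			rcu_assign_pointer(cfg_table[i], newcfgs[i]);
		}
		synchronize_sched();	/* one grace period covers all 16 updates */
		for (i = 0; i < 16; i++)
			kfree(old[i]);
	}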
564 | #ifdef CONFIG_PREEMPT_RCU | ||
565 | |||
566 | /* | ||
567 | * Remote handler for smp_call_function_single(). If there is an | ||
568 | * RCU read-side critical section in effect, request that the | ||
569 | * next rcu_read_unlock() record the quiescent state up the | ||
570 | * ->expmask fields in the rcu_node tree. Otherwise, immediately | ||
571 | * report the quiescent state. | ||
572 | */ | ||
573 | static void sync_rcu_exp_handler(void *info) | ||
574 | { | ||
575 | struct rcu_data *rdp; | ||
576 | struct rcu_state *rsp = info; | ||
577 | struct task_struct *t = current; | ||
578 | |||
579 | /* | ||
580 | * Within an RCU read-side critical section, request that the next | ||
581 | * rcu_read_unlock() report. Unless this RCU read-side critical | ||
582 | * section has already blocked, in which case it is already set | ||
583 | * up for the expedited grace period to wait on it. | ||
584 | */ | ||
585 | if (t->rcu_read_lock_nesting > 0 && | ||
586 | !t->rcu_read_unlock_special.b.blocked) { | ||
587 | t->rcu_read_unlock_special.b.exp_need_qs = true; | ||
588 | return; | ||
589 | } | ||
590 | |||
591 | /* | ||
592 | * We are either exiting an RCU read-side critical section (negative | ||
593 | * values of t->rcu_read_lock_nesting) or are not in one at all | ||
594 | * (zero value of t->rcu_read_lock_nesting). Or we are in an RCU | ||
595 | * read-side critical section that blocked before this expedited | ||
596 | * grace period started. Either way, we can immediately report | ||
597 | * the quiescent state. | ||
598 | */ | ||
599 | rdp = this_cpu_ptr(rsp->rda); | ||
600 | rcu_report_exp_rdp(rsp, rdp, true); | ||
601 | } | ||
602 | |||
603 | /** | ||
604 | * synchronize_rcu_expedited - Brute-force RCU grace period | ||
605 | * | ||
606 | * Wait for an RCU-preempt grace period, but expedite it. The basic | ||
607 | * idea is to IPI all non-idle non-nohz online CPUs. The IPI handler | ||
608 | * checks whether the CPU is in an RCU-preempt critical section, and | ||
609 | * if so, it sets a flag that causes the outermost rcu_read_unlock() | ||
610 | * to report the quiescent state. On the other hand, if the CPU is | ||
611 | * not in an RCU read-side critical section, the IPI handler reports | ||
612 | * the quiescent state immediately. | ||
613 | * | ||
614 | * Although this is a great improvement over previous expedited | ||
615 | * implementations, it is still unfriendly to real-time workloads, so is | ||
616 | * thus not recommended for any sort of common-case code. In fact, if | ||
617 | * you are using synchronize_rcu_expedited() in a loop, please restructure | ||
618 | * your code to batch your updates, and then use a single synchronize_rcu() | ||
619 | * instead. | ||
620 | */ | ||
621 | void synchronize_rcu_expedited(void) | ||
622 | { | ||
623 | struct rcu_state *rsp = rcu_state_p; | ||
624 | unsigned long s; | ||
625 | |||
626 | /* If expedited grace periods are prohibited, fall back to normal. */ | ||
627 | if (rcu_gp_is_normal()) { | ||
628 | wait_rcu_gp(call_rcu); | ||
629 | return; | ||
630 | } | ||
631 | |||
632 | s = rcu_exp_gp_seq_snap(rsp); | ||
633 | if (exp_funnel_lock(rsp, s)) | ||
634 | return; /* Someone else did our work for us. */ | ||
635 | |||
636 | /* Initialize the rcu_node tree in preparation for the wait. */ | ||
637 | sync_rcu_exp_select_cpus(rsp, sync_rcu_exp_handler); | ||
638 | |||
639 | /* Wait for ->blkd_tasks lists to drain, then wake everyone up. */ | ||
640 | rcu_exp_wait_wake(rsp, s); | ||
641 | } | ||
642 | EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); | ||
643 | |||
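For context, a hedged reader/updater sketch of what the IPI handler above coordinates with (the global pointer and struct are illustrative): a reader caught inside its critical section has its outermost rcu_read_unlock() report the expedited quiescent state, so the updater's synchronize_rcu_expedited() returns only after such readers finish.

	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	struct blob { int x; };			/* illustrative */
	static struct blob __rcu *gbl_ptr;	/* illustrative */

	static int reader(void)
	{
		struct blob *p;
		int ret = 0;

		rcu_read_lock();		/* may receive the IPI above */
		p = rcu_dereference(gbl_ptr);
		if (p)
			ret = p->x;
		rcu_read_unlock();		/* reports the expedited QS if asked */
		return ret;
	}

	static void updater(struct blob *newp)
	{
		struct blob *old;

		old = rcu_dereference_protected(gbl_ptr, 1);
		rcu_assign_pointer(gbl_ptr, newp);
		synchronize_rcu_expedited();	/* waits for readers like the above */
		kfree(old);
	}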
644 | #else /* #ifdef CONFIG_PREEMPT_RCU */ | ||
645 | |||
646 | /* | ||
647 | * Wait for an rcu-preempt grace period, but make it happen quickly. | ||
648 | * But because preemptible RCU does not exist, map to rcu-sched. | ||
649 | */ | ||
650 | void synchronize_rcu_expedited(void) | ||
651 | { | ||
652 | synchronize_sched_expedited(); | ||
653 | } | ||
654 | EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); | ||
655 | |||
656 | #endif /* #else #ifdef CONFIG_PREEMPT_RCU */ | ||
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index ff1cd4e1188d..0082fce402a0 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h | |||
@@ -79,8 +79,6 @@ static void __init rcu_bootup_announce_oddness(void) | |||
79 | pr_info("\tRCU dyntick-idle grace-period acceleration is enabled.\n"); | 79 | pr_info("\tRCU dyntick-idle grace-period acceleration is enabled.\n"); |
80 | if (IS_ENABLED(CONFIG_PROVE_RCU)) | 80 | if (IS_ENABLED(CONFIG_PROVE_RCU)) |
81 | pr_info("\tRCU lockdep checking is enabled.\n"); | 81 | pr_info("\tRCU lockdep checking is enabled.\n"); |
82 | if (IS_ENABLED(CONFIG_RCU_TORTURE_TEST_RUNNABLE)) | ||
83 | pr_info("\tRCU torture testing starts during boot.\n"); | ||
84 | if (RCU_NUM_LVLS >= 4) | 82 | if (RCU_NUM_LVLS >= 4) |
85 | pr_info("\tFour(or more)-level hierarchy is enabled.\n"); | 83 | pr_info("\tFour(or more)-level hierarchy is enabled.\n"); |
86 | if (RCU_FANOUT_LEAF != 16) | 84 | if (RCU_FANOUT_LEAF != 16) |
@@ -681,84 +679,6 @@ void synchronize_rcu(void) | |||
681 | } | 679 | } |
682 | EXPORT_SYMBOL_GPL(synchronize_rcu); | 680 | EXPORT_SYMBOL_GPL(synchronize_rcu); |
683 | 681 | ||
684 | /* | ||
685 | * Remote handler for smp_call_function_single(). If there is an | ||
686 | * RCU read-side critical section in effect, request that the | ||
687 | * next rcu_read_unlock() record the quiescent state up the | ||
688 | * ->expmask fields in the rcu_node tree. Otherwise, immediately | ||
689 | * report the quiescent state. | ||
690 | */ | ||
691 | static void sync_rcu_exp_handler(void *info) | ||
692 | { | ||
693 | struct rcu_data *rdp; | ||
694 | struct rcu_state *rsp = info; | ||
695 | struct task_struct *t = current; | ||
696 | |||
697 | /* | ||
698 | * Within an RCU read-side critical section, request that the next | ||
699 | * rcu_read_unlock() report. Unless this RCU read-side critical | ||
700 | * section has already blocked, in which case it is already set | ||
701 | * up for the expedited grace period to wait on it. | ||
702 | */ | ||
703 | if (t->rcu_read_lock_nesting > 0 && | ||
704 | !t->rcu_read_unlock_special.b.blocked) { | ||
705 | t->rcu_read_unlock_special.b.exp_need_qs = true; | ||
706 | return; | ||
707 | } | ||
708 | |||
709 | /* | ||
710 | * We are either exiting an RCU read-side critical section (negative | ||
711 | * values of t->rcu_read_lock_nesting) or are not in one at all | ||
712 | * (zero value of t->rcu_read_lock_nesting). Or we are in an RCU | ||
713 | * read-side critical section that blocked before this expedited | ||
714 | * grace period started. Either way, we can immediately report | ||
715 | * the quiescent state. | ||
716 | */ | ||
717 | rdp = this_cpu_ptr(rsp->rda); | ||
718 | rcu_report_exp_rdp(rsp, rdp, true); | ||
719 | } | ||
720 | |||
721 | /** | ||
722 | * synchronize_rcu_expedited - Brute-force RCU grace period | ||
723 | * | ||
724 | * Wait for an RCU-preempt grace period, but expedite it. The basic | ||
725 | * idea is to IPI all non-idle non-nohz online CPUs. The IPI handler | ||
726 | * checks whether the CPU is in an RCU-preempt critical section, and | ||
727 | * if so, it sets a flag that causes the outermost rcu_read_unlock() | ||
728 | * to report the quiescent state. On the other hand, if the CPU is | ||
729 | * not in an RCU read-side critical section, the IPI handler reports | ||
730 | * the quiescent state immediately. | ||
731 | * | ||
732 | * Although this is a greate improvement over previous expedited | ||
733 | * implementations, it is still unfriendly to real-time workloads, so is | ||
734 | * thus not recommended for any sort of common-case code. In fact, if | ||
735 | * you are using synchronize_rcu_expedited() in a loop, please restructure | ||
736 | * your code to batch your updates, and then Use a single synchronize_rcu() | ||
737 | * instead. | ||
738 | */ | ||
739 | void synchronize_rcu_expedited(void) | ||
740 | { | ||
741 | struct rcu_state *rsp = rcu_state_p; | ||
742 | unsigned long s; | ||
743 | |||
744 | /* If expedited grace periods are prohibited, fall back to normal. */ | ||
745 | if (rcu_gp_is_normal()) { | ||
746 | wait_rcu_gp(call_rcu); | ||
747 | return; | ||
748 | } | ||
749 | |||
750 | s = rcu_exp_gp_seq_snap(rsp); | ||
751 | if (exp_funnel_lock(rsp, s)) | ||
752 | return; /* Someone else did our work for us. */ | ||
753 | |||
754 | /* Initialize the rcu_node tree in preparation for the wait. */ | ||
755 | sync_rcu_exp_select_cpus(rsp, sync_rcu_exp_handler); | ||
756 | |||
757 | /* Wait for ->blkd_tasks lists to drain, then wake everyone up. */ | ||
758 | rcu_exp_wait_wake(rsp, s); | ||
759 | } | ||
760 | EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); | ||
761 | |||
762 | /** | 682 | /** |
763 | * rcu_barrier - Wait until all in-flight call_rcu() callbacks complete. | 683 | * rcu_barrier - Wait until all in-flight call_rcu() callbacks complete. |
764 | * | 684 | * |
@@ -883,16 +803,6 @@ static void rcu_preempt_check_callbacks(void) | |||
883 | } | 803 | } |
884 | 804 | ||
885 | /* | 805 | /* |
886 | * Wait for an rcu-preempt grace period, but make it happen quickly. | ||
887 | * But because preemptible RCU does not exist, map to rcu-sched. | ||
888 | */ | ||
889 | void synchronize_rcu_expedited(void) | ||
890 | { | ||
891 | synchronize_sched_expedited(); | ||
892 | } | ||
893 | EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); | ||
894 | |||
895 | /* | ||
896 | * Because preemptible RCU does not exist, rcu_barrier() is just | 806 | * Because preemptible RCU does not exist, rcu_barrier() is just |
897 | * another name for rcu_barrier_sched(). | 807 | * another name for rcu_barrier_sched(). |
898 | */ | 808 | */ |
@@ -1254,8 +1164,9 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu) | |||
1254 | return; | 1164 | return; |
1255 | if (!zalloc_cpumask_var(&cm, GFP_KERNEL)) | 1165 | if (!zalloc_cpumask_var(&cm, GFP_KERNEL)) |
1256 | return; | 1166 | return; |
1257 | for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask >>= 1) | 1167 | for_each_leaf_node_possible_cpu(rnp, cpu) |
1258 | if ((mask & 0x1) && cpu != outgoingcpu) | 1168 | if ((mask & leaf_node_cpu_bit(rnp, cpu)) && |
1169 | cpu != outgoingcpu) | ||
1259 | cpumask_set_cpu(cpu, cm); | 1170 | cpumask_set_cpu(cpu, cm); |
1260 | if (cpumask_weight(cm) == 0) | 1171 | if (cpumask_weight(cm) == 0) |
1261 | cpumask_setall(cm); | 1172 | cpumask_setall(cm); |
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index 3e888cd5a594..f0d8322bc3ec 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c | |||
@@ -528,6 +528,7 @@ static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 10; | |||
528 | module_param(rcu_task_stall_timeout, int, 0644); | 528 | module_param(rcu_task_stall_timeout, int, 0644); |
529 | 529 | ||
530 | static void rcu_spawn_tasks_kthread(void); | 530 | static void rcu_spawn_tasks_kthread(void); |
531 | static struct task_struct *rcu_tasks_kthread_ptr; | ||
531 | 532 | ||
532 | /* | 533 | /* |
533 | * Post an RCU-tasks callback. First call must be from process context | 534 | * Post an RCU-tasks callback. First call must be from process context |
@@ -537,6 +538,7 @@ void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func) | |||
537 | { | 538 | { |
538 | unsigned long flags; | 539 | unsigned long flags; |
539 | bool needwake; | 540 | bool needwake; |
541 | bool havetask = READ_ONCE(rcu_tasks_kthread_ptr); | ||
540 | 542 | ||
541 | rhp->next = NULL; | 543 | rhp->next = NULL; |
542 | rhp->func = func; | 544 | rhp->func = func; |
@@ -545,7 +547,9 @@ void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func) | |||
545 | *rcu_tasks_cbs_tail = rhp; | 547 | *rcu_tasks_cbs_tail = rhp; |
546 | rcu_tasks_cbs_tail = &rhp->next; | 548 | rcu_tasks_cbs_tail = &rhp->next; |
547 | raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags); | 549 | raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags); |
548 | if (needwake) { | 550 | /* We can't create the thread unless interrupts are enabled. */ |
551 | if ((needwake && havetask) || | ||
552 | (!havetask && !irqs_disabled_flags(flags))) { | ||
549 | rcu_spawn_tasks_kthread(); | 553 | rcu_spawn_tasks_kthread(); |
550 | wake_up(&rcu_tasks_cbs_wq); | 554 | wake_up(&rcu_tasks_cbs_wq); |
551 | } | 555 | } |
@@ -790,7 +794,6 @@ static int __noreturn rcu_tasks_kthread(void *arg) | |||
790 | static void rcu_spawn_tasks_kthread(void) | 794 | static void rcu_spawn_tasks_kthread(void) |
791 | { | 795 | { |
792 | static DEFINE_MUTEX(rcu_tasks_kthread_mutex); | 796 | static DEFINE_MUTEX(rcu_tasks_kthread_mutex); |
793 | static struct task_struct *rcu_tasks_kthread_ptr; | ||
794 | struct task_struct *t; | 797 | struct task_struct *t; |
795 | 798 | ||
796 | if (READ_ONCE(rcu_tasks_kthread_ptr)) { | 799 | if (READ_ONCE(rcu_tasks_kthread_ptr)) { |
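The condition added to call_rcu_tasks() above reads as: if the kthread already exists, wake it whenever a wakeup is needed; if it does not yet exist, spawn it only when interrupts are enabled, because kthread creation can sleep. A hedged restatement of just that predicate (the helper name is illustrative):

	#include <linux/irqflags.h>
	#include <linux/types.h>

	/* Equivalent form of the spawn/wake condition added above. */
	static bool should_kick_tasks_kthread(bool needwake, bool havetask,
					      unsigned long flags)
	{
		if (havetask)
			return needwake;		/* kthread exists: just wake it */
		return !irqs_disabled_flags(flags);	/* else spawn only if irqs are on */
	}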
diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 87b2fc38398b..35f0dcb1cb4f 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c | |||
@@ -1205,6 +1205,17 @@ static struct ctl_table kern_table[] = { | |||
1205 | .extra2 = &one, | 1205 | .extra2 = &one, |
1206 | }, | 1206 | }, |
1207 | #endif | 1207 | #endif |
1208 | #if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU) | ||
1209 | { | ||
1210 | .procname = "panic_on_rcu_stall", | ||
1211 | .data = &sysctl_panic_on_rcu_stall, | ||
1212 | .maxlen = sizeof(sysctl_panic_on_rcu_stall), | ||
1213 | .mode = 0644, | ||
1214 | .proc_handler = proc_dointvec_minmax, | ||
1215 | .extra1 = &zero, | ||
1216 | .extra2 = &one, | ||
1217 | }, | ||
1218 | #endif | ||
1208 | { } | 1219 | { } |
1209 | }; | 1220 | }; |
1210 | 1221 | ||
diff --git a/kernel/torture.c b/kernel/torture.c index fa0bdeee17ac..75961b3decfe 100644 --- a/kernel/torture.c +++ b/kernel/torture.c | |||
@@ -82,6 +82,104 @@ static int min_online = -1; | |||
82 | static int max_online; | 82 | static int max_online; |
83 | 83 | ||
84 | /* | 84 | /* |
85 | * Attempt to take a CPU offline. Return false if the CPU is already | ||
86 | * offline or if it is not subject to CPU-hotplug operations. The | ||
87 | * caller can detect other failures by looking at the statistics. | ||
88 | */ | ||
89 | bool torture_offline(int cpu, long *n_offl_attempts, long *n_offl_successes, | ||
90 | unsigned long *sum_offl, int *min_offl, int *max_offl) | ||
91 | { | ||
92 | unsigned long delta; | ||
93 | int ret; | ||
94 | unsigned long starttime; | ||
95 | |||
96 | if (!cpu_online(cpu) || !cpu_is_hotpluggable(cpu)) | ||
97 | return false; | ||
98 | |||
99 | if (verbose) | ||
100 | pr_alert("%s" TORTURE_FLAG | ||
101 | "torture_onoff task: offlining %d\n", | ||
102 | torture_type, cpu); | ||
103 | starttime = jiffies; | ||
104 | (*n_offl_attempts)++; | ||
105 | ret = cpu_down(cpu); | ||
106 | if (ret) { | ||
107 | if (verbose) | ||
108 | pr_alert("%s" TORTURE_FLAG | ||
109 | "torture_onoff task: offline %d failed: errno %d\n", | ||
110 | torture_type, cpu, ret); | ||
111 | } else { | ||
112 | if (verbose) | ||
113 | pr_alert("%s" TORTURE_FLAG | ||
114 | "torture_onoff task: offlined %d\n", | ||
115 | torture_type, cpu); | ||
116 | (*n_offl_successes)++; | ||
117 | delta = jiffies - starttime; | ||
118 | *sum_offl += delta; | ||
119 | if (*min_offl < 0) { | ||
120 | *min_offl = delta; | ||
121 | *max_offl = delta; | ||
122 | } | ||
123 | if (*min_offl > delta) | ||
124 | *min_offl = delta; | ||
125 | if (*max_offl < delta) | ||
126 | *max_offl = delta; | ||
127 | } | ||
128 | |||
129 | return true; | ||
130 | } | ||
131 | EXPORT_SYMBOL_GPL(torture_offline); | ||
132 | |||
133 | /* | ||
134 | * Attempt to bring a CPU online. Return false if the CPU is already | ||
135 | * online or if it is not subject to CPU-hotplug operations. The | ||
136 | * caller can detect other failures by looking at the statistics. | ||
137 | */ | ||
138 | bool torture_online(int cpu, long *n_onl_attempts, long *n_onl_successes, | ||
139 | unsigned long *sum_onl, int *min_onl, int *max_onl) | ||
140 | { | ||
141 | unsigned long delta; | ||
142 | int ret; | ||
143 | unsigned long starttime; | ||
144 | |||
145 | if (cpu_online(cpu) || !cpu_is_hotpluggable(cpu)) | ||
146 | return false; | ||
147 | |||
148 | if (verbose) | ||
149 | pr_alert("%s" TORTURE_FLAG | ||
150 | "torture_onoff task: onlining %d\n", | ||
151 | torture_type, cpu); | ||
152 | starttime = jiffies; | ||
153 | (*n_onl_attempts)++; | ||
154 | ret = cpu_up(cpu); | ||
155 | if (ret) { | ||
156 | if (verbose) | ||
157 | pr_alert("%s" TORTURE_FLAG | ||
158 | "torture_onoff task: online %d failed: errno %d\n", | ||
159 | torture_type, cpu, ret); | ||
160 | } else { | ||
161 | if (verbose) | ||
162 | pr_alert("%s" TORTURE_FLAG | ||
163 | "torture_onoff task: onlined %d\n", | ||
164 | torture_type, cpu); | ||
165 | (*n_onl_successes)++; | ||
166 | delta = jiffies - starttime; | ||
167 | *sum_onl += delta; | ||
168 | if (*min_onl < 0) { | ||
169 | *min_onl = delta; | ||
170 | *max_onl = delta; | ||
171 | } | ||
172 | if (*min_onl > delta) | ||
173 | *min_onl = delta; | ||
174 | if (*max_onl < delta) | ||
175 | *max_onl = delta; | ||
176 | } | ||
177 | |||
178 | return true; | ||
179 | } | ||
180 | EXPORT_SYMBOL_GPL(torture_online); | ||
181 | |||
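A minimal, hypothetical caller of the two helpers just exported, mirroring the refactored torture_onoff() loop in the next hunk (it assumes the prototypes are declared in <linux/torture.h> as part of this series): try to offline the chosen CPU, and if it was already offline or is not hotpluggable, try to online it instead, accumulating the same statistics.

	#include <linux/torture.h>	/* assumed to declare the prototypes */

	static void toggle_one_cpu(int cpu)
	{
		static long n_offl_att, n_offl_ok, n_onl_att, n_onl_ok;
		static unsigned long sum_offl, sum_onl;
		static int min_offl = -1, max_offl, min_onl = -1, max_onl;

		if (!torture_offline(cpu, &n_offl_att, &n_offl_ok,
				     &sum_offl, &min_offl, &max_offl))
			torture_online(cpu, &n_onl_att, &n_onl_ok,
				       &sum_onl, &min_onl, &max_onl);
	}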
182 | /* | ||
85 | * Execute random CPU-hotplug operations at the interval specified | 183 | * Execute random CPU-hotplug operations at the interval specified |
86 | * by the onoff_interval. | 184 | * by the onoff_interval. |
87 | */ | 185 | */ |
@@ -89,16 +187,19 @@ static int | |||
89 | torture_onoff(void *arg) | 187 | torture_onoff(void *arg) |
90 | { | 188 | { |
91 | int cpu; | 189 | int cpu; |
92 | unsigned long delta; | ||
93 | int maxcpu = -1; | 190 | int maxcpu = -1; |
94 | DEFINE_TORTURE_RANDOM(rand); | 191 | DEFINE_TORTURE_RANDOM(rand); |
95 | int ret; | ||
96 | unsigned long starttime; | ||
97 | 192 | ||
98 | VERBOSE_TOROUT_STRING("torture_onoff task started"); | 193 | VERBOSE_TOROUT_STRING("torture_onoff task started"); |
99 | for_each_online_cpu(cpu) | 194 | for_each_online_cpu(cpu) |
100 | maxcpu = cpu; | 195 | maxcpu = cpu; |
101 | WARN_ON(maxcpu < 0); | 196 | WARN_ON(maxcpu < 0); |
197 | |||
198 | if (maxcpu == 0) { | ||
199 | VERBOSE_TOROUT_STRING("Only one CPU, so CPU-hotplug testing is disabled"); | ||
200 | goto stop; | ||
201 | } | ||
202 | |||
102 | if (onoff_holdoff > 0) { | 203 | if (onoff_holdoff > 0) { |
103 | VERBOSE_TOROUT_STRING("torture_onoff begin holdoff"); | 204 | VERBOSE_TOROUT_STRING("torture_onoff begin holdoff"); |
104 | schedule_timeout_interruptible(onoff_holdoff); | 205 | schedule_timeout_interruptible(onoff_holdoff); |
@@ -106,69 +207,16 @@ torture_onoff(void *arg) | |||
106 | } | 207 | } |
107 | while (!torture_must_stop()) { | 208 | while (!torture_must_stop()) { |
108 | cpu = (torture_random(&rand) >> 4) % (maxcpu + 1); | 209 | cpu = (torture_random(&rand) >> 4) % (maxcpu + 1); |
109 | if (cpu_online(cpu) && cpu_is_hotpluggable(cpu)) { | 210 | if (!torture_offline(cpu, |
110 | if (verbose) | 211 | &n_offline_attempts, &n_offline_successes, |
111 | pr_alert("%s" TORTURE_FLAG | 212 | &sum_offline, &min_offline, &max_offline)) |
112 | "torture_onoff task: offlining %d\n", | 213 | torture_online(cpu, |
113 | torture_type, cpu); | 214 | &n_online_attempts, &n_online_successes, |
114 | starttime = jiffies; | 215 | &sum_online, &min_online, &max_online); |
115 | n_offline_attempts++; | ||
116 | ret = cpu_down(cpu); | ||
117 | if (ret) { | ||
118 | if (verbose) | ||
119 | pr_alert("%s" TORTURE_FLAG | ||
120 | "torture_onoff task: offline %d failed: errno %d\n", | ||
121 | torture_type, cpu, ret); | ||
122 | } else { | ||
123 | if (verbose) | ||
124 | pr_alert("%s" TORTURE_FLAG | ||
125 | "torture_onoff task: offlined %d\n", | ||
126 | torture_type, cpu); | ||
127 | n_offline_successes++; | ||
128 | delta = jiffies - starttime; | ||
129 | sum_offline += delta; | ||
130 | if (min_offline < 0) { | ||
131 | min_offline = delta; | ||
132 | max_offline = delta; | ||
133 | } | ||
134 | if (min_offline > delta) | ||
135 | min_offline = delta; | ||
136 | if (max_offline < delta) | ||
137 | max_offline = delta; | ||
138 | } | ||
139 | } else if (cpu_is_hotpluggable(cpu)) { | ||
140 | if (verbose) | ||
141 | pr_alert("%s" TORTURE_FLAG | ||
142 | "torture_onoff task: onlining %d\n", | ||
143 | torture_type, cpu); | ||
144 | starttime = jiffies; | ||
145 | n_online_attempts++; | ||
146 | ret = cpu_up(cpu); | ||
147 | if (ret) { | ||
148 | if (verbose) | ||
149 | pr_alert("%s" TORTURE_FLAG | ||
150 | "torture_onoff task: online %d failed: errno %d\n", | ||
151 | torture_type, cpu, ret); | ||
152 | } else { | ||
153 | if (verbose) | ||
154 | pr_alert("%s" TORTURE_FLAG | ||
155 | "torture_onoff task: onlined %d\n", | ||
156 | torture_type, cpu); | ||
157 | n_online_successes++; | ||
158 | delta = jiffies - starttime; | ||
159 | sum_online += delta; | ||
160 | if (min_online < 0) { | ||
161 | min_online = delta; | ||
162 | max_online = delta; | ||
163 | } | ||
164 | if (min_online > delta) | ||
165 | min_online = delta; | ||
166 | if (max_online < delta) | ||
167 | max_online = delta; | ||
168 | } | ||
169 | } | ||
170 | schedule_timeout_interruptible(onoff_interval); | 216 | schedule_timeout_interruptible(onoff_interval); |
171 | } | 217 | } |
218 | |||
219 | stop: | ||
172 | torture_kthread_stopping("torture_onoff"); | 220 | torture_kthread_stopping("torture_onoff"); |
173 | return 0; | 221 | return 0; |
174 | } | 222 | } |
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index b9cfdbfae9aa..805b7048a1bd 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug | |||
@@ -1307,22 +1307,6 @@ config RCU_PERF_TEST | |||
1307 | Say M if you want the RCU performance tests to build as a module. | 1307 | Say M if you want the RCU performance tests to build as a module. |
1308 | Say N if you are unsure. | 1308 | Say N if you are unsure. |
1309 | 1309 | ||
1310 | config RCU_PERF_TEST_RUNNABLE | ||
1311 | bool "performance tests for RCU runnable by default" | ||
1312 | depends on RCU_PERF_TEST = y | ||
1313 | default n | ||
1314 | help | ||
1315 | This option provides a way to build the RCU performance tests | ||
1316 | directly into the kernel without them starting up at boot time. | ||
1317 | You can use /sys/module to manually override this setting. | ||
1318 | This /proc file is available only when the RCU performance | ||
1319 | tests have been built into the kernel. | ||
1320 | |||
1321 | Say Y here if you want the RCU performance tests to start during | ||
1322 | boot (you probably don't). | ||
1323 | Say N here if you want the RCU performance tests to start only | ||
1324 | after being manually enabled via /sys/module. | ||
1325 | |||
1326 | config RCU_TORTURE_TEST | 1310 | config RCU_TORTURE_TEST |
1327 | tristate "torture tests for RCU" | 1311 | tristate "torture tests for RCU" |
1328 | depends on DEBUG_KERNEL | 1312 | depends on DEBUG_KERNEL |
@@ -1340,23 +1324,6 @@ config RCU_TORTURE_TEST | |||
1340 | Say M if you want the RCU torture tests to build as a module. | 1324 | Say M if you want the RCU torture tests to build as a module. |
1341 | Say N if you are unsure. | 1325 | Say N if you are unsure. |
1342 | 1326 | ||
1343 | config RCU_TORTURE_TEST_RUNNABLE | ||
1344 | bool "torture tests for RCU runnable by default" | ||
1345 | depends on RCU_TORTURE_TEST = y | ||
1346 | default n | ||
1347 | help | ||
1348 | This option provides a way to build the RCU torture tests | ||
1349 | directly into the kernel without them starting up at boot | ||
1350 | time. You can use /proc/sys/kernel/rcutorture_runnable | ||
1351 | to manually override this setting. This /proc file is | ||
1352 | available only when the RCU torture tests have been built | ||
1353 | into the kernel. | ||
1354 | |||
1355 | Say Y here if you want the RCU torture tests to start during | ||
1356 | boot (you probably don't). | ||
1357 | Say N here if you want the RCU torture tests to start only | ||
1358 | after being manually enabled via /proc. | ||
1359 | |||
1360 | config RCU_TORTURE_TEST_SLOW_PREINIT | 1327 | config RCU_TORTURE_TEST_SLOW_PREINIT |
1361 | bool "Slow down RCU grace-period pre-initialization to expose races" | 1328 | bool "Slow down RCU grace-period pre-initialization to expose races" |
1362 | depends on RCU_TORTURE_TEST | 1329 | depends on RCU_TORTURE_TEST |
diff --git a/tools/testing/selftests/rcutorture/bin/functions.sh b/tools/testing/selftests/rcutorture/bin/functions.sh index b325470c01b3..1426a9b97494 100644 --- a/tools/testing/selftests/rcutorture/bin/functions.sh +++ b/tools/testing/selftests/rcutorture/bin/functions.sh | |||
@@ -99,8 +99,9 @@ configfrag_hotplug_cpu () { | |||
99 | # identify_boot_image qemu-cmd | 99 | # identify_boot_image qemu-cmd |
100 | # | 100 | # |
101 | # Returns the relative path to the kernel build image. This will be | 101 | # Returns the relative path to the kernel build image. This will be |
102 | # arch/<arch>/boot/bzImage unless overridden with the TORTURE_BOOT_IMAGE | 102 | # arch/<arch>/boot/bzImage or vmlinux if bzImage is not a target for the |
103 | # environment variable. | 103 | # architecture, unless overridden with the TORTURE_BOOT_IMAGE environment |
104 | # variable. | ||
104 | identify_boot_image () { | 105 | identify_boot_image () { |
105 | if test -n "$TORTURE_BOOT_IMAGE" | 106 | if test -n "$TORTURE_BOOT_IMAGE" |
106 | then | 107 | then |
@@ -110,11 +111,8 @@ identify_boot_image () { | |||
110 | qemu-system-x86_64|qemu-system-i386) | 111 | qemu-system-x86_64|qemu-system-i386) |
111 | echo arch/x86/boot/bzImage | 112 | echo arch/x86/boot/bzImage |
112 | ;; | 113 | ;; |
113 | qemu-system-ppc64) | ||
114 | echo arch/powerpc/boot/bzImage | ||
115 | ;; | ||
116 | *) | 114 | *) |
117 | echo "" | 115 | echo vmlinux |
118 | ;; | 116 | ;; |
119 | esac | 117 | esac |
120 | fi | 118 | fi |
@@ -175,7 +173,7 @@ identify_qemu_args () { | |||
175 | qemu-system-x86_64|qemu-system-i386) | 173 | qemu-system-x86_64|qemu-system-i386) |
176 | ;; | 174 | ;; |
177 | qemu-system-ppc64) | 175 | qemu-system-ppc64) |
178 | echo -enable-kvm -M pseries -cpu POWER7 -nodefaults | 176 | echo -enable-kvm -M pseries -nodefaults |
179 | echo -device spapr-vscsi | 177 | echo -device spapr-vscsi |
180 | if test -n "$TORTURE_QEMU_INTERACTIVE" -a -n "$TORTURE_QEMU_MAC" | 178 | if test -n "$TORTURE_QEMU_INTERACTIVE" -a -n "$TORTURE_QEMU_MAC" |
181 | then | 179 | then |
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh index 4109f306d855..ea6e373edc27 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh | |||
@@ -8,9 +8,9 @@ | |||
8 | # | 8 | # |
9 | # Usage: kvm-test-1-run.sh config builddir resdir seconds qemu-args boot_args | 9 | # Usage: kvm-test-1-run.sh config builddir resdir seconds qemu-args boot_args |
10 | # | 10 | # |
11 | # qemu-args defaults to "-enable-kvm -soundhw pcspk -nographic", along with | 11 | # qemu-args defaults to "-enable-kvm -nographic", along with arguments |
12 | # arguments specifying the number of CPUs and other | 12 | # specifying the number of CPUs and other options |
13 | # options generated from the underlying CPU architecture. | 13 | # generated from the underlying CPU architecture. |
14 | # boot_args defaults to value returned by the per_version_boot_params | 14 | # boot_args defaults to value returned by the per_version_boot_params |
15 | # shell function. | 15 | # shell function. |
16 | # | 16 | # |
@@ -96,7 +96,8 @@ if test "$base_resdir" != "$resdir" -a -f $base_resdir/bzImage -a -f $base_resdi | |||
96 | then | 96 | then |
97 | # Rerunning previous test, so use that test's kernel. | 97 | # Rerunning previous test, so use that test's kernel. |
98 | QEMU="`identify_qemu $base_resdir/vmlinux`" | 98 | QEMU="`identify_qemu $base_resdir/vmlinux`" |
99 | KERNEL=$base_resdir/bzImage | 99 | BOOT_IMAGE="`identify_boot_image $QEMU`" |
100 | KERNEL=$base_resdir/${BOOT_IMAGE##*/} # use the last component of ${BOOT_IMAGE} | ||
100 | ln -s $base_resdir/Make*.out $resdir # for kvm-recheck.sh | 101 | ln -s $base_resdir/Make*.out $resdir # for kvm-recheck.sh |
101 | ln -s $base_resdir/.config $resdir # for kvm-recheck.sh | 102 | ln -s $base_resdir/.config $resdir # for kvm-recheck.sh |
102 | elif kvm-build.sh $config_template $builddir $T | 103 | elif kvm-build.sh $config_template $builddir $T |
@@ -110,7 +111,7 @@ then | |||
110 | if test -n "$BOOT_IMAGE" | 111 | if test -n "$BOOT_IMAGE" |
111 | then | 112 | then |
112 | cp $builddir/$BOOT_IMAGE $resdir | 113 | cp $builddir/$BOOT_IMAGE $resdir |
113 | KERNEL=$resdir/bzImage | 114 | KERNEL=$resdir/${BOOT_IMAGE##*/} |
114 | else | 115 | else |
115 | echo No identifiable boot image, not running KVM, see $resdir. | 116 | echo No identifiable boot image, not running KVM, see $resdir. |
116 | echo Do the torture scripts know about your architecture? | 117 | echo Do the torture scripts know about your architecture? |
@@ -147,7 +148,7 @@ then | |||
147 | fi | 148 | fi |
148 | 149 | ||
149 | # Generate -smp qemu argument. | 150 | # Generate -smp qemu argument. |
150 | qemu_args="-enable-kvm -soundhw pcspk -nographic $qemu_args" | 151 | qemu_args="-enable-kvm -nographic $qemu_args" |
151 | cpu_count=`configNR_CPUS.sh $config_template` | 152 | cpu_count=`configNR_CPUS.sh $config_template` |
152 | cpu_count=`configfrag_boot_cpus "$boot_args" "$config_template" "$cpu_count"` | 153 | cpu_count=`configfrag_boot_cpus "$boot_args" "$config_template" "$cpu_count"` |
153 | vcpus=`identify_qemu_vcpus` | 154 | vcpus=`identify_qemu_vcpus` |
@@ -229,6 +230,7 @@ fi | |||
229 | if test $commandcompleted -eq 0 -a -n "$qemu_pid" | 230 | if test $commandcompleted -eq 0 -a -n "$qemu_pid" |
230 | then | 231 | then |
231 | echo Grace period for qemu job at pid $qemu_pid | 232 | echo Grace period for qemu job at pid $qemu_pid |
233 | oldline="`tail $resdir/console.log`" | ||
232 | while : | 234 | while : |
233 | do | 235 | do |
234 | kruntime=`awk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null` | 236 | kruntime=`awk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null` |
@@ -238,13 +240,29 @@ then | |||
238 | else | 240 | else |
239 | break | 241 | break |
240 | fi | 242 | fi |
241 | if test $kruntime -ge $((seconds + $TORTURE_SHUTDOWN_GRACE)) | 243 | must_continue=no |
244 | newline="`tail $resdir/console.log`" | ||
245 | if test "$newline" != "$oldline" && echo $newline | grep -q ' [0-9]\+us : ' | ||
246 | then | ||
247 | must_continue=yes | ||
248 | fi | ||
249 | last_ts="`tail $resdir/console.log | grep '^\[ *[0-9]\+\.[0-9]\+]' | tail -1 | sed -e 's/^\[ *//' -e 's/\..*$//'`" | ||
250 | if test -z "$last_ts" | ||
251 | then | ||
252 | last_ts=0 | ||
253 | fi | ||
254 | if test "$newline" != "$oldline" -a "$last_ts" -lt $((seconds + $TORTURE_SHUTDOWN_GRACE)) | ||
255 | then | ||
256 | must_continue=yes | ||
257 | fi | ||
258 | if test $must_continue = no -a $kruntime -ge $((seconds + $TORTURE_SHUTDOWN_GRACE)) | ||
242 | then | 259 | then |
243 | echo "!!! PID $qemu_pid hung at $kruntime vs. $seconds seconds" >> $resdir/Warnings 2>&1 | 260 | echo "!!! PID $qemu_pid hung at $kruntime vs. $seconds seconds" >> $resdir/Warnings 2>&1 |
244 | kill -KILL $qemu_pid | 261 | kill -KILL $qemu_pid |
245 | break | 262 | break |
246 | fi | 263 | fi |
247 | sleep 1 | 264 | oldline=$newline |
265 | sleep 10 | ||
248 | done | 266 | done |
249 | elif test -z "$qemu_pid" | 267 | elif test -z "$qemu_pid" |
250 | then | 268 | then |
diff --git a/tools/testing/selftests/rcutorture/bin/kvm.sh b/tools/testing/selftests/rcutorture/bin/kvm.sh index 0d598145873e..0aed965f0062 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm.sh | |||
@@ -48,7 +48,7 @@ resdir="" | |||
48 | configs="" | 48 | configs="" |
49 | cpus=0 | 49 | cpus=0 |
50 | ds=`date +%Y.%m.%d-%H:%M:%S` | 50 | ds=`date +%Y.%m.%d-%H:%M:%S` |
51 | jitter=0 | 51 | jitter="-1" |
52 | 52 | ||
53 | . functions.sh | 53 | . functions.sh |
54 | 54 | ||
diff --git a/tools/testing/selftests/rcutorture/bin/parse-console.sh b/tools/testing/selftests/rcutorture/bin/parse-console.sh index 5eb49b7f864c..08aa7d50ae0e 100755 --- a/tools/testing/selftests/rcutorture/bin/parse-console.sh +++ b/tools/testing/selftests/rcutorture/bin/parse-console.sh | |||
@@ -33,7 +33,7 @@ if grep -Pq '\x00' < $file | |||
33 | then | 33 | then |
34 | print_warning Console output contains nul bytes, old qemu still running? | 34 | print_warning Console output contains nul bytes, old qemu still running? |
35 | fi | 35 | fi |
36 | egrep 'Badness|WARNING:|Warn|BUG|===========|Call Trace:|Oops:|detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state' < $file | grep -v 'ODEBUG: ' | grep -v 'Warning: unable to open an initial console' > $1.diags | 36 | egrep 'Badness|WARNING:|Warn|BUG|===========|Call Trace:|Oops:|detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state|rcu_.*kthread starved for' < $file | grep -v 'ODEBUG: ' | grep -v 'Warning: unable to open an initial console' > $1.diags |
37 | if test -s $1.diags | 37 | if test -s $1.diags |
38 | then | 38 | then |
39 | print_warning Assertion failure in $file $title | 39 | print_warning Assertion failure in $file $title |
@@ -69,6 +69,11 @@ then | |||
69 | then | 69 | then |
70 | summary="$summary Stalls: $n_stalls" | 70 | summary="$summary Stalls: $n_stalls" |
71 | fi | 71 | fi |
72 | n_starves=`grep -c 'rcu_.*kthread starved for' $1` | ||
73 | if test "$n_starves" -ne 0 | ||
74 | then | ||
75 | summary="$summary Starves: $n_starves" | ||
76 | fi | ||
72 | print_warning Summary: $summary | 77 | print_warning Summary: $summary |
73 | else | 78 | else |
74 | rm $1.diags | 79 | rm $1.diags |
diff --git a/tools/testing/selftests/rcutorture/doc/initrd.txt b/tools/testing/selftests/rcutorture/doc/initrd.txt index 4170e714f044..833f826d6ec2 100644 --- a/tools/testing/selftests/rcutorture/doc/initrd.txt +++ b/tools/testing/selftests/rcutorture/doc/initrd.txt | |||
@@ -13,6 +13,22 @@ cd initrd | |||
13 | cpio -id < /tmp/initrd.img.zcat | 13 | cpio -id < /tmp/initrd.img.zcat |
14 | ------------------------------------------------------------------------ | 14 | ------------------------------------------------------------------------ |
15 | 15 | ||
16 | Another way to create an initramfs image is to use "dracut"[1], which is | ||
17 | available on many distros. However, the initramfs that dracut generates is a | ||
18 | cpio archive with another cpio archive inside it, so an extra step is needed | ||
19 | to create the initrd directory hierarchy. | ||
20 | |||
21 | Here are the commands to create an initrd directory for rcutorture using | ||
22 | dracut: | ||
23 | |||
24 | ------------------------------------------------------------------------ | ||
25 | dracut --no-hostonly --no-hostonly-cmdline --module "base bash shutdown" /tmp/initramfs.img | ||
26 | cd tools/testing/selftests/rcutorture | ||
27 | mkdir initrd | ||
28 | cd initrd | ||
29 | /usr/lib/dracut/skipcpio /tmp/initramfs.img | zcat | cpio -id | ||
30 | ------------------------------------------------------------------------ | ||
31 | |||
16 | Interestingly enough, if you are running rcutorture, you don't really | 32 | Interestingly enough, if you are running rcutorture, you don't really |
17 | need userspace in many cases. Running without userspace has the | 33 | need userspace in many cases. Running without userspace has the |
18 | advantage of allowing you to test your kernel independently of the | 34 | advantage of allowing you to test your kernel independently of the |
@@ -89,3 +105,9 @@ while : | |||
89 | do | 105 | do |
90 | sleep 10 | 106 | sleep 10 |
91 | done | 107 | done |
108 | ------------------------------------------------------------------------ | ||
109 | |||
110 | References: | ||
111 | [1]: https://dracut.wiki.kernel.org/index.php/Main_Page | ||
112 | [2]: http://blog.elastocloud.org/2015/06/rapid-linux-kernel-devtest-with-qemu.html | ||
113 | [3]: https://www.centos.org/forums/viewtopic.php?t=51621 | ||