author | Paul E. McKenney <paulmck@linux.vnet.ibm.com> | 2009-11-22 11:53:48 -0500 |
---|---|---|
committer | Ingo Molnar <mingo@elte.hu> | 2009-11-22 12:58:15 -0500 |
commit | b668c9cf3e58739dac54a1d6f42f2b4bdd980b3e | |
tree | c0165d39532a2314f8187d765f3c8ddf88b72831 /kernel/rcutree.h | |
parent | 2f51f9884f6a36b0fe9636d5a1937e5cbd25723b | |
rcu: Fix grace-period-stall bug on large systems with CPU hotplug
When the last CPU of a given leaf rcu_node structure goes
offline, all of the tasks queued on that leaf rcu_node structure
(due to having blocked in their current RCU read-side critical
sections) are requeued onto the root rcu_node structure. This
requeuing is carried out by rcu_preempt_offline_tasks().
However, it is possible that these queued tasks are the only
thing preventing the leaf rcu_node structure from reporting a
quiescent state up the rcu_node hierarchy. Unfortunately, the
old code would fail to do this reporting, resulting in a
grace-period stall given the following sequence of events:
1. Kernel built for more than 32 CPUs on 32-bit systems or for more
than 64 CPUs on 64-bit systems, so that there is more than one
rcu_node structure. (Or CONFIG_RCU_FANOUT is artificially set
to a number smaller than CONFIG_NR_CPUS.)
2. The kernel is built with CONFIG_TREE_PREEMPT_RCU.
3. A task running on a CPU associated with a given leaf rcu_node
structure blocks while in an RCU read-side critical section
-and- that CPU has not yet passed through a quiescent state
for the current RCU grace period. This will cause the task
to be queued on the leaf rcu_node's blocked_tasks[] array, in
particular, on the element of this array corresponding to the
current grace period.
4. Each of the remaining CPUs corresponding to this same leaf rcu_node
structure passes through a quiescent state. However, the task is
still in its RCU read-side critical section, so these quiescent
states cannot be reported further up the rcu_node hierarchy.
Nevertheless, all bits in the leaf rcu_node structure's ->qsmask
field are now zero.
5. Each of the remaining CPUs goes offline. (The events in steps
#4 and #5 can happen in any order as long as each CPU passes
through a quiescent state before going offline.)
6. When the last CPU goes offline, __rcu_offline_cpu() will invoke
rcu_preempt_offline_tasks(), which will move the task to the
root rcu_node structure, but without reporting a quiescent state
up the rcu_node hierarchy (and this failure to report a quiescent
state is the bug).
But because this leaf rcu_node structure's ->qsmask field is
already zero and its ->blocked_tasks[] entries are all empty,
force_quiescent_state() will skip this rcu_node structure.
Therefore, grace periods are now hung.
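
The sequence above can be sketched with a toy two-level hierarchy.
This is only an illustrative model, not kernel code: every name in it
(struct toy_node, toy_cpu_quiet(), toy_offline_last_cpu_buggy(), and
so on) is hypothetical, locking is omitted, and the per-grace-period
blocked_tasks[] array is collapsed into a single counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: names and layout are illustrative, not the kernel's. */
struct toy_node {
	unsigned long qsmask;    /* children still owing a quiescent state */
	int blocked;             /* readers blocking the current grace period */
	unsigned long grpmask;   /* this node's bit in its parent's qsmask */
	struct toy_node *parent; /* NULL for the root */
};

/* A node is done with the grace period when nothing is pending on it. */
static bool toy_node_done(const struct toy_node *n)
{
	return n->qsmask == 0 && n->blocked == 0;
}

/* Clear a child's bit and walk upward only while each node is done. */
static void toy_cpu_quiet(struct toy_node *n, unsigned long mask)
{
	while (n) {
		n->qsmask &= ~mask;
		if (!toy_node_done(n))
			return;
		mask = n->grpmask;
		n = n->parent;
	}
}

/* A blocked reader leaves its critical section on node n. */
static void toy_reader_exit(struct toy_node *n)
{
	n->blocked--;
	if (toy_node_done(n) && n->parent)
		toy_cpu_quiet(n->parent, n->grpmask);
}

/* Old behavior as described above: requeue on the root, report nothing. */
static void toy_offline_last_cpu_buggy(struct toy_node *leaf,
				       struct toy_node *root)
{
	root->blocked += leaf->blocked;
	leaf->blocked = 0;
	/* The leaf is now done, yet its bit in root->qsmask stays set. */
}

/* Returns true if the grace period can end once the reader finishes. */
bool toy_stall_scenario(void)
{
	struct toy_node root = { .qsmask = 0x1, .parent = NULL };
	struct toy_node leaf = { .qsmask = 0x3, .grpmask = 0x1,
				 .parent = &root };

	leaf.blocked = 1;              /* step 3: reader blocks on the leaf */
	toy_cpu_quiet(&leaf, 0x1);     /* step 4: both CPUs pass through */
	toy_cpu_quiet(&leaf, 0x2);     /*         a quiescent state */
	assert(leaf.qsmask == 0 && root.qsmask == 0x1);

	toy_offline_last_cpu_buggy(&leaf, &root); /* steps 5 and 6 */
	toy_reader_exit(&root);        /* reader finally finishes */

	return toy_node_done(&root);   /* false: the root still waits */
}
```

In this model the root's bit for the leaf is never cleared, so
toy_stall_scenario() returns false even after the reader has left its
critical section, which corresponds to the hung grace period.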
This patch abstracts some code out of rcu_read_unlock_special(),
calling the result task_quiet() by analogy with cpu_quiet(), and
invokes task_quiet() from both rcu_read_unlock_special() and
__rcu_offline_cpu(). Invoking task_quiet() from
__rcu_offline_cpu() reports the quiescent state up the rcu_node
hierarchy, fixing the bug. This ends up requiring a separate
lock_class_key per level of the rcu_node hierarchy, which this
patch also provides.
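
The shape of the fix can be sketched in the same spirit: one shared
helper reports a quiet node to its parent, and both the reader-exit
path and the CPU-offline path call it. This is a simplified model
under stated assumptions, not the patch itself. All names
(struct demo_node, demo_task_quiet(), and so on) are hypothetical,
and the locking that the real task_quiet() deals with through its
flags argument is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names are the kernel's. */
struct demo_node {
	unsigned long qsmask;     /* children still owing a quiescent state */
	int blocked;              /* readers blocking the current grace period */
	unsigned long grpmask;    /* this node's bit in its parent's qsmask */
	struct demo_node *parent; /* NULL for the root */
};

/* Clear 'mask' in n, then keep walking up while each node is fully quiet. */
static void demo_report_up(struct demo_node *n, unsigned long mask)
{
	while (n) {
		n->qsmask &= ~mask;
		if (n->qsmask != 0 || n->blocked != 0)
			return;
		mask = n->grpmask;
		n = n->parent;
	}
}

/* Shared helper: if nothing is pending on n, report that to its parent. */
static void demo_task_quiet(struct demo_node *n)
{
	if (n->qsmask != 0 || n->blocked != 0)
		return;
	if (n->parent)
		demo_report_up(n->parent, n->grpmask);
}

/* Caller 1: a blocked reader leaves its critical section. */
static void demo_reader_exit(struct demo_node *n)
{
	n->blocked--;
	demo_task_quiet(n);
}

/* Caller 2: the last CPU of a leaf goes offline. */
static void demo_offline_last_cpu(struct demo_node *leaf,
				  struct demo_node *root)
{
	root->blocked += leaf->blocked;
	leaf->blocked = 0;
	demo_task_quiet(leaf); /* the report the old code omitted */
}

/* Returns true if the grace period can end once the reader finishes. */
bool demo_fixed_scenario(void)
{
	/* State just before the last CPU goes offline: all leaf CPUs have
	 * reported, but one blocked reader still holds the leaf back. */
	struct demo_node root = { .qsmask = 0x1, .parent = NULL };
	struct demo_node leaf = { .qsmask = 0, .blocked = 1,
				  .grpmask = 0x1, .parent = &root };

	demo_offline_last_cpu(&leaf, &root);
	assert(root.qsmask == 0 && root.blocked == 1);

	demo_reader_exit(&root);
	return root.qsmask == 0 && root.blocked == 0;
}
```

Here the offline path clears the leaf's bit in the root, so the only
thing the root still waits for is the requeued reader, and
demo_fixed_scenario() returns true once that reader exits.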
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12589088301770-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Diffstat (limited to 'kernel/rcutree.h')
-rw-r--r-- | kernel/rcutree.h | 3 |
1 files changed, 3 insertions, 0 deletions
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 17a28a08b559..a81188c42929 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -305,6 +305,9 @@ static void rcu_bootup_announce(void);
 long rcu_batches_completed(void);
 static void rcu_preempt_note_context_switch(int cpu);
 static int rcu_preempted_readers(struct rcu_node *rnp);
+#ifdef CONFIG_HOTPLUG_CPU
+static void task_quiet(struct rcu_node *rnp, unsigned long flags);
+#endif /* #ifdef CONFIG_HOTPLUG_CPU */
 #ifdef CONFIG_RCU_CPU_STALL_DETECTOR
 static void rcu_print_task_stall(struct rcu_node *rnp);
 #endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */