CPU hotplug, smp: flush any pending IPI callbacks before CPU offline

There is a race between the CPU offline code (within stop-machine) and the
smp-call-function code, which can lead to getting IPIs on the outgoing
CPU, *after* it has gone offline.

Specifically, this can happen when using
smp_call_function_single_async() to send the IPI, since this API allows
sending asynchronous IPIs from IRQ disabled contexts. The exact race
condition is described below.

During CPU offline, in stop-machine, we don't enforce any rule in the
_DISABLE_IRQ stage, regarding the order in which the outgoing CPU and
the other CPUs disable their local interrupts. Due to this, we can
encounter a situation in which an IPI is sent by one of the other CPUs
to the outgoing CPU (while it is *still* online), but the outgoing CPU
ends up noticing it only *after* it has gone offline.

          CPU 1                                     CPU 2
      (Online CPU)                           (CPU going offline)

   Enter _PREPARE stage                      Enter _PREPARE stage

                                             Enter _DISABLE_IRQ stage

                                           =
   Got a device interrupt, and             | Didn't notice the IPI
   the interrupt handler sent an           | since interrupts were
   IPI to CPU 2 using                      | disabled on this CPU.
   smp_call_function_single_async()        |
                                           =

   Enter _DISABLE_IRQ stage

   Enter _RUN stage                          Enter _RUN stage

                              =
   Busy loop with interrupts  |              Invoke take_cpu_down()
   disabled.                  |              and take CPU 2 offline
                              =

   Enter _EXIT stage                         Enter _EXIT stage

   Re-enable interrupts                      Re-enable interrupts

                                             The pending IPI is noted
                                             immediately, but alas,
                                             the CPU is offline at
                                             this point.

This of course, makes the smp-call-function IPI handler code running on
CPU 2 unhappy and it complains about "receiving an IPI on an offline
CPU".

One real example of the scenario on CPU 1 is the block layer's
complete-request call-path:

    __blk_complete_request() [interrupt-handler]
        raise_blk_irq()
            smp_call_function_single_async()

However, if we look closely, the block layer does check that the target
CPU is online before firing the IPI. So in this case, it is actually
the unfortunate ordering/timing of events in the stop-machine phase that
leads to receiving IPIs after the target CPU has gone offline.

In reality, getting a late IPI on an offline CPU is not too bad by
itself (this can happen even due to hardware latencies in IPI
send-receive). It is a bug only if the target CPU really went offline
without executing all the callbacks queued on its list. (Note that a
CPU is free to execute its pending smp-call-function callbacks in a
batch, without waiting for the corresponding IPIs to arrive for each one
of those callbacks).

So, fixing this issue can be broken up into two parts:

1. Ensure that a CPU goes offline only after executing all the
   callbacks queued on it.

2. Modify the warning condition in the smp-call-function IPI handler
   code such that it warns only if an offline CPU got an IPI *and* that
   CPU had gone offline with callbacks still pending in its queue.

Achieving part 1 is straight-forward - just flush (execute) all the
queued callbacks on the outgoing CPU in the CPU_DYING stage[1],
including those callbacks for which the source CPU's IPIs might not have
been received on the outgoing CPU yet. Once we do this, an IPI that
arrives late on the CPU going offline (either due to the race mentioned
above, or due to hardware latencies) will be completely harmless, since
the outgoing CPU would have executed all the queued callbacks before
going offline.

Overall, this fix (parts 1 and 2 put together) additionally guarantees
that we will see a warning only when the *IPI-sender code* is buggy -
that is, if it queues the callback _after_ the target CPU has gone
offline.

[1]. The CPU_DYING part needs a little more explanation: by the time we
execute the CPU_DYING notifier callbacks, the CPU would have already
been marked offline. But we want to flush out the pending callbacks at
this stage, ignoring the fact that the CPU is offline. So restructure
the IPI handler code so that we can by-pass the "is-cpu-offline?" check
in this particular case. (Of course, the right solution here is to fix
CPU hotplug to mark the CPU offline _after_ invoking the CPU_DYING
notifiers, but this requires a lot of audit to ensure that this change
doesn't break any existing code; hence lets go with the solution
proposed above until that is done).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Suggested-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Galbraith <mgalbraith@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sachin Kamat <sachin.kamat@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
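
Illustration only, not part of the patch: the two parts of the fix described
in the commit message can be modelled in a few lines of single-threaded
userspace C. Every model_* name below is hypothetical and exists only for
this sketch; the fixed-size array stands in for the kernel's per-CPU
lock-free list, and the "warnings" counter stands in for WARN(). The sketch
assumes callbacks do not queue further work while a flush is running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define MODEL_QUEUE_MAX 8

typedef void (*model_cb_t)(void *info);

struct model_cpu {
	bool online;                    /* stands in for cpu_online() */
	size_t pending;                 /* callbacks queued but not yet run */
	model_cb_t cb[MODEL_QUEUE_MAX];
	void *info[MODEL_QUEUE_MAX];
	int warnings;                   /* number of times the model warned */
};

/* A sender queues a callback; the matching IPI may be noticed much later. */
static bool model_queue(struct model_cpu *cpu, model_cb_t cb, void *info)
{
	if (cpu->pending == MODEL_QUEUE_MAX)
		return false;
	cpu->cb[cpu->pending] = cb;
	cpu->info[cpu->pending] = info;
	cpu->pending++;
	return true;
}

/* Run everything queued so far in FIFO order; returns how many ran. */
static size_t model_flush(struct model_cpu *cpu, bool warn_cpu_offline)
{
	size_t i, n;

	assert(cpu != NULL);
	n = cpu->pending;

	/* Part 2: warn only if an offline CPU still has callbacks pending. */
	if (warn_cpu_offline && !cpu->online && n > 0)
		cpu->warnings++;

	cpu->pending = 0;
	for (i = 0; i < n; i++)
		cpu->cb[i](cpu->info[i]);
	return n;
}

/* Part 1: drain the queue while dying, bypassing the offline check. */
static void model_cpu_offline(struct model_cpu *cpu)
{
	cpu->online = false;    /* already marked offline at CPU_DYING time */
	model_flush(cpu, false);
}

/* The late IPI is finally noticed: same flush, with the check enabled. */
static void model_ipi_handler(struct model_cpu *cpu)
{
	model_flush(cpu, true);
}

/* Example callback: bumps the counter that info points to. */
static void model_count(void *info)
{
	(*(int *)info)++;
}

/* Replays the race from the timeline; returns the number of warnings. */
static int model_replay_race(int *ran)
{
	struct model_cpu cpu2 = { .online = true };

	if (!model_queue(&cpu2, model_count, ran)) /* queued while online */
		return -1;
	model_cpu_offline(&cpu2);   /* CPU 2 flushes, then is offline */
	model_ipi_handler(&cpu2);   /* late IPI lands on the offline CPU */
	return cpu2.warnings;
}
```

In this model, replaying the race runs the callback once and raises no
warning, while queueing onto a CPU that is already offline and then
delivering the IPI does raise one, which matches the "buggy sender" case
the commit message describes.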
author: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> 2014-06-23 16:22:02 -0400
committer: Linus Torvalds <torvalds@linux-foundation.org> 2014-06-23 19:47:43 -0400
commit: 8d056c48e486249e6487910b83e0f3be7c14acf7 (patch)
tree: 4c834049da5ce8babdcb813ac4a11cdb5bc18bff
parent: e020d5bd8a730757b565b18d620240f71c3e21fe (diff)
1 files changed, 49 insertions, 8 deletions
diff --git a/kernel/smp.c b/kernel/smp.c
index 306f8180b0d5..80c33f8de14f 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -29,6 +29,8 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_function_data, cfd_data);
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct llist_head, call_single_queue);
 
+static void flush_smp_call_function_queue(bool warn_cpu_offline);
+
 static int
 hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
@@ -51,12 +53,27 @@ hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
 #ifdef CONFIG_HOTPLUG_CPU
         case CPU_UP_CANCELED:
         case CPU_UP_CANCELED_FROZEN:
+                /* Fall-through to the CPU_DEAD[_FROZEN] case. */
 
         case CPU_DEAD:
         case CPU_DEAD_FROZEN:
                 free_cpumask_var(cfd->cpumask);
                 free_percpu(cfd->csd);
                 break;
+
+        case CPU_DYING:
+        case CPU_DYING_FROZEN:
+                /*
+                 * The IPIs for the smp-call-function callbacks queued by other
+                 * CPUs might arrive late, either due to hardware latencies or
+                 * because this CPU disabled interrupts (inside stop-machine)
+                 * before the IPIs were sent. So flush out any pending callbacks
+                 * explicitly (without waiting for the IPIs to arrive), to
+                 * ensure that the outgoing CPU doesn't go offline with work
+                 * still pending.
+                 */
+                flush_smp_call_function_queue(false);
+                break;
 #endif
         };
 
@@ -177,23 +194,47 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
         return 0;
 }
 
-/*
- * Invoked by arch to handle an IPI for call function single. Must be
- * called from the arch with interrupts disabled.
+/**
+ * generic_smp_call_function_single_interrupt - Execute SMP IPI callbacks
+ *
+ * Invoked by arch to handle an IPI for call function single.
+ * Must be called with interrupts disabled.
  */
 void generic_smp_call_function_single_interrupt(void)
 {
+        flush_smp_call_function_queue(true);
+}
+
+/**
+ * flush_smp_call_function_queue - Flush pending smp-call-function callbacks
+ *
+ * @warn_cpu_offline: If set to 'true', warn if callbacks were queued on an
+ *                    offline CPU. Skip this check if set to 'false'.
+ *
+ * Flush any pending smp-call-function callbacks queued on this CPU. This is
+ * invoked by the generic IPI handler, as well as by a CPU about to go offline,
+ * to ensure that all pending IPI callbacks are run before it goes completely
+ * offline.
+ *
+ * Loop through the call_single_queue and run all the queued callbacks.
+ * Must be called with interrupts disabled.
+ */
+static void flush_smp_call_function_queue(bool warn_cpu_offline)
+{
+        struct llist_head *head;
         struct llist_node *entry;
         struct call_single_data *csd, *csd_next;
         static bool warned;
 
-        entry = llist_del_all(&__get_cpu_var(call_single_queue));
+        WARN_ON(!irqs_disabled());
+
+        head = &__get_cpu_var(call_single_queue);
+        entry = llist_del_all(head);
         entry = llist_reverse_order(entry);
 
-        /*
-         * Shouldn't receive this interrupt on a cpu that is not yet online.
-         */
-        if (unlikely(!cpu_online(smp_processor_id()) && !warned)) {
+        /* There shouldn't be any pending callbacks on an offline CPU. */
+        if (unlikely(warn_cpu_offline && !cpu_online(smp_processor_id()) &&
+                     !warned && !llist_empty(head))) {
                 warned = true;
                 WARN(1, "IPI on offline CPU %d\n", smp_processor_id());
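
Illustration only, not part of the patch: the flush path above detaches the
whole queue in one step, reverses it, and then runs the callbacks. A minimal
userspace sketch of that pattern, using C11 atomics, might look like the
following. The sketch_* names are hypothetical stand-ins and are not the
kernel's llist API; the reversal is there because producers push onto the
front of the list, so the detached list is newest-first.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's llist types; not the real API. */
struct sketch_node {
	struct sketch_node *next;
	void (*func)(void *info);
	void *info;
};

struct sketch_head {
	_Atomic(struct sketch_node *) first;
};

static void sketch_init(struct sketch_head *head)
{
	atomic_init(&head->first, (struct sketch_node *)NULL);
}

/* Lock-free push onto the front, so the list ends up newest-first. */
static void sketch_add(struct sketch_head *head, struct sketch_node *node)
{
	struct sketch_node *old;

	assert(node != NULL);
	old = atomic_load(&head->first);
	do {
		node->next = old;
	} while (!atomic_compare_exchange_weak(&head->first, &old, node));
}

/* Detach the whole list in one atomic step. */
static struct sketch_node *sketch_del_all(struct sketch_head *head)
{
	return atomic_exchange(&head->first, (struct sketch_node *)NULL);
}

/* Reverse newest-first into oldest-first so callbacks run in queue order. */
static struct sketch_node *sketch_reverse(struct sketch_node *node)
{
	struct sketch_node *prev = NULL;

	while (node) {
		struct sketch_node *next = node->next;

		node->next = prev;
		prev = node;
		node = next;
	}
	return prev;
}

/* Drain and run everything queued so far; returns the number run. */
static size_t sketch_flush(struct sketch_head *head)
{
	struct sketch_node *node = sketch_reverse(sketch_del_all(head));
	size_t ran = 0;

	while (node) {
		/* Read ->next first: the callback may reuse the node. */
		struct sketch_node *next = node->next;

		node->func(node->info);
		node = next;
		ran++;
	}
	return ran;
}

/* Example payload and callback: record each tag in arrival order. */
struct sketch_log {
	int tags[4];
	size_t len;
};

struct sketch_item {
	int tag;
	struct sketch_log *log;
};

static void sketch_record(void *info)
{
	struct sketch_item *item = info;

	if (item->log->len < 4)
		item->log->tags[item->log->len++] = item->tag;
}
```

Queueing three nodes and calling sketch_flush() once runs them in the order
they were added; a second call finds nothing to run, which is the property
that makes a late IPI harmless once the queue has already been drained.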