aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc
diff options
context:
space:
mode:
authorMichael Wang <wangyun@linux.vnet.ibm.com>2014-04-07 23:19:36 -0400
committerBenjamin Herrenschmidt <benh@kernel.crashing.org>2014-04-08 22:54:17 -0400
commit9a0133613e4412b2caaaf0d9dd81f213bcededf1 (patch)
tree8298db7c2bfa2a4d22d6d9f0a8eee024d53265ac /arch/powerpc
parentd3d35d957a9d0733dc51f14b5abc0bff5d3c5f3a (diff)
power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update
Since v1: Edited the comment according to Srivatsa's suggestion. During the testing, we encounter below WARN followed by Oops: WARNING: at kernel/sched/core.c:6218 ... NIP [c000000000101660] .build_sched_domains+0x11d0/0x1200 LR [c000000000101358] .build_sched_domains+0xec8/0x1200 PACATMSCRATCH [800000000000f032] Call Trace: [c00000001b103850] [c000000000101358] .build_sched_domains+0xec8/0x1200 [c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510 [c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0 [c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30 ... Oops: Kernel access of bad area, sig: 11 [#1] ... NIP [c00000000045c000] .__bitmap_weight+0x60/0xf0 LR [c00000000010132c] .build_sched_domains+0xe9c/0x1200 PACATMSCRATCH [8000000000029032] Call Trace: [c00000001b1037a0] [c000000000288ff4] .kmem_cache_alloc_node_trace+0x184/0x3a0 [c00000001b103850] [c00000000010132c] .build_sched_domains+0xe9c/0x1200 [c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510 [c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0 [c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30 ... This was caused by that 'sd->groups == NULL' after building groups, which was caused by the empty 'sd->span'. The cpu's domain contained nothing because the cpu was assigned to a wrong node, due to the following unfortunate sequence of events: 1. The hypervisor sent a topology update to the guest OS, to notify changes to the cpu-node mapping. However, the update was actually redundant - i.e., the "new" mapping was exactly the same as the old one. 2. Due to this, the 'updated_cpus' mask turned out to be empty after exiting the 'for-loop' in arch_update_cpu_topology(). 3. So we ended up calling stop-machine() with an empty cpumask list, which made stop-machine internally elect cpumask_first(cpu_online_mask), i.e., CPU0 as the cpu to run the payload (the update_cpu_topology() function). 4. This causes update_cpu_topology() to be run by CPU0. And since 'updates' is kzalloc()'ed inside arch_update_cpu_topology(), update_cpu_topology() finds update->cpu as well as update->new_nid to be 0. In other words, we end up assigning CPU0 (and eventually its siblings) to node 0, incorrectly. Along with the following wrong updating, it causes the sched-domain rebuild code to break and crash the system. Fix this by skipping the topology update in cases where we find that the topology has not actually changed in reality (ie., spurious updates). CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> CC: Nathan Fontenot <nfont@linux.vnet.ibm.com> CC: Stephen Rothwell <sfr@canb.auug.org.au> CC: Andrew Morton <akpm@linux-foundation.org> CC: Robert Jennings <rcj@linux.vnet.ibm.com> CC: Jesse Larrew <jlarrew@linux.vnet.ibm.com> CC: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> CC: Alistair Popple <alistair@popple.id.au> Suggested-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Diffstat (limited to 'arch/powerpc')
-rw-r--r--arch/powerpc/mm/numa.c15
1 files changed, 15 insertions, 0 deletions
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 30a42e24bf14..4ebbb9e99286 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1591,6 +1591,20 @@ int arch_update_cpu_topology(void)
1591 cpu = cpu_last_thread_sibling(cpu); 1591 cpu = cpu_last_thread_sibling(cpu);
1592 } 1592 }
1593 1593
1594 /*
1595 * In cases where we have nothing to update (because the updates list
1596 * is too short or because the new topology is same as the old one),
1597 * skip invoking update_cpu_topology() via stop-machine(). This is
1598 * necessary (and not just a fast-path optimization) since stop-machine
1599 * can end up electing a random CPU to run update_cpu_topology(), and
1600 * thus trick us into setting up incorrect cpu-node mappings (since
1601 * 'updates' is kzalloc()'ed).
1602 *
1603 * And for the similar reason, we will skip all the following updating.
1604 */
1605 if (!cpumask_weight(&updated_cpus))
1606 goto out;
1607
1594 stop_machine(update_cpu_topology, &updates[0], &updated_cpus); 1608 stop_machine(update_cpu_topology, &updates[0], &updated_cpus);
1595 1609
1596 /* 1610 /*
@@ -1612,6 +1626,7 @@ int arch_update_cpu_topology(void)
1612 changed = 1; 1626 changed = 1;
1613 } 1627 }
1614 1628
1629out:
1615 kfree(updates); 1630 kfree(updates);
1616 return changed; 1631 return changed;
1617} 1632}