author    Suresh Siddha <suresh.b.siddha@intel.com>  2010-02-12 20:14:22 -0500
committer Thomas Gleixner <tglx@linutronix.de>  2010-02-16 09:13:59 -0500
commit    9000f05c6d1607f79c0deacf42b09693be673f4c
tree      de24233877ccf6008bd65278820251bac442fa97 /kernel/sched.c
parent    28f5318167adf23b16c844b9c2253f355cb21796
sched: Fix SMT scheduler regression in find_busiest_queue()
Fix an SMT scheduler performance regression that leads to a scenario where the SMT threads of one core are completely idle while both SMT threads of another core (on the same socket) are busy.

This is caused by the following commit (problematic code highlighted):

    commit bdb94aa5dbd8b55e75f5a50b61312fe589e2c2d1
    Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Date:   Tue Sep 1 10:34:38 2009 +0200

        sched: Try to deal with low capacity

    @@ -4203,15 +4223,18 @@ find_busiest_queue()
    ...
            for_each_cpu(i, sched_group_cpus(group)) {
    +               unsigned long power = power_of(i);
    ...
    -               wl = weighted_cpuload(i);
    +               wl = weighted_cpuload(i) * SCHED_LOAD_SCALE;
    +               wl /= power;

    -               if (rq->nr_running == 1 && wl > imbalance)
    +               if (capacity && rq->nr_running == 1 && wl > imbalance)
                            continue;

On an SMT system, the power of an HT logical cpu is 589, and the scheduler load imbalance (in scenarios like the one above) can be approximately 1024 (SCHED_LOAD_SCALE). Scaling the weighted load by the power, as the change above does, inflates wl (for example, a load of 1024 scales to 1024 * 1024 / 589, about 1780), so "wl > imbalance" becomes true. find_busiest_queue() then returns NULL, and load_balance() concludes that the load is already well balanced. In fact, one of the tasks could be moved to the idle core for optimal performance.

The weighted load (wl) scaled by the cpu power should not be compared with imbalance. In that branch we already know there is only a single task (rq->nr_running == 1), and the comparison between wl and imbalance only makes sure that we select the thread whose priority matches the imbalance. So imbalance must be compared with the original weighted load of the cpu, not the scaled load.
In the other case, where we want the most hammered (busiest) cpu, we can use the scaled load. It accounts for the cpu power in addition to the actual load on that cpu, so the load can be moved away from the cpu that is most heavily loaded relative to its actual capacity, compared with the rest of the cpus in the busiest group.

Fix it.

Reported-by: Ma Ling <ling.ma@intel.com>
Initial-Analysis-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1266023662.2808.118.camel@sbs-t61.sc.intel.com>
Cc: stable@kernel.org [2.6.32.x]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Diffstat (limited to 'kernel/sched.c')
-rw-r--r-- kernel/sched.c 15
1 files changed, 13 insertions, 2 deletions
diff --git a/kernel/sched.c b/kernel/sched.c
index e3199df426e3..4d78aef4559d 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4119,12 +4119,23 @@ find_busiest_queue(struct sched_group *group, enum cpu_idle_type idle,
 			continue;
 
 		rq = cpu_rq(i);
-		wl = weighted_cpuload(i) * SCHED_LOAD_SCALE;
-		wl /= power;
+		wl = weighted_cpuload(i);
 
+		/*
+		 * When comparing with imbalance, use weighted_cpuload()
+		 * which is not scaled with the cpu power.
+		 */
 		if (capacity && rq->nr_running == 1 && wl > imbalance)
 			continue;
 
+		/*
+		 * For the load comparisons with the other cpu's, consider
+		 * the weighted_cpuload() scaled with the cpu power, so that
+		 * the load can be moved away from the cpu that is potentially
+		 * running at a lower capacity.
+		 */
+		wl = (wl * SCHED_LOAD_SCALE) / power;
+
 		if (wl > max_load) {
 			max_load = wl;
 			busiest = rq;