author    Ken Chen <kenchen@google.com>    2011-04-07 20:23:22 -0400
committer Ingo Molnar <mingo@elte.hu>      2011-04-11 05:08:54 -0400
commit    b0432d8f162c7d5d9537b4cb749d44076b76a783 (patch)
tree      98b94ec55f6d18935aedbc9ab898705ad252b939 /kernel
parent    4263a2f1dad8c8e7ce2352a0cbc882c2b0c044a9 (diff)
sched: Fix sched-domain avg_load calculation
In function find_busiest_group(), the sched-domain avg_load isn't calculated at all if there is a group imbalance within the domain. This will cause an erroneous imbalance calculation.

The reason is that calculate_imbalance() sees sds->avg_load = 0 and dumps the entire sds->max_load into the imbalance variable, which is used later on to migrate the entire load from the busiest CPU to the puller CPU.

This has two really bad effects:

1. A stampede of task migrations, which can't break out of the bad state because of a positive feedback loop: large load delta -> heavier load migration -> larger imbalance, and the cycle goes on.

2. Severe imbalance in CPU queue depth. This causes a really long scheduling latency blip, which badly affects applications with tight latency requirements.

The fix is to have the kernel calculate the domain avg_load in both cases. This ensures that the imbalance calculation is always sensible and that the migration target is usually half way between the busiest and the puller CPU.

Signed-off-by: Ken Chen <kenchen@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/20110408002322.3A0D812217F@elm.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
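To make the failure mode concrete, here is a minimal, self-contained userspace sketch, not kernel code: the load and power values are invented, and "max_load - avg_load" is a simplification of what calculate_imbalance() actually does. It shows how a zero avg_load degenerates into a request to move the busiest CPU's entire load, while the computed avg_load yields a half-way target:

/*
 * Illustrative userspace sketch only.  SCHED_LOAD_SCALE, total_load,
 * total_pwr and max_load mirror the fields used in find_busiest_group();
 * the numeric values are made up.
 */
#include <stdio.h>

#define SCHED_LOAD_SCALE 1024UL

int main(void)
{
	unsigned long total_load = 3072;	/* sum of per-group loads */
	unsigned long total_pwr  = 2048;	/* sum of per-group cpu power */
	unsigned long max_load   = 2048;	/* load of the busiest group */

	/* Before the fix: on the group-imbalance path avg_load stayed 0,
	 * so the imbalance degenerated to max_load - 0, i.e. a request
	 * to migrate the busiest CPU's entire load. */
	unsigned long avg_load = 0;
	printf("imbalance with avg_load unset: %lu\n", max_load - avg_load);

	/* After the fix: avg_load is computed unconditionally, so the
	 * migration target lands roughly half way between the busiest
	 * and the pulling CPU. */
	avg_load = (SCHED_LOAD_SCALE * total_load) / total_pwr;	/* 1536 */
	printf("imbalance with avg_load=%lu: %lu\n",
	       avg_load, max_load - avg_load);
	return 0;
}

The real calculate_imbalance() logic is more involved, but the sketch captures the degenerate max_load - 0 case that this patch eliminates.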
Diffstat (limited to 'kernel')
-rw-r--r--  kernel/sched_fair.c  3
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 7f00772e57c9..60f9d407c5ec 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3127,6 +3127,8 @@ find_busiest_group(struct sched_domain *sd, int this_cpu,
 	if (!sds.busiest || sds.busiest_nr_running == 0)
 		goto out_balanced;
 
+	sds.avg_load = (SCHED_LOAD_SCALE * sds.total_load) / sds.total_pwr;
+
 	/*
 	 * If the busiest group is imbalanced the below checks don't
 	 * work because they assumes all things are equal, which typically
@@ -3151,7 +3153,6 @@ find_busiest_group(struct sched_domain *sd, int this_cpu,
 	 * Don't pull any tasks if this group is already above the domain
 	 * average load.
 	 */
-	sds.avg_load = (SCHED_LOAD_SCALE * sds.total_load) / sds.total_pwr;
 	if (sds.this_load >= sds.avg_load)
 		goto out_balanced;
 