From 0f004f5a696a9434b7214d0d3cbd0525ee77d428 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 30 Nov 2010 19:48:45 +0100 Subject: sched: Cure more NO_HZ load average woes There's a long-running regression that proved difficult to fix and which is hitting certain people and is rather annoying in its effects. Damien reported that after 74f5187ac8 (sched: Cure load average vs NO_HZ woes) his load average is unnaturally high, he also noted that even with that patch reverted the load avgerage numbers are not correct. The problem is that the previous patch only solved half the NO_HZ problem, it addressed the part of going into NO_HZ mode, not of comming out of NO_HZ mode. This patch implements that missing half. When comming out of NO_HZ mode there are two important things to take care of: - Folding the pending idle delta into the global active count. - Correctly aging the averages for the idle-duration. So with this patch the NO_HZ interaction should be complete and behaviour between CONFIG_NO_HZ=[yn] should be equivalent. Furthermore, this patch slightly changes the load average computation by adding a rounding term to the fixed point multiplication. Reported-by: Damien Wyart Reported-by: Tim McGrath Tested-by: Damien Wyart Tested-by: Orion Poplawski Tested-by: Kyle McMartin Signed-off-by: Peter Zijlstra Cc: stable@kernel.org Cc: Chase Douglas LKML-Reference: <1291129145.32004.874.camel@laptop> Signed-off-by: Ingo Molnar --- kernel/timer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'kernel/timer.c') diff --git a/kernel/timer.c b/kernel/timer.c index 68a9ae7679b7..7bd715fda974 100644 --- a/kernel/timer.c +++ b/kernel/timer.c @@ -1319,7 +1319,7 @@ void do_timer(unsigned long ticks) { jiffies_64 += ticks; update_wall_time(); - calc_global_load(); + calc_global_load(ticks); } #ifdef __ARCH_WANT_SYS_ALARM -- cgit v1.2.2 From dbd87b5af055a0cc9bba17795c9a2b0d17795389 Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Wed, 1 Dec 2010 10:11:09 +0100 Subject: nohz: Fix get_next_timer_interrupt() vs cpu hotplug This fixes a bug as seen on 2.6.32 based kernels where timers got enqueued on offline cpus. If a cpu goes offline it might still have pending timers. These will be migrated during CPU_DEAD handling after the cpu is offline. However while the cpu is going offline it will schedule the idle task which will then call tick_nohz_stop_sched_tick(). That function in turn will call get_next_timer_intterupt() to figure out if the tick of the cpu can be stopped or not. If it turns out that the next tick is just one jiffy off (delta_jiffies == 1) tick_nohz_stop_sched_tick() incorrectly assumes that the tick should not stop and takes an early exit and thus it won't update the load balancer cpu. Just afterwards the cpu will be killed and the load balancer cpu could be the offline cpu. On 2.6.32 based kernel get_nohz_load_balancer() gets called to decide on which cpu a timer should be enqueued (see __mod_timer()). Which leads to the possibility that timers get enqueued on an offline cpu. These will never expire and can cause a system hang. This has been observed 2.6.32 kernels. On current kernels __mod_timer() uses get_nohz_timer_target() which doesn't have that problem. However there might be other problems because of the too early exit tick_nohz_stop_sched_tick() in case a cpu goes offline. The easiest and probably safest fix seems to be to let get_next_timer_interrupt() just lie and let it say there isn't any pending timer if the current cpu is offline. I also thought of moving migrate_[hr]timers() from CPU_DEAD to CPU_DYING, but seeing that there already have been fixes at least in the hrtimer code in this area I'm afraid that this could add new subtle bugs. Signed-off-by: Heiko Carstens Signed-off-by: Peter Zijlstra LKML-Reference: <20101201091109.GA8984@osiris.boeblingen.de.ibm.com> Cc: stable@kernel.org Signed-off-by: Ingo Molnar --- kernel/timer.c | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'kernel/timer.c') diff --git a/kernel/timer.c b/kernel/timer.c index 7bd715fda974..353b9227c2ec 100644 --- a/kernel/timer.c +++ b/kernel/timer.c @@ -1252,6 +1252,12 @@ unsigned long get_next_timer_interrupt(unsigned long now) struct tvec_base *base = __get_cpu_var(tvec_bases); unsigned long expires; + /* + * Pretend that there is no timer pending if the cpu is offline. + * Possible pending timers will be migrated later to an active cpu. + */ + if (cpu_is_offline(smp_processor_id())) + return now + NEXT_TIMER_MAX_DELTA; spin_lock(&base->lock); if (time_before_eq(base->next_timer, base->timer_jiffies)) base->next_timer = __next_timer_interrupt(base); -- cgit v1.2.2