path: root/kernel/time
author	Linus Torvalds <torvalds@linux-foundation.org>	2013-05-05 16:23:27 -0400
committer	Linus Torvalds <torvalds@linux-foundation.org>	2013-05-05 16:23:27 -0400
commit	534c97b0950b1967bca1c753aeaed32f5db40264 (patch)
tree	9421d26e4f6d479d1bc32b036a731b065daab0fa /kernel/time
parent	64049d1973c1735f543eb7a55653e291e108b0cb (diff)
parent	265f22a975c1e4cc3a4d1f94a3ec53ffbb6f5b9f (diff)
Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull 'full dynticks' support from Ingo Molnar:
 "This tree from Frederic Weisbecker adds a new, (exciting! :-) core
  kernel feature to the timer and scheduler subsystems: 'full dynticks',
  or CONFIG_NO_HZ_FULL=y.

  This feature extends the nohz variable-size timer tick feature from
  idle to busy CPUs (running at most one task) as well, potentially
  reducing the number of timer interrupts significantly.

  This feature got motivated by real-time folks and the -rt tree, but
  the general utility and motivation of full-dynticks runs wider than
  that:

   - HPC workloads get faster: CPUs running a single task should be able
     to utilize a maximum amount of CPU power. A periodic timer tick at
     HZ=1000 can cause a constant overhead of up to 1.0%. This feature
     removes that overhead - and speeds up the system by 0.5%-1.0% on
     typical distro configs even on modern systems.

   - Real-time workload latency reduction: CPUs running critical tasks
     should experience as little jitter as possible. The last remaining
     source of kernel-related jitter was the periodic timer tick.

   - A single task executing on a CPU is a pretty common situation,
     especially with an increasing number of cores/CPUs, so this feature
     helps desktop and mobile workloads as well.

  The cost of the feature is mainly related to increased timer
  reprogramming overhead when a CPU switches its tick period, and thus
  slightly longer to-idle and from-idle latency.

  Configuration-wise a third mode of operation is added to the existing
  two NOHZ kconfig modes:

   - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
     as a config option. This is the traditional Linux periodic tick
     design: there's a HZ tick going on all the time, regardless of
     whether a CPU is idle or not.

   - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
     periodic tick when a CPU enters idle mode.

   - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
     tick when a CPU is idle, also slows the tick down to 1 Hz (one
     timer interrupt per second) when only a single task is running on
     a CPU.

  The .config behavior is compatible: existing !CONFIG_NO_HZ and
  CONFIG_NO_HZ=y settings get translated to the new values, without the
  user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
  default.

  This feature is based on a lot of infrastructure work that has been
  steadily going upstream in the last 2-3 cycles: related RCU support
  and non-periodic cputime support in particular is upstream already.

  This tree adds the final pieces and activates the feature.

  The pull request is marked RFC because:

   - it's marked 64-bit only at the moment - the 32-bit support patch is
     small but did not get ready in time.

   - it has a number of fresh commits that came in after the merge
     window. The overwhelming majority of commits are from before the
     merge window, but still some aspects of the tree are fresh and so I
     marked it RFC.

   - it's a pretty wide-reaching feature with lots of effects - and
     while the components have been in testing for some time, the full
     combination is still not very widely used. That it's default-off
     should reduce its regression abilities and obviously there are no
     known regressions with CONFIG_NO_HZ_FULL=y enabled either.

   - the feature is not completely idempotent: there is no 100%
     equivalent replacement for a periodic scheduler/timer tick. In
     particular there's ongoing work to map out and reduce its effects
     on scheduler load-balancing and statistics. This should not impact
     correctness though, there are no known regressions related to this
     feature at this point.

   - it's a pretty ambitious feature that with time will likely be
     enabled by most Linux distros, and we'd like you to make input on
     its design/implementation, if you dislike some aspect we missed.
     Without flaming us to crisp! :-)

  Future plans:

   - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
     the periodic tick altogether when there's a single busy task on a
     CPU. We'd first like 1 Hz to be exposed more widely before we go
     for the 0 Hz target though.

   - once we reach 0 Hz we can remove the periodic tick assumption from
     nr_running>=2 as well, by essentially interrupting busy tasks only
     as frequently as the sched_latency constraints require us to do -
     once every 4-40 msecs, depending on nr_running.

  I am personally leaning towards biting the bullet and doing this in
  v3.10, like the -rt tree this effort has been going on for too long -
  but the final word is up to you as usual.

  More technical details can be found in Documentation/timers/NO_HZ.txt"

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
  sched: Keep at least 1 tick per second for active dynticks tasks
  rcu: Fix full dynticks' dependency on wide RCU nocb mode
  nohz: Protect smp_processor_id() in tick_nohz_task_switch()
  nohz_full: Add documentation.
  cputime_nsecs: use math64.h for nsec resolution conversion helpers
  nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
  nohz: Reduce overhead under high-freq idling patterns
  nohz: Remove full dynticks' superfluous dependency on RCU tree
  nohz: Fix unavailable tick_stop tracepoint in dynticks idle
  nohz: Add basic tracing
  nohz: Select wide RCU nocb for full dynticks
  nohz: Disable the tick when irq resume in full dynticks CPU
  nohz: Re-evaluate the tick for the new task after a context switch
  nohz: Prepare to stop the tick on irq exit
  nohz: Implement full dynticks kick
  nohz: Re-evaluate the tick from the scheduler IPI
  sched: New helper to prevent from stopping the tick in full dynticks
  sched: Kick full dynticks CPU that have more than one task enqueued.
  perf: New helper to prevent full dynticks CPUs from stopping tick
  perf: Kick full dynticks CPU if events rotation is needed
  ...
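The configuration flow described above needs no code changes to try out: CONFIG_NO_HZ_FULL only arms the machinery, and the nohz_full= boot parameter picks the CPUs. A minimal sketch for a GRUB-based distro (the file path and CPU range are illustrative assumptions, not from this commit):

```shell
# /etc/default/grub -- keep CPU 0 as the timekeeping CPU, run CPUs 1-7
# tickless; the boot CPU is always excluded from the range anyway:
GRUB_CMDLINE_LINUX="nohz_full=1-7"

# After regenerating the grub config and rebooting, the kernel log
# reports the effective mask, e.g.:
#   NO_HZ: Full dynticks CPUs: 1-7.
dmesg | grep 'Full dynticks'
```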
Diffstat (limited to 'kernel/time')
 kernel/time/Kconfig          |  80
 kernel/time/tick-broadcast.c |   3
 kernel/time/tick-common.c    |   5
 kernel/time/tick-sched.c     | 296
 4 files changed, 360 insertions(+), 24 deletions(-)
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 24510d84efd7..e4c07b0692bb 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -64,20 +64,88 @@ config GENERIC_CMOS_UPDATE
 if GENERIC_CLOCKEVENTS
 menu "Timers subsystem"
 
-# Core internal switch. Selected by NO_HZ / HIGH_RES_TIMERS. This is
+# Core internal switch. Selected by NO_HZ_COMMON / HIGH_RES_TIMERS. This is
 # only related to the tick functionality. Oneshot clockevent devices
 # are supported independ of this.
 config TICK_ONESHOT
 	bool
 
-config NO_HZ
-	bool "Tickless System (Dynamic Ticks)"
+config NO_HZ_COMMON
+	bool
 	depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
 	select TICK_ONESHOT
+
+choice
+	prompt "Timer tick handling"
+	default NO_HZ_IDLE if NO_HZ
+
+config HZ_PERIODIC
+	bool "Periodic timer ticks (constant rate, no dynticks)"
+	help
+	  This option keeps the tick running periodically at a constant
+	  rate, even when the CPU doesn't need it.
+
+config NO_HZ_IDLE
+	bool "Idle dynticks system (tickless idle)"
+	depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
+	select NO_HZ_COMMON
+	help
+	  This option enables a tickless idle system: timer interrupts
+	  will only trigger on an as-needed basis when the system is idle.
+	  This is usually interesting for energy saving.
+
+	  Most of the time you want to say Y here.
+
+config NO_HZ_FULL
+	bool "Full dynticks system (tickless)"
+	# NO_HZ_COMMON dependency
+	depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
+	# We need at least one periodic CPU for timekeeping
+	depends on SMP
+	# RCU_USER_QS dependency
+	depends on HAVE_CONTEXT_TRACKING
+	# VIRT_CPU_ACCOUNTING_GEN dependency
+	depends on 64BIT
+	select NO_HZ_COMMON
+	select RCU_USER_QS
+	select RCU_NOCB_CPU
+	select VIRT_CPU_ACCOUNTING_GEN
+	select CONTEXT_TRACKING_FORCE
+	select IRQ_WORK
+	help
+	  Adaptively try to shutdown the tick whenever possible, even when
+	  the CPU is running tasks. Typically this requires running a single
+	  task on the CPU. Chances for running tickless are maximized when
+	  the task mostly runs in userspace and has few kernel activity.
+
+	  You need to fill up the nohz_full boot parameter with the
+	  desired range of dynticks CPUs.
+
+	  This is implemented at the expense of some overhead in user <-> kernel
+	  transitions: syscalls, exceptions and interrupts. Even when it's
+	  dynamically off.
+
+	  Say N.
+
+endchoice
+
+config NO_HZ_FULL_ALL
+	bool "Full dynticks system on all CPUs by default"
+	depends on NO_HZ_FULL
+	help
+	  If the user doesn't pass the nohz_full boot option to
+	  define the range of full dynticks CPUs, consider that all
+	  CPUs in the system are full dynticks by default.
+	  Note the boot CPU will still be kept outside the range to
+	  handle the timekeeping duty.
+
+config NO_HZ
+	bool "Old Idle dynticks config"
+	depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
 	help
-	  This option enables a tickless system: timer interrupts will
-	  only trigger on an as-needed basis both when the system is
-	  busy and when the system is idle.
+	  This is the old config entry that enables dynticks idle.
+	  We keep it around for a little while to enforce backward
+	  compatibility with older config files.
 
 config HIGH_RES_TIMERS
 	bool "High Resolution Timer Support"
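The "default NO_HZ_IDLE if NO_HZ" line and the backward-compatibility entry in the Kconfig hunk above are what keep old .config files working. A sketch of the resulting translation during `make oldconfig` (illustrative fragments, not generated output):

```
# old .config                       # new selection after 'make oldconfig'
# CONFIG_NO_HZ is not set     ->    CONFIG_HZ_PERIODIC=y
CONFIG_NO_HZ=y                ->    CONFIG_NO_HZ_IDLE=y  (selects NO_HZ_COMMON)
                                    # CONFIG_NO_HZ_FULL stays off by default
```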
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 61d00a8cdf2f..206bbfb34e09 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -693,7 +693,8 @@ void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 		bc->event_handler = tick_handle_oneshot_broadcast;
 
 		/* Take the do_timer update */
-		tick_do_timer_cpu = cpu;
+		if (!tick_nohz_full_cpu(cpu))
+			tick_do_timer_cpu = cpu;
 
 		/*
 		 * We must be careful here. There might be other CPUs
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 6176a3e45709..5d3fb100bc06 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -163,7 +163,10 @@ static void tick_setup_device(struct tick_device *td,
 	 * this cpu:
 	 */
 	if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
-		tick_do_timer_cpu = cpu;
+		if (!tick_nohz_full_cpu(cpu))
+			tick_do_timer_cpu = cpu;
+		else
+			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
 		tick_next_period = ktime_get();
 		tick_period = ktime_set(0, NSEC_PER_SEC / HZ);
 	}
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 225f8bf19095..bc67d4245e1d 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -21,11 +21,15 @@
 #include <linux/sched.h>
 #include <linux/module.h>
 #include <linux/irq_work.h>
+#include <linux/posix-timers.h>
+#include <linux/perf_event.h>
 
 #include <asm/irq_regs.h>
 
 #include "tick-internal.h"
 
+#include <trace/events/timer.h>
+
 /*
  * Per cpu nohz control structure
  */
@@ -104,7 +108,7 @@ static void tick_sched_do_timer(ktime_t now)
 {
 	int cpu = smp_processor_id();
 
-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
 	/*
 	 * Check if the do_timer duty was dropped. We don't care about
 	 * concurrency: This happens only when the cpu in charge went
@@ -112,7 +116,8 @@ static void tick_sched_do_timer(ktime_t now)
 	 * this duty, then the jiffies update is still serialized by
 	 * jiffies_lock.
 	 */
-	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
+	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)
+	    && !tick_nohz_full_cpu(cpu))
 		tick_do_timer_cpu = cpu;
 #endif
 
@@ -123,7 +128,7 @@ static void tick_sched_do_timer(ktime_t now)
 
 static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
 {
-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
 	/*
 	 * When we are idle and the tick is stopped, we have to touch
 	 * the watchdog as we might not schedule for a really long
@@ -142,10 +147,226 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
 	profile_tick(CPU_PROFILING);
 }
 
+#ifdef CONFIG_NO_HZ_FULL
+static cpumask_var_t nohz_full_mask;
+bool have_nohz_full_mask;
+
+static bool can_stop_full_tick(void)
+{
+	WARN_ON_ONCE(!irqs_disabled());
+
+	if (!sched_can_stop_tick()) {
+		trace_tick_stop(0, "more than 1 task in runqueue\n");
+		return false;
+	}
+
+	if (!posix_cpu_timers_can_stop_tick(current)) {
+		trace_tick_stop(0, "posix timers running\n");
+		return false;
+	}
+
+	if (!perf_event_can_stop_tick()) {
+		trace_tick_stop(0, "perf events running\n");
+		return false;
+	}
+
+	/* sched_clock_tick() needs us? */
+#ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
+	/*
+	 * TODO: kick full dynticks CPUs when
+	 * sched_clock_stable is set.
+	 */
+	if (!sched_clock_stable) {
+		trace_tick_stop(0, "unstable sched clock\n");
+		return false;
+	}
+#endif
+
+	return true;
+}
+
+static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now);
+
+/*
+ * Re-evaluate the need for the tick on the current CPU
+ * and restart it if necessary.
+ */
+void tick_nohz_full_check(void)
+{
+	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+	if (tick_nohz_full_cpu(smp_processor_id())) {
+		if (ts->tick_stopped && !is_idle_task(current)) {
+			if (!can_stop_full_tick())
+				tick_nohz_restart_sched_tick(ts, ktime_get());
+		}
+	}
+}
+
+static void nohz_full_kick_work_func(struct irq_work *work)
+{
+	tick_nohz_full_check();
+}
+
+static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
+	.func = nohz_full_kick_work_func,
+};
+
+/*
+ * Kick the current CPU if it's full dynticks in order to force it to
+ * re-evaluate its dependency on the tick and restart it if necessary.
+ */
+void tick_nohz_full_kick(void)
+{
+	if (tick_nohz_full_cpu(smp_processor_id()))
+		irq_work_queue(&__get_cpu_var(nohz_full_kick_work));
+}
+
+static void nohz_full_kick_ipi(void *info)
+{
+	tick_nohz_full_check();
+}
+
+/*
+ * Kick all full dynticks CPUs in order to force these to re-evaluate
+ * their dependency on the tick and restart it if necessary.
+ */
+void tick_nohz_full_kick_all(void)
+{
+	if (!have_nohz_full_mask)
+		return;
+
+	preempt_disable();
+	smp_call_function_many(nohz_full_mask,
+			       nohz_full_kick_ipi, NULL, false);
+	preempt_enable();
+}
+
+/*
+ * Re-evaluate the need for the tick as we switch the current task.
+ * It might need the tick due to per task/process properties:
+ * perf events, posix cpu timers, ...
+ */
+void tick_nohz_task_switch(struct task_struct *tsk)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	if (!tick_nohz_full_cpu(smp_processor_id()))
+		goto out;
+
+	if (tick_nohz_tick_stopped() && !can_stop_full_tick())
+		tick_nohz_full_kick();
+
+out:
+	local_irq_restore(flags);
+}
+
+int tick_nohz_full_cpu(int cpu)
+{
+	if (!have_nohz_full_mask)
+		return 0;
+
+	return cpumask_test_cpu(cpu, nohz_full_mask);
+}
+
+/* Parse the boot-time nohz CPU list from the kernel parameters. */
+static int __init tick_nohz_full_setup(char *str)
+{
+	int cpu;
+
+	alloc_bootmem_cpumask_var(&nohz_full_mask);
+	if (cpulist_parse(str, nohz_full_mask) < 0) {
+		pr_warning("NOHZ: Incorrect nohz_full cpumask\n");
+		return 1;
+	}
+
+	cpu = smp_processor_id();
+	if (cpumask_test_cpu(cpu, nohz_full_mask)) {
+		pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu);
+		cpumask_clear_cpu(cpu, nohz_full_mask);
+	}
+	have_nohz_full_mask = true;
+
+	return 1;
+}
+__setup("nohz_full=", tick_nohz_full_setup);
+
+static int __cpuinit tick_nohz_cpu_down_callback(struct notifier_block *nfb,
+						 unsigned long action,
+						 void *hcpu)
+{
+	unsigned int cpu = (unsigned long)hcpu;
+
+	switch (action & ~CPU_TASKS_FROZEN) {
+	case CPU_DOWN_PREPARE:
+		/*
+		 * If we handle the timekeeping duty for full dynticks CPUs,
+		 * we can't safely shutdown that CPU.
+		 */
+		if (have_nohz_full_mask && tick_do_timer_cpu == cpu)
+			return -EINVAL;
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+/*
+ * Worst case string length in chunks of CPU range seems 2 steps
+ * separations: 0,2,4,6,...
+ * This is NR_CPUS + sizeof('\0')
+ */
+static char __initdata nohz_full_buf[NR_CPUS + 1];
+
+static int tick_nohz_init_all(void)
+{
+	int err = -1;
+
+#ifdef CONFIG_NO_HZ_FULL_ALL
+	if (!alloc_cpumask_var(&nohz_full_mask, GFP_KERNEL)) {
+		pr_err("NO_HZ: Can't allocate full dynticks cpumask\n");
+		return err;
+	}
+	err = 0;
+	cpumask_setall(nohz_full_mask);
+	cpumask_clear_cpu(smp_processor_id(), nohz_full_mask);
+	have_nohz_full_mask = true;
+#endif
+	return err;
+}
+
+void __init tick_nohz_init(void)
+{
+	int cpu;
+
+	if (!have_nohz_full_mask) {
+		if (tick_nohz_init_all() < 0)
+			return;
+	}
+
+	cpu_notifier(tick_nohz_cpu_down_callback, 0);
+
+	/* Make sure full dynticks CPU are also RCU nocbs */
+	for_each_cpu(cpu, nohz_full_mask) {
+		if (!rcu_is_nocb_cpu(cpu)) {
+			pr_warning("NO_HZ: CPU %d is not RCU nocb: "
+				   "cleared from nohz_full range", cpu);
+			cpumask_clear_cpu(cpu, nohz_full_mask);
+		}
+	}
+
+	cpulist_scnprintf(nohz_full_buf, sizeof(nohz_full_buf), nohz_full_mask);
+	pr_info("NO_HZ: Full dynticks CPUs: %s.\n", nohz_full_buf);
+}
+#else
+#define have_nohz_full_mask (0)
+#endif
+
 /*
  * NOHZ - aka dynamic tick functionality
  */
-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
 /*
  * NO HZ enabled ?
  */
@@ -345,11 +566,12 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 			delta_jiffies = rcu_delta_jiffies;
 		}
 	}
+
 	/*
-	 * Do not stop the tick, if we are only one off
-	 * or if the cpu is required for rcu
+	 * Do not stop the tick, if we are only one off (or less)
+	 * or if the cpu is required for RCU:
 	 */
-	if (!ts->tick_stopped && delta_jiffies == 1)
+	if (!ts->tick_stopped && delta_jiffies <= 1)
 		goto out;
 
 	/* Schedule the tick, if we are at least one jiffie off */
@@ -378,6 +600,13 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 		time_delta = KTIME_MAX;
 	}
 
+#ifdef CONFIG_NO_HZ_FULL
+	if (!ts->inidle) {
+		time_delta = min(time_delta,
+				 scheduler_tick_max_deferment());
+	}
+#endif
+
 	/*
 	 * calculate the expiry time for the next timer wheel
 	 * timer. delta_jiffies >= NEXT_TIMER_MAX_DELTA signals
@@ -421,6 +650,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 
 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 		ts->tick_stopped = 1;
+		trace_tick_stop(1, " ");
 	}
 
 	/*
@@ -457,6 +687,24 @@ out:
 	return ret;
 }
 
+static void tick_nohz_full_stop_tick(struct tick_sched *ts)
+{
+#ifdef CONFIG_NO_HZ_FULL
+	int cpu = smp_processor_id();
+
+	if (!tick_nohz_full_cpu(cpu) || is_idle_task(current))
+		return;
+
+	if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
+		return;
+
+	if (!can_stop_full_tick())
+		return;
+
+	tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
+#endif
+}
+
 static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 {
 	/*
@@ -489,6 +737,21 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 		return false;
 	}
 
+	if (have_nohz_full_mask) {
+		/*
+		 * Keep the tick alive to guarantee timekeeping progression
+		 * if there are full dynticks CPUs around
+		 */
+		if (tick_do_timer_cpu == cpu)
+			return false;
+		/*
+		 * Boot safety: make sure the timekeeping duty has been
+		 * assigned before entering dyntick-idle mode,
+		 */
+		if (tick_do_timer_cpu == TICK_DO_TIMER_NONE)
+			return false;
+	}
+
 	return true;
 }
 
@@ -568,12 +831,13 @@ void tick_nohz_irq_exit(void)
 {
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
 
-	if (!ts->inidle)
-		return;
-
-	/* Cancel the timer because CPU already waken up from the C-states*/
-	menu_hrtimer_cancel();
-	__tick_nohz_idle_enter(ts);
+	if (ts->inidle) {
+		/* Cancel the timer because CPU already waken up from the C-states*/
+		menu_hrtimer_cancel();
+		__tick_nohz_idle_enter(ts);
+	} else {
+		tick_nohz_full_stop_tick(ts);
+	}
 }
 
 /**
@@ -802,7 +1066,7 @@ static inline void tick_check_nohz(int cpu)
 static inline void tick_nohz_switch_to_nohz(void) { }
 static inline void tick_check_nohz(int cpu) { }
 
-#endif /* NO_HZ */
+#endif /* CONFIG_NO_HZ_COMMON */
 
 /*
  * Called from irq_enter to notify about the possible interruption of idle()
@@ -887,14 +1151,14 @@ void tick_setup_sched_timer(void)
 		now = ktime_get();
 	}
 
-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
 	if (tick_nohz_enabled)
 		ts->nohz_mode = NOHZ_MODE_HIGHRES;
 #endif
 }
 #endif /* HIGH_RES_TIMERS */
 
-#if defined CONFIG_NO_HZ || defined CONFIG_HIGH_RES_TIMERS
+#if defined CONFIG_NO_HZ_COMMON || defined CONFIG_HIGH_RES_TIMERS
 void tick_cancel_sched_timer(int cpu)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);