author | Linus Torvalds <torvalds@linux-foundation.org> | 2013-05-05 16:23:27 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2013-05-05 16:23:27 -0400 |
commit | 534c97b0950b1967bca1c753aeaed32f5db40264 (patch) | |
tree | 9421d26e4f6d479d1bc32b036a731b065daab0fa /kernel | |
parent | 64049d1973c1735f543eb7a55653e291e108b0cb (diff) | |
parent | 265f22a975c1e4cc3a4d1f94a3ec53ffbb6f5b9f (diff) |
Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull 'full dynticks' support from Ingo Molnar:
"This tree from Frederic Weisbecker adds a new, (exciting! :-) core
kernel feature to the timer and scheduler subsystems: 'full dynticks',
or CONFIG_NO_HZ_FULL=y.
This feature extends the nohz variable-size timer tick feature from
idle to busy CPUs (running at most one task) as well, potentially
reducing the number of timer interrupts significantly.
This feature was motivated by real-time folks and the -rt tree, but
the general utility and motivation of full-dynticks runs wider than
that:
- HPC workloads get faster: CPUs running a single task should be able
to utilize a maximum amount of CPU power. A periodic timer tick at
HZ=1000 can cause a constant overhead of up to 1.0%. This feature
removes that overhead - and speeds up the system by 0.5%-1.0% on
typical distro configs even on modern systems.
- Real-time workload latency reduction: CPUs running critical tasks
should experience as little jitter as possible. The last remaining
source of kernel-related jitter was the periodic timer tick.
- A single task executing on a CPU is a pretty common situation,
especially with an increasing number of cores/CPUs, so this feature
helps desktop and mobile workloads as well.
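The 1.0% figure in the HPC bullet can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the per-tick cost of 10 microseconds is an assumed value, back-derived from the quoted HZ=1000 and 1.0% numbers rather than measured, and `tick_overhead_ppm` is a hypothetical helper name.

```c
#include <stdint.h>

/*
 * Fraction of CPU time spent servicing the timer tick, in parts per
 * million. Every second contains `hz` ticks, each costing
 * `tick_cost_ns` nanoseconds (an assumed figure, not a measurement).
 */
static uint64_t tick_overhead_ppm(uint64_t hz, uint64_t tick_cost_ns)
{
	const uint64_t ns_per_sec = 1000000000ULL;

	return hz * tick_cost_ns * 1000000ULL / ns_per_sec;
}
```

Under these assumptions, `tick_overhead_ppm(1000, 10000)` evaluates to 10000 ppm (1.0%), while a 1 Hz tick with the same per-tick cost evaluates to 10 ppm.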
The cost of the feature is mainly related to increased timer
reprogramming overhead when a CPU switches its tick period, and thus
slightly longer to-idle and from-idle latency.
Configuration-wise a third mode of operation is added to the existing
two NOHZ kconfig modes:
- CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
as a config option. This is the traditional Linux periodic tick
design: there's a HZ tick going on all the time, regardless of
whether a CPU is idle or not.
- CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
periodic tick when a CPU enters idle mode.
- CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
tick when a CPU is idle, also slows the tick down to 1 Hz (one
timer interrupt per second) when only a single task is running on a
CPU.
The .config behavior is compatible: existing !CONFIG_NO_HZ and
CONFIG_NO_HZ=y settings get translated to the new values, without the
user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
default.
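The three modes can be summarized as a small decision function. This is a simplified userspace model, not kernel code: `enum tick_mode`, `tick_rate_hz()` and the HZ value of 1000 are assumptions made for illustration, and the model ignores the extra conditions (perf events, posix CPU timers, RCU) that can keep the tick running under CONFIG_NO_HZ_FULL.

```c
#include <stdbool.h>

#define HZ 1000 /* assumed tick frequency, for illustration only */

/* Hypothetical enum mirroring the three Kconfig choices. */
enum tick_mode {
	TICK_HZ_PERIODIC,	/* CONFIG_HZ_PERIODIC */
	TICK_NO_HZ_IDLE,	/* CONFIG_NO_HZ_IDLE */
	TICK_NO_HZ_FULL,	/* CONFIG_NO_HZ_FULL */
};

/*
 * Tick rate in Hz that a CPU sees, given the mode and the number of
 * runnable tasks on it (0 means the CPU is idle).
 */
static unsigned int tick_rate_hz(enum tick_mode mode, unsigned int nr_running)
{
	bool idle = (nr_running == 0);

	if (mode == TICK_HZ_PERIODIC)
		return HZ;		/* always ticking */
	if (idle)
		return 0;		/* both NOHZ modes stop the idle tick */
	if (mode == TICK_NO_HZ_FULL && nr_running == 1)
		return 1;		/* single busy task: residual 1 Hz tick */
	return HZ;
}
```

The only case where the new mode differs from CONFIG_NO_HZ_IDLE in this model is a CPU with exactly one runnable task.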
This feature is based on a lot of infrastructure work that has been
steadily going upstream in the last 2-3 cycles: related RCU support
and non-periodic cputime support in particular is upstream already.
This tree adds the final pieces and activates the feature. The pull
request is marked RFC because:
- it's marked 64-bit only at the moment - the 32-bit support patch is
small but did not get ready in time.
- it has a number of fresh commits that came in after the merge
window. The overwhelming majority of commits are from before the
merge window, but still some aspects of the tree are fresh and so I
marked it RFC.
- it's a pretty wide-reaching feature with lots of effects - and
while the components have been in testing for some time, the full
combination is still not very widely used. That it's default-off
should limit its potential for regressions, and there are no
known regressions with CONFIG_NO_HZ_FULL=y enabled either.
- the feature is not completely transparent: there is no 100%
equivalent replacement for a periodic scheduler/timer tick. In
particular there's ongoing work to map out and reduce its effects
on scheduler load-balancing and statistics. This should not impact
correctness though, there are no known regressions related to this
feature at this point.
- it's a pretty ambitious feature that with time will likely be
enabled by most Linux distros, and we'd like your input on
its design/implementation, if you dislike some aspect we missed.
Without flaming us to a crisp! :-)
Future plans:
- there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
the periodic tick altogether when there's a single busy task on a
CPU. We'd first like 1 Hz to be exposed more widely before we go
for the 0 Hz target though.
- once we reach 0 Hz we can remove the periodic tick assumption from
nr_running>=2 as well, by essentially interrupting busy tasks only
as frequently as the sched_latency constraints require us to do -
once every 4-40 msecs, depending on nr_running.
I am personally leaning towards biting the bullet and doing this in
v3.10: like the -rt tree, this effort has been going on for too long -
but the final word is up to you as usual.
More technical details can be found in Documentation/timers/NO_HZ.txt"
* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
sched: Keep at least 1 tick per second for active dynticks tasks
rcu: Fix full dynticks' dependency on wide RCU nocb mode
nohz: Protect smp_processor_id() in tick_nohz_task_switch()
nohz_full: Add documentation.
cputime_nsecs: use math64.h for nsec resolution conversion helpers
nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
nohz: Reduce overhead under high-freq idling patterns
nohz: Remove full dynticks' superfluous dependency on RCU tree
nohz: Fix unavailable tick_stop tracepoint in dynticks idle
nohz: Add basic tracing
nohz: Select wide RCU nocb for full dynticks
nohz: Disable the tick when irq resume in full dynticks CPU
nohz: Re-evaluate the tick for the new task after a context switch
nohz: Prepare to stop the tick on irq exit
nohz: Implement full dynticks kick
nohz: Re-evaluate the tick from the scheduler IPI
sched: New helper to prevent from stopping the tick in full dynticks
sched: Kick full dynticks CPU that have more than one task enqueued.
perf: New helper to prevent full dynticks CPUs from stopping tick
perf: Kick full dynticks CPU if events rotation is needed
...
Diffstat (limited to 'kernel')
-rw-r--r-- | kernel/events/core.c | 17 | ||||
-rw-r--r-- | kernel/hrtimer.c | 4 | ||||
-rw-r--r-- | kernel/posix-cpu-timers.c | 76 | ||||
-rw-r--r-- | kernel/rcutree.c | 16 | ||||
-rw-r--r-- | kernel/rcutree.h | 2 | ||||
-rw-r--r-- | kernel/rcutree_plugin.h | 33 | ||||
-rw-r--r-- | kernel/sched/core.c | 92 | ||||
-rw-r--r-- | kernel/sched/fair.c | 10 | ||||
-rw-r--r-- | kernel/sched/idle_task.c | 1 | ||||
-rw-r--r-- | kernel/sched/sched.h | 25 | ||||
-rw-r--r-- | kernel/softirq.c | 19 | ||||
-rw-r--r-- | kernel/time/Kconfig | 80 | ||||
-rw-r--r-- | kernel/time/tick-broadcast.c | 3 | ||||
-rw-r--r-- | kernel/time/tick-common.c | 5 | ||||
-rw-r--r-- | kernel/time/tick-sched.c | 296 | ||||
-rw-r--r-- | kernel/timer.c | 16 |
16 files changed, 607 insertions, 88 deletions
diff --git a/kernel/events/core.c b/kernel/events/core.c index 3820e3cefbae..6b41c1899a8b 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c | |||
@@ -18,6 +18,7 @@ | |||
18 | #include <linux/poll.h> | 18 | #include <linux/poll.h> |
19 | #include <linux/slab.h> | 19 | #include <linux/slab.h> |
20 | #include <linux/hash.h> | 20 | #include <linux/hash.h> |
21 | #include <linux/tick.h> | ||
21 | #include <linux/sysfs.h> | 22 | #include <linux/sysfs.h> |
22 | #include <linux/dcache.h> | 23 | #include <linux/dcache.h> |
23 | #include <linux/percpu.h> | 24 | #include <linux/percpu.h> |
@@ -685,8 +686,12 @@ static void perf_pmu_rotate_start(struct pmu *pmu) | |||
685 | 686 | ||
686 | WARN_ON(!irqs_disabled()); | 687 | WARN_ON(!irqs_disabled()); |
687 | 688 | ||
688 | if (list_empty(&cpuctx->rotation_list)) | 689 | if (list_empty(&cpuctx->rotation_list)) { |
690 | int was_empty = list_empty(head); | ||
689 | list_add(&cpuctx->rotation_list, head); | 691 | list_add(&cpuctx->rotation_list, head); |
692 | if (was_empty) | ||
693 | tick_nohz_full_kick(); | ||
694 | } | ||
690 | } | 695 | } |
691 | 696 | ||
692 | static void get_ctx(struct perf_event_context *ctx) | 697 | static void get_ctx(struct perf_event_context *ctx) |
@@ -2591,6 +2596,16 @@ done: | |||
2591 | list_del_init(&cpuctx->rotation_list); | 2596 | list_del_init(&cpuctx->rotation_list); |
2592 | } | 2597 | } |
2593 | 2598 | ||
2599 | #ifdef CONFIG_NO_HZ_FULL | ||
2600 | bool perf_event_can_stop_tick(void) | ||
2601 | { | ||
2602 | if (list_empty(&__get_cpu_var(rotation_list))) | ||
2603 | return true; | ||
2604 | else | ||
2605 | return false; | ||
2606 | } | ||
2607 | #endif | ||
2608 | |||
2594 | void perf_event_task_tick(void) | 2609 | void perf_event_task_tick(void) |
2595 | { | 2610 | { |
2596 | struct list_head *head = &__get_cpu_var(rotation_list); | 2611 | struct list_head *head = &__get_cpu_var(rotation_list); |
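The perf hunks above follow an edge-triggered pattern: the tick is kicked only when the per-CPU rotation list goes from empty to non-empty, and the tick may be stopped only while that list is empty. A minimal userspace sketch of the pattern follows. The `rotation_list` struct and both function names are hypothetical, the list is reduced to a counter, and the kick is modelled by incrementing `nr_kicks` instead of calling tick_nohz_full_kick().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU rotation list: just a counter. */
struct rotation_list {
	unsigned int nr_entries;
	unsigned int nr_kicks;	/* how many times the tick was kicked */
};

/* Models perf_pmu_rotate_start(): kick only on the empty -> non-empty edge. */
static void rotation_add(struct rotation_list *rl)
{
	bool was_empty = (rl->nr_entries == 0);

	rl->nr_entries++;
	if (was_empty)
		rl->nr_kicks++;	/* stands in for tick_nohz_full_kick() */
}

/* Models perf_event_can_stop_tick(): the tick may stop only when empty. */
static bool rotation_can_stop_tick(const struct rotation_list *rl)
{
	return rl->nr_entries == 0;
}
```

Kicking only on the transition avoids redundant kicks when further contexts are queued on a list that is already non-empty.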
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c index 609d8ff38b74..fd4b13b131f8 100644 --- a/kernel/hrtimer.c +++ b/kernel/hrtimer.c | |||
@@ -172,7 +172,7 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer, | |||
172 | */ | 172 | */ |
173 | static int hrtimer_get_target(int this_cpu, int pinned) | 173 | static int hrtimer_get_target(int this_cpu, int pinned) |
174 | { | 174 | { |
175 | #ifdef CONFIG_NO_HZ | 175 | #ifdef CONFIG_NO_HZ_COMMON |
176 | if (!pinned && get_sysctl_timer_migration() && idle_cpu(this_cpu)) | 176 | if (!pinned && get_sysctl_timer_migration() && idle_cpu(this_cpu)) |
177 | return get_nohz_timer_target(); | 177 | return get_nohz_timer_target(); |
178 | #endif | 178 | #endif |
@@ -1125,7 +1125,7 @@ ktime_t hrtimer_get_remaining(const struct hrtimer *timer) | |||
1125 | } | 1125 | } |
1126 | EXPORT_SYMBOL_GPL(hrtimer_get_remaining); | 1126 | EXPORT_SYMBOL_GPL(hrtimer_get_remaining); |
1127 | 1127 | ||
1128 | #ifdef CONFIG_NO_HZ | 1128 | #ifdef CONFIG_NO_HZ_COMMON |
1129 | /** | 1129 | /** |
1130 | * hrtimer_get_next_event - get the time until next expiry event | 1130 | * hrtimer_get_next_event - get the time until next expiry event |
1131 | * | 1131 | * |
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c index 8fd709c9bb58..42670e9b44e0 100644 --- a/kernel/posix-cpu-timers.c +++ b/kernel/posix-cpu-timers.c | |||
@@ -10,6 +10,8 @@ | |||
10 | #include <linux/kernel_stat.h> | 10 | #include <linux/kernel_stat.h> |
11 | #include <trace/events/timer.h> | 11 | #include <trace/events/timer.h> |
12 | #include <linux/random.h> | 12 | #include <linux/random.h> |
13 | #include <linux/tick.h> | ||
14 | #include <linux/workqueue.h> | ||
13 | 15 | ||
14 | /* | 16 | /* |
15 | * Called after updating RLIMIT_CPU to run cpu timer and update | 17 | * Called after updating RLIMIT_CPU to run cpu timer and update |
@@ -153,6 +155,21 @@ static void bump_cpu_timer(struct k_itimer *timer, | |||
153 | } | 155 | } |
154 | } | 156 | } |
155 | 157 | ||
158 | /** | ||
159 | * task_cputime_zero - Check a task_cputime struct for all zero fields. | ||
160 | * | ||
161 | * @cputime: The struct to compare. | ||
162 | * | ||
163 | * Checks @cputime to see if all fields are zero. Returns true if all fields | ||
164 | * are zero, false if any field is nonzero. | ||
165 | */ | ||
166 | static inline int task_cputime_zero(const struct task_cputime *cputime) | ||
167 | { | ||
168 | if (!cputime->utime && !cputime->stime && !cputime->sum_exec_runtime) | ||
169 | return 1; | ||
170 | return 0; | ||
171 | } | ||
172 | |||
156 | static inline cputime_t prof_ticks(struct task_struct *p) | 173 | static inline cputime_t prof_ticks(struct task_struct *p) |
157 | { | 174 | { |
158 | cputime_t utime, stime; | 175 | cputime_t utime, stime; |
@@ -636,6 +653,37 @@ static int cpu_timer_sample_group(const clockid_t which_clock, | |||
636 | return 0; | 653 | return 0; |
637 | } | 654 | } |
638 | 655 | ||
656 | #ifdef CONFIG_NO_HZ_FULL | ||
657 | static void nohz_kick_work_fn(struct work_struct *work) | ||
658 | { | ||
659 | tick_nohz_full_kick_all(); | ||
660 | } | ||
661 | |||
662 | static DECLARE_WORK(nohz_kick_work, nohz_kick_work_fn); | ||
663 | |||
664 | /* | ||
665 | * We need the IPIs to be sent from sane process context. | ||
666 | * The posix cpu timers are always set with irqs disabled. | ||
667 | */ | ||
668 | static void posix_cpu_timer_kick_nohz(void) | ||
669 | { | ||
670 | schedule_work(&nohz_kick_work); | ||
671 | } | ||
672 | |||
673 | bool posix_cpu_timers_can_stop_tick(struct task_struct *tsk) | ||
674 | { | ||
675 | if (!task_cputime_zero(&tsk->cputime_expires)) | ||
676 | return false; | ||
677 | |||
678 | if (tsk->signal->cputimer.running) | ||
679 | return false; | ||
680 | |||
681 | return true; | ||
682 | } | ||
683 | #else | ||
684 | static inline void posix_cpu_timer_kick_nohz(void) { } | ||
685 | #endif | ||
686 | |||
639 | /* | 687 | /* |
640 | * Guts of sys_timer_settime for CPU timers. | 688 | * Guts of sys_timer_settime for CPU timers. |
641 | * This is called with the timer locked and interrupts disabled. | 689 | * This is called with the timer locked and interrupts disabled. |
@@ -794,6 +842,8 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int flags, | |||
794 | sample_to_timespec(timer->it_clock, | 842 | sample_to_timespec(timer->it_clock, |
795 | old_incr, &old->it_interval); | 843 | old_incr, &old->it_interval); |
796 | } | 844 | } |
845 | if (!ret) | ||
846 | posix_cpu_timer_kick_nohz(); | ||
797 | return ret; | 847 | return ret; |
798 | } | 848 | } |
799 | 849 | ||
@@ -1008,21 +1058,6 @@ static void check_cpu_itimer(struct task_struct *tsk, struct cpu_itimer *it, | |||
1008 | } | 1058 | } |
1009 | } | 1059 | } |
1010 | 1060 | ||
1011 | /** | ||
1012 | * task_cputime_zero - Check a task_cputime struct for all zero fields. | ||
1013 | * | ||
1014 | * @cputime: The struct to compare. | ||
1015 | * | ||
1016 | * Checks @cputime to see if all fields are zero. Returns true if all fields | ||
1017 | * are zero, false if any field is nonzero. | ||
1018 | */ | ||
1019 | static inline int task_cputime_zero(const struct task_cputime *cputime) | ||
1020 | { | ||
1021 | if (!cputime->utime && !cputime->stime && !cputime->sum_exec_runtime) | ||
1022 | return 1; | ||
1023 | return 0; | ||
1024 | } | ||
1025 | |||
1026 | /* | 1061 | /* |
1027 | * Check for any per-thread CPU timers that have fired and move them | 1062 | * Check for any per-thread CPU timers that have fired and move them |
1028 | * off the tsk->*_timers list onto the firing list. Per-thread timers | 1063 | * off the tsk->*_timers list onto the firing list. Per-thread timers |
@@ -1336,6 +1371,13 @@ void run_posix_cpu_timers(struct task_struct *tsk) | |||
1336 | cpu_timer_fire(timer); | 1371 | cpu_timer_fire(timer); |
1337 | spin_unlock(&timer->it_lock); | 1372 | spin_unlock(&timer->it_lock); |
1338 | } | 1373 | } |
1374 | |||
1375 | /* | ||
1376 | * In case some timers were rescheduled after the queue got emptied, | ||
1377 | * wake up full dynticks CPUs. | ||
1378 | */ | ||
1379 | if (tsk->signal->cputimer.running) | ||
1380 | posix_cpu_timer_kick_nohz(); | ||
1339 | } | 1381 | } |
1340 | 1382 | ||
1341 | /* | 1383 | /* |
@@ -1366,7 +1408,7 @@ void set_process_cpu_timer(struct task_struct *tsk, unsigned int clock_idx, | |||
1366 | } | 1408 | } |
1367 | 1409 | ||
1368 | if (!*newval) | 1410 | if (!*newval) |
1369 | return; | 1411 | goto out; |
1370 | *newval += now.cpu; | 1412 | *newval += now.cpu; |
1371 | } | 1413 | } |
1372 | 1414 | ||
@@ -1384,6 +1426,8 @@ void set_process_cpu_timer(struct task_struct *tsk, unsigned int clock_idx, | |||
1384 | tsk->signal->cputime_expires.virt_exp = *newval; | 1426 | tsk->signal->cputime_expires.virt_exp = *newval; |
1385 | break; | 1427 | break; |
1386 | } | 1428 | } |
1429 | out: | ||
1430 | posix_cpu_timer_kick_nohz(); | ||
1387 | } | 1431 | } |
1388 | 1432 | ||
1389 | static int do_cpu_nanosleep(const clockid_t which_clock, int flags, | 1433 | static int do_cpu_nanosleep(const clockid_t which_clock, int flags, |
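The decision made by posix_cpu_timers_can_stop_tick() in the hunks above can be restated as a standalone predicate. The sketch below is a userspace model under stated assumptions: `struct task_model` is a hypothetical, flattened view of the two pieces of task state the kernel function reads, and the field types are simplified to plain integers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct task_cputime (expiry times). */
struct cputime_expires {
	uint64_t utime;
	uint64_t stime;
	uint64_t sum_exec_runtime;
};

/* Hypothetical, flattened view of the task state the check reads. */
struct task_model {
	struct cputime_expires expires;	/* per-thread timer expiries */
	bool group_cputimer_running;	/* tsk->signal->cputimer.running */
};

/* Same test as task_cputime_zero(): true only if every field is zero. */
static bool cputime_zero(const struct cputime_expires *c)
{
	return !c->utime && !c->stime && !c->sum_exec_runtime;
}

/* Mirrors the logic of posix_cpu_timers_can_stop_tick(). */
static bool can_stop_tick(const struct task_model *t)
{
	if (!cputime_zero(&t->expires))
		return false;	/* a per-thread CPU timer is armed */
	if (t->group_cputimer_running)
		return false;	/* the process-wide CPU timer is running */
	return true;
}
```

Either condition alone is enough to keep the tick, which is why the patch kicks full dynticks CPUs whenever a CPU timer is set.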
diff --git a/kernel/rcutree.c b/kernel/rcutree.c index d8534308fd05..16ea67925015 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c | |||
@@ -799,6 +799,16 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp) | |||
799 | rdp->offline_fqs++; | 799 | rdp->offline_fqs++; |
800 | return 1; | 800 | return 1; |
801 | } | 801 | } |
802 | |||
803 | /* | ||
804 | * There is a possibility that a CPU in adaptive-ticks state | ||
805 | * might run in the kernel with the scheduling-clock tick disabled | ||
806 | * for an extended time period. Invoke rcu_kick_nohz_cpu() to | ||
807 | * force the CPU to restart the scheduling-clock tick in this | ||
808 | * CPU is in this state. | ||
809 | */ | ||
810 | rcu_kick_nohz_cpu(rdp->cpu); | ||
811 | |||
802 | return 0; | 812 | return 0; |
803 | } | 813 | } |
804 | 814 | ||
@@ -1820,7 +1830,7 @@ rcu_send_cbs_to_orphanage(int cpu, struct rcu_state *rsp, | |||
1820 | struct rcu_node *rnp, struct rcu_data *rdp) | 1830 | struct rcu_node *rnp, struct rcu_data *rdp) |
1821 | { | 1831 | { |
1822 | /* No-CBs CPUs do not have orphanable callbacks. */ | 1832 | /* No-CBs CPUs do not have orphanable callbacks. */ |
1823 | if (is_nocb_cpu(rdp->cpu)) | 1833 | if (rcu_is_nocb_cpu(rdp->cpu)) |
1824 | return; | 1834 | return; |
1825 | 1835 | ||
1826 | /* | 1836 | /* |
@@ -2892,10 +2902,10 @@ static void _rcu_barrier(struct rcu_state *rsp) | |||
2892 | * corresponding CPU's preceding callbacks have been invoked. | 2902 | * corresponding CPU's preceding callbacks have been invoked. |
2893 | */ | 2903 | */ |
2894 | for_each_possible_cpu(cpu) { | 2904 | for_each_possible_cpu(cpu) { |
2895 | if (!cpu_online(cpu) && !is_nocb_cpu(cpu)) | 2905 | if (!cpu_online(cpu) && !rcu_is_nocb_cpu(cpu)) |
2896 | continue; | 2906 | continue; |
2897 | rdp = per_cpu_ptr(rsp->rda, cpu); | 2907 | rdp = per_cpu_ptr(rsp->rda, cpu); |
2898 | if (is_nocb_cpu(cpu)) { | 2908 | if (rcu_is_nocb_cpu(cpu)) { |
2899 | _rcu_barrier_trace(rsp, "OnlineNoCB", cpu, | 2909 | _rcu_barrier_trace(rsp, "OnlineNoCB", cpu, |
2900 | rsp->n_barrier_done); | 2910 | rsp->n_barrier_done); |
2901 | atomic_inc(&rsp->barrier_cpu_count); | 2911 | atomic_inc(&rsp->barrier_cpu_count); |
diff --git a/kernel/rcutree.h b/kernel/rcutree.h index 14ee40795d6f..da77a8f57ff9 100644 --- a/kernel/rcutree.h +++ b/kernel/rcutree.h | |||
@@ -530,13 +530,13 @@ static int rcu_nocb_needs_gp(struct rcu_state *rsp); | |||
530 | static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq); | 530 | static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq); |
531 | static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp); | 531 | static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp); |
532 | static void rcu_init_one_nocb(struct rcu_node *rnp); | 532 | static void rcu_init_one_nocb(struct rcu_node *rnp); |
533 | static bool is_nocb_cpu(int cpu); | ||
534 | static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp, | 533 | static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp, |
535 | bool lazy); | 534 | bool lazy); |
536 | static bool rcu_nocb_adopt_orphan_cbs(struct rcu_state *rsp, | 535 | static bool rcu_nocb_adopt_orphan_cbs(struct rcu_state *rsp, |
537 | struct rcu_data *rdp); | 536 | struct rcu_data *rdp); |
538 | static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp); | 537 | static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp); |
539 | static void rcu_spawn_nocb_kthreads(struct rcu_state *rsp); | 538 | static void rcu_spawn_nocb_kthreads(struct rcu_state *rsp); |
539 | static void rcu_kick_nohz_cpu(int cpu); | ||
540 | static bool init_nocb_callback_list(struct rcu_data *rdp); | 540 | static bool init_nocb_callback_list(struct rcu_data *rdp); |
541 | 541 | ||
542 | #endif /* #ifndef RCU_TREE_NONCORE */ | 542 | #endif /* #ifndef RCU_TREE_NONCORE */ |
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h index d084ae3f281c..170814dc418f 100644 --- a/kernel/rcutree_plugin.h +++ b/kernel/rcutree_plugin.h | |||
@@ -28,6 +28,7 @@ | |||
28 | #include <linux/gfp.h> | 28 | #include <linux/gfp.h> |
29 | #include <linux/oom.h> | 29 | #include <linux/oom.h> |
30 | #include <linux/smpboot.h> | 30 | #include <linux/smpboot.h> |
31 | #include <linux/tick.h> | ||
31 | 32 | ||
32 | #define RCU_KTHREAD_PRIO 1 | 33 | #define RCU_KTHREAD_PRIO 1 |
33 | 34 | ||
@@ -1705,7 +1706,7 @@ static void rcu_prepare_for_idle(int cpu) | |||
1705 | return; | 1706 | return; |
1706 | 1707 | ||
1707 | /* If this is a no-CBs CPU, no callbacks, just return. */ | 1708 | /* If this is a no-CBs CPU, no callbacks, just return. */ |
1708 | if (is_nocb_cpu(cpu)) | 1709 | if (rcu_is_nocb_cpu(cpu)) |
1709 | return; | 1710 | return; |
1710 | 1711 | ||
1711 | /* | 1712 | /* |
@@ -1747,7 +1748,7 @@ static void rcu_cleanup_after_idle(int cpu) | |||
1747 | struct rcu_data *rdp; | 1748 | struct rcu_data *rdp; |
1748 | struct rcu_state *rsp; | 1749 | struct rcu_state *rsp; |
1749 | 1750 | ||
1750 | if (is_nocb_cpu(cpu)) | 1751 | if (rcu_is_nocb_cpu(cpu)) |
1751 | return; | 1752 | return; |
1752 | rcu_try_advance_all_cbs(); | 1753 | rcu_try_advance_all_cbs(); |
1753 | for_each_rcu_flavor(rsp) { | 1754 | for_each_rcu_flavor(rsp) { |
@@ -2052,7 +2053,7 @@ static void rcu_init_one_nocb(struct rcu_node *rnp) | |||
2052 | } | 2053 | } |
2053 | 2054 | ||
2054 | /* Is the specified CPU a no-CPUs CPU? */ | 2055 | /* Is the specified CPU a no-CPUs CPU? */ |
2055 | static bool is_nocb_cpu(int cpu) | 2056 | bool rcu_is_nocb_cpu(int cpu) |
2056 | { | 2057 | { |
2057 | if (have_rcu_nocb_mask) | 2058 | if (have_rcu_nocb_mask) |
2058 | return cpumask_test_cpu(cpu, rcu_nocb_mask); | 2059 | return cpumask_test_cpu(cpu, rcu_nocb_mask); |
@@ -2110,7 +2111,7 @@ static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp, | |||
2110 | bool lazy) | 2111 | bool lazy) |
2111 | { | 2112 | { |
2112 | 2113 | ||
2113 | if (!is_nocb_cpu(rdp->cpu)) | 2114 | if (!rcu_is_nocb_cpu(rdp->cpu)) |
2114 | return 0; | 2115 | return 0; |
2115 | __call_rcu_nocb_enqueue(rdp, rhp, &rhp->next, 1, lazy); | 2116 | __call_rcu_nocb_enqueue(rdp, rhp, &rhp->next, 1, lazy); |
2116 | if (__is_kfree_rcu_offset((unsigned long)rhp->func)) | 2117 | if (__is_kfree_rcu_offset((unsigned long)rhp->func)) |
@@ -2134,7 +2135,7 @@ static bool __maybe_unused rcu_nocb_adopt_orphan_cbs(struct rcu_state *rsp, | |||
2134 | long qll = rsp->qlen_lazy; | 2135 | long qll = rsp->qlen_lazy; |
2135 | 2136 | ||
2136 | /* If this is not a no-CBs CPU, tell the caller to do it the old way. */ | 2137 | /* If this is not a no-CBs CPU, tell the caller to do it the old way. */ |
2137 | if (!is_nocb_cpu(smp_processor_id())) | 2138 | if (!rcu_is_nocb_cpu(smp_processor_id())) |
2138 | return 0; | 2139 | return 0; |
2139 | rsp->qlen = 0; | 2140 | rsp->qlen = 0; |
2140 | rsp->qlen_lazy = 0; | 2141 | rsp->qlen_lazy = 0; |
@@ -2306,11 +2307,6 @@ static void rcu_init_one_nocb(struct rcu_node *rnp) | |||
2306 | { | 2307 | { |
2307 | } | 2308 | } |
2308 | 2309 | ||
2309 | static bool is_nocb_cpu(int cpu) | ||
2310 | { | ||
2311 | return false; | ||
2312 | } | ||
2313 | |||
2314 | static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp, | 2310 | static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp, |
2315 | bool lazy) | 2311 | bool lazy) |
2316 | { | 2312 | { |
@@ -2337,3 +2333,20 @@ static bool init_nocb_callback_list(struct rcu_data *rdp) | |||
2337 | } | 2333 | } |
2338 | 2334 | ||
2339 | #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */ | 2335 | #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */ |
2336 | |||
2337 | /* | ||
2338 | * An adaptive-ticks CPU can potentially execute in kernel mode for an | ||
2339 | * arbitrarily long period of time with the scheduling-clock tick turned | ||
2340 | * off. RCU will be paying attention to this CPU because it is in the | ||
2341 | * kernel, but the CPU cannot be guaranteed to be executing the RCU state | ||
2342 | * machine because the scheduling-clock tick has been disabled. Therefore, | ||
2343 | * if an adaptive-ticks CPU is failing to respond to the current grace | ||
2344 | * period and has not be idle from an RCU perspective, kick it. | ||
2345 | */ | ||
2346 | static void rcu_kick_nohz_cpu(int cpu) | ||
2347 | { | ||
2348 | #ifdef CONFIG_NO_HZ_FULL | ||
2349 | if (tick_nohz_full_cpu(cpu)) | ||
2350 | smp_send_reschedule(cpu); | ||
2351 | #endif /* #ifdef CONFIG_NO_HZ_FULL */ | ||
2352 | } | ||
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 5662f58f0b69..58453b8272fd 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c | |||
@@ -544,7 +544,7 @@ void resched_cpu(int cpu) | |||
544 | raw_spin_unlock_irqrestore(&rq->lock, flags); | 544 | raw_spin_unlock_irqrestore(&rq->lock, flags); |
545 | } | 545 | } |
546 | 546 | ||
547 | #ifdef CONFIG_NO_HZ | 547 | #ifdef CONFIG_NO_HZ_COMMON |
548 | /* | 548 | /* |
549 | * In the semi idle case, use the nearest busy cpu for migrating timers | 549 | * In the semi idle case, use the nearest busy cpu for migrating timers |
550 | * from an idle cpu. This is good for power-savings. | 550 | * from an idle cpu. This is good for power-savings. |
@@ -582,7 +582,7 @@ unlock: | |||
582 | * account when the CPU goes back to idle and evaluates the timer | 582 | * account when the CPU goes back to idle and evaluates the timer |
583 | * wheel for the next timer event. | 583 | * wheel for the next timer event. |
584 | */ | 584 | */ |
585 | void wake_up_idle_cpu(int cpu) | 585 | static void wake_up_idle_cpu(int cpu) |
586 | { | 586 | { |
587 | struct rq *rq = cpu_rq(cpu); | 587 | struct rq *rq = cpu_rq(cpu); |
588 | 588 | ||
@@ -612,20 +612,56 @@ void wake_up_idle_cpu(int cpu) | |||
612 | smp_send_reschedule(cpu); | 612 | smp_send_reschedule(cpu); |
613 | } | 613 | } |
614 | 614 | ||
615 | static bool wake_up_full_nohz_cpu(int cpu) | ||
616 | { | ||
617 | if (tick_nohz_full_cpu(cpu)) { | ||
618 | if (cpu != smp_processor_id() || | ||
619 | tick_nohz_tick_stopped()) | ||
620 | smp_send_reschedule(cpu); | ||
621 | return true; | ||
622 | } | ||
623 | |||
624 | return false; | ||
625 | } | ||
626 | |||
627 | void wake_up_nohz_cpu(int cpu) | ||
628 | { | ||
629 | if (!wake_up_full_nohz_cpu(cpu)) | ||
630 | wake_up_idle_cpu(cpu); | ||
631 | } | ||
632 | |||
615 | static inline bool got_nohz_idle_kick(void) | 633 | static inline bool got_nohz_idle_kick(void) |
616 | { | 634 | { |
617 | int cpu = smp_processor_id(); | 635 | int cpu = smp_processor_id(); |
618 | return idle_cpu(cpu) && test_bit(NOHZ_BALANCE_KICK, nohz_flags(cpu)); | 636 | return idle_cpu(cpu) && test_bit(NOHZ_BALANCE_KICK, nohz_flags(cpu)); |
619 | } | 637 | } |
620 | 638 | ||
621 | #else /* CONFIG_NO_HZ */ | 639 | #else /* CONFIG_NO_HZ_COMMON */ |
622 | 640 | ||
623 | static inline bool got_nohz_idle_kick(void) | 641 | static inline bool got_nohz_idle_kick(void) |
624 | { | 642 | { |
625 | return false; | 643 | return false; |
626 | } | 644 | } |
627 | 645 | ||
628 | #endif /* CONFIG_NO_HZ */ | 646 | #endif /* CONFIG_NO_HZ_COMMON */ |
647 | |||
648 | #ifdef CONFIG_NO_HZ_FULL | ||
649 | bool sched_can_stop_tick(void) | ||
650 | { | ||
651 | struct rq *rq; | ||
652 | |||
653 | rq = this_rq(); | ||
654 | |||
655 | /* Make sure rq->nr_running update is visible after the IPI */ | ||
656 | smp_rmb(); | ||
657 | |||
658 | /* More than one running task need preemption */ | ||
659 | if (rq->nr_running > 1) | ||
660 | return false; | ||
661 | |||
662 | return true; | ||
663 | } | ||
664 | #endif /* CONFIG_NO_HZ_FULL */ | ||
629 | 665 | ||
630 | void sched_avg_update(struct rq *rq) | 666 | void sched_avg_update(struct rq *rq) |
631 | { | 667 | { |
@@ -1357,7 +1393,8 @@ static void sched_ttwu_pending(void) | |||
1357 | 1393 | ||
1358 | void scheduler_ipi(void) | 1394 | void scheduler_ipi(void) |
1359 | { | 1395 | { |
1360 | if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick()) | 1396 | if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick() |
1397 | && !tick_nohz_full_cpu(smp_processor_id())) | ||
1361 | return; | 1398 | return; |
1362 | 1399 | ||
1363 | /* | 1400 | /* |
@@ -1374,6 +1411,7 @@ void scheduler_ipi(void) | |||
1374 | * somewhat pessimize the simple resched case. | 1411 | * somewhat pessimize the simple resched case. |
1375 | */ | 1412 | */ |
1376 | irq_enter(); | 1413 | irq_enter(); |
1414 | tick_nohz_full_check(); | ||
1377 | sched_ttwu_pending(); | 1415 | sched_ttwu_pending(); |
1378 | 1416 | ||
1379 | /* | 1417 | /* |
@@ -1855,6 +1893,8 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev) | |||
1855 | kprobe_flush_task(prev); | 1893 | kprobe_flush_task(prev); |
1856 | put_task_struct(prev); | 1894 | put_task_struct(prev); |
1857 | } | 1895 | } |
1896 | |||
1897 | tick_nohz_task_switch(current); | ||
1858 | } | 1898 | } |
1859 | 1899 | ||
1860 | #ifdef CONFIG_SMP | 1900 | #ifdef CONFIG_SMP |
@@ -2118,7 +2158,7 @@ calc_load(unsigned long load, unsigned long exp, unsigned long active) | |||
2118 | return load >> FSHIFT; | 2158 | return load >> FSHIFT; |
2119 | } | 2159 | } |
2120 | 2160 | ||
2121 | #ifdef CONFIG_NO_HZ | 2161 | #ifdef CONFIG_NO_HZ_COMMON |
2122 | /* | 2162 | /* |
2123 | * Handle NO_HZ for the global load-average. | 2163 | * Handle NO_HZ for the global load-average. |
2124 | * | 2164 | * |
@@ -2344,12 +2384,12 @@ static void calc_global_nohz(void) | |||
2344 | smp_wmb(); | 2384 | smp_wmb(); |
2345 | calc_load_idx++; | 2385 | calc_load_idx++; |
2346 | } | 2386 | } |
2347 | #else /* !CONFIG_NO_HZ */ | 2387 | #else /* !CONFIG_NO_HZ_COMMON */ |
2348 | 2388 | ||
2349 | static inline long calc_load_fold_idle(void) { return 0; } | 2389 | static inline long calc_load_fold_idle(void) { return 0; } |
2350 | static inline void calc_global_nohz(void) { } | 2390 | static inline void calc_global_nohz(void) { } |
2351 | 2391 | ||
2352 | #endif /* CONFIG_NO_HZ */ | 2392 | #endif /* CONFIG_NO_HZ_COMMON */ |
2353 | 2393 | ||
2354 | /* | 2394 | /* |
2355 | * calc_load - update the avenrun load estimates 10 ticks after the | 2395 | * calc_load - update the avenrun load estimates 10 ticks after the |
@@ -2509,7 +2549,7 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load, | |||
2509 | sched_avg_update(this_rq); | 2549 | sched_avg_update(this_rq); |
2510 | } | 2550 | } |
2511 | 2551 | ||
2512 | #ifdef CONFIG_NO_HZ | 2552 | #ifdef CONFIG_NO_HZ_COMMON |
2513 | /* | 2553 | /* |
2514 | * There is no sane way to deal with nohz on smp when using jiffies because the | 2554 | * There is no sane way to deal with nohz on smp when using jiffies because the |
2515 | * cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading | 2555 | * cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading |
@@ -2569,7 +2609,7 @@ void update_cpu_load_nohz(void) | |||
2569 | } | 2609 | } |
2570 | raw_spin_unlock(&this_rq->lock); | 2610 | raw_spin_unlock(&this_rq->lock); |
2571 | } | 2611 | } |
2572 | #endif /* CONFIG_NO_HZ */ | 2612 | #endif /* CONFIG_NO_HZ_COMMON */ |
2573 | 2613 | ||
2574 | /* | 2614 | /* |
2575 | * Called from scheduler_tick() | 2615 | * Called from scheduler_tick() |
@@ -2696,7 +2736,34 @@ void scheduler_tick(void) | |||
2696 | rq->idle_balance = idle_cpu(cpu); | 2736 | rq->idle_balance = idle_cpu(cpu); |
2697 | trigger_load_balance(rq, cpu); | 2737 | trigger_load_balance(rq, cpu); |
2698 | #endif | 2738 | #endif |
2739 | rq_last_tick_reset(rq); | ||
2740 | } | ||
2741 | |||
2742 | #ifdef CONFIG_NO_HZ_FULL | ||
2743 | /** | ||
2744 | * scheduler_tick_max_deferment | ||
2745 | * | ||
2746 | * Keep at least one tick per second when a single | ||
2747 | * active task is running because the scheduler doesn't | ||
2748 | * yet completely support full dynticks environment. | ||
2749 | * | ||
2750 | * This makes sure that uptime, CFS vruntime, load | ||
2751 | * balancing, etc... continue to move forward, even | ||
2752 | * with a very low granularity. | ||
2753 | */ | ||
2754 | u64 scheduler_tick_max_deferment(void) | ||
2755 | { | ||
2756 | struct rq *rq = this_rq(); | ||
2757 | unsigned long next, now = ACCESS_ONCE(jiffies); | ||
2758 | |||
2759 | next = rq->last_sched_tick + HZ; | ||
2760 | |||
2761 | if (time_before_eq(next, now)) | ||
2762 | return 0; | ||
2763 | |||
2764 | return jiffies_to_usecs(next - now) * NSEC_PER_USEC; | ||
2699 | } | 2765 | } |
2766 | #endif | ||
2700 | 2767 | ||
2701 | notrace unsigned long get_parent_ip(unsigned long addr) | 2768 | notrace unsigned long get_parent_ip(unsigned long addr) |
2702 | { | 2769 | { |
@@ -6951,9 +7018,12 @@ void __init sched_init(void) | |||
6951 | INIT_LIST_HEAD(&rq->cfs_tasks); | 7018 | INIT_LIST_HEAD(&rq->cfs_tasks); |
6952 | 7019 | ||
6953 | rq_attach_root(rq, &def_root_domain); | 7020 | rq_attach_root(rq, &def_root_domain); |
6954 | #ifdef CONFIG_NO_HZ | 7021 | #ifdef CONFIG_NO_HZ_COMMON |
6955 | rq->nohz_flags = 0; | 7022 | rq->nohz_flags = 0; |
6956 | #endif | 7023 | #endif |
7024 | #ifdef CONFIG_NO_HZ_FULL | ||
7025 | rq->last_sched_tick = 0; | ||
7026 | #endif | ||
6957 | #endif | 7027 | #endif |
6958 | init_rq_hrtick(rq); | 7028 | init_rq_hrtick(rq); |
6959 | atomic_set(&rq->nr_iowait, 0); | 7029 | atomic_set(&rq->nr_iowait, 0); |
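The `scheduler_tick_max_deferment()` helper added to kernel/sched/core.c above caps tick deferral at one second after the last scheduler tick. The arithmetic can be tried outside the kernel with the minimal userspace sketch below. It assumes HZ=1000, and every `sketch_`/`SKETCH_` name is a hypothetical stand-in rather than a kernel symbol. The comparison mirrors the wrap-safe idea behind `time_before_eq()`.

```c
#include <assert.h>   /* only needed by callers that check the results */
#include <limits.h>   /* ULONG_MAX, handy for wraparound experiments */
#include <stdint.h>

/* Assumed tick rate for this sketch; the kernel's HZ is a build-time option. */
#define SKETCH_HZ 1000UL
#define SKETCH_NSEC_PER_SEC 1000000000ULL

/* Wrap-safe "a <= b" on a free-running counter, like time_before_eq(). */
static int sketch_time_before_eq(unsigned long a, unsigned long b)
{
	return (long)(b - a) >= 0;
}

/*
 * Nanoseconds the tick may still be deferred, given the jiffy of the
 * last scheduler tick and the current jiffy. The deadline is one second
 * (SKETCH_HZ jiffies) after the last tick; 0 means "tick now".
 */
static uint64_t sketch_max_deferment_ns(unsigned long last_sched_tick,
					unsigned long now)
{
	unsigned long next = last_sched_tick + SKETCH_HZ;

	if (sketch_time_before_eq(next, now))
		return 0;

	return (uint64_t)(next - now) * (SKETCH_NSEC_PER_SEC / SKETCH_HZ);
}
```

With the assumed HZ, a last tick at jiffy 100 and a current jiffy of 600 leaves 500 jiffies, i.e. half a second, before the tick must fire again. The unsigned subtraction keeps the result correct when the jiffy counter wraps.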
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 8bf7081b1ec5..c61a614465c8 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c | |||
@@ -5355,7 +5355,7 @@ out_unlock: | |||
5355 | return 0; | 5355 | return 0; |
5356 | } | 5356 | } |
5357 | 5357 | ||
5358 | #ifdef CONFIG_NO_HZ | 5358 | #ifdef CONFIG_NO_HZ_COMMON |
5359 | /* | 5359 | /* |
5360 | * idle load balancing details | 5360 | * idle load balancing details |
5361 | * - When one of the busy CPUs notice that there may be an idle rebalancing | 5361 | * - When one of the busy CPUs notice that there may be an idle rebalancing |
@@ -5572,9 +5572,9 @@ out: | |||
5572 | rq->next_balance = next_balance; | 5572 | rq->next_balance = next_balance; |
5573 | } | 5573 | } |
5574 | 5574 | ||
5575 | #ifdef CONFIG_NO_HZ | 5575 | #ifdef CONFIG_NO_HZ_COMMON |
5576 | /* | 5576 | /* |
5577 | * In CONFIG_NO_HZ case, the idle balance kickee will do the | 5577 | * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the |
5578 | * rebalancing for all the cpus for whom scheduler ticks are stopped. | 5578 | * rebalancing for all the cpus for whom scheduler ticks are stopped. |
5579 | */ | 5579 | */ |
5580 | static void nohz_idle_balance(int this_cpu, enum cpu_idle_type idle) | 5580 | static void nohz_idle_balance(int this_cpu, enum cpu_idle_type idle) |
@@ -5717,7 +5717,7 @@ void trigger_load_balance(struct rq *rq, int cpu) | |||
5717 | if (time_after_eq(jiffies, rq->next_balance) && | 5717 | if (time_after_eq(jiffies, rq->next_balance) && |
5718 | likely(!on_null_domain(cpu))) | 5718 | likely(!on_null_domain(cpu))) |
5719 | raise_softirq(SCHED_SOFTIRQ); | 5719 | raise_softirq(SCHED_SOFTIRQ); |
5720 | #ifdef CONFIG_NO_HZ | 5720 | #ifdef CONFIG_NO_HZ_COMMON |
5721 | if (nohz_kick_needed(rq, cpu) && likely(!on_null_domain(cpu))) | 5721 | if (nohz_kick_needed(rq, cpu) && likely(!on_null_domain(cpu))) |
5722 | nohz_balancer_kick(cpu); | 5722 | nohz_balancer_kick(cpu); |
5723 | #endif | 5723 | #endif |
@@ -6187,7 +6187,7 @@ __init void init_sched_fair_class(void) | |||
6187 | #ifdef CONFIG_SMP | 6187 | #ifdef CONFIG_SMP |
6188 | open_softirq(SCHED_SOFTIRQ, run_rebalance_domains); | 6188 | open_softirq(SCHED_SOFTIRQ, run_rebalance_domains); |
6189 | 6189 | ||
6190 | #ifdef CONFIG_NO_HZ | 6190 | #ifdef CONFIG_NO_HZ_COMMON |
6191 | nohz.next_balance = jiffies; | 6191 | nohz.next_balance = jiffies; |
6192 | zalloc_cpumask_var(&nohz.idle_cpus_mask, GFP_NOWAIT); | 6192 | zalloc_cpumask_var(&nohz.idle_cpus_mask, GFP_NOWAIT); |
6193 | cpu_notifier(sched_ilb_notifier, 0); | 6193 | cpu_notifier(sched_ilb_notifier, 0); |
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c index b8ce77328341..d8da01008d39 100644 --- a/kernel/sched/idle_task.c +++ b/kernel/sched/idle_task.c | |||
@@ -17,6 +17,7 @@ select_task_rq_idle(struct task_struct *p, int sd_flag, int flags) | |||
17 | static void pre_schedule_idle(struct rq *rq, struct task_struct *prev) | 17 | static void pre_schedule_idle(struct rq *rq, struct task_struct *prev) |
18 | { | 18 | { |
19 | idle_exit_fair(rq); | 19 | idle_exit_fair(rq); |
20 | rq_last_tick_reset(rq); | ||
20 | } | 21 | } |
21 | 22 | ||
22 | static void post_schedule_idle(struct rq *rq) | 23 | static void post_schedule_idle(struct rq *rq) |
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 4c225c4c7111..ce39224d6155 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h | |||
@@ -5,6 +5,7 @@ | |||
5 | #include <linux/mutex.h> | 5 | #include <linux/mutex.h> |
6 | #include <linux/spinlock.h> | 6 | #include <linux/spinlock.h> |
7 | #include <linux/stop_machine.h> | 7 | #include <linux/stop_machine.h> |
8 | #include <linux/tick.h> | ||
8 | 9 | ||
9 | #include "cpupri.h" | 10 | #include "cpupri.h" |
10 | #include "cpuacct.h" | 11 | #include "cpuacct.h" |
@@ -405,10 +406,13 @@ struct rq { | |||
405 | #define CPU_LOAD_IDX_MAX 5 | 406 | #define CPU_LOAD_IDX_MAX 5 |
406 | unsigned long cpu_load[CPU_LOAD_IDX_MAX]; | 407 | unsigned long cpu_load[CPU_LOAD_IDX_MAX]; |
407 | unsigned long last_load_update_tick; | 408 | unsigned long last_load_update_tick; |
408 | #ifdef CONFIG_NO_HZ | 409 | #ifdef CONFIG_NO_HZ_COMMON |
409 | u64 nohz_stamp; | 410 | u64 nohz_stamp; |
410 | unsigned long nohz_flags; | 411 | unsigned long nohz_flags; |
411 | #endif | 412 | #endif |
413 | #ifdef CONFIG_NO_HZ_FULL | ||
414 | unsigned long last_sched_tick; | ||
415 | #endif | ||
412 | int skip_clock_update; | 416 | int skip_clock_update; |
413 | 417 | ||
414 | /* capture load from *all* tasks on this cpu: */ | 418 | /* capture load from *all* tasks on this cpu: */ |
@@ -1072,6 +1076,16 @@ static inline u64 steal_ticks(u64 steal) | |||
1072 | static inline void inc_nr_running(struct rq *rq) | 1076 | static inline void inc_nr_running(struct rq *rq) |
1073 | { | 1077 | { |
1074 | rq->nr_running++; | 1078 | rq->nr_running++; |
1079 | |||
1080 | #ifdef CONFIG_NO_HZ_FULL | ||
1081 | if (rq->nr_running == 2) { | ||
1082 | if (tick_nohz_full_cpu(rq->cpu)) { | ||
1083 | /* Order rq->nr_running write against the IPI */ | ||
1084 | smp_wmb(); | ||
1085 | smp_send_reschedule(rq->cpu); | ||
1086 | } | ||
1087 | } | ||
1088 | #endif | ||
1075 | } | 1089 | } |
1076 | 1090 | ||
1077 | static inline void dec_nr_running(struct rq *rq) | 1091 | static inline void dec_nr_running(struct rq *rq) |
@@ -1079,6 +1093,13 @@ static inline void dec_nr_running(struct rq *rq) | |||
1079 | rq->nr_running--; | 1093 | rq->nr_running--; |
1080 | } | 1094 | } |
1081 | 1095 | ||
1096 | static inline void rq_last_tick_reset(struct rq *rq) | ||
1097 | { | ||
1098 | #ifdef CONFIG_NO_HZ_FULL | ||
1099 | rq->last_sched_tick = jiffies; | ||
1100 | #endif | ||
1101 | } | ||
1102 | |||
1082 | extern void update_rq_clock(struct rq *rq); | 1103 | extern void update_rq_clock(struct rq *rq); |
1083 | 1104 | ||
1084 | extern void activate_task(struct rq *rq, struct task_struct *p, int flags); | 1105 | extern void activate_task(struct rq *rq, struct task_struct *p, int flags); |
@@ -1299,7 +1320,7 @@ extern void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq); | |||
1299 | 1320 | ||
1300 | extern void account_cfs_bandwidth_used(int enabled, int was_enabled); | 1321 | extern void account_cfs_bandwidth_used(int enabled, int was_enabled); |
1301 | 1322 | ||
1302 | #ifdef CONFIG_NO_HZ | 1323 | #ifdef CONFIG_NO_HZ_COMMON |
1303 | enum rq_nohz_flag_bits { | 1324 | enum rq_nohz_flag_bits { |
1304 | NOHZ_TICK_STOPPED, | 1325 | NOHZ_TICK_STOPPED, |
1305 | NOHZ_BALANCE_KICK, | 1326 | NOHZ_BALANCE_KICK, |
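The `inc_nr_running()` change in kernel/sched/sched.h above kicks a full dynticks CPU only when its runqueue goes from one runnable task to two, which is the point where the tick is needed again. A minimal userspace model of that edge-triggered behaviour follows. The struct, its fields and the `kicks` counter are hypothetical stand-ins; the real code issues `smp_wmb()` and `smp_send_reschedule()` instead of counting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified runqueue for illustration only. */
struct sketch_rq {
	unsigned int nr_running;
	bool nohz_full;      /* stands in for tick_nohz_full_cpu(rq->cpu) */
	unsigned int kicks;  /* counts where the patch would send an IPI */
};

static void sketch_inc_nr_running(struct sketch_rq *rq)
{
	rq->nr_running++;

	/* Only the 1 -> 2 transition matters: a second task needs the tick. */
	if (rq->nr_running == 2 && rq->nohz_full)
		rq->kicks++;
}

static void sketch_dec_nr_running(struct sketch_rq *rq)
{
	assert(rq->nr_running > 0);
	rq->nr_running--;
}
```

Adding a third or fourth task does not kick again in this model, because the tick is already expected to be running once two tasks share the CPU.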
diff --git a/kernel/softirq.c b/kernel/softirq.c index aa82723c7202..b5197dcb0dad 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c | |||
@@ -329,6 +329,19 @@ static inline void invoke_softirq(void) | |||
329 | wakeup_softirqd(); | 329 | wakeup_softirqd(); |
330 | } | 330 | } |
331 | 331 | ||
332 | static inline void tick_irq_exit(void) | ||
333 | { | ||
334 | #ifdef CONFIG_NO_HZ_COMMON | ||
335 | int cpu = smp_processor_id(); | ||
336 | |||
337 | /* Make sure that timer wheel updates are propagated */ | ||
338 | if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) { | ||
339 | if (!in_interrupt()) | ||
340 | tick_nohz_irq_exit(); | ||
341 | } | ||
342 | #endif | ||
343 | } | ||
344 | |||
332 | /* | 345 | /* |
333 | * Exit an interrupt context. Process softirqs if needed and possible: | 346 | * Exit an interrupt context. Process softirqs if needed and possible: |
334 | */ | 347 | */ |
@@ -346,11 +359,7 @@ void irq_exit(void) | |||
346 | if (!in_interrupt() && local_softirq_pending()) | 359 | if (!in_interrupt() && local_softirq_pending()) |
347 | invoke_softirq(); | 360 | invoke_softirq(); |
348 | 361 | ||
349 | #ifdef CONFIG_NO_HZ | 362 | tick_irq_exit(); |
350 | /* Make sure that timer wheel updates are propagated */ | ||
351 | if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched()) | ||
352 | tick_nohz_irq_exit(); | ||
353 | #endif | ||
354 | rcu_irq_exit(); | 363 | rcu_irq_exit(); |
355 | } | 364 | } |
356 | 365 | ||
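The kernel/softirq.c hunk above moves the idle check out of `irq_exit()` into `tick_irq_exit()` and widens it to full dynticks CPUs. The two predicates can be compared side by side in the sketch below. The function names and boolean parameters are hypothetical; each parameter stands in for the kernel call named in its comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition used by irq_exit() before the patch. */
static bool sketch_old_check(bool cpu_idle,      /* idle_cpu(cpu) */
			     bool need_resched,  /* need_resched() */
			     bool in_interrupt)  /* in_interrupt() */
{
	return cpu_idle && !in_interrupt && !need_resched;
}

/* Condition used by tick_irq_exit() after the patch. */
static bool sketch_new_check(bool cpu_idle, bool need_resched,
			     bool nohz_full_cpu, /* tick_nohz_full_cpu(cpu) */
			     bool in_interrupt)
{
	if ((cpu_idle && !need_resched) || nohz_full_cpu)
		return !in_interrupt;

	return false;
}
```

For a CPU outside the nohz_full range the two functions agree on every input, so the refactoring only changes behaviour on full dynticks CPUs, where the tick is re-evaluated on every outermost interrupt exit.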
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig index 24510d84efd7..e4c07b0692bb 100644 --- a/kernel/time/Kconfig +++ b/kernel/time/Kconfig | |||
@@ -64,20 +64,88 @@ config GENERIC_CMOS_UPDATE | |||
64 | if GENERIC_CLOCKEVENTS | 64 | if GENERIC_CLOCKEVENTS |
65 | menu "Timers subsystem" | 65 | menu "Timers subsystem" |
66 | 66 | ||
67 | # Core internal switch. Selected by NO_HZ / HIGH_RES_TIMERS. This is | 67 | # Core internal switch. Selected by NO_HZ_COMMON / HIGH_RES_TIMERS. This is |
68 | # only related to the tick functionality. Oneshot clockevent devices | 68 | # only related to the tick functionality. Oneshot clockevent devices |
69 | # are supported independent of this. | 69 | # are supported independent of this. |
70 | config TICK_ONESHOT | 70 | config TICK_ONESHOT |
71 | bool | 71 | bool |
72 | 72 | ||
73 | config NO_HZ | 73 | config NO_HZ_COMMON |
74 | bool "Tickless System (Dynamic Ticks)" | 74 | bool |
75 | depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS | 75 | depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS |
76 | select TICK_ONESHOT | 76 | select TICK_ONESHOT |
77 | |||
78 | choice | ||
79 | prompt "Timer tick handling" | ||
80 | default NO_HZ_IDLE if NO_HZ | ||
81 | |||
82 | config HZ_PERIODIC | ||
83 | bool "Periodic timer ticks (constant rate, no dynticks)" | ||
84 | help | ||
85 | This option keeps the tick running periodically at a constant | ||
86 | rate, even when the CPU doesn't need it. | ||
87 | |||
88 | config NO_HZ_IDLE | ||
89 | bool "Idle dynticks system (tickless idle)" | ||
90 | depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS | ||
91 | select NO_HZ_COMMON | ||
92 | help | ||
93 | This option enables a tickless idle system: timer interrupts | ||
94 | will only trigger on an as-needed basis when the system is idle. | ||
95 | This is usually interesting for energy saving. | ||
96 | |||
97 | Most of the time you want to say Y here. | ||
98 | |||
99 | config NO_HZ_FULL | ||
100 | bool "Full dynticks system (tickless)" | ||
101 | # NO_HZ_COMMON dependency | ||
102 | depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS | ||
103 | # We need at least one periodic CPU for timekeeping | ||
104 | depends on SMP | ||
105 | # RCU_USER_QS dependency | ||
106 | depends on HAVE_CONTEXT_TRACKING | ||
107 | # VIRT_CPU_ACCOUNTING_GEN dependency | ||
108 | depends on 64BIT | ||
109 | select NO_HZ_COMMON | ||
110 | select RCU_USER_QS | ||
111 | select RCU_NOCB_CPU | ||
112 | select VIRT_CPU_ACCOUNTING_GEN | ||
113 | select CONTEXT_TRACKING_FORCE | ||
114 | select IRQ_WORK | ||
115 | help | ||
116 | Adaptively try to shut down the tick whenever possible, even when | ||
117 | the CPU is running tasks. Typically this requires running a single | ||
118 | task on the CPU. Chances for running tickless are maximized when | ||
119 | the task mostly runs in userspace and has little kernel activity. | ||
120 | |||
121 | You need to fill up the nohz_full boot parameter with the | ||
122 | desired range of dynticks CPUs. | ||
123 | |||
124 | This is implemented at the expense of some overhead in user <-> kernel | ||
125 | transitions (syscalls, exceptions and interrupts), even when it's | ||
126 | dynamically off. | ||
127 | |||
128 | Say N. | ||
129 | |||
130 | endchoice | ||
131 | |||
132 | config NO_HZ_FULL_ALL | ||
133 | bool "Full dynticks system on all CPUs by default" | ||
134 | depends on NO_HZ_FULL | ||
135 | help | ||
136 | If the user doesn't pass the nohz_full boot option to | ||
137 | define the range of full dynticks CPUs, consider that all | ||
138 | CPUs in the system are full dynticks by default. | ||
139 | Note the boot CPU will still be kept outside the range to | ||
140 | handle the timekeeping duty. | ||
141 | |||
142 | config NO_HZ | ||
143 | bool "Old Idle dynticks config" | ||
144 | depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS | ||
77 | help | 145 | help |
78 | This option enables a tickless system: timer interrupts will | 146 | This is the old config entry that enables dynticks idle. |
79 | only trigger on an as-needed basis both when the system is | 147 | We keep it around for a little while to enforce backward |
80 | busy and when the system is idle. | 148 | compatibility with older config files. |
81 | 149 | ||
82 | config HIGH_RES_TIMERS | 150 | config HIGH_RES_TIMERS |
83 | bool "High Resolution Timer Support" | 151 | bool "High Resolution Timer Support" |
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c index 61d00a8cdf2f..206bbfb34e09 100644 --- a/kernel/time/tick-broadcast.c +++ b/kernel/time/tick-broadcast.c | |||
@@ -693,7 +693,8 @@ void tick_broadcast_setup_oneshot(struct clock_event_device *bc) | |||
693 | bc->event_handler = tick_handle_oneshot_broadcast; | 693 | bc->event_handler = tick_handle_oneshot_broadcast; |
694 | 694 | ||
695 | /* Take the do_timer update */ | 695 | /* Take the do_timer update */ |
696 | tick_do_timer_cpu = cpu; | 696 | if (!tick_nohz_full_cpu(cpu)) |
697 | tick_do_timer_cpu = cpu; | ||
697 | 698 | ||
698 | /* | 699 | /* |
699 | * We must be careful here. There might be other CPUs | 700 | * We must be careful here. There might be other CPUs |
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c index 6176a3e45709..5d3fb100bc06 100644 --- a/kernel/time/tick-common.c +++ b/kernel/time/tick-common.c | |||
@@ -163,7 +163,10 @@ static void tick_setup_device(struct tick_device *td, | |||
163 | * this cpu: | 163 | * this cpu: |
164 | */ | 164 | */ |
165 | if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) { | 165 | if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) { |
166 | tick_do_timer_cpu = cpu; | 166 | if (!tick_nohz_full_cpu(cpu)) |
167 | tick_do_timer_cpu = cpu; | ||
168 | else | ||
169 | tick_do_timer_cpu = TICK_DO_TIMER_NONE; | ||
167 | tick_next_period = ktime_get(); | 170 | tick_next_period = ktime_get(); |
168 | tick_period = ktime_set(0, NSEC_PER_SEC / HZ); | 171 | tick_period = ktime_set(0, NSEC_PER_SEC / HZ); |
169 | } | 172 | } |
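The tick-broadcast.c and tick-common.c hunks above, together with the `tick_sched_do_timer()` change in this commit, share one rule: a full dynticks CPU never takes the timekeeping (`do_timer`) duty. The sketch below models the handover as pure functions. The `SKETCH_` constants and function names are hypothetical stand-ins for `TICK_DO_TIMER_BOOT`, `TICK_DO_TIMER_NONE` and the kernel code paths, and the numeric values are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for TICK_DO_TIMER_BOOT / TICK_DO_TIMER_NONE (values assumed). */
#define SKETCH_DO_TIMER_BOOT (-2)
#define SKETCH_DO_TIMER_NONE (-1)

/* Models the tick_setup_device() hunk: first CPU to set up its tick device. */
static int sketch_setup_device(int do_timer_cpu, int cpu, bool nohz_full)
{
	if (do_timer_cpu == SKETCH_DO_TIMER_BOOT)
		return nohz_full ? SKETCH_DO_TIMER_NONE : cpu;

	return do_timer_cpu;
}

/* Models tick_sched_do_timer(): adopt a dropped duty unless full dynticks. */
static int sketch_do_timer(int do_timer_cpu, int cpu, bool nohz_full)
{
	if (do_timer_cpu == SKETCH_DO_TIMER_NONE && !nohz_full)
		return cpu;

	return do_timer_cpu;
}
```

In this model the duty stays unassigned until a CPU outside the nohz_full range runs its tick handler, which is why `can_stop_idle_tick()` in the same commit refuses to stop the tick while the duty is unassigned.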
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 225f8bf19095..bc67d4245e1d 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c | |||
@@ -21,11 +21,15 @@ | |||
21 | #include <linux/sched.h> | 21 | #include <linux/sched.h> |
22 | #include <linux/module.h> | 22 | #include <linux/module.h> |
23 | #include <linux/irq_work.h> | 23 | #include <linux/irq_work.h> |
24 | #include <linux/posix-timers.h> | ||
25 | #include <linux/perf_event.h> | ||
24 | 26 | ||
25 | #include <asm/irq_regs.h> | 27 | #include <asm/irq_regs.h> |
26 | 28 | ||
27 | #include "tick-internal.h" | 29 | #include "tick-internal.h" |
28 | 30 | ||
31 | #include <trace/events/timer.h> | ||
32 | |||
29 | /* | 33 | /* |
30 | * Per cpu nohz control structure | 34 | * Per cpu nohz control structure |
31 | */ | 35 | */ |
@@ -104,7 +108,7 @@ static void tick_sched_do_timer(ktime_t now) | |||
104 | { | 108 | { |
105 | int cpu = smp_processor_id(); | 109 | int cpu = smp_processor_id(); |
106 | 110 | ||
107 | #ifdef CONFIG_NO_HZ | 111 | #ifdef CONFIG_NO_HZ_COMMON |
108 | /* | 112 | /* |
109 | * Check if the do_timer duty was dropped. We don't care about | 113 | * Check if the do_timer duty was dropped. We don't care about |
110 | * concurrency: This happens only when the cpu in charge went | 114 | * concurrency: This happens only when the cpu in charge went |
@@ -112,7 +116,8 @@ static void tick_sched_do_timer(ktime_t now) | |||
112 | * this duty, then the jiffies update is still serialized by | 116 | * this duty, then the jiffies update is still serialized by |
113 | * jiffies_lock. | 117 | * jiffies_lock. |
114 | */ | 118 | */ |
115 | if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) | 119 | if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE) |
120 | && !tick_nohz_full_cpu(cpu)) | ||
116 | tick_do_timer_cpu = cpu; | 121 | tick_do_timer_cpu = cpu; |
117 | #endif | 122 | #endif |
118 | 123 | ||
@@ -123,7 +128,7 @@ static void tick_sched_do_timer(ktime_t now) | |||
123 | 128 | ||
124 | static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs) | 129 | static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs) |
125 | { | 130 | { |
126 | #ifdef CONFIG_NO_HZ | 131 | #ifdef CONFIG_NO_HZ_COMMON |
127 | /* | 132 | /* |
128 | * When we are idle and the tick is stopped, we have to touch | 133 | * When we are idle and the tick is stopped, we have to touch |
129 | * the watchdog as we might not schedule for a really long | 134 | * the watchdog as we might not schedule for a really long |
@@ -142,10 +147,226 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs) | |||
142 | profile_tick(CPU_PROFILING); | 147 | profile_tick(CPU_PROFILING); |
143 | } | 148 | } |
144 | 149 | ||
150 | #ifdef CONFIG_NO_HZ_FULL | ||
151 | static cpumask_var_t nohz_full_mask; | ||
152 | bool have_nohz_full_mask; | ||
153 | |||
154 | static bool can_stop_full_tick(void) | ||
155 | { | ||
156 | WARN_ON_ONCE(!irqs_disabled()); | ||
157 | |||
158 | if (!sched_can_stop_tick()) { | ||
159 | trace_tick_stop(0, "more than 1 task in runqueue\n"); | ||
160 | return false; | ||
161 | } | ||
162 | |||
163 | if (!posix_cpu_timers_can_stop_tick(current)) { | ||
164 | trace_tick_stop(0, "posix timers running\n"); | ||
165 | return false; | ||
166 | } | ||
167 | |||
168 | if (!perf_event_can_stop_tick()) { | ||
169 | trace_tick_stop(0, "perf events running\n"); | ||
170 | return false; | ||
171 | } | ||
172 | |||
173 | /* sched_clock_tick() needs us? */ | ||
174 | #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK | ||
175 | /* | ||
176 | * TODO: kick full dynticks CPUs when | ||
177 | * sched_clock_stable is set. | ||
178 | */ | ||
179 | if (!sched_clock_stable) { | ||
180 | trace_tick_stop(0, "unstable sched clock\n"); | ||
181 | return false; | ||
182 | } | ||
183 | #endif | ||
184 | |||
185 | return true; | ||
186 | } | ||
187 | |||
188 | static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now); | ||
189 | |||
190 | /* | ||
191 | * Re-evaluate the need for the tick on the current CPU | ||
192 | * and restart it if necessary. | ||
193 | */ | ||
194 | void tick_nohz_full_check(void) | ||
195 | { | ||
196 | struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched); | ||
197 | |||
198 | if (tick_nohz_full_cpu(smp_processor_id())) { | ||
199 | if (ts->tick_stopped && !is_idle_task(current)) { | ||
200 | if (!can_stop_full_tick()) | ||
201 | tick_nohz_restart_sched_tick(ts, ktime_get()); | ||
202 | } | ||
203 | } | ||
204 | } | ||
205 | |||
206 | static void nohz_full_kick_work_func(struct irq_work *work) | ||
207 | { | ||
208 | tick_nohz_full_check(); | ||
209 | } | ||
210 | |||
211 | static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = { | ||
212 | .func = nohz_full_kick_work_func, | ||
213 | }; | ||
214 | |||
215 | /* | ||
216 | * Kick the current CPU if it's full dynticks in order to force it to | ||
217 | * re-evaluate its dependency on the tick and restart it if necessary. | ||
218 | */ | ||
219 | void tick_nohz_full_kick(void) | ||
220 | { | ||
221 | if (tick_nohz_full_cpu(smp_processor_id())) | ||
222 | irq_work_queue(&__get_cpu_var(nohz_full_kick_work)); | ||
223 | } | ||
224 | |||
225 | static void nohz_full_kick_ipi(void *info) | ||
226 | { | ||
227 | tick_nohz_full_check(); | ||
228 | } | ||
229 | |||
230 | /* | ||
231 | * Kick all full dynticks CPUs in order to force these to re-evaluate | ||
232 | * their dependency on the tick and restart it if necessary. | ||
233 | */ | ||
234 | void tick_nohz_full_kick_all(void) | ||
235 | { | ||
236 | if (!have_nohz_full_mask) | ||
237 | return; | ||
238 | |||
239 | preempt_disable(); | ||
240 | smp_call_function_many(nohz_full_mask, | ||
241 | nohz_full_kick_ipi, NULL, false); | ||
242 | preempt_enable(); | ||
243 | } | ||
244 | |||
245 | /* | ||
246 | * Re-evaluate the need for the tick as we switch the current task. | ||
247 | * It might need the tick due to per task/process properties: | ||
248 | * perf events, posix cpu timers, ... | ||
249 | */ | ||
250 | void tick_nohz_task_switch(struct task_struct *tsk) | ||
251 | { | ||
252 | unsigned long flags; | ||
253 | |||
254 | local_irq_save(flags); | ||
255 | |||
256 | if (!tick_nohz_full_cpu(smp_processor_id())) | ||
257 | goto out; | ||
258 | |||
259 | if (tick_nohz_tick_stopped() && !can_stop_full_tick()) | ||
260 | tick_nohz_full_kick(); | ||
261 | |||
262 | out: | ||
263 | local_irq_restore(flags); | ||
264 | } | ||
265 | |||
266 | int tick_nohz_full_cpu(int cpu) | ||
267 | { | ||
268 | if (!have_nohz_full_mask) | ||
269 | return 0; | ||
270 | |||
271 | return cpumask_test_cpu(cpu, nohz_full_mask); | ||
272 | } | ||
273 | |||
274 | /* Parse the boot-time nohz CPU list from the kernel parameters. */ | ||
275 | static int __init tick_nohz_full_setup(char *str) | ||
276 | { | ||
277 | int cpu; | ||
278 | |||
279 | alloc_bootmem_cpumask_var(&nohz_full_mask); | ||
280 | if (cpulist_parse(str, nohz_full_mask) < 0) { | ||
281 | pr_warning("NOHZ: Incorrect nohz_full cpumask\n"); | ||
282 | return 1; | ||
283 | } | ||
284 | |||
285 | cpu = smp_processor_id(); | ||
286 | if (cpumask_test_cpu(cpu, nohz_full_mask)) { | ||
287 | pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu); | ||
288 | cpumask_clear_cpu(cpu, nohz_full_mask); | ||
289 | } | ||
290 | have_nohz_full_mask = true; | ||
291 | |||
292 | return 1; | ||
293 | } | ||
294 | __setup("nohz_full=", tick_nohz_full_setup); | ||
295 | |||
296 | static int __cpuinit tick_nohz_cpu_down_callback(struct notifier_block *nfb, | ||
297 | unsigned long action, | ||
298 | void *hcpu) | ||
299 | { | ||
300 | unsigned int cpu = (unsigned long)hcpu; | ||
301 | |||
302 | switch (action & ~CPU_TASKS_FROZEN) { | ||
303 | case CPU_DOWN_PREPARE: | ||
304 | /* | ||
305 | * If we handle the timekeeping duty for full dynticks CPUs, | ||
306 | * we can't safely shutdown that CPU. | ||
307 | */ | ||
308 | if (have_nohz_full_mask && tick_do_timer_cpu == cpu) | ||
309 | return -EINVAL; | ||
310 | break; | ||
311 | } | ||
312 | return NOTIFY_OK; | ||
313 | } | ||
314 | |||
315 | /* | ||
316 | * Worst case string length in chunks of CPU range seems 2 steps | ||
317 | * separations: 0,2,4,6,... | ||
318 | * This is NR_CPUS + sizeof('\0') | ||
319 | */ | ||
320 | static char __initdata nohz_full_buf[NR_CPUS + 1]; | ||
321 | |||
322 | static int tick_nohz_init_all(void) | ||
323 | { | ||
324 | int err = -1; | ||
325 | |||
326 | #ifdef CONFIG_NO_HZ_FULL_ALL | ||
327 | if (!alloc_cpumask_var(&nohz_full_mask, GFP_KERNEL)) { | ||
328 | pr_err("NO_HZ: Can't allocate full dynticks cpumask\n"); | ||
329 | return err; | ||
330 | } | ||
331 | err = 0; | ||
332 | cpumask_setall(nohz_full_mask); | ||
333 | cpumask_clear_cpu(smp_processor_id(), nohz_full_mask); | ||
334 | have_nohz_full_mask = true; | ||
335 | #endif | ||
336 | return err; | ||
337 | } | ||
338 | |||
339 | void __init tick_nohz_init(void) | ||
340 | { | ||
341 | int cpu; | ||
342 | |||
343 | if (!have_nohz_full_mask) { | ||
344 | if (tick_nohz_init_all() < 0) | ||
345 | return; | ||
346 | } | ||
347 | |||
348 | cpu_notifier(tick_nohz_cpu_down_callback, 0); | ||
349 | |||
350 | /* Make sure full dynticks CPU are also RCU nocbs */ | ||
351 | for_each_cpu(cpu, nohz_full_mask) { | ||
352 | if (!rcu_is_nocb_cpu(cpu)) { | ||
353 | pr_warning("NO_HZ: CPU %d is not RCU nocb: " | ||
354 | "cleared from nohz_full range", cpu); | ||
355 | cpumask_clear_cpu(cpu, nohz_full_mask); | ||
356 | } | ||
357 | } | ||
358 | |||
359 | cpulist_scnprintf(nohz_full_buf, sizeof(nohz_full_buf), nohz_full_mask); | ||
360 | pr_info("NO_HZ: Full dynticks CPUs: %s.\n", nohz_full_buf); | ||
361 | } | ||
362 | #else | ||
363 | #define have_nohz_full_mask (0) | ||
364 | #endif | ||
365 | |||
145 | /* | 366 | /* |
146 | * NOHZ - aka dynamic tick functionality | 367 | * NOHZ - aka dynamic tick functionality |
147 | */ | 368 | */ |
148 | #ifdef CONFIG_NO_HZ | 369 | #ifdef CONFIG_NO_HZ_COMMON |
149 | /* | 370 | /* |
150 | * NO HZ enabled ? | 371 | * NO HZ enabled ? |
151 | */ | 372 | */ |
@@ -345,11 +566,12 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts, | |||
345 | delta_jiffies = rcu_delta_jiffies; | 566 | delta_jiffies = rcu_delta_jiffies; |
346 | } | 567 | } |
347 | } | 568 | } |
569 | |||
348 | /* | 570 | /* |
349 | * Do not stop the tick, if we are only one off | 571 | * Do not stop the tick, if we are only one off (or less) |
350 | * or if the cpu is required for rcu | 572 | * or if the cpu is required for RCU: |
351 | */ | 573 | */ |
352 | if (!ts->tick_stopped && delta_jiffies == 1) | 574 | if (!ts->tick_stopped && delta_jiffies <= 1) |
353 | goto out; | 575 | goto out; |
354 | 576 | ||
355 | /* Schedule the tick, if we are at least one jiffie off */ | 577 | /* Schedule the tick, if we are at least one jiffie off */ |
@@ -378,6 +600,13 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts, | |||
378 | time_delta = KTIME_MAX; | 600 | time_delta = KTIME_MAX; |
379 | } | 601 | } |
380 | 602 | ||
603 | #ifdef CONFIG_NO_HZ_FULL | ||
604 | if (!ts->inidle) { | ||
605 | time_delta = min(time_delta, | ||
606 | scheduler_tick_max_deferment()); | ||
607 | } | ||
608 | #endif | ||
609 | |||
381 | /* | 610 | /* |
382 | * calculate the expiry time for the next timer wheel | 611 | * calculate the expiry time for the next timer wheel |
383 | * timer. delta_jiffies >= NEXT_TIMER_MAX_DELTA signals | 612 | * timer. delta_jiffies >= NEXT_TIMER_MAX_DELTA signals |
@@ -421,6 +650,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts, | |||
421 | 650 | ||
422 | ts->last_tick = hrtimer_get_expires(&ts->sched_timer); | 651 | ts->last_tick = hrtimer_get_expires(&ts->sched_timer); |
423 | ts->tick_stopped = 1; | 652 | ts->tick_stopped = 1; |
653 | trace_tick_stop(1, " "); | ||
424 | } | 654 | } |
425 | 655 | ||
426 | /* | 656 | /* |
@@ -457,6 +687,24 @@ out: | |||
457 | return ret; | 687 | return ret; |
458 | } | 688 | } |
459 | 689 | ||
690 | static void tick_nohz_full_stop_tick(struct tick_sched *ts) | ||
691 | { | ||
692 | #ifdef CONFIG_NO_HZ_FULL | ||
693 | int cpu = smp_processor_id(); | ||
694 | |||
695 | if (!tick_nohz_full_cpu(cpu) || is_idle_task(current)) | ||
696 | return; | ||
697 | |||
698 | if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE) | ||
699 | return; | ||
700 | |||
701 | if (!can_stop_full_tick()) | ||
702 | return; | ||
703 | |||
704 | tick_nohz_stop_sched_tick(ts, ktime_get(), cpu); | ||
705 | #endif | ||
706 | } | ||
707 | |||
460 | static bool can_stop_idle_tick(int cpu, struct tick_sched *ts) | 708 | static bool can_stop_idle_tick(int cpu, struct tick_sched *ts) |
461 | { | 709 | { |
462 | /* | 710 | /* |
@@ -489,6 +737,21 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts) | |||
489 | return false; | 737 | return false; |
490 | } | 738 | } |
491 | 739 | ||
740 | if (have_nohz_full_mask) { | ||
741 | /* | ||
742 | * Keep the tick alive to guarantee timekeeping progression | ||
743 | * if there are full dynticks CPUs around | ||
744 | */ | ||
745 | if (tick_do_timer_cpu == cpu) | ||
746 | return false; | ||
747 | /* | ||
748 | * Boot safety: make sure the timekeeping duty has been | ||
749 | * assigned before entering dyntick-idle mode, | ||
750 | */ | ||
751 | if (tick_do_timer_cpu == TICK_DO_TIMER_NONE) | ||
752 | return false; | ||
753 | } | ||
754 | |||
492 | return true; | 755 | return true; |
493 | } | 756 | } |
494 | 757 | ||
@@ -568,12 +831,13 @@ void tick_nohz_irq_exit(void) | |||
568 | { | 831 | { |
569 | struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched); | 832 | struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched); |
570 | 833 | ||
571 | if (!ts->inidle) | 834 | if (ts->inidle) { |
572 | return; | 835 | /* Cancel the timer because CPU already waken up from the C-states*/ |
573 | 836 | menu_hrtimer_cancel(); | |
574 | /* Cancel the timer because CPU already waken up from the C-states*/ | 837 | __tick_nohz_idle_enter(ts); |
575 | menu_hrtimer_cancel(); | 838 | } else { |
576 | __tick_nohz_idle_enter(ts); | 839 | tick_nohz_full_stop_tick(ts); |
840 | } | ||
577 | } | 841 | } |
578 | 842 | ||
579 | /** | 843 | /** |
@@ -802,7 +1066,7 @@ static inline void tick_check_nohz(int cpu) | |||
802 | static inline void tick_nohz_switch_to_nohz(void) { } | 1066 | static inline void tick_nohz_switch_to_nohz(void) { } |
803 | static inline void tick_check_nohz(int cpu) { } | 1067 | static inline void tick_check_nohz(int cpu) { } |
804 | 1068 | ||
805 | #endif /* NO_HZ */ | 1069 | #endif /* CONFIG_NO_HZ_COMMON */ |
806 | 1070 | ||
807 | /* | 1071 | /* |
808 | * Called from irq_enter to notify about the possible interruption of idle() | 1072 | * Called from irq_enter to notify about the possible interruption of idle() |
@@ -887,14 +1151,14 @@ void tick_setup_sched_timer(void) | |||
887 | now = ktime_get(); | 1151 | now = ktime_get(); |
888 | } | 1152 | } |
889 | 1153 | ||
890 | #ifdef CONFIG_NO_HZ | 1154 | #ifdef CONFIG_NO_HZ_COMMON |
891 | if (tick_nohz_enabled) | 1155 | if (tick_nohz_enabled) |
892 | ts->nohz_mode = NOHZ_MODE_HIGHRES; | 1156 | ts->nohz_mode = NOHZ_MODE_HIGHRES; |
893 | #endif | 1157 | #endif |
894 | } | 1158 | } |
895 | #endif /* HIGH_RES_TIMERS */ | 1159 | #endif /* HIGH_RES_TIMERS */ |
896 | 1160 | ||
897 | #if defined CONFIG_NO_HZ || defined CONFIG_HIGH_RES_TIMERS | 1161 | #if defined CONFIG_NO_HZ_COMMON || defined CONFIG_HIGH_RES_TIMERS |
898 | void tick_cancel_sched_timer(int cpu) | 1162 | void tick_cancel_sched_timer(int cpu) |
899 | { | 1163 | { |
900 | struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu); | 1164 | struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu); |
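The decision path that `tick_nohz_full_stop_tick()` and `can_stop_full_tick()` add to kernel/time/tick-sched.c above can be summarised as an ordered list of checks, each with its own trace reason. The sketch below reproduces that order in userspace. The struct and its fields are hypothetical. Modelling `sched_can_stop_tick()` as "at most one runnable task" is an assumption taken from the trace message, and the sched clock check is applied unconditionally here although the patch compiles it only under `CONFIG_HAVE_UNSTABLE_SCHED_CLOCK`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical snapshot of the state the kernel checks consult. */
struct sketch_cpu_state {
	bool nohz_full_cpu;          /* CPU is in the nohz_full= range */
	bool running_idle_task;      /* is_idle_task(current) */
	bool tick_stopped;           /* ts->tick_stopped */
	bool nohz_mode_inactive;     /* ts->nohz_mode == NOHZ_MODE_INACTIVE */
	unsigned int nr_running;     /* assumed basis of sched_can_stop_tick() */
	bool posix_cpu_timers_armed; /* !posix_cpu_timers_can_stop_tick() */
	bool perf_needs_tick;        /* !perf_event_can_stop_tick() */
	bool sched_clock_stable;
};

/* Returns the first reason the tick must stay, or NULL if none applies. */
static const char *sketch_tick_blocker(const struct sketch_cpu_state *s)
{
	if (s->nr_running > 1)
		return "more than 1 task in runqueue";
	if (s->posix_cpu_timers_armed)
		return "posix timers running";
	if (s->perf_needs_tick)
		return "perf events running";
	if (!s->sched_clock_stable)
		return "unstable sched clock";

	return NULL;
}

/* Mirrors the early returns of tick_nohz_full_stop_tick(). */
static bool sketch_would_stop_tick(const struct sketch_cpu_state *s)
{
	if (!s->nohz_full_cpu || s->running_idle_task)
		return false;
	if (!s->tick_stopped && s->nohz_mode_inactive)
		return false;

	return sketch_tick_blocker(s) == NULL;
}
```

Because the checks run in a fixed order, only the first blocking reason is reported, which matches the single `trace_tick_stop(0, ...)` call emitted per failed attempt in the patch.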
diff --git a/kernel/timer.c b/kernel/timer.c index 09bca8ce9771..a860bba34412 100644 --- a/kernel/timer.c +++ b/kernel/timer.c | |||
@@ -739,7 +739,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires, | |||
739 | 739 | ||
740 | cpu = smp_processor_id(); | 740 | cpu = smp_processor_id(); |
741 | 741 | ||
742 | #if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP) | 742 | #if defined(CONFIG_NO_HZ_COMMON) && defined(CONFIG_SMP) |
743 | if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu)) | 743 | if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu)) |
744 | cpu = get_nohz_timer_target(); | 744 | cpu = get_nohz_timer_target(); |
745 | #endif | 745 | #endif |
@@ -931,14 +931,14 @@ void add_timer_on(struct timer_list *timer, int cpu) | |||
931 | debug_activate(timer, timer->expires); | 931 | debug_activate(timer, timer->expires); |
932 | internal_add_timer(base, timer); | 932 | internal_add_timer(base, timer); |
933 | /* | 933 | /* |
934 | * Check whether the other CPU is idle and needs to be | 934 | * Check whether the other CPU is in dynticks mode and needs |
935 | * triggered to reevaluate the timer wheel when nohz is | 935 | * to be triggered to reevaluate the timer wheel. |
936 | * active. We are protected against the other CPU fiddling | 936 | * We are protected against the other CPU fiddling |
937 | * with the timer by holding the timer base lock. This also | 937 | * with the timer by holding the timer base lock. This also |
938 | * makes sure that a CPU on the way to idle can not evaluate | 938 | * makes sure that a CPU on the way to stop its tick can not |
939 | * the timer wheel. | 939 | * evaluate the timer wheel. |
940 | */ | 940 | */ |
941 | wake_up_idle_cpu(cpu); | 941 | wake_up_nohz_cpu(cpu); |
942 | spin_unlock_irqrestore(&base->lock, flags); | 942 | spin_unlock_irqrestore(&base->lock, flags); |
943 | } | 943 | } |
944 | EXPORT_SYMBOL_GPL(add_timer_on); | 944 | EXPORT_SYMBOL_GPL(add_timer_on); |
@@ -1189,7 +1189,7 @@ static inline void __run_timers(struct tvec_base *base) | |||
1189 | spin_unlock_irq(&base->lock); | 1189 | spin_unlock_irq(&base->lock); |
1190 | } | 1190 | } |
1191 | 1191 | ||
1192 | #ifdef CONFIG_NO_HZ | 1192 | #ifdef CONFIG_NO_HZ_COMMON |
1193 | /* | 1193 | /* |
1194 | * Find out when the next timer event is due to happen. This | 1194 | * Find out when the next timer event is due to happen. This |
1195 | * is used on S/390 to stop all activity when a CPU is idle. | 1195 | * is used on S/390 to stop all activity when a CPU is idle. |