aboutsummaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAge
...
| | * | | | | | timekeeping: Add timekeeper read_clock helper functionsMartin Schwidefsky2009-08-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add timekeeper_read_clock_ntp and timekeeper_read_clock_raw and use them for getnstimeofday, ktime_get, ktime_get_ts and getrawmonotonic. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134810.435105711@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | | | | timekeeping: Move NTP adjusted clock multiplier to struct timekeeperMartin Schwidefsky2009-08-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The clocksource structure has two multipliers, the unmodified multiplier clock->mult_orig and the NTP corrected multiplier clock->mult. The NTP multiplier is misplaced in the struct clocksource, this is private information of the timekeeping code. Add the mult field to the struct timekeeper to contain the NTP corrected value, keep the unmodifed multiplier in clock->mult and remove clock->mult_orig. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134810.149047645@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | | | | timekeeping: Add xtime_shift and ntp_error_shift to struct timekeeperMartin Schwidefsky2009-08-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The xtime_nsec value in the timekeeper structure is shifted by a few bits to improve precision. This happens to be the same value as the clock->shift. To improve readability add xtime_shift to the timekeeper and use it instead of the clock->shift. Likewise add ntp_error_shift and replace all (NTP_SCALE_SHIFT - clock->shift) expressions. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134809.871899606@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | | | | timekeeping: Introduce struct timekeeperMartin Schwidefsky2009-08-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add struct timekeeper to keep the internal values timekeeping.c needs in regard to the currently selected clock source. This moves the timekeeping intervals, xtime_nsec and the ntp error value from struct clocksource to struct timekeeper. The raw_time is removed from the clocksource as well. It gets treated like xtime as a global variable. Eventually xtime raw_time should be moved to struct timekeeper. [ tglx: minor cleanup ] Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134809.613209842@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | | | | clocksource: Move watchdog downgrade to a work queue threadMartin Schwidefsky2009-08-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the downgrade of an unstable clocksource from the timer interrupt context into the process context of a work queue thread. This is needed to be able to do the clocksource switch with stop_machine. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134809.354926067@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | | | | clocksource: Refactor clocksource watchdogMartin Schwidefsky2009-08-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Refactor clocksource watchdog code to make it more readable. Add clocksource_dequeue_watchdog to remove a clocksource from the watchdog list when it is unregistered. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134809.110881699@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | | | | clocksource: Simplify clocksource watchdog resume logicMartin Schwidefsky2009-08-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To resume the clocksource watchdog just remove the CLOCK_SOURCE_WATCHDOG bit from the watched clocksource. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134808.880925790@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | | | | clocksource: Delay clocksource watchdog highres enablementMartin Schwidefsky2009-08-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The clocksource watchdog marks a clock as highres capable before it checked the deviation from the watchdog clocksource even for a single time. Make sure that the deviation is at least checked once before doing the switch to highres mode. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134808.627795883@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | | | | clocksource: Cleanup clocksource selectionMartin Schwidefsky2009-08-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a non high-resolution clocksource is first set as override clock and then registered it becomes active even if the system is in one-shot mode. Move the override check from sysfs_override_clocksource to the clocksource selection. That fixes the bug and simplifies the code. The check in clocksource_register for double registration of the same clocksource is removed without replacement. To find the initial clocksource a new weak function in jiffies.c is defined that returns the jiffies clocksource. The architecture code can then override the weak function with a more suitable clocksource, e.g. the TOD clock on s390. [ tglx: Folded in a fix from John Stultz ] Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134808.388024160@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | | | | timekeeping: Move reset of cycle_last for tsc clocksource to tscMartin Schwidefsky2009-08-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | change_clocksource resets the cycle_last value to zero then sets it to a value read from the clocksource. The reset to zero is required only for the TSC clocksource to make the read_tsc function work after a resume. The reason is that the TSC read function uses cycle_last to detect backwards going TSCs. In the resume case cycle_last contains the TSC value from the last update before the suspend. On resume the TSC starts counting from 0 again and would trip over the cycle_last comparison. This is subtle and surprising. Move the reset to a resume function in the tsc code. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134808.142191175@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | | | | timekeeping: Remove clocksource inline functionsMartin Schwidefsky2009-08-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The three inline functions clocksource_read, clocksource_enable and clocksource_disable are simple wrappers of an indirect call plus the copy from and to the mult_orig value. The functions are exclusively used by the timekeeping code which has intimate knowledge of the clocksource anyway. Therefore remove the inline functions. No functional change. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134807.903108946@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | | | | timekeeping: Introduce timekeeping_leap_insertJohn Stultz2009-08-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the adjustment of xtime, wall_to_monotonic and the update of the vsyscall variables to the timekeeping code. Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> LKML-Reference: <20090814134807.609730216@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | | | | Merge branch 'linus' into timers/coreThomas Gleixner2009-08-14
| | |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reason: Martin's timekeeping cleanup series depends on both timers/core and mainline changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | | | | | timers: Cache __next_timer_interrupt resultMartin Schwidefsky2009-08-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each time a cpu goes to sleep on a NOHZ=y system the timer wheel is searched for the next timer interrupt. It can take quite a few cycles to find the next pending timer. This patch adds a field to tvec_base that caches the result of __next_timer_interrupt. The hit ratio is around 80% on my thinkpad under normal use, on a server I've seen hit ratios from 5% to 95% dependent on the workload. -v2: jiffies wrap fixes Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <20090721202505.7d56a079@skybase> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | hrtimer: Remove cb_entry from struct hrtimerPeter Zijlstra2009-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's unused, remove it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission>
| | * | | | | | | timekeeping: Move ktime_get() functions to timekeeping.cThomas Gleixner2009-07-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ktime_get() functions for GENERIC_TIME=n are still located in hrtimer.c. Move them to time/timekeeping.c where they belong. LKML-Reference: <new-submission> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | | | | | timekeeping: optimized ktime_get[_ts] for GENERIC_TIME=yMartin Schwidefsky2009-07-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The generic ktime_get function defined in kernel/hrtimer.c is suboptimial for GENERIC_TIME=y: 0) | ktime_get() { 0) | ktime_get_ts() { 0) | getnstimeofday() { 0) | read_tod_clock() { 0) 0.601 us | } 0) 1.938 us | } 0) | set_normalized_timespec() { 0) 0.602 us | } 0) 4.375 us | } 0) 5.523 us | } Overall there are two read_seqbegin/read_seqretry loops and a lot of unnecessary struct timespec calculations. ktime_get returns a nano second value which is the sum of xtime, wall_to_monotonic and the nano second delta from the clock source. ktime_get can be optimized for GENERIC_TIME=y. The new version only calls clocksource_read: 0) | ktime_get() { 0) | read_tod_clock() { 0) 0.610 us | } 0) 1.977 us | } It uses a single read_seqbegin/readseqretry loop and just adds everthing to a nano second value. ktime_get_ts is optimized in a similar fashion. [ tglx: added WARN_ON(timekeeping_suspended) as in getnstimeofday() ] Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: john stultz <johnstul@us.ibm.com> LKML-Reference: <20090707112728.3005244d@skybase> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | | Merge branch 'sched-core-for-linus' of ↵Linus Torvalds2009-09-18
| |\ \ \ \ \ \ \ \ | | |_|_|_|/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (37 commits) sched: Fix SD_POWERSAVING_BALANCE|SD_PREFER_LOCAL vs SD_WAKE_AFFINE sched: Stop buddies from hogging the system sched: Add new wakeup preemption mode: WAKEUP_RUNNING sched: Fix TASK_WAKING & loadaverage breakage sched: Disable wakeup balancing sched: Rename flags to wake_flags sched: Clean up the load_idx selection in select_task_rq_fair sched: Optimize cgroup vs wakeup a bit sched: x86: Name old_perf in a unique way sched: Implement a gentler fair-sleepers feature sched: Add SD_PREFER_LOCAL sched: Add a few SYNC hint knobs to play with sched: Fix sync wakeups again sched: Add WF_FORK sched: Rename sync arguments sched: Rename select_task_rq() argument sched: Feature to disable APERF/MPERF cpu_power x86: sched: Provide arch implementations using aperf/mperf x86: Add generic aperf/mperf code x86: Move APERF/MPERF into a X86_FEATURE ... Fix up trivial conflict in arch/x86/include/asm/processor.h due to nearby addition of amd_get_nb_id() declaration from the EDAC merge.
| | * | | | | | | sched: Fix SD_POWERSAVING_BALANCE|SD_PREFER_LOCAL vs SD_WAKE_AFFINEPeter Zijlstra2009-09-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SD_POWERSAVING_BALANCE|SD_PREFER_LOCAL code can break out of the domain iteration early, making us miss the SD_WAKE_AFFINE bits. Fix this by continuing iteration until there is no need for a larger domain. This also cleans up the cgroup stuff a bit, but not having two update_shares() invocations. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Stop buddies from hogging the systemPeter Zijlstra2009-09-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clear buddies more agressively. The (theoretical, haven't actually observed any of this) problem is that when we do not select either buddy in pick_next_entity() because they are too far ahead of the left-most task, we do not clear the buddies. This means that as soon as we service the left-most task, these same buddies will be tried again on the next schedule. Now if the left-most task was a pure hog, it wouldn't have done any wakeups and it wouldn't have set buddies of its own. That leads to the old buddies dominating, which would lead to bad latencies. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Add new wakeup preemption mode: WAKEUP_RUNNINGPeter Zijlstra2009-09-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create a new wakeup preemption mode, preempt towards tasks that run shorter on avg. It sets next buddy to be sure we actually run the task we preempted for. Test results: root@twins:~# while :; do :; done & [1] 6537 root@twins:~# while :; do :; done & [2] 6538 root@twins:~# while :; do :; done & [3] 6539 root@twins:~# while :; do :; done & [4] 6540 root@twins:/home/peter# ./latt -c4 sleep 4 Entries: 48 (clients=4) Averages: ------------------------------ Max 4750 usec Avg 497 usec Stdev 737 usec root@twins:/home/peter# echo WAKEUP_RUNNING > /debug/sched_features root@twins:/home/peter# ./latt -c4 sleep 4 Entries: 48 (clients=4) Averages: ------------------------------ Max 14 usec Avg 5 usec Stdev 3 usec Disabled by default - needs more testing. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> LKML-Reference: <new-submission>
| | * | | | | | | sched: Fix TASK_WAKING & loadaverage breakageIngo Molnar2009-09-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix this: top - 21:54:00 up 2:59, 1 user, load average: 432512.33, 426421.74, 417432.74 Which happens because we now set TASK_WAKING before activate_task(). Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Rename flags to wake_flagsPeter Zijlstra2009-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For consistencies sake, rename the argument (again). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Clean up the load_idx selection in select_task_rq_fairPeter Zijlstra2009-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up the code a little. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Optimize cgroup vs wakeup a bitPeter Zijlstra2009-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't need to call update_shares() for each domain we iterate, just got the largets one. However, we should call it before wake_affine() as well, so that that can use up-to-date values too. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Implement a gentler fair-sleepers featureIngo Molnar2009-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add back FAIR_SLEEPERS and GENTLE_FAIR_SLEEPERS. FAIR_SLEEPERS is the old logic: credit sleepers with their sleep time. GENTLE_FAIR_SLEEPERS dampens this a bit: 50% of their sleep time gets credited. The hope here is to still give the benefits of fair-sleepers logic (quick wakeups, etc.) while not allow them to have 100% of their sleep time as if they were running. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Add SD_PREFER_LOCALPeter Zijlstra2009-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | And turn it on for NUMA and MC domains. This improves locality in balancing decisions by keeping up to capacity amount of tasks local before looking for idle CPUs. (and twice the capacity if SD_POWERSAVINGS_BALANCE is set.) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Add a few SYNC hint knobs to play withPeter Zijlstra2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we use overlap to weaken the SYNC hint, but allow it to set the hint as well. echo NO_SYNC_WAKEUP > /debug/sched_features echo SYNC_MORE > /debug/sched_features preserves pipe-test behaviour without using the WF_SYNC hint. Worth playing with on more workloads... Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Fix sync wakeups againPeter Zijlstra2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sync argument rename to introduce WF_* broke stuff by missing a local alias for an argument in __wake_up_common, fix it by using the more descriptive wake_flags name. This restores WF_SYNC propagation, which fixes wake_affine() behaviour, which fixes pipe-test. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Add WF_FORKPeter Zijlstra2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid the cache buddies from biasing the time distribution away from fork()ers. Normally the next buddy will be the preferred scheduling target, but this makes fork()s prefer to run the new child, whereas we prefer to run the parent, since that will generate more work. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Rename sync argumentsPeter Zijlstra2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to extend the functions to have more than 1 flag (sync), rename the argument to flags, and explicitly define a WF_ space for individual flags. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Rename select_task_rq() argumentPeter Zijlstra2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to be able to rename the sync argument, we need to rename the current flag argument. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Feature to disable APERF/MPERF cpu_powerPeter Zijlstra2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I suspect a feed-back loop between cpuidle and the aperf/mperf cpu_power bits, where when we have idle C-states lower the ratio, which leads to lower cpu_power and then less load, which generates more idle time, etc.. Put in a knob to disable it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Provide arch_scale_freq_powerPeter Zijlstra2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide an ach specific hook for cpufreq based scaling of cpu_power. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> [ego@in.ibm.com: spotting bugs] LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Improve latencies and throughputMike Galbraith2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the idle balancer more agressive, to improve a x264 encoding workload provided by Jason Garrett-Glaser: NEXT_BUDDY NO_LB_BIAS encoded 600 frames, 252.82 fps, 22096.60 kb/s encoded 600 frames, 250.69 fps, 22096.60 kb/s encoded 600 frames, 245.76 fps, 22096.60 kb/s NO_NEXT_BUDDY LB_BIAS encoded 600 frames, 344.44 fps, 22096.60 kb/s encoded 600 frames, 346.66 fps, 22096.60 kb/s encoded 600 frames, 352.59 fps, 22096.60 kb/s NO_NEXT_BUDDY NO_LB_BIAS encoded 600 frames, 425.75 fps, 22096.60 kb/s encoded 600 frames, 425.45 fps, 22096.60 kb/s encoded 600 frames, 422.49 fps, 22096.60 kb/s Peter pointed out that this is better done via newidle_idx, not via LB_BIAS, newidle balancing should look for where there is load _now_, not where there was load 2 ticks ago. Worst-case latencies are improved as well as no buddies means less vruntime spread. (as per prior lkml discussions) This change improves kbuild-peak parallelism as well. Reported-by: Jason Garrett-Glaser <darkshikari@gmail.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1253011667.9128.16.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Tweak wake_idxPeter Zijlstra2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When merging select_task_rq_fair() and sched_balance_self() we lost the use of wake_idx, restore that and set them to 0 to make wake balancing more aggressive. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Fix task affinity for select_task_rq_fairPeter Zijlstra2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While merging select_task_rq_fair() and sched_balance_self() I made a mistake that leads to testing the wrong task affinty. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: for_each_domain() vs RCUPeter Zijlstra2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | for_each_domain() uses RCU to serialize the sched_domains, except it doesn't actually use rcu_read_lock() and instead relies on disabling preemption -> FAIL. XXX: audit other sched_domain code. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Weaken SD_POWERSAVINGS_BALANCEPeter Zijlstra2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One of the problems of power-saving balancing is that under certain scenarios it is too slow and allows tons of real work to pile up. Avoid this by ignoring the powersave stuff when there's real work to be done. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Merge select_task_rq_fair() and sched_balance_self()Peter Zijlstra2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The problem with wake_idle() is that is doesn't respect things like cpu_power, which means it doesn't deal well with SMT nor the recent RT interaction. To cure this, it needs to do what sched_balance_self() does, which leads to the possibility of merging select_task_rq_fair() and sched_balance_self(). Modify sched_balance_self() to: - update_shares() when walking up the domain tree, (it only called it for the top domain, but it should have done this anyway), which allows us to remove this ugly bit from try_to_wake_up(). - do wake_affine() on the smallest domain that contains both this (the waking) and the prev (the wakee) cpu for WAKE invocations. Then use the top-down balance steps it had to replace wake_idle(). This leads to the dissapearance of SD_WAKE_BALANCE and SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced with SD_BALANCE_WAKE. SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective. Touch all topology bits to replace the old with new SD flags -- platforms might need re-tuning, enabling SD_BALANCE_WAKE conditionally on a NUMA distance seems like a good additional feature, magny-core and small nehalem systems would want this enabled, systems with slow interconnects would not. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Add TASK_WAKINGPeter Zijlstra2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're going to want to drop rq->lock in try_to_wake_up() for a longer period of time, however we also want to deal with concurrent waking of the same task, which is currently handled by holding rq->lock. So introduce a new TASK state, namely TASK_WAKING, which indicates someone is already waking the task (other wakers will fail p->state & state). We also keep preemption disabled over the whole ttwu(). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Hook sched_balance_self() into sched_class::select_task_rq()Peter Zijlstra2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather ugly patch to fully place the sched_balance_self() code inside the fair class. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Move sched_balance_self() into sched_fair.cPeter Zijlstra2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the sched_balance_self() code into sched_fair.c This facilitates the merger of sched_balance_self() and sched_fair::select_task_rq(). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Move code aroundPeter Zijlstra2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation to other code movement, move weighted_cpuload(), source_load() and target_load() before the class includes. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Add come comments to the sched featuresPeter Zijlstra2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add text... Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Complete buddy switchesMike Galbraith2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a NEXT_BUDDY feature flag to aid in debugging. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Split WAKEUP_OVERLAPPeter Zijlstra2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It consists of two conditions, split them out in separate toggles so we can test them independently. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | sched: Fix double_rq_lock() compile warningPeter Zijlstra2009-09-15
| | | |_|/ / / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | | softirq: add BLOCK_IOPOLL to softirq_to_nameLi Zefan2009-09-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With BLOCK_IOPOLL_SOFTIRQ added, softirq_to_name[] and show_softirq_name() needs to be updated. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4AB20398.8070209@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | tracing: switch function prints from %pf to %psSteven Rostedt2009-09-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For direct function pointers (like what mcount provides) PowerPC64 requires the use of %ps, otherwise nothing is printed. This patch converts all prints of functions retrieved through mcount to use the %ps format from the %pf. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>