* sched: rt-group: reduce rescheduling (Peter Zijlstra, 2008-01-25)
    Only reschedule if the new group has a higher prio task.
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* hrtimer: unlock hrtimer_wakeup (Peter Zijlstra, 2008-01-25)
    hrtimer_wakeup creates a base->lock rq->lock lock dependency. Avoid this by
    switching to HRTIMER_CB_IRQSAFE_NO_SOFTIRQ, which doesn't hold base->lock.
    This fully untangles hrtimer locks from the scheduler locks, and allows
    hrtimer usage in the scheduler proper.
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* hrtimer: fixup the HRTIMER_CB_IRQSAFE_NO_SOFTIRQ fallback (Peter Zijlstra, 2008-01-25)
    Currently all highres=off timers are run from softirq context, but
    HRTIMER_CB_IRQSAFE_NO_SOFTIRQ timers expect to run from irq context. Fix
    this up by splitting it similar to the highres=on case.
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* hrtimer: clean up cpu->base locking tricks (Peter Zijlstra, 2008-01-25)
    In order to more easily allow for the scheduler to use timers, clean up
    the locking a bit.
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: rt throttling vs no_hz (Peter Zijlstra, 2008-01-25)
    We need to teach no_hz about the rt throttling because it is tick driven.
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: pull_rt_task() cleanup (Mike Galbraith, 2008-01-25)
    "goto out" is an odd way to spell "skip".
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: rt group scheduling (Peter Zijlstra, 2008-01-25)
    Extend group scheduling to also cover the realtime classes. It uses the
    time limiting introduced by the previous patch to allow multiple realtime
    groups. The hard time limit is required to keep behaviour deterministic.

    The algorithms used make the realtime scheduler O(tg), i.e. linear in the
    number of task groups. This is the worst-case behaviour I can't seem to
    get rid of; the average case of the algorithms can be improved, but I
    focused on correctness and the worst case.

    [ akpm@linux-foundation.org: move side-effects out of BUG_ON(). ]
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: rt time limit (Peter Zijlstra, 2008-01-25)
    Very simple time limit on the realtime scheduling classes. Allow the rq's
    realtime class to consume sched_rt_ratio of every sched_rt_period slice.
    If the class exceeds this quota the fair class will preempt the realtime
    class.
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
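    [ Illustration: a minimal, hedged sketch of the budget check described
      above. The field names (rt_time, rt_throttled) and the 16-bit fixed-point
      interpretation of sched_rt_ratio mirror the commit text but are
      assumptions, not necessarily the exact mainline code. ]

        /* Run from the tick: has the rt class used more than sched_rt_ratio
         * of the current sched_rt_period? */
        static int rt_ratio_exceeded_sketch(struct rt_rq *rt_rq,
                                            u64 period_ns, unsigned int rt_ratio)
        {
                u64 budget = (period_ns * rt_ratio) >> 16;   /* assumed 16-bit fixed point */

                if (rt_rq->rt_time > budget) {
                        /* fair class preempts rt until the period refreshes */
                        rt_rq->rt_throttled = 1;
                        return 1;
                }
                return 0;
        }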
* sched: high-res preemption tick (Peter Zijlstra, 2008-01-25)
    Use HR-timers (when available) to deliver an accurate preemption tick.
    The regular scheduler tick that runs at 1/HZ can be too coarse when nice
    levels are used. The fairness system will still keep the cpu utilisation
    'fair' by then delaying the task that got an excessive amount of CPU
    time, but tries to minimize this by delivering preemption points spot-on.

    The average frequency of this extra interrupt is sched_latency /
    nr_latency. This need not be higher than 1/HZ; it's just that the
    distribution within the sched_latency period is important.
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
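    [ Illustration: a hedged sketch of how such a preemption point could be
      armed, reusing helpers that exist in this era's scheduler (sched_slice(),
      cfs_rq_of(), resched_task(), hrtimer_start()); the per-rq hrtick_timer
      field is assumed, and this is not claimed to be the exact hrtick code. ]

        /* Arm a per-rq hrtimer for the remainder of the task's ideal slice so
         * preemption happens at the fair boundary, not at the next 1/HZ tick. */
        static void hrtick_start_fair_sketch(struct rq *rq, struct task_struct *p)
        {
                u64 slice = sched_slice(cfs_rq_of(&p->se), &p->se);
                u64 ran   = p->se.sum_exec_runtime - p->se.prev_sum_exec_runtime;

                if (ran >= slice)
                        resched_task(p);                  /* slice already used up */
                else
                        hrtimer_start(&rq->hrtick_timer,  /* assumed per-rq timer */
                                      ns_to_ktime(slice - ran),
                                      HRTIMER_MODE_REL);
        }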
* sched: do not do cond_resched() when CONFIG_PREEMPT (Herbert Xu, 2008-01-25)
    Why do we even have cond_resched when real preemption is on? It seems to
    be a waste of space and time. Remove cond_resched with CONFIG_PREEMPT on.
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
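    [ Illustration: the general shape of such a change, as a hedged sketch;
      with CONFIG_PREEMPT the kernel already preempts at any preemption point,
      so cond_resched() can compile down to nothing. The exact mainline
      definitions may differ from this. ]

        #ifdef CONFIG_PREEMPT
        static inline int cond_resched(void)
        {
                return 0;               /* full preemption: nothing to do here */
        }
        #else
        extern int _cond_resched(void);
        static inline int cond_resched(void)
        {
                return _cond_resched(); /* voluntary preemption point */
        }
        #endif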
* sched: documentation, whitespace fixes (Ingo Molnar, 2008-01-25)
    whitespace fixes.
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: SCHED_FIFO/SCHED_RR watchdog timer (Peter Zijlstra, 2008-01-25)
    Introduce a new rlimit that allows the user to set a runtime timeout on a
    real-time task's slice. Once this limit is exceeded the task will receive
    SIGXCPU. It measures runtime since the last sleep.

    Input and ideas by Thomas Gleixner and Lennart Poettering.
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    CC: Lennart Poettering <mzxreary@0pointer.de>
    CC: Michael Kerrisk <mtk.manpages@googlemail.com>
    CC: Ulrich Drepper <drepper@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
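    [ Illustration: userspace use of the new limit. RLIMIT_RTTIME is expressed
      in microseconds of CPU time consumed since the last sleep; exceeding the
      soft limit delivers SIGXCPU. The values below are arbitrary, and the
      symbol requires a libc recent enough to define it. ]

        #include <sys/resource.h>
        #include <sched.h>
        #include <stdio.h>

        int main(void)
        {
                struct rlimit rl = { .rlim_cur = 500000, .rlim_max = 1000000 }; /* 0.5s / 1s */
                struct sched_param sp = { .sched_priority = 10 };

                if (setrlimit(RLIMIT_RTTIME, &rl))
                        perror("setrlimit(RLIMIT_RTTIME)");
                if (sched_setscheduler(0, SCHED_FIFO, &sp))
                        perror("sched_setscheduler");

                for (;;)
                        ;       /* busy loop without sleeping: expect SIGXCPU */
        }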
* sched: sched_rt_entity (Peter Zijlstra, 2008-01-25)
    Move the task_struct members specific to rt scheduling together. A future
    optimization could be to put sched_entity and sched_rt_entity into a
    union.
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* uids: merge multiple error paths in alloc_uid() into one (Pavel Emelyanov, 2008-01-25)
    There are already 4 error paths in alloc_uid() that do incremental
    rollbacks. I think it's time to merge them. This costs us 8 lines of
    code :)

    Maybe it would be better to merge this patch with the previous one, but I
    remember that some time ago I sent a similar patch (fixing the error path
    and cleaning it), but I was told to make two patches in such cases.
    Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
    Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: dynamically update the root-domain span/online maps (Gregory Haskins, 2008-01-25)
    The baseline code statically builds the span maps when the domain is
    formed. Previous attempts at dynamically updating the maps caused a
    suspend-to-ram regression, which should now be fixed.
    Signed-off-by: Gregory Haskins <ghaskins@novell.com>
    CC: Gautham R Shenoy <ego@in.ibm.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Preempt-RCU: update RCU Documentation. (Paul E. McKenney, 2008-01-25)
    This patch updates the RCU documentation to reflect preemptible RCU as
    well as recent publications.
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
    Reviewed-by: Steven Rostedt <srostedt@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Preempt-RCU: CPU Hotplug handling (Paul E. McKenney, 2008-01-25)
    This patch allows preemptible RCU to tolerate CPU-hotplug operations. It
    accomplishes this by maintaining a local copy of a map of online CPUs,
    which it accesses under its own lock.
    Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Steven Rostedt <srostedt@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Preempt-RCU: implementation (Paul E. McKenney, 2008-01-25)
    This patch implements a new version of RCU which allows its read-side
    critical sections to be preempted. It uses a set of counter pairs to keep
    track of the read-side critical sections and flips them when all tasks
    exit read-side critical sections. The details of this implementation can
    be found in this paper: http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf
    and the article: http://lwn.net/Articles/253651/

    This patch was developed as part of the -rt kernel development and is
    meant to provide better latencies when read-side critical sections of RCU
    don't disable preemption. As a consequence of keeping track of RCU
    readers, the readers have a slight overhead (optimizations in the paper).
    This implementation co-exists with the "classic" RCU implementation and
    can be selected at compile time.

    Also includes RCU tracing summarized in debugfs.

    [ akpm@linux-foundation.org: build fixes on non-preempt architectures ]
    Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
    Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
    Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
    Reviewed-by: Steven Rostedt <srostedt@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Preempt-RCU: fix rcu_barrier for preemptive environment. (Paul E. McKenney, 2008-01-25)
    Fix rcu_barrier() to work properly in a preemptive kernel environment.
    Also, the ordering of callbacks must be preserved while moving callbacks
    to another CPU during CPU hotplug.
    Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
    Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Steven Rostedt <srostedt@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Preempt-RCU: reorganize RCU code into rcuclassic.c and rcupdate.c (Paul E. McKenney, 2008-01-25)
    This patch re-organizes the RCU code to enable multiple implementations
    of RCU. Users of RCU continue to include rcupdate.h and the RCU
    interfaces remain the same. This is in preparation for subsequently
    merging the preemptible RCU implementation.
    Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
    Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Steven Rostedt <srostedt@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Preempt-RCU: Use softirq instead of tasklets for (Dipankar Sarma, 2008-01-25)
    This patch makes RCU use softirq instead of tasklets. It also adds a
    memory barrier after raising the softirq in order to ensure that the cpu
    sees the most recently updated value of rcu->cur while processing
    callbacks. The discussion of the related theoretical race pointed out by
    James Huang can be found here: http://lkml.org/lkml/2007/11/20/603
    Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
    Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
    Reviewed-by: Steven Rostedt <srostedt@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: remove some old cpuset logic (Gregory Haskins, 2008-01-25)
    We had support for overlapping cpuset-based rto logic in early prototypes
    that is no longer used, so remove it.
    Signed-off-by: Gregory Haskins <ghaskins@novell.com>
    Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: RT-balance, only adjust overload state when changing (Gregory Haskins, 2008-01-25)
    The overload set/clears were originally idempotent when this logic was
    first implemented. But that is no longer true due to the addition of the
    atomic counter, and this logic was never updated to work properly with
    that change. So only adjust the overload state if it is actually changing
    to avoid getting out of sync.
    Signed-off-by: Gregory Haskins <ghaskins@novell.com>
    Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
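    [ Illustration: a hedged sketch of the "only act on a real transition"
      rule; the field and helper names (rt_nr_migratory, overloaded,
      rt_set_overload(), rt_clear_overload()) follow this patch series but are
      not guaranteed to match the final code exactly. ]

        static void update_rt_overload_sketch(struct rq *rq)
        {
                if (rq->rt.rt_nr_migratory && rq->rt.rt_nr_running > 1) {
                        if (!rq->rt.overloaded) {       /* only on the 0 -> 1 edge */
                                rt_set_overload(rq);    /* bump the atomic counter once */
                                rq->rt.overloaded = 1;
                        }
                } else if (rq->rt.overloaded) {         /* only on the 1 -> 0 edge */
                        rt_clear_overload(rq);
                        rq->rt.overloaded = 0;
                }
        }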
* sched: RT-balance, add new methods to sched_class (Steven Rostedt, 2008-01-25)
    Dmitry Adamushko found that the current implementation of the RT
    balancing code left out changes to sched_setscheduler and
    rt_mutex_setprio.

    This patch addresses this issue by adding methods to the scheduling
    classes to handle being switched out of (switched_from) and being
    switched into (switched_to) a sched_class. A method for priority changes
    (prio_changed) is also added.

    This patch also removes some duplicate logic between rt_mutex_setprio and
    sched_setscheduler.
    Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: RT-balance, replace hooks with pre/post schedule and wakeup methods (Steven Rostedt, 2008-01-25)
    Make the main sched.c code more agnostic to the scheduling classes.
    Instead of having specific hooks in the scheduler code for RT class
    balancing, they are replaced with pre_schedule, post_schedule and
    task_wake_up methods. These methods may be used by any of the classes,
    but currently only the sched_rt class implements them.
    Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: remove do_div() from __sched_slice() (Peter Zijlstra, 2008-01-25)
    Yanmin Zhang noticed a nice optimization:

        p = l * nr / nl,  nl = l/g  ->  p = g * nr

    which eliminates a do_div() from __sched_period().
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
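    [ Illustration: the arithmetic spelled out as a hedged sketch with
      illustrative names; the point is simply that the division by nr_latency
      cancels, so the period becomes granularity * nr_running. ]

        /* old:  p = latency * nr_running / nr_latency,
         *       nr_latency = latency / granularity
         * new:  p = granularity * nr_running           (no division needed) */
        static unsigned long long sched_period_sketch(unsigned long nr_running,
                                                      unsigned long long latency_ns,
                                                      unsigned long long granularity_ns)
        {
                unsigned long nr_latency = latency_ns / granularity_ns;

                if (nr_running <= nr_latency)
                        return latency_ns;              /* everyone fits in one latency period */

                return granularity_ns * nr_running;     /* p = g * nr, the do_div() is gone */
        }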
* sched: get rid of 'new_cpu' in try_to_wake_up() (Dmitry Adamushko, 2008-01-25)
    Clean up try_to_wake_up(). Get rid of the 'new_cpu' variable in
    try_to_wake_up() [ that's one #ifdef section less ]. Also remove a few
    redundant blank lines.
    Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: no need for 'affine wakeup' balancing (Dmitry Adamushko, 2008-01-25)
    No need to do a check for 'affine wakeup and passive balancing
    possibilities' in select_task_rq_fair() when task_cpu(p) == this_cpu.
    I guess this part got missed upon introduction of per-sched_class
    select_task_rq() in try_to_wake_up().
    Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: whitespace cleanups in topology.h (Ingo Molnar, 2008-01-25)
    whitespace cleanups in topology.h.
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: reactivate fork balancing (Ingo Molnar, 2008-01-25)
    reactivate fork balancing.
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: add credits for RT balancing improvements (Ingo Molnar, 2008-01-25)
    add credits for RT balancing improvements.
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: style cleanup, #2 (Ingo Molnar, 2008-01-25)
    style cleanup of various changes that were done recently.

    no code changed:

          text    data     bss     dec     hex filename
         26399    2578      48   29025    7161 sched.o.before
         26399    2578      48   29025    7161 sched.o.after

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: remove unused JIFFIES_TO_NS() macro (Ingo Molnar, 2008-01-25)
    remove unused JIFFIES_TO_NS() macro.
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: fix sched_rt.c:join/leave_domain (Ingo Molnar, 2008-01-25)
    fix build bug in sched_rt.c:join/leave_domain and make them only be
    included on SMP builds.
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: only balance our RT tasks within our domain (Gregory Haskins, 2008-01-25)
    The rt-overload data is the first piece of global state we move to
    per-domain scope. This limits the scope of overload-related cache-line
    bouncing to stay within a specified partition instead of affecting all
    cpus in the system.

    Finally, we limit the scope of find_lowest_cpu searches to the domain
    instead of the entire system. Note that we would always respect domain
    boundaries even without this patch, but we first would scan potentially
    all cpus before whittling the list down. Now we can avoid looking at RQs
    that are out of scope, again reducing cache-line hits.

    Note: In some cases, task->cpus_allowed will effectively reduce our
    search to within our domain. However, I believe there are cases where the
    cpus_allowed mask may be all ones, and therefore we err on the side of
    caution. If it can be optimized later, so be it.
    Signed-off-by: Gregory Haskins <ghaskins@novell.com>
    CC: Christoph Lameter <clameter@sgi.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: add sched-domain roots (Gregory Haskins, 2008-01-25)
    We add the notion of a root-domain which will be used later to rescope
    global variables to per-domain variables. Each exclusive cpuset
    essentially defines an island domain by fully partitioning the member
    cpus from any other cpuset. However, we currently still maintain some
    policy/state as global variables which transcend all cpusets. Consider,
    for instance, rt-overload state.

    Whenever a new exclusive cpuset is created, we also create a new
    root-domain object and move each cpu member to the root-domain's span.
    By default the system creates a single root-domain with all cpus as
    members (mimicking the global state we have today).

    We add some plumbing for storing class specific data in our root-domain.
    Whenever a RQ is switching root-domains (because of repartitioning) we
    give each sched_class the opportunity to remove any state from its old
    domain and add state to the new one. This logic doesn't have any clients
    yet but it will later in the series.
    Signed-off-by: Gregory Haskins <ghaskins@novell.com>
    CC: Christoph Lameter <clameter@sgi.com>
    CC: Paul Jackson <pj@sgi.com>
    CC: Simon Derr <simon.derr@bull.net>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
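    [ Illustration: a hedged sketch of the root-domain object introduced
      above; the field names are illustrative, mirroring the span/online maps
      and the per-domain rt-overload state the commit describes. ]

        struct root_domain_sketch {
                atomic_t        refcount;   /* runqueues attached to this partition */
                cpumask_t       span;       /* cpus belonging to this exclusive cpuset */
                cpumask_t       online;     /* subset of span that is currently online */

                /* rt-overload state, rescoped from global to per-domain */
                cpumask_t       rto_mask;   /* cpus running more than one rt task */
                atomic_t        rto_count;
        };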
* sched: clean up schedule_balance_rt() (Ingo Molnar, 2008-01-25)
    clean up schedule_balance_rt().
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: clean up pull_rt_task() (Ingo Molnar, 2008-01-25)
    clean up pull_rt_task().
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: remove leftover debugging (Ingo Molnar, 2008-01-25)
    remove leftover debugging.
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: remove rt_overload() (Ingo Molnar, 2008-01-25)
    remove rt_overload() - it's an unnecessary indirection.
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: clean up kernel/sched_rt.c (Ingo Molnar, 2008-01-25)
    clean up whitespace damage and missing comments in kernel/sched_rt.c.
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: clean up overlong line in kernel/sched_debug.c (Ingo Molnar, 2008-01-25)
    clean up overlong line in kernel/sched_debug.c.
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: clean up find_lock_lowest_rq() (Ingo Molnar, 2008-01-25)
    clean up find_lock_lowest_rq().
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: clean up pick_next_highest_task_rt() (Ingo Molnar, 2008-01-25)
    clean up pick_next_highest_task_rt().
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: RT-balance on new task (Steven Rostedt, 2008-01-25)
    rt-balance when creating new tasks.
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: RT-balance, optimize cpu search (Steven Rostedt, 2008-01-25)
    This patch removes several cpumask operations by keeping track of the
    first of the CPUs that is of the lowest priority. When the search for the
    lowest priority runqueue is completed, all the bits up to the first CPU
    with the lowest priority runqueue are cleared.
    Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
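    [ Illustration: a hedged sketch of the final pruning step; cpus are
      scanned in ascending order, so every bit below the first
      lowest-priority cpu can be dropped in one pass. Helper names follow the
      2.6.24-era cpumask API; the surrounding search loop is omitted. ]

        static void prune_lowest_mask_sketch(cpumask_t *lowest_mask,
                                             int first_lowest_cpu)
        {
                int cpu;

                for_each_cpu_mask(cpu, *lowest_mask) {
                        if (cpu >= first_lowest_cpu)
                                break;                   /* remaining bits are still candidates */
                        cpu_clear(cpu, *lowest_mask);    /* below the first match: not lowest prio */
                }
        }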
* sched: RT-balance, optimize (Gregory Haskins, 2008-01-25)
    We can cheaply track the number of bits set in the cpumask for the lowest
    priority CPUs. Therefore, compute the mask's weight and use it to skip
    the optimal domain search logic when there is only one CPU available.
    Signed-off-by: Gregory Haskins <ghaskins@novell.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
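    [ Illustration: a hedged sketch of the short-circuit; cpus_weight() and
      first_cpu() are the era-appropriate helpers, and with a single candidate
      the sched-domain search can be skipped. The fallback shown here is a
      placeholder, not the real domain-proximity search. ]

        static int pick_lowest_cpu_sketch(cpumask_t *lowest_mask, int this_cpu)
        {
                int count = cpus_weight(*lowest_mask);

                if (!count)
                        return -1;                        /* nothing suitable */
                if (count == 1)
                        return first_cpu(*lowest_mask);   /* one choice: skip the domain walk */

                /* several candidates: the real code continues with the
                 * (more expensive) sched-domain proximity search here */
                return first_cpu(*lowest_mask);
        }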
* sched: break out early if RT task cannot be migrated (Gregory Haskins, 2008-01-25)
    We don't need to bother searching if the task cannot be migrated.
    Signed-off-by: Gregory Haskins <ghaskins@novell.com>
    Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: RT-balance, avoid overloading (Steven Rostedt, 2008-01-25)
    This patch changes the searching for a run queue by a waking RT task to
    try to pick another runqueue if the currently running task is an RT task.

    The reason is that RT tasks behave differently than normal tasks.
    Preempting a normal task to run an RT task to keep its cache hot is fine,
    because the preempted non-RT task may wait on that same runqueue to run
    again, unless the migration thread comes along and pulls it off.

    RT tasks behave differently. If one is preempted, it makes an active
    effort to continue to run. So by having a high priority task preempt a
    lower priority RT task, that lower RT task will then quickly try to run
    on another runqueue. This will cause that lower RT task to replace its
    nice hot cache (and TLB) with a completely cold one. This is for the hope
    that the new high priority RT task will keep its cache hot.

    Remember that this high priority RT task was just woken up. So it may
    likely have been sleeping for several milliseconds, and will end up with
    a cold cache anyway. RT tasks run till they voluntarily stop, or are
    preempted by a higher priority task. This means that it is unlikely that
    the woken RT task will have a hot cache to wake up to. So pushing off a
    lower RT task is just killing its cache for no good reason.
    Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: wake-balance fixes (Gregory Haskins, 2008-01-25)
    We have logic to detect whether the system has migratable tasks, but we
    are not using it when deciding whether to push tasks away. So we add
    support for considering this new information.
    Signed-off-by: Gregory Haskins <ghaskins@novell.com>
    Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>