aboutsummaryrefslogtreecommitdiffstats
path: root/kernel/sched.c
Commit message (Collapse)AuthorAge
* litmus tracing: expose scheduling overheadBjoern B. Brandenburg2009-04-22
| | | | | move the switch_from/switch_to trace macros so that the scheduling overhead shows up more clearly in the traces
* litmus core: trace task state on wake ups of RT tasks.Bjoern B. Brandenburg2009-04-19
| | | | Helps when debugging wake-up related crashes.
* move task present flag from PFAIR to Litmus coreBjoern B. Brandenburg2009-04-02
| | | | | Other quantum-based plugins also require the present flag. Hence, move it to the core data structure
* sched_trace: new implementationBjoern B. Brandenburg2008-10-06
| | | | | This provides and hooks up a new made-from-scratch sched_trace() implementation based on Feather-Trace and ftdev.
* Feather-Trace: make SCHED only depend on nextBjoern B. Brandenburg2008-05-22
|
* record task type in feather trace time stampBjoern B. Brandenburg2008-05-22
|
* Add Feather-Trace support.Bjoern B. Brandenburg2008-05-22
| | | | | | | | - TICK - SCHED1 - SCHED2 - RELEASE - CXS
* two fixes and some logging improvementBjoern B. Brandenburg2008-05-10
| | | | | | - remove outdated comment - reorder stack_in_use marker to be in front of finish_lock_switch() - add TRACE()s to try_to_wake_up()
* LITMUS: disable Linux scheduler optimizationBjoern B. Brandenburg2008-05-05
| | | | | This caused LITMUS^RT real-time tasks to be missed, which in turn caused all kinds of inconsistent state.
* LITMUS: avoid using the same stack on two CPUs in global schedulersBjoern B. Brandenburg2008-05-04
| | | | | This change fixes a race where a job could be executed on more than one CPU, which to random crashes.
* LITMUS: add framework for carrying out jobs after dropping rq locksBjoern B. Brandenburg2008-05-02
| | | | | Many things can't be done from within the scheduler. This framework should make it easer to defer them.
* Merge commit 'jupiter/debug' into debugBjoern B. Brandenburg2008-05-02
|\
| * LITMUS: rework job migration codeBjoern B. Brandenburg2008-05-02
| | | | | | | | | | The old version had a significant window for races with interrupt handlers executing on other CPUs.
* | LITMUS: don't let real-time tasks migrate by themselvesBjoern B. Brandenburg2008-05-02
|/ | | | | | | | A proper real-time migration works as follows: 1) transition to best-effort task 2) migrate to target CPU 3) update real-time parameters 4) transition to real-time task
* SRP: improve robustnessBjoern B. Brandenburg2008-05-01
| | | | | | | | The SRP implementation did not correctly address various suspension-related scenarios correctly. Now the need for SRP blocking is tested on each scheduling event. This ensures mutual exclusion under the SRP even in the face of unexpected suspensions, for example due to IO.
* add complete_n() call to the completion API.Bjoern B. Brandenburg2008-02-19
| | | | | This is usefule for releasing exactly n tasks without having to call compete() repeatedly.
* LITMUS 2008: Initial PortBjoern B. Brandenburg2008-02-13
| | | | | This introduces the core changes ported from LITMUS 2007. The kernel seems to work under QEMU, but many bugs probably remain.
* sched: group scheduler, set uid share fixIngo Molnar2008-01-22
| | | | | | | | | | | | | | setting cpu share to 1 causes hangs, as reported in: http://bugzilla.kernel.org/show_bug.cgi?id=9779 as the default share is 1024, the values of 0 and 1 can indeed cause problems. Limit it to 2 or higher values. These values can only be set by the root user - but still it makes sense to protect against nonsensical values. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* show_task: real_parentRoland McGrath2008-01-09
| | | | | | | | | | The show_task function invoked by sysrq-t et al displays the pid and parent's pid of each task. It seems more useful to show the actual process hierarchy here than who is using ptrace on each process. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sched: touch softlockup watchdog after idlingIngo Molnar2007-12-18
| | | | | | touch softlockup watchdog after idling. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: fix crash on ia64, introduce task_current()Dmitry Adamushko2007-12-18
| | | | | | | | | | | | | | | | | | | | | | Some services (e.g. sched_setscheduler(), rt_mutex_setprio() and sched_move_task()) must handle a given task differently in case it's the 'rq->curr' task on its run-queue. The task_running() interface is not suitable for determining such tasks for platforms with one of the following options: #define __ARCH_WANT_UNLOCKED_CTXSW #define __ARCH_WANT_INTERRUPTS_ON_CTXSW Due to the fact that it makes use of 'p->oncpu == 1' as a criterion but such a task is not necessarily 'rq->curr'. The detailed explanation is available here: https://lists.linux-foundation.org/pipermail/containers/2007-December/009262.html Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Tested-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Tested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
* sched: enable early use of sched_clock()Ingo Molnar2007-12-07
| | | | | | | | | | some platforms have sched_clock() implementations that cannot be called very early during wakeup. If it's called it might hang or crash in hard to debug ways. So only call update_rq_clock() [which calls sched_clock()] if sched_init() has already been called. (rq->idle is NULL before the scheduler is initialized.) Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: style cleanupsIngo Molnar2007-12-05
| | | | | | | | | | | | style cleanup of various changes that were done recently. no code changed: text data bss dec hex filename 23680 2542 28 26250 668a sched.o.before 23680 2542 28 26250 668a sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: fix crash in sys_sched_rr_get_interval()Ingo Molnar2007-12-04
| | | | | | | | | | | | | | | | | Luiz Fernando N. Capitulino reported that sched_rr_get_interval() crashes for SCHED_OTHER tasks that are on an idle runqueue. The fix is to return a 0 timeslice for tasks that are on an idle runqueue. (and which are not running, obviously) this also shrinks the code a bit: text data bss dec hex filename 47903 3934 336 52173 cbcd sched.o.before 47885 3934 336 52155 cbbb sched.o.after Reported-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: cpu accounting controller (V2)Srivatsa Vaddagiri2007-12-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit cfb5285660aad4931b2ebbfa902ea48a37dfffa1 removed a useful feature for us, which provided a cpu accounting resource controller. This feature would be useful if someone wants to group tasks only for accounting purpose and doesnt really want to exercise any control over their cpu consumption. The patch below reintroduces the feature. It is based on Paul Menage's original patch (Commit 62d0df64065e7c135d0002f069444fbdfc64768f), with these differences: - Removed load average information. I felt it needs more thought (esp to deal with SMP and virtualized platforms) and can be added for 2.6.25 after more discussions. - Convert group cpu usage to be nanosecond accurate (as rest of the cfs stats are) and invoke cpuacct_charge() from the respective scheduler classes - Make accounting scalable on SMP systems by splitting the usage counter to be per-cpu - Move the code from kernel/cpu_acct.c to kernel/sched.c (since the code is not big enough to warrant a new file and also this rightly needs to live inside the scheduler. Also things like accessing rq->lock while reading cpu usage becomes easier if the code lived in kernel/sched.c) The patch also modifies the cpu controller not to provide the same accounting information. Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com> Tested the patches on top of 2.6.24-rc3. The patches work fine. Ran some simple tests like cpuspin (spin on the cpu), ran several tasks in the same group and timed them. Compared their time stamps with cpuacct.usage. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: clean up, move __sched_text_start/end to sched.hIngo Molnar2007-11-28
| | | | | | | | | | move __sched_text_start/end to sched.h. No code changed: text data bss dec hex filename 26582 2310 28 28920 70f8 sched.o.before 26582 2310 28 28920 70f8 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: clean up sd_alloc_ctl_cpu_table() definitionIngo Molnar2007-11-28
| | | | | | clean up sd_alloc_ctl_cpu_table() definition. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: reorder SCHED_FEAT_ bitsIngo Molnar2007-11-15
| | | | | | | reorder SCHED_FEAT_ bits so that the used ones come first. Makes tuning instructions easier. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: remove activate_idle_task()Dmitry Adamushko2007-11-15
| | | | | | | | | | | | | | | | | | cpu_down() code is ok wrt sched_idle_next() placing the 'idle' task not at the beginning of the queue. So get rid of activate_idle_task() and make use of activate_task() instead. It is the same as activate_task(), except for the update_rq_clock(rq) call that is redundant. Code size goes down: text data bss dec hex filename 47853 3934 336 52123 cb9b sched.o.before 47828 3934 336 52098 cb82 sched.o.after Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: fix __set_task_cpu() SMP raceDmitry Adamushko2007-11-15
| | | | | | | | | | | | | | | | | | | | Grant Wilson has reported rare SCHED_FAIR_USER crashes on his quad-core system, which crashes can only be explained via runqueue corruption. there is a narrow SMP race in __set_task_cpu(): after ->cpu is set up to a new value, task_rq_lock(p, ...) can be successfuly executed on another CPU. We must ensure that updates of per-task data have been completed by this moment. this bug has been hiding in the Linux scheduler for an eternity (we never had any explicit barrier for task->cpu in set_task_cpu() - so the bug was introduced in 2.5.1), but only became visible via set_task_cfs_rq() being accidentally put after the task->cpu update. It also probably needs a sufficiently out-of-order CPU to trigger. Reported-by: Grant Wilson <grant.wilson@zen.co.uk> Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: fix SCHED_FIFO tasks & FAIR_GROUP_SCHEDOleg Nesterov2007-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Suppose that the SCHED_FIFO task does switch_uid(new_user); Now, p->se.cfs_rq and p->se.parent both point into the old user_struct->tg because sched_move_task() doesn't call set_task_cfs_rq() for !fair_sched_class case. Suppose that old user_struct/task_group is freed/reused, and the task does sched_setscheduler(SCHED_NORMAL); __setscheduler() sets fair_sched_class, but doesn't update ->se.cfs_rq/parent which point to the freed memory. This means that check_preempt_wakeup() doing while (!is_same_group(se, pse)) { se = parent_entity(se); pse = parent_entity(pse); } may OOPS in a similar way if rq->curr or p did something like above. Perhaps we need something like the patch below, note that __setscheduler() can't do set_task_cfs_rq(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: fix accounting of interrupts during guest execution on s390Christian Borntraeger2007-11-15
| | | | | | | | | | | | | | | | | Currently the scheduler checks for PF_VCPU to decide if this timeslice has to be accounted as guest time. On s390 host interrupts are not disabled during guest execution. This causes theses interrupts to be accounted as guest time if CONFIG_VIRT_CPU_ACCOUNTING is set. Solution is to check if an interrupt triggered account_system_time. As the tick is timer interrupt based, we have to subtract hardirq_offset. I tested the patch on s390 with CONFIG_VIRT_CPU_ACCOUNTING and on x86_64. Seems to work. CC: Avi Kivity <avi@qumranet.com> CC: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* revert "Task Control Groups: example CPU accounting subsystem"Andrew Morton2007-11-14
| | | | | | | | | | | | | | | | | | | Revert 62d0df64065e7c135d0002f069444fbdfc64768f. This was originally intended as a simple initial example of how to create a control groups subsystem; it wasn't intended for mainline, but I didn't make this clear enough to Andrew. The CFS cgroup subsystem now has better functionality for the per-cgroup usage accounting (based directly on CFS stats) than the "usage" status file in this patch, and the "load" status file is rather simplistic - although having a per-cgroup load average report would be a useful feature, I don't believe this patch actually provides it. If it gets into the final 2.6.24 we'd probably have to support this interface for ever. Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sched: proper prototype for kernel/sched.c:migration_init()Adrian Bunk2007-11-09
| | | | | | | | | | | | This patch adds a proper prototype for migration_init() in include/linux/sched.h Since there's no point in always returning 0 to a caller that doesn't check the return value it also changes the function to return void. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: avoid large irq-latencies in smp-balancingPeter Zijlstra2007-11-09
| | | | | | | | | | | | | SMP balancing is done with IRQs disabled and can iterate the full rq. When rqs are large this can cause large irq-latencies. Limit the nr of iterations on each run. This fixes a scheduling latency regression reported by the -rt folks. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: remove PREEMPT_RESTRICTIngo Molnar2007-11-09
| | | | | | | remove PREEMPT_RESTRICT. (this is a separate commit so that any regression related to the removal itself is bisectable) Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: turn off PREEMPT_RESTRICTIngo Molnar2007-11-09
| | | | | | | | | | | | | PREEMPT_RESTRICT was a method aimed at reducing the amount of wakeup related preemption. It has a disadvantage though, it can prevent legitimate wakeups if a task is 'unlucky' to be hit too early by a tick that clears peer_preempt. Now that the wakeup preemption has been cleaned up we dont seem to have excessive preemptions anymore, so this feature can be turned off. (and removed in the next patch) Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: cleanup, use NSEC_PER_MSEC and NSEC_PER_SECEric Dumazet2007-11-09
| | | | | | | | | | | | | | | | | 1) hardcoded 1000000000 value is used five times in places where NSEC_PER_SEC might be more readable. 2) A conversion from nsec to msec uses the hardcoded 1000000 value, which is a candidate for NSEC_PER_MSEC. no code changed: text data bss dec hex filename 44359 3326 36 47721 ba69 sched.o.before 44359 3326 36 47721 ba69 sched.o.after Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: reintroduce SMP tunings againIngo Molnar2007-11-09
| | | | | | | | | | | | | | | | | | | Yanmin Zhang reported an aim7 regression and bisected it down to: | commit 38ad464d410dadceda1563f36bdb0be7fe4c8938 | Author: Ingo Molnar <mingo@elte.hu> | Date: Mon Oct 15 17:00:02 2007 +0200 | | sched: uniform tunings | | use the same defaults on both UP and SMP. fix this by reintroducing similar SMP tunings again. This resolves the regression. (also update the comments to match the ilog2(nr_cpus) tuning effect) Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: fix style in kernel/sched.cIngo Molnar2007-10-29
| | | | | | fallout of recent commits: small coding style fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: report CPU usage in CFS cgroup directoriesPaul Menage2007-10-29
| | | | | | | | | | Adds a cpu.usage file to the CFS cgroup that reports CPU usage in milliseconds for that cgroup's tasks [ mingo@elte.hu: style cleanups. ] Signed-off-by: Paul Menage <menage@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: move rcu_head to task_group structSrivatsa Vaddagiri2007-10-29
| | | | | | | | | Peter Zijlstra noticed that the rcu_head object need not be present in every cfs_rq of a group. Move it to the task_group structure instead. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: fix incorrect assumption that cpu 0 existsJames Bottomley2007-10-29
| | | | | | | | | | | | | | | | | | | | | This patch: commit 9b5b77512dce239fa168183fa71896712232e95a Author: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Date: Mon Oct 15 17:00:09 2007 +0200 sched: clean up code under CONFIG_FAIR_GROUP_SCHED Introduced an assumption of the existence of CPU0 via this line cfs_rq = tg->cfs_rq[0]; If you have no CPU0, that will be NULL. The fix seems to be just to take whatever cfs_rq queue comes out of the for_each_possible_cpu() loop, since they're all equally good for the destruction operation. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: make kernel/sched.c:account_guest_time() staticAdrian Bunk2007-10-29
| | | | | | | account_guest_time() can become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: isolate SMP balancing code a bit morePeter Williams2007-10-24
| | | | | | | | | | | | | | | At the moment, a lot of load balancing code that is irrelevant to non SMP systems gets included during non SMP builds. This patch addresses this issue and reduces the binary size on non SMP systems: text data bss dec hex filename 10983 28 1192 12203 2fab sched.o.before 10739 28 1192 11959 2eb7 sched.o.after Signed-off-by: Peter Williams <pwil3058@bigpond.net.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: reduce balance-tasks overheadPeter Williams2007-10-24
| | | | | | | | | | | | | | | At the moment, balance_tasks() provides low level functionality for both move_tasks() and move_one_task() (indirectly) via the load_balance() function (in the sched_class interface) which also provides dual functionality. This dual functionality complicates the interfaces and internal mechanisms and makes the run time overhead of operations that are called with two run queue locks held. This patch addresses this issue and reduces the overhead of these operations. Signed-off-by: Peter Williams <pwil3058@bigpond.net.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: clean up some control group codePaul Menage2007-10-24
| | | | | | | | | - replace "cont" with "cgrp" in a few places in the CFS cgroup code, - use write_uint rather than write for cpu.shares write function Signed-off-by: Paul Menage <menage@google.com> Acked-by : Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: use show_regs() to improve __schedule_bug() outputSatyam Sharma2007-10-24
| | | | | | | | | | | | | | | A full register dump along with stack backtrace would make the "scheduling while atomic" message more helpful. Use show_regs() instead of dump_stack() for this. We already know we're atomic in here (that is why this function was called) so show_regs()'s atomicity expectations are guaranteed. Also, modify the output of the "BUG: scheduling while atomic:" header a bit to keep task->comm and task->pid together and preempt_count() after them. Signed-off-by: Satyam Sharma <satyam@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: clean up sched_domain_debug()Ingo Molnar2007-10-24
| | | | | | | | | | | | clean up sched_domain_debug(). this also shrinks the code a bit: text data bss dec hex filename 50474 4306 480 55260 d7dc sched.o.before 50404 4306 480 55190 d796 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: fix fastcall mismatch in completion APIsIngo Molnar2007-10-24
| | | | | | | | | | Jeff Dike noticed that wait_for_completion_interruptible()'s prototype had a mismatched fastcall. Fix this by removing the fastcall attributes from all the completion APIs. Found-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>