author	Glenn Elliott <gelliott@cs.unc.edu>	2012-03-04 19:47:13 -0500
committer	Glenn Elliott <gelliott@cs.unc.edu>	2012-03-04 19:47:13 -0500
commit	c71c03bda1e86c9d5198c5d83f712e695c4f2a1e (patch)
tree	ecb166cb3e2b7e2adb3b5e292245fefd23381ac8 /Documentation/scheduler
parent	ea53c912f8a86a8567697115b6a0d8152beee5c8 (diff)
parent	6a00f206debf8a5c8899055726ad127dbeeed098 (diff)
Merge branch 'mpi-master' into wip-k-fmlp

Conflicts:
	litmus/sched_cedf.c
Diffstat (limited to 'Documentation/scheduler')
-rw-r--r--	Documentation/scheduler/00-INDEX              |  2
-rw-r--r--	Documentation/scheduler/sched-design-CFS.txt  | 14
-rw-r--r--	Documentation/scheduler/sched-domains.txt     | 32
-rw-r--r--	Documentation/scheduler/sched-rt-group.txt    |  7
-rw-r--r--	Documentation/scheduler/sched-stats.txt       | 33
5 files changed, 47 insertions(+), 41 deletions(-)
diff --git a/Documentation/scheduler/00-INDEX b/Documentation/scheduler/00-INDEX
index 3c00c9c3219e..d2651c47ae27 100644
--- a/Documentation/scheduler/00-INDEX
+++ b/Documentation/scheduler/00-INDEX
@@ -3,7 +3,7 @@
 sched-arch.txt
 	- CPU Scheduler implementation hints for architecture specific code.
 sched-design-CFS.txt
-	- goals, design and implementation of the Complete Fair Scheduler.
+	- goals, design and implementation of the Completely Fair Scheduler.
 sched-domains.txt
 	- information on scheduling domains.
 sched-nice-design.txt
diff --git a/Documentation/scheduler/sched-design-CFS.txt b/Documentation/scheduler/sched-design-CFS.txt
index 8239ebbcddce..91ecff07cede 100644
--- a/Documentation/scheduler/sched-design-CFS.txt
+++ b/Documentation/scheduler/sched-design-CFS.txt
@@ -164,7 +164,7 @@ This is the (partial) list of the hooks:
    It puts the scheduling entity (task) into the red-black tree and
    increments the nr_running variable.
 
- - dequeue_tree(...)
+ - dequeue_task(...)
 
    When a task is no longer runnable, this function is called to keep the
    corresponding scheduling entity out of the red-black tree. It decrements
@@ -195,11 +195,6 @@ This is the (partial) list of the hooks:
    This function is mostly called from time tick functions; it might lead to
    process switch. This drives the running preemption.
 
- - task_new(...)
-
-   The core scheduler gives the scheduling module an opportunity to manage new
-   task startup. The CFS scheduling module uses it for group scheduling, while
-   the scheduling module for a real-time task does not use it.
 
 
 
@@ -228,9 +223,10 @@ When CONFIG_FAIR_GROUP_SCHED is defined, a "cpu.shares" file is created for each
 group created using the pseudo filesystem. See example steps below to create
 task groups and modify their CPU share using the "cgroups" pseudo filesystem.
 
-	# mkdir /dev/cpuctl
-	# mount -t cgroup -ocpu none /dev/cpuctl
-	# cd /dev/cpuctl
+	# mount -t tmpfs cgroup_root /sys/fs/cgroup
+	# mkdir /sys/fs/cgroup/cpu
+	# mount -t cgroup -ocpu none /sys/fs/cgroup/cpu
+	# cd /sys/fs/cgroup/cpu
 
 	# mkdir multimedia	# create "multimedia" group of tasks
 	# mkdir browser		# create "browser" group of tasks
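
The hunk above stops just short of the document's own follow-up steps; for
orientation, the example continues along these lines (a sketch of the
surrounding sched-design-CFS.txt text, not part of this patch; <firefox_pid>
stands in for a real process id):

	# echo 2048 > multimedia/cpu.shares	# give "multimedia" twice the weight of "browser"
	# echo 1024 > browser/cpu.shares
	# firefox &				# start firefox and move it to the "browser" group
	# echo <firefox_pid> > browser/tasks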
diff --git a/Documentation/scheduler/sched-domains.txt b/Documentation/scheduler/sched-domains.txt
index 373ceacc367e..b7ee379b651b 100644
--- a/Documentation/scheduler/sched-domains.txt
+++ b/Documentation/scheduler/sched-domains.txt
@@ -1,8 +1,7 @@
-Each CPU has a "base" scheduling domain (struct sched_domain). These are
-accessed via cpu_sched_domain(i) and this_sched_domain() macros. The domain
-hierarchy is built from these base domains via the ->parent pointer. ->parent
-MUST be NULL terminated, and domain structures should be per-CPU as they
-are locklessly updated.
+Each CPU has a "base" scheduling domain (struct sched_domain). The domain
+hierarchy is built from these base domains via the ->parent pointer. ->parent
+MUST be NULL terminated, and domain structures should be per-CPU as they are
+locklessly updated.
 
 Each scheduling domain spans a number of CPUs (stored in the ->span field).
 A domain's span MUST be a superset of it child's span (this restriction could
@@ -26,11 +25,26 @@ is treated as one entity. The load of a group is defined as the sum of the
 load of each of its member CPUs, and only when the load of a group becomes
 out of balance are tasks moved between groups.
 
-In kernel/sched.c, rebalance_tick is run periodically on each CPU. This
-function takes its CPU's base sched domain and checks to see if has reached
-its rebalance interval. If so, then it will run load_balance on that domain.
-rebalance_tick then checks the parent sched_domain (if it exists), and the
-parent of the parent and so forth.
+In kernel/sched.c, trigger_load_balance() is run periodically on each CPU
+through scheduler_tick(). It raises a softirq after the next regularly scheduled
+rebalancing event for the current runqueue has arrived. The actual load
+balancing workhorse, run_rebalance_domains()->rebalance_domains(), is then run
+in softirq context (SCHED_SOFTIRQ).
+
+The latter function takes two arguments: the current CPU and whether it was idle
+at the time the scheduler_tick() happened and iterates over all sched domains
+our CPU is on, starting from its base domain and going up the ->parent chain.
+While doing that, it checks to see if the current domain has exhausted its
+rebalance interval. If so, it runs load_balance() on that domain. It then checks
+the parent sched_domain (if it exists), and the parent of the parent and so
+forth.
+
+Initially, load_balance() finds the busiest group in the current sched domain.
+If it succeeds, it looks for the busiest runqueue of all the CPUs' runqueues in
+that group. If it manages to find such a runqueue, it locks both our initial
+CPU's runqueue and the newly found busiest one and starts moving tasks from it
+to our runqueue. The exact number of tasks amounts to an imbalance previously
+computed while iterating over this sched domain's groups.
 
 *** Implementing sched domains ***
 The "base" domain will "span" the first level of the hierarchy. In the case
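
The per-domain rebalance interval described in the new text is visible at
runtime on kernels built with CONFIG_SCHED_DEBUG; a sketch (the exact procfs
layout is an assumption and may vary between kernel versions):

	# cat /proc/sys/kernel/sched_domain/cpu0/domain0/min_interval
	# cat /proc/sys/kernel/sched_domain/cpu0/domain0/max_interval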
diff --git a/Documentation/scheduler/sched-rt-group.txt b/Documentation/scheduler/sched-rt-group.txt
index 605b0d40329d..71b54d549987 100644
--- a/Documentation/scheduler/sched-rt-group.txt
+++ b/Documentation/scheduler/sched-rt-group.txt
@@ -129,9 +129,8 @@ priority!
 Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real
 CPU bandwidth to task groups.
 
-This uses the /cgroup virtual file system and
-"/cgroup/<cgroup>/cpu.rt_runtime_us" to control the CPU time reserved for each
-control group.
+This uses the cgroup virtual file system and "<cgroup>/cpu.rt_runtime_us"
+to control the CPU time reserved for each control group.
 
 For more information on working with control groups, you should read
 Documentation/cgroups/cgroups.txt as well.
@@ -150,7 +149,7 @@ For now, this can be simplified to just the following (but see Future plans):
 ===============
 
 There is work in progress to make the scheduling period for each group
-("/cgroup/<cgroup>/cpu.rt_period_us") configurable as well.
+("<cgroup>/cpu.rt_period_us") configurable as well.
 
 The constraint on the period is that a subgroup must have a smaller or
 equal period to its parent. But realistically its not very useful _yet_
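
With the mount-point-relative paths used above, a minimal sketch of reserving
RT bandwidth for a group, assuming the cpu controller is mounted at
/sys/fs/cgroup/cpu as in the sched-design-CFS example (the group name
"rtgroup" is illustrative):

	# mkdir /sys/fs/cgroup/cpu/rtgroup
	# echo 300000 > /sys/fs/cgroup/cpu/rtgroup/cpu.rt_runtime_us	# 0.3s of RT runtime...
	# cat /sys/fs/cgroup/cpu/rtgroup/cpu.rt_period_us		# ...per period (1000000 us by default)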
diff --git a/Documentation/scheduler/sched-stats.txt b/Documentation/scheduler/sched-stats.txt
index 01e69404ee5e..1cd5d51bc761 100644
--- a/Documentation/scheduler/sched-stats.txt
+++ b/Documentation/scheduler/sched-stats.txt
@@ -1,3 +1,7 @@
+Version 15 of schedstats dropped counters for some sched_yield:
+yld_exp_empty, yld_act_empty and yld_both_empty. Otherwise, it is
+identical to version 14.
+
 Version 14 of schedstats includes support for sched_domains, which hit the
 mainline kernel in 2.6.20 although it is identical to the stats from version
 12 which was in the kernel from 2.6.13-2.6.19 (version 13 never saw a kernel
@@ -28,32 +32,25 @@ to write their own scripts, the fields are described here.
 
 CPU statistics
 --------------
-cpu<N> 1 2 3 4 5 6 7 8 9 10 11 12
-
-NOTE: In the sched_yield() statistics, the active queue is considered empty
-    if it has only one process in it, since obviously the process calling
-    sched_yield() is that process.
+cpu<N> 1 2 3 4 5 6 7 8 9
 
-First four fields are sched_yield() statistics:
-     1) # of times both the active and the expired queue were empty
-     2) # of times just the active queue was empty
-     3) # of times just the expired queue was empty
-     4) # of times sched_yield() was called
+First field is a sched_yield() statistic:
+     1) # of times sched_yield() was called
 
 Next three are schedule() statistics:
-     5) # of times we switched to the expired queue and reused it
-     6) # of times schedule() was called
-     7) # of times schedule() left the processor idle
+     2) # of times we switched to the expired queue and reused it
+     3) # of times schedule() was called
+     4) # of times schedule() left the processor idle
 
 Next two are try_to_wake_up() statistics:
-     8) # of times try_to_wake_up() was called
-     9) # of times try_to_wake_up() was called to wake up the local cpu
+     5) # of times try_to_wake_up() was called
+     6) # of times try_to_wake_up() was called to wake up the local cpu
 
 Next three are statistics describing scheduling latency:
-     10) sum of all time spent running by tasks on this processor (in jiffies)
-     11) sum of all time spent waiting to run by tasks on this processor (in
-         jiffies)
-     12) # of timeslices run on this cpu
+     7) sum of all time spent running by tasks on this processor (in jiffies)
+     8) sum of all time spent waiting to run by tasks on this processor (in
+        jiffies)
+     9) # of timeslices run on this cpu
 
 
 Domain statistics
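
With the renumbered layout, the first statistic on each cpu<N> line is the
sched_yield() count; a sketch of pulling it out of /proc/schedstat on a
kernel reporting version 15 (note awk's $1 is the "cpu<N>" tag itself, so
the first statistic is $2):

	# head -1 /proc/schedstat			# confirm "version 15" before parsing
	# awk '/^cpu/ { print $1, "sched_yield() calls:", $2 }' /proc/schedstat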