Diffstat (limited to 'Documentation/scheduler')
-rw-r--r--  Documentation/scheduler/00-INDEX              |  2
-rw-r--r--  Documentation/scheduler/sched-design-CFS.txt  | 14
-rw-r--r--  Documentation/scheduler/sched-domains.txt     | 32
-rw-r--r--  Documentation/scheduler/sched-rt-group.txt    |  7
-rw-r--r--  Documentation/scheduler/sched-stats.txt       | 33
5 files changed, 47 insertions(+), 41 deletions(-)
diff --git a/Documentation/scheduler/00-INDEX b/Documentation/scheduler/00-INDEX
index 3c00c9c3219e..d2651c47ae27 100644
--- a/Documentation/scheduler/00-INDEX
+++ b/Documentation/scheduler/00-INDEX
@@ -3,7 +3,7 @@
 sched-arch.txt
 	- CPU Scheduler implementation hints for architecture specific code.
 sched-design-CFS.txt
-	- goals, design and implementation of the Complete Fair Scheduler.
+	- goals, design and implementation of the Completely Fair Scheduler.
 sched-domains.txt
 	- information on scheduling domains.
 sched-nice-design.txt
diff --git a/Documentation/scheduler/sched-design-CFS.txt b/Documentation/scheduler/sched-design-CFS.txt
index 8239ebbcddce..91ecff07cede 100644
--- a/Documentation/scheduler/sched-design-CFS.txt
+++ b/Documentation/scheduler/sched-design-CFS.txt
@@ -164,7 +164,7 @@ This is the (partial) list of the hooks:
    It puts the scheduling entity (task) into the red-black tree and
    increments the nr_running variable.
 
- - dequeue_tree(...)
+ - dequeue_task(...)
 
    When a task is no longer runnable, this function is called to keep the
    corresponding scheduling entity out of the red-black tree. It decrements
@@ -195,11 +195,6 @@ This is the (partial) list of the hooks:
    This function is mostly called from time tick functions; it might lead to
    process switch. This drives the running preemption.
 
- - task_new(...)
-
-   The core scheduler gives the scheduling module an opportunity to manage new
-   task startup. The CFS scheduling module uses it for group scheduling, while
-   the scheduling module for a real-time task does not use it.
 
 
 
@@ -228,9 +223,10 @@ When CONFIG_FAIR_GROUP_SCHED is defined, a "cpu.shares" file is created for each
 group created using the pseudo filesystem. See example steps below to create
 task groups and modify their CPU share using the "cgroups" pseudo filesystem.
 
-# mkdir /dev/cpuctl
-# mount -t cgroup -ocpu none /dev/cpuctl
-# cd /dev/cpuctl
+# mount -t tmpfs cgroup_root /sys/fs/cgroup
+# mkdir /sys/fs/cgroup/cpu
+# mount -t cgroup -ocpu none /sys/fs/cgroup/cpu
+# cd /sys/fs/cgroup/cpu
 
 # mkdir multimedia	# create "multimedia" group of tasks
 # mkdir browser		# create "browser" group of tasks
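The hunk ends at the group-creation step; in the full sched-design-CFS.txt the example continues by assigning shares and moving tasks into the groups. A minimal sketch of that continuation (the share values and the <pid> placeholder are illustrative, not part of this patch):

	# echo 2048 > multimedia/cpu.shares	# give "multimedia" twice the default weight
	# echo 1024 > browser/cpu.shares	# keep "browser" at the default weight
	# echo <pid> > browser/tasks		# move an existing task into "browser"

Because cpu.shares is a relative weight, only the ratio matters: 2048 vs. 1024 gives "multimedia" roughly twice the CPU time of "browser" when both groups contend for the CPU.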
diff --git a/Documentation/scheduler/sched-domains.txt b/Documentation/scheduler/sched-domains.txt
index 373ceacc367e..b7ee379b651b 100644
--- a/Documentation/scheduler/sched-domains.txt
+++ b/Documentation/scheduler/sched-domains.txt
@@ -1,8 +1,7 @@
-Each CPU has a "base" scheduling domain (struct sched_domain). These are
-accessed via cpu_sched_domain(i) and this_sched_domain() macros. The domain
+Each CPU has a "base" scheduling domain (struct sched_domain). The domain
 hierarchy is built from these base domains via the ->parent pointer. ->parent
-MUST be NULL terminated, and domain structures should be per-CPU as they
-are locklessly updated.
+MUST be NULL terminated, and domain structures should be per-CPU as they are
+locklessly updated.
 
 Each scheduling domain spans a number of CPUs (stored in the ->span field).
 A domain's span MUST be a superset of it child's span (this restriction could
@@ -26,11 +25,26 @@ is treated as one entity. The load of a group is defined as the sum of the
 load of each of its member CPUs, and only when the load of a group becomes
 out of balance are tasks moved between groups.
 
-In kernel/sched.c, rebalance_tick is run periodically on each CPU. This
-function takes its CPU's base sched domain and checks to see if has reached
-its rebalance interval. If so, then it will run load_balance on that domain.
-rebalance_tick then checks the parent sched_domain (if it exists), and the
-parent of the parent and so forth.
+In kernel/sched.c, trigger_load_balance() is run periodically on each CPU
+through scheduler_tick(). It raises a softirq after the next regularly scheduled
+rebalancing event for the current runqueue has arrived. The actual load
+balancing workhorse, run_rebalance_domains()->rebalance_domains(), is then run
+in softirq context (SCHED_SOFTIRQ).
+
+The latter function takes two arguments: the current CPU and whether it was idle
+at the time the scheduler_tick() happened and iterates over all sched domains
+our CPU is on, starting from its base domain and going up the ->parent chain.
+While doing that, it checks to see if the current domain has exhausted its
+rebalance interval. If so, it runs load_balance() on that domain. It then checks
+the parent sched_domain (if it exists), and the parent of the parent and so
+forth.
+
+Initially, load_balance() finds the busiest group in the current sched domain.
+If it succeeds, it looks for the busiest runqueue of all the CPUs' runqueues in
+that group. If it manages to find such a runqueue, it locks both our initial
+CPU's runqueue and the newly found busiest one and starts moving tasks from it
+to our runqueue. The exact number of tasks amounts to an imbalance previously
+computed while iterating over this sched domain's groups.
 
 *** Implementing sched domains ***
 The "base" domain will "span" the first level of the hierarchy. In the case
diff --git a/Documentation/scheduler/sched-rt-group.txt b/Documentation/scheduler/sched-rt-group.txt
index 605b0d40329d..71b54d549987 100644
--- a/Documentation/scheduler/sched-rt-group.txt
+++ b/Documentation/scheduler/sched-rt-group.txt
@@ -129,9 +129,8 @@ priority!
 Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real
 CPU bandwidth to task groups.
 
-This uses the /cgroup virtual file system and
-"/cgroup/<cgroup>/cpu.rt_runtime_us" to control the CPU time reserved for each
-control group.
+This uses the cgroup virtual file system and "<cgroup>/cpu.rt_runtime_us"
+to control the CPU time reserved for each control group.
 
 For more information on working with control groups, you should read
 Documentation/cgroups/cgroups.txt as well.
@@ -150,7 +149,7 @@ For now, this can be simplified to just the following (but see Future plans):
 ===============
 
 There is work in progress to make the scheduling period for each group
-("/cgroup/<cgroup>/cpu.rt_period_us") configurable as well.
+("<cgroup>/cpu.rt_period_us") configurable as well.
 
 The constraint on the period is that a subgroup must have a smaller or
 equal period to its parent. But realistically its not very useful _yet_
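With the path prefix dropped by this hunk, reserving RT bandwidth for a group composes with the mount steps shown in the sched-design-CFS.txt hunk above. A sketch, assuming that /sys/fs/cgroup/cpu mount and an illustrative 100 ms reservation (group name and value are not from this patch):

	# cd /sys/fs/cgroup/cpu
	# mkdir rtgroup
	# echo 100000 > rtgroup/cpu.rt_runtime_us	# 100 ms of RT runtime per period
	# cat rtgroup/cpu.rt_runtime_us
	100000

Against the default 1 s scheduling period this caps the group's real-time tasks at roughly 10% of each CPU second.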
diff --git a/Documentation/scheduler/sched-stats.txt b/Documentation/scheduler/sched-stats.txt
index 01e69404ee5e..1cd5d51bc761 100644
--- a/Documentation/scheduler/sched-stats.txt
+++ b/Documentation/scheduler/sched-stats.txt
@@ -1,3 +1,7 @@
+Version 15 of schedstats dropped counters for some sched_yield:
+yld_exp_empty, yld_act_empty and yld_both_empty. Otherwise, it is
+identical to version 14.
+
 Version 14 of schedstats includes support for sched_domains, which hit the
 mainline kernel in 2.6.20 although it is identical to the stats from version
 12 which was in the kernel from 2.6.13-2.6.19 (version 13 never saw a kernel
@@ -28,32 +32,25 @@ to write their own scripts, the fields are described here.
 
 CPU statistics
 --------------
-cpu<N> 1 2 3 4 5 6 7 8 9 10 11 12
-
-NOTE: In the sched_yield() statistics, the active queue is considered empty
-if it has only one process in it, since obviously the process calling
-sched_yield() is that process.
+cpu<N> 1 2 3 4 5 6 7 8 9
 
-First four fields are sched_yield() statistics:
-     1) # of times both the active and the expired queue were empty
-     2) # of times just the active queue was empty
-     3) # of times just the expired queue was empty
-     4) # of times sched_yield() was called
+First field is a sched_yield() statistic:
+     1) # of times sched_yield() was called
 
 Next three are schedule() statistics:
-     5) # of times we switched to the expired queue and reused it
-     6) # of times schedule() was called
-     7) # of times schedule() left the processor idle
+     2) # of times we switched to the expired queue and reused it
+     3) # of times schedule() was called
+     4) # of times schedule() left the processor idle
 
 Next two are try_to_wake_up() statistics:
-     8) # of times try_to_wake_up() was called
-     9) # of times try_to_wake_up() was called to wake up the local cpu
+     5) # of times try_to_wake_up() was called
+     6) # of times try_to_wake_up() was called to wake up the local cpu
 
 Next three are statistics describing scheduling latency:
-    10) sum of all time spent running by tasks on this processor (in jiffies)
-    11) sum of all time spent waiting to run by tasks on this processor (in
+     7) sum of all time spent running by tasks on this processor (in jiffies)
+     8) sum of all time spent waiting to run by tasks on this processor (in
         jiffies)
-    12) # of timeslices run on this cpu
+     9) # of timeslices run on this cpu
 
 
 Domain statistics
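For anyone updating the scripts this document invites, the renumbering in the last hunk shifts every field after the first. A quick version check plus a sample computation (a sketch: field positions assume the 9-field cpu line above, with awk's $1 holding the "cpu<N>" tag, so fields 8 and 9 land in $9 and $10):

	$ head -1 /proc/schedstat
	version 15
	$ awk '/^cpu/ && $10 > 0 { print $1, "avg wait per timeslice:", $9 / $10, "jiffies" }' /proc/schedstat

Fields 8 and 9 are the cumulative wait time and the timeslice count, so their ratio is the mean scheduling delay per timeslice on that CPU; the $10 > 0 guard skips CPUs that have not yet run a timeslice.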