author | Paul Menage <menage@google.com> | 2007-10-19 02:39:39 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@woody.linux-foundation.org> | 2007-10-19 14:53:36 -0400 |
commit | 8793d854edbc2774943a4b0de3304dc73991159a (patch) | |
tree | 380b3403a0fedfcce61d9af5af1ffbcc71017abf /Documentation/cpusets.txt | |
parent | 81a6a5cdd2c5cd70874b88afe524ab09e9e869af (diff) |
Task Control Groups: make cpusets a client of cgroups
Remove the filesystem support logic from the cpusets system and make cpusets
a cgroup subsystem.
The "cpuset" filesystem becomes a dummy filesystem; attempts to mount it get
passed through to the cgroup filesystem with the appropriate options to
emulate the old cpuset filesystem behaviour.
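As a rough illustration of the pass-through described above, the sketch below
prints (without running) the cgroup commands that the updated
Documentation/cpusets.txt lists as equivalent to a legacy "mount -t cpuset"
call. The helper name legacy_cpuset_equivalent is hypothetical and is not part
of the patch; the device name and mount point come from the documentation's
own example.

```shell
#!/bin/sh
# Sketch only: print the commands instead of executing them, since a real
# mount requires root and a kernel built with cgroup/cpuset support.
# legacy_cpuset_equivalent is a hypothetical helper name.
legacy_cpuset_equivalent() {
    dev="$1"    # device name argument ("X" in the documentation)
    mnt="$2"    # mount point, e.g. /dev/cpuset
    # The cgroup mount that replaces "mount -t cpuset $dev $mnt".
    printf 'mount -t cgroup -ocpuset %s %s\n' "$dev" "$mnt"
    # The legacy wrapper also sets the old release agent path.
    printf 'echo "/sbin/cpuset_release_agent" > %s/release_agent\n' "$mnt"
}

legacy_cpuset_equivalent X /dev/cpuset
```

Printing rather than executing keeps the sketch safe to run as an
unprivileged user; piping the output to a root shell would be one way to
apply it on a system where the cgroup filesystem is available.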
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'Documentation/cpusets.txt')
-rw-r--r-- | Documentation/cpusets.txt | 93 |
1 files changed, 43 insertions, 50 deletions
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index ec9de6917f01..85eeab5e7e32 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -7,6 +7,7 @@ Written by Simon.Derr@bull.net | |||
7 | Portions Copyright (c) 2004-2006 Silicon Graphics, Inc. | 7 | Portions Copyright (c) 2004-2006 Silicon Graphics, Inc. |
8 | Modified by Paul Jackson <pj@sgi.com> | 8 | Modified by Paul Jackson <pj@sgi.com> |
9 | Modified by Christoph Lameter <clameter@sgi.com> | 9 | Modified by Christoph Lameter <clameter@sgi.com> |
10 | Modified by Paul Menage <menage@google.com> | ||
10 | 11 | ||
11 | CONTENTS: | 12 | CONTENTS: |
12 | ========= | 13 | ========= |
@@ -16,10 +17,9 @@ CONTENTS: | |||
16 | 1.2 Why are cpusets needed ? | 17 | 1.2 Why are cpusets needed ? |
17 | 1.3 How are cpusets implemented ? | 18 | 1.3 How are cpusets implemented ? |
18 | 1.4 What are exclusive cpusets ? | 19 | 1.4 What are exclusive cpusets ? |
19 | 1.5 What does notify_on_release do ? | 20 | 1.5 What is memory_pressure ? |
20 | 1.6 What is memory_pressure ? | 21 | 1.6 What is memory spread ? |
21 | 1.7 What is memory spread ? | 22 | 1.7 How do I use cpusets ? |
22 | 1.8 How do I use cpusets ? | ||
23 | 2. Usage Examples and Syntax | 23 | 2. Usage Examples and Syntax |
24 | 2.1 Basic Usage | 24 | 2.1 Basic Usage |
25 | 2.2 Adding/removing cpus | 25 | 2.2 Adding/removing cpus |
@@ -44,18 +44,19 @@ hierarchy visible in a virtual file system. These are the essential | |||
44 | hooks, beyond what is already present, required to manage dynamic | 44 | hooks, beyond what is already present, required to manage dynamic |
45 | job placement on large systems. | 45 | job placement on large systems. |
46 | 46 | ||
47 | Each task has a pointer to a cpuset. Multiple tasks may reference | 47 | Cpusets use the generic cgroup subsystem described in |
48 | the same cpuset. Requests by a task, using the sched_setaffinity(2) | 48 | Documentation/cgroup.txt. |
49 | system call to include CPUs in its CPU affinity mask, and using the | 49 | |
50 | mbind(2) and set_mempolicy(2) system calls to include Memory Nodes | 50 | Requests by a task, using the sched_setaffinity(2) system call to |
51 | in its memory policy, are both filtered through that tasks cpuset, | 51 | include CPUs in its CPU affinity mask, and using the mbind(2) and |
52 | filtering out any CPUs or Memory Nodes not in that cpuset. The | 52 | set_mempolicy(2) system calls to include Memory Nodes in its memory |
53 | scheduler will not schedule a task on a CPU that is not allowed in | 53 | policy, are both filtered through that tasks cpuset, filtering out any |
54 | its cpus_allowed vector, and the kernel page allocator will not | 54 | CPUs or Memory Nodes not in that cpuset. The scheduler will not |
55 | allocate a page on a node that is not allowed in the requesting tasks | 55 | schedule a task on a CPU that is not allowed in its cpus_allowed |
56 | mems_allowed vector. | 56 | vector, and the kernel page allocator will not allocate a page on a |
57 | 57 | node that is not allowed in the requesting tasks mems_allowed vector. | |
58 | User level code may create and destroy cpusets by name in the cpuset | 58 | |
59 | User level code may create and destroy cpusets by name in the cgroup | ||
59 | virtual file system, manage the attributes and permissions of these | 60 | virtual file system, manage the attributes and permissions of these |
60 | cpusets and which CPUs and Memory Nodes are assigned to each cpuset, | 61 | cpusets and which CPUs and Memory Nodes are assigned to each cpuset, |
61 | specify and query to which cpuset a task is assigned, and list the | 62 | specify and query to which cpuset a task is assigned, and list the |
@@ -115,7 +116,7 @@ Cpusets extends these two mechanisms as follows: | |||
115 | - Cpusets are sets of allowed CPUs and Memory Nodes, known to the | 116 | - Cpusets are sets of allowed CPUs and Memory Nodes, known to the |
116 | kernel. | 117 | kernel. |
117 | - Each task in the system is attached to a cpuset, via a pointer | 118 | - Each task in the system is attached to a cpuset, via a pointer |
118 | in the task structure to a reference counted cpuset structure. | 119 | in the task structure to a reference counted cgroup structure. |
119 | - Calls to sched_setaffinity are filtered to just those CPUs | 120 | - Calls to sched_setaffinity are filtered to just those CPUs |
120 | allowed in that tasks cpuset. | 121 | allowed in that tasks cpuset. |
121 | - Calls to mbind and set_mempolicy are filtered to just | 122 | - Calls to mbind and set_mempolicy are filtered to just |
@@ -145,15 +146,10 @@ into the rest of the kernel, none in performance critical paths: | |||
145 | - in page_alloc.c, to restrict memory to allowed nodes. | 146 | - in page_alloc.c, to restrict memory to allowed nodes. |
146 | - in vmscan.c, to restrict page recovery to the current cpuset. | 147 | - in vmscan.c, to restrict page recovery to the current cpuset. |
147 | 148 | ||
148 | In addition a new file system, of type "cpuset" may be mounted, | 149 | You should mount the "cgroup" filesystem type in order to enable |
149 | typically at /dev/cpuset, to enable browsing and modifying the cpusets | 150 | browsing and modifying the cpusets presently known to the kernel. No |
150 | presently known to the kernel. No new system calls are added for | 151 | new system calls are added for cpusets - all support for querying and |
151 | cpusets - all support for querying and modifying cpusets is via | 152 | modifying cpusets is via this cpuset file system. |
152 | this cpuset file system. | ||
153 | |||
154 | Each task under /proc has an added file named 'cpuset', displaying | ||
155 | the cpuset name, as the path relative to the root of the cpuset file | ||
156 | system. | ||
157 | 153 | ||
158 | The /proc/<pid>/status file for each task has two added lines, | 154 | The /proc/<pid>/status file for each task has two added lines, |
159 | displaying the tasks cpus_allowed (on which CPUs it may be scheduled) | 155 | displaying the tasks cpus_allowed (on which CPUs it may be scheduled) |
@@ -163,16 +159,15 @@ in the format seen in the following example: | |||
163 | Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff | 159 | Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff |
164 | Mems_allowed: ffffffff,ffffffff | 160 | Mems_allowed: ffffffff,ffffffff |
165 | 161 | ||
166 | Each cpuset is represented by a directory in the cpuset file system | 162 | Each cpuset is represented by a directory in the cgroup file system |
167 | containing the following files describing that cpuset: | 163 | containing (on top of the standard cgroup files) the following |
164 | files describing that cpuset: | ||
168 | 165 | ||
169 | - cpus: list of CPUs in that cpuset | 166 | - cpus: list of CPUs in that cpuset |
170 | - mems: list of Memory Nodes in that cpuset | 167 | - mems: list of Memory Nodes in that cpuset |
171 | - memory_migrate flag: if set, move pages to cpusets nodes | 168 | - memory_migrate flag: if set, move pages to cpusets nodes |
172 | - cpu_exclusive flag: is cpu placement exclusive? | 169 | - cpu_exclusive flag: is cpu placement exclusive? |
173 | - mem_exclusive flag: is memory placement exclusive? | 170 | - mem_exclusive flag: is memory placement exclusive? |
174 | - tasks: list of tasks (by pid) attached to that cpuset | ||
175 | - notify_on_release flag: run /sbin/cpuset_release_agent on exit? | ||
176 | - memory_pressure: measure of how much paging pressure in cpuset | 171 | - memory_pressure: measure of how much paging pressure in cpuset |
177 | 172 | ||
178 | In addition, the root cpuset only has the following file: | 173 | In addition, the root cpuset only has the following file: |
@@ -237,21 +232,7 @@ such as requests from interrupt handlers, is allowed to be taken | |||
237 | outside even a mem_exclusive cpuset. | 232 | outside even a mem_exclusive cpuset. |
238 | 233 | ||
239 | 234 | ||
240 | 1.5 What does notify_on_release do ? | 235 | 1.5 What is memory_pressure ? |
241 | ------------------------------------ | ||
242 | |||
243 | If the notify_on_release flag is enabled (1) in a cpuset, then whenever | ||
244 | the last task in the cpuset leaves (exits or attaches to some other | ||
245 | cpuset) and the last child cpuset of that cpuset is removed, then | ||
246 | the kernel runs the command /sbin/cpuset_release_agent, supplying the | ||
247 | pathname (relative to the mount point of the cpuset file system) of the | ||
248 | abandoned cpuset. This enables automatic removal of abandoned cpusets. | ||
249 | The default value of notify_on_release in the root cpuset at system | ||
250 | boot is disabled (0). The default value of other cpusets at creation | ||
251 | is the current value of their parents notify_on_release setting. | ||
252 | |||
253 | |||
254 | 1.6 What is memory_pressure ? | ||
255 | ----------------------------- | 236 | ----------------------------- |
256 | The memory_pressure of a cpuset provides a simple per-cpuset metric | 237 | The memory_pressure of a cpuset provides a simple per-cpuset metric |
257 | of the rate that the tasks in a cpuset are attempting to free up in | 238 | of the rate that the tasks in a cpuset are attempting to free up in |
@@ -308,7 +289,7 @@ the tasks in the cpuset, in units of reclaims attempted per second, | |||
308 | times 1000. | 289 | times 1000. |
309 | 290 | ||
310 | 291 | ||
311 | 1.7 What is memory spread ? | 292 | 1.6 What is memory spread ? |
312 | --------------------------- | 293 | --------------------------- |
313 | There are two boolean flag files per cpuset that control where the | 294 | There are two boolean flag files per cpuset that control where the |
314 | kernel allocates pages for the file system buffers and related in | 295 | kernel allocates pages for the file system buffers and related in |
@@ -379,7 +360,7 @@ data set, the memory allocation across the nodes in the jobs cpuset | |||
379 | can become very uneven. | 360 | can become very uneven. |
380 | 361 | ||
381 | 362 | ||
382 | 1.8 How do I use cpusets ? | 363 | 1.7 How do I use cpusets ? |
383 | -------------------------- | 364 | -------------------------- |
384 | 365 | ||
385 | In order to minimize the impact of cpusets on critical kernel | 366 | In order to minimize the impact of cpusets on critical kernel |
@@ -469,7 +450,7 @@ than stress the kernel. | |||
469 | To start a new job that is to be contained within a cpuset, the steps are: | 450 | To start a new job that is to be contained within a cpuset, the steps are: |
470 | 451 | ||
471 | 1) mkdir /dev/cpuset | 452 | 1) mkdir /dev/cpuset |
472 | 2) mount -t cpuset none /dev/cpuset | 453 | 2) mount -t cgroup -ocpuset cpuset /dev/cpuset |
473 | 3) Create the new cpuset by doing mkdir's and write's (or echo's) in | 454 | 3) Create the new cpuset by doing mkdir's and write's (or echo's) in |
474 | the /dev/cpuset virtual file system. | 455 | the /dev/cpuset virtual file system. |
475 | 4) Start a task that will be the "founding father" of the new job. | 456 | 4) Start a task that will be the "founding father" of the new job. |
@@ -481,7 +462,7 @@ For example, the following sequence of commands will setup a cpuset | |||
481 | named "Charlie", containing just CPUs 2 and 3, and Memory Node 1, | 462 | named "Charlie", containing just CPUs 2 and 3, and Memory Node 1, |
482 | and then start a subshell 'sh' in that cpuset: | 463 | and then start a subshell 'sh' in that cpuset: |
483 | 464 | ||
484 | mount -t cpuset none /dev/cpuset | 465 | mount -t cgroup -ocpuset cpuset /dev/cpuset |
485 | cd /dev/cpuset | 466 | cd /dev/cpuset |
486 | mkdir Charlie | 467 | mkdir Charlie |
487 | cd Charlie | 468 | cd Charlie |
@@ -513,7 +494,7 @@ Creating, modifying, using the cpusets can be done through the cpuset | |||
513 | virtual filesystem. | 494 | virtual filesystem. |
514 | 495 | ||
515 | To mount it, type: | 496 | To mount it, type: |
516 | # mount -t cpuset none /dev/cpuset | 497 | # mount -t cgroup -o cpuset cpuset /dev/cpuset |
517 | 498 | ||
518 | Then under /dev/cpuset you can find a tree that corresponds to the | 499 | Then under /dev/cpuset you can find a tree that corresponds to the |
519 | tree of the cpusets in the system. For instance, /dev/cpuset | 500 | tree of the cpusets in the system. For instance, /dev/cpuset |
@@ -556,6 +537,18 @@ To remove a cpuset, just use rmdir: | |||
556 | This will fail if the cpuset is in use (has cpusets inside, or has | 537 | This will fail if the cpuset is in use (has cpusets inside, or has |
557 | processes attached). | 538 | processes attached). |
558 | 539 | ||
540 | Note that for legacy reasons, the "cpuset" filesystem exists as a | ||
541 | wrapper around the cgroup filesystem. | ||
542 | |||
543 | The command | ||
544 | |||
545 | mount -t cpuset X /dev/cpuset | ||
546 | |||
547 | is equivalent to | ||
548 | |||
549 | mount -t cgroup -ocpuset X /dev/cpuset | ||
550 | echo "/sbin/cpuset_release_agent" > /dev/cpuset/release_agent | ||
551 | |||
559 | 2.2 Adding/removing cpus | 552 | 2.2 Adding/removing cpus |
560 | ------------------------ | 553 | ------------------------ |
561 | 554 | ||