Diffstat (limited to 'Documentation/cpusets.txt')
-rw-r--r--  Documentation/cpusets.txt  93
1 files changed, 43 insertions, 50 deletions
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index ec9de6917f01..85eeab5e7e32 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -7,6 +7,7 @@ Written by Simon.Derr@bull.net
 Portions Copyright (c) 2004-2006 Silicon Graphics, Inc.
 Modified by Paul Jackson <pj@sgi.com>
 Modified by Christoph Lameter <clameter@sgi.com>
+Modified by Paul Menage <menage@google.com>
 
 CONTENTS:
 =========
@@ -16,10 +17,9 @@ CONTENTS:
   1.2 Why are cpusets needed ?
   1.3 How are cpusets implemented ?
   1.4 What are exclusive cpusets ?
-  1.5 What does notify_on_release do ?
-  1.6 What is memory_pressure ?
-  1.7 What is memory spread ?
-  1.8 How do I use cpusets ?
+  1.5 What is memory_pressure ?
+  1.6 What is memory spread ?
+  1.7 How do I use cpusets ?
 2. Usage Examples and Syntax
   2.1 Basic Usage
   2.2 Adding/removing cpus
@@ -44,18 +44,19 @@ hierarchy visible in a virtual file system. These are the essential
 hooks, beyond what is already present, required to manage dynamic
 job placement on large systems.
 
-Each task has a pointer to a cpuset. Multiple tasks may reference
-the same cpuset. Requests by a task, using the sched_setaffinity(2)
-system call to include CPUs in its CPU affinity mask, and using the
-mbind(2) and set_mempolicy(2) system calls to include Memory Nodes
-in its memory policy, are both filtered through that tasks cpuset,
-filtering out any CPUs or Memory Nodes not in that cpuset. The
-scheduler will not schedule a task on a CPU that is not allowed in
-its cpus_allowed vector, and the kernel page allocator will not
-allocate a page on a node that is not allowed in the requesting tasks
-mems_allowed vector.
-
-User level code may create and destroy cpusets by name in the cpuset
+Cpusets use the generic cgroup subsystem described in
+Documentation/cgroup.txt.
+
+Requests by a task, using the sched_setaffinity(2) system call to
+include CPUs in its CPU affinity mask, and using the mbind(2) and
+set_mempolicy(2) system calls to include Memory Nodes in its memory
+policy, are both filtered through that tasks cpuset, filtering out any
+CPUs or Memory Nodes not in that cpuset. The scheduler will not
+schedule a task on a CPU that is not allowed in its cpus_allowed
+vector, and the kernel page allocator will not allocate a page on a
+node that is not allowed in the requesting tasks mems_allowed vector.
+
+User level code may create and destroy cpusets by name in the cgroup
 virtual file system, manage the attributes and permissions of these
 cpusets and which CPUs and Memory Nodes are assigned to each cpuset,
 specify and query to which cpuset a task is assigned, and list the
@@ -115,7 +116,7 @@ Cpusets extends these two mechanisms as follows:
  - Cpusets are sets of allowed CPUs and Memory Nodes, known to the
    kernel.
  - Each task in the system is attached to a cpuset, via a pointer
-   in the task structure to a reference counted cpuset structure.
+   in the task structure to a reference counted cgroup structure.
  - Calls to sched_setaffinity are filtered to just those CPUs
    allowed in that tasks cpuset.
  - Calls to mbind and set_mempolicy are filtered to just
@@ -145,15 +146,10 @@ into the rest of the kernel, none in performance critical paths:
  - in page_alloc.c, to restrict memory to allowed nodes.
  - in vmscan.c, to restrict page recovery to the current cpuset.
 
-In addition a new file system, of type "cpuset" may be mounted,
-typically at /dev/cpuset, to enable browsing and modifying the cpusets
-presently known to the kernel. No new system calls are added for
-cpusets - all support for querying and modifying cpusets is via
-this cpuset file system.
-
-Each task under /proc has an added file named 'cpuset', displaying
-the cpuset name, as the path relative to the root of the cpuset file
-system.
+You should mount the "cgroup" filesystem type in order to enable
+browsing and modifying the cpusets presently known to the kernel. No
+new system calls are added for cpusets - all support for querying and
+modifying cpusets is via this cpuset file system.
 
 The /proc/<pid>/status file for each task has two added lines,
 displaying the tasks cpus_allowed (on which CPUs it may be scheduled)
@@ -163,16 +159,15 @@ in the format seen in the following example:
   Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff
   Mems_allowed: ffffffff,ffffffff
 
-Each cpuset is represented by a directory in the cpuset file system
-containing the following files describing that cpuset:
+Each cpuset is represented by a directory in the cgroup file system
+containing (on top of the standard cgroup files) the following
+files describing that cpuset:
 
  - cpus: list of CPUs in that cpuset
  - mems: list of Memory Nodes in that cpuset
  - memory_migrate flag: if set, move pages to cpusets nodes
  - cpu_exclusive flag: is cpu placement exclusive?
  - mem_exclusive flag: is memory placement exclusive?
- - tasks: list of tasks (by pid) attached to that cpuset
- - notify_on_release flag: run /sbin/cpuset_release_agent on exit?
  - memory_pressure: measure of how much paging pressure in cpuset
 
 In addition, the root cpuset only has the following file:
@@ -237,21 +232,7 @@ such as requests from interrupt handlers, is allowed to be taken
 outside even a mem_exclusive cpuset.
 
 
-1.5 What does notify_on_release do ?
--------------------------------------
-
-If the notify_on_release flag is enabled (1) in a cpuset, then whenever
-the last task in the cpuset leaves (exits or attaches to some other
-cpuset) and the last child cpuset of that cpuset is removed, then
-the kernel runs the command /sbin/cpuset_release_agent, supplying the
-pathname (relative to the mount point of the cpuset file system) of the
-abandoned cpuset. This enables automatic removal of abandoned cpusets.
-The default value of notify_on_release in the root cpuset at system
-boot is disabled (0). The default value of other cpusets at creation
-is the current value of their parents notify_on_release setting.
-
-
-1.6 What is memory_pressure ?
+1.5 What is memory_pressure ?
 -----------------------------
 The memory_pressure of a cpuset provides a simple per-cpuset metric
 of the rate that the tasks in a cpuset are attempting to free up in
@@ -308,7 +289,7 @@ the tasks in the cpuset, in units of reclaims attempted per second,
 times 1000.
 
 
-1.7 What is memory spread ?
+1.6 What is memory spread ?
 ---------------------------
 There are two boolean flag files per cpuset that control where the
 kernel allocates pages for the file system buffers and related in
@@ -379,7 +360,7 @@ data set, the memory allocation across the nodes in the jobs cpuset
 can become very uneven.
 
 
-1.8 How do I use cpusets ?
+1.7 How do I use cpusets ?
 --------------------------
 
 In order to minimize the impact of cpusets on critical kernel
@@ -469,7 +450,7 @@ than stress the kernel.
 To start a new job that is to be contained within a cpuset, the steps are:
 
  1) mkdir /dev/cpuset
- 2) mount -t cpuset none /dev/cpuset
+ 2) mount -t cgroup -ocpuset cpuset /dev/cpuset
  3) Create the new cpuset by doing mkdir's and write's (or echo's) in
     the /dev/cpuset virtual file system.
  4) Start a task that will be the "founding father" of the new job.
@@ -481,7 +462,7 @@ For example, the following sequence of commands will setup a cpuset
 named "Charlie", containing just CPUs 2 and 3, and Memory Node 1,
 and then start a subshell 'sh' in that cpuset:
 
-  mount -t cpuset none /dev/cpuset
+  mount -t cgroup -ocpuset cpuset /dev/cpuset
   cd /dev/cpuset
   mkdir Charlie
   cd Charlie
@@ -513,7 +494,7 @@ Creating, modifying, using the cpusets can be done through the cpuset
 virtual filesystem.
 
 To mount it, type:
-# mount -t cpuset none /dev/cpuset
+# mount -t cgroup -o cpuset cpuset /dev/cpuset
 
 Then under /dev/cpuset you can find a tree that corresponds to the
 tree of the cpusets in the system. For instance, /dev/cpuset
@@ -556,6 +537,18 @@ To remove a cpuset, just use rmdir:
 This will fail if the cpuset is in use (has cpusets inside, or has
 processes attached).
 
+Note that for legacy reasons, the "cpuset" filesystem exists as a
+wrapper around the cgroup filesystem.
+
+The command
+
+mount -t cpuset X /dev/cpuset
+
+is equivalent to
+
+mount -t cgroup -ocpuset X /dev/cpuset
+echo "/sbin/cpuset_release_agent" > /dev/cpuset/release_agent
+
 2.2 Adding/removing cpus
 ------------------------
 
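The context lines of this patch show the Cpus_allowed and Mems_allowed lines of /proc/<pid>/status. Each line prints a mask as comma-separated hexadecimal words. Below is a minimal Python sketch for decoding such a line into CPU or Memory Node numbers. It assumes each word is a 32-bit chunk and that the most significant chunk is printed first, which is the kernel's usual bitmap format. The helper names decode_mask and parse_status are hypothetical and are not part of any kernel or library interface.

```python
import re
from typing import Dict, List


def decode_mask(mask: str) -> List[int]:
    """Return the bit numbers set in a comma-separated hex bitmap.

    Assumption: each comma-separated word is a 32-bit chunk and the most
    significant chunk comes first, e.g. "00000000,0000000c" -> [2, 3].
    """
    value = 0
    for word in mask.strip().split(","):
        # Shift the more significant chunks seen so far up by 32 bits.
        value = (value << 32) | int(word, 16)
    bits = []
    n = 0
    while value:
        if value & 1:
            bits.append(n)
        value >>= 1
        n += 1
    return bits


# Matches "Cpus_allowed:" and "Mems_allowed:" but not "Cpus_allowed_list:".
_STATUS_LINE = re.compile(r"(Cpus_allowed|Mems_allowed):\s+([0-9a-fA-F,]+)$")


def parse_status(text: str) -> Dict[str, List[int]]:
    """Extract the decoded Cpus_allowed / Mems_allowed masks from status text."""
    result: Dict[str, List[int]] = {}
    for line in text.splitlines():
        m = _STATUS_LINE.match(line.strip())
        if m:
            result[m.group(1)] = decode_mask(m.group(2))
    return result


if __name__ == "__main__":
    example = (
        "Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff\n"
        "Mems_allowed: ffffffff,ffffffff\n"
    )
    decoded = parse_status(example)
    print(len(decoded["Cpus_allowed"]), len(decoded["Mems_allowed"]))  # 128 64
    print(decode_mask("00000000,0000000c"))  # [2, 3]
    try:
        with open("/proc/self/status") as f:
            print(parse_status(f.read()))
    except OSError:
        pass  # not Linux, or /proc is not mounted
```

Each word is parsed separately rather than stripping the commas. That way the sketch does not depend on every word being zero-padded to eight digits.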
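The patch's introduction explains that sched_setaffinity(2) requests are filtered through the task's cpuset, so CPUs outside that cpuset are dropped. Below is a rough user-space model of that filtering in Python, not kernel code. It assumes the cpuset's cpus file uses the kernel's list format, such as "2-3" or "0-3,8", and it handles only plain numbers and ranges. The names parse_cpulist, format_cpulist and filtered_affinity are hypothetical. The error raised for an empty result only loosely mirrors the EINVAL that sched_setaffinity(2) reports when no requested CPU is permitted.

```python
from typing import Iterable, List, Set


def parse_cpulist(text: str) -> List[int]:
    """Parse a list such as "0-3,8" into [0, 1, 2, 3, 8] (numbers and ranges only)."""
    cpus: Set[int] = set()
    text = text.strip()
    if not text:
        return []
    for part in text.split(","):
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"bad range {part!r}")
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def format_cpulist(cpus: Iterable[int]) -> str:
    """Inverse of parse_cpulist: [0, 1, 2, 3, 8] -> "0-3,8"."""
    nums = sorted(set(cpus))
    parts = []
    i = 0
    while i < len(nums):
        j = i
        # Extend j while the numbers stay consecutive.
        while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
            j += 1
        parts.append(str(nums[i]) if i == j else f"{nums[i]}-{nums[j]}")
        i = j + 1
    return ",".join(parts)


def filtered_affinity(requested: Iterable[int], cpuset_cpus: str) -> List[int]:
    """Model of cpuset filtering: keep only the requested CPUs the cpuset allows."""
    allowed = set(parse_cpulist(cpuset_cpus))
    result = sorted(set(requested) & allowed)
    if not result:
        # Stand-in for the EINVAL the real system call returns.
        raise ValueError("no requested CPU is allowed by this cpuset")
    return result


if __name__ == "__main__":
    # The "Charlie" cpuset in the text holds just CPUs 2 and 3.
    print(filtered_affinity(range(8), "2-3"))  # [2, 3]
    print(format_cpulist([0, 1, 2, 3, 8]))  # 0-3,8
```

Sets are used so that duplicates and ordering in the request do not change the result, matching the idea that the request is masked against the cpuset rather than checked entry by entry.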