diff options
Diffstat (limited to 'Documentation/cgroups/cpusets.txt')
-rw-r--r-- | Documentation/cgroups/cpusets.txt | 38 |
1 files changed, 19 insertions, 19 deletions
diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt index 4160df82b3f5..51682ab2dd1a 100644 --- a/Documentation/cgroups/cpusets.txt +++ b/Documentation/cgroups/cpusets.txt | |||
@@ -42,7 +42,7 @@ Nodes to a set of tasks. In this document "Memory Node" refers to | |||
42 | an on-line node that contains memory. | 42 | an on-line node that contains memory. |
43 | 43 | ||
44 | Cpusets constrain the CPU and Memory placement of tasks to only | 44 | Cpusets constrain the CPU and Memory placement of tasks to only |
45 | the resources within a tasks current cpuset. They form a nested | 45 | the resources within a task's current cpuset. They form a nested |
46 | hierarchy visible in a virtual file system. These are the essential | 46 | hierarchy visible in a virtual file system. These are the essential |
47 | hooks, beyond what is already present, required to manage dynamic | 47 | hooks, beyond what is already present, required to manage dynamic |
48 | job placement on large systems. | 48 | job placement on large systems. |
@@ -53,11 +53,11 @@ Documentation/cgroups/cgroups.txt. | |||
53 | Requests by a task, using the sched_setaffinity(2) system call to | 53 | Requests by a task, using the sched_setaffinity(2) system call to |
54 | include CPUs in its CPU affinity mask, and using the mbind(2) and | 54 | include CPUs in its CPU affinity mask, and using the mbind(2) and |
55 | set_mempolicy(2) system calls to include Memory Nodes in its memory | 55 | set_mempolicy(2) system calls to include Memory Nodes in its memory |
56 | policy, are both filtered through that tasks cpuset, filtering out any | 56 | policy, are both filtered through that task's cpuset, filtering out any |
57 | CPUs or Memory Nodes not in that cpuset. The scheduler will not | 57 | CPUs or Memory Nodes not in that cpuset. The scheduler will not |
58 | schedule a task on a CPU that is not allowed in its cpus_allowed | 58 | schedule a task on a CPU that is not allowed in its cpus_allowed |
59 | vector, and the kernel page allocator will not allocate a page on a | 59 | vector, and the kernel page allocator will not allocate a page on a |
60 | node that is not allowed in the requesting tasks mems_allowed vector. | 60 | node that is not allowed in the requesting task's mems_allowed vector. |
61 | 61 | ||
62 | User level code may create and destroy cpusets by name in the cgroup | 62 | User level code may create and destroy cpusets by name in the cgroup |
63 | virtual file system, manage the attributes and permissions of these | 63 | virtual file system, manage the attributes and permissions of these |
@@ -121,9 +121,9 @@ Cpusets extends these two mechanisms as follows: | |||
121 | - Each task in the system is attached to a cpuset, via a pointer | 121 | - Each task in the system is attached to a cpuset, via a pointer |
122 | in the task structure to a reference counted cgroup structure. | 122 | in the task structure to a reference counted cgroup structure. |
123 | - Calls to sched_setaffinity are filtered to just those CPUs | 123 | - Calls to sched_setaffinity are filtered to just those CPUs |
124 | allowed in that tasks cpuset. | 124 | allowed in that task's cpuset. |
125 | - Calls to mbind and set_mempolicy are filtered to just | 125 | - Calls to mbind and set_mempolicy are filtered to just |
126 | those Memory Nodes allowed in that tasks cpuset. | 126 | those Memory Nodes allowed in that task's cpuset. |
127 | - The root cpuset contains all the systems CPUs and Memory | 127 | - The root cpuset contains all the systems CPUs and Memory |
128 | Nodes. | 128 | Nodes. |
129 | - For any cpuset, one can define child cpusets containing a subset | 129 | - For any cpuset, one can define child cpusets containing a subset |
@@ -141,11 +141,11 @@ into the rest of the kernel, none in performance critical paths: | |||
141 | - in init/main.c, to initialize the root cpuset at system boot. | 141 | - in init/main.c, to initialize the root cpuset at system boot. |
142 | - in fork and exit, to attach and detach a task from its cpuset. | 142 | - in fork and exit, to attach and detach a task from its cpuset. |
143 | - in sched_setaffinity, to mask the requested CPUs by what's | 143 | - in sched_setaffinity, to mask the requested CPUs by what's |
144 | allowed in that tasks cpuset. | 144 | allowed in that task's cpuset. |
145 | - in sched.c migrate_live_tasks(), to keep migrating tasks within | 145 | - in sched.c migrate_live_tasks(), to keep migrating tasks within |
146 | the CPUs allowed by their cpuset, if possible. | 146 | the CPUs allowed by their cpuset, if possible. |
147 | - in the mbind and set_mempolicy system calls, to mask the requested | 147 | - in the mbind and set_mempolicy system calls, to mask the requested |
148 | Memory Nodes by what's allowed in that tasks cpuset. | 148 | Memory Nodes by what's allowed in that task's cpuset. |
149 | - in page_alloc.c, to restrict memory to allowed nodes. | 149 | - in page_alloc.c, to restrict memory to allowed nodes. |
150 | - in vmscan.c, to restrict page recovery to the current cpuset. | 150 | - in vmscan.c, to restrict page recovery to the current cpuset. |
151 | 151 | ||
@@ -155,7 +155,7 @@ new system calls are added for cpusets - all support for querying and | |||
155 | modifying cpusets is via this cpuset file system. | 155 | modifying cpusets is via this cpuset file system. |
156 | 156 | ||
157 | The /proc/<pid>/status file for each task has four added lines, | 157 | The /proc/<pid>/status file for each task has four added lines, |
158 | displaying the tasks cpus_allowed (on which CPUs it may be scheduled) | 158 | displaying the task's cpus_allowed (on which CPUs it may be scheduled) |
159 | and mems_allowed (on which Memory Nodes it may obtain memory), | 159 | and mems_allowed (on which Memory Nodes it may obtain memory), |
160 | in the two formats seen in the following example: | 160 | in the two formats seen in the following example: |
161 | 161 | ||
@@ -323,17 +323,17 @@ stack segment pages of a task. | |||
323 | 323 | ||
324 | By default, both kinds of memory spreading are off, and memory | 324 | By default, both kinds of memory spreading are off, and memory |
325 | pages are allocated on the node local to where the task is running, | 325 | pages are allocated on the node local to where the task is running, |
326 | except perhaps as modified by the tasks NUMA mempolicy or cpuset | 326 | except perhaps as modified by the task's NUMA mempolicy or cpuset |
327 | configuration, so long as sufficient free memory pages are available. | 327 | configuration, so long as sufficient free memory pages are available. |
328 | 328 | ||
329 | When new cpusets are created, they inherit the memory spread settings | 329 | When new cpusets are created, they inherit the memory spread settings |
330 | of their parent. | 330 | of their parent. |
331 | 331 | ||
332 | Setting memory spreading causes allocations for the affected page | 332 | Setting memory spreading causes allocations for the affected page |
333 | or slab caches to ignore the tasks NUMA mempolicy and be spread | 333 | or slab caches to ignore the task's NUMA mempolicy and be spread |
334 | instead. Tasks using mbind() or set_mempolicy() calls to set NUMA | 334 | instead. Tasks using mbind() or set_mempolicy() calls to set NUMA |
335 | mempolicies will not notice any change in these calls as a result of | 335 | mempolicies will not notice any change in these calls as a result of |
336 | their containing tasks memory spread settings. If memory spreading | 336 | their containing task's memory spread settings. If memory spreading |
337 | is turned off, then the currently specified NUMA mempolicy once again | 337 | is turned off, then the currently specified NUMA mempolicy once again |
338 | applies to memory page allocations. | 338 | applies to memory page allocations. |
339 | 339 | ||
@@ -357,7 +357,7 @@ pages from the node returned by cpuset_mem_spread_node(). | |||
357 | 357 | ||
358 | The cpuset_mem_spread_node() routine is also simple. It uses the | 358 | The cpuset_mem_spread_node() routine is also simple. It uses the |
359 | value of a per-task rotor cpuset_mem_spread_rotor to select the next | 359 | value of a per-task rotor cpuset_mem_spread_rotor to select the next |
360 | node in the current tasks mems_allowed to prefer for the allocation. | 360 | node in the current task's mems_allowed to prefer for the allocation. |
361 | 361 | ||
362 | This memory placement policy is also known (in other contexts) as | 362 | This memory placement policy is also known (in other contexts) as |
363 | round-robin or interleave. | 363 | round-robin or interleave. |
@@ -594,7 +594,7 @@ is attached, is subtle. | |||
594 | If a cpuset has its Memory Nodes modified, then for each task attached | 594 | If a cpuset has its Memory Nodes modified, then for each task attached |
595 | to that cpuset, the next time that the kernel attempts to allocate | 595 | to that cpuset, the next time that the kernel attempts to allocate |
596 | a page of memory for that task, the kernel will notice the change | 596 | a page of memory for that task, the kernel will notice the change |
597 | in the tasks cpuset, and update its per-task memory placement to | 597 | in the task's cpuset, and update its per-task memory placement to |
598 | remain within the new cpusets memory placement. If the task was using | 598 | remain within the new cpusets memory placement. If the task was using |
599 | mempolicy MPOL_BIND, and the nodes to which it was bound overlap with | 599 | mempolicy MPOL_BIND, and the nodes to which it was bound overlap with |
600 | its new cpuset, then the task will continue to use whatever subset | 600 | its new cpuset, then the task will continue to use whatever subset |
@@ -603,13 +603,13 @@ was using MPOL_BIND and now none of its MPOL_BIND nodes are allowed | |||
603 | in the new cpuset, then the task will be essentially treated as if it | 603 | in the new cpuset, then the task will be essentially treated as if it |
604 | was MPOL_BIND bound to the new cpuset (even though its NUMA placement, | 604 | was MPOL_BIND bound to the new cpuset (even though its NUMA placement, |
605 | as queried by get_mempolicy(), doesn't change). If a task is moved | 605 | as queried by get_mempolicy(), doesn't change). If a task is moved |
606 | from one cpuset to another, then the kernel will adjust the tasks | 606 | from one cpuset to another, then the kernel will adjust the task's |
607 | memory placement, as above, the next time that the kernel attempts | 607 | memory placement, as above, the next time that the kernel attempts |
608 | to allocate a page of memory for that task. | 608 | to allocate a page of memory for that task. |
609 | 609 | ||
610 | If a cpuset has its 'cpuset.cpus' modified, then each task in that cpuset | 610 | If a cpuset has its 'cpuset.cpus' modified, then each task in that cpuset |
611 | will have its allowed CPU placement changed immediately. Similarly, | 611 | will have its allowed CPU placement changed immediately. Similarly, |
612 | if a tasks pid is written to another cpusets 'cpuset.tasks' file, then its | 612 | if a task's pid is written to another cpusets 'cpuset.tasks' file, then its |
613 | allowed CPU placement is changed immediately. If such a task had been | 613 | allowed CPU placement is changed immediately. If such a task had been |
614 | bound to some subset of its cpuset using the sched_setaffinity() call, | 614 | bound to some subset of its cpuset using the sched_setaffinity() call, |
615 | the task will be allowed to run on any CPU allowed in its new cpuset, | 615 | the task will be allowed to run on any CPU allowed in its new cpuset, |
@@ -626,16 +626,16 @@ cpusets memory placement policy 'cpuset.mems' subsequently changes. | |||
626 | If the cpuset flag file 'cpuset.memory_migrate' is set true, then when | 626 | If the cpuset flag file 'cpuset.memory_migrate' is set true, then when |
627 | tasks are attached to that cpuset, any pages that task had | 627 | tasks are attached to that cpuset, any pages that task had |
628 | allocated to it on nodes in its previous cpuset are migrated | 628 | allocated to it on nodes in its previous cpuset are migrated |
629 | to the tasks new cpuset. The relative placement of the page within | 629 | to the task's new cpuset. The relative placement of the page within |
630 | the cpuset is preserved during these migration operations if possible. | 630 | the cpuset is preserved during these migration operations if possible. |
631 | For example if the page was on the second valid node of the prior cpuset | 631 | For example if the page was on the second valid node of the prior cpuset |
632 | then the page will be placed on the second valid node of the new cpuset. | 632 | then the page will be placed on the second valid node of the new cpuset. |
633 | 633 | ||
634 | Also if 'cpuset.memory_migrate' is set true, then if that cpusets | 634 | Also if 'cpuset.memory_migrate' is set true, then if that cpuset's |
635 | 'cpuset.mems' file is modified, pages allocated to tasks in that | 635 | 'cpuset.mems' file is modified, pages allocated to tasks in that |
636 | cpuset, that were on nodes in the previous setting of 'cpuset.mems', | 636 | cpuset, that were on nodes in the previous setting of 'cpuset.mems', |
637 | will be moved to nodes in the new setting of 'mems.' | 637 | will be moved to nodes in the new setting of 'mems.' |
638 | Pages that were not in the tasks prior cpuset, or in the cpusets | 638 | Pages that were not in the task's prior cpuset, or in the cpuset's |
639 | prior 'cpuset.mems' setting, will not be moved. | 639 | prior 'cpuset.mems' setting, will not be moved. |
640 | 640 | ||
641 | There is an exception to the above. If hotplug functionality is used | 641 | There is an exception to the above. If hotplug functionality is used |
@@ -655,7 +655,7 @@ There is a second exception to the above. GFP_ATOMIC requests are | |||
655 | kernel internal allocations that must be satisfied, immediately. | 655 | kernel internal allocations that must be satisfied, immediately. |
656 | The kernel may drop some request, in rare cases even panic, if a | 656 | The kernel may drop some request, in rare cases even panic, if a |
657 | GFP_ATOMIC alloc fails. If the request cannot be satisfied within | 657 | GFP_ATOMIC alloc fails. If the request cannot be satisfied within |
658 | the current tasks cpuset, then we relax the cpuset, and look for | 658 | the current task's cpuset, then we relax the cpuset, and look for |
659 | memory anywhere we can find it. It's better to violate the cpuset | 659 | memory anywhere we can find it. It's better to violate the cpuset |
660 | than stress the kernel. | 660 | than stress the kernel. |
661 | 661 | ||