Diffstat (limited to 'Documentation/cpusets.txt')
-rw-r--r-- | Documentation/cpusets.txt | 98 |
1 files changed, 84 insertions, 14 deletions
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index ad2bb3b3acc1..fb7b361e6eea 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -8,6 +8,7 @@ Portions Copyright (c) 2004-2006 Silicon Graphics, Inc. | |||
8 | Modified by Paul Jackson <pj@sgi.com> | 8 | Modified by Paul Jackson <pj@sgi.com> |
9 | Modified by Christoph Lameter <clameter@sgi.com> | 9 | Modified by Christoph Lameter <clameter@sgi.com> |
10 | Modified by Paul Menage <menage@google.com> | 10 | Modified by Paul Menage <menage@google.com> |
11 | Modified by Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> | ||
11 | 12 | ||
12 | CONTENTS: | 13 | CONTENTS: |
13 | ========= | 14 | ========= |
@@ -20,7 +21,8 @@ CONTENTS: | |||
20 | 1.5 What is memory_pressure ? | 21 | 1.5 What is memory_pressure ? |
21 | 1.6 What is memory spread ? | 22 | 1.6 What is memory spread ? |
22 | 1.7 What is sched_load_balance ? | 23 | 1.7 What is sched_load_balance ? |
23 | 1.8 How do I use cpusets ? | 24 | 1.8 What is sched_relax_domain_level ? |
25 | 1.9 How do I use cpusets ? | ||
24 | 2. Usage Examples and Syntax | 26 | 2. Usage Examples and Syntax |
25 | 2.1 Basic Usage | 27 | 2.1 Basic Usage |
26 | 2.2 Adding/removing cpus | 28 | 2.2 Adding/removing cpus |
@@ -169,6 +171,7 @@ files describing that cpuset: | |||
169 | - memory_migrate flag: if set, move pages to cpusets nodes | 171 | - memory_migrate flag: if set, move pages to cpusets nodes |
170 | - cpu_exclusive flag: is cpu placement exclusive? | 172 | - cpu_exclusive flag: is cpu placement exclusive? |
171 | - mem_exclusive flag: is memory placement exclusive? | 173 | - mem_exclusive flag: is memory placement exclusive? |
174 | - mem_hardwall flag: is memory allocation hardwalled? | ||
172 | - memory_pressure: measure of how much paging pressure in cpuset | 175 | - memory_pressure: measure of how much paging pressure in cpuset |
173 | 176 | ||
174 | In addition, the root cpuset only has the following file: | 177 | In addition, the root cpuset only has the following file: |
@@ -220,17 +223,18 @@ If a cpuset is cpu or mem exclusive, no other cpuset, other than | |||
220 | a direct ancestor or descendent, may share any of the same CPUs or | 223 | a direct ancestor or descendent, may share any of the same CPUs or |
221 | Memory Nodes. | 224 | Memory Nodes. |
222 | 225 | ||
223 | A cpuset that is mem_exclusive restricts kernel allocations for | 226 | A cpuset that is mem_exclusive *or* mem_hardwall is "hardwalled", |
224 | page, buffer and other data commonly shared by the kernel across | 227 | i.e. it restricts kernel allocations for page, buffer and other data |
225 | multiple users. All cpusets, whether mem_exclusive or not, restrict | 228 | commonly shared by the kernel across multiple users. All cpusets, |
226 | allocations of memory for user space. This enables configuring a | 229 | whether hardwalled or not, restrict allocations of memory for user |
227 | system so that several independent jobs can share common kernel data, | 230 | space. This enables configuring a system so that several independent |
228 | such as file system pages, while isolating each jobs user allocation in | 231 | jobs can share common kernel data, such as file system pages, while |
229 | its own cpuset. To do this, construct a large mem_exclusive cpuset to | 232 | isolating each job's user allocation in its own cpuset. To do this, |
230 | hold all the jobs, and construct child, non-mem_exclusive cpusets for | 233 | construct a large mem_exclusive cpuset to hold all the jobs, and |
231 | each individual job. Only a small amount of typical kernel memory, | 234 | construct child, non-mem_exclusive cpusets for each individual job. |
232 | such as requests from interrupt handlers, is allowed to be taken | 235 | Only a small amount of typical kernel memory, such as requests from |
233 | outside even a mem_exclusive cpuset. | 236 | interrupt handlers, is allowed to be taken outside even a |
237 | mem_exclusive cpuset. | ||
234 | 238 | ||
235 | 239 | ||
236 | 1.5 What is memory_pressure ? | 240 | 1.5 What is memory_pressure ? |
@@ -497,7 +501,73 @@ the cpuset code to update these sched domains, it compares the new | |||
497 | partition requested with the current, and updates its sched domains, | 501 | partition requested with the current, and updates its sched domains, |
498 | removing the old and adding the new, for each change. | 502 | removing the old and adding the new, for each change. |
499 | 503 | ||
500 | 1.8 How do I use cpusets ? | 504 | |
505 | 1.8 What is sched_relax_domain_level ? | ||
506 | -------------------------------------- | ||
507 | |||
508 | Within a sched domain, the scheduler migrates tasks in 2 ways: periodic | ||
509 | load balance on tick, and at the time of certain scheduling events. | ||
510 | |||
511 | When a task is woken up, the scheduler tries to move it to an idle CPU. | ||
512 | For example, if a task A running on CPU X activates another task B | ||
513 | on the same CPU X, and if CPU Y is X's sibling and is idle, | ||
514 | then the scheduler migrates task B to CPU Y so that task B can start on | ||
515 | CPU Y without waiting for task A on CPU X. | ||
516 | |||
517 | And if a CPU runs out of tasks in its runqueue, the CPU tries to pull | ||
518 | extra tasks from other busy CPUs to help them before it goes | ||
519 | idle. | ||
520 | |||
521 | Of course, finding movable tasks and/or idle CPUs has a searching cost, | ||
522 | so the scheduler might not search all CPUs in the domain every time. | ||
523 | In fact, on some architectures, the searching range on events is | ||
524 | limited to the same socket or node where the CPU is located, | ||
525 | while the load balance on tick searches all. | ||
526 | |||
527 | For example, assume CPU Z is relatively far from CPU X. Even if CPU Z | ||
528 | is idle while CPU X and its siblings are busy, the scheduler can't migrate | ||
529 | woken task B from X to Z since Z is out of its searching range. | ||
530 | As a result, task B on CPU X needs to wait for task A or for load balance | ||
531 | on the next tick. For some applications in special situations, waiting | ||
532 | 1 tick may be too long. | ||
533 | |||
534 | The 'sched_relax_domain_level' file allows you to request changing | ||
535 | this searching range as you like. This file takes an int value which | ||
536 | ideally indicates the size of the searching range in levels as follows; | ||
537 | otherwise it holds the initial value -1, meaning the cpuset has no request. | ||
538 | |||
539 | -1 : no request. use system default or follow request of others. | ||
540 | 0 : no search. | ||
541 | 1 : search siblings (hyperthreads in a core). | ||
542 | 2 : search cores in a package. | ||
543 | 3 : search cpus in a node [= system wide on non-NUMA system] | ||
544 | ( 4 : search nodes in a chunk of node [on NUMA system] ) | ||
545 | ( 5~ : search system wide [on NUMA system]) | ||
546 | |||
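The level values listed above could be applied from a script roughly as
follows. This is only a minimal sketch: the helper names set_relax_level()
and get_relax_level() are hypothetical, the cpuset directory passed in is an
assumption about where the cpuset filesystem is mounted on your system, and
the lower bound of -1 is taken from the list above rather than from the
kernel's own validation. Only the file name 'sched_relax_domain_level' comes
from the text.

```python
import os

# File name taken from the documentation text; everything else is illustrative.
RELAX_FILE = "sched_relax_domain_level"


def set_relax_level(cpuset_dir, level):
    """Write a requested search level into a cpuset directory (hypothetical helper)."""
    # -1 means "no request"; the text lists no value below it.
    if not isinstance(level, int) or isinstance(level, bool) or level < -1:
        raise ValueError("level must be an integer >= -1")
    path = os.path.join(cpuset_dir, RELAX_FILE)
    with open(path, "w") as f:
        f.write(f"{level}\n")
    return path


def get_relax_level(cpuset_dir):
    """Read the current request back as an integer (hypothetical helper)."""
    with open(os.path.join(cpuset_dir, RELAX_FILE)) as f:
        return int(f.read().strip())
```

On a real system this would be called with the directory of an existing
cpuset, for example set_relax_level("/dev/cpuset/my_cpuset", 2), where both
the mount point and the cpuset name are assumptions, and it would normally
need root privileges.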
547 | This file is per-cpuset and affects the sched domain that the cpuset | ||
548 | belongs to. Therefore if the flag 'sched_load_balance' of a cpuset | ||
549 | is disabled, then 'sched_relax_domain_level' has no effect since | ||
550 | there is no sched domain belonging to the cpuset. | ||
551 | |||
552 | If multiple cpusets overlap and hence form a single sched | ||
553 | domain, the largest value among them is used. Be careful: if one | ||
554 | requests 0 and the others are -1, then 0 is used. | ||
555 | |||
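The rule for overlapping cpusets can be illustrated with a small sketch. It
models only the rule as stated in the text (the largest request wins, and -1
means "no request"); the function name effective_relax_level() is
hypothetical and this is not the kernel's implementation.

```python
NO_REQUEST = -1  # initial value of sched_relax_domain_level, per the text


def effective_relax_level(requests):
    """Return the level used for one sched domain shared by several cpusets.

    `requests` holds the sched_relax_domain_level value of each overlapping
    cpuset. Because -1 is the smallest allowed value, taking the maximum
    means any explicit request, including 0, overrides "no request".
    """
    if not requests:
        return NO_REQUEST
    return max(requests)


print(effective_relax_level([-1, 0, -1]))  # 0: "no search" beats "no request"
print(effective_relax_level([1, 3, 2]))    # 3: the largest request is used
print(effective_relax_level([-1, -1]))     # -1: nobody asked, system default applies
```

The first call shows the pitfall the text warns about: a single cpuset
requesting 0 disables the event-time search for every cpuset sharing that
sched domain, even though the others made no request.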
556 | Note that modifying this file will have both good and bad effects, | ||
557 | and whether it is acceptable or not depends on your situation. | ||
558 | Don't modify this file if you are not sure. | ||
559 | |||
560 | If your situation is: | ||
561 | - The migration costs between CPUs can be assumed to be considerably | ||
562 | small (for you) due to your special application's behavior or | ||
563 | special hardware support for CPU cache etc. | ||
564 | - The searching cost has no impact (for you), or you can make | ||
565 | the searching cost small enough by keeping the cpuset compact etc. | ||
566 | - Low latency is required even if it sacrifices cache hit rate etc. | ||
567 | then increasing 'sched_relax_domain_level' would benefit you. | ||
568 | |||
569 | |||
570 | 1.9 How do I use cpusets ? | ||
501 | -------------------------- | 571 | -------------------------- |
502 | 572 | ||
503 | In order to minimize the impact of cpusets on critical kernel | 573 | In order to minimize the impact of cpusets on critical kernel |
@@ -639,7 +709,7 @@ Now you want to do something with this cpuset. | |||
639 | 709 | ||
640 | In this directory you can find several files: | 710 | In this directory you can find several files: |
641 | # ls | 711 | # ls |
642 | cpus cpu_exclusive mems mem_exclusive tasks | 712 | cpus cpu_exclusive mems mem_exclusive mem_hardwall tasks |
643 | 713 | ||
644 | Reading them will give you information about the state of this cpuset: | 714 | Reading them will give you information about the state of this cpuset: |
645 | the CPUs and Memory Nodes it can use, the processes that are using | 715 | the CPUs and Memory Nodes it can use, the processes that are using |