Diffstat (limited to 'Documentation/cpusets.txt')
-rw-r--r-- Documentation/cpusets.txt | 98
1 files changed, 84 insertions, 14 deletions
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index ad2bb3b3acc1..fb7b361e6eea 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -8,6 +8,7 @@ Portions Copyright (c) 2004-2006 Silicon Graphics, Inc.
 Modified by Paul Jackson <pj@sgi.com>
 Modified by Christoph Lameter <clameter@sgi.com>
 Modified by Paul Menage <menage@google.com>
+Modified by Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
 
 CONTENTS:
 =========
@@ -20,7 +21,8 @@ CONTENTS:
   1.5 What is memory_pressure ?
   1.6 What is memory spread ?
   1.7 What is sched_load_balance ?
-  1.8 How do I use cpusets ?
+  1.8 What is sched_relax_domain_level ?
+  1.9 How do I use cpusets ?
 2. Usage Examples and Syntax
   2.1 Basic Usage
   2.2 Adding/removing cpus
@@ -169,6 +171,7 @@ files describing that cpuset:
  - memory_migrate flag: if set, move pages to cpusets nodes
  - cpu_exclusive flag: is cpu placement exclusive?
  - mem_exclusive flag: is memory placement exclusive?
+ - mem_hardwall flag: is memory allocation hardwalled
  - memory_pressure: measure of how much paging pressure in cpuset
 
 In addition, the root cpuset only has the following file:
@@ -220,17 +223,18 @@ If a cpuset is cpu or mem exclusive, no other cpuset, other than
 a direct ancestor or descendent, may share any of the same CPUs or
 Memory Nodes.
 
-A cpuset that is mem_exclusive restricts kernel allocations for
-page, buffer and other data commonly shared by the kernel across
-multiple users. All cpusets, whether mem_exclusive or not, restrict
-allocations of memory for user space. This enables configuring a
-system so that several independent jobs can share common kernel data,
-such as file system pages, while isolating each jobs user allocation in
-its own cpuset. To do this, construct a large mem_exclusive cpuset to
-hold all the jobs, and construct child, non-mem_exclusive cpusets for
-each individual job. Only a small amount of typical kernel memory,
-such as requests from interrupt handlers, is allowed to be taken
-outside even a mem_exclusive cpuset.
+A cpuset that is mem_exclusive *or* mem_hardwall is "hardwalled",
+i.e. it restricts kernel allocations for page, buffer and other data
+commonly shared by the kernel across multiple users. All cpusets,
+whether hardwalled or not, restrict allocations of memory for user
+space. This enables configuring a system so that several independent
+jobs can share common kernel data, such as file system pages, while
+isolating each job's user allocation in its own cpuset. To do this,
+construct a large mem_exclusive cpuset to hold all the jobs, and
+construct child, non-mem_exclusive cpusets for each individual job.
+Only a small amount of typical kernel memory, such as requests from
+interrupt handlers, is allowed to be taken outside even a
+mem_exclusive cpuset.
 
 
 1.5 What is memory_pressure ?
@@ -497,7 +501,73 @@ the cpuset code to update these sched domains, it compares the new
 partition requested with the current, and updates its sched domains,
 removing the old and adding the new, for each change.
 
-1.8 How do I use cpusets ?
+
+1.8 What is sched_relax_domain_level ?
+--------------------------------------
+
+In sched domain, the scheduler migrates tasks in 2 ways; periodic load
+balance on tick, and at time of some schedule events.
+
+When a task is woken up, scheduler try to move the task on idle CPU.
+For example, if a task A running on CPU X activates another task B
+on the same CPU X, and if CPU Y is X's sibling and performing idle,
+then scheduler migrate task B to CPU Y so that task B can start on
+CPU Y without waiting task A on CPU X.
+
+And if a CPU run out of tasks in its runqueue, the CPU try to pull
+extra tasks from other busy CPUs to help them before it is going to
+be idle.
+
+Of course it takes some searching cost to find movable tasks and/or
+idle CPUs, the scheduler might not search all CPUs in the domain
+everytime. In fact, in some architectures, the searching ranges on
+events are limited in the same socket or node where the CPU locates,
+while the load balance on tick searchs all.
+
+For example, assume CPU Z is relatively far from CPU X. Even if CPU Z
+is idle while CPU X and the siblings are busy, scheduler can't migrate
+woken task B from X to Z since it is out of its searching range.
+As the result, task B on CPU X need to wait task A or wait load balance
+on the next tick. For some applications in special situation, waiting
+1 tick may be too long.
+
+The 'sched_relax_domain_level' file allows you to request changing
+this searching range as you like. This file takes int value which
+indicates size of searching range in levels ideally as follows,
+otherwise initial value -1 that indicates the cpuset has no request.
+
+  -1  : no request. use system default or follow request of others.
+   0  : no search.
+   1  : search siblings (hyperthreads in a core).
+   2  : search cores in a package.
+   3  : search cpus in a node [= system wide on non-NUMA system]
+ ( 4  : search nodes in a chunk of node [on NUMA system] )
+ ( 5~ : search system wide [on NUMA system])
+
+This file is per-cpuset and affect the sched domain where the cpuset
+belongs to. Therefore if the flag 'sched_load_balance' of a cpuset
+is disabled, then 'sched_relax_domain_level' have no effect since
+there is no sched domain belonging the cpuset.
+
+If multiple cpusets are overlapping and hence they form a single sched
+domain, the largest value among those is used. Be careful, if one
+requests 0 and others are -1 then 0 is used.
+
+Note that modifying this file will have both good and bad effects,
+and whether it is acceptable or not will be depend on your situation.
+Don't modify this file if you are not sure.
+
+If your situation is:
+ - The migration costs between each cpu can be assumed considerably
+   small(for you) due to your special application's behavior or
+   special hardware support for CPU cache etc.
+ - The searching cost doesn't have impact(for you) or you can make
+   the searching cost enough small by managing cpuset to compact etc.
+ - The latency is required even it sacrifices cache hit rate etc.
+then increasing 'sched_relax_domain_level' would benefit you.
+
+
+1.9 How do I use cpusets ?
 --------------------------
 
 In order to minimize the impact of cpusets on critical kernel
@@ -639,7 +709,7 @@ Now you want to do something with this cpuset.
 
 In this directory you can find several files:
 # ls
-cpus cpu_exclusive mems mem_exclusive tasks
+cpus cpu_exclusive mems mem_exclusive mem_hardwall tasks
 
 Reading them will give you information about the state of this cpuset:
 the CPUs and Memory Nodes it can use, the processes that are using
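
Note (not part of the patch): the rule that section 1.8 of the patched
text gives for overlapping cpusets can be sketched in a few lines. This is
only an illustration of the documented behaviour, namely that the largest
requested value is used and that -1 means "no request". The names
NO_REQUEST and effective_relax_level are hypothetical and do not come from
the kernel source.

```python
# Illustrative sketch only; these names are hypothetical, not kernel code.
NO_REQUEST = -1  # initial value of sched_relax_domain_level: no request


def effective_relax_level(requests):
    """Return the level used by one sched domain shared by several cpusets.

    `requests` is an iterable with the sched_relax_domain_level value of
    each overlapping cpuset. Per the documentation text, the largest value
    among them is used, so a request of 0 overrides any number of -1s.
    """
    level = NO_REQUEST
    for value in requests:
        level = max(level, value)
    return level


if __name__ == "__main__":
    # One cpuset asks for "no search" (0), the others make no request.
    print(effective_relax_level([-1, 0, -1]))
    # Requests for sibling, node and package search: the widest one wins.
    print(effective_relax_level([1, 3, 2]))
```

The first call shows the caveat the text warns about: a single cpuset
requesting 0 turns the search off for the whole shared sched domain even
though the other cpusets made no request.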