author: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> 2008-04-15 01:03:17 -0400
committer: Ingo Molnar <mingo@elte.hu> 2008-04-19 13:45:00 -0400
commit: 4d5f35533fb9b2cd553cec6611195bcbfb7ffd84
tree: 9b705cea38a00fe6c3eda29960b957f1b77e50ff
parent: b758149c02638146a835f42097dd1950a6cae638
sched, cpuset: customize sched domains, docs
This patch introduces a new cpuset feature: sched domain customization.
This version provides a per-cpuset file, 'sched_relax_domain_level', that
lets us change the scheduler's searching range, which limits how many CPUs
the scheduler searches on certain scheduling events, such as a task wakeup
or a runqueue running empty.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/cpusets.txt | 72
1 files changed, 70 insertions, 2 deletions
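
The rule for overlapping cpusets described in the documentation text of this
patch (the largest requested value wins, and -1 means no request) can be
sketched in a few lines of Python. This is an illustrative model only, not
kernel code: the names NO_REQUEST, LEVEL_MEANING, effective_relax_level and
describe_level are hypothetical and do not appear in the patch.

```python
# Illustrative model of the documented semantics; not kernel code.
# All names below are hypothetical and chosen for this sketch only.

NO_REQUEST = -1  # initial value: the cpuset has no request

# Level meanings as listed in the documentation added by this patch.
LEVEL_MEANING = {
    -1: "no request (use system default or follow request of others)",
    0: "no search",
    1: "search siblings (hyperthreads in a core)",
    2: "search cores in a package",
    3: "search cpus in a node (system wide on non-NUMA system)",
    4: "search nodes in a chunk of node (on NUMA system)",
}


def effective_relax_level(requests):
    """Return the level used for a sched domain shared by overlapping cpusets.

    The documentation says the largest value among the cpusets is used.
    Because -1 is the smallest valid value, any explicit request,
    including 0, overrides "no request".
    """
    requests = list(requests)
    if not requests:
        return NO_REQUEST
    return max(requests)


def describe_level(level):
    """Map a level to its documented meaning (5 and above: system wide)."""
    if level < NO_REQUEST:
        raise ValueError("level must be -1 or greater")
    if level >= 5:
        return "search system wide (on NUMA system)"
    return LEVEL_MEANING[level]
```

For instance, effective_relax_level([-1, 0, -1]) yields 0, which matches the
documentation's warning that a request of 0 overrides cpusets that made no
request.
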
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index ad2bb3b3acc1..aa854b9b18cd 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -8,6 +8,7 @@ Portions Copyright (c) 2004-2006 Silicon Graphics, Inc.
 Modified by Paul Jackson <pj@sgi.com>
 Modified by Christoph Lameter <clameter@sgi.com>
 Modified by Paul Menage <menage@google.com>
+Modified by Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
 
 CONTENTS:
 =========
@@ -20,7 +21,8 @@ CONTENTS:
   1.5 What is memory_pressure ?
   1.6 What is memory spread ?
   1.7 What is sched_load_balance ?
-  1.8 How do I use cpusets ?
+  1.8 What is sched_relax_domain_level ?
+  1.9 How do I use cpusets ?
 2. Usage Examples and Syntax
   2.1 Basic Usage
   2.2 Adding/removing cpus
@@ -497,7 +499,73 @@ the cpuset code to update these sched domains, it compares the new
 partition requested with the current, and updates its sched domains,
 removing the old and adding the new, for each change.
 
-1.8 How do I use cpusets ?
+
+1.8 What is sched_relax_domain_level ?
+--------------------------------------
+
+Within a sched domain, the scheduler migrates tasks in two ways: periodic
+load balancing on tick, and at the time of certain scheduling events.
+
+When a task is woken up, the scheduler tries to move it to an idle CPU.
+For example, if task A running on CPU X activates another task B
+on the same CPU X, and CPU Y is X's sibling and is idle, then the
+scheduler migrates task B to CPU Y so that task B can start on
+CPU Y without waiting for task A on CPU X.
+
+And if a CPU runs out of tasks in its runqueue, the CPU tries to pull
+extra tasks from other busy CPUs to help them before it goes
+idle.
+
+Of course, finding movable tasks and/or idle CPUs has a searching cost,
+so the scheduler might not search all CPUs in the domain every time.
+In fact, on some architectures, the searching range on events is
+limited to the same socket or node where the CPU is located,
+while the load balance on tick searches all.
+
+For example, assume CPU Z is relatively far from CPU X. Even if CPU Z
+is idle while CPU X and its siblings are busy, the scheduler can't
+migrate woken task B from X to Z since Z is out of its searching range.
+As a result, task B on CPU X needs to wait for task A or for the load
+balance on the next tick. For some applications in special situations,
+waiting 1 tick may be too long.
+
+The 'sched_relax_domain_level' file allows you to request a change to
+this searching range. The file takes an int value which ideally
+indicates the size of the searching range in levels, as follows;
+otherwise it holds the initial value -1, meaning the cpuset has no request.
+
+  -1  : no request. use system default or follow request of others.
+   0  : no search.
+   1  : search siblings (hyperthreads in a core).
+   2  : search cores in a package.
+   3  : search cpus in a node [= system wide on non-NUMA system]
+ ( 4  : search nodes in a chunk of node [on NUMA system] )
+ ( 5~ : search system wide [on NUMA system])
+
+This file is per-cpuset and affects the sched domain that the cpuset
+belongs to. Therefore, if the 'sched_load_balance' flag of a cpuset
+is disabled, 'sched_relax_domain_level' has no effect, since
+there is no sched domain belonging to the cpuset.
+
+If multiple cpusets overlap and hence form a single sched
+domain, the largest value among them is used. Be careful: if one
+requests 0 and the others are -1, then 0 is used.
+
+Note that modifying this file will have both good and bad effects,
+and whether it is acceptable or not depends on your situation.
+Don't modify this file if you are not sure.
+
+If your situation is:
+ - The migration costs between CPUs can be assumed to be considerably
+   small (for you) due to your special application's behavior or
+   special hardware support for CPU cache etc.
+ - The searching cost has no impact (for you), or you can make
+   the searching cost small enough by keeping the cpuset compact etc.
+ - Low latency is required even if it sacrifices cache hit rate etc.
+then increasing 'sched_relax_domain_level' would benefit you.
+
+
+1.9 How do I use cpusets ?
 --------------------------
 
 In order to minimize the impact of cpusets on critical kernel