1 files changed, 70 insertions, 2 deletions
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index ad2bb3b3acc1..aa854b9b18cd 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -8,6 +8,7 @@ Portions Copyright (c) 2004-2006 Silicon Graphics, Inc.
 Modified by Paul Jackson <pj@sgi.com>
 Modified by Christoph Lameter <clameter@sgi.com>
 Modified by Paul Menage <menage@google.com>
+Modified by Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
 CONTENTS:
 =========
@@ -20,7 +21,8 @@ CONTENTS:
  1.5 What is memory_pressure ?
  1.6 What is memory spread ?
  1.7 What is sched_load_balance ?
-  1.8 How do I use cpusets ?
+  1.8 What is sched_relax_domain_level ?
+  1.9 How do I use cpusets ?
 2. Usage Examples and Syntax
  2.1 Basic Usage
  2.2 Adding/removing cpus
@@ -497,7 +499,73 @@ the cpuset code to update these sched domains, it compares the new
 partition requested with the current, and updates its sched domains,
 removing the old and adding the new, for each change.
-1.8 How do I use cpusets ?
+
+1.8 What is sched_relax_domain_level ?
+--------------------------------------
+
+Within a sched domain, the scheduler migrates tasks in two ways: periodic
+load balancing on the tick, and at the time of certain scheduling events.
+
+When a task is woken up, the scheduler tries to move it to an idle CPU.
+For example, if task A running on CPU X activates another task B
+on the same CPU X, and CPU Y is X's sibling and is idle, then the
+scheduler migrates task B to CPU Y so that task B can start on
+CPU Y without waiting for task A on CPU X.
+
+And if a CPU runs out of tasks in its runqueue, the CPU tries to pull
+extra tasks from other busy CPUs to help them before it goes
+idle.
+
+Of course, finding movable tasks and/or idle CPUs has a search cost,
+so the scheduler might not search all CPUs in the domain every time.
+In fact, on some architectures, the search range on events is
+limited to the socket or node where the CPU is located, while the
+load balancing on the tick searches all of them.
+
+For example, assume CPU Z is relatively far from CPU X.  Even if CPU Z
+is idle while CPU X and its siblings are busy, the scheduler can't
+migrate the woken task B from X to Z since Z is outside its search range.
+As a result, task B on CPU X needs to wait for task A or for the load
+balancing on the next tick.  For some applications in special
+situations, waiting one tick may be too long.
+
+The 'sched_relax_domain_level' file allows you to request a change to
+this search range.  The file takes an int value that indicates the
+size of the search range in levels, ideally as follows; otherwise it
+holds the initial value -1, which indicates the cpuset has no request.
+
+  -1  : no request. use system default or follow request of others.
+   0  : no search.
+   1  : search siblings (hyperthreads in a core).
+   2  : search cores in a package.
+   3  : search cpus in a node [= system wide on non-NUMA system]
+ ( 4  : search nodes in a chunk of nodes [on NUMA system] )
+ ( 5~ : search system wide [on NUMA system] )
+
+This file is per-cpuset and affects the sched domain to which the cpuset
+belongs.  Therefore if the flag 'sched_load_balance' of a cpuset
+is disabled, then 'sched_relax_domain_level' has no effect since
+there is no sched domain belonging to the cpuset.
+
+If multiple cpusets overlap and hence form a single sched domain,
+the largest value among them is used.  Be careful: if one
+requests 0 and the others are -1, then 0 is used.
+
+Note that modifying this file will have both good and bad effects,
+and whether it is acceptable or not depends on your situation.
+Don't modify this file if you are not sure.
+
+If your situation is:
+ - The migration costs between CPUs can be assumed to be considerably
+   small (for you) due to your special application's behavior or
+   special hardware support for CPU cache etc.
+ - The searching cost has no impact (for you), or you can make
+   the searching cost small enough by keeping the cpuset compact etc.
+ - Low latency is required even if it sacrifices cache hit rate etc.
+then increasing 'sched_relax_domain_level' would benefit you.
+
+
+1.9 How do I use cpusets ?
 --------------------------
 In order to minimize the impact of cpusets on critical kernel
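
The rule in the new section 1.8 for overlapping cpusets (the largest
value wins, so a single request of 0 overrides cpusets left at -1) can
be sketched in C as below.  This is only an illustration of the rule as
the text states it, not the kernel's implementation; the function name
merged_relax_level() and the macro RELAX_LEVEL_NO_REQUEST are
hypothetical and do not appear in the patch.

```c
#include <stddef.h>

/* Initial value of the file: the cpuset makes no request (assumed name). */
#define RELAX_LEVEL_NO_REQUEST (-1)

/*
 * Hypothetical helper, not a kernel function: combine the
 * 'sched_relax_domain_level' values of all cpusets that share one
 * sched domain by taking the largest of them.
 */
int merged_relax_level(const int *levels, size_t count)
{
	int merged = RELAX_LEVEL_NO_REQUEST;
	size_t i;

	for (i = 0; i < count; i++) {
		/* 0 ("no search") is larger than -1, so it wins over "no request". */
		if (levels[i] > merged)
			merged = levels[i];
	}
	return merged;
}
```

Under this sketch, three overlapping cpusets holding -1, 0 and -1 yield
0, which is the case the text warns about: one cpuset asking for "no
search" disables event-time searching for the whole domain.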
