author Paul Jackson <pj@sgi.com> 2007-10-16 04:27:43 -0400
committer Linus Torvalds <torvalds@woody.linux-foundation.org> 2007-10-16 12:43:09 -0400
commit 607717a65d92858fd925bec05baae4d142719f27
tree b7faea733fe3426881e63bc7549db9c97c8bdf59 /Documentation
parent 2ed6dc34f9ed39bb8e4c81ea1056f0ba56315841
cpuset: remove sched domain hooks from cpusets

Remove the cpuset hooks that defined sched domains depending on the setting of the 'cpu_exclusive' flag.

The cpu_exclusive flag can only be set on a child if it is set on the parent. This made that flag painfully unsuitable for use as a flag defining a partitioning of a system.

It was entirely unobvious to a cpuset user what partitioning of sched domains they would be causing when they set that one cpu_exclusive bit on one cpuset, because it depended on what CPUs were in the remainder of that cpuset's siblings and child cpusets, after subtracting out other cpu_exclusive cpusets.

Furthermore, there was no way on production systems to query the result.

Using the cpu_exclusive flag for this was simply wrong from the get go.

Fortunately, it was sufficiently borked that, so far as I know, almost no successful use has been made of this. One real time group did use it to effectively isolate CPUs from any load balancing efforts. They are willing to adapt to alternative mechanisms for this, such as some way to manipulate the list of isolated CPUs on a running system. They can do without this present cpu_exclusive based mechanism while we develop an alternative.

There is a real risk, to the best of my understanding, of users accidentally setting up partitioned scheduler domains, inhibiting desired load balancing across all their CPUs, due to the nonobvious (from the cpuset perspective) side effects of the cpu_exclusive flag.

Furthermore, since there was no way on a running system to see what one was doing with sched domains, this change will be invisible to any using code. Unless they have real insight into the scheduler load balancing choices, they will be unable to detect that this change has been made in the kernel's behaviour.

Initial discussion on lkml of this patch has generated much comment. My (probably controversial) take on that discussion is that it has reached a rough consensus that the current cpuset cpu_exclusive mechanism for defining sched domains is borked. There is no consensus on the replacement. But since we can remove this mechanism, and since its continued presence risks causing unwanted partitioning of the scheduler's load balancing, we should remove it while we can, as we proceed to work on the replacement scheduler domain mechanisms.

Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
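
As a rough illustration of why the resulting partitioning was hard to predict, the rule stated in the removed cpusets.txt text (the sched domain of a cpu_exclusive cpuset is its CPUs minus those of any cpu_exclusive child cpuset) can be modelled in a few lines. This is only a sketch of that documented rule, not the removed kernel code: the Cpuset class, the sched_domain_cpus() helper, and the 8-CPU layout are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Cpuset:
    """Hypothetical model of a cpuset; not a kernel structure."""
    name: str
    cpus: frozenset
    cpu_exclusive: bool = False
    children: list = field(default_factory=list)


def sched_domain_cpus(cs):
    """CPUs in the sched domain the old hooks tied to a cpu_exclusive cpuset.

    Follows the removed cpusets.txt wording: all CPUs in the cpuset that
    are not part of any exclusive child cpuset. Returns None when the
    cpuset is not cpu_exclusive, since no domain was tied to it.
    """
    if not cs.cpu_exclusive:
        return None
    remaining = set(cs.cpus)
    for child in cs.children:
        if child.cpu_exclusive:
            remaining -= child.cpus
    return frozenset(remaining)


# Illustrative 8-CPU layout (assumed, not taken from the commit).
rt = Cpuset("rt", frozenset({6, 7}), cpu_exclusive=True)
batch = Cpuset("batch", frozenset({0, 1, 2, 3}))
top = Cpuset("top", frozenset(range(8)), cpu_exclusive=True,
             children=[rt, batch])

print(sorted(sched_domain_cpus(top)))  # [0, 1, 2, 3, 4, 5]
print(sorted(sched_domain_cpus(rt)))   # [6, 7]
```

In this model, setting cpu_exclusive on the "rt" child also shrinks the domain of its parent, while the non-exclusive "batch" sibling changes nothing. A side effect of this kind, visible only by inspecting the whole hierarchy, is what the commit message describes as nonobvious from the cpuset perspective.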
Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/cpusets.txt | 17
1 files changed, 0 insertions, 17 deletions
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index b875d231ac74..ec9de6917f01 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -87,9 +87,6 @@ This can be especially valuable on:
       and a database), or
     * NUMA systems running large HPC applications with demanding
       performance characteristics.
-    * Also cpu_exclusive cpusets are useful for servers running orthogonal
-      workloads such as RT applications requiring low latency and HPC
-      applications that are throughput sensitive
 
 These subsets, or "soft partitions" must be able to be dynamically
 adjusted, as the job mix changes, without impacting other concurrently
@@ -132,8 +129,6 @@ Cpusets extends these two mechanisms as follows:
  - A cpuset may be marked exclusive, which ensures that no other
    cpuset (except direct ancestors and descendents) may contain
    any overlapping CPUs or Memory Nodes.
-   Also a cpu_exclusive cpuset would be associated with a sched
-   domain.
  - You can list all the tasks (by pid) attached to any cpuset.
 
 The implementation of cpusets requires a few, simple hooks
@@ -145,9 +140,6 @@ into the rest of the kernel, none in performance critical paths:
    allowed in that tasks cpuset.
  - in sched.c migrate_all_tasks(), to keep migrating tasks within
    the CPUs allowed by their cpuset, if possible.
- - in sched.c, a new API partition_sched_domains for handling
-   sched domain changes associated with cpu_exclusive cpusets
-   and related changes in both sched.c and arch/ia64/kernel/domain.c
  - in the mbind and set_mempolicy system calls, to mask the requested
    Memory Nodes by what's allowed in that tasks cpuset.
  - in page_alloc.c, to restrict memory to allowed nodes.
@@ -232,15 +224,6 @@ If a cpuset is cpu or mem exclusive, no other cpuset, other than
 a direct ancestor or descendent, may share any of the same CPUs or
 Memory Nodes.
 
-A cpuset that is cpu_exclusive has a scheduler (sched) domain
-associated with it. The sched domain consists of all CPUs in the
-current cpuset that are not part of any exclusive child cpusets.
-This ensures that the scheduler load balancing code only balances
-against the CPUs that are in the sched domain as defined above and
-not all of the CPUs in the system. This removes any overhead due to
-load balancing code trying to pull tasks outside of the cpu_exclusive
-cpuset only to be prevented by the tasks' cpus_allowed mask.
-
 A cpuset that is mem_exclusive restricts kernel allocations for
 page, buffer and other data commonly shared by the kernel across
 multiple users. All cpusets, whether mem_exclusive or not, restrict