author Lee Schermerhorn <lee.schermerhorn@hp.com> 2008-04-28 05:13:18 -0400
committer Linus Torvalds <torvalds@linux-foundation.org> 2008-04-28 11:58:24 -0400
commit bea904d54d6faa92400f10c8ea3d3828b8e1eb93
tree 24966dd4dabadb4bb32aa1e00fae2c2168661229 /Documentation
parent 52cd3b074050dd664380b5e8cfc85d4a6ed8ad48
mempolicy: use MPOL_PREFERRED for system-wide default policy
Currently, when one specifies MPOL_DEFAULT via a NUMA memory policy API [set_mempolicy(), mbind() and internal versions], the kernel simply installs a NULL struct mempolicy pointer in the appropriate context: task policy, vma policy, or shared policy. This causes any use of that policy to "fall back" to the next most specific policy scope.

The only use of MPOL_DEFAULT to mean "local allocation" is in the system default policy. This requires extra checks/cases for MPOL_DEFAULT in many mempolicy.c functions.

There is another, "preferred" way to specify local allocation via the APIs. That is using the MPOL_PREFERRED policy mode with an empty nodemask. Internally, the empty nodemask gets converted to a preferred_node id of '-1'. All internal usage of MPOL_PREFERRED will convert the '-1' to the id of the node local to the cpu where the allocation occurs.

System default policy, except during boot, is hard-coded to "local allocation". By using the MPOL_PREFERRED mode with a negative value of preferred node for system default policy, MPOL_DEFAULT will never occur in the 'policy' member of a struct mempolicy. Thus, we can remove all checks for MPOL_DEFAULT when converting policy to a node id/zonelist in the allocation paths.

In slab_node() return local node id when policy pointer is NULL. No need to set a pol value to take the switch default. Replace switch default with BUG()--i.e., shouldn't happen.

With this patch MPOL_DEFAULT is only used in the APIs, including internal calls to do_set_mempolicy() and in the display of policy in /proc/<pid>/numa_maps. It always means "fall back" to the next most specific policy scope. This simplifies the description of memory policies quite a bit, with no visible change in behavior.

get_mempolicy() continues to return MPOL_DEFAULT and an empty nodemask when the requested policy [task or vma/shared] is NULL. These are the values one would supply via set_mempolicy() or mbind() to achieve that condition--default behavior.

This patch updates Documentation to reflect this change.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
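
The fallback described in the commit message can be illustrated with a small userspace model. This is only a sketch, not the code in mempolicy.c: the names policy_model, effective_policy() and target_node() are hypothetical, shared policies and the boot-time exception are ignored, and only the MPOL_PREFERRED mode is modelled. The two mode values match the kernel's user-visible constants.

```c
#include <assert.h>
#include <stddef.h>

/* Mode values as exposed to user space; all other names are illustrative. */
enum { MPOL_DEFAULT = 0, MPOL_PREFERRED = 1 };

/* The "distinguished" preferred_node value that means local allocation. */
#define LOCAL_NODE (-1)

/* Hypothetical, heavily reduced stand-in for struct mempolicy. */
struct policy_model {
	int mode;
	int preferred_node;
};

/* After this patch the system default is MPOL_PREFERRED with a negative node. */
static const struct policy_model system_default = { MPOL_PREFERRED, LOCAL_NODE };

/*
 * A NULL pointer in a scope means "fall back to the next most specific
 * scope": vma policy, then task policy, then the system default.
 */
static const struct policy_model *
effective_policy(const struct policy_model *vma_pol,
		 const struct policy_model *task_pol)
{
	if (vma_pol)
		return vma_pol;
	if (task_pol)
		return task_pol;
	return &system_default;
}

/* Convert a policy to a node id; a negative preferred node means "local". */
static int target_node(const struct policy_model *pol, int local_node)
{
	/* MPOL_DEFAULT never appears in an installed policy in this model. */
	assert(pol != NULL && pol->mode == MPOL_PREFERRED);
	return pol->preferred_node < 0 ? local_node : pol->preferred_node;
}
```

In this model, resolving with both pointers NULL yields the node local to the allocating cpu, which is the "local allocation" behavior that the system default policy is said to provide.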
Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/vm/numa_memory_policy.txt 54
1 files changed, 18 insertions, 36 deletions
diff --git a/Documentation/vm/numa_memory_policy.txt b/Documentation/vm/numa_memory_policy.txt
index 6719d642653f..13cca5a3cf17 100644
--- a/Documentation/vm/numa_memory_policy.txt
+++ b/Documentation/vm/numa_memory_policy.txt
@@ -147,35 +147,18 @@ Components of Memory Policies
 
 Linux memory policy supports the following 4 behavioral modes:
 
-Default Mode--MPOL_DEFAULT: The behavior specified by this mode is
-context or scope dependent.
-
-As mentioned in the Policy Scope section above, during normal
-system operation, the System Default Policy is hard coded to
-contain the Default mode.
-
-In this context, default mode means "local" allocation--that is
-attempt to allocate the page from the node associated with the cpu
-where the fault occurs. If the "local" node has no memory, or the
-node's memory can be exhausted [no free pages available], local
-allocation will "fallback to"--attempt to allocate pages from--
-"nearby" nodes, in order of increasing "distance".
-
-Implementation detail -- subject to change: "Fallback" uses
-a per node list of sibling nodes--called zonelists--built at
-boot time, or when nodes or memory are added or removed from
-the system [memory hotplug]. These per node zonelist are
-constructed with nodes in order of increasing distance based
-on information provided by the platform firmware.
-
-When a task/process policy or a shared policy contains the Default
-mode, this also means "local allocation", as described above.
-
-In the context of a VMA, Default mode means "fall back to task
-policy"--which may or may not specify Default mode. Thus, Default
-mode can not be counted on to mean local allocation when used
-on a non-shared region of the address space. However, see
-MPOL_PREFERRED below.
+Default Mode--MPOL_DEFAULT: This mode is only used in the memory
+policy APIs. Internally, MPOL_DEFAULT is converted to the NULL
+memory policy in all policy scopes. Any existing non-default policy
+will simply be removed when MPOL_DEFAULT is specified. As a result,
+MPOL_DEFAULT means "fall back to the next most specific policy scope."
+
+For example, a NULL or default task policy will fall back to the
+system default policy. A NULL or default vma policy will fall
+back to the task policy.
+
+When specified in one of the memory policy APIs, the Default mode
+does not use the optional set of nodes.
 
 It is an error for the set of nodes specified for this policy to
 be non-empty.
@@ -187,19 +170,18 @@ Components of Memory Policies
 
 MPOL_PREFERRED: This mode specifies that the allocation should be
 attempted from the single node specified in the policy. If that
-allocation fails, the kernel will search other nodes, exactly as
-it would for a local allocation that started at the preferred node
-in increasing distance from the preferred node. "Local" allocation
-policy can be viewed as a Preferred policy that starts at the node
+allocation fails, the kernel will search other nodes, in order of
+increasing distance from the preferred node based on information
+provided by the platform firmware.
 containing the cpu where the allocation takes place.
 
 Internally, the Preferred policy uses a single node--the
 preferred_node member of struct mempolicy. A "distinguished
 value of this preferred_node, currently '-1', is interpreted
 as "the node containing the cpu where the allocation takes
-place"--local allocation. This is the way to specify
-local allocation for a specific range of addresses--i.e. for
-VMA policies.
+place"--local allocation. "Local" allocation policy can be
+viewed as a Preferred policy that starts at the node containing
+the cpu where the allocation takes place.
 
 It is possible for the user to specify that local allocation is
 always preferred by passing an empty nodemask with this mode.
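
The API-side rules in the updated text (MPOL_DEFAULT removes the policy and rejects a non-empty node set; MPOL_PREFERRED with an empty nodemask means local allocation) can be sketched as a userspace model. This is an illustration under stated assumptions, not the kernel's implementation: install_policy(), first_node_of() and policy_model are hypothetical names, the nodemask is reduced to a single unsigned long, and choosing the lowest set bit as the "single node" is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Mode values as exposed to user space; all other names are illustrative. */
enum { MPOL_DEFAULT = 0, MPOL_PREFERRED = 1 };

/* Hypothetical, heavily reduced stand-in for struct mempolicy. */
struct policy_model {
	int mode;
	int preferred_node;
};

/* Hypothetical helper: lowest set bit of the mask, or -1 if it is empty. */
static int first_node_of(unsigned long nodemask)
{
	for (int n = 0; n < (int)(sizeof(nodemask) * CHAR_BIT); n++)
		if (nodemask & (1UL << n))
			return n;
	return -1;
}

/*
 * Model of installing a policy into one scope. *slot is that scope's
 * policy pointer; storage is caller-provided memory for a new policy.
 * Returns 0 on success or -EINVAL on invalid arguments.
 */
static int install_policy(const struct policy_model **slot,
			  struct policy_model *storage,
			  int mode, unsigned long nodemask)
{
	assert(slot != NULL && storage != NULL);

	switch (mode) {
	case MPOL_DEFAULT:
		/* It is an error for the node set to be non-empty. */
		if (nodemask != 0)
			return -EINVAL;
		/* Remove any existing policy: fall back to the next scope. */
		*slot = NULL;
		return 0;
	case MPOL_PREFERRED:
		storage->mode = MPOL_PREFERRED;
		/* An empty nodemask becomes -1, i.e. local allocation. */
		storage->preferred_node = first_node_of(nodemask);
		*slot = storage;
		return 0;
	default:
		/* Other modes are outside the scope of this sketch. */
		return -EINVAL;
	}
}
```

Note that in this model MPOL_DEFAULT is never stored in a policy_model; it only ever clears the slot, which mirrors the statement that the mode is used only in the APIs.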