author | Lee Schermerhorn <lee.schermerhorn@hp.com> | 2008-04-28 05:13:18 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2008-04-28 11:58:24 -0400 |
commit | bea904d54d6faa92400f10c8ea3d3828b8e1eb93 (patch) | |
tree | 24966dd4dabadb4bb32aa1e00fae2c2168661229 /Documentation | |
parent | 52cd3b074050dd664380b5e8cfc85d4a6ed8ad48 (diff) |
mempolicy: use MPOL_PREFERRED for system-wide default policy

Currently, when one specifies MPOL_DEFAULT via a NUMA memory policy API
[set_mempolicy(), mbind() and internal versions], the kernel simply installs a
NULL struct mempolicy pointer in the appropriate context: task policy, vma
policy, or shared policy. This causes any use of that policy to "fall back"
to the next most specific policy scope.

The only use of MPOL_DEFAULT to mean "local allocation" is in the system
default policy. This requires extra checks/cases for MPOL_DEFAULT in many
mempolicy.c functions.

There is another, "preferred" way to specify local allocation via the APIs.
That is using the MPOL_PREFERRED policy mode with an empty nodemask.
Internally, the empty nodemask gets converted to a preferred_node id of '-1'.
All internal usage of MPOL_PREFERRED will convert the '-1' to the id of the
node local to the cpu where the allocation occurs.

System default policy, except during boot, is hard-coded to "local
allocation". By using the MPOL_PREFERRED mode with a negative value of
preferred node for system default policy, MPOL_DEFAULT will never occur in the
'policy' member of a struct mempolicy. Thus, we can remove all checks for
MPOL_DEFAULT when converting policy to a node id/zonelist in the allocation
paths.

In slab_node() return local node id when policy pointer is NULL. No need to
set a pol value to take the switch default. Replace switch default with
BUG()--i.e., shouldn't happen.
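
The node selection described above can be sketched roughly as follows. This is a simplified illustration, not the kernel's slab_node(): the names sketch_mode, sketch_policy and sketch_slab_node are hypothetical, local_node stands in for the id of the node containing the current cpu, and abort() stands in for BUG().

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the kernel's definitions.
 * Note that there is no "default" mode among the stored modes. */
enum sketch_mode { SKETCH_PREFERRED, SKETCH_INTERLEAVE };

struct sketch_policy {
    enum sketch_mode mode;
    int preferred_node;   /* -1 means "node local to the allocating cpu" */
    int next_interleave;  /* next node to use for SKETCH_INTERLEAVE */
};

/* Pick a node for an allocation. local_node is an assumed input that
 * represents the node containing the cpu doing the allocation. */
int sketch_slab_node(const struct sketch_policy *pol, int local_node)
{
    if (pol == NULL)
        return local_node;      /* no policy: use the local node */

    switch (pol->mode) {
    case SKETCH_PREFERRED:
        /* A negative preferred_node is converted to the local node. */
        return pol->preferred_node < 0 ? local_node : pol->preferred_node;
    case SKETCH_INTERLEAVE:
        return pol->next_interleave;
    default:
        abort();                /* should not happen; plays the role of BUG() */
    }
}
```

Because a stored policy never carries a default mode in this sketch, the switch needs no case for it, and any unexpected value is treated as a bug rather than silently handled.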

With this patch MPOL_DEFAULT is only used in the APIs, including internal
calls to do_set_mempolicy() and in the display of policy in
/proc/<pid>/numa_maps. It always means "fall back" to the next most
specific policy scope. This simplifies the description of memory policies
quite a bit, with no visible change in behavior.
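
As a minimal sketch of the "fall back to the next most specific policy scope" rule, assuming a NULL pointer represents "no policy at this scope": the names scope_policy, scope_system_default and scope_effective_policy are hypothetical and only model the lookup order, not the kernel's data structures.

```c
#include <stddef.h>

/* Hypothetical, simplified policy: only the Preferred mode is modeled. */
struct scope_policy {
    int preferred_node;   /* -1 means "local allocation" */
};

/* Assumed model of the system default policy: Preferred mode with a
 * negative preferred node, i.e. local allocation. */
static const struct scope_policy scope_system_default = { -1 };

/* Return the policy that applies to an allocation: the vma policy if
 * one is installed, else the task policy, else the system default. */
const struct scope_policy *
scope_effective_policy(const struct scope_policy *vma_pol,
                       const struct scope_policy *task_pol)
{
    if (vma_pol != NULL)
        return vma_pol;
    if (task_pol != NULL)
        return task_pol;
    return &scope_system_default;
}
```

With no vma or task policy installed, the lookup ends at the system default, so the allocation is local without any special case for a default mode.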

get_mempolicy() continues to return MPOL_DEFAULT and an empty nodemask when
the requested policy [task or vma/shared] is NULL. These are the values one
would supply via set_mempolicy() or mbind() to achieve that condition--default
behavior.
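
The set/get symmetry described above can be modeled with a small user-space sketch. It does not call the real set_mempolicy() or get_mempolicy(); the names api_mode, api_policy, api_task, api_set_policy and api_get_policy are hypothetical, and the nodemask is reduced to a single unsigned long for illustration.

```c
#include <stddef.h>

/* Hypothetical API-level modes; the values are not the kernel's. */
enum api_mode { API_DEFAULT, API_PREFERRED };

struct api_policy {
    enum api_mode mode;
    unsigned long nodemask;
};

struct api_task {
    struct api_policy *policy;   /* NULL means "fall back" */
    struct api_policy storage;   /* backing store for an installed policy */
};

/* Install a policy. The default mode removes any existing policy and
 * rejects a non-empty node set. Returns 0 on success, -1 on error. */
int api_set_policy(struct api_task *t, enum api_mode mode,
                   unsigned long nodemask)
{
    if (mode == API_DEFAULT) {
        if (nodemask != 0)
            return -1;
        t->policy = NULL;
        return 0;
    }
    t->storage.mode = mode;
    t->storage.nodemask = nodemask;
    t->policy = &t->storage;
    return 0;
}

/* Report the policy. A NULL policy is reported as the default mode
 * with an empty nodemask, the same values that would install it. */
void api_get_policy(const struct api_task *t, enum api_mode *mode,
                    unsigned long *nodemask)
{
    if (t->policy == NULL) {
        *mode = API_DEFAULT;
        *nodemask = 0;
        return;
    }
    *mode = t->policy->mode;
    *nodemask = t->policy->nodemask;
}
```

In this model the default mode exists only at the API boundary: it is never stored, only translated to and from a NULL policy pointer.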

This patch updates Documentation to reflect this change.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/vm/numa_memory_policy.txt | 54 |
1 files changed, 18 insertions, 36 deletions
diff --git a/Documentation/vm/numa_memory_policy.txt b/Documentation/vm/numa_memory_policy.txt
index 6719d642653f..13cca5a3cf17 100644
--- a/Documentation/vm/numa_memory_policy.txt
+++ b/Documentation/vm/numa_memory_policy.txt
@@ -147,35 +147,18 @@ Components of Memory Policies | |||
147 | 147 | ||
148 | Linux memory policy supports the following 4 behavioral modes: | 148 | Linux memory policy supports the following 4 behavioral modes: |
149 | 149 | ||
150 | Default Mode--MPOL_DEFAULT: The behavior specified by this mode is | 150 | Default Mode--MPOL_DEFAULT: This mode is only used in the memory |
151 | context or scope dependent. | 151 | policy APIs. Internally, MPOL_DEFAULT is converted to the NULL |
152 | 152 | memory policy in all policy scopes. Any existing non-default policy | |
153 | As mentioned in the Policy Scope section above, during normal | 153 | will simply be removed when MPOL_DEFAULT is specified. As a result, |
154 | system operation, the System Default Policy is hard coded to | 154 | MPOL_DEFAULT means "fall back to the next most specific policy scope." |
155 | contain the Default mode. | 155 | |
156 | 156 | For example, a NULL or default task policy will fall back to the | |
157 | In this context, default mode means "local" allocation--that is | 157 | system default policy. A NULL or default vma policy will fall |
158 | attempt to allocate the page from the node associated with the cpu | 158 | back to the task policy. |
159 | where the fault occurs. If the "local" node has no memory, or the | 159 | |
160 | node's memory can be exhausted [no free pages available], local | 160 | When specified in one of the memory policy APIs, the Default mode |
161 | allocation will "fallback to"--attempt to allocate pages from-- | 161 | does not use the optional set of nodes. |
162 | "nearby" nodes, in order of increasing "distance". | ||
163 | |||
164 | Implementation detail -- subject to change: "Fallback" uses | ||
165 | a per node list of sibling nodes--called zonelists--built at | ||
166 | boot time, or when nodes or memory are added or removed from | ||
167 | the system [memory hotplug]. These per node zonelist are | ||
168 | constructed with nodes in order of increasing distance based | ||
169 | on information provided by the platform firmware. | ||
170 | |||
171 | When a task/process policy or a shared policy contains the Default | ||
172 | mode, this also means "local allocation", as described above. | ||
173 | |||
174 | In the context of a VMA, Default mode means "fall back to task | ||
175 | policy"--which may or may not specify Default mode. Thus, Default | ||
176 | mode can not be counted on to mean local allocation when used | ||
177 | on a non-shared region of the address space. However, see | ||
178 | MPOL_PREFERRED below. | ||
179 | 162 | ||
180 | It is an error for the set of nodes specified for this policy to | 163 | It is an error for the set of nodes specified for this policy to |
181 | be non-empty. | 164 | be non-empty. |
@@ -187,19 +170,18 @@ Components of Memory Policies | |||
187 | 170 | ||
188 | MPOL_PREFERRED: This mode specifies that the allocation should be | 171 | MPOL_PREFERRED: This mode specifies that the allocation should be |
189 | attempted from the single node specified in the policy. If that | 172 | attempted from the single node specified in the policy. If that |
190 | allocation fails, the kernel will search other nodes, exactly as | 173 | allocation fails, the kernel will search other nodes, in order of |
191 | it would for a local allocation that started at the preferred node | 174 | increasing distance from the preferred node based on information |
192 | in increasing distance from the preferred node. "Local" allocation | 175 | provided by the platform firmware. |
193 | policy can be viewed as a Preferred policy that starts at the node | ||
194 | containing the cpu where the allocation takes place. | 176 | containing the cpu where the allocation takes place. |
195 | 177 | ||
196 | Internally, the Preferred policy uses a single node--the | 178 | Internally, the Preferred policy uses a single node--the |
197 | preferred_node member of struct mempolicy. A "distinguished | 179 | preferred_node member of struct mempolicy. A "distinguished |
198 | value of this preferred_node, currently '-1', is interpreted | 180 | value of this preferred_node, currently '-1', is interpreted |
199 | as "the node containing the cpu where the allocation takes | 181 | as "the node containing the cpu where the allocation takes |
200 | place"--local allocation. This is the way to specify | 182 | place"--local allocation. "Local" allocation policy can be |
201 | local allocation for a specific range of addresses--i.e. for | 183 | viewed as a Preferred policy that starts at the node containing |
202 | VMA policies. | 184 | the cpu where the allocation takes place. |
203 | 185 | ||
204 | It is possible for the user to specify that local allocation is | 186 | It is possible for the user to specify that local allocation is |
205 | always preferred by passing an empty nodemask with this mode. | 187 | always preferred by passing an empty nodemask with this mode. |