author David Rientjes <rientjes@google.com> 2010-05-24 17:32:13 -0400
committer Linus Torvalds <torvalds@linux-foundation.org> 2010-05-25 11:06:58 -0400
commit e325c90ffc13b698fa2814102e05275b21c26bec
tree dcb20cad204132e08476d3cb4da66f9a2e08b9fe
parent 1a5cb81465b66b74bf3d6ad36e5382238de6a132
mm: default to node zonelist ordering when nodes have only lowmem

There are two types of zonelist ordering methodologies:

 - node order, preferring allocations on a node to stay local to and

 - zone order, preferring allocations come from a higher zone to avoid
   allocating in lowmem zones even though they may not be local.

The ordering technique used by the kernel is configurable on the command
line, but also has some logic to determine what the default should be.

This logic currently lacks knowledge of systems where a node may only have
lowmem. For such systems, it is necessary to use node order so that
GFP_KERNEL allocations may be satisfied by nodes consisting of only lowmem.

If zone order is used, GFP_KERNEL allocations to such nodes are actually
allocated on a node with local affinity that includes ZONE_NORMAL.

This change defaults to node zonelist ordering if any node lacks
ZONE_NORMAL.

To force zone order, append 'numa_zonelist_order=zone' to the kernel
command line.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

-rw-r--r-- mm/page_alloc.c 11
1 files changed, 10 insertions, 1 deletions
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f7da2a2934b7..cefe6fe8d991 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2606,7 +2606,7 @@ static int default_zonelist_order(void)
 	 * ZONE_DMA and ZONE_DMA32 can be very small area in the system.
 	 * If they are really small and used heavily, the system can fall
 	 * into OOM very easily.
-	 * This function detect ZONE_DMA/DMA32 size and confgigures zone order.
+	 * This function detect ZONE_DMA/DMA32 size and configures zone order.
 	 */
 	/* Is there ZONE_NORMAL ? (ex. ppc has only DMA zone..) */
 	low_kmem_size = 0;
@@ -2618,6 +2618,15 @@ static int default_zonelist_order(void)
 				if (zone_type < ZONE_NORMAL)
 					low_kmem_size += z->present_pages;
 				total_size += z->present_pages;
+			} else if (zone_type == ZONE_NORMAL) {
+				/*
+				 * If any node has only lowmem, then node order
+				 * is preferred to allow kernel allocations
+				 * locally; otherwise, they can easily infringe
+				 * on other nodes when there is an abundance of
+				 * lowmem available to allocate from.
+				 */
+				return ZONELIST_ORDER_NODE;
 			}
 		}
 	}
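
For readers who want to see the new check in isolation, here is a minimal user-space sketch of the decision this patch adds. It is not the kernel code. `struct node_model`, `model_default_order()` and the two-zone `enum model_zone` are hypothetical stand-ins, a zone counts as populated when its page count is nonzero, and the size-based heuristic that the real `default_zonelist_order()` runs afterwards is deliberately left out (the sketch just returns zone order at that point).

```c
#include <stddef.h>

/* Hypothetical two-zone model; the kernel has more zone types. */
enum model_zone { MODEL_ZONE_DMA, MODEL_ZONE_NORMAL, MODEL_NR_ZONES };

enum model_order { MODEL_ORDER_NODE, MODEL_ORDER_ZONE };

/* Hypothetical per-node description: pages present in each zone. */
struct node_model {
	unsigned long present_pages[MODEL_NR_ZONES];
};

/*
 * Return node order as soon as any node is found whose ZONE_NORMAL
 * equivalent is empty, mirroring the early return added by the patch.
 * Assumption: the remaining heuristic is omitted, so every other case
 * falls through to zone order here.
 */
static enum model_order model_default_order(const struct node_model *nodes,
					    size_t nr_nodes)
{
	size_t nid;
	int zone;

	for (nid = 0; nid < nr_nodes; nid++) {
		for (zone = 0; zone < MODEL_NR_ZONES; zone++) {
			if (nodes[nid].present_pages[zone] != 0)
				continue;	/* populated zone, keep scanning */
			if (zone == MODEL_ZONE_NORMAL)
				return MODEL_ORDER_NODE; /* lowmem-only node */
		}
	}
	return MODEL_ORDER_ZONE;	/* placeholder for the omitted heuristic */
}
```

Under these assumptions, calling `model_default_order()` on a set of nodes that all have pages in `MODEL_ZONE_NORMAL` yields `MODEL_ORDER_ZONE`, while a set containing one node with pages only in `MODEL_ZONE_DMA` yields `MODEL_ORDER_NODE`.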