author Mel Gorman <mel@csn.ul.ie> 2009-09-21 20:03:02 -0400
committer Linus Torvalds <torvalds@linux-foundation.org> 2009-09-22 10:17:38 -0400
commit 78986a678f6ec3759a01976749f4437d8bf2d6c3
tree ec3a4f4d3fe5a40f8809657341ad34a9fc8eb61c /mm
parent ceddc3a52d783fabbf1ba623601419b9d6337194
page-allocator: limit the number of MIGRATE_RESERVE pageblocks per zone
After anti-fragmentation was merged, a bug was reported whereby devices that depended on high-order atomic allocations were failing. The solution was to preserve a property in the buddy allocator which tended to keep the minimum number of free pages in the zone at the lower physical addresses and contiguous. To preserve this property, MIGRATE_RESERVE was introduced and a number of pageblocks at the start of a zone would be marked "reserve", the number of which depended on min_free_kbytes.

Anti-fragmentation works by avoiding the mixing of page migratetypes within the same pageblock. One way of helping this is to increase min_free_kbytes because it becomes less likely that it will be necessary to place pages of different migratetypes within the same pageblock. However, because the number of MIGRATE_RESERVE pageblocks is unbounded, the free memory is kept there in large contiguous blocks instead of helping anti-fragmentation as much as it should. With the page-allocator tracepoint patches applied, it was found during anti-fragmentation tests that the number of fragmentation-related events was far higher than expected even with min_free_kbytes at higher values.

This patch limits the number of MIGRATE_RESERVE blocks that exist per zone to two. For example, with a sufficient min_free_kbytes, 4MB of memory will be kept aside on x86-64 and remain more or less free and contiguous for the system's uptime. This should be sufficient for devices depending on high-order atomic allocations while helping fragmentation control when min_free_kbytes is tuned appropriately.

A side-effect of this patch is that the reserve variable is converted to int, as unsigned long was the wrong type to use when ensuring that only the required number of reserve blocks are created.

With the patches applied, fragmentation-related events as measured by the page allocator tracepoints were significantly reduced when running some fragmentation stress-tests on systems with min_free_kbytes tuned to a value appropriate for hugepage allocations at runtime.
On x86, the events recorded were reduced by 99.8%, on x86-64 by 99.72% and on ppc64 by 99.83%.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
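
The arithmetic behind the two-block cap and the 4MB figure can be sketched in userspace C. This is only an illustration, not the kernel code: `PAGE_SHIFT`, `PAGEBLOCK_ORDER`, `MAX_RESERVE_BLOCKS`, `roundup_ul()`, `reserve_blocks()` and `reserve_bytes()` are hypothetical stand-ins, and the constants assume x86-64 with 4 KiB pages and 2 MiB pageblocks. The function takes the zone's minimum watermark in pages directly rather than deriving it from min_free_kbytes.

```c
#include <stdio.h>

/* Assumed values for x86-64: 4 KiB pages, 512 pages (2 MiB) per pageblock. */
#define PAGE_SHIFT          12
#define PAGEBLOCK_ORDER     9
#define PAGEBLOCK_NR_PAGES  (1UL << PAGEBLOCK_ORDER)

/* The cap this patch introduces. */
#define MAX_RESERVE_BLOCKS  2

/* Round x up to the next multiple of y (y must be non-zero). */
static unsigned long roundup_ul(unsigned long x, unsigned long y)
{
	return ((x + y - 1) / y) * y;
}

/*
 * Number of pageblocks to mark as reserve for a zone whose minimum
 * watermark is min_wmark_pages: round up to whole pageblocks, then
 * clamp to MAX_RESERVE_BLOCKS.
 */
static int reserve_blocks(unsigned long min_wmark_pages)
{
	int reserve = (int)(roundup_ul(min_wmark_pages, PAGEBLOCK_NR_PAGES)
			    >> PAGEBLOCK_ORDER);

	return reserve < MAX_RESERVE_BLOCKS ? reserve : MAX_RESERVE_BLOCKS;
}

/* Bytes covered by the given number of reserve pageblocks. */
static unsigned long reserve_bytes(int blocks)
{
	return (unsigned long)blocks << (PAGEBLOCK_ORDER + PAGE_SHIFT);
}
```

Under these assumptions, a watermark of 16384 pages would previously have produced 32 reserve blocks; with the cap, `reserve_blocks(16384)` is 2 and `reserve_bytes(2)` is 4194304 bytes, matching the 4MB quoted above.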
Diffstat (limited to 'mm')
-rw-r--r-- mm/page_alloc.c 12
1 file changed, 11 insertions, 1 deletion
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4c847cc57caf..33b1a4762a7b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2836,7 +2836,8 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 {
 	unsigned long start_pfn, pfn, end_pfn;
 	struct page *page;
-	unsigned long reserve, block_migratetype;
+	unsigned long block_migratetype;
+	int reserve;
 
 	/* Get the start pfn, end pfn and the number of blocks to reserve */
 	start_pfn = zone->zone_start_pfn;
@@ -2844,6 +2845,15 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 	reserve = roundup(min_wmark_pages(zone), pageblock_nr_pages) >>
 							pageblock_order;
 
+	/*
+	 * Reserve blocks are generally in place to help high-order atomic
+	 * allocations that are short-lived. A min_free_kbytes value that
+	 * would result in more than 2 reserve blocks for atomic allocations
+	 * is assumed to be in place to help anti-fragmentation for the
+	 * future allocation of hugepages at runtime.
+	 */
+	reserve = min(2, reserve);
+
 	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
 		if (!pfn_valid(pfn))
 			continue;