author     Vlastimil Babka <vbabka@suse.cz>  2014-10-09 18:27:02 -0400
committer  Linus Torvalds <torvalds@linux-foundation.org>  2014-10-09 22:25:53 -0400
commit     53853e2d2bfb748a8b5aa2fd1de15699266865e0 (patch)
tree       dd09605e9cd9a4329afc274faffae1c15e81f150 /mm
parent     8b1645685acf3c7e0b93611fb4b328ef45c47e92 (diff)
mm, compaction: defer each zone individually instead of preferred zone
When direct sync compaction is often unsuccessful, it may become deferred for some time to avoid further useless attempts, both sync and async. Successful high-order allocations un-defer compaction, while further unsuccessful compaction attempts prolong the compaction deferred period.

Currently the checking and setting of deferred status is performed only on the preferred zone of the allocation that invoked direct compaction. But compaction itself is attempted on all eligible zones in the zonelist, so the behavior is suboptimal and may lead both to scenarios where 1) compaction is attempted uselessly, or 2) it is not attempted despite good chances of succeeding, as shown in the examples below:

1) A direct compaction with Normal preferred zone failed and set deferred compaction for the Normal zone. Another unrelated direct compaction with DMA32 as preferred zone will attempt to compact the DMA32 zone even though the first compaction attempt also included the DMA32 zone.

   In another scenario, compaction with Normal preferred zone failed to compact the Normal zone, but succeeded in the DMA32 zone, so it will not defer compaction. In the next attempt, it will try the Normal zone, which will fail again, instead of skipping the Normal zone and trying DMA32 directly.

2) Kswapd will balance the DMA32 zone and reset its defer status based on watermarks looking good. A direct compaction with preferred Normal zone will skip compaction of all zones, including DMA32, because Normal was still deferred. The allocation might have succeeded in DMA32, but won't.

This patch makes compaction deferring work on an individual zone basis instead of the preferred zone. For each zone, it checks compaction_deferred() to decide if the zone should be skipped. If watermarks fail after compacting the zone, defer_compaction() is called. A zone where watermarks passed can still be deferred when the allocation attempt is unsuccessful. When allocation is successful, compaction_defer_reset() is called for the zone containing the allocated page. This approach should approximate calling defer_compaction() only on zones where compaction was attempted and did not yield an allocated page. There might be corner cases, but that is inevitable as long as the decision to stop compacting does not guarantee that a page will be allocated.

Due to the new COMPACT_DEFERRED return value, some functions relying implicitly on COMPACT_SKIPPED = 0 had to be updated, and their comments made more accurate. The did_some_progress output parameter of __alloc_pages_direct_compact() is removed completely, as the caller does not actually use it after compaction sets it; it is only considered when direct reclaim sets it.

During testing on a two-node machine with a single very small Normal zone on node 1, this patch has improved success rates in the stress-highalloc mmtests benchmark. The success rates here were previously made worse by commit 3a025760fc15 ("mm: page_alloc: spill to remote nodes before waking kswapd"), as kswapd was no longer resetting the deferred compaction for the Normal zone often enough, and the DMA32 zones on both nodes were thus not considered for compaction. On a different machine, success rates were improved with __GFP_NO_KSWAPD allocations.
[akpm@linux-foundation.org: fix CONFIG_COMPACTION=n build]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
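The per-zone deferral flow described in the message can be modeled outside the kernel. The small standalone C program below is only a sketch: struct zone, its considered/defer_shift fields and the three helpers are simplified stand-ins for the kernel's compact_considered/compact_defer_shift bookkeeping, and compact_and_check_watermark() is a hypothetical placeholder for compact_zone_order() plus the zone_watermark_ok() check. Only the shape of the zonelist loop mirrors the patch below.

/*
 * Toy userspace model of per-zone compaction deferral.
 * Simplified stand-ins for the kernel's bookkeeping; not kernel code.
 */
#include <stdbool.h>
#include <stdio.h>

#define MAX_DEFER_SHIFT 6

struct zone {
	const char *name;
	unsigned int considered;	/* checks seen since the last defer/reset */
	unsigned int defer_shift;	/* defer window is 1 << defer_shift */
};

/* Returns true if this zone should be skipped for now. */
static bool compaction_deferred(struct zone *z)
{
	if (++z->considered >= (1u << z->defer_shift)) {
		z->considered = 1u << z->defer_shift;	/* cap to avoid overflow */
		return false;
	}
	return true;
}

/* Compaction of this zone did not help: widen its defer window. */
static void defer_compaction(struct zone *z)
{
	z->considered = 0;
	if (z->defer_shift < MAX_DEFER_SHIFT)
		z->defer_shift++;
}

/* The zone looks usable (or the allocation really succeeded there). */
static void compaction_defer_reset(struct zone *z, bool alloc_success)
{
	z->considered = 0;
	if (alloc_success)
		z->defer_shift = 0;
}

/* Placeholder "compact and recheck watermark": only DMA32 ever passes here. */
static bool compact_and_check_watermark(struct zone *z)
{
	return z->name[0] == 'D';
}

int main(void)
{
	struct zone zones[] = { { .name = "Normal" }, { .name = "DMA32" } };
	struct zone *candidate = NULL;

	/* Per-zone loop shaped like try_to_compact_pages() after the patch. */
	for (unsigned int i = 0; i < 2; i++) {
		struct zone *z = &zones[i];

		if (compaction_deferred(z))
			continue;

		if (compact_and_check_watermark(z)) {
			/* Allocation only looks likely; reset tentatively. */
			compaction_defer_reset(z, false);
			candidate = z;
			break;
		}
		/* Watermark still failing: defer this zone individually. */
		defer_compaction(z);
	}

	if (candidate)
		printf("candidate zone: %s\n", candidate->name);
	else
		printf("all zones deferred or failing\n");
	return 0;
}

Running it prints DMA32 as the candidate zone while Normal independently accumulates a deferral window, which illustrates the per-zone behavior the patch aims for.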
Diffstat (limited to 'mm')
-rw-r--r--  mm/compaction.c  32
-rw-r--r--  mm/page_alloc.c  57
-rw-r--r--  mm/vmscan.c      14
3 files changed, 66 insertions, 37 deletions
diff --git a/mm/compaction.c b/mm/compaction.c
index 21bf292b642a..1c7195d42e83 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1125,27 +1125,26 @@ int sysctl_extfrag_threshold = 500;
  * @nodemask: The allowed nodes to allocate from
  * @mode: The migration mode for async, sync light, or sync migration
  * @contended: Return value that is true if compaction was aborted due to lock contention
- * @page: Optionally capture a free page of the requested order during compaction
+ * @candidate_zone: Return the zone where we think allocation should succeed
  *
  * This is the main entry point for direct page compaction.
  */
 unsigned long try_to_compact_pages(struct zonelist *zonelist,
 			int order, gfp_t gfp_mask, nodemask_t *nodemask,
-			enum migrate_mode mode, bool *contended)
+			enum migrate_mode mode, bool *contended,
+			struct zone **candidate_zone)
 {
 	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
 	int may_enter_fs = gfp_mask & __GFP_FS;
 	int may_perform_io = gfp_mask & __GFP_IO;
 	struct zoneref *z;
 	struct zone *zone;
-	int rc = COMPACT_SKIPPED;
+	int rc = COMPACT_DEFERRED;
 	int alloc_flags = 0;
 
 	/* Check if the GFP flags allow compaction */
 	if (!order || !may_enter_fs || !may_perform_io)
-		return rc;
-
-	count_compact_event(COMPACTSTALL);
+		return COMPACT_SKIPPED;
 
 #ifdef CONFIG_CMA
 	if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
@@ -1156,14 +1155,33 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 								nodemask) {
 		int status;
 
+		if (compaction_deferred(zone, order))
+			continue;
+
 		status = compact_zone_order(zone, order, gfp_mask, mode,
 						contended);
 		rc = max(status, rc);
 
 		/* If a normal allocation would succeed, stop compacting */
 		if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0,
-				      alloc_flags))
+				      alloc_flags)) {
+			*candidate_zone = zone;
+			/*
+			 * We think the allocation will succeed in this zone,
+			 * but it is not certain, hence the false. The caller
+			 * will repeat this with true if allocation indeed
+			 * succeeds in this zone.
+			 */
+			compaction_defer_reset(zone, order, false);
 			break;
+		} else if (mode != MIGRATE_ASYNC) {
+			/*
+			 * We think that allocation won't succeed in this zone
+			 * so we defer compaction there. If it ends up
+			 * succeeding after all, it will be reset.
+			 */
+			defer_compaction(zone, order);
+		}
 	}
 
 	return rc;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e63bf7744a0c..514fd8008114 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2297,24 +2297,28 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	struct zonelist *zonelist, enum zone_type high_zoneidx,
 	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
 	int classzone_idx, int migratetype, enum migrate_mode mode,
-	bool *contended_compaction, bool *deferred_compaction,
-	unsigned long *did_some_progress)
+	bool *contended_compaction, bool *deferred_compaction)
 {
-	if (!order)
-		return NULL;
+	struct zone *last_compact_zone = NULL;
+	unsigned long compact_result;
 
-	if (compaction_deferred(preferred_zone, order)) {
-		*deferred_compaction = true;
-		return NULL;
-	}
 
+	if (!order)
+		return NULL;
 
 	current->flags |= PF_MEMALLOC;
-	*did_some_progress = try_to_compact_pages(zonelist, order, gfp_mask,
+	compact_result = try_to_compact_pages(zonelist, order, gfp_mask,
 						nodemask, mode,
-						contended_compaction);
+						contended_compaction,
+						&last_compact_zone);
 	current->flags &= ~PF_MEMALLOC;
 
-	if (*did_some_progress != COMPACT_SKIPPED) {
+	if (compact_result > COMPACT_DEFERRED)
+		count_vm_event(COMPACTSTALL);
+	else
+		*deferred_compaction = true;
+
+	if (compact_result > COMPACT_SKIPPED) {
 		struct page *page;
 
 		/* Page migration frees to the PCP lists but we want merging */
@@ -2325,27 +2329,31 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 				order, zonelist, high_zoneidx,
 				alloc_flags & ~ALLOC_NO_WATERMARKS,
 				preferred_zone, classzone_idx, migratetype);
+
 		if (page) {
-			preferred_zone->compact_blockskip_flush = false;
-			compaction_defer_reset(preferred_zone, order, true);
+			struct zone *zone = page_zone(page);
+
+			zone->compact_blockskip_flush = false;
+			compaction_defer_reset(zone, order, true);
 			count_vm_event(COMPACTSUCCESS);
 			return page;
 		}
 
 		/*
+		 * last_compact_zone is where try_to_compact_pages thought
+		 * allocation should succeed, so it did not defer compaction.
+		 * But now we know that it didn't succeed, so we do the defer.
+		 */
+		if (last_compact_zone && mode != MIGRATE_ASYNC)
+			defer_compaction(last_compact_zone, order);
+
+		/*
 		 * It's bad if compaction run occurs and fails.
 		 * The most likely reason is that pages exist,
 		 * but not enough to satisfy watermarks.
 		 */
 		count_vm_event(COMPACTFAIL);
 
-		/*
-		 * As async compaction considers a subset of pageblocks, only
-		 * defer if the failure was a sync compaction failure.
-		 */
-		if (mode != MIGRATE_ASYNC)
-			defer_compaction(preferred_zone, order);
-
 		cond_resched();
 	}
 
@@ -2356,9 +2364,8 @@ static inline struct page *
 __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	struct zonelist *zonelist, enum zone_type high_zoneidx,
 	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
-	int classzone_idx, int migratetype,
-	enum migrate_mode mode, bool *contended_compaction,
-	bool *deferred_compaction, unsigned long *did_some_progress)
+	int classzone_idx, int migratetype, enum migrate_mode mode,
+	bool *contended_compaction, bool *deferred_compaction)
 {
 	return NULL;
 }
@@ -2634,8 +2641,7 @@ rebalance:
 					preferred_zone,
 					classzone_idx, migratetype,
 					migration_mode, &contended_compaction,
-					&deferred_compaction,
-					&did_some_progress);
+					&deferred_compaction);
 	if (page)
 		goto got_pg;
 
@@ -2727,8 +2733,7 @@ rebalance:
 					preferred_zone,
 					classzone_idx, migratetype,
 					migration_mode, &contended_compaction,
-					&deferred_compaction,
-					&did_some_progress);
+					&deferred_compaction);
 	if (page)
 		goto got_pg;
 	}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2836b5373b2e..1a71b8b1ea34 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2315,7 +2315,10 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc)
 	return reclaimable;
 }
 
-/* Returns true if compaction should go ahead for a high-order request */
+/*
+ * Returns true if compaction should go ahead for a high-order request, or
+ * the high-order allocation would succeed without compaction.
+ */
 static inline bool compaction_ready(struct zone *zone, int order)
 {
 	unsigned long balance_gap, watermark;
@@ -2339,8 +2342,11 @@ static inline bool compaction_ready(struct zone *zone, int order)
 	if (compaction_deferred(zone, order))
 		return watermark_ok;
 
-	/* If compaction is not ready to start, keep reclaiming */
-	if (!compaction_suitable(zone, order))
+	/*
+	 * If compaction is not ready to start and allocation is not likely
+	 * to succeed without it, then keep reclaiming.
+	 */
+	if (compaction_suitable(zone, order) == COMPACT_SKIPPED)
 		return false;
 
 	return watermark_ok;
@@ -2818,7 +2824,7 @@ static bool zone_balanced(struct zone *zone, int order,
 		return false;
 
 	if (IS_ENABLED(CONFIG_COMPACTION) && order &&
-	    !compaction_suitable(zone, order))
+	    compaction_suitable(zone, order) == COMPACT_SKIPPED)
 		return false;
 
 	return true;