path: root/mm/compaction.c
author Mel Gorman <mgorman@suse.de> 2012-10-08 19:32:33 -0400
committer Linus Torvalds <torvalds@linux-foundation.org> 2012-10-09 03:22:49 -0400
commit 2a1402aa044b55c2d30ab0ed9405693ef06fb07c
tree d286a88cc0882663143ec67d54cd3b1247ef2829 /mm/compaction.c
parent 661c4cb9b829110cb68c18ea05a56be39f75a4d2
mm: compaction: acquire the zone->lru_lock as late as possible
Richard Davies and Shaohua Li have both reported lock contention problems in compaction on the zone and LRU locks as well as significant amounts of time being spent in compaction. This series aims to reduce lock contention and scanning rates to reduce that CPU usage. Richard reported at https://lkml.org/lkml/2012/9/21/91 that this series made a big difference to a problem he reported in August:

http://marc.info/?l=kvm&m=134511507015614&w=2

Patch 1 defers acquiring the zone->lru_lock as long as possible.

Patch 2 defers acquiring the zone->lock as long as possible.

Patch 3 reverts Rik's "skip-free" patches as the core concept gets reimplemented later and the remaining patches are easier to understand if this is reverted first.

Patch 4 adds a pageblock-skip bit to the pageblock flags to cache what pageblocks should be skipped by the migrate and free scanners. This drastically reduces the amount of scanning compaction has to do.

Patch 5 reimplements something similar to Rik's idea except it uses the pageblock-skip information to decide where the scanners should restart from and does not need to wrap around.

I tested this on 3.6-rc6 + linux-next/akpm. Kernels tested were

akpm-20120920  3.6-rc6 + linux-next/akpm as of September 20th, 2012
lesslock       Patches 1-6
revert         Patches 1-7
cachefail      Patches 1-8
skipuseless    Patches 1-9

Stress high-order allocation tests looked ok. Success rates are more or less the same with the full series applied but there is an expectation that there is less opportunity to race with other allocation requests if there is less scanning. The time to complete the tests did not vary much and is uninteresting, as were the vmstat statistics, so I will not present them here.

Using ftrace I recorded how much scanning was done by compaction and got this

                          3.6.0-rc6      3.6.0-rc6  3.6.0-rc6    3.6.0-rc6  3.6.0-rc6
                          akpm-20120920  lockless   revert-v2r2  cachefail  skipuseless
Total free scanned        360753976      515414028  565479007    17103281   18916589
Total free isolated       2852429        3597369    4048601      670493     727840
Total free efficiency     0.0079%        0.0070%    0.0072%      0.0392%    0.0385%
Total migrate scanned     247728664      822729112  1004645830   17946827   14118903
Total migrate isolated    2555324        3245937    3437501      616359     658616
Total migrate efficiency  0.0103%        0.0039%    0.0034%      0.0343%    0.0466%

The efficiency is worthless because of the nature of the test and the number of failures. The really interesting point as far as this patch series is concerned is the number of pages scanned. Note that reverting Rik's patches massively increases the number of pages scanned, indicating that those patches really did make a difference to CPU usage.

However, caching what pageblocks should be skipped has a much higher impact. With patches 1-8 applied, free page and migrate page scanning are both reduced by 95% in comparison to the akpm kernel. If the basic concept of Rik's patches is implemented on top, then free scanning barely changed but migrate scanning was further reduced. That said, tests on 3.6-rc5 indicated that the last patch had greater impact than what was measured here so it is a bit variable.

One way or the other, this series has a large impact on the amount of scanning compaction does when there is a storm of THP allocations.

This patch:

Compaction's migrate scanner acquires the zone->lru_lock when scanning a range of pages looking for LRU pages to acquire. It does this even if there are no LRU pages in the range. If multiple processes are compacting then this can cause severe locking contention.
To make matters worse commit b2eef8c0 ("mm: compaction: minimise the time IRQs are disabled while isolating pages for migration") releases the lru_lock every SWAP_CLUSTER_MAX pages that are scanned.

This patch makes two changes to how the migrate scanner acquires the LRU lock. First, it only releases the LRU lock every SWAP_CLUSTER_MAX pages if the lock is contended. This reduces the number of times it unnecessarily disables and re-enables IRQs. Second, it defers acquiring the LRU lock for as long as possible. If there are no LRU pages or the only LRU pages are transhuge then the LRU lock will not be acquired at all, which reduces contention on zone->lru_lock.

[minchan@kernel.org: augment comment]
[akpm@linux-foundation.org: tweak comment text]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Richard Davies <richard@arachsys.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Avi Kivity <avi@redhat.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
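
The second change described in the commit message (check without the lock, take the lock only when a candidate is found, then recheck under the lock) can be sketched in userspace. This is a minimal single-threaded simulation, not kernel code: `fake_lock`, `fake_page` and `scan_deferred` are hypothetical names, the "lock" only counts acquisitions, and the contention-based release from the first change is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for zone->lru_lock; it only counts acquisitions. */
struct fake_lock {
	bool held;
	int acquisitions;
};

/* Hypothetical stand-in for struct page; only the LRU flag is modelled. */
struct fake_page {
	bool on_lru;
};

static void fake_lock_acquire(struct fake_lock *lock)
{
	assert(!lock->held);
	lock->held = true;
	lock->acquisitions++;
}

static void fake_lock_release(struct fake_lock *lock)
{
	assert(lock->held);
	lock->held = false;
}

/*
 * Scan pages[0..nr) and return how many were "isolated". The lock is
 * taken only once the first LRU candidate is seen, and the LRU flag is
 * rechecked after the lock is held.
 */
static size_t scan_deferred(struct fake_page *pages, size_t nr,
			    struct fake_lock *lock)
{
	size_t isolated = 0;
	bool locked = false;

	for (size_t i = 0; i < nr; i++) {
		/* Lockless check: cheap, may race, so it is repeated below. */
		if (!pages[i].on_lru)
			continue;

		if (!locked) {
			fake_lock_acquire(lock);
			locked = true;
		}

		/* Recheck under the lock before acting on the page. */
		if (!pages[i].on_lru)
			continue;

		pages[i].on_lru = false;	/* "isolate" the page */
		isolated++;
	}

	if (locked)
		fake_lock_release(lock);
	return isolated;
}
```

In this sketch a range with no LRU pages never touches the lock, which is the property the commit message attributes to the patch for zone->lru_lock.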
Diffstat (limited to 'mm/compaction.c')
-rw-r--r-- mm/compaction.c 65
1 files changed, 45 insertions, 20 deletions
diff --git a/mm/compaction.c b/mm/compaction.c
index b16dd3822995..832c4183dccc 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -50,6 +50,11 @@ static inline bool migrate_async_suitable(int migratetype)
 	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
 }
 
+static inline bool should_release_lock(spinlock_t *lock)
+{
+	return need_resched() || spin_is_contended(lock);
+}
+
 /*
  * Compaction requires the taking of some coarse locks that are potentially
  * very heavily contended. Check if the process needs to be scheduled or
@@ -62,7 +67,7 @@ static inline bool migrate_async_suitable(int migratetype)
 static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,
 				      bool locked, struct compact_control *cc)
 {
-	if (need_resched() || spin_is_contended(lock)) {
+	if (should_release_lock(lock)) {
 		if (locked) {
 			spin_unlock_irqrestore(lock, *flags);
 			locked = false;
@@ -327,7 +332,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 	isolate_mode_t mode = 0;
 	struct lruvec *lruvec;
 	unsigned long flags;
-	bool locked;
+	bool locked = false;
 
 	/*
 	 * Ensure that there are not too many pages isolated from the LRU
@@ -347,23 +352,17 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 
 	/* Time to isolate some pages for migration */
 	cond_resched();
-	spin_lock_irqsave(&zone->lru_lock, flags);
-	locked = true;
 	for (; low_pfn < end_pfn; low_pfn++) {
 		struct page *page;
 
 		/* give a chance to irqs before checking need_resched() */
-		if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) {
-			spin_unlock_irqrestore(&zone->lru_lock, flags);
-			locked = false;
+		if (locked && !((low_pfn+1) % SWAP_CLUSTER_MAX)) {
+			if (should_release_lock(&zone->lru_lock)) {
+				spin_unlock_irqrestore(&zone->lru_lock, flags);
+				locked = false;
+			}
 		}
 
-		/* Check if it is ok to still hold the lock */
-		locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
-								locked, cc);
-		if (!locked || fatal_signal_pending(current))
-			break;
-
 		/*
 		 * migrate_pfn does not necessarily start aligned to a
 		 * pageblock. Ensure that pfn_valid is called when moving
@@ -403,21 +402,40 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 		pageblock_nr = low_pfn >> pageblock_order;
 		if (!cc->sync && last_pageblock_nr != pageblock_nr &&
 		    !migrate_async_suitable(get_pageblock_migratetype(page))) {
-			low_pfn += pageblock_nr_pages;
-			low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
-			last_pageblock_nr = pageblock_nr;
-			continue;
+			goto next_pageblock;
 		}
 
+		/* Check may be lockless but that's ok as we recheck later */
 		if (!PageLRU(page))
 			continue;
 
 		/*
-		 * PageLRU is set, and lru_lock excludes isolation,
-		 * splitting and collapsing (collapsing has already
-		 * happened if PageLRU is set).
+		 * PageLRU is set. lru_lock normally excludes isolation
+		 * splitting and collapsing (collapsing has already happened
+		 * if PageLRU is set) but the lock is not necessarily taken
+		 * here and it is wasteful to take it just to check transhuge.
+		 * Check TransHuge without lock and skip the whole pageblock if
+		 * it's either a transhuge or hugetlbfs page, as calling
+		 * compound_order() without preventing THP from splitting the
+		 * page underneath us may return surprising results.
 		 */
 		if (PageTransHuge(page)) {
+			if (!locked)
+				goto next_pageblock;
+			low_pfn += (1 << compound_order(page)) - 1;
+			continue;
+		}
+
+		/* Check if it is ok to still hold the lock */
+		locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
+								locked, cc);
+		if (!locked || fatal_signal_pending(current))
+			break;
+
+		/* Recheck PageLRU and PageTransHuge under lock */
+		if (!PageLRU(page))
+			continue;
+		if (PageTransHuge(page)) {
 			low_pfn += (1 << compound_order(page)) - 1;
 			continue;
 		}
@@ -444,6 +462,13 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 			++low_pfn;
 			break;
 		}
+
+		continue;
+
+next_pageblock:
+		low_pfn += pageblock_nr_pages;
+		low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
+		last_pageblock_nr = pageblock_nr;
 	}
 
 	acct_isolated(zone, locked, cc);
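
The arithmetic behind the next_pageblock label can be checked outside the kernel. The sketch below is a userspace approximation under stated assumptions: `SKETCH_ALIGN` is a hypothetical stand-in that rounds up to a power-of-two multiple (which is what the kernel's ALIGN() is assumed to do here), `resume_pfn` is a made-up helper name, and 512 pages per pageblock is only an example value.

```c
#include <assert.h>

/*
 * Round x up to a multiple of a, where a is a power of two.
 * Hypothetical userspace stand-in for the kernel's ALIGN().
 */
#define SKETCH_ALIGN(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/*
 * Return the pfn that the for-loop visits next after the
 * next_pageblock path has been taken at low_pfn.
 */
static unsigned long resume_pfn(unsigned long low_pfn,
				unsigned long pageblock_nr_pages)
{
	/* The rounding trick only works for a non-zero power of two. */
	assert(pageblock_nr_pages &&
	       !(pageblock_nr_pages & (pageblock_nr_pages - 1)));

	low_pfn += pageblock_nr_pages;
	low_pfn = SKETCH_ALIGN(low_pfn, pageblock_nr_pages) - 1;

	/* The loop header's low_pfn++ then moves onto a pageblock start. */
	return low_pfn + 1;
}
```

With 512 pages per pageblock, a pageblock-aligned pfn of 1024 resumes at 1536, the start of the following pageblock. An unaligned pfn such as 1000 also resumes at 1536 under the same arithmetic.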