author     Minchan Kim <minchan@kernel.org>          2019-09-25 19:49:08 -0400
committer  Linus Torvalds <torvalds@linux-foundation.org>  2019-09-25 20:51:41 -0400
commit     9c276cc65a58faf98be8e56962745ec99ab87636 (patch)
tree       34789d8c8a0b1556c06e7f15c3524f919ee67183
parent     ce18d171cb7368557e6498a3ce111d7d3dc03e4d (diff)
mm: introduce MADV_COLD
Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7.

- Background

The Android terminology for forking a new process and starting an app from scratch is a cold start, while resuming an existing app is a hot start. While we continually try to improve the performance of cold starts, hot starts will always be significantly less power hungry as well as faster, so we try to make hot starts more likely than cold starts.

To increase hot starts, Android userspace manages the order in which apps should be killed in a process called ActivityManagerService. ActivityManagerService tracks every Android app or service that the user could be interacting with at any time and translates that into a ranked list for lmkd (the low memory killer daemon). These apps are likely to be killed by lmkd if the system has to reclaim memory; in that sense they are similar to entries in any other cache. They are kept alive for opportunistic performance improvements, but those improvements will vary based on the memory requirements of individual workloads.

- Problem

Naturally, cached apps were dominant consumers of memory on the system. However, they were not significant consumers of swap even though they are good candidates for it. Under investigation, swapping out only begins once the low zone watermark is hit and kswapd wakes up, but the overall allocation rate in the system might trip lmkd thresholds and cause a cached process to be killed (we measured the performance of swapping out vs. zapping the memory by killing a process; unsurprisingly, zapping is 10x faster even though we use zram, which is much faster than real storage). A kill from lmkd will often satisfy the high zone watermark, resulting in very few pages actually being moved to swap.

- Approach

The approach we chose was a new interface that allows userspace to proactively reclaim entire processes by leveraging platform information. This lets us bypass the inaccuracy of the kernel's LRUs for pages that are known to be cold from userspace, and avoid races with lmkd by reclaiming apps as soon as they enter the cached state. Additionally, it gives the platform many more chances to use its information to optimize memory efficiency.

To achieve this, the patchset introduces two new madvise options. MADV_COLD deactivates active pages, and MADV_PAGEOUT reclaims private pages instantly. These new options complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to gain some free memory. MADV_PAGEOUT is similar to MADV_DONTNEED in that it hints to the kernel that the memory region is not currently needed and should be reclaimed immediately; MADV_COLD is similar to MADV_FREE in that it hints to the kernel that the memory region is not currently needed and should be reclaimed when memory pressure rises.

This patch (of 5):

When a process expects no accesses to a certain memory range, it can give a hint to the kernel that the pages can be reclaimed when memory pressure happens, but that the data should be preserved for future use. This can reduce workingset eviction and so end up increasing performance.

This patch introduces the new MADV_COLD hint to the madvise(2) syscall. MADV_COLD can be used by a process to mark a memory range as not expected to be used in the near future. The hint helps the kernel decide which pages to evict early during memory pressure.

It works on every LRU page, like MADV_[DONTNEED|FREE]. In other words, it moves

	active file page -> inactive file LRU
	active anon page -> inactive anon LRU

Unlike MADV_FREE, it doesn't move active anonymous pages to the head of the inactive file LRU, because MADV_COLD has slightly different semantics. MADV_FREE means it's okay to discard the page under memory pressure because its contents are *garbage*, so freeing such pages is almost zero overhead: we don't need to swap them out, and a later access causes just a minor fault. Thus, it makes sense to put those freeable pages on the inactive file LRU to compete with other used-once pages. It also makes sense from an implementation point of view, because the page is no longer swap-backed until it is re-dirtied. It even gives a bonus in that such pages can be reclaimed on a swapless system.

However, MADV_COLD does not mean the data is garbage, so reclaiming it eventually requires swap-out/in, which is a bigger cost. Since VM LRU aging is designed around a cost model, anonymous cold pages are better positioned on the inactive anon LRU list, not the file LRU. Furthermore, this helps avoid unnecessary scanning if the system doesn't have a swap device. Let's start with the simpler approach without adding complexity at this moment. Keep in mind, though, the caveat that workloads with a lot of page cache are likely to ignore MADV_COLD on anonymous memory because we rarely age the anonymous LRU lists.

* man-page material

MADV_COLD (since Linux x.x)

Pages in the specified regions will be treated as less recently accessed compared to pages in the system with similar access frequencies. In contrast to MADV_FREE, the contents of the region are preserved regardless of subsequent writes to pages.

MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages.

[akpm@linux-foundation.org: resolve conflicts with hmm.git]
Link: http://lkml.kernel.org/r/20190726023435.214162-2-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: kbuild test robot <lkp@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Daniel Colascione <dancol@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tim Murray <timmurray@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
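For illustration only (not part of this patch): a userspace sketch of how a process that knows a buffer has gone cold might apply the new hint. The 64MB anonymous mapping and the fallback #define of MADV_COLD to 20 (the value added by this patch) are assumptions for the example; on kernels or headers without MADV_COLD, madvise() simply fails with EINVAL.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLD
#define MADV_COLD 20	/* value introduced by this patch; fallback for older headers */
#endif

int main(void)
{
	size_t len = 64UL << 20;	/* illustrative 64MB buffer */

	/* Anonymous private memory, the kind of pages MADV_COLD deactivates. */
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset(buf, 0xaa, len);	/* touch the pages so they land on the LRU */

	/*
	 * Hint that this range is not needed in the near future. Unlike
	 * MADV_DONTNEED, the contents are preserved; the kernel only moves
	 * the pages to the inactive LRU so they are reclaimed earlier under
	 * memory pressure.
	 */
	if (madvise(buf, len, MADV_COLD) != 0)
		perror("madvise(MADV_COLD)");	/* e.g. EINVAL without kernel support */

	/* The data is still intact and usable afterwards. */
	printf("first byte after MADV_COLD: 0x%02x\n", (unsigned char)buf[0]);

	munmap(buf, len);
	return 0;
}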
-rw-r--r--  arch/alpha/include/uapi/asm/mman.h      2
-rw-r--r--  arch/mips/include/uapi/asm/mman.h       2
-rw-r--r--  arch/parisc/include/uapi/asm/mman.h     2
-rw-r--r--  arch/xtensa/include/uapi/asm/mman.h     2
-rw-r--r--  include/linux/swap.h                    1
-rw-r--r--  include/uapi/asm-generic/mman-common.h  2
-rw-r--r--  mm/internal.h                           2
-rw-r--r--  mm/madvise.c                            179
-rw-r--r--  mm/oom_kill.c                           2
-rw-r--r--  mm/swap.c                               42
10 files changed, 232 insertions, 4 deletions
diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index ac23379b7a87..f3258fbf03d0 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -68,6 +68,8 @@
 #define MADV_WIPEONFORK 18	/* Zero memory on fork, child only */
 #define MADV_KEEPONFORK 19	/* Undo MADV_WIPEONFORK */
 
+#define MADV_COLD	20	/* deactivate these pages */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index c2b40969eb1f..00ad09fc5eb1 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -95,6 +95,8 @@
 #define MADV_WIPEONFORK 18	/* Zero memory on fork, child only */
 #define MADV_KEEPONFORK 19	/* Undo MADV_WIPEONFORK */
 
+#define MADV_COLD	20	/* deactivate these pages */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index c98162f494db..eb14e3a7b8f3 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -48,6 +48,8 @@
 #define MADV_DONTFORK	10	/* don't inherit across fork */
 #define MADV_DOFORK	11	/* do inherit across fork */
 
+#define MADV_COLD	20	/* deactivate these pages */
+
 #define MADV_MERGEABLE	65	/* KSM may merge identical pages */
 #define MADV_UNMERGEABLE 66	/* KSM may not merge identical pages */
 
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index ebbb48842190..f926b00ff11f 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -103,6 +103,8 @@
 #define MADV_WIPEONFORK 18	/* Zero memory on fork, child only */
 #define MADV_KEEPONFORK 19	/* Undo MADV_WIPEONFORK */
 
+#define MADV_COLD	20	/* deactivate these pages */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/include/linux/swap.h b/include/linux/swap.h
index de2c67a33b7e..0ce997edb8bb 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -340,6 +340,7 @@ extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_all(void);
 extern void rotate_reclaimable_page(struct page *page);
 extern void deactivate_file_page(struct page *page);
+extern void deactivate_page(struct page *page);
 extern void mark_page_lazyfree(struct page *page);
 extern void swap_setup(void);
 
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 63b1f506ea67..23431faf0eb6 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -67,6 +67,8 @@
 #define MADV_WIPEONFORK 18	/* Zero memory on fork, child only */
 #define MADV_KEEPONFORK 19	/* Undo MADV_WIPEONFORK */
 
+#define MADV_COLD	20	/* deactivate these pages */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/mm/internal.h b/mm/internal.h
index e32390802fd3..0d5f720c75ab 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -39,7 +39,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf);
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 		unsigned long floor, unsigned long ceiling);
 
-static inline bool can_madv_dontneed_vma(struct vm_area_struct *vma)
+static inline bool can_madv_lru_vma(struct vm_area_struct *vma)
 {
 	return !(vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP));
 }
diff --git a/mm/madvise.c b/mm/madvise.c
index 1f8a6fdc6878..e1aee62967c3 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -11,6 +11,7 @@
 #include <linux/syscalls.h>
 #include <linux/mempolicy.h>
 #include <linux/page-isolation.h>
+#include <linux/page_idle.h>
 #include <linux/userfaultfd_k.h>
 #include <linux/hugetlb.h>
 #include <linux/falloc.h>
@@ -42,6 +43,7 @@ static int madvise_need_mmap_write(int behavior)
 	case MADV_REMOVE:
 	case MADV_WILLNEED:
 	case MADV_DONTNEED:
+	case MADV_COLD:
 	case MADV_FREE:
 		return 0;
 	default:
@@ -289,6 +291,176 @@ static long madvise_willneed(struct vm_area_struct *vma,
 	return 0;
 }
 
+static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
+				unsigned long end, struct mm_walk *walk)
+{
+	struct mmu_gather *tlb = walk->private;
+	struct mm_struct *mm = tlb->mm;
+	struct vm_area_struct *vma = walk->vma;
+	pte_t *orig_pte, *pte, ptent;
+	spinlock_t *ptl;
+	struct page *page;
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	if (pmd_trans_huge(*pmd)) {
+		pmd_t orig_pmd;
+		unsigned long next = pmd_addr_end(addr, end);
+
+		tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
+		ptl = pmd_trans_huge_lock(pmd, vma);
+		if (!ptl)
+			return 0;
+
+		orig_pmd = *pmd;
+		if (is_huge_zero_pmd(orig_pmd))
+			goto huge_unlock;
+
+		if (unlikely(!pmd_present(orig_pmd))) {
+			VM_BUG_ON(thp_migration_supported() &&
+					!is_pmd_migration_entry(orig_pmd));
+			goto huge_unlock;
+		}
+
+		page = pmd_page(orig_pmd);
+		if (next - addr != HPAGE_PMD_SIZE) {
+			int err;
+
+			if (page_mapcount(page) != 1)
+				goto huge_unlock;
+
+			get_page(page);
+			spin_unlock(ptl);
+			lock_page(page);
+			err = split_huge_page(page);
+			unlock_page(page);
+			put_page(page);
+			if (!err)
+				goto regular_page;
+			return 0;
+		}
+
+		if (pmd_young(orig_pmd)) {
+			pmdp_invalidate(vma, addr, pmd);
+			orig_pmd = pmd_mkold(orig_pmd);
+
+			set_pmd_at(mm, addr, pmd, orig_pmd);
+			tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
+		}
+
+		test_and_clear_page_young(page);
+		deactivate_page(page);
+huge_unlock:
+		spin_unlock(ptl);
+		return 0;
+	}
+
+	if (pmd_trans_unstable(pmd))
+		return 0;
+regular_page:
+#endif
+	tlb_change_page_size(tlb, PAGE_SIZE);
+	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	flush_tlb_batched_pending(mm);
+	arch_enter_lazy_mmu_mode();
+	for (; addr < end; pte++, addr += PAGE_SIZE) {
+		ptent = *pte;
+
+		if (pte_none(ptent))
+			continue;
+
+		if (!pte_present(ptent))
+			continue;
+
+		page = vm_normal_page(vma, addr, ptent);
+		if (!page)
+			continue;
+
+		/*
+		 * Creating a THP page is expensive so split it only if we
+		 * are sure it's worth it. Split it if we are the only owner.
+		 */
+		if (PageTransCompound(page)) {
+			if (page_mapcount(page) != 1)
+				break;
+			get_page(page);
+			if (!trylock_page(page)) {
+				put_page(page);
+				break;
+			}
+			pte_unmap_unlock(orig_pte, ptl);
+			if (split_huge_page(page)) {
+				unlock_page(page);
+				put_page(page);
+				pte_offset_map_lock(mm, pmd, addr, &ptl);
+				break;
+			}
+			unlock_page(page);
+			put_page(page);
+			pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+			pte--;
+			addr -= PAGE_SIZE;
+			continue;
+		}
+
+		VM_BUG_ON_PAGE(PageTransCompound(page), page);
+
+		if (pte_young(ptent)) {
+			ptent = ptep_get_and_clear_full(mm, addr, pte,
+							tlb->fullmm);
+			ptent = pte_mkold(ptent);
+			set_pte_at(mm, addr, pte, ptent);
+			tlb_remove_tlb_entry(tlb, pte, addr);
+		}
+
+		/*
+		 * We are deactivating a page to accelerate its reclaim.
+		 * The VM can't reclaim the page unless we clear PG_young.
+		 * As a side effect, this confuses idle-page tracking,
+		 * which will miss the recent reference history.
+		 */
+		test_and_clear_page_young(page);
+		deactivate_page(page);
+	}
+
+	arch_leave_lazy_mmu_mode();
+	pte_unmap_unlock(orig_pte, ptl);
+	cond_resched();
+
+	return 0;
+}
+
+static const struct mm_walk_ops cold_walk_ops = {
+	.pmd_entry = madvise_cold_pte_range,
+};
+
+static void madvise_cold_page_range(struct mmu_gather *tlb,
+			     struct vm_area_struct *vma,
+			     unsigned long addr, unsigned long end)
+{
+	tlb_start_vma(tlb, vma);
+	walk_page_range(vma->vm_mm, addr, end, &cold_walk_ops, NULL);
+	tlb_end_vma(tlb, vma);
+}
+
+static long madvise_cold(struct vm_area_struct *vma,
+			struct vm_area_struct **prev,
+			unsigned long start_addr, unsigned long end_addr)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	struct mmu_gather tlb;
+
+	*prev = vma;
+	if (!can_madv_lru_vma(vma))
+		return -EINVAL;
+
+	lru_add_drain();
+	tlb_gather_mmu(&tlb, mm, start_addr, end_addr);
+	madvise_cold_page_range(&tlb, vma, start_addr, end_addr);
+	tlb_finish_mmu(&tlb, start_addr, end_addr);
+
+	return 0;
+}
+
 static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
 
@@ -493,7 +665,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 				  int behavior)
 {
 	*prev = vma;
-	if (!can_madv_dontneed_vma(vma))
+	if (!can_madv_lru_vma(vma))
 		return -EINVAL;
 
 	if (!userfaultfd_remove(vma, start, end)) {
@@ -515,7 +687,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 			 */
 			return -ENOMEM;
 		}
-		if (!can_madv_dontneed_vma(vma))
+		if (!can_madv_lru_vma(vma))
 			return -EINVAL;
 		if (end > vma->vm_end) {
 			/*
@@ -669,6 +841,8 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		return madvise_remove(vma, prev, start, end);
 	case MADV_WILLNEED:
 		return madvise_willneed(vma, prev, start, end);
+	case MADV_COLD:
+		return madvise_cold(vma, prev, start, end);
 	case MADV_FREE:
 	case MADV_DONTNEED:
 		return madvise_dontneed_free(vma, prev, start, end, behavior);
@@ -690,6 +864,7 @@ madvise_behavior_valid(int behavior)
 	case MADV_WILLNEED:
 	case MADV_DONTNEED:
 	case MADV_FREE:
+	case MADV_COLD:
 #ifdef CONFIG_KSM
 	case MADV_MERGEABLE:
 	case MADV_UNMERGEABLE:
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index c1d9496b4c43..71e3acea7817 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -523,7 +523,7 @@ bool __oom_reap_task_mm(struct mm_struct *mm)
 	set_bit(MMF_UNSTABLE, &mm->flags);
 
 	for (vma = mm->mmap ; vma; vma = vma->vm_next) {
-		if (!can_madv_dontneed_vma(vma))
+		if (!can_madv_lru_vma(vma))
 			continue;
 
 		/*
diff --git a/mm/swap.c b/mm/swap.c
index 784dc1620620..38c3fa4308e2 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -47,6 +47,7 @@ int page_cluster;
 static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
 static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs);
+static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_lazyfree_pvecs);
 #ifdef CONFIG_SMP
 static DEFINE_PER_CPU(struct pagevec, activate_page_pvecs);
@@ -538,6 +539,22 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
 	update_page_reclaim_stat(lruvec, file, 0);
 }
 
+static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
+			    void *arg)
+{
+	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
+		int file = page_is_file_cache(page);
+		int lru = page_lru_base_type(page);
+
+		del_page_from_lru_list(page, lruvec, lru + LRU_ACTIVE);
+		ClearPageActive(page);
+		ClearPageReferenced(page);
+		add_page_to_lru_list(page, lruvec, lru);
+
+		__count_vm_events(PGDEACTIVATE, hpage_nr_pages(page));
+		update_page_reclaim_stat(lruvec, file, 0);
+	}
+}
 
 static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
 			    void *arg)
@@ -590,6 +607,10 @@ void lru_add_drain_cpu(int cpu)
 	if (pagevec_count(pvec))
 		pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL);
 
+	pvec = &per_cpu(lru_deactivate_pvecs, cpu);
+	if (pagevec_count(pvec))
+		pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
+
 	pvec = &per_cpu(lru_lazyfree_pvecs, cpu);
 	if (pagevec_count(pvec))
 		pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL);
@@ -623,6 +644,26 @@ void deactivate_file_page(struct page *page)
 	}
 }
 
+/*
+ * deactivate_page - deactivate a page
+ * @page: page to deactivate
+ *
+ * deactivate_page() moves @page to the inactive list if @page was on the active
+ * list and was not an unevictable page. This is done to accelerate the reclaim
+ * of @page.
+ */
+void deactivate_page(struct page *page)
+{
+	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
+		struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
+
+		get_page(page);
+		if (!pagevec_add(pvec, page) || PageCompound(page))
+			pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
+		put_cpu_var(lru_deactivate_pvecs);
+	}
+}
+
 /**
  * mark_page_lazyfree - make an anon page lazyfree
  * @page: page to deactivate
@@ -687,6 +728,7 @@ void lru_add_drain_all(void)
 		if (pagevec_count(&per_cpu(lru_add_pvec, cpu)) ||
 		    pagevec_count(&per_cpu(lru_rotate_pvecs, cpu)) ||
 		    pagevec_count(&per_cpu(lru_deactivate_file_pvecs, cpu)) ||
+		    pagevec_count(&per_cpu(lru_deactivate_pvecs, cpu)) ||
 		    pagevec_count(&per_cpu(lru_lazyfree_pvecs, cpu)) ||
 		    need_activate_page_drain(cpu)) {
 			INIT_WORK(work, lru_add_drain_per_cpu);