path: root/mm/vmscan.c
author	Minchan Kim <minchan@kernel.org>	2013-02-22 19:35:37 -0500
committer	Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 20:50:21 -0500
commit	0e50ce3b50fb4ffc38c98fe7622361da4d0808c1 (patch)
tree	ad0a2ffd8a1b7cf6c70108e2b88334df147348e1 /mm/vmscan.c
parent	00ef2d2f84babb9b209f0fc003bc490c6bf1e6ef (diff)
mm: use up free swap space before reaching OOM kill
Recently, Luigi reported that there is lots of free swap space left when OOM happens. It is easily reproduced on zram-over-swap, where many memory-hog instances are running and laptop_mode is enabled. He said there was no problem when he disabled laptop_mode. What I found while investigating is the following.

Assumption for easy explanation: there are no page cache pages in the system, because they have all been reclaimed already.

1. try_to_free_pages disables may_writepage when laptop_mode is enabled.
2. shrink_inactive_list isolates victim pages from the inactive anon LRU list.
3. shrink_page_list adds them to the swap cache via add_to_swap, but it doesn't page them out because sc->may_writepage is 0, so the pages are rotated back onto the inactive anon LRU list. add_to_swap made the pages dirty via SetPageDirty.
4. Since step 3 couldn't reclaim any pages, do_try_to_free_pages increases the priority and retries reclaim at the higher priority.
5. shrink_inactive_list tries to isolate victim pages from the inactive anon LRU list again, but fails because it isolates in ISOLATE_CLEAN mode while the list is full of dirty pages from step 3, so it returns without any reclaim progress.
6. do_try_to_free_pages doesn't set may_writepage because total_scanned is zero: sc->nr_scanned is only increased by shrink_page_list, and shrink_page_list is never called in step 5 for lack of isolated pages.

The above loop continues until OOM happens.

The problem did not happen before [1] was merged, because the old logic's isolation in shrink_inactive_list succeeded and shrink_page_list was called to page the pages out. That still failed because of may_writepage, but the important point is that sc->nr_scanned was increased even though we couldn't swap the pages out, so do_try_to_free_pages could set may_writepage.

Since commit f80c0673610e ("mm: zone_reclaim: make isolate_lru_page() filter-aware") was introduced, it is no longer a good idea to depend only on the number of scanned pages for setting may_writepage.

So this patch adds a new trigger for setting may_writepage: dropping below DEF_PRIORITY - 2, which indicates significant memory pressure in the VM. That fits our purpose well, since it is better to lose some power saving (or tolerate some disk noise) than to hit the OOM killer.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Luigi Semenzato <semenzato@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
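To make the stall in steps 1-6 easier to follow, here is a minimal userspace sketch (not kernel code, and not the patch itself): the variable names and the simplified scanned-based trigger are stand-ins for the real scan_control machinery, and DEF_PRIORITY is hard-coded to the kernel's usual value of 12.

/*
 * Sketch of the stall: every page on the inactive anon LRU is assumed
 * dirty, so isolation in ISOLATE_CLEAN mode never finds anything and
 * no scanned-based trigger can ever flip may_writepage back on.
 */
#include <stdio.h>

#define DEF_PRIORITY 12

int main(void)
{
	unsigned long total_scanned = 0;
	int may_writepage = 0;			/* cleared by laptop_mode */

	for (int priority = DEF_PRIORITY; priority >= 0; priority--) {
		/* Step 5: ISOLATE_CLEAN isolates nothing, all pages are dirty. */
		unsigned long isolated = 0;

		/*
		 * Step 6: shrink_page_list is never reached, so the scan
		 * counters never grow and the old-style trigger (shown here
		 * in simplified form) can never fire.
		 */
		total_scanned += isolated;
		if (total_scanned > 0)
			may_writepage = 1;

		/* The patch's trigger: priority alone signals sustained pressure. */
		if (priority < DEF_PRIORITY - 2)
			may_writepage = 1;

		printf("priority=%2d total_scanned=%lu may_writepage=%d\n",
		       priority, total_scanned, may_writepage);
	}
	return 0;
}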
Diffstat (limited to 'mm/vmscan.c')
-rw-r--r--	mm/vmscan.c | 15
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 396ecee281d0..606d0bb46091 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2195,6 +2195,13 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 			goto out;
 
 		/*
+		 * If we're getting trouble reclaiming, start doing
+		 * writepage even in laptop mode.
+		 */
+		if (sc->priority < DEF_PRIORITY - 2)
+			sc->may_writepage = 1;
+
+		/*
 		 * Try to write back as many pages as we just scanned. This
 		 * tends to cause slow streaming writers to write data to the
 		 * disk smoothly, at the dirtying rate, which is nice. But
@@ -2765,12 +2772,10 @@ loop_again:
 			}
 
 			/*
-			 * If we've done a decent amount of scanning and
-			 * the reclaim ratio is low, start doing writepage
-			 * even in laptop mode
+			 * If we're getting trouble reclaiming, start doing
+			 * writepage even in laptop mode.
 			 */
-			if (total_scanned > SWAP_CLUSTER_MAX * 2 &&
-			    total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
+			if (sc.priority < DEF_PRIORITY - 2)
 				sc.may_writepage = 1;
 
 			if (zone->all_unreclaimable) {
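For comparison, the old and new conditions from the hunk above can be lifted out as standalone predicates. This is only a hedged sketch: the helper names old_should_writepage/new_should_writepage are hypothetical, and SWAP_CLUSTER_MAX and DEF_PRIORITY are hard-coded with their usual kernel values (32 and 12).

#include <stdbool.h>
#include <stdio.h>

#define SWAP_CLUSTER_MAX	32UL	/* usual kernel value */
#define DEF_PRIORITY		12	/* usual kernel value */

/* Old balance_pgdat heuristic: requires that scanning made progress. */
static bool old_should_writepage(unsigned long total_scanned,
				 unsigned long nr_reclaimed)
{
	return total_scanned > SWAP_CLUSTER_MAX * 2 &&
	       total_scanned > nr_reclaimed + nr_reclaimed / 2;
}

/* New trigger: a low enough priority alone signals sustained pressure. */
static bool new_should_writepage(int priority)
{
	return priority < DEF_PRIORITY - 2;
}

int main(void)
{
	/* Steps 5-6 above: nothing isolated, so nothing scanned. */
	printf("old trigger with zero scanned: %d\n",
	       old_should_writepage(0, 0));		/* never fires */
	printf("new trigger at priority 9:    %d\n",
	       new_should_writepage(9));		/* fires */
	return 0;
}

When isolation fails outright, total_scanned stays at zero and the old predicate can never become true, while the priority-based one still fires after a few unsuccessful passes, which is the behaviour the patch is after.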