aboutsummaryrefslogtreecommitdiffstats
path: root/mm/memcontrol.c
diff options
context:
space:
mode:
authorHugh Dickins <hughd@google.com>2012-03-05 17:59:18 -0500
committerLinus Torvalds <torvalds@linux-foundation.org>2012-03-05 18:49:43 -0500
commit7512102cf64d36e3c7444480273623c7aab3563f (patch)
tree4c5b589455ed0d343384d7eeaedfab5057648a0d /mm/memcontrol.c
parent9f78ff005a6b6313728247113948450b2adddde8 (diff)
memcg: fix GPF when cgroup removal races with last exit
When moving tasks from old memcg (with move_charge_at_immigrate on new memcg), followed by removal of old memcg, hit General Protection Fault in mem_cgroup_lru_del_list() (called from release_pages called from free_pages_and_swap_cache from tlb_flush_mmu from tlb_finish_mmu from exit_mmap from mmput from exit_mm from do_exit). Somewhat reproducible, takes a few hours: the old struct mem_cgroup has been freed and poisoned by SLAB_DEBUG, but mem_cgroup_lru_del_list() is still trying to update its stats, and take page off lru before freeing. A task, or a charge, or a page on lru: each secures a memcg against removal. In this case, the last task has been moved out of the old memcg, and it is exiting: anonymous pages are uncharged one by one from the memcg, as they are zapped from its pagetables, so the charge gets down to 0; but the pages themselves are queued in an mmu_gather for freeing. Most of those pages will be on lru (and force_empty is careful to lru_add_drain_all, to add pages from pagevec to lru first), but not necessarily all: perhaps some have been isolated for page reclaim, perhaps some isolated for other reasons. So, force_empty may find no task, no charge and no page on lru, and let the removal proceed. There would still be no problem if these pages were immediately freed; but typically (and the put_page_testzero protocol demands it) they have to be added back to lru before they are found freeable, then removed from lru and freed. We don't see the issue when adding, because the mem_cgroup_iter() loops keep their own reference to the memcg being scanned; but when it comes to mem_cgroup_lru_del_list(). I believe this was not an issue in v3.2: there, PageCgroupAcctLRU and PageCgroupUsed flags were used (like a trick with mirrors) to deflect view of pc->mem_cgroup to the stable root_mem_cgroup when neither set. 38c5d72f3ebe ("memcg: simplify LRU handling by new rule") mercifully removed those convolutions, but left this General Protection Fault. But it's surprisingly easy to restore the old behaviour: just check PageCgroupUsed in mem_cgroup_lru_add_list() (which decides on which lruvec to add), and reset pc to root_mem_cgroup if page is uncharged. A risky change? just going back to how it worked before; testing, and an audit of uses of pc->mem_cgroup, show no problem. And there's a nice bonus: with mem_cgroup_lru_add_list() itself making sure that an uncharged page goes to root lru, mem_cgroup_reset_owner() no longer has any purpose, and we can safely revert 4e5f01c2b9b9 ("memcg: clear pc->mem_cgroup if necessary"). Calling update_page_reclaim_stat() after add_page_to_lru_list() in swap.c is not strictly necessary: the lru_lock there, with RCU before memcg structures are freed, makes mem_cgroup_get_reclaim_stat_from_page safe without that; but it seems cleaner to rely on one dependency less. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'mm/memcontrol.c')
-rw-r--r--mm/memcontrol.c30
1 files changed, 13 insertions, 17 deletions
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1097d8098f8c..d0e57a3cda18 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1042,6 +1042,19 @@ struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
1042 1042
1043 pc = lookup_page_cgroup(page); 1043 pc = lookup_page_cgroup(page);
1044 memcg = pc->mem_cgroup; 1044 memcg = pc->mem_cgroup;
1045
1046 /*
1047 * Surreptitiously switch any uncharged page to root:
1048 * an uncharged page off lru does nothing to secure
1049 * its former mem_cgroup from sudden removal.
1050 *
1051 * Our caller holds lru_lock, and PageCgroupUsed is updated
1052 * under page_cgroup lock: between them, they make all uses
1053 * of pc->mem_cgroup safe.
1054 */
1055 if (!PageCgroupUsed(pc) && memcg != root_mem_cgroup)
1056 pc->mem_cgroup = memcg = root_mem_cgroup;
1057
1045 mz = page_cgroup_zoneinfo(memcg, page); 1058 mz = page_cgroup_zoneinfo(memcg, page);
1046 /* compound_order() is stabilized through lru_lock */ 1059 /* compound_order() is stabilized through lru_lock */
1047 MEM_CGROUP_ZSTAT(mz, lru) += 1 << compound_order(page); 1060 MEM_CGROUP_ZSTAT(mz, lru) += 1 << compound_order(page);
@@ -3029,23 +3042,6 @@ void mem_cgroup_uncharge_end(void)
3029 batch->memcg = NULL; 3042 batch->memcg = NULL;
3030} 3043}
3031 3044
3032/*
3033 * A function for resetting pc->mem_cgroup for newly allocated pages.
3034 * This function should be called if the newpage will be added to LRU
3035 * before start accounting.
3036 */
3037void mem_cgroup_reset_owner(struct page *newpage)
3038{
3039 struct page_cgroup *pc;
3040
3041 if (mem_cgroup_disabled())
3042 return;
3043
3044 pc = lookup_page_cgroup(newpage);
3045 VM_BUG_ON(PageCgroupUsed(pc));
3046 pc->mem_cgroup = root_mem_cgroup;
3047}
3048
3049#ifdef CONFIG_SWAP 3045#ifdef CONFIG_SWAP
3050/* 3046/*
3051 * called after __delete_from_swap_cache() and drop "page" account. 3047 * called after __delete_from_swap_cache() and drop "page" account.