author    KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>  2009-01-07 21:08:31 -0500
committer Linus Torvalds <torvalds@linux-foundation.org>  2009-01-08 11:31:10 -0500
commit    03f3c433648a97ae7c86be789edba67690f6ea60
tree      d92a17f6fe9d90d3a6b46762742ef239bf5ecc44
parent    42e9abb628def2c335a4ecf130bb6c88d916d885
memcg: fix swap accounting leak
Fix the swap-in charge operation of memcg.

memcg now hooks the swap-out operation and checks whether a SwapCache page
is really unused. That check depends on the contents of struct page: if
PageAnon(page) && page_mapped(page), the page is recognized as still in use.

However, reuse_swap_page() calls delete_from_swap_cache() before any rmap
is established. As a result, in the following sequence

	(Page fault with WRITE)
	try_charge()		(charge += PAGESIZE)
	commit_charge()		(check whether page_cgroup is used or not)
	reuse_swap_page()
		-> delete_from_swap_cache()
			-> mem_cgroup_uncharge_swapcache() (charge -= PAGESIZE)
	......

the new charge is uncharged almost immediately.

To avoid this, move commit_charge() to after page_mapcount() goes up to 1.
The sequence then becomes

	try_charge()		(usage += PAGESIZE)
	reuse_swap_page()	(may do usage -= PAGESIZE if PCG_USED is set)
	commit_charge()		(if page_cgroup is not marked as PCG_USED,
				 add the new charge)

and accounting will be correct.

Changelog (v2) -> (v3)
  - fixed invalid charge to swp_entry==0.
  - updated documentation.
Changelog (v1) -> (v2)
  - fixed comment.

[nishimura@mxp.nes.nec.co.jp: swap accounting leak doc fix]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
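
The ordering problem described in the commit message can be sketched with a small user-space model. This is not kernel code: every `toy_*` name below is hypothetical, usage is counted in whole pages rather than bytes, only the memory counter (not memsw) is tracked, and reuse_swap_page() is assumed to always end up calling delete_from_swap_cache(), which is the worst case for the old ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: one page, one cgroup, usage in pages. */
struct toy_page {
	bool pcg_used;	/* stands in for PCG_USED on the page_cgroup */
	int mapcount;	/* stands in for page_mapcount() */
};

struct toy_memcg {
	long usage;
};

/* try_charge(): reserve one page of usage up front. */
static void toy_try_charge(struct toy_memcg *cg)
{
	cg->usage += 1;
}

/* commit_charge(): keep the reserved charge only if the page had none. */
static void toy_commit_charge(struct toy_memcg *cg, struct toy_page *pg)
{
	if (pg->pcg_used)
		cg->usage -= 1;		/* already charged: drop the reservation */
	else
		pg->pcg_used = true;	/* the reservation becomes the page's charge */
}

/* Uncharge hook run on swap cache deletion: a mapped page keeps its charge. */
static void toy_delete_from_swap_cache(struct toy_memcg *cg, struct toy_page *pg)
{
	if (pg->pcg_used && pg->mapcount == 0) {
		pg->pcg_used = false;
		cg->usage -= 1;
	}
}

static void toy_add_rmap(struct toy_page *pg)
{
	pg->mapcount += 1;
}

/* Old order: commit before reuse_swap_page() and before the rmap exists. */
long toy_swapin_old_order(bool precharged)
{
	struct toy_page pg = { .pcg_used = precharged, .mapcount = 0 };
	struct toy_memcg cg = { .usage = precharged ? 1 : 0 };

	toy_try_charge(&cg);
	toy_commit_charge(&cg, &pg);
	toy_delete_from_swap_cache(&cg, &pg);	/* page still looks unused */
	toy_add_rmap(&pg);
	assert(pg.mapcount == 1);
	return cg.usage;
}

/* New order: commit only after the rmap has been established. */
long toy_swapin_new_order(bool precharged)
{
	struct toy_page pg = { .pcg_used = precharged, .mapcount = 0 };
	struct toy_memcg cg = { .usage = precharged ? 1 : 0 };

	toy_try_charge(&cg);
	toy_delete_from_swap_cache(&cg, &pg);
	toy_add_rmap(&pg);
	toy_commit_charge(&cg, &pg);
	assert(pg.mapcount == 1);
	return cg.usage;
}
```

In this model, `toy_swapin_old_order()` ends with a usage of 0 for a page that is mapped and in use, while `toy_swapin_new_order()` ends with a usage of 1, whether or not the page was already charged on entry.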
-rw-r--r--  Documentation/controllers/memcg_test.txt  41
-rw-r--r--  mm/memcontrol.c                            7
-rw-r--r--  mm/memory.c                               11
3 files changed, 46 insertions, 13 deletions
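
The memory.usage/memsw.usage table that this patch adds to memcg_test.txt (cases (A) to (D) in the documentation hunk) can be checked mechanically by summing each column. The sketch below only transcribes the table; the `toy_*` names are hypothetical, and a "-" entry (event does not occur) is assumed to contribute 0/0.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_charge {
	int mem;	/* memory.usage contribution, in pages */
	int memsw;	/* memsw.usage contribution, in pages */
};

enum { TOY_CASE_A, TOY_CASE_B, TOY_CASE_C, TOY_CASE_D, TOY_NR_CASES };
enum { TOY_NR_ROWS = 5 };	/* "Before (2)", then events (3), (4), (5), (6) */

/* Rows transcribed from the table; "-" is entered as {0, 0}. */
static const struct toy_charge toy_table[TOY_NR_CASES][TOY_NR_ROWS] = {
	[TOY_CASE_A] = { {0, 1}, {+1, +1}, { 0, 0}, { 0, -1}, {0,  0} },
	[TOY_CASE_B] = { {0, 1}, {+1, +1}, { 0, 0}, { 0,  0}, {0, -1} },
	[TOY_CASE_C] = { {1, 1}, {+1, +1}, { 0, 0}, {-1, -1}, {0,  0} },
	[TOY_CASE_D] = { {1, 1}, {+1, +1}, {-1, 0}, { 0,  0}, {0, -1} },
};

/* Sum the starting state and all event deltas for one case. */
struct toy_charge toy_case_result(int c)
{
	struct toy_charge sum = { 0, 0 };

	assert(c >= 0 && c < TOY_NR_CASES);
	for (int row = 0; row < TOY_NR_ROWS; row++) {
		sum.mem += toy_table[c][row].mem;
		sum.memsw += toy_table[c][row].memsw;
	}
	return sum;
}

/* True when every case ends at 1/1, as the documentation claims. */
bool toy_all_cases_balanced(void)
{
	for (int c = 0; c < TOY_NR_CASES; c++) {
		struct toy_charge r = toy_case_result(c);

		if (r.mem != 1 || r.memsw != 1)
			return false;
	}
	return true;
}
```

Calling `toy_all_cases_balanced()` confirms that every column of the table sums to 1/1, matching the "Result" row.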
diff --git a/Documentation/controllers/memcg_test.txt b/Documentation/controllers/memcg_test.txt
index c91f69b0b549..08d4d3ea0d79 100644
--- a/Documentation/controllers/memcg_test.txt
+++ b/Documentation/controllers/memcg_test.txt
@@ -1,6 +1,6 @@
 Memory Resource Controller(Memcg) Implementation Memo.
-Last Updated: 2008/12/10
-Base Kernel Version: based on 2.6.28-rc7-mm.
+Last Updated: 2008/12/15
+Base Kernel Version: based on 2.6.28-rc8-mm.
 
 Because VM is getting complex (one of reasons is memcg...), memcg's behavior
 is complex. This is a document for memcg's internal behavior.
@@ -111,9 +111,40 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
 	(b) If the SwapCache has been mapped by processes, it has been
 	    charged already.
 
-	In case (a), we charge it. In case (b), we don't charge it.
-	(But racy state between (a) and (b) exists. We do check it.)
-	At charging, a charge recorded in swap_cgroup is moved to page_cgroup.
+	This swap-in is one of the most complicated work. In do_swap_page(),
+	following events occur when pte is unchanged.
+
+	(1) the page (SwapCache) is looked up.
+	(2) lock_page()
+	(3) try_charge_swapin()
+	(4) reuse_swap_page() (may call delete_swap_cache())
+	(5) commit_charge_swapin()
+	(6) swap_free().
+
+	Considering following situation for example.
+
+	(A) The page has not been charged before (2) and reuse_swap_page()
+	    doesn't call delete_from_swap_cache().
+	(B) The page has not been charged before (2) and reuse_swap_page()
+	    calls delete_from_swap_cache().
+	(C) The page has been charged before (2) and reuse_swap_page() doesn't
+	    call delete_from_swap_cache().
+	(D) The page has been charged before (2) and reuse_swap_page() calls
+	    delete_from_swap_cache().
+
+	memory.usage/memsw.usage changes to this page/swp_entry will be
+	 Case          (A)      (B)      (C)      (D)
+	 Event
+	 Before (2)   0/ 1     0/ 1     1/ 1     1/ 1
+	 ===========================================
+	 (3)         +1/+1    +1/+1    +1/+1    +1/+1
+	 (4)           -       0/ 0      -      -1/ 0
+	 (5)          0/-1     0/ 0    -1/-1     0/ 0
+	 (6)           -       0/-1      -       0/-1
+	 ===========================================
+	 Result       1/ 1     1/ 1     1/ 1     1/ 1
+
+	In any cases, charges to this page should be 1/ 1.
 
 	4.2 Swap-out.
 	At swap-out, typical state transition is below.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a7ecf23150c5..0ed61e27d526 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1169,10 +1169,11 @@ void mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr)
 	/*
 	 * Now swap is on-memory. This means this page may be
 	 * counted both as mem and swap....double count.
-	 * Fix it by uncharging from memsw. This SwapCache is stable
-	 * because we're still under lock_page().
+	 * Fix it by uncharging from memsw. Basically, this SwapCache is stable
+	 * under lock_page(). But in do_swap_page()::memory.c, reuse_swap_page()
+	 * may call delete_from_swap_cache() before reach here.
 	 */
-	if (do_swap_account) {
+	if (do_swap_account && PageSwapCache(page)) {
 		swp_entry_t ent = {.val = page_private(page)};
 		struct mem_cgroup *memcg;
 		memcg = swap_cgroup_record(ent, NULL);
diff --git a/mm/memory.c b/mm/memory.c
index e5bfbe6b594c..e009ce870859 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2457,22 +2457,23 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * while the page is counted on swap but not yet in mapcount i.e.
 	 * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
 	 * must be called after the swap_free(), or it will never succeed.
-	 * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry
-	 * in page->private, must be called before reuse_swap_page(),
-	 * which may delete_from_swap_cache().
+	 * Because delete_from_swap_page() may be called by reuse_swap_page(),
+	 * mem_cgroup_commit_charge_swapin() may not be able to find swp_entry
+	 * in page->private. In this case, a record in swap_cgroup is silently
+	 * discarded at swap_free().
 	 */
 
-	mem_cgroup_commit_charge_swapin(page, ptr);
 	inc_mm_counter(mm, anon_rss);
 	pte = mk_pte(page, vma->vm_page_prot);
 	if (write_access && reuse_swap_page(page)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 		write_access = 0;
 	}
-
 	flush_icache_page(vma, page);
 	set_pte_at(mm, address, page_table, pte);
 	page_add_anon_rmap(page, vma, address);
+	/* It's better to call commit-charge after rmap is established */
+	mem_cgroup_commit_charge_swapin(page, ptr);
 
 	swap_free(entry);
 	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))