author    Johannes Weiner <hannes@cmpxchg.org>  2014-08-08 17:19:22 -0400
committer Linus Torvalds <torvalds@linux-foundation.org>  2014-08-08 18:57:17 -0400
commit    0a31bc97c80c3fa87b32c091d9a930ac19cd0c40 (patch)
tree      06dafd237309f9b8ded980eb420a5377989e2c0b /Documentation/cgroups
parent    00501b531c4723972aa11d6d4ebcf8d6552007c8 (diff)
mm: memcontrol: rewrite uncharge API
The memcg uncharging code that is involved towards the end of a page's lifetime - truncation, reclaim, swapout, migration - is impressively complicated and fragile.

Because anonymous and file pages were always charged before they had their page->mapping established, uncharges had to happen when the page type could still be known from the context; as in unmap for anonymous, page cache removal for file and shmem pages, and swap cache truncation for swap pages.

However, these operations happen well before the page is actually freed, and so a lot of synchronization is necessary:

- Charging, uncharging, page migration, and charge migration all need
  to take a per-page bit spinlock as they could race with uncharging.

- Swap cache truncation happens during both swap-in and swap-out, and
  possibly repeatedly before the page is actually freed.  This means
  that the memcg swapout code is called from many contexts that make
  no sense and it has to figure out the direction from page state to
  make sure memory and memory+swap are always correctly charged.

- On page migration, the old page might be unmapped but then reused,
  so memcg code has to prevent untimely uncharging in that case.
  Because this code - which should be a simple charge transfer - is so
  special-cased, it is not reusable for replace_page_cache().

But now that charged pages always have a page->mapping, introduce mem_cgroup_uncharge(), which is called after the final put_page(), when we know for sure that nobody is looking at the page anymore.

For page migration, introduce mem_cgroup_migrate(), which is called after the migration is successful and the new page is fully rmapped.  Because the old page is no longer uncharged after migration, prevent double charges by decoupling the page's memcg association (PCG_USED and pc->mem_cgroup) from the page holding an actual charge.  The new bits PCG_MEM and PCG_MEMSW represent the respective charges and are transferred to the new page during migration.
mem_cgroup_migrate() is suitable for replace_page_cache() as well, which gets rid of mem_cgroup_replace_page_cache().  However, care needs to be taken because both the source and the target page can already be charged and on the LRU when fuse is splicing: grab the page lock on the charge moving side to prevent changing pc->mem_cgroup of a page under migration.  Also, the lruvecs of both pages change as we uncharge the old and charge the new during migration, and putback may race with us, so grab the lru lock and isolate the pages iff on LRU to prevent races and ensure the pages are on the right lruvec afterward.

Swap accounting is massively simplified: because the page is no longer uncharged as early as swap cache deletion, a new mem_cgroup_swapout() can transfer the page's memory+swap charge (PCG_MEMSW) to the swap entry before the final put_page() in page reclaim.

Finally, page_cgroup changes are now protected by whatever protection the page itself offers: anonymous pages are charged under the page table lock, whereas page cache insertions, swapin, and migration hold the page lock.  Uncharging happens under full exclusion with no outstanding references.  Charging and uncharging also ensure that the page is off-LRU, which serializes against charge migration.  Remove the very costly page_cgroup lock and set pc->flags non-atomically.

[mhocko@suse.cz: mem_cgroup_charge_statistics needs preempt_disable]
[vdavydov@parallels.com: fix flags definition]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Tested-by: Jet Chen <jet.chen@intel.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'Documentation/cgroups')
-rw-r--r--  Documentation/cgroups/memcg_test.txt | 128
1 file changed, 6 insertions(+), 122 deletions(-)
diff --git a/Documentation/cgroups/memcg_test.txt b/Documentation/cgroups/memcg_test.txt
index bcf750d3cecd..8870b0212150 100644
--- a/Documentation/cgroups/memcg_test.txt
+++ b/Documentation/cgroups/memcg_test.txt
@@ -29,28 +29,13 @@ Please note that implementation details can be changed.
 2. Uncharge
 	a page/swp_entry may be uncharged (usage -= PAGE_SIZE) by
 
-	mem_cgroup_uncharge_page()
-	  Called when an anonymous page is fully unmapped. I.e., mapcount goes
-	  to 0. If the page is SwapCache, uncharge is delayed until
-	  mem_cgroup_uncharge_swapcache().
-
-	mem_cgroup_uncharge_cache_page()
-	  Called when a page-cache is deleted from radix-tree. If the page is
-	  SwapCache, uncharge is delayed until mem_cgroup_uncharge_swapcache().
-
-	mem_cgroup_uncharge_swapcache()
-	  Called when SwapCache is removed from radix-tree. The charge itself
-	  is moved to swap_cgroup. (If mem+swap controller is disabled, no
-	  charge to swap occurs.)
+	mem_cgroup_uncharge()
+	  Called when a page's refcount goes down to 0.
 
 	mem_cgroup_uncharge_swap()
 	  Called when swp_entry's refcnt goes down to 0. A charge against swap
 	  disappears.
 
-	mem_cgroup_end_migration(old, new)
-	At success of migration old is uncharged (if necessary), a charge
-	to new page is committed. At failure, charge to old page is committed.
-
 3. charge-commit-cancel
 	Memcg pages are charged in two steps:
 		mem_cgroup_try_charge()
@@ -69,18 +54,6 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
 	Anonymous page is newly allocated at
 		  - page fault into MAP_ANONYMOUS mapping.
 		  - Copy-On-Write.
-	It is charged right after it's allocated before doing any page table
-	related operations. Of course, it's uncharged when another page is used
-	for the fault address.
-
-	At freeing anonymous page (by exit() or munmap()), zap_pte() is called
-	and pages for ptes are freed one by one.(see mm/memory.c). Uncharges
-	are done at page_remove_rmap() when page_mapcount() goes down to 0.
-
-	Another page freeing is by page-reclaim (vmscan.c) and anonymous
-	pages are swapped out. In this case, the page is marked as
-	PageSwapCache(). uncharge() routine doesn't uncharge the page marked
-	as SwapCache(). It's delayed until __delete_from_swap_cache().
 
 4.1 Swap-in.
 	At swap-in, the page is taken from swap-cache. There are 2 cases.
@@ -89,41 +62,6 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
 		(b) If the SwapCache has been mapped by processes, it has been
 		    charged already.
 
-	This swap-in is one of the most complicated work. In do_swap_page(),
-	following events occur when pte is unchanged.
-
-	(1) the page (SwapCache) is looked up.
-	(2) lock_page()
-	(3) try_charge_swapin()
-	(4) reuse_swap_page() (may call delete_swap_cache())
-	(5) commit_charge_swapin()
-	(6) swap_free().
-
-	Considering following situation for example.
-
-	(A) The page has not been charged before (2) and reuse_swap_page()
-	    doesn't call delete_from_swap_cache().
-	(B) The page has not been charged before (2) and reuse_swap_page()
-	    calls delete_from_swap_cache().
-	(C) The page has been charged before (2) and reuse_swap_page() doesn't
-	    call delete_from_swap_cache().
-	(D) The page has been charged before (2) and reuse_swap_page() calls
-	    delete_from_swap_cache().
-
-	    memory.usage/memsw.usage changes to this page/swp_entry will be
-	Case          (A)      (B)       (C)     (D)
-	Event
-	Before (2)    0/ 1     0/ 1      1/ 1    1/ 1
-	===========================================
-	   (3)       +1/+1    +1/+1     +1/+1   +1/+1
-	   (4)         -       0/ 0       -     -1/ 0
-	   (5)        0/-1     0/ 0     -1/-1    0/ 0
-	   (6)         -       0/-1       -      0/-1
-	===========================================
-	Result        1/ 1     1/ 1      1/ 1    1/ 1
-
-	In any cases, charges to this page should be 1/ 1.
-
 4.2 Swap-out.
 	At swap-out, typical state transition is below.
 
@@ -136,28 +74,20 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
 	swp_entry's refcnt -= 1.
 
 
-	At (b), the page is marked as SwapCache and not uncharged.
-	At (d), the page is removed from SwapCache and a charge in page_cgroup
-	is moved to swap_cgroup.
-
 	Finally, at task exit,
 	(e) zap_pte() is called and swp_entry's refcnt -=1 -> 0.
-	Here, a charge in swap_cgroup disappears.
 
 5. Page Cache
 	Page Cache is charged at
 	- add_to_page_cache_locked().
 
-	uncharged at
-	- __remove_from_page_cache().
-
 	The logic is very clear. (About migration, see below)
 	Note: __remove_from_page_cache() is called by remove_from_page_cache()
 	and __remove_mapping().
 
 6. Shmem(tmpfs) Page Cache
-	Memcg's charge/uncharge have special handlers of shmem. The best way
-	to understand shmem's page state transition is to read mm/shmem.c.
+	The best way to understand shmem's page state transition is to read
+	mm/shmem.c.
 	But brief explanation of the behavior of memcg around shmem will be
 	helpful to understand the logic.
 
@@ -170,56 +100,10 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
 	It's charged when...
 	- A new page is added to shmem's radix-tree.
 	- A swp page is read. (move a charge from swap_cgroup to page_cgroup)
-	It's uncharged when
-	- A page is removed from radix-tree and not SwapCache.
-	- When SwapCache is removed, a charge is moved to swap_cgroup.
-	- When swp_entry's refcnt goes down to 0, a charge in swap_cgroup
-	  disappears.
 
 7. Page Migration
-	One of the most complicated functions is page-migration-handler.
-	Memcg has 2 routines. Assume that we are migrating a page's contents
-	from OLDPAGE to NEWPAGE.
-
-	Usual migration logic is..
-	(a) remove the page from LRU.
-	(b) allocate NEWPAGE (migration target)
-	(c) lock by lock_page().
-	(d) unmap all mappings.
-	(e-1) If necessary, replace entry in radix-tree.
-	(e-2) move contents of a page.
-	(f) map all mappings again.
-	(g) pushback the page to LRU.
-	(-) OLDPAGE will be freed.
-
-	Before (g), memcg should complete all necessary charge/uncharge to
-	NEWPAGE/OLDPAGE.
-
-	The point is....
-	- If OLDPAGE is anonymous, all charges will be dropped at (d) because
-	  try_to_unmap() drops all mapcount and the page will not be
-	  SwapCache.
-
-	- If OLDPAGE is SwapCache, charges will be kept at (g) because
-	  __delete_from_swap_cache() isn't called at (e-1)
-
-	- If OLDPAGE is page-cache, charges will be kept at (g) because
-	  remove_from_swap_cache() isn't called at (e-1)
-
-	memcg provides following hooks.
-
-	- mem_cgroup_prepare_migration(OLDPAGE)
-	  Called after (b) to account a charge (usage += PAGE_SIZE) against
-	  memcg which OLDPAGE belongs to.
-
-	- mem_cgroup_end_migration(OLDPAGE, NEWPAGE)
-	  Called after (f) before (g).
-	  If OLDPAGE is used, commit OLDPAGE again. If OLDPAGE is already
-	  charged, a charge by prepare_migration() is automatically canceled.
-	  If NEWPAGE is used, commit NEWPAGE and uncharge OLDPAGE.
-
-	But zap_pte() (by exit or munmap) can be called while migration,
-	we have to check if OLDPAGE/NEWPAGE is a valid page after commit().
+
+	mem_cgroup_migrate()
 
 8. LRU
 	Each memcg has its own private LRU. Now, its handling is under global