author		KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>	2012-03-21 19:34:25 -0400
committer	Linus Torvalds <torvalds@linux-foundation.org>	2012-03-21 20:55:01 -0400
commit		89c06bd52fb9ffceddf84f7309d2e8c9f1666216 (patch)
tree		43ec3d97a89988bc143bed5796bcd7bef64212dc /mm/rmap.c
parent		312734c04e2fecc58429aec98194e4ff12d8f7d6 (diff)
memcg: use new logic for page stat accounting
Currently, per-memcg page statistics are recorded in per-page_cgroup flags
by duplicating the page's status into those flags.  The reason is that
memcg has a feature to move a page from one cgroup to another, so there is
a race between "move" and "page stat accounting".
Under the current logic, assume CPU-A and CPU-B.  CPU-A does "move" and
CPU-B does "page stat accounting".

When CPU-A goes first:

             CPU-A                                 CPU-B
                                        update "struct page" info.
    move_lock_mem_cgroup(memcg)
    see pc->flags
    copy page stat to new group
    overwrite pc->mem_cgroup.
    move_unlock_mem_cgroup(memcg)
                                        move_lock_mem_cgroup(mem)
                                        set pc->flags
                                        update page stat accounting
                                        move_unlock_mem_cgroup(mem)
Here, stat accounting is guarded by move_lock_mem_cgroup(), and the "move"
logic (CPU-A) doesn't see the changes to "struct page" information.

But it is costly to keep the same information in both 'struct page' and
'struct page_cgroup'.  And there is a potential problem.
For example, assume we have PG_dirty accounting in memcg.
PG_... is a flag on struct page; PCG_... is a flag on struct page_cgroup.
(This is just an example; the same problem exists for any kind of page
stat accounting.)

             CPU-A                                 CPU-B
    TestSet PG_dirty
    (delay)                             TestClear PG_dirty
                                        if (TestClear(PCG_dirty))
                                            memcg->nr_dirty--
    if (!TestSet(PCG_dirty))
        memcg->nr_dirty++
Here, memcg->nr_dirty ends up at +1, which is wrong.  This race was
reported by Greg Thelen <gthelen@google.com>.  At present only FILE_MAPPED
is supported, and fortunately it is serialized by the page table lock, so
this is not a real bug _now_.
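
To make the interleaving concrete, here is a minimal stand-alone C11 model
of the race above.  It replays the exact sequence from the diagram in one
thread; pg_dirty, pcg_dirty and nr_dirty are stand-ins for PG_dirty,
PCG_dirty and memcg->nr_dirty, not kernel symbols:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    static atomic_bool pg_dirty;   /* stands in for PG_dirty        */
    static atomic_bool pcg_dirty;  /* stands in for PCG_dirty       */
    static atomic_int  nr_dirty;   /* stands in for memcg->nr_dirty */

    int main(void)
    {
        /* CPU-A: TestSet PG_dirty, then it is delayed */
        atomic_exchange(&pg_dirty, true);

        /* CPU-B: TestClear PG_dirty */
        atomic_exchange(&pg_dirty, false);
        /* CPU-B: if (TestClear(PCG_dirty)) nr_dirty--; PCG_dirty is
         * still clear because CPU-A has not mirrored it yet, so no
         * decrement happens here. */
        if (atomic_exchange(&pcg_dirty, false))
            atomic_fetch_sub(&nr_dirty, 1);

        /* CPU-A resumes: if (!TestSet(PCG_dirty)) nr_dirty++ */
        if (!atomic_exchange(&pcg_dirty, true))
            atomic_fetch_add(&nr_dirty, 1);

        /* The page is clean, yet one dirty page is accounted. */
        printf("PG_dirty=%d nr_dirty=%d\n",
               (int)atomic_load(&pg_dirty), atomic_load(&nr_dirty));
        return 0;
    }

Run as written, this prints PG_dirty=0 nr_dirty=1: the counter and the
page flag disagree.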
Since this potential problem comes from having duplicated information in
struct page and struct page_cgroup, we might be able to fix it by using
only the original 'struct page' information.  But then we would have a
problem with "move account".

Assume we use only PG_dirty:
             CPU-A                                 CPU-B
    TestSet PG_dirty
    (delay)                             move_lock_mem_cgroup()
                                        if (PageDirty(page))
                                            new_memcg->nr_dirty++
                                        pc->mem_cgroup = new_memcg;
                                        move_unlock_mem_cgroup()
    move_lock_mem_cgroup()
    memcg = pc->mem_cgroup
    new_memcg->nr_dirty++
Here, the accounting information is double-counted.  This was the original
reason to have the PCG_xxx flags, but as shown above, PCG_xxx has its own
problem.

I think we need a bigger lock, as in:
    move_lock_mem_cgroup(page)
    TestSetPageDirty(page)
    update page stats (without any checks)
    move_unlock_mem_cgroup(page)
This fixes both problems, and we don't have to duplicate the page flag in
page_cgroup.  Please note: move_lock_mem_cgroup() is held only when there
is a possibility of "account move" in the system, so on most paths the
status update proceeds without taking an atomic lock.
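
A rough sketch of how the begin/end helpers can skip the lock on most
paths.  This is illustrative only: mem_cgroup_moving_account() and
page_to_memcg() are made-up names for "an account move may be in progress"
and "find the page's memcg"; the real kernel code gates this differently.

    void mem_cgroup_begin_update_page_stat(struct page *page,
                                           bool *locked,
                                           unsigned long *flags)
    {
        struct mem_cgroup *memcg = page_to_memcg(page);  /* assumed */

        *locked = false;
        if (!mem_cgroup_moving_account(memcg))  /* common case: no   */
            return;                             /* move, no lock     */
        move_lock_mem_cgroup(memcg, flags);     /* rare case: a move
                                                 * may race with us  */
        *locked = true;
    }

    void mem_cgroup_end_update_page_stat(struct page *page,
                                         bool *locked,
                                         unsigned long *flags)
    {
        if (*locked)
            move_unlock_mem_cgroup(page_to_memcg(page), flags);
    }

A real implementation also has to recheck the page's memcg after taking
the lock, since a move can start or finish around the unlocked test; the
sketch elides that.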
This patch introduces mem_cgroup_begin_update_page_stat() and
mem_cgroup_end_update_page_stat(); both should be called when modifying
'struct page' information that memcg accounts for, as in:

    mem_cgroup_begin_update_page_stat()
    modify page information
    mem_cgroup_update_page_stat()
    => never check any 'struct page' info, just update counters.
    mem_cgroup_end_update_page_stat()
This patch is slow because we need to call begin_update_page_stat()/
end_update_page_stat() regardless of whether the accounted state will
change.  A following patch adds an easy optimization to reduce the cost.
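
As a usage sketch, this is what the pattern would look like for the
hypothetical dirty-page accounting discussed above, mirroring the
page_add_file_rmap() change in the mm/rmap.c hunks below.
MEMCG_NR_FILE_DIRTY is an assumed, not-yet-existing counter; only
MEMCG_NR_FILE_MAPPED exists at this point.

    /* Hypothetical caller, for illustration only. */
    void account_page_dirtied_memcg(struct page *page)
    {
        bool locked;
        unsigned long flags;

        mem_cgroup_begin_update_page_stat(page, &locked, &flags);
        if (!TestSetPageDirty(page))    /* modify 'struct page' info */
            mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_DIRTY);
        mem_cgroup_end_update_page_stat(page, &locked, &flags);
    }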
[akpm@linux-foundation.org: s/lock/locked/]
[hughd@google.com: fix deadlock by avoiding stat lock when anon]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Greg Thelen <gthelen@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'mm/rmap.c')

    mm/rmap.c | 28 ++++++++++++++++++++++++----
    1 file changed, 24 insertions(+), 4 deletions(-)
@@ -1148,10 +1148,15 @@ void page_add_new_anon_rmap(struct page *page,
  */
 void page_add_file_rmap(struct page *page)
 {
+	bool locked;
+	unsigned long flags;
+
+	mem_cgroup_begin_update_page_stat(page, &locked, &flags);
 	if (atomic_inc_and_test(&page->_mapcount)) {
 		__inc_zone_page_state(page, NR_FILE_MAPPED);
 		mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_MAPPED);
 	}
+	mem_cgroup_end_update_page_stat(page, &locked, &flags);
 }
 
 /**
@@ -1162,9 +1167,21 @@ void page_add_file_rmap(struct page *page)
  */
 void page_remove_rmap(struct page *page)
 {
+	bool anon = PageAnon(page);
+	bool locked;
+	unsigned long flags;
+
+	/*
+	 * The anon case has no mem_cgroup page_stat to update; but may
+	 * uncharge_page() below, where the lock ordering can deadlock if
+	 * we hold the lock against page_stat move: so avoid it on anon.
+	 */
+	if (!anon)
+		mem_cgroup_begin_update_page_stat(page, &locked, &flags);
+
 	/* page still mapped by someone else? */
 	if (!atomic_add_negative(-1, &page->_mapcount))
-		return;
+		goto out;
 
 	/*
 	 * Now that the last pte has gone, s390 must transfer dirty
@@ -1173,7 +1190,7 @@ void page_remove_rmap(struct page *page)
 	 * not if it's in swapcache - there might be another pte slot
 	 * containing the swap entry, but page not yet written to swap.
 	 */
-	if ((!PageAnon(page) || PageSwapCache(page)) &&
+	if ((!anon || PageSwapCache(page)) &&
 	    page_test_and_clear_dirty(page_to_pfn(page), 1))
 		set_page_dirty(page);
 	/*
@@ -1181,8 +1198,8 @@ void page_remove_rmap(struct page *page)
 	 * and not charged by memcg for now.
 	 */
 	if (unlikely(PageHuge(page)))
-		return;
-	if (PageAnon(page)) {
+		goto out;
+	if (anon) {
 		mem_cgroup_uncharge_page(page);
 		if (!PageTransHuge(page))
 			__dec_zone_page_state(page, NR_ANON_PAGES);
@@ -1202,6 +1219,9 @@ void page_remove_rmap(struct page *page)
 	 * Leaving it set also helps swapoff to reinstate ptes
 	 * faster for those pages still in swapcache.
 	 */
+out:
+	if (!anon)
+		mem_cgroup_end_update_page_stat(page, &locked, &flags);
 }
 
 /*