aboutsummaryrefslogtreecommitdiffstats
path: root/mm/memcontrol.c
diff options
context:
space:
mode:
authorMichal Hocko <mhocko@suse.cz>2013-04-29 18:07:14 -0400
committerLinus Torvalds <torvalds@linux-foundation.org>2013-04-29 18:54:32 -0400
commitc40046f3ad5e877b18cc721aaa7906b98077bc2d (patch)
tree1203830ae9127e3f0a95e9c0c9a6ad6793d25314 /mm/memcontrol.c
parent5e7ccf8635c93b493f7d378a57ce300fbe1484e8 (diff)
memcg: keep prev's css alive for the whole mem_cgroup_iter
The patchset tries to make mem_cgroup_iter saner in the way how it walks hierarchies. css->id based traversal is far from being ideal as it is not deterministic because it depends on the creation ordering. Additional to that css_id is considered a burden for cgroup maintainers because it is quite some code and memcg is the last user of it. After this series only the swap accounting uses css_id but that one will follow up later. Diffstat (if we exclude removed/added comments) looks quite promising. We got rid of some code: $ git diff mmotm... | grep -v "^[+-][[:space:]]*[/ ]\*" | diffstat b/include/linux/cgroup.h | 3 --- kernel/cgroup.c | 33 --------------------------------- mm/memcontrol.c | 4 +++- 3 files changed, 3 insertions(+), 37 deletions(-) The first patch is just preparatory and it changes when we release css of the previously returned memcg. Nothing controlversial. The second patch is the core of the patchset and it replaces css_get_next based on css_id by the generic cgroup pre-order. This brings some chalanges for the last visited group caching during the reclaim (mem_cgroup_per_zone::reclaim_iter). We have to use memcg pointers directly now which means that we have to keep a reference to those groups' css to keep them alive. I also folded iter_lock introduced by https://lkml.org/lkml/2013/1/3/295 in the previous version into this patch. Johannes felt the race I was describing should be mostly harmless and I haven't been able to trigger it so the lock doesn't deserve its own patch. It is still needed temporarily, though, because the reference counting on iter->last_visited depends on it. It will go away with the next patch. The next patch fixups an unbounded cgroup removal holdoff caused by the elevated css refcount. The issue has been observed by Ying Han. Johannes wasn't impressed by the previous version of the fix (https://lkml.org/lkml/2013/2/8/379) which cleaned up pending references during mem_cgroup_css_offline when a group is removed. He has suggested a different way when the iterator checks whether a cached memcg is still valid or no. More on that in the patch but the basic idea is that every memcg tracks the number removed subgroups and iterator records this number when a group is cached. These numbers are checked before iter->last_visited is about to be used and the iteration is restarted if it is invalid. The fourth and fifth patches are an attempt for simplification of the mem_cgroup_iter. css juggling is removed and the iteration logic is moved to a helper so that the reference counting and iteration are separated. The last patch just removes css_get_next as there is no user for it any longer. My testing looked as follows: A (use_hierarchy=1, limit_in_bytes=150M) /|\ 1 2 3 Children groups were created so that the number is never higher than 3 and their limits were random between 50-100M. Each group hosts a kernel build (starting with tar -xf so the tree is not shared and make -jNUM_CPUs/3) and terminated after random time - up to 5 minutes) and then it is removed. This should exercise both leaf and hierarchical reclaim as well as races with cgroup removals and debugging messages I added on top proved that. 100 groups were created during the test. This patch: css reference counting keeps the cgroup alive even though it has been already removed. mem_cgroup_iter relies on this fact and takes a reference to the returned group. The reference is then released on the next iteration or mem_cgroup_iter_break. mem_cgroup_iter currently releases the reference right after it gets the last css_id. This is correct because neither prev's memcg nor cgroup are accessed after then. This will change in the next patch so we need to hold the group alive a bit longer so let's move the css_put at the end of the function. Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Ying Han <yinghan@google.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Glauber Costa <glommer@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'mm/memcontrol.c')
-rw-r--r--mm/memcontrol.c13
1 files changed, 7 insertions, 6 deletions
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2b552224f5cf..661a2c679f64 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1100,12 +1100,9 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
1100 if (prev && !reclaim) 1100 if (prev && !reclaim)
1101 id = css_id(&prev->css); 1101 id = css_id(&prev->css);
1102 1102
1103 if (prev && prev != root)
1104 css_put(&prev->css);
1105
1106 if (!root->use_hierarchy && root != root_mem_cgroup) { 1103 if (!root->use_hierarchy && root != root_mem_cgroup) {
1107 if (prev) 1104 if (prev)
1108 return NULL; 1105 goto out_css_put;
1109 return root; 1106 return root;
1110 } 1107 }
1111 1108
@@ -1121,7 +1118,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
1121 mz = mem_cgroup_zoneinfo(root, nid, zid); 1118 mz = mem_cgroup_zoneinfo(root, nid, zid);
1122 iter = &mz->reclaim_iter[reclaim->priority]; 1119 iter = &mz->reclaim_iter[reclaim->priority];
1123 if (prev && reclaim->generation != iter->generation) 1120 if (prev && reclaim->generation != iter->generation)
1124 return NULL; 1121 goto out_css_put;
1125 id = iter->position; 1122 id = iter->position;
1126 } 1123 }
1127 1124
@@ -1143,8 +1140,12 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
1143 } 1140 }
1144 1141
1145 if (prev && !css) 1142 if (prev && !css)
1146 return NULL; 1143 goto out_css_put;
1147 } 1144 }
1145out_css_put:
1146 if (prev && prev != root)
1147 css_put(&prev->css);
1148
1148 return memcg; 1149 return memcg;
1149} 1150}
1150 1151