path: root/mm/vmscan.c
author	Ying Han <yinghan@google.com>	2011-05-26 19:25:33 -0400
committer	Linus Torvalds <torvalds@linux-foundation.org>	2011-05-26 20:12:35 -0400
commit	889976dbcb1218119fdd950fb7819084e37d7d37 (patch)
tree	7508706ddb6bcbe0f673aca3744f30f281b17734 /mm/vmscan.c
parent	4e4c941c108eff10844d2b441d96dab44f32f424 (diff)
memcg: reclaim memory from nodes in round-robin order
Presently, memory cgroup's direct reclaim frees memory from the current node. But this has some troubles: usually, when a set of threads works in a cooperative way, they tend to operate on the same node, so if they hit limits under memcg they will reclaim memory from themselves, damaging the active working set.

For example, assume a 2-node system with Node 0 and Node 1, and a memcg with a 1G limit. After some work, file cache remains and the usages are:

	Node 0:   1M
	Node 1: 998M

If an application then runs on Node 0, it will eat into its own working set before freeing the unnecessary file caches on Node 1.

This patch adds round-robin node selection for NUMA, applying equal reclaim pressure to each node. When using cpuset's spread-memory feature, this will work very well. But yes, a better algorithm is needed.

[akpm@linux-foundation.org: comment editing]
[kamezawa.hiroyu@jp.fujitsu.com: fix time comparisons]
Signed-off-by: Ying Han <yinghan@google.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
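The round-robin idea above can be sketched in user space. The following is a minimal illustration only, not the kernel code: the struct, the `select_victim_node()` helper, and the fixed `MAX_NODES` are hypothetical stand-ins for the per-memcg `last_scanned_node` state and the nodemask iteration that the real `mem_cgroup_select_victim_node()` performs.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 4

/* Miniature stand-in for the per-memcg state this patch introduces. */
struct memcg_sim {
	bool node_has_memory[MAX_NODES]; /* nodes holding this memcg's pages */
	int last_scanned_node;           /* -1 before the first scan */
};

/*
 * Pick the next populated node after last_scanned_node, wrapping
 * around, so repeated reclaim passes spread pressure across nodes
 * instead of always hammering the caller's current node.
 */
static int select_victim_node(struct memcg_sim *memcg)
{
	int start = memcg->last_scanned_node;

	for (int i = 1; i <= MAX_NODES; i++) {
		/* "+ MAX_NODES" keeps the modulo non-negative when start == -1 */
		int nid = (start + i + MAX_NODES) % MAX_NODES;

		if (memcg->node_has_memory[nid]) {
			memcg->last_scanned_node = nid;
			return nid;
		}
	}
	return 0; /* fallback: no populated node found */
}
```

With the 2-node example from the changelog, successive calls alternate 0, 1, 0, ... so neither node's working set is singled out.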
Diffstat (limited to 'mm/vmscan.c')
 mm/vmscan.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 884ae08c16cc..b0875871820d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2226,6 +2226,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 {
 	struct zonelist *zonelist;
 	unsigned long nr_reclaimed;
+	int nid;
 	struct scan_control sc = {
 		.may_writepage = !laptop_mode,
 		.may_unmap = 1,
@@ -2242,7 +2243,14 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 		.gfp_mask = sc.gfp_mask,
 	};
 
-	zonelist = NODE_DATA(numa_node_id())->node_zonelists;
+	/*
+	 * Unlike direct reclaim via alloc_pages(), memcg's reclaim doesn't
+	 * take care of from where we get pages.  So the node where we start
+	 * the scan does not need to be the current node.
+	 */
+	nid = mem_cgroup_select_victim_node(mem_cont);
+
+	zonelist = NODE_DATA(nid)->node_zonelists;
 
 	trace_mm_vmscan_memcg_reclaim_begin(0,
 			sc.may_writepage,