author		Ingo Molnar <mingo@elte.hu>	2009-04-08 11:25:42 -0400
committer	Ingo Molnar <mingo@elte.hu>	2009-04-08 11:26:00 -0400
commit		5af8c4e0fac9838428bd718040b664043a05f37c (patch)
tree		75a01d98ed244db45fe3c734c4a81c1a3d92ac37 /Documentation/cgroups
parent		46e0bb9c12f4bab539736f1714cbf16600f681ec (diff)
parent		577c9c456f0e1371cbade38eaf91ae8e8a308555 (diff)

Merge commit 'v2.6.30-rc1' into sched/urgent

Merge reason: update to latest upstream to queue up fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Diffstat (limited to 'Documentation/cgroups')
-rw-r--r--  Documentation/cgroups/00-INDEX       | 18
-rw-r--r--  Documentation/cgroups/cgroups.txt    | 42
-rw-r--r--  Documentation/cgroups/cpusets.txt    | 77
-rw-r--r--  Documentation/cgroups/devices.txt    |  2
-rw-r--r--  Documentation/cgroups/memcg_test.txt | 22
-rw-r--r--  Documentation/cgroups/memory.txt     |  2
6 files changed, 112 insertions(+), 51 deletions(-)
diff --git a/Documentation/cgroups/00-INDEX b/Documentation/cgroups/00-INDEX
new file mode 100644
index 000000000000..3f58fa3d6d00
--- /dev/null
+++ b/Documentation/cgroups/00-INDEX
@@ -0,0 +1,18 @@
+00-INDEX
+	- this file
+cgroups.txt
+	- Control Groups definition, implementation details, examples and API.
+cpuacct.txt
+	- CPU Accounting Controller; account CPU usage for groups of tasks.
+cpusets.txt
+	- documents the cpusets feature; assign CPUs and Mem to a set of tasks.
+devices.txt
+	- Device Whitelist Controller; description, interface and security.
+freezer-subsystem.txt
+	- checkpointing; rationale to not use signals, interface.
+memcg_test.txt
+	- Memory Resource Controller; implementation details.
+memory.txt
+	- Memory Resource Controller; design, accounting, interface, testing.
+resource_counter.txt
+	- Resource Counter API.
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index d9e5d6f41b92..6eb1a97e88ce 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -56,7 +56,7 @@ hierarchy, and a set of subsystems; each subsystem has system-specific
 state attached to each cgroup in the hierarchy. Each hierarchy has
 an instance of the cgroup virtual filesystem associated with it.
 
-At any one time there may be multiple active hierachies of task
+At any one time there may be multiple active hierarchies of task
 cgroups. Each hierarchy is a partition of all tasks in the system.
 
 User level code may create and destroy cgroups by name in an
@@ -124,10 +124,10 @@ following lines:
                / \
     Prof (15%) students (5%)
 
-Browsers like firefox/lynx go into the WWW network class, while (k)nfsd go
+Browsers like Firefox/Lynx go into the WWW network class, while (k)nfsd go
 into NFS network class.
 
-At the same time firefox/lynx will share an appropriate CPU/Memory class
+At the same time Firefox/Lynx will share an appropriate CPU/Memory class
 depending on who launched it (prof/student).
 
 With the ability to classify tasks differently for different resources
@@ -252,10 +252,8 @@ cgroup file system directories.
 When a task is moved from one cgroup to another, it gets a new
 css_set pointer - if there's an already existing css_set with the
 desired collection of cgroups then that group is reused, else a new
-css_set is allocated. Note that the current implementation uses a
-linear search to locate an appropriate existing css_set, so isn't
-very efficient. A future version will use a hash table for better
-performance.
+css_set is allocated. The appropriate existing css_set is located by
+looking into a hash table.
 
 To allow access from a cgroup to the css_sets (and hence tasks)
 that comprise it, a set of cg_cgroup_link objects form a lattice;
@@ -327,7 +325,7 @@ and then start a subshell 'sh' in that cgroup:
 Creating, modifying, using the cgroups can be done through the cgroup
 virtual filesystem.
 
-To mount a cgroup hierarchy will all available subsystems, type:
+To mount a cgroup hierarchy with all available subsystems, type:
 # mount -t cgroup xxx /dev/cgroup
 
 The "xxx" is not interpreted by the cgroup code, but will appear in
@@ -335,12 +333,23 @@ The "xxx" is not interpreted by the cgroup code, but will appear in
 
 To mount a cgroup hierarchy with just the cpuset and numtasks
 subsystems, type:
-# mount -t cgroup -o cpuset,numtasks hier1 /dev/cgroup
+# mount -t cgroup -o cpuset,memory hier1 /dev/cgroup
 
 To change the set of subsystems bound to a mounted hierarchy, just
 remount with different options:
+# mount -o remount,cpuset,ns hier1 /dev/cgroup
 
-# mount -o remount,cpuset,ns /dev/cgroup
+Now memory is removed from the hierarchy and ns is added.
+
+Note this will add ns to the hierarchy but won't remove memory or
+cpuset, because the new options are appended to the old ones:
+# mount -o remount,ns /dev/cgroup
+
+To specify a hierarchy's release_agent:
+# mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \
+  xxx /dev/cgroup
+
+Note that specifying 'release_agent' more than once will return failure.
 
 Note that changing the set of subsystems is currently only supported
 when the hierarchy consists of a single (root) cgroup. Supporting
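
After rebinding subsystems this way, the result can be read back: /proc/cgroups
lists each compiled-in subsystem and the hierarchy it is attached to, and
/proc/mounts shows the options a hierarchy is mounted with. A minimal check,
assuming the hier1 mount above:

# cat /proc/cgroups
# grep cgroup /proc/mounts
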
@@ -351,6 +360,11 @@ Then under /dev/cgroup you can find a tree that corresponds to the
 tree of the cgroups in the system. For instance, /dev/cgroup
 is the cgroup that holds the whole system.
 
+If you want to change the value of release_agent:
+# echo "/sbin/new_release_agent" > /dev/cgroup/release_agent
+
+It can also be changed via remount.
+
 If you want to create a new cgroup under /dev/cgroup:
 # cd /dev/cgroup
 # mkdir my_cgroup
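
The release agent is only ever invoked for a cgroup whose notify_on_release
file is set to 1 and which then becomes empty; a minimal sketch, assuming the
my_cgroup directory created above:

# echo 1 > /dev/cgroup/my_cgroup/notify_on_release
# cat /dev/cgroup/my_cgroup/notify_on_release
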
@@ -478,11 +492,13 @@ cgroup->parent is still valid. (Note - can also be called for a
 newly-created cgroup if an error occurs after this subsystem's
 create() method has been called for the new cgroup).
 
-void pre_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp);
+int pre_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp);
 
 Called before checking the reference count on each subsystem. This may
 be useful for subsystems which have some extra references even if
-there are not tasks in the cgroup.
+there are no tasks in the cgroup. If pre_destroy() returns an error
+code, rmdir() will fail with that code, which also means pre_destroy()
+can be called multiple times against the same cgroup.
 
 int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
 	       struct task_struct *task)
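
From user space, a non-zero return from pre_destroy() simply surfaces as a
failed rmdir(2); a hedged sketch of what that might look like for a
hypothetical cgroup foo whose subsystem still holds references (-EBUSY is one
plausible code):

# rmdir /dev/cgroup/foo
rmdir: failed to remove 'foo': Device or resource busy
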
@@ -523,7 +539,7 @@ always handled well.
 void post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp)
 (cgroup_mutex held by caller)
 
-Called at the end of cgroup_clone() to do any paramater
+Called at the end of cgroup_clone() to do any parameter
 initialization which might be required before a task could attach. For
 example in cpusets, no task may attach before 'cpus' and 'mems' are set
 up.
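
That cpuset ordering requirement is easy to reproduce from the shell; a
minimal sketch, assuming a cpuset-enabled hierarchy at /dev/cgroup and a
hypothetical child alpha (the write to tasks fails until both control files
have been populated):

# mkdir /dev/cgroup/alpha
# echo 2-3 > /dev/cgroup/alpha/cpuset.cpus
# echo 0 > /dev/cgroup/alpha/cpuset.mems
# echo $$ > /dev/cgroup/alpha/tasks
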
diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt
index 5c86c258c791..f9ca389dddf4 100644
--- a/Documentation/cgroups/cpusets.txt
+++ b/Documentation/cgroups/cpusets.txt
@@ -131,7 +131,7 @@ Cpusets extends these two mechanisms as follows:
  - The hierarchy of cpusets can be mounted at /dev/cpuset, for
    browsing and manipulation from user space.
  - A cpuset may be marked exclusive, which ensures that no other
-   cpuset (except direct ancestors and descendents) may contain
+   cpuset (except direct ancestors and descendants) may contain
    any overlapping CPUs or Memory Nodes.
  - You can list all the tasks (by pid) attached to any cpuset.
 
@@ -142,7 +142,7 @@ into the rest of the kernel, none in performance critical paths:
  - in fork and exit, to attach and detach a task from its cpuset.
  - in sched_setaffinity, to mask the requested CPUs by what's
    allowed in that tasks cpuset.
- - in sched.c migrate_all_tasks(), to keep migrating tasks within
+ - in sched.c migrate_live_tasks(), to keep migrating tasks within
    the CPUs allowed by their cpuset, if possible.
  - in the mbind and set_mempolicy system calls, to mask the requested
    Memory Nodes by what's allowed in that tasks cpuset.
@@ -175,6 +175,10 @@ files describing that cpuset:
  - mem_exclusive flag: is memory placement exclusive?
  - mem_hardwall flag: is memory allocation hardwalled
  - memory_pressure: measure of how much paging pressure in cpuset
+ - memory_spread_page flag: if set, spread page cache evenly on allowed nodes
+ - memory_spread_slab flag: if set, spread slab cache evenly on allowed nodes
+ - sched_load_balance flag: if set, load balance within CPUs on that cpuset
+ - sched_relax_domain_level: the searching range when migrating tasks
 
 In addition, the root cpuset only has the following file:
  - memory_pressure_enabled flag: compute memory_pressure?
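
Each of the four entries added above is an ordinary boolean or integer cpuset
control file; a minimal sketch of flipping one, assuming the conventional
/dev/cpuset mount used elsewhere in this document:

# echo 1 > /dev/cpuset/memory_spread_page
# cat /dev/cpuset/memory_spread_page
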
@@ -222,7 +226,7 @@ nodes with memory--using the cpuset_track_online_nodes() hook.
 --------------------------------
 
 If a cpuset is cpu or mem exclusive, no other cpuset, other than
-a direct ancestor or descendent, may share any of the same CPUs or
+a direct ancestor or descendant, may share any of the same CPUs or
 Memory Nodes.
 
 A cpuset that is mem_exclusive *or* mem_hardwall is "hardwalled",
@@ -252,7 +256,7 @@ is causing.
 
 This is useful both on tightly managed systems running a wide mix of
 submitted jobs, which may choose to terminate or re-prioritize jobs that
-are trying to use more memory than allowed on the nodes assigned them,
+are trying to use more memory than allowed on the nodes assigned to them,
 and with tightly coupled, long running, massively parallel scientific
 computing jobs that will dramatically fail to meet required performance
 goals if they start to use more memory than allowed to them.
@@ -378,7 +382,7 @@ as cpusets and sched_setaffinity.
 The algorithmic cost of load balancing and its impact on key shared
 kernel data structures such as the task list increases more than
 linearly with the number of CPUs being balanced. So the scheduler
-has support to partition the systems CPUs into a number of sched
+has support to partition the system's CPUs into a number of sched
 domains such that it only load balances within each sched domain.
 Each sched domain covers some subset of the CPUs in the system;
 no two sched domains overlap; some CPUs might not be in any sched
@@ -423,7 +427,7 @@ child cpusets have this flag enabled.
 When doing this, you don't usually want to leave any unpinned tasks in
 the top cpuset that might use non-trivial amounts of CPU, as such tasks
 may be artificially constrained to some subset of CPUs, depending on
-the particulars of this flag setting in descendent cpusets. Even if
+the particulars of this flag setting in descendant cpusets. Even if
 such a task could use spare CPU cycles in some other CPUs, the kernel
 scheduler might not consider the possibility of load balancing that
 task to that underused CPU.
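
A common arrangement is therefore to disable load balancing in the top cpuset
and enable it only in the child cpusets that actually pin work; a minimal
sketch, assuming /dev/cpuset and a hypothetical child batch:

# echo 0 > /dev/cpuset/sched_load_balance
# echo 1 > /dev/cpuset/batch/sched_load_balance
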
@@ -485,17 +489,22 @@ of CPUs allowed to a cpuset having 'sched_load_balance' enabled.
 The internal kernel cpuset to scheduler interface passes from the
 cpuset code to the scheduler code a partition of the load balanced
 CPUs in the system. This partition is a set of subsets (represented
-as an array of cpumask_t) of CPUs, pairwise disjoint, that cover all
-the CPUs that must be load balanced.
+as an array of struct cpumask) of CPUs, pairwise disjoint, that cover
+all the CPUs that must be load balanced.
 
-Whenever the 'sched_load_balance' flag changes, or CPUs come or go
-from a cpuset with this flag enabled, or a cpuset with this flag
-enabled is removed, the cpuset code builds a new such partition and
-passes it to the scheduler sched domain setup code, to have the sched
-domains rebuilt as necessary.
+The cpuset code builds a new such partition and passes it to the
+scheduler sched domain setup code, to have the sched domains rebuilt
+as necessary, whenever:
+ - the 'sched_load_balance' flag of a cpuset with non-empty CPUs changes,
+ - or CPUs come or go from a cpuset with this flag enabled,
+ - or 'sched_relax_domain_level' value of a cpuset with non-empty CPUs
+   and with this flag enabled changes,
+ - or a cpuset with non-empty CPUs and with this flag enabled is removed,
+ - or a cpu is offlined/onlined.
 
 This partition exactly defines what sched domains the scheduler should
-setup - one sched domain for each element (cpumask_t) in the partition.
+setup - one sched domain for each element (struct cpumask) in the
+partition.
 
 The scheduler remembers the currently active sched domain partitions.
 When the scheduler routine partition_sched_domains() is invoked from
@@ -522,9 +531,9 @@ be idle.
 
 Of course it takes some searching cost to find movable tasks and/or
 idle CPUs, the scheduler might not search all CPUs in the domain
-everytime. In fact, in some architectures, the searching ranges on
+every time. In fact, in some architectures, the searching ranges on
 events are limited in the same socket or node where the CPU locates,
-while the load balance on tick searchs all.
+while the load balance on tick searches all.
 
 For example, assume CPU Z is relatively far from CPU X. Even if CPU Z
 is idle while CPU X and the siblings are busy, scheduler can't migrate
@@ -559,7 +568,7 @@ domain, the largest value among those is used. Be careful, if one
 requests 0 and others are -1 then 0 is used.
 
 Note that modifying this file will have both good and bad effects,
-and whether it is acceptable or not will be depend on your situation.
+and whether it is acceptable or not depends on your situation.
 Don't modify this file if you are not sure.
 
 If your situation is:
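
For a cpuset that does warrant tuning, the file takes a small integer; a
minimal sketch, assuming /dev/cpuset and a hypothetical child rt_set (1
requests a siblings-only search range, -1 reverts to the system default):

# echo 1 > /dev/cpuset/rt_set/sched_relax_domain_level
# echo -1 > /dev/cpuset/rt_set/sched_relax_domain_level
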
@@ -592,7 +601,7 @@ its new cpuset, then the task will continue to use whatever subset
 of MPOL_BIND nodes are still allowed in the new cpuset. If the task
 was using MPOL_BIND and now none of its MPOL_BIND nodes are allowed
 in the new cpuset, then the task will be essentially treated as if it
-was MPOL_BIND bound to the new cpuset (even though its numa placement,
+was MPOL_BIND bound to the new cpuset (even though its NUMA placement,
 as queried by get_mempolicy(), doesn't change). If a task is moved
 from one cpuset to another, then the kernel will adjust the tasks
 memory placement, as above, the next time that the kernel attempts
@@ -600,19 +609,15 @@ to allocate a page of memory for that task.
 
 If a cpuset has its 'cpus' modified, then each task in that cpuset
 will have its allowed CPU placement changed immediately. Similarly,
-if a tasks pid is written to a cpusets 'tasks' file, in either its
-current cpuset or another cpuset, then its allowed CPU placement is
-changed immediately. If such a task had been bound to some subset
-of its cpuset using the sched_setaffinity() call, the task will be
-allowed to run on any CPU allowed in its new cpuset, negating the
-affect of the prior sched_setaffinity() call.
+if a tasks pid is written to another cpusets 'tasks' file, then its
+allowed CPU placement is changed immediately. If such a task had been
+bound to some subset of its cpuset using the sched_setaffinity() call,
+the task will be allowed to run on any CPU allowed in its new cpuset,
+negating the effect of the prior sched_setaffinity() call.
 
 In summary, the memory placement of a task whose cpuset is changed is
 updated by the kernel, on the next allocation of a page for that task,
-but the processor placement is not updated, until that tasks pid is
-rewritten to the 'tasks' file of its cpuset. This is done to avoid
-impacting the scheduler code in the kernel with a check for changes
-in a tasks processor placement.
+and the processor placement is updated immediately.
 
 Normally, once a page is allocated (given a physical page
 of main memory) then that page stays on whatever node it
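
The immediate CPU-placement update is observable with taskset; a minimal
sketch, assuming /dev/cpuset and a hypothetical cpuset cs1:

# taskset -p $$
# echo $$ > /dev/cpuset/cs1/tasks
# The next line should display cs1's CPUs as the new affinity mask
# taskset -p $$
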
@@ -681,10 +686,14 @@ and then start a subshell 'sh' in that cpuset:
   # The next line should display '/Charlie'
   cat /proc/self/cpuset
 
-In the future, a C library interface to cpusets will likely be
-available. For now, the only way to query or modify cpusets is
-via the cpuset file system, using the various cd, mkdir, echo, cat,
-rmdir commands from the shell, or their equivalent from C.
+There are ways to query or modify cpusets:
+ - via the cpuset file system directly, using the various cd, mkdir, echo,
+   cat, rmdir commands from the shell, or their equivalent from C.
+ - via the C library libcpuset.
+ - via the C library libcgroup.
+   (http://sourceforge.net/projects/libcg/)
+ - via the Python application cset.
+   (http://developer.novell.com/wiki/index.php/Cpuset)
 
 The sched_setaffinity calls can also be done at the shell prompt using
 SGI's runon or Robert Love's taskset. The mbind and set_mempolicy
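
For instance, taskset can both launch a command on a restricted CPU list and
retune a running pid; a minimal sketch with a hypothetical job and pid:

# taskset -c 2,3 ./myjob
# taskset -p -c 0-1 1234
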
@@ -756,7 +765,7 @@ mount -t cpuset X /dev/cpuset
 
 is equivalent to
 
-mount -t cgroup -ocpuset X /dev/cpuset
+mount -t cgroup -ocpuset,noprefix X /dev/cpuset
 echo "/sbin/cpuset_release_agent" > /dev/cpuset/release_agent
 
 2.2 Adding/removing cpus
diff --git a/Documentation/cgroups/devices.txt b/Documentation/cgroups/devices.txt
index 7cc6e6a60672..57ca4c89fe5c 100644
--- a/Documentation/cgroups/devices.txt
+++ b/Documentation/cgroups/devices.txt
@@ -42,7 +42,7 @@ suffice, but we can decide the best way to adequately restrict
 movement as people get some experience with this. We may just want
 to require CAP_SYS_ADMIN, which at least is a separate bit from
 CAP_MKNOD. We may want to just refuse moving to a cgroup which
-isn't a descendent of the current one. Or we may want to use
+isn't a descendant of the current one. Or we may want to use
 CAP_MAC_ADMIN, since we really are trying to lock down root.
 
 CAP_SYS_ADMIN is needed to modify the whitelist or move another
diff --git a/Documentation/cgroups/memcg_test.txt b/Documentation/cgroups/memcg_test.txt
index 523a9c16c400..72db89ed0609 100644
--- a/Documentation/cgroups/memcg_test.txt
+++ b/Documentation/cgroups/memcg_test.txt
@@ -1,5 +1,5 @@
 Memory Resource Controller(Memcg) Implementation Memo.
-Last Updated: 2009/1/19
+Last Updated: 2009/1/20
 Base Kernel Version: based on 2.6.29-rc2.
 
 Because VM is getting complex (one of reasons is memcg...), memcg's behavior
@@ -356,7 +356,25 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
 	(Shell-B)
 	# move all tasks in /cgroup/test to /cgroup
 	# /sbin/swapoff -a
-	# rmdir /test/cgroup
+	# rmdir /cgroup/test
 	# kill malloc task.
 
 	Of course, tmpfs v.s. swapoff test should be tested, too.
+
+ 9.8 OOM-Killer
+	Out-of-memory caused by memcg's limit will kill tasks under
+	the memcg. When hierarchy is used, a task under the hierarchy
+	will be killed by the kernel.
+	In this case, panic_on_oom shouldn't be invoked and tasks
+	in other groups shouldn't be killed.
+
+	It's not difficult to cause OOM under memcg as follows.
+	Case A) when you can swapoff
+	#swapoff -a
+	#echo 50M > /memory.limit_in_bytes
+	run 51M of malloc
+
+	Case B) when you use mem+swap limitation.
+	#echo 50M > memory.limit_in_bytes
+	#echo 50M > memory.memsw.limit_in_bytes
+	run 51M of malloc
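
Spelled out as a full session, Case A might look like the sketch below, where
malloc-51m stands in for any hypothetical program touching 51M of anonymous
memory; the kill reported in dmesg should name the memcg rather than trigger
panic_on_oom:

	#swapoff -a
	#mkdir /cgroup/test
	#echo 50M > /cgroup/test/memory.limit_in_bytes
	#echo $$ > /cgroup/test/tasks
	#./malloc-51m
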
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index e1501964df1e..a98a7fe7aabb 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -302,7 +302,7 @@ will be charged as a new owner of it.
 	unevictable - # of pages cannot be reclaimed.(mlocked etc)
 
 	Below is depend on CONFIG_DEBUG_VM.
-	inactive_ratio - VM inernal parameter. (see mm/page_alloc.c)
+	inactive_ratio - VM internal parameter. (see mm/page_alloc.c)
 	recent_rotated_anon - VM internal parameter. (see mm/vmscan.c)
 	recent_rotated_file - VM internal parameter. (see mm/vmscan.c)
 	recent_scanned_anon - VM internal parameter. (see mm/vmscan.c)