Diffstat (limited to 'Documentation/cgroups/memory.txt')
-rw-r--r-- | Documentation/cgroups/memory.txt | 78 |
1 files changed, 55 insertions, 23 deletions
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 7781857dc940..06eb6d957c83 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -1,8 +1,8 @@ | |||
1 | Memory Resource Controller | 1 | Memory Resource Controller |
2 | 2 | ||
3 | NOTE: The Memory Resource Controller has been generically been referred | 3 | NOTE: The Memory Resource Controller has generically been referred to as the |
4 | to as the memory controller in this document. Do not confuse memory | 4 | memory controller in this document. Do not confuse memory controller |
5 | controller used here with the memory controller that is used in hardware. | 5 | used here with the memory controller that is used in hardware. |
6 | 6 | ||
7 | (For editors) | 7 | (For editors) |
8 | In this document: | 8 | In this document: |
@@ -52,8 +52,10 @@ Brief summary of control files. | |||
52 | tasks # attach a task(thread) and show list of threads | 52 | tasks # attach a task(thread) and show list of threads |
53 | cgroup.procs # show list of processes | 53 | cgroup.procs # show list of processes |
54 | cgroup.event_control # an interface for event_fd() | 54 | cgroup.event_control # an interface for event_fd() |
55 | memory.usage_in_bytes # show current memory(RSS+Cache) usage. | 55 | memory.usage_in_bytes # show current res_counter usage for memory |
56 | memory.memsw.usage_in_bytes # show current memory+Swap usage | 56 | (See 5.5 for details) |
57 | memory.memsw.usage_in_bytes # show current res_counter usage for memory+Swap | ||
58 | (See 5.5 for details) | ||
57 | memory.limit_in_bytes # set/show limit of memory usage | 59 | memory.limit_in_bytes # set/show limit of memory usage |
58 | memory.memsw.limit_in_bytes # set/show limit of memory+Swap usage | 60 | memory.memsw.limit_in_bytes # set/show limit of memory+Swap usage |
59 | memory.failcnt # show the number of memory usage hits limits | 61 | memory.failcnt # show the number of memory usage hits limits |
@@ -68,6 +70,7 @@ Brief summary of control files. | |||
68 | (See sysctl's vm.swappiness) | 70 | (See sysctl's vm.swappiness) |
69 | memory.move_charge_at_immigrate # set/show controls of moving charges | 71 | memory.move_charge_at_immigrate # set/show controls of moving charges |
70 | memory.oom_control # set/show oom controls. | 72 | memory.oom_control # set/show oom controls. |
73 | memory.numa_stat # show memory usage (page counts) per NUMA node | ||
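Below is a minimal sketch that reads a few of the control files listed above for one group. The helper name `memcg_summary` and the example path `/sys/fs/cgroup/memory/0` are illustrative assumptions. A file that is missing is reported rather than treated as an error. For example, `memory.memsw.*` only exists when the swap extension is enabled. Keep in mind that `memory.usage_in_bytes` is a fuzzy value (see 5.5).

```shell
# Hypothetical helper: print selected control files of one memory cgroup.
# $1 is the group directory, e.g. /sys/fs/cgroup/memory/0 (an assumption).
memcg_summary() {
    dir=$1
    for f in memory.usage_in_bytes memory.memsw.usage_in_bytes \
             memory.limit_in_bytes memory.failcnt; do
        if [ -r "$dir/$f" ]; then
            # Each of these control files holds a single value.
            printf '%s=%s\n' "$f" "$(cat "$dir/$f")"
        else
            # e.g. memsw files are absent without the swap extension
            printf '%s=<unavailable>\n' "$f"
        fi
    done
}
```

Usage: `memcg_summary /sys/fs/cgroup/memory/0`.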
71 | 74 | ||
72 | 1. History | 75 | 1. History |
73 | 76 | ||
@@ -179,7 +182,7 @@ behind this approach is that a cgroup that aggressively uses a shared | |||
179 | page will eventually get charged for it (once it is uncharged from | 182 | page will eventually get charged for it (once it is uncharged from |
180 | the cgroup that brought it in -- this will happen on memory pressure). | 183 | the cgroup that brought it in -- this will happen on memory pressure). |
181 | 184 | ||
182 | Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used.. | 185 | Exception: If CONFIG_CGROUP_MEM_RES_CTLR_SWAP is not used. |
183 | When you do swapoff and make swapped-out pages of shmem(tmpfs) to | 186 | When you do swapoff and make swapped-out pages of shmem(tmpfs) to |
184 | be backed into memory in force, charges for pages are accounted against the | 187 | be backed into memory in force, charges for pages are accounted against the |
185 | caller of swapoff rather than the users of shmem. | 188 | caller of swapoff rather than the users of shmem. |
@@ -211,7 +214,7 @@ affecting global LRU, memory+swap limit is better than just limiting swap from | |||
211 | OS point of view. | 214 | OS point of view. |
212 | 215 | ||
213 | * What happens when a cgroup hits memory.memsw.limit_in_bytes | 216 | * What happens when a cgroup hits memory.memsw.limit_in_bytes |
214 | When a cgroup his memory.memsw.limit_in_bytes, it's useless to do swap-out | 217 | When a cgroup hits memory.memsw.limit_in_bytes, it's useless to do swap-out |
215 | in this cgroup. Then, swap-out will not be done by cgroup routine and file | 218 | in this cgroup. Then, swap-out will not be done by cgroup routine and file |
216 | caches are dropped. But as mentioned above, global LRU can do swapout memory | 219 | caches are dropped. But as mentioned above, global LRU can do swapout memory |
217 | from it for sanity of the system's memory management state. You can't forbid | 220 | from it for sanity of the system's memory management state. You can't forbid |
@@ -261,16 +264,17 @@ b. Enable CONFIG_RESOURCE_COUNTERS | |||
261 | c. Enable CONFIG_CGROUP_MEM_RES_CTLR | 264 | c. Enable CONFIG_CGROUP_MEM_RES_CTLR |
262 | d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension) | 265 | d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension) |
263 | 266 | ||
264 | 1. Prepare the cgroups | 267 | 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?) |
265 | # mkdir -p /cgroups | 268 | # mount -t tmpfs none /sys/fs/cgroup |
266 | # mount -t cgroup none /cgroups -o memory | 269 | # mkdir /sys/fs/cgroup/memory |
270 | # mount -t cgroup none /sys/fs/cgroup/memory -o memory | ||
267 | 271 | ||
268 | 2. Make the new group and move bash into it | 272 | 2. Make the new group and move bash into it |
269 | # mkdir /cgroups/0 | 273 | # mkdir /sys/fs/cgroup/memory/0 |
270 | # echo $$ > /cgroups/0/tasks | 274 | # echo $$ > /sys/fs/cgroup/memory/0/tasks |
271 | 275 | ||
272 | Since now we're in the 0 cgroup, we can alter the memory limit: | 276 | Since now we're in the 0 cgroup, we can alter the memory limit: |
273 | # echo 4M > /cgroups/0/memory.limit_in_bytes | 277 | # echo 4M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes |
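Step 2 and the limit change can be combined in a small function. This is only a sketch. The name `memcg_enter` and its arguments are hypothetical. It assumes that the memory hierarchy is already mounted (step 1) and that the caller is allowed to write the control files.

```shell
# Hypothetical wrapper around step 2 and the limit change shown above.
# $1: mount point of the memory hierarchy (already mounted, see step 1)
# $2: name of the new group   $3: limit, e.g. 4M
memcg_enter() {
    root=$1 group=$2 limit=$3
    mkdir -p "$root/$group" || return 1
    # Move this shell into the group; processes it starts afterwards
    # begin life in the same group.
    echo $$ > "$root/$group/tasks" || return 1
    echo "$limit" > "$root/$group/memory.limit_in_bytes" || return 1
}
```

Usage: `memcg_enter /sys/fs/cgroup/memory 0 4M`. Then read `memory.limit_in_bytes` back with `cat`, as this section does, because a successful write alone is not proof that the limit was set.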
274 | 278 | ||
275 | NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo, | 279 | NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo, |
276 | mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.) | 280 | mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.) |
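The suffix rule above amounts to a power-of-two multiplication. The sketch below reproduces it. `to_bytes` is a hypothetical helper, not a kernel or coreutils interface. For `4M` it returns 4194304, the same value this section reads back from `memory.limit_in_bytes`.

```shell
# Hypothetical helper mirroring the suffix rule: k/K, m/M and g/G are
# KiB, MiB and GiB. Unsuffixed values (including -1) pass through unchanged.
to_bytes() {
    case $1 in
        *[kK]) echo $(( ${1%?} * 1024 )) ;;
        *[mM]) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *[gG]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo $(( $1 )) ;;
    esac
}
```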
@@ -278,11 +282,11 @@ mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.) | |||
278 | NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited). | 282 | NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited). |
279 | NOTE: We cannot set limits on the root cgroup any more. | 283 | NOTE: We cannot set limits on the root cgroup any more. |
280 | 284 | ||
281 | # cat /cgroups/0/memory.limit_in_bytes | 285 | # cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes |
282 | 4194304 | 286 | 4194304 |
283 | 287 | ||
284 | We can check the usage: | 288 | We can check the usage: |
285 | # cat /cgroups/0/memory.usage_in_bytes | 289 | # cat /sys/fs/cgroup/memory/0/memory.usage_in_bytes |
286 | 1216512 | 290 | 1216512 |
287 | 291 | ||
288 | A successful write to this file does not guarantee a successful set of | 292 | A successful write to this file does not guarantee a successful set of |
@@ -453,6 +457,33 @@ memory under it will be reclaimed. | |||
453 | You can reset failcnt by writing 0 to failcnt file. | 457 | You can reset failcnt by writing 0 to failcnt file. |
454 | # echo 0 > .../memory.failcnt | 458 | # echo 0 > .../memory.failcnt |
455 | 459 | ||
460 | 5.5 usage_in_bytes | ||
461 | |||
462 | For efficiency, like other kernel components, the memory cgroup uses some | ||
463 | optimizations to avoid unnecessary cacheline false sharing. usage_in_bytes is | ||
464 | affected by this and does not show the 'exact' value of memory (and swap) | ||
465 | usage; it is a fuzzy value for efficient access. (Of course, it is synchronized when necessary.) | ||
466 | If you want a more exact memory usage figure, use the RSS+CACHE(+SWAP) | ||
467 | values in memory.stat (see 5.2). | ||
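Here is a sketch of the more exact figure described above. It sums the `rss` and `cache` lines of a `memory.stat` file, and it also adds `swap` when the second argument is `yes`. Only exact field names are matched, so the hierarchical `total_*` lines are ignored. The helper name is hypothetical. The `swap` line is assumed to be present only when the swap extension is enabled.

```shell
# Hypothetical helper: RSS+CACHE(+SWAP) from a memory.stat file.
# $1: path to memory.stat   $2: "yes" to include swap (default: no)
memcg_exact_usage() {
    awk -v want_swap="${2:-no}" '
        $1 == "rss" || $1 == "cache"          { sum += $2 }
        $1 == "swap" && want_swap == "yes"    { sum += $2 }
        END { printf "%.0f\n", sum }   # %.0f avoids 32-bit %d limits in some awks
    ' "$1"
}
```

Usage: `memcg_exact_usage /sys/fs/cgroup/memory/0/memory.stat yes`.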
468 | |||
469 | 5.6 numa_stat | ||
470 | |||
471 | This is similar to numa_maps but operates on a per-memcg basis. This is | ||
472 | useful for providing visibility into the NUMA locality information within | ||
473 | a memcg, since its pages are allowed to be allocated from any physical | ||
474 | node. One use case is evaluating application performance by | ||
475 | combining this information with the application's CPU allocation. | ||
476 | |||
477 | We export "total", "file", "anon" and "unevictable" page counts per node for | ||
478 | each memcg. The output format of memory.numa_stat is: | ||
479 | |||
480 | total=<total pages> N0=<node 0 pages> N1=<node 1 pages> ... | ||
481 | file=<total file pages> N0=<node 0 pages> N1=<node 1 pages> ... | ||
482 | anon=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ... | ||
483 | unevictable=<total unevictable pages> N0=<node 0 pages> N1=<node 1 pages> ... | ||
484 | |||
485 | And we have total = file + anon + unevictable. | ||
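The sketch below parses the format above and checks the stated identity total = file + anon + unevictable, both for the totals column and for each node. The helper name `numa_stat_check` is hypothetical. It prints one line for each mismatching column and exits non-zero when any column does not add up. Because the counters are read without a global lock, a live file could in principle show a transient mismatch.

```shell
# Hypothetical checker for memory.numa_stat ($1 is the file path).
numa_stat_check() {
    awk '
        {
            split($1, kv, "=")
            row = kv[1]                      # total, file, anon, unevictable
            for (i = 1; i <= NF; i++) {
                split($i, f, "=")
                col = (i == 1) ? "all" : f[1]    # "all" or N0, N1, ...
                v[row, col] = f[2]
                cols[col] = 1
            }
        }
        END {
            bad = 0
            for (c in cols) {
                if (v["total", c] + 0 != v["file", c] + v["anon", c] + v["unevictable", c]) {
                    printf "mismatch in %s\n", c
                    bad = 1
                }
            }
            exit bad
        }' "$1"
}
```

Usage: `numa_stat_check /sys/fs/cgroup/memory/0/memory.numa_stat`.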
486 | |||
456 | 6. Hierarchy support | 487 | 6. Hierarchy support |
457 | 488 | ||
458 | The memory controller supports a deep hierarchy and hierarchical accounting. | 489 | The memory controller supports a deep hierarchy and hierarchical accounting. |
@@ -460,13 +491,13 @@ The hierarchy is created by creating the appropriate cgroups in the | |||
460 | cgroup filesystem. Consider for example, the following cgroup filesystem | 491 | cgroup filesystem. Consider for example, the following cgroup filesystem |
461 | hierarchy | 492 | hierarchy |
462 | 493 | ||
463 |           root | 494 |           root |
464 |         /  |  \ | 495 |         /  |  \ |
465 |        /   |   \ | 496 |        /   |   \ |
466 |       a    b    c | 497 |       a    b    c |
467 |                 | \ | 498 |                 | \ |
468 |                 |  \ | 499 |                 |  \ |
469 |                 d   e | 500 |                 d   e |
470 | 501 | ||
471 | In the diagram above, with hierarchical accounting enabled, all memory | 502 | In the diagram above, with hierarchical accounting enabled, all memory |
472 | usage of e, is accounted to its ancestors up until the root (i.e, c and root), | 503 | usage of e, is accounted to its ancestors up until the root (i.e, c and root), |
@@ -485,8 +516,9 @@ The feature can be disabled by | |||
485 | 516 | ||
486 | # echo 0 > memory.use_hierarchy | 517 | # echo 0 > memory.use_hierarchy |
487 | 518 | ||
488 | NOTE1: Enabling/disabling will fail if the cgroup already has other | 519 | NOTE1: Enabling/disabling will fail if either the cgroup already has other |
489 | cgroups created below it. | 520 | cgroups created below it, or if the parent cgroup has use_hierarchy |
521 | enabled. | ||
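NOTE1 names two conditions under which changing use_hierarchy fails. The sketch below mirrors that documented rule on a cgroup directory and could serve as a pre-check in scripts. The helper name is hypothetical, and the kernel's own check on write remains authoritative. The helper returns success (0) and prints the reason when a change is expected to fail.

```shell
# Hypothetical pre-check for NOTE1. $1: cgroup directory.
# Returns 0 (and prints why) if changing memory.use_hierarchy should fail.
use_hierarchy_blocked() {
    dir=${1%/}
    # Condition 1: child cgroups already exist below this group.
    for child in "$dir"/*/; do
        if [ -d "$child" ]; then
            echo "has child cgroups"
            return 0
        fi
    done
    # Condition 2: the parent group has use_hierarchy enabled.
    parent=$(dirname "$dir")
    if [ -r "$parent/memory.use_hierarchy" ] &&
       [ "$(cat "$parent/memory.use_hierarchy")" = "1" ]; then
        echo "parent has use_hierarchy enabled"
        return 0
    fi
    return 1
}
```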
490 | 522 | ||
491 | NOTE2: When panic_on_oom is set to "2", the whole system will panic in | 523 | NOTE2: When panic_on_oom is set to "2", the whole system will panic in |
492 | case of an OOM event in any cgroup. | 524 | case of an OOM event in any cgroup. |