diff options
Diffstat (limited to 'Documentation/cgroups/memory.txt')
-rw-r--r-- | Documentation/cgroups/memory.txt | 58 |
1 files changed, 39 insertions, 19 deletions
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index 7c163477fcd8..06eb6d957c83 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt | |||
@@ -1,8 +1,8 @@ | |||
1 | Memory Resource Controller | 1 | Memory Resource Controller |
2 | 2 | ||
3 | NOTE: The Memory Resource Controller has been generically been referred | 3 | NOTE: The Memory Resource Controller has generically been referred to as the |
4 | to as the memory controller in this document. Do not confuse memory | 4 | memory controller in this document. Do not confuse memory controller |
5 | controller used here with the memory controller that is used in hardware. | 5 | used here with the memory controller that is used in hardware. |
6 | 6 | ||
7 | (For editors) | 7 | (For editors) |
8 | In this document: | 8 | In this document: |
@@ -70,6 +70,7 @@ Brief summary of control files. | |||
70 | (See sysctl's vm.swappiness) | 70 | (See sysctl's vm.swappiness) |
71 | memory.move_charge_at_immigrate # set/show controls of moving charges | 71 | memory.move_charge_at_immigrate # set/show controls of moving charges |
72 | memory.oom_control # set/show oom controls. | 72 | memory.oom_control # set/show oom controls. |
73 | memory.numa_stat # show the number of memory usage per numa node | ||
73 | 74 | ||
74 | 1. History | 75 | 1. History |
75 | 76 | ||
@@ -181,7 +182,7 @@ behind this approach is that a cgroup that aggressively uses a shared | |||
181 | page will eventually get charged for it (once it is uncharged from | 182 | page will eventually get charged for it (once it is uncharged from |
182 | the cgroup that brought it in -- this will happen on memory pressure). | 183 | the cgroup that brought it in -- this will happen on memory pressure). |
183 | 184 | ||
184 | Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used.. | 185 | Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used. |
185 | When you do swapoff and make swapped-out pages of shmem(tmpfs) to | 186 | When you do swapoff and make swapped-out pages of shmem(tmpfs) to |
186 | be backed into memory in force, charges for pages are accounted against the | 187 | be backed into memory in force, charges for pages are accounted against the |
187 | caller of swapoff rather than the users of shmem. | 188 | caller of swapoff rather than the users of shmem. |
@@ -213,7 +214,7 @@ affecting global LRU, memory+swap limit is better than just limiting swap from | |||
213 | OS point of view. | 214 | OS point of view. |
214 | 215 | ||
215 | * What happens when a cgroup hits memory.memsw.limit_in_bytes | 216 | * What happens when a cgroup hits memory.memsw.limit_in_bytes |
216 | When a cgroup his memory.memsw.limit_in_bytes, it's useless to do swap-out | 217 | When a cgroup hits memory.memsw.limit_in_bytes, it's useless to do swap-out |
217 | in this cgroup. Then, swap-out will not be done by cgroup routine and file | 218 | in this cgroup. Then, swap-out will not be done by cgroup routine and file |
218 | caches are dropped. But as mentioned above, global LRU can do swapout memory | 219 | caches are dropped. But as mentioned above, global LRU can do swapout memory |
219 | from it for sanity of the system's memory management state. You can't forbid | 220 | from it for sanity of the system's memory management state. You can't forbid |
@@ -263,16 +264,17 @@ b. Enable CONFIG_RESOURCE_COUNTERS | |||
263 | c. Enable CONFIG_CGROUP_MEM_RES_CTLR | 264 | c. Enable CONFIG_CGROUP_MEM_RES_CTLR |
264 | d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension) | 265 | d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension) |
265 | 266 | ||
266 | 1. Prepare the cgroups | 267 | 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?) |
267 | # mkdir -p /cgroups | 268 | # mount -t tmpfs none /sys/fs/cgroup |
268 | # mount -t cgroup none /cgroups -o memory | 269 | # mkdir /sys/fs/cgroup/memory |
270 | # mount -t cgroup none /sys/fs/cgroup/memory -o memory | ||
269 | 271 | ||
270 | 2. Make the new group and move bash into it | 272 | 2. Make the new group and move bash into it |
271 | # mkdir /cgroups/0 | 273 | # mkdir /sys/fs/cgroup/memory/0 |
272 | # echo $$ > /cgroups/0/tasks | 274 | # echo $$ > /sys/fs/cgroup/memory/0/tasks |
273 | 275 | ||
274 | Since now we're in the 0 cgroup, we can alter the memory limit: | 276 | Since now we're in the 0 cgroup, we can alter the memory limit: |
275 | # echo 4M > /cgroups/0/memory.limit_in_bytes | 277 | # echo 4M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes |
276 | 278 | ||
277 | NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo, | 279 | NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo, |
278 | mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.) | 280 | mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.) |
@@ -280,11 +282,11 @@ mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.) | |||
280 | NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited). | 282 | NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited). |
281 | NOTE: We cannot set limits on the root cgroup any more. | 283 | NOTE: We cannot set limits on the root cgroup any more. |
282 | 284 | ||
283 | # cat /cgroups/0/memory.limit_in_bytes | 285 | # cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes |
284 | 4194304 | 286 | 4194304 |
285 | 287 | ||
286 | We can check the usage: | 288 | We can check the usage: |
287 | # cat /cgroups/0/memory.usage_in_bytes | 289 | # cat /sys/fs/cgroup/memory/0/memory.usage_in_bytes |
288 | 1216512 | 290 | 1216512 |
289 | 291 | ||
290 | A successful write to this file does not guarantee a successful set of | 292 | A successful write to this file does not guarantee a successful set of |
@@ -464,6 +466,24 @@ value for efficient access. (Of course, when necessary, it's synchronized.) | |||
464 | If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP) | 466 | If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP) |
465 | value in memory.stat(see 5.2). | 467 | value in memory.stat(see 5.2). |
466 | 468 | ||
469 | 5.6 numa_stat | ||
470 | |||
471 | This is similar to numa_maps but operates on a per-memcg basis. This is | ||
472 | useful for providing visibility into the numa locality information within | ||
473 | an memcg since the pages are allowed to be allocated from any physical | ||
474 | node. One of the usecases is evaluating application performance by | ||
475 | combining this information with the application's cpu allocation. | ||
476 | |||
477 | We export "total", "file", "anon" and "unevictable" pages per-node for | ||
478 | each memcg. The ouput format of memory.numa_stat is: | ||
479 | |||
480 | total=<total pages> N0=<node 0 pages> N1=<node 1 pages> ... | ||
481 | file=<total file pages> N0=<node 0 pages> N1=<node 1 pages> ... | ||
482 | anon=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ... | ||
483 | unevictable=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ... | ||
484 | |||
485 | And we have total = file + anon + unevictable. | ||
486 | |||
467 | 6. Hierarchy support | 487 | 6. Hierarchy support |
468 | 488 | ||
469 | The memory controller supports a deep hierarchy and hierarchical accounting. | 489 | The memory controller supports a deep hierarchy and hierarchical accounting. |
@@ -471,13 +491,13 @@ The hierarchy is created by creating the appropriate cgroups in the | |||
471 | cgroup filesystem. Consider for example, the following cgroup filesystem | 491 | cgroup filesystem. Consider for example, the following cgroup filesystem |
472 | hierarchy | 492 | hierarchy |
473 | 493 | ||
474 | root | 494 | root |
475 | / | \ | 495 | / | \ |
476 | / | \ | 496 | / | \ |
477 | a b c | 497 | a b c |
478 | | \ | 498 | | \ |
479 | | \ | 499 | | \ |
480 | d e | 500 | d e |
481 | 501 | ||
482 | In the diagram above, with hierarchical accounting enabled, all memory | 502 | In the diagram above, with hierarchical accounting enabled, all memory |
483 | usage of e, is accounted to its ancestors up until the root (i.e, c and root), | 503 | usage of e, is accounted to its ancestors up until the root (i.e, c and root), |