Diffstat (limited to 'Documentation/cgroups/memory.txt')
-rw-r--r-- | Documentation/cgroups/memory.txt | 143 |
1 files changed, 123 insertions, 20 deletions
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 7c163477fcd8..6f3c598971fc 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -1,8 +1,8 @@ | |||
1 | Memory Resource Controller | 1 | Memory Resource Controller |
2 | 2 | ||
3 | NOTE: The Memory Resource Controller has been generically been referred | 3 | NOTE: The Memory Resource Controller has generically been referred to as the |
4 | to as the memory controller in this document. Do not confuse memory | 4 | memory controller in this document. Do not confuse memory controller |
5 | controller used here with the memory controller that is used in hardware. | 5 | used here with the memory controller that is used in hardware. |
6 | 6 | ||
7 | (For editors) | 7 | (For editors) |
8 | In this document: | 8 | In this document: |
@@ -70,6 +70,7 @@ Brief summary of control files. | |||
70 | (See sysctl's vm.swappiness) | 70 | (See sysctl's vm.swappiness) |
71 | memory.move_charge_at_immigrate # set/show controls of moving charges | 71 | memory.move_charge_at_immigrate # set/show controls of moving charges |
72 | memory.oom_control # set/show oom controls. | 72 | memory.oom_control # set/show oom controls. |
73 | memory.numa_stat # show memory usage per numa node | ||
73 | 74 | ||
74 | 1. History | 75 | 1. History |
75 | 76 | ||
@@ -181,7 +182,7 @@ behind this approach is that a cgroup that aggressively uses a shared | |||
181 | page will eventually get charged for it (once it is uncharged from | 182 | page will eventually get charged for it (once it is uncharged from |
182 | the cgroup that brought it in -- this will happen on memory pressure). | 183 | the cgroup that brought it in -- this will happen on memory pressure). |
183 | 184 | ||
184 | Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used.. | 185 | Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used. |
185 | When you do swapoff and make swapped-out pages of shmem(tmpfs) to | 186 | When you do swapoff and make swapped-out pages of shmem(tmpfs) to |
186 | be backed into memory in force, charges for pages are accounted against the | 187 | be backed into memory in force, charges for pages are accounted against the |
187 | caller of swapoff rather than the users of shmem. | 188 | caller of swapoff rather than the users of shmem. |
@@ -213,7 +214,7 @@ affecting global LRU, memory+swap limit is better than just limiting swap from | |||
213 | OS point of view. | 214 | OS point of view. |
214 | 215 | ||
215 | * What happens when a cgroup hits memory.memsw.limit_in_bytes | 216 | * What happens when a cgroup hits memory.memsw.limit_in_bytes |
216 | When a cgroup his memory.memsw.limit_in_bytes, it's useless to do swap-out | 217 | When a cgroup hits memory.memsw.limit_in_bytes, it's useless to do swap-out |
217 | in this cgroup. Then, swap-out will not be done by cgroup routine and file | 218 | in this cgroup. Then, swap-out will not be done by cgroup routine and file |
218 | caches are dropped. But as mentioned above, global LRU can do swapout memory | 219 | caches are dropped. But as mentioned above, global LRU can do swapout memory |
219 | from it for sanity of the system's memory management state. You can't forbid | 220 | from it for sanity of the system's memory management state. You can't forbid |
@@ -263,16 +264,17 @@ b. Enable CONFIG_RESOURCE_COUNTERS | |||
263 | c. Enable CONFIG_CGROUP_MEM_RES_CTLR | 264 | c. Enable CONFIG_CGROUP_MEM_RES_CTLR |
264 | d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension) | 265 | d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension) |
265 | 266 | ||
266 | 1. Prepare the cgroups | 267 | 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?) |
267 | # mkdir -p /cgroups | 268 | # mount -t tmpfs none /sys/fs/cgroup |
268 | # mount -t cgroup none /cgroups -o memory | 269 | # mkdir /sys/fs/cgroup/memory |
270 | # mount -t cgroup none /sys/fs/cgroup/memory -o memory | ||
269 | 271 | ||
270 | 2. Make the new group and move bash into it | 272 | 2. Make the new group and move bash into it |
271 | # mkdir /cgroups/0 | 273 | # mkdir /sys/fs/cgroup/memory/0 |
272 | # echo $$ > /cgroups/0/tasks | 274 | # echo $$ > /sys/fs/cgroup/memory/0/tasks |
273 | 275 | ||
274 | Since now we're in the 0 cgroup, we can alter the memory limit: | 276 | Since now we're in the 0 cgroup, we can alter the memory limit: |
275 | # echo 4M > /cgroups/0/memory.limit_in_bytes | 277 | # echo 4M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes |
276 | 278 | ||
277 | NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo, | 279 | NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo, |
278 | mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.) | 280 | mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.) |
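
The suffix rule in the NOTE above can be illustrated with a minimal sketch. The helper name parse_limit is hypothetical and is not part of the kernel or of this patch; it covers only the suffixes the document lists, treats them as binary multiples as the NOTE says, and does not model any rounding the kernel may apply when the value is written.

```python
# Hypothetical helper (an assumption for illustration, not kernel code):
# interprets a value such as "4M" the way the NOTE describes, using
# binary multiples (KiB, MiB, GiB).
_SUFFIXES = {"k": 1 << 10, "m": 1 << 20, "g": 1 << 30}


def parse_limit(text: str) -> int:
    """Convert '4M', '512k', '1G' or a plain number to bytes."""
    text = text.strip()
    if text == "-1":
        # The document uses "-1" to reset a limit to unlimited.
        return -1
    suffix = text[-1:].lower()
    if suffix in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[suffix]
    return int(text)


print(parse_limit("4M"))  # 4194304, the value read back in the example
```

Under these assumptions, "4M" maps to 4 * 1024 * 1024 bytes, which agrees with the 4194304 shown when memory.limit_in_bytes is read back.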
@@ -280,11 +282,11 @@ mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.) | |||
280 | NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited). | 282 | NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited). |
281 | NOTE: We cannot set limits on the root cgroup any more. | 283 | NOTE: We cannot set limits on the root cgroup any more. |
282 | 284 | ||
283 | # cat /cgroups/0/memory.limit_in_bytes | 285 | # cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes |
284 | 4194304 | 286 | 4194304 |
285 | 287 | ||
286 | We can check the usage: | 288 | We can check the usage: |
287 | # cat /cgroups/0/memory.usage_in_bytes | 289 | # cat /sys/fs/cgroup/memory/0/memory.usage_in_bytes |
288 | 1216512 | 290 | 1216512 |
289 | 291 | ||
290 | A successful write to this file does not guarantee a successful set of | 292 | A successful write to this file does not guarantee a successful set of |
@@ -378,7 +380,7 @@ will be charged as a new owner of it. | |||
378 | 380 | ||
379 | 5.2 stat file | 381 | 5.2 stat file |
380 | 382 | ||
381 | memory.stat file includes following statistics | 383 | 5.2.1 memory.stat file includes following statistics |
382 | 384 | ||
383 | # per-memory cgroup local status | 385 | # per-memory cgroup local status |
384 | cache - # of bytes of page cache memory. | 386 | cache - # of bytes of page cache memory. |
@@ -436,6 +438,89 @@ Note: | |||
436 | file_mapped is accounted only when the memory cgroup is owner of page | 438 | file_mapped is accounted only when the memory cgroup is owner of page |
437 | cache.) | 439 | cache.) |
438 | 440 | ||
441 | 5.2.2 memory.vmscan_stat | ||
442 | |||
443 | memory.vmscan_stat includes statistics for memory scanning, freeing and | ||
444 | reclaiming. The statistics show memory scanning information since | ||
445 | memory cgroup creation and can be reset to 0 by writing 0, as follows: | ||
446 | |||
447 | # echo 0 > ../memory.vmscan_stat | ||
448 | |||
449 | This file contains the following statistics. | ||
450 | |||
451 | [param]_[file_or_anon]_pages_by_[reason]_[under_hierarchy] | ||
452 | [param]_elapsed_ns_by_[reason]_[under_hierarchy] | ||
453 | |||
454 | For example, | ||
455 | |||
456 | scanned_file_pages_by_limit indicates the number of scanned | ||
457 | file pages at vmscan. | ||
458 | |||
459 | Now, 3 parameters are supported | ||
460 | |||
461 | scanned - the number of pages scanned by vmscan | ||
462 | rotated - the number of pages activated at vmscan | ||
463 | freed - the number of pages freed by vmscan | ||
464 | |||
465 | If "rotated" is high relative to scanned/freed, the memcg seems busy. | ||
466 | |||
467 | Now, 2 reasons are supported | ||
468 | |||
469 | limit - the memory cgroup's limit | ||
470 | system - global memory pressure + softlimit | ||
471 | (global memory pressure not under softlimit is not handled now) | ||
472 | |||
473 | When under_hierarchy is added in the tail, the number indicates the | ||
474 | total memcg scan of its children and itself. | ||
475 | |||
476 | elapsed_ns is the elapsed time in nanoseconds. This may include sleep time | ||
477 | and does not indicate CPU usage, so please take it as just showing | ||
478 | latency. | ||
479 | |||
480 | Here is an example. | ||
481 | |||
482 | # cat /cgroup/memory/A/memory.vmscan_stat | ||
483 | scanned_pages_by_limit 9471864 | ||
484 | scanned_anon_pages_by_limit 6640629 | ||
485 | scanned_file_pages_by_limit 2831235 | ||
486 | rotated_pages_by_limit 4243974 | ||
487 | rotated_anon_pages_by_limit 3971968 | ||
488 | rotated_file_pages_by_limit 272006 | ||
489 | freed_pages_by_limit 2318492 | ||
490 | freed_anon_pages_by_limit 962052 | ||
491 | freed_file_pages_by_limit 1356440 | ||
492 | elapsed_ns_by_limit 351386416101 | ||
493 | scanned_pages_by_system 0 | ||
494 | scanned_anon_pages_by_system 0 | ||
495 | scanned_file_pages_by_system 0 | ||
496 | rotated_pages_by_system 0 | ||
497 | rotated_anon_pages_by_system 0 | ||
498 | rotated_file_pages_by_system 0 | ||
499 | freed_pages_by_system 0 | ||
500 | freed_anon_pages_by_system 0 | ||
501 | freed_file_pages_by_system 0 | ||
502 | elapsed_ns_by_system 0 | ||
503 | scanned_pages_by_limit_under_hierarchy 9471864 | ||
504 | scanned_anon_pages_by_limit_under_hierarchy 6640629 | ||
505 | scanned_file_pages_by_limit_under_hierarchy 2831235 | ||
506 | rotated_pages_by_limit_under_hierarchy 4243974 | ||
507 | rotated_anon_pages_by_limit_under_hierarchy 3971968 | ||
508 | rotated_file_pages_by_limit_under_hierarchy 272006 | ||
509 | freed_pages_by_limit_under_hierarchy 2318492 | ||
510 | freed_anon_pages_by_limit_under_hierarchy 962052 | ||
511 | freed_file_pages_by_limit_under_hierarchy 1356440 | ||
512 | elapsed_ns_by_limit_under_hierarchy 351386416101 | ||
513 | scanned_pages_by_system_under_hierarchy 0 | ||
514 | scanned_anon_pages_by_system_under_hierarchy 0 | ||
515 | scanned_file_pages_by_system_under_hierarchy 0 | ||
516 | rotated_pages_by_system_under_hierarchy 0 | ||
517 | rotated_anon_pages_by_system_under_hierarchy 0 | ||
518 | rotated_file_pages_by_system_under_hierarchy 0 | ||
519 | freed_pages_by_system_under_hierarchy 0 | ||
520 | freed_anon_pages_by_system_under_hierarchy 0 | ||
521 | freed_file_pages_by_system_under_hierarchy 0 | ||
522 | elapsed_ns_by_system_under_hierarchy 0 | ||
523 | |||
439 | 5.3 swappiness | 524 | 5.3 swappiness |
440 | 525 | ||
441 | Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only. | 526 | Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only. |
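
The key naming scheme of memory.vmscan_stat added in this hunk can be decoded mechanically. The following is a minimal sketch, assuming the file keeps the "name value" layout shown in the example output; the function name parse_vmscan_stat and the tuple layout are hypothetical choices for illustration. The sample values are copied from the example in the patch.

```python
import re

# Pattern derived from the naming scheme and the sample output in the patch:
# an optional file/anon component, a reason, and an optional hierarchy suffix.
KEY_RE = re.compile(
    r"^(?:(?P<param>scanned|rotated|freed)_(?:(?P<kind>anon|file)_)?pages"
    r"|elapsed_ns)"
    r"_by_(?P<reason>limit|system)(?P<hier>_under_hierarchy)?$"
)


def parse_vmscan_stat(text):
    """Return {(param, kind, reason, hierarchical): value}.

    kind is None for totals and for the elapsed_ns entries.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        m = KEY_RE.match(parts[0])
        if m is None:
            continue  # unknown key: skip rather than guess its meaning
        param = m.group("param") or "elapsed_ns"
        key = (param, m.group("kind"), m.group("reason"),
               m.group("hier") is not None)
        stats[key] = int(parts[1])
    return stats


SAMPLE = """\
scanned_pages_by_limit 9471864
rotated_pages_by_limit 4243974
freed_pages_by_limit 2318492
elapsed_ns_by_limit 351386416101
scanned_pages_by_limit_under_hierarchy 9471864
"""

stats = parse_vmscan_stat(SAMPLE)
scanned = stats[("scanned", None, "limit", False)]
rotated = stats[("rotated", None, "limit", False)]
print(f"rotated/scanned by limit: {rotated / scanned:.2f}")
```

The printed ratio is one way to apply the hint that a high "rotated" count relative to scanned/freed suggests a busy memcg; what counts as "high" is left to the reader.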
@@ -464,6 +549,24 @@ value for efficient access. (Of course, when necessary, it's synchronized.) | |||
464 | If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP) | 549 | If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP) |
465 | value in memory.stat(see 5.2). | 550 | value in memory.stat(see 5.2). |
466 | 551 | ||
552 | 5.6 numa_stat | ||
553 | |||
554 | This is similar to numa_maps but operates on a per-memcg basis. This is | ||
555 | useful for providing visibility into the numa locality information within | ||
556 | a memcg since the pages are allowed to be allocated from any physical | ||
557 | node. One of the use cases is evaluating application performance by | ||
558 | combining this information with the application's cpu allocation. | ||
559 | |||
560 | We export "total", "file", "anon" and "unevictable" pages per-node for | ||
561 | each memcg. The output format of memory.numa_stat is: | ||
562 | |||
563 | total=<total pages> N0=<node 0 pages> N1=<node 1 pages> ... | ||
564 | file=<total file pages> N0=<node 0 pages> N1=<node 1 pages> ... | ||
565 | anon=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ... | ||
566 | unevictable=<total unevictable pages> N0=<node 0 pages> N1=<node 1 pages> ... | ||
567 | |||
568 | And we have total = file + anon + unevictable. | ||
569 | |||
467 | 6. Hierarchy support | 570 | 6. Hierarchy support |
468 | 571 | ||
469 | The memory controller supports a deep hierarchy and hierarchical accounting. | 572 | The memory controller supports a deep hierarchy and hierarchical accounting. |
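
The memory.numa_stat format described in this hunk is simple to parse. The following is a minimal sketch, assuming each line has exactly the "name=total N0=... N1=..." shape given in the patch; the function name parse_numa_stat is hypothetical and the sample numbers are made up for illustration, chosen so that total = file + anon + unevictable holds.

```python
def parse_numa_stat(text):
    """Return {counter: {"total": int, "nodes": {node_id: pages}}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        name, total = fields[0].split("=", 1)
        nodes = {}
        for field in fields[1:]:
            node, pages = field.split("=", 1)
            nodes[int(node[1:])] = int(pages)  # "N0" -> 0
        result[name] = {"total": int(total), "nodes": nodes}
    return result


# Made-up sample values (not taken from a real system).
SAMPLE = """\
total=1000 N0=600 N1=400
file=700 N0=500 N1=200
anon=250 N0=80 N1=170
unevictable=50 N0=20 N1=30
"""

numa = parse_numa_stat(SAMPLE)

# Check the relation stated in the documentation.
parts = sum(numa[k]["total"] for k in ("file", "anon", "unevictable"))
assert numa["total"]["total"] == parts

for name, entry in numa.items():
    print(name, entry["total"], entry["nodes"])
```

Per-node counts obtained this way could then be compared with the CPUs an application runs on, which is the use case the text mentions.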
@@ -471,13 +574,13 @@ The hierarchy is created by creating the appropriate cgroups in the | |||
471 | cgroup filesystem. Consider for example, the following cgroup filesystem | 574 | cgroup filesystem. Consider for example, the following cgroup filesystem |
472 | hierarchy | 575 | hierarchy |
473 | 576 | ||
474 | root | 577 | root |
475 | / | \ | 578 | / | \ |
476 | / | \ | 579 | / | \ |
477 | a b c | 580 | a b c |
478 | | \ | 581 | | \ |
479 | | \ | 582 | | \ |
480 | d e | 583 | d e |
481 | 584 | ||
482 | In the diagram above, with hierarchical accounting enabled, all memory | 585 | In the diagram above, with hierarchical accounting enabled, all memory |
483 | usage of e, is accounted to its ancestors up until the root (i.e, c and root), | 586 | usage of e, is accounted to its ancestors up until the root (i.e, c and root), |