Diffstat (limited to 'Documentation/cgroups/memory.txt')
-rw-r--r-- Documentation/cgroups/memory.txt 143
1 files changed, 123 insertions, 20 deletions
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 7c163477fcd8..6f3c598971fc 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -1,8 +1,8 @@
1Memory Resource Controller 1Memory Resource Controller
2 2
3NOTE: The Memory Resource Controller has been generically been referred 3NOTE: The Memory Resource Controller has generically been referred to as the
4 to as the memory controller in this document. Do not confuse memory 4 memory controller in this document. Do not confuse memory controller
5 controller used here with the memory controller that is used in hardware. 5 used here with the memory controller that is used in hardware.
6 6
7(For editors) 7(For editors)
8In this document: 8In this document:
@@ -70,6 +70,7 @@ Brief summary of control files.
70 (See sysctl's vm.swappiness) 70 (See sysctl's vm.swappiness)
71 memory.move_charge_at_immigrate # set/show controls of moving charges 71 memory.move_charge_at_immigrate # set/show controls of moving charges
72 memory.oom_control # set/show oom controls. 72 memory.oom_control # set/show oom controls.
73 memory.numa_stat # show memory usage per numa node
73 74
741. History 751. History
75 76
@@ -181,7 +182,7 @@ behind this approach is that a cgroup that aggressively uses a shared
181page will eventually get charged for it (once it is uncharged from 182page will eventually get charged for it (once it is uncharged from
182the cgroup that brought it in -- this will happen on memory pressure). 183the cgroup that brought it in -- this will happen on memory pressure).
183 184
184Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used.. 185Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used.
185When you do swapoff and make swapped-out pages of shmem(tmpfs) to 186When you do swapoff and make swapped-out pages of shmem(tmpfs) to
186be backed into memory in force, charges for pages are accounted against the 187be backed into memory in force, charges for pages are accounted against the
187caller of swapoff rather than the users of shmem. 188caller of swapoff rather than the users of shmem.
@@ -213,7 +214,7 @@ affecting global LRU, memory+swap limit is better than just limiting swap from
213OS point of view. 214OS point of view.
214 215
215* What happens when a cgroup hits memory.memsw.limit_in_bytes 216* What happens when a cgroup hits memory.memsw.limit_in_bytes
216When a cgroup his memory.memsw.limit_in_bytes, it's useless to do swap-out 217When a cgroup hits memory.memsw.limit_in_bytes, it's useless to do swap-out
217in this cgroup. Then, swap-out will not be done by cgroup routine and file 218in this cgroup. Then, swap-out will not be done by cgroup routine and file
218caches are dropped. But as mentioned above, global LRU can do swapout memory 219caches are dropped. But as mentioned above, global LRU can do swapout memory
219from it for sanity of the system's memory management state. You can't forbid 220from it for sanity of the system's memory management state. You can't forbid
@@ -263,16 +264,17 @@ b. Enable CONFIG_RESOURCE_COUNTERS
263c. Enable CONFIG_CGROUP_MEM_RES_CTLR 264c. Enable CONFIG_CGROUP_MEM_RES_CTLR
264d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension) 265d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension)
265 266
2661. Prepare the cgroups 2671. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
267# mkdir -p /cgroups 268# mount -t tmpfs none /sys/fs/cgroup
268# mount -t cgroup none /cgroups -o memory 269# mkdir /sys/fs/cgroup/memory
270# mount -t cgroup none /sys/fs/cgroup/memory -o memory
269 271
2702. Make the new group and move bash into it 2722. Make the new group and move bash into it
271# mkdir /cgroups/0 273# mkdir /sys/fs/cgroup/memory/0
272# echo $$ > /cgroups/0/tasks 274# echo $$ > /sys/fs/cgroup/memory/0/tasks
273 275
274Since now we're in the 0 cgroup, we can alter the memory limit: 276Since now we're in the 0 cgroup, we can alter the memory limit:
275# echo 4M > /cgroups/0/memory.limit_in_bytes 277# echo 4M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
276 278
277NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo, 279NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
278mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.) 280mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.)
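The suffix handling described in the NOTE above can be sketched as follows. This is a minimal illustration, not the kernel's parser: limit_to_bytes() is a hypothetical helper name, and it only covers the suffixes listed in the NOTE.

```python
# Hypothetical helper (not a kernel interface): interprets the k/K/m/M/g/G
# suffixes as binary multiples, as the NOTE describes.
_SUFFIX_SHIFT = {"k": 10, "m": 20, "g": 30}


def limit_to_bytes(text):
    """Convert a string such as "4M" into a byte count."""
    text = text.strip()
    if not text:
        raise ValueError("empty limit string")
    suffix = text[-1].lower()
    if suffix in _SUFFIX_SHIFT:
        # "4M" -> 4 << 20
        return int(text[:-1]) << _SUFFIX_SHIFT[suffix]
    return int(text)


print(limit_to_bytes("4M"))
```

With this interpretation, "4M" comes out as 4 * 1024 * 1024 bytes, which is the value the walkthrough reads back from memory.limit_in_bytes.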
@@ -280,11 +282,11 @@ mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.)
280NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited). 282NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited).
281NOTE: We cannot set limits on the root cgroup any more. 283NOTE: We cannot set limits on the root cgroup any more.
282 284
283# cat /cgroups/0/memory.limit_in_bytes 285# cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes
2844194304 2864194304
285 287
286We can check the usage: 288We can check the usage:
287# cat /cgroups/0/memory.usage_in_bytes 289# cat /sys/fs/cgroup/memory/0/memory.usage_in_bytes
2881216512 2901216512
289 291
290A successful write to this file does not guarantee a successful set of 292A successful write to this file does not guarantee a successful set of
@@ -378,7 +380,7 @@ will be charged as a new owner of it.
378 380
3795.2 stat file 3815.2 stat file
380 382
381memory.stat file includes following statistics 3835.2.1 memory.stat file includes following statistics
382 384
383# per-memory cgroup local status 385# per-memory cgroup local status
384cache - # of bytes of page cache memory. 386cache - # of bytes of page cache memory.
@@ -436,6 +438,89 @@ Note:
436 file_mapped is accounted only when the memory cgroup is owner of page 438 file_mapped is accounted only when the memory cgroup is owner of page
437 cache.) 439 cache.)
438 440
4415.2.2 memory.vmscan_stat
442
443memory.vmscan_stat includes statistics for memory scanning, freeing and
444reclaiming. The statistics are accumulated since memory cgroup creation
445and can be reset to 0 by writing 0 to the file:
446
447 # echo 0 > ../memory.vmscan_stat
448
449This file contains the following statistics.
450
451[param]_[file_or_anon]_pages_by_[reason]_[under_hierarchy]
452elapsed_ns_by_[reason]_[under_hierarchy]
453
454For example,
455
456 scanned_file_pages_by_limit indicates the number of file pages
457 scanned by vmscan because the memory cgroup's limit was hit.
458
459Currently, 3 parameters are supported:
460
461 scanned - the number of pages scanned by vmscan
462 rotated - the number of pages activated at vmscan
463 freed - the number of pages freed by vmscan
464
465If "rotated" is high relative to scanned/freed, the memcg is likely busy.
466
467Currently, 2 reasons are supported:
468
469 limit - the memory cgroup's limit
470 system - global memory pressure + softlimit
471 (global memory pressure not under softlimit is not handled now)
472
473When under_hierarchy is appended to the name, the number is the total
474for the memcg itself and all of its children.
475
476elapsed_ns is the elapsed time in nanoseconds. It may include sleep time
477and does not indicate CPU usage, so please treat it only as a measure of
478latency.
479
480Here is an example.
481
482# cat /cgroup/memory/A/memory.vmscan_stat
483scanned_pages_by_limit 9471864
484scanned_anon_pages_by_limit 6640629
485scanned_file_pages_by_limit 2831235
486rotated_pages_by_limit 4243974
487rotated_anon_pages_by_limit 3971968
488rotated_file_pages_by_limit 272006
489freed_pages_by_limit 2318492
490freed_anon_pages_by_limit 962052
491freed_file_pages_by_limit 1356440
492elapsed_ns_by_limit 351386416101
493scanned_pages_by_system 0
494scanned_anon_pages_by_system 0
495scanned_file_pages_by_system 0
496rotated_pages_by_system 0
497rotated_anon_pages_by_system 0
498rotated_file_pages_by_system 0
499freed_pages_by_system 0
500freed_anon_pages_by_system 0
501freed_file_pages_by_system 0
502elapsed_ns_by_system 0
503scanned_pages_by_limit_under_hierarchy 9471864
504scanned_anon_pages_by_limit_under_hierarchy 6640629
505scanned_file_pages_by_limit_under_hierarchy 2831235
506rotated_pages_by_limit_under_hierarchy 4243974
507rotated_anon_pages_by_limit_under_hierarchy 3971968
508rotated_file_pages_by_limit_under_hierarchy 272006
509freed_pages_by_limit_under_hierarchy 2318492
510freed_anon_pages_by_limit_under_hierarchy 962052
511freed_file_pages_by_limit_under_hierarchy 1356440
512elapsed_ns_by_limit_under_hierarchy 351386416101
513scanned_pages_by_system_under_hierarchy 0
514scanned_anon_pages_by_system_under_hierarchy 0
515scanned_file_pages_by_system_under_hierarchy 0
516rotated_pages_by_system_under_hierarchy 0
517rotated_anon_pages_by_system_under_hierarchy 0
518rotated_file_pages_by_system_under_hierarchy 0
519freed_pages_by_system_under_hierarchy 0
520freed_anon_pages_by_system_under_hierarchy 0
521freed_file_pages_by_system_under_hierarchy 0
522elapsed_ns_by_system_under_hierarchy 0
523
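As a rough illustration of how the counters above might be consumed, the following sketch parses "name value" lines, splits a counter name into the components of the naming scheme, and computes the rotated/scanned ratio mentioned earlier. The function names and the regular expression are assumptions made for this example; the sample text is a subset of the listing above.

```python
import re

# Illustrative regex for the naming scheme; not taken from the kernel source.
_KEY_RE = re.compile(
    r"^(?:(?P<param>scanned|rotated|freed)(?:_(?P<kind>anon|file))?_pages"
    r"|(?P<elapsed>elapsed_ns))"
    r"_by_(?P<reason>limit|system)(?P<hier>_under_hierarchy)?$"
)


def parse_vmscan_stat(text):
    """Return {name: int} for every "name value" line in text."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, value = line.split()
        stats[name] = int(value)
    return stats


def describe_key(name):
    """Split a counter name into its parts, or return None if it is unknown."""
    m = _KEY_RE.match(name)
    if m is None:
        return None
    return {
        "param": m.group("param") or "elapsed_ns",
        "kind": m.group("kind"),  # None means anon and file combined
        "reason": m.group("reason"),
        "under_hierarchy": m.group("hier") is not None,
    }


def rotated_ratio(stats, reason="limit"):
    """Fraction of scanned pages that were rotated (0.0 if nothing scanned)."""
    scanned = stats[f"scanned_pages_by_{reason}"]
    rotated = stats[f"rotated_pages_by_{reason}"]
    return rotated / scanned if scanned else 0.0


# Subset of the example output shown above.
SAMPLE = """\
scanned_pages_by_limit 9471864
scanned_anon_pages_by_limit 6640629
scanned_file_pages_by_limit 2831235
rotated_pages_by_limit 4243974
elapsed_ns_by_limit 351386416101
scanned_pages_by_system 0
rotated_pages_by_system 0
"""

stats = parse_vmscan_stat(SAMPLE)
print(describe_key("scanned_file_pages_by_limit_under_hierarchy"))
print(round(rotated_ratio(stats), 3))
print(stats["elapsed_ns_by_limit"] / 1e9, "seconds of reclaim latency")
```

In the sample, the anon and file counters add up to the combined counter, so the combined value is sufficient when only an overall ratio is needed.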
4395.3 swappiness 5245.3 swappiness
440 525
441Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only. 526Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only.
@@ -464,6 +549,24 @@ value for efficient access. (Of course, when necessary, it's synchronized.)
464If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP) 549If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP)
465value in memory.stat(see 5.2). 550value in memory.stat(see 5.2).
466 551
5525.6 numa_stat
553
554This is similar to numa_maps but operates on a per-memcg basis. This is
555useful for providing visibility into the numa locality information within
556a memcg since the pages are allowed to be allocated from any physical
557node. One of the use cases is evaluating application performance by
558combining this information with the application's cpu allocation.
559
560We export "total", "file", "anon" and "unevictable" pages per-node for
561each memcg. The output format of memory.numa_stat is:
562
563total=<total pages> N0=<node 0 pages> N1=<node 1 pages> ...
564file=<total file pages> N0=<node 0 pages> N1=<node 1 pages> ...
565anon=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
566unevictable=<total unevictable pages> N0=<node 0 pages> N1=<node 1 pages> ...
567
568And we have total = file + anon + unevictable.
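The output format above can be read with a few lines of code. The sketch below is an assumption-laden illustration: parse_numa_stat() is a hypothetical helper, and the sample numbers are made up for a two-node machine rather than taken from a real system.

```python
def parse_numa_stat(text):
    """Return {category: (total, {node_id: pages})} from memory.numa_stat text."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # First field is "<category>=<total pages>".
        name, total = fields[0].split("=", 1)
        per_node = {}
        for field in fields[1:]:
            node, pages = field.split("=", 1)
            per_node[int(node[1:])] = int(pages)  # "N0" -> node id 0
        result[name] = (int(total), per_node)
    return result


# Made-up numbers, not real kernel output.
SAMPLE = """\
total=1000 N0=600 N1=400
file=700 N0=500 N1=200
anon=250 N0=80 N1=170
unevictable=50 N0=20 N1=30
"""

numa = parse_numa_stat(SAMPLE)
parts = sum(numa[name][0] for name in ("file", "anon", "unevictable"))
print(numa["total"][0] == parts)
print(numa["anon"][1])
```

The first print checks the relation total = file + anon + unevictable on the sample; the second shows the per-node breakdown for anon pages.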
569
4676. Hierarchy support 5706. Hierarchy support
468 571
469The memory controller supports a deep hierarchy and hierarchical accounting. 572The memory controller supports a deep hierarchy and hierarchical accounting.
@@ -471,13 +574,13 @@ The hierarchy is created by creating the appropriate cgroups in the
471cgroup filesystem. Consider for example, the following cgroup filesystem 574cgroup filesystem. Consider for example, the following cgroup filesystem
472hierarchy 575hierarchy
473 576
474        root                 577        root
475      / |   \                578      / |   \
476     /  |    \               579     /  |    \
477    a   b     c              580    a   b     c
478              | \            581              | \
479              |  \           582              |  \
480              d   e          583              d   e
481 584
482In the diagram above, with hierarchical accounting enabled, all memory 585In the diagram above, with hierarchical accounting enabled, all memory
483usage of e, is accounted to its ancestors up until the root (i.e, c and root), 586usage of e, is accounted to its ancestors up until the root (i.e, c and root),