author     Linus Torvalds <torvalds@linux-foundation.org>  2012-12-18 18:08:12 -0500
committer  Linus Torvalds <torvalds@linux-foundation.org>  2012-12-18 18:08:12 -0500
commit     673ab8783b596cda5b616b317b1a1b47480c66fd
tree       d3fc9bb4279720c53d0dc69c2a34c40635cf05f3 /Documentation
parent     d7b96ca5d08a8f2f836feb2b3b3bd721d2837a8e
parent     3cf23841b4b76eb94d3f8d0fb3627690e4431413

Merge branch 'akpm' (more patches from Andrew)

Merge patches from Andrew Morton:
 "Most of the rest of MM, plus a few dribs and drabs.

  I still have quite a few irritating patches left around: ones with
  dubious testing results, lack of review, ones which should have gone
  via maintainer trees but the maintainers are slack, etc.

  I need to be more activist in getting these things wrapped up outside
  the merge window, but they're such a PITA."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (48 commits)
  mm/vmscan.c: avoid possible deadlock caused by too_many_isolated()
  vmscan: comment too_many_isolated()
  mm/kmemleak.c: remove obsolete simple_strtoul
  mm/memory_hotplug.c: improve comments
  mm/hugetlb: create hugetlb cgroup file in hugetlb_init
  mm/mprotect.c: coding-style cleanups
  Documentation: ABI: /sys/devices/system/node/
  slub: drop mutex before deleting sysfs entry
  memcg: add comments clarifying aspects of cache attribute propagation
  kmem: add slab-specific documentation about the kmem controller
  slub: slub-specific propagation changes
  slab: propagate tunable values
  memcg: aggregate memcg cache values in slabinfo
  memcg/sl[au]b: shrink dead caches
  memcg/sl[au]b: track all the memcg children of a kmem_cache
  memcg: destroy memcg caches
  sl[au]b: allocate objects from memcg cache
  sl[au]b: always get the cache from its page in kmem_cache_free()
  memcg: skip memcg kmem allocations in specified code regions
  memcg: infrastructure to match an allocation to the right cache
  ...

Diffstat (limited to 'Documentation')
-rw-r--r--  Documentation/ABI/stable/sysfs-devices-node  96
-rw-r--r--  Documentation/cgroups/memory.txt             66
-rw-r--r--  Documentation/cgroups/resource_counter.txt    7
3 files changed, 164 insertions, 5 deletions

diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
index 49b82cad7003..ce259c13c36a 100644
--- a/Documentation/ABI/stable/sysfs-devices-node
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -1,7 +1,101 @@
+What: /sys/devices/system/node/possible
+Date: October 2002
+Contact: Linux Memory Management list <linux-mm@kvack.org>
+Description:
+        Nodes that could be possibly become online at some point.
+
+What: /sys/devices/system/node/online
+Date: October 2002
+Contact: Linux Memory Management list <linux-mm@kvack.org>
+Description:
+        Nodes that are online.
+
+What: /sys/devices/system/node/has_normal_memory
+Date: October 2002
+Contact: Linux Memory Management list <linux-mm@kvack.org>
+Description:
+        Nodes that have regular memory.
+
+What: /sys/devices/system/node/has_cpu
+Date: October 2002
+Contact: Linux Memory Management list <linux-mm@kvack.org>
+Description:
+        Nodes that have one or more CPUs.
+
+What: /sys/devices/system/node/has_high_memory
+Date: October 2002
+Contact: Linux Memory Management list <linux-mm@kvack.org>
+Description:
+        Nodes that have regular or high memory.
+        Depends on CONFIG_HIGHMEM.
+
 What: /sys/devices/system/node/nodeX
 Date: October 2002
 Contact: Linux Memory Management list <linux-mm@kvack.org>
 Description:
         When CONFIG_NUMA is enabled, this is a directory containing
         information on node X such as what CPUs are local to the
-        node.
+        node. Each file is detailed next.
+
+What: /sys/devices/system/node/nodeX/cpumap
+Date: October 2002
+Contact: Linux Memory Management list <linux-mm@kvack.org>
+Description:
+        The node's cpumap.
+
+What: /sys/devices/system/node/nodeX/cpulist
+Date: October 2002
+Contact: Linux Memory Management list <linux-mm@kvack.org>
+Description:
+        The CPUs associated to the node.
+
+What: /sys/devices/system/node/nodeX/meminfo
+Date: October 2002
+Contact: Linux Memory Management list <linux-mm@kvack.org>
+Description:
+        Provides information about the node's distribution and memory
+        utilization. Similar to /proc/meminfo, see Documentation/filesystems/proc.txt
+
+What: /sys/devices/system/node/nodeX/numastat
+Date: October 2002
+Contact: Linux Memory Management list <linux-mm@kvack.org>
+Description:
+        The node's hit/miss statistics, in units of pages.
+        See Documentation/numastat.txt
+
+What: /sys/devices/system/node/nodeX/distance
+Date: October 2002
+Contact: Linux Memory Management list <linux-mm@kvack.org>
+Description:
+        Distance between the node and all the other nodes
+        in the system.
+
+What: /sys/devices/system/node/nodeX/vmstat
+Date: October 2002
+Contact: Linux Memory Management list <linux-mm@kvack.org>
+Description:
+        The node's zoned virtual memory statistics.
+        This is a superset of numastat.
+
+What: /sys/devices/system/node/nodeX/compact
+Date: February 2010
+Contact: Mel Gorman <mel@csn.ul.ie>
+Description:
+        When this file is written to, all memory within that node
+        will be compacted. When it completes, memory will be freed
+        into blocks which have as many contiguous pages as possible
+
+What: /sys/devices/system/node/nodeX/scan_unevictable_pages
+Date: October 2008
+Contact: Lee Schermerhorn <lee.schermerhorn@hp.com>
+Description:
+        When set, it triggers scanning the node's unevictable lists
+        and move any pages that have become evictable onto the respective
+        zone's inactive list. See mm/vmscan.c
+
+What: /sys/devices/system/node/nodeX/hugepages/hugepages-<size>/
+Date: December 2009
+Contact: Lee Schermerhorn <lee.schermerhorn@hp.com>
+Description:
+        The node's huge page size control/query attributes.
+        See Documentation/vm/hugetlbpage.txt
\ No newline at end of file
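
The node masks documented above (possible, online, has_normal_memory, has_cpu, has_high_memory) are small text files. The C sketch below reads one and expands it into an array of node IDs. It assumes the files use the kernel's comma-separated range-list notation (for example "0-1,3"), which the ABI entries themselves do not spell out; read_first_line() and parse_node_list() are hypothetical helpers written for this illustration, not kernel or libc APIs.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read the first line of a small text file, such as
 * /sys/devices/system/node/online, into buf.
 * Returns 0 on success or a negative errno value.
 */
static int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -errno;
    if (fgets(buf, (int)len, f) == NULL) {
        int err = ferror(f) ? EIO : ENODATA;

        fclose(f);
        return -err;
    }
    fclose(f);
    return 0;
}

/*
 * Hypothetical helper: parse a range list such as "0-1,3\n" into ids[].
 * Assumption: the node masks use this comma-separated range-list format.
 * Returns the number of ids stored, or -1 if the input is malformed or
 * holds more than max ids. An empty mask ("\n") yields 0.
 */
static int parse_node_list(const char *s, int *ids, size_t max)
{
    size_t n = 0;

    assert(s != NULL && ids != NULL);
    while (*s != '\0' && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10);
        long hi = lo;

        if (end == s || lo < 0)
            return -1;              /* expected a non-negative id */
        s = end;
        if (*s == '-') {            /* "lo-hi" range */
            hi = strtol(s + 1, &end, 10);
            if (end == s + 1 || hi < lo)
                return -1;
            s = end;
        }
        if (hi > INT_MAX)
            return -1;              /* id would not fit in an int */
        for (long id = lo; id <= hi; id++) {
            if (n == max)
                return -1;          /* ids[] is too small */
            ids[n++] = (int)id;
        }
        if (*s == ',')
            s++;                    /* move on to the next element */
        else if (*s != '\0' && *s != '\n')
            return -1;              /* unexpected character */
    }
    return (int)n;
}
```

A caller would read /sys/devices/system/node/online with read_first_line() and hand the buffer to parse_node_list(); a negative return from either means the file was missing or did not match the assumed format.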
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index a25cb3fafeba..8b8c28b9864c 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -71,6 +71,11 @@ Brief summary of control files.
  memory.oom_control              # set/show oom controls.
  memory.numa_stat                # show the number of memory usage per numa node
 
+ memory.kmem.limit_in_bytes      # set/show hard limit for kernel memory
+ memory.kmem.usage_in_bytes      # show current kernel memory allocation
+ memory.kmem.failcnt             # show the number of kernel memory usage hits limits
+ memory.kmem.max_usage_in_bytes  # show max kernel memory usage recorded
+
  memory.kmem.tcp.limit_in_bytes  # set/show hard limit for tcp buf memory
  memory.kmem.tcp.usage_in_bytes  # show current tcp buf memory allocation
  memory.kmem.tcp.failcnt         # show the number of tcp buf memory usage hits limits
@@ -268,20 +273,73 @@ the amount of kernel memory used by the system. Kernel memory is fundamentally
 different than user memory, since it can't be swapped out, which makes it
 possible to DoS the system by consuming too much of this precious resource.
 
+Kernel memory won't be accounted at all until limit on a group is set. This
+allows for existing setups to continue working without disruption. The limit
+cannot be set if the cgroup have children, or if there are already tasks in the
+cgroup. Attempting to set the limit under those conditions will return -EBUSY.
+When use_hierarchy == 1 and a group is accounted, its children will
+automatically be accounted regardless of their limit value.
+
+After a group is first limited, it will be kept being accounted until it
+is removed. The memory limitation itself, can of course be removed by writing
+-1 to memory.kmem.limit_in_bytes. In this case, kmem will be accounted, but not
+limited.
+
 Kernel memory limits are not imposed for the root cgroup. Usage for the root
-cgroup may or may not be accounted.
+cgroup may or may not be accounted. The memory used is accumulated into
+memory.kmem.usage_in_bytes, or in a separate counter when it makes sense.
+(currently only for tcp).
+The main "kmem" counter is fed into the main counter, so kmem charges will
+also be visible from the user counter.
 
 Currently no soft limit is implemented for kernel memory. It is future work
 to trigger slab reclaim when those limits are reached.
 
 2.7.1 Current Kernel Memory resources accounted
 
+* stack pages: every process consumes some stack pages. By accounting into
+kernel memory, we prevent new processes from being created when the kernel
+memory usage is too high.
+
+* slab pages: pages allocated by the SLAB or SLUB allocator are tracked. A copy
+of each kmem_cache is created everytime the cache is touched by the first time
+from inside the memcg. The creation is done lazily, so some objects can still be
+skipped while the cache is being created. All objects in a slab page should
+belong to the same memcg. This only fails to hold when a task is migrated to a
+different memcg during the page allocation by the cache.
+
 * sockets memory pressure: some sockets protocols have memory pressure
 thresholds. The Memory Controller allows them to be controlled individually
 per cgroup, instead of globally.
 
 * tcp memory pressure: sockets memory pressure for the tcp protocol.
 
+2.7.3 Common use cases
+
+Because the "kmem" counter is fed to the main user counter, kernel memory can
+never be limited completely independently of user memory. Say "U" is the user
+limit, and "K" the kernel limit. There are three possible ways limits can be
+set:
+
+    U != 0, K = unlimited:
+    This is the standard memcg limitation mechanism already present before kmem
+    accounting. Kernel memory is completely ignored.
+
+    U != 0, K < U:
+    Kernel memory is a subset of the user memory. This setup is useful in
+    deployments where the total amount of memory per-cgroup is overcommited.
+    Overcommiting kernel memory limits is definitely not recommended, since the
+    box can still run out of non-reclaimable memory.
+    In this case, the admin could set up K so that the sum of all groups is
+    never greater than the total memory, and freely set U at the cost of his
+    QoS.
+
+    U != 0, K >= U:
+    Since kmem charges will also be fed to the user counter and reclaim will be
+    triggered for the cgroup for both kinds of memory. This setup gives the
+    admin a unified view of memory, and it is also useful for people who just
+    want to track kernel memory usage.
+
 3. User Interface
 
 0. Configuration
@@ -290,6 +348,7 @@ a. Enable CONFIG_CGROUPS
 b. Enable CONFIG_RESOURCE_COUNTERS
 c. Enable CONFIG_MEMCG
 d. Enable CONFIG_MEMCG_SWAP (to use swap extension)
+d. Enable CONFIG_MEMCG_KMEM (to use kmem extension)
 
 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
 # mount -t tmpfs none /sys/fs/cgroup
@@ -406,6 +465,11 @@ About use_hierarchy, see Section 6.
   Because rmdir() moves all pages to parent, some out-of-use page caches can be
   moved to the parent. If you want to avoid that, force_empty will be useful.
 
+  Also, note that when memory.kmem.limit_in_bytes is set the charges due to
+  kernel pages will still be seen. This is not considered a failure and the
+  write will still return success. In this case, it is expected that
+  memory.kmem.usage_in_bytes == memory.usage_in_bytes.
+
   About use_hierarchy, see Section 6.
 
 5.2 stat file
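
Section 2.7.3 of the memory.txt changes above describes three ways to combine the user limit U and the kernel limit K. The sketch below encodes that table, plus the guideline for the K < U case that the K values of all groups should not add up to more than total memory. KMEM_UNLIMITED, enum kmem_setup, classify_limits() and kmem_overcommitted() are all hypothetical; the real control files have their own representation of "unlimited" that this sketch does not model, and it does not distinguish a group that was never limited from one whose limit was later removed by writing -1 (which, per the text above, stays accounted).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel for "no kmem limit"; not the kernel's encoding. */
#define KMEM_UNLIMITED UINT64_MAX

/* The three U/K combinations described in section 2.7.3. */
enum kmem_setup {
    KMEM_IGNORED,   /* U != 0, K = unlimited: kernel memory is ignored */
    KMEM_SUBSET,    /* U != 0, K < U: kmem is a subset of user memory */
    KMEM_UNIFIED,   /* U != 0, K >= U: one unified view of memory */
};

static enum kmem_setup classify_limits(uint64_t user_limit, uint64_t kmem_limit)
{
    assert(user_limit != 0);    /* every case in 2.7.3 assumes U != 0 */

    /* Test "unlimited" first: the sentinel would also satisfy K >= U. */
    if (kmem_limit == KMEM_UNLIMITED)
        return KMEM_IGNORED;
    return kmem_limit < user_limit ? KMEM_SUBSET : KMEM_UNIFIED;
}

/*
 * Return 1 if the kmem limits of n groups add up to more than total bytes
 * of memory (kernel memory is overcommitted), else 0.
 */
static int kmem_overcommitted(const uint64_t *kmem_limits, size_t n,
                              uint64_t total)
{
    uint64_t sum = 0;   /* invariant: sum <= total */

    for (size_t i = 0; i < n; i++) {
        /* Equivalent to sum + k > total, but cannot overflow. */
        if (kmem_limits[i] > total - sum)
            return 1;
        sum += kmem_limits[i];
    }
    return 0;
}
```

Running kmem_overcommitted() over every group's K before writing the limits is one way an admin script could follow the K < U advice; the U values can then be set freely, as the text notes, at the cost of QoS.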
diff --git a/Documentation/cgroups/resource_counter.txt b/Documentation/cgroups/resource_counter.txt
index 0c4a344e78fa..c4d99ed0b418 100644
--- a/Documentation/cgroups/resource_counter.txt
+++ b/Documentation/cgroups/resource_counter.txt
@@ -83,16 +83,17 @@ to work with it.
         res_counter->lock internally (it must be called with res_counter->lock
         held). The force parameter indicates whether we can bypass the limit.
 
- e. void res_counter_uncharge[_locked]
+ e. u64 res_counter_uncharge[_locked]
         (struct res_counter *rc, unsigned long val)
 
         When a resource is released (freed) it should be de-accounted
         from the resource counter it was accounted to. This is called
-        "uncharging".
+        "uncharging". The return value of this function indicate the amount
+        of charges still present in the counter.
 
         The _locked routines imply that the res_counter->lock is taken.
 
- f. void res_counter_uncharge_until
+ f. u64 res_counter_uncharge_until
         (struct res_counter *rc, struct res_counter *top,
         unsinged long val)
 
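
With this change res_counter_uncharge[_locked]() and res_counter_uncharge_until() return a u64 holding the charges still present in the counter, instead of void. The user-space sketch below mirrors those semantics on a hypothetical struct counter, using a pthread mutex in place of res_counter->lock. It assumes, beyond what the text states, that the _until variant walks from rc up its parents, stops before top, and returns the usage left in rc itself; clamping on underflow is likewise this sketch's own choice.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for struct res_counter. */
struct counter {
    pthread_mutex_t lock;       /* plays the role of res_counter->lock */
    uint64_t usage;             /* currently charged amount */
    struct counter *parent;     /* NULL for the root */
};

static void counter_init(struct counter *c, struct counter *parent)
{
    assert(c != NULL);
    pthread_mutex_init(&c->lock, NULL);
    c->usage = 0;
    c->parent = parent;
}

/* Caller must hold c->lock. Returns the usage left in c. */
static uint64_t counter_uncharge_locked(struct counter *c, unsigned long val)
{
    if (c->usage < val)         /* sketch's choice: clamp, don't underflow */
        val = c->usage;
    c->usage -= val;
    return c->usage;
}

/*
 * Uncharge val from c and each ancestor, stopping before top (NULL walks
 * all the way to the root). Returns the usage left in c itself.
 */
static uint64_t counter_uncharge_until(struct counter *c, struct counter *top,
                                       unsigned long val)
{
    uint64_t ret = 0;

    /* The NULL check also stops the walk if top is not an ancestor. */
    for (struct counter *it = c; it != NULL && it != top; it = it->parent) {
        pthread_mutex_lock(&it->lock);
        uint64_t left = counter_uncharge_locked(it, val);
        if (it == c)
            ret = left;
        pthread_mutex_unlock(&it->lock);
    }
    return ret;
}

/* Uncharge along the whole hierarchy up to the root. */
static uint64_t counter_uncharge(struct counter *c, unsigned long val)
{
    return counter_uncharge_until(c, NULL, val);
}
```

The return value lets a caller act on the remaining charge without taking the lock a second time, for instance testing counter_uncharge(&c, size) == 0 to notice that a counter has drained. That is one plausible motivation; the documentation does not say why the signature changed.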