author | Linus Torvalds <torvalds@linux-foundation.org> | 2012-12-18 18:08:12 -0500 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2012-12-18 18:08:12 -0500 |
commit | 673ab8783b596cda5b616b317b1a1b47480c66fd (patch) | |
tree | d3fc9bb4279720c53d0dc69c2a34c40635cf05f3 /Documentation | |
parent | d7b96ca5d08a8f2f836feb2b3b3bd721d2837a8e (diff) | |
parent | 3cf23841b4b76eb94d3f8d0fb3627690e4431413 (diff) |
Merge branch 'akpm' (more patches from Andrew)
Merge patches from Andrew Morton:
"Most of the rest of MM, plus a few dribs and drabs.
I still have quite a few irritating patches left around: ones with
dubious testing results, lack of review, ones which should have gone
via maintainer trees but the maintainers are slack, etc.
I need to be more activist in getting these things wrapped up outside
the merge window, but they're such a PITA."
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (48 commits)
mm/vmscan.c: avoid possible deadlock caused by too_many_isolated()
vmscan: comment too_many_isolated()
mm/kmemleak.c: remove obsolete simple_strtoul
mm/memory_hotplug.c: improve comments
mm/hugetlb: create hugetlb cgroup file in hugetlb_init
mm/mprotect.c: coding-style cleanups
Documentation: ABI: /sys/devices/system/node/
slub: drop mutex before deleting sysfs entry
memcg: add comments clarifying aspects of cache attribute propagation
kmem: add slab-specific documentation about the kmem controller
slub: slub-specific propagation changes
slab: propagate tunable values
memcg: aggregate memcg cache values in slabinfo
memcg/sl[au]b: shrink dead caches
memcg/sl[au]b: track all the memcg children of a kmem_cache
memcg: destroy memcg caches
sl[au]b: allocate objects from memcg cache
sl[au]b: always get the cache from its page in kmem_cache_free()
memcg: skip memcg kmem allocations in specified code regions
memcg: infrastructure to match an allocation to the right cache
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/ABI/stable/sysfs-devices-node | 96 | ||||
-rw-r--r-- | Documentation/cgroups/memory.txt | 66 | ||||
-rw-r--r-- | Documentation/cgroups/resource_counter.txt | 7 |
3 files changed, 164 insertions, 5 deletions
diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
index 49b82cad7003..ce259c13c36a 100644
--- a/Documentation/ABI/stable/sysfs-devices-node
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -1,7 +1,101 @@ | |||
1 | What: /sys/devices/system/node/possible | ||
2 | Date: October 2002 | ||
3 | Contact: Linux Memory Management list <linux-mm@kvack.org> | ||
4 | Description: | ||
5 | Nodes that could possibly become online at some point. | ||
6 | |||
7 | What: /sys/devices/system/node/online | ||
8 | Date: October 2002 | ||
9 | Contact: Linux Memory Management list <linux-mm@kvack.org> | ||
10 | Description: | ||
11 | Nodes that are online. | ||
12 | |||
13 | What: /sys/devices/system/node/has_normal_memory | ||
14 | Date: October 2002 | ||
15 | Contact: Linux Memory Management list <linux-mm@kvack.org> | ||
16 | Description: | ||
17 | Nodes that have regular memory. | ||
18 | |||
19 | What: /sys/devices/system/node/has_cpu | ||
20 | Date: October 2002 | ||
21 | Contact: Linux Memory Management list <linux-mm@kvack.org> | ||
22 | Description: | ||
23 | Nodes that have one or more CPUs. | ||
24 | |||
25 | What: /sys/devices/system/node/has_high_memory | ||
26 | Date: October 2002 | ||
27 | Contact: Linux Memory Management list <linux-mm@kvack.org> | ||
28 | Description: | ||
29 | Nodes that have regular or high memory. | ||
30 | Depends on CONFIG_HIGHMEM. | ||
31 | |||
1 | What: /sys/devices/system/node/nodeX | 32 | What: /sys/devices/system/node/nodeX |
2 | Date: October 2002 | 33 | Date: October 2002 |
3 | Contact: Linux Memory Management list <linux-mm@kvack.org> | 34 | Contact: Linux Memory Management list <linux-mm@kvack.org> |
4 | Description: | 35 | Description: |
5 | When CONFIG_NUMA is enabled, this is a directory containing | 36 | When CONFIG_NUMA is enabled, this is a directory containing |
6 | information on node X such as what CPUs are local to the | 37 | information on node X such as what CPUs are local to the |
7 | node. | 38 | node. Each file is detailed next. |
39 | |||
40 | What: /sys/devices/system/node/nodeX/cpumap | ||
41 | Date: October 2002 | ||
42 | Contact: Linux Memory Management list <linux-mm@kvack.org> | ||
43 | Description: | ||
44 | The node's cpumap. | ||
45 | |||
46 | What: /sys/devices/system/node/nodeX/cpulist | ||
47 | Date: October 2002 | ||
48 | Contact: Linux Memory Management list <linux-mm@kvack.org> | ||
49 | Description: | ||
50 | The CPUs associated with the node. | ||
51 | |||
52 | What: /sys/devices/system/node/nodeX/meminfo | ||
53 | Date: October 2002 | ||
54 | Contact: Linux Memory Management list <linux-mm@kvack.org> | ||
55 | Description: | ||
56 | Provides information about the node's distribution and memory | ||
57 | utilization. Similar to /proc/meminfo, see Documentation/filesystems/proc.txt | ||
58 | |||
59 | What: /sys/devices/system/node/nodeX/numastat | ||
60 | Date: October 2002 | ||
61 | Contact: Linux Memory Management list <linux-mm@kvack.org> | ||
62 | Description: | ||
63 | The node's hit/miss statistics, in units of pages. | ||
64 | See Documentation/numastat.txt | ||
65 | |||
66 | What: /sys/devices/system/node/nodeX/distance | ||
67 | Date: October 2002 | ||
68 | Contact: Linux Memory Management list <linux-mm@kvack.org> | ||
69 | Description: | ||
70 | Distance between the node and all the other nodes | ||
71 | in the system. | ||
72 | |||
73 | What: /sys/devices/system/node/nodeX/vmstat | ||
74 | Date: October 2002 | ||
75 | Contact: Linux Memory Management list <linux-mm@kvack.org> | ||
76 | Description: | ||
77 | The node's zoned virtual memory statistics. | ||
78 | This is a superset of numastat. | ||
79 | |||
80 | What: /sys/devices/system/node/nodeX/compact | ||
81 | Date: February 2010 | ||
82 | Contact: Mel Gorman <mel@csn.ul.ie> | ||
83 | Description: | ||
84 | When this file is written to, all memory within that node | ||
85 | will be compacted. When it completes, memory will be freed | ||
86 | into blocks which have as many contiguous pages as possible. | ||
87 | |||
88 | What: /sys/devices/system/node/nodeX/scan_unevictable_pages | ||
89 | Date: October 2008 | ||
90 | Contact: Lee Schermerhorn <lee.schermerhorn@hp.com> | ||
91 | Description: | ||
92 | When set, it triggers scanning the node's unevictable lists | ||
93 | and moves any pages that have become evictable onto the respective | ||
94 | zone's inactive list. See mm/vmscan.c | ||
95 | |||
96 | What: /sys/devices/system/node/nodeX/hugepages/hugepages-<size>/ | ||
97 | Date: December 2009 | ||
98 | Contact: Lee Schermerhorn <lee.schermerhorn@hp.com> | ||
99 | Description: | ||
100 | The node's huge page size control/query attributes. | ||
101 | See Documentation/vm/hugetlbpage.txt \ No newline at end of file | ||
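The per-node files described above are plain text, so they can be read from userspace. A minimal sketch in Python of reading nodeX/cpulist follows. It assumes the file holds a comma-separated range list such as "0-3,8", which the ABI text above does not spell out; the helper names parse_cpulist and node_cpus are hypothetical and not part of any kernel interface.

```python
def parse_cpulist(text):
    """Expand a range list such as "0-3,8" into [0, 1, 2, 3, 8].

    Assumption: cpulist uses a comma-separated list of CPU numbers
    and inclusive "first-last" ranges.
    """
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file is treated as "no CPUs on this node"
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def node_cpus(node, root="/sys/devices/system/node"):
    """Read nodeX/cpulist for one node (needs a CONFIG_NUMA kernel)."""
    with open(f"{root}/node{node}/cpulist") as f:
        return parse_cpulist(f.read())
```

On a NUMA machine, node_cpus(0) would return the CPUs local to node 0; the parsing step can be exercised anywhere by calling parse_cpulist on a string.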
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index a25cb3fafeba..8b8c28b9864c 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -71,6 +71,11 @@ Brief summary of control files. | |||
71 | memory.oom_control # set/show oom controls. | 71 | memory.oom_control # set/show oom controls. |
72 | memory.numa_stat # show the number of memory usage per numa node | 72 | memory.numa_stat # show the number of memory usage per numa node |
73 | 73 | ||
74 | memory.kmem.limit_in_bytes # set/show hard limit for kernel memory | ||
75 | memory.kmem.usage_in_bytes # show current kernel memory allocation | ||
76 | memory.kmem.failcnt # show the number of kernel memory usage hits limits | ||
77 | memory.kmem.max_usage_in_bytes # show max kernel memory usage recorded | ||
78 | |||
74 | memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory | 79 | memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory |
75 | memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation | 80 | memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation |
76 | memory.kmem.tcp.failcnt # show the number of tcp buf memory usage hits limits | 81 | memory.kmem.tcp.failcnt # show the number of tcp buf memory usage hits limits |
@@ -268,20 +273,73 @@ the amount of kernel memory used by the system. Kernel memory is fundamentally | |||
268 | different than user memory, since it can't be swapped out, which makes it | 273 | different than user memory, since it can't be swapped out, which makes it |
269 | possible to DoS the system by consuming too much of this precious resource. | 274 | possible to DoS the system by consuming too much of this precious resource. |
270 | 275 | ||
276 | Kernel memory won't be accounted at all until a limit on a group is set. This | ||
277 | allows for existing setups to continue working without disruption. The limit | ||
278 | cannot be set if the cgroup has children, or if there are already tasks in the | ||
279 | cgroup. Attempting to set the limit under those conditions will return -EBUSY. | ||
280 | When use_hierarchy == 1 and a group is accounted, its children will | ||
281 | automatically be accounted regardless of their limit value. | ||
282 | |||
283 | After a group is first limited, it will remain accounted until it | ||
284 | is removed. The memory limitation itself can, of course, be removed by writing | ||
285 | -1 to memory.kmem.limit_in_bytes. In this case, kmem will be accounted, but not | ||
286 | limited. | ||
287 | |||
271 | Kernel memory limits are not imposed for the root cgroup. Usage for the root | 288 | Kernel memory limits are not imposed for the root cgroup. Usage for the root |
272 | cgroup may or may not be accounted. | 289 | cgroup may or may not be accounted. The memory used is accumulated into |
290 | memory.kmem.usage_in_bytes, or in a separate counter when it makes sense | ||
291 | (currently only for tcp). | ||
292 | The main "kmem" counter is fed into the main counter, so kmem charges will | ||
293 | also be visible from the user counter. | ||
273 | 294 | ||
274 | Currently no soft limit is implemented for kernel memory. It is future work | 295 | Currently no soft limit is implemented for kernel memory. It is future work |
275 | to trigger slab reclaim when those limits are reached. | 296 | to trigger slab reclaim when those limits are reached. |
276 | 297 | ||
277 | 2.7.1 Current Kernel Memory resources accounted | 298 | 2.7.1 Current Kernel Memory resources accounted |
278 | 299 | ||
300 | * stack pages: every process consumes some stack pages. By accounting into | ||
301 | kernel memory, we prevent new processes from being created when the kernel | ||
302 | memory usage is too high. | ||
303 | |||
304 | * slab pages: pages allocated by the SLAB or SLUB allocator are tracked. A copy | ||
305 | of each kmem_cache is created the first time the cache is touched | ||
306 | from inside the memcg. The creation is done lazily, so some objects can still be | ||
307 | skipped while the cache is being created. All objects in a slab page should | ||
308 | belong to the same memcg. This only fails to hold when a task is migrated to a | ||
309 | different memcg during the page allocation by the cache. | ||
310 | |||
279 | * sockets memory pressure: some sockets protocols have memory pressure | 311 | * sockets memory pressure: some sockets protocols have memory pressure |
280 | thresholds. The Memory Controller allows them to be controlled individually | 312 | thresholds. The Memory Controller allows them to be controlled individually |
281 | per cgroup, instead of globally. | 313 | per cgroup, instead of globally. |
282 | 314 | ||
283 | * tcp memory pressure: sockets memory pressure for the tcp protocol. | 315 | * tcp memory pressure: sockets memory pressure for the tcp protocol. |
284 | 316 | ||
317 | 2.7.3 Common use cases | ||
318 | |||
319 | Because the "kmem" counter is fed to the main user counter, kernel memory can | ||
320 | never be limited completely independently of user memory. Say "U" is the user | ||
321 | limit, and "K" the kernel limit. There are three possible ways limits can be | ||
322 | set: | ||
323 | |||
324 | U != 0, K = unlimited: | ||
325 | This is the standard memcg limitation mechanism already present before kmem | ||
326 | accounting. Kernel memory is completely ignored. | ||
327 | |||
328 | U != 0, K < U: | ||
329 | Kernel memory is a subset of the user memory. This setup is useful in | ||
330 | deployments where the total amount of memory per-cgroup is overcommitted. | ||
331 | Overcommitting kernel memory limits is definitely not recommended, since the | ||
332 | box can still run out of non-reclaimable memory. | ||
333 | In this case, the admin could set up K so that the sum of all groups is | ||
334 | never greater than the total memory, and freely set U at the cost of his | ||
335 | QoS. | ||
336 | |||
337 | U != 0, K >= U: | ||
338 | Since kmem charges will also be fed to the user counter, reclaim will be | ||
339 | triggered for the cgroup for both kinds of memory. This setup gives the | ||
340 | admin a unified view of memory, and it is also useful for people who just | ||
341 | want to track kernel memory usage. | ||
342 | |||
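The three setups above follow from one rule: a kmem charge is counted against the kernel limit and is then also fed into the user counter. A toy Python model of that rule is sketched below. The class ToyMemcg and its methods are hypothetical, and the model simply refuses a charge that exceeds a limit, whereas the real controller would attempt reclaim first.

```python
class ToyMemcg:
    """Toy model of one cgroup with a user limit "U" and a kernel limit "K"."""

    def __init__(self, user_limit, kmem_limit=None):
        self.user_limit = user_limit  # "U" in the text
        self.kmem_limit = kmem_limit  # "K"; None means no limit was ever set
        self.usage = 0                # main (user) counter
        self.kmem_usage = 0           # kmem counter

    def charge_user(self, nbytes):
        if self.usage + nbytes > self.user_limit:
            return False              # user limit hit (no reclaim modeled)
        self.usage += nbytes
        return True

    def charge_kmem(self, nbytes):
        if self.kmem_limit is None:
            return True               # K = unlimited: kmem is not accounted
        if self.kmem_usage + nbytes > self.kmem_limit:
            return False              # kernel limit hit
        if not self.charge_user(nbytes):
            return False              # kernel charge pushed past the user limit
        self.kmem_usage += nbytes
        return True
```

With K < U, the kernel limit is the first to refuse kernel charges; with K >= U, the user limit is always reached first, so K only serves to switch accounting on.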
285 | 3. User Interface | 343 | 3. User Interface |
286 | 344 | ||
287 | 0. Configuration | 345 | 0. Configuration |
@@ -290,6 +348,7 @@ a. Enable CONFIG_CGROUPS | |||
290 | b. Enable CONFIG_RESOURCE_COUNTERS | 348 | b. Enable CONFIG_RESOURCE_COUNTERS |
291 | c. Enable CONFIG_MEMCG | 349 | c. Enable CONFIG_MEMCG |
292 | d. Enable CONFIG_MEMCG_SWAP (to use swap extension) | 350 | d. Enable CONFIG_MEMCG_SWAP (to use swap extension) |
351 | e. Enable CONFIG_MEMCG_KMEM (to use kmem extension) | ||
293 | 352 | ||
294 | 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?) | 353 | 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?) |
295 | # mount -t tmpfs none /sys/fs/cgroup | 354 | # mount -t tmpfs none /sys/fs/cgroup |
@@ -406,6 +465,11 @@ About use_hierarchy, see Section 6. | |||
406 | Because rmdir() moves all pages to parent, some out-of-use page caches can be | 465 | Because rmdir() moves all pages to parent, some out-of-use page caches can be |
407 | moved to the parent. If you want to avoid that, force_empty will be useful. | 466 | moved to the parent. If you want to avoid that, force_empty will be useful. |
408 | 467 | ||
468 | Also, note that when memory.kmem.limit_in_bytes is set, the charges due to | ||
469 | kernel pages will still be seen. This is not considered a failure and the | ||
470 | write will still return success. In this case, it is expected that | ||
471 | memory.kmem.usage_in_bytes == memory.usage_in_bytes. | ||
472 | |||
409 | About use_hierarchy, see Section 6. | 473 | About use_hierarchy, see Section 6. |
410 | 474 | ||
411 | 5.2 stat file | 475 | 5.2 stat file |
diff --git a/Documentation/cgroups/resource_counter.txt b/Documentation/cgroups/resource_counter.txt
index 0c4a344e78fa..c4d99ed0b418 100644
--- a/Documentation/cgroups/resource_counter.txt
+++ b/Documentation/cgroups/resource_counter.txt
@@ -83,16 +83,17 @@ to work with it. | |||
83 | res_counter->lock internally (it must be called with res_counter->lock | 83 | res_counter->lock internally (it must be called with res_counter->lock |
84 | held). The force parameter indicates whether we can bypass the limit. | 84 | held). The force parameter indicates whether we can bypass the limit. |
85 | 85 | ||
86 | e. void res_counter_uncharge[_locked] | 86 | e. u64 res_counter_uncharge[_locked] |
87 | (struct res_counter *rc, unsigned long val) | 87 | (struct res_counter *rc, unsigned long val) |
88 | 88 | ||
89 | When a resource is released (freed) it should be de-accounted | 89 | When a resource is released (freed) it should be de-accounted |
90 | from the resource counter it was accounted to. This is called | 90 | from the resource counter it was accounted to. This is called |
91 | "uncharging". | 91 | "uncharging". The return value of this function indicates the amount |
92 | of charges still present in the counter. | ||
92 | 93 | ||
93 | The _locked routines imply that the res_counter->lock is taken. | 94 | The _locked routines imply that the res_counter->lock is taken. |
94 | 95 | ||
95 | f. void res_counter_uncharge_until | 96 | f. u64 res_counter_uncharge_until |
96 | (struct res_counter *rc, struct res_counter *top, | 97 | (struct res_counter *rc, struct res_counter *top, |
97 | unsigned long val) | 98 | unsigned long val) |
98 | 99 | ||