Diffstat (limited to 'Documentation/cgroups/memory.txt')
-rw-r--r-- Documentation/cgroups/memory.txt 66
1 file changed, 65 insertions, 1 deletion
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index a25cb3fafeba..8b8c28b9864c 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -71,6 +71,11 @@ Brief summary of control files.
  memory.oom_control # set/show oom controls.
  memory.numa_stat # show the number of memory usage per numa node
 
+ memory.kmem.limit_in_bytes # set/show hard limit for kernel memory
+ memory.kmem.usage_in_bytes # show current kernel memory allocation
+ memory.kmem.failcnt # show the number of kernel memory usage hits limits
+ memory.kmem.max_usage_in_bytes # show max kernel memory usage recorded
+
  memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory
  memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation
  memory.kmem.tcp.failcnt # show the number of tcp buf memory usage hits limits
@@ -268,20 +273,73 @@ the amount of kernel memory used by the system. Kernel memory is fundamentally
 different than user memory, since it can't be swapped out, which makes it
 possible to DoS the system by consuming too much of this precious resource.
 
+Kernel memory won't be accounted at all until limit on a group is set. This
+allows for existing setups to continue working without disruption. The limit
+cannot be set if the cgroup have children, or if there are already tasks in the
+cgroup. Attempting to set the limit under those conditions will return -EBUSY.
+When use_hierarchy == 1 and a group is accounted, its children will
+automatically be accounted regardless of their limit value.
+
+After a group is first limited, it will be kept being accounted until it
+is removed. The memory limitation itself, can of course be removed by writing
+-1 to memory.kmem.limit_in_bytes. In this case, kmem will be accounted, but not
+limited.
+
 Kernel memory limits are not imposed for the root cgroup. Usage for the root
-cgroup may or may not be accounted.
+cgroup may or may not be accounted. The memory used is accumulated into
+memory.kmem.usage_in_bytes, or in a separate counter when it makes sense.
+(currently only for tcp).
+The main "kmem" counter is fed into the main counter, so kmem charges will
+also be visible from the user counter.
 
 Currently no soft limit is implemented for kernel memory. It is future work
 to trigger slab reclaim when those limits are reached.
 
 2.7.1 Current Kernel Memory resources accounted
 
+* stack pages: every process consumes some stack pages. By accounting into
+kernel memory, we prevent new processes from being created when the kernel
+memory usage is too high.
+
+* slab pages: pages allocated by the SLAB or SLUB allocator are tracked. A copy
+of each kmem_cache is created everytime the cache is touched by the first time
+from inside the memcg. The creation is done lazily, so some objects can still be
+skipped while the cache is being created. All objects in a slab page should
+belong to the same memcg. This only fails to hold when a task is migrated to a
+different memcg during the page allocation by the cache.
+
 * sockets memory pressure: some sockets protocols have memory pressure
 thresholds. The Memory Controller allows them to be controlled individually
 per cgroup, instead of globally.
 
 * tcp memory pressure: sockets memory pressure for the tcp protocol.
 
+2.7.3 Common use cases
+
+Because the "kmem" counter is fed to the main user counter, kernel memory can
+never be limited completely independently of user memory. Say "U" is the user
+limit, and "K" the kernel limit. There are three possible ways limits can be
+set:
+
+    U != 0, K = unlimited:
+    This is the standard memcg limitation mechanism already present before kmem
+    accounting. Kernel memory is completely ignored.
+
+    U != 0, K < U:
+    Kernel memory is a subset of the user memory. This setup is useful in
+    deployments where the total amount of memory per-cgroup is overcommited.
+    Overcommiting kernel memory limits is definitely not recommended, since the
+    box can still run out of non-reclaimable memory.
+    In this case, the admin could set up K so that the sum of all groups is
+    never greater than the total memory, and freely set U at the cost of his
+    QoS.
+
+    U != 0, K >= U:
+    Since kmem charges will also be fed to the user counter and reclaim will be
+    triggered for the cgroup for both kinds of memory. This setup gives the
+    admin a unified view of memory, and it is also useful for people who just
+    want to track kernel memory usage.
+
 3. User Interface
 
 0. Configuration
@@ -290,6 +348,7 @@ a. Enable CONFIG_CGROUPS
 b. Enable CONFIG_RESOURCE_COUNTERS
 c. Enable CONFIG_MEMCG
 d. Enable CONFIG_MEMCG_SWAP (to use swap extension)
+d. Enable CONFIG_MEMCG_KMEM (to use kmem extension)
 
 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
 # mount -t tmpfs none /sys/fs/cgroup
@@ -406,6 +465,11 @@ About use_hierarchy, see Section 6.
   Because rmdir() moves all pages to parent, some out-of-use page caches can be
   moved to the parent. If you want to avoid that, force_empty will be useful.
 
+  Also, note that when memory.kmem.limit_in_bytes is set the charges due to
+  kernel pages will still be seen. This is not considered a failure and the
+  write will still return success. In this case, it is expected that
+  memory.kmem.usage_in_bytes == memory.usage_in_bytes.
+
   About use_hierarchy, see Section 6.
 
 5.2 stat file