-rw-r--r-- Documentation/cgroups/memory.txt 59
1 files changed, 58 insertions, 1 deletions
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index a25cb3fafeba..5b5b63143778 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -71,6 +71,11 @@ Brief summary of control files.
  memory.oom_control              # set/show oom controls.
  memory.numa_stat                # show the number of memory usage per numa node
 
+ memory.kmem.limit_in_bytes      # set/show hard limit for kernel memory
+ memory.kmem.usage_in_bytes      # show current kernel memory allocation
+ memory.kmem.failcnt             # show the number of kernel memory usage hits limits
+ memory.kmem.max_usage_in_bytes  # show max kernel memory usage recorded
+
  memory.kmem.tcp.limit_in_bytes  # set/show hard limit for tcp buf memory
  memory.kmem.tcp.usage_in_bytes  # show current tcp buf memory allocation
  memory.kmem.tcp.failcnt         # show the number of tcp buf memory usage hits limits
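The four memory.kmem.* files added in this hunk can be read like any other
memcg control file. The following is a minimal sketch, not part of the patch:
the helper name kmem_report and the directory argument are assumptions made
for illustration, and files that are missing (for example when the kmem
extension is not configured) are reported as "unavailable".

```shell
# Print the kmem control files of one memory cgroup directory.
# kmem_report is a hypothetical helper; $1 is the cgroup directory to inspect.
kmem_report() {
    dir=$1
    for f in limit_in_bytes usage_in_bytes failcnt max_usage_in_bytes; do
        path="$dir/memory.kmem.$f"
        if [ -r "$path" ]; then
            # Print the file name followed by its current value.
            printf '%s %s\n' "memory.kmem.$f" "$(cat "$path")"
        else
            # The file is absent or unreadable.
            printf '%s %s\n' "memory.kmem.$f" "unavailable"
        fi
    done
}
```

Calling kmem_report with the path of a mounted memory cgroup prints one line
per control file.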
@@ -268,20 +273,66 @@ the amount of kernel memory used by the system. Kernel memory is fundamentally
 different than user memory, since it can't be swapped out, which makes it
 possible to DoS the system by consuming too much of this precious resource.
 
+Kernel memory won't be accounted at all until a limit is set on a group. This
+allows existing setups to continue working without disruption. The limit
+cannot be set if the cgroup has children, or if there are already tasks in the
+cgroup. Attempting to set the limit under those conditions will return -EBUSY.
+When use_hierarchy == 1 and a group is accounted, its children will
+automatically be accounted regardless of their limit value.
+
+After a group is first limited, it remains accounted until it is removed.
+The limit itself can, of course, be removed by writing -1 to
+memory.kmem.limit_in_bytes. In this case, kmem will be accounted, but not
+limited.
+
 Kernel memory limits are not imposed for the root cgroup. Usage for the root
-cgroup may or may not be accounted.
+cgroup may or may not be accounted. The memory used is accumulated into
+memory.kmem.usage_in_bytes, or in a separate counter when it makes sense
+(currently only for tcp).
+The main "kmem" counter is fed into the main counter, so kmem charges will
+also be visible from the user counter.
 
 Currently no soft limit is implemented for kernel memory. It is future work
 to trigger slab reclaim when those limits are reached.
 
 2.7.1 Current Kernel Memory resources accounted
 
+* stack pages: every process consumes some stack pages. By accounting into
+kernel memory, we prevent new processes from being created when the kernel
+memory usage is too high.
+
 * sockets memory pressure: some sockets protocols have memory pressure
 thresholds. The Memory Controller allows them to be controlled individually
 per cgroup, instead of globally.
 
 * tcp memory pressure: sockets memory pressure for the tcp protocol.
 
+2.7.3 Common use cases
+
+Because the "kmem" counter is fed to the main user counter, kernel memory can
+never be limited completely independently of user memory. Say "U" is the user
+limit, and "K" the kernel limit. There are three possible ways limits can be
+set:
+
+    U != 0, K = unlimited:
+    This is the standard memcg limitation mechanism already present before kmem
+    accounting. Kernel memory is completely ignored.
+
+    U != 0, K < U:
+    Kernel memory is a subset of the user memory. This setup is useful in
+    deployments where the total amount of memory per-cgroup is overcommitted.
+    Overcommitting kernel memory limits is definitely not recommended, since the
+    box can still run out of non-reclaimable memory.
+    In this case, the admin could set up K so that the sum of all groups is
+    never greater than the total memory, and freely set U at the cost of
+    QoS.
+
+    U != 0, K >= U:
+    Since kmem charges will also be fed to the user counter, reclaim will be
+    triggered for the cgroup for both kinds of memory. This setup gives the
+    admin a unified view of memory, and it is also useful for people who just
+    want to track kernel memory usage.
+
 3. User Interface
 
 0. Configuration
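The three U/K setups listed under "Common use cases" in the hunk above can be
told apart mechanically. The following is a minimal sketch under stated
assumptions: classify_limits and its output labels are hypothetical names
chosen for illustration, not kernel interfaces. U and K are byte counts, and K
may also be given as "unlimited" or as -1, the value that removes the kmem
limit.

```shell
# Classify a (U, K) limit pair into one of the three documented setups.
# classify_limits and the labels it prints are illustrative only.
classify_limits() {
    u=$1
    k=$2
    if [ "$k" = "unlimited" ] || [ "$k" = "-1" ]; then
        echo "user-only"      # U != 0, K = unlimited
    elif [ "$k" -lt "$u" ]; then
        echo "kmem-subset"    # U != 0, K < U
    else
        echo "unified"        # U != 0, K >= U
    fi
}
```

For example, a 1 GiB user limit with a 256 MiB kernel limit falls into the
"K < U" setup, while equal limits fall into the "K >= U" setup.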
@@ -290,6 +341,7 @@ a. Enable CONFIG_CGROUPS
 b. Enable CONFIG_RESOURCE_COUNTERS
 c. Enable CONFIG_MEMCG
 d. Enable CONFIG_MEMCG_SWAP (to use swap extension)
+e. Enable CONFIG_MEMCG_KMEM (to use kmem extension)
 
 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
 # mount -t tmpfs none /sys/fs/cgroup
@@ -406,6 +458,11 @@ About use_hierarchy, see Section 6.
   Because rmdir() moves all pages to parent, some out-of-use page caches can be
   moved to the parent. If you want to avoid that, force_empty will be useful.
 
+  Also, note that when memory.kmem.limit_in_bytes is set, the charges due to
+  kernel pages will still be seen. This is not considered a failure and the
+  write will still return success. In this case, it is expected that
+  memory.kmem.usage_in_bytes == memory.usage_in_bytes.
+
   About use_hierarchy, see Section 6.
 
 5.2 stat file
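The force_empty note in the hunk above says that, with a kmem limit set, the
remaining charges are expected to be kernel pages only, so that
memory.kmem.usage_in_bytes == memory.usage_in_bytes. The following is a minimal
sketch of that comparison; the helper name only_kmem_left is an assumption made
for illustration, and it only reads the two counters from the directory passed
as its argument.

```shell
# Succeed (return 0) when all remaining usage in a cgroup is kernel memory,
# that is, memory.kmem.usage_in_bytes == memory.usage_in_bytes.
# only_kmem_left is a hypothetical helper; $1 is the cgroup directory.
only_kmem_left() {
    dir=$1
    # Return 2 if either counter cannot be read.
    kmem=$(cat "$dir/memory.kmem.usage_in_bytes") || return 2
    total=$(cat "$dir/memory.usage_in_bytes") || return 2
    [ "$kmem" -eq "$total" ]
}
```

A non-zero return after a force_empty write would indicate that some user
pages are still charged to the group, or that the counters could not be read.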