-rw-r--r-- | Documentation/cgroups/memory.txt | 59 |
1 files changed, 58 insertions, 1 deletions
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index a25cb3fafeba..5b5b63143778 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -71,6 +71,11 @@ Brief summary of control files. | |||
71 | memory.oom_control # set/show oom controls. | 71 | memory.oom_control # set/show oom controls. |
72 | memory.numa_stat # show the number of memory usage per numa node | 72 | memory.numa_stat # show the number of memory usage per numa node |
73 | 73 | ||
74 | memory.kmem.limit_in_bytes # set/show hard limit for kernel memory | ||
75 | memory.kmem.usage_in_bytes # show current kernel memory allocation | ||
76 | memory.kmem.failcnt # show the number of kernel memory usage hits limits | ||
77 | memory.kmem.max_usage_in_bytes # show max kernel memory usage recorded | ||
78 | |||
74 | memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory | 79 | memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory |
75 | memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation | 80 | memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation |
76 | memory.kmem.tcp.failcnt # show the number of tcp buf memory usage hits limits | 81 | memory.kmem.tcp.failcnt # show the number of tcp buf memory usage hits limits |
@@ -268,20 +273,66 @@ the amount of kernel memory used by the system. Kernel memory is fundamentally | |||
268 | different than user memory, since it can't be swapped out, which makes it | 273 | different than user memory, since it can't be swapped out, which makes it |
269 | possible to DoS the system by consuming too much of this precious resource. | 274 | possible to DoS the system by consuming too much of this precious resource. |
270 | 275 | ||
276 | Kernel memory won't be accounted at all until a limit on a group is set. This | ||
277 | allows for existing setups to continue working without disruption. The limit | ||
278 | cannot be set if the cgroup has children, or if there are already tasks in the | ||
279 | cgroup. Attempting to set the limit under those conditions will return -EBUSY. | ||
280 | When use_hierarchy == 1 and a group is accounted, its children will | ||
281 | automatically be accounted regardless of their limit value. | ||
282 | |||
283 | After a group is first limited, it remains accounted until it is removed. | ||
284 | The limit itself can of course be removed by writing -1 to | ||
285 | memory.kmem.limit_in_bytes. In this case, kmem will be accounted, but not | ||
286 | limited. | ||
287 | |||
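The limit-setting rules described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the patch: the helper name set_kmem_limit and the cgroup_dir argument are hypothetical, and the only behavior assumed from the text is that a write to memory.kmem.limit_in_bytes may fail with EBUSY and that writing -1 removes the limit while accounting continues.

```python
import errno
import os


def set_kmem_limit(cgroup_dir, limit_bytes):
    """Write a kernel memory limit into a memcg directory (hypothetical helper).

    limit_bytes == -1 removes the limit; per the text above, the group
    stays accounted afterwards. Returns True on success and False if the
    kernel refuses the write with EBUSY (group has tasks or children).
    """
    path = os.path.join(cgroup_dir, "memory.kmem.limit_in_bytes")
    try:
        # The write is flushed when the file is closed, so the 'with'
        # block sits inside the try to catch errors raised at that point.
        with open(path, "w") as f:
            f.write(str(limit_bytes))
    except OSError as e:
        if e.errno == errno.EBUSY:
            return False
        raise
    return True
```

On a real system, cgroup_dir would be the directory of a freshly created group with no tasks or children, and the caller would need sufficient privileges. A False return is the signal to set the limit earlier in the group's lifetime rather than to retry.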
271 | Kernel memory limits are not imposed for the root cgroup. Usage for the root | 288 | Kernel memory limits are not imposed for the root cgroup. Usage for the root |
272 | cgroup may or may not be accounted. | 289 | cgroup may or may not be accounted. The memory used is accumulated into |
290 | memory.kmem.usage_in_bytes, or in a separate counter when it makes sense | ||
291 | (currently only for tcp). | ||
292 | The main "kmem" counter is fed into the main counter, so kmem charges will | ||
293 | also be visible from the user counter. | ||
273 | 294 | ||
274 | Currently no soft limit is implemented for kernel memory. It is future work | 295 | Currently no soft limit is implemented for kernel memory. It is future work |
275 | to trigger slab reclaim when those limits are reached. | 296 | to trigger slab reclaim when those limits are reached. |
276 | 297 | ||
277 | 2.7.1 Current Kernel Memory resources accounted | 298 | 2.7.1 Current Kernel Memory resources accounted |
278 | 299 | ||
300 | * stack pages: every process consumes some stack pages. By accounting into | ||
301 | kernel memory, we prevent new processes from being created when the kernel | ||
302 | memory usage is too high. | ||
303 | |||
279 | * sockets memory pressure: some sockets protocols have memory pressure | 304 | * sockets memory pressure: some sockets protocols have memory pressure |
280 | thresholds. The Memory Controller allows them to be controlled individually | 305 | thresholds. The Memory Controller allows them to be controlled individually |
281 | per cgroup, instead of globally. | 306 | per cgroup, instead of globally. |
282 | 307 | ||
283 | * tcp memory pressure: sockets memory pressure for the tcp protocol. | 308 | * tcp memory pressure: sockets memory pressure for the tcp protocol. |
284 | 309 | ||
310 | 2.7.2 Common use cases | ||
311 | |||
312 | Because the "kmem" counter is fed to the main user counter, kernel memory can | ||
313 | never be limited completely independently of user memory. Say "U" is the user | ||
314 | limit, and "K" the kernel limit. There are three possible ways limits can be | ||
315 | set: | ||
316 | |||
317 | U != 0, K = unlimited: | ||
318 | This is the standard memcg limitation mechanism already present before kmem | ||
319 | accounting. Kernel memory is completely ignored. | ||
320 | |||
321 | U != 0, K < U: | ||
322 | Kernel memory is a subset of the user memory. This setup is useful in | ||
323 | deployments where the total amount of memory per-cgroup is overcommitted. | ||
324 | Overcommitting kernel memory limits is definitely not recommended, since the | ||
325 | box can still run out of non-reclaimable memory. | ||
326 | In this case, the admin could set up K so that the sum of all groups is | ||
327 | never greater than the total memory, and freely set U at the cost of his | ||
328 | QoS. | ||
329 | |||
330 | U != 0, K >= U: | ||
331 | Since kmem charges will also be fed to the user counter, reclaim will be | ||
332 | triggered for the cgroup for both kinds of memory. This setup gives the | ||
333 | admin a unified view of memory, and it is also useful for people who just | ||
334 | want to track kernel memory usage. | ||
335 | |||
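The three U/K configurations above can be expressed as a small classifier, together with the overcommit check suggested for the K < U case. This is only a sketch: the names UNLIMITED, classify_limits and kmem_overcommitted are hypothetical, the returned labels are made up for illustration, and U is assumed to be a finite, non-zero limit as in the text.

```python
UNLIMITED = -1  # assumed sentinel, mirroring the -1 written to remove a limit


def classify_limits(user_limit, kmem_limit):
    """Map a (U, K) pair onto the three cases described above."""
    if kmem_limit == UNLIMITED:
        return "user-only"      # U != 0, K = unlimited
    if kmem_limit < user_limit:
        return "kmem-subset"    # U != 0, K < U
    return "unified"            # U != 0, K >= U


def kmem_overcommitted(kmem_limits, total_memory):
    """True if the per-group K values could exceed total memory."""
    if any(k == UNLIMITED for k in kmem_limits):
        return True             # an unlimited group bounds nothing
    return sum(kmem_limits) > total_memory
```

For example, classify_limits(1 << 30, 256 << 20) falls into the "kmem-subset" case, and an admin following the K < U advice would want kmem_overcommitted() to be False across all groups before freely choosing U.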
285 | 3. User Interface | 336 | 3. User Interface |
286 | 337 | ||
287 | 0. Configuration | 338 | 0. Configuration |
@@ -290,6 +341,7 @@ a. Enable CONFIG_CGROUPS | |||
290 | b. Enable CONFIG_RESOURCE_COUNTERS | 341 | b. Enable CONFIG_RESOURCE_COUNTERS |
291 | c. Enable CONFIG_MEMCG | 342 | c. Enable CONFIG_MEMCG |
292 | d. Enable CONFIG_MEMCG_SWAP (to use swap extension) | 343 | d. Enable CONFIG_MEMCG_SWAP (to use swap extension) |
344 | e. Enable CONFIG_MEMCG_KMEM (to use kmem extension) | ||
293 | 345 | ||
294 | 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?) | 346 | 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?) |
295 | # mount -t tmpfs none /sys/fs/cgroup | 347 | # mount -t tmpfs none /sys/fs/cgroup |
@@ -406,6 +458,11 @@ About use_hierarchy, see Section 6. | |||
406 | Because rmdir() moves all pages to parent, some out-of-use page caches can be | 458 | Because rmdir() moves all pages to parent, some out-of-use page caches can be |
407 | moved to the parent. If you want to avoid that, force_empty will be useful. | 459 | moved to the parent. If you want to avoid that, force_empty will be useful. |
408 | 460 | ||
461 | Also, note that when memory.kmem.limit_in_bytes is set, the charges due to | ||
462 | kernel pages will still be seen. This is not considered a failure and the | ||
463 | write will still return success. In this case, it is expected that | ||
464 | memory.kmem.usage_in_bytes == memory.usage_in_bytes. | ||
465 | |||
409 | About use_hierarchy, see Section 6. | 466 | About use_hierarchy, see Section 6. |
410 | 467 | ||
411 | 5.2 stat file | 468 | 5.2 stat file |