Diffstat (limited to 'Documentation/cgroups/memory.txt')
| -rw-r--r-- | Documentation/cgroups/memory.txt | 66 |
1 files changed, 65 insertions, 1 deletions
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index a25cb3fafeba..8b8c28b9864c 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
| @@ -71,6 +71,11 @@ Brief summary of control files. | |||
| 71 | memory.oom_control # set/show oom controls. | 71 | memory.oom_control # set/show oom controls. |
| 72 | memory.numa_stat # show the number of memory usage per numa node | 72 | memory.numa_stat # show the number of memory usage per numa node |
| 73 | 73 | ||
| 74 | memory.kmem.limit_in_bytes # set/show hard limit for kernel memory | ||
| 75 | memory.kmem.usage_in_bytes # show current kernel memory allocation | ||
| 76 | memory.kmem.failcnt # show the number of kernel memory usage hits limits | ||
| 77 | memory.kmem.max_usage_in_bytes # show max kernel memory usage recorded | ||
| 78 | |||
| 74 | memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory | 79 | memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory |
| 75 | memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation | 80 | memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation |
| 76 | memory.kmem.tcp.failcnt # show the number of tcp buf memory usage hits limits | 81 | memory.kmem.tcp.failcnt # show the number of tcp buf memory usage hits limits |
| @@ -268,20 +273,73 @@ the amount of kernel memory used by the system. Kernel memory is fundamentally | |||
| 268 | different than user memory, since it can't be swapped out, which makes it | 273 | different than user memory, since it can't be swapped out, which makes it |
| 269 | possible to DoS the system by consuming too much of this precious resource. | 274 | possible to DoS the system by consuming too much of this precious resource. |
| 270 | 275 | ||
| 276 | Kernel memory won't be accounted at all until a limit on a group is set. This | ||
| 277 | allows for existing setups to continue working without disruption. The limit | ||
| 278 | cannot be set if the cgroup has children, or if there are already tasks in the | ||
| 279 | cgroup. Attempting to set the limit under those conditions will return -EBUSY. | ||
| 280 | When use_hierarchy == 1 and a group is accounted, its children will | ||
| 281 | automatically be accounted regardless of their limit value. | ||
| 282 | |||
| 283 | After a group is first limited, it will remain accounted until it | ||
| 284 | is removed. The memory limitation itself can, of course, be removed by writing | ||
| 285 | -1 to memory.kmem.limit_in_bytes. In this case, kmem will be accounted, but not | ||
| 286 | limited. | ||
| 287 | |||
| 271 | Kernel memory limits are not imposed for the root cgroup. Usage for the root | 288 | Kernel memory limits are not imposed for the root cgroup. Usage for the root |
| 272 | cgroup may or may not be accounted. | 289 | cgroup may or may not be accounted. The memory used is accumulated into |
| 290 | memory.kmem.usage_in_bytes, or in a separate counter when it makes sense | ||
| 291 | (currently only for tcp). | ||
| 292 | The main "kmem" counter is fed into the main counter, so kmem charges will | ||
| 293 | also be visible from the user counter. | ||
| 273 | 294 | ||
| 274 | Currently no soft limit is implemented for kernel memory. It is future work | 295 | Currently no soft limit is implemented for kernel memory. It is future work |
| 275 | to trigger slab reclaim when those limits are reached. | 296 | to trigger slab reclaim when those limits are reached. |
| 276 | 297 | ||
| 277 | 2.7.1 Current Kernel Memory resources accounted | 298 | 2.7.1 Current Kernel Memory resources accounted |
| 278 | 299 | ||
| 300 | * stack pages: every process consumes some stack pages. By accounting into | ||
| 301 | kernel memory, we prevent new processes from being created when the kernel | ||
| 302 | memory usage is too high. | ||
| 303 | |||
| 304 | * slab pages: pages allocated by the SLAB or SLUB allocator are tracked. A copy | ||
| 305 | of each kmem_cache is created the first time the cache is touched | ||
| 306 | from inside the memcg. The creation is done lazily, so some objects can still be | ||
| 307 | skipped while the cache is being created. All objects in a slab page should | ||
| 308 | belong to the same memcg. This only fails to hold when a task is migrated to a | ||
| 309 | different memcg during the page allocation by the cache. | ||
| 310 | |||
| 279 | * sockets memory pressure: some sockets protocols have memory pressure | 311 | * sockets memory pressure: some sockets protocols have memory pressure |
| 280 | thresholds. The Memory Controller allows them to be controlled individually | 312 | thresholds. The Memory Controller allows them to be controlled individually |
| 281 | per cgroup, instead of globally. | 313 | per cgroup, instead of globally. |
| 282 | 314 | ||
| 283 | * tcp memory pressure: sockets memory pressure for the tcp protocol. | 315 | * tcp memory pressure: sockets memory pressure for the tcp protocol. |
| 284 | 316 | ||
| 317 | 2.7.2 Common use cases | ||
| 318 | |||
| 319 | Because the "kmem" counter is fed to the main user counter, kernel memory can | ||
| 320 | never be limited completely independently of user memory. Say "U" is the user | ||
| 321 | limit, and "K" the kernel limit. There are three possible ways limits can be | ||
| 322 | set: | ||
| 323 | |||
| 324 | U != 0, K = unlimited: | ||
| 325 | This is the standard memcg limitation mechanism already present before kmem | ||
| 326 | accounting. Kernel memory is completely ignored. | ||
| 327 | |||
| 328 | U != 0, K < U: | ||
| 329 | Kernel memory is a subset of the user memory. This setup is useful in | ||
| 330 | deployments where the total amount of memory per-cgroup is overcommitted. | ||
| 331 | Overcommitting kernel memory limits is definitely not recommended, since the | ||
| 332 | box can still run out of non-reclaimable memory. | ||
| 333 | In this case, the admin could set up K so that the sum of all groups is | ||
| 334 | never greater than the total memory, and freely set U at the cost of his | ||
| 335 | QoS. | ||
| 336 | |||
| 337 | U != 0, K >= U: | ||
| 338 | Since kmem charges will also be fed to the user counter, reclaim will be | ||
| 339 | triggered for the cgroup for both kinds of memory. This setup gives the | ||
| 340 | admin a unified view of memory, and it is also useful for people who just | ||
| 341 | want to track kernel memory usage. | ||
| 342 | |||
| 285 | 3. User Interface | 343 | 3. User Interface |
| 286 | 344 | ||
| 287 | 0. Configuration | 345 | 0. Configuration |
| @@ -290,6 +348,7 @@ a. Enable CONFIG_CGROUPS | |||
| 290 | b. Enable CONFIG_RESOURCE_COUNTERS | 348 | b. Enable CONFIG_RESOURCE_COUNTERS |
| 291 | c. Enable CONFIG_MEMCG | 349 | c. Enable CONFIG_MEMCG |
| 292 | d. Enable CONFIG_MEMCG_SWAP (to use swap extension) | 350 | d. Enable CONFIG_MEMCG_SWAP (to use swap extension) |
| 351 | e. Enable CONFIG_MEMCG_KMEM (to use kmem extension) | ||
| 293 | 352 | ||
| 294 | 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?) | 353 | 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?) |
| 295 | # mount -t tmpfs none /sys/fs/cgroup | 354 | # mount -t tmpfs none /sys/fs/cgroup |
| @@ -406,6 +465,11 @@ About use_hierarchy, see Section 6. | |||
| 406 | Because rmdir() moves all pages to parent, some out-of-use page caches can be | 465 | Because rmdir() moves all pages to parent, some out-of-use page caches can be |
| 407 | moved to the parent. If you want to avoid that, force_empty will be useful. | 466 | moved to the parent. If you want to avoid that, force_empty will be useful. |
| 408 | 467 | ||
| 468 | Also, note that when memory.kmem.limit_in_bytes is set, the charges due to | ||
| 469 | kernel pages will still be seen. This is not considered a failure and the | ||
| 470 | write will still return success. In this case, it is expected that | ||
| 471 | memory.kmem.usage_in_bytes == memory.usage_in_bytes. | ||
| 472 | |||
| 409 | About use_hierarchy, see Section 6. | 473 | About use_hierarchy, see Section 6. |
| 410 | 474 | ||
| 411 | 5.2 stat file | 475 | 5.2 stat file |
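
Reviewer note: the three limit configurations described in the "Common use cases" text added by this patch can be summarized with a small model. This is only an illustrative sketch, not kernel code. The names `classify_limits`, `kmem_overcommitted`, and `UNLIMITED` are hypothetical, and infinity stands in for the "unlimited" state that the patch says is reached by writing -1 to memory.kmem.limit_in_bytes.

```python
import math

# Hypothetical stand-in for "no kmem limit" (the state after writing -1
# to memory.kmem.limit_in_bytes, per the patch text).
UNLIMITED = math.inf


def classify_limits(user_limit, kmem_limit=UNLIMITED):
    """Name the setup for a user limit U and kernel limit K.

    Mirrors the three cases in the patch:
      K unlimited -> only the user limit applies
      K < U       -> kernel memory is a subset of user memory
      K >= U      -> unified view; reclaim covers both kinds of memory
    """
    if kmem_limit == UNLIMITED:
        return "user-only"
    if kmem_limit < user_limit:
        return "kmem-subset"
    return "unified"


def kmem_overcommitted(kmem_limits, total_memory):
    """Return True if the kernel limits of all groups sum past total memory.

    The patch recommends against this for the K < U setup, since kernel
    memory is not reclaimable.
    """
    return sum(kmem_limits) > total_memory


if __name__ == "__main__":
    gib = 1 << 30
    print(classify_limits(4 * gib))           # user-only
    print(classify_limits(4 * gib, 1 * gib))  # kmem-subset
    print(classify_limits(4 * gib, 4 * gib))  # unified
    print(kmem_overcommitted([1 * gib, 2 * gib], 8 * gib))  # False
```

The K >= U branch deliberately includes equality, matching the "K >= U" heading in the patch. The overcommit check only reflects the admin guidance in the text and says nothing about how the kernel enforces limits.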
