| -rw-r--r-- | Documentation/cgroup-v2.txt | 460 |
1 files changed, 239 insertions, 221 deletions
diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index e6101976e0f1..bde177103567 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
| @@ -1,7 +1,9 @@ | |||
| 1 | 1 | ================ | |
| 2 | Control Group v2 | 2 | Control Group v2 |
| 3 | ================ | ||
| 3 | 4 | ||
| 4 | October, 2015 Tejun Heo <tj@kernel.org> | 5 | :Date: October, 2015 |
| 6 | :Author: Tejun Heo <tj@kernel.org> | ||
| 5 | 7 | ||
| 6 | This is the authoritative documentation on the design, interface and | 8 | This is the authoritative documentation on the design, interface and |
| 7 | conventions of cgroup v2. It describes all userland-visible aspects | 9 | conventions of cgroup v2. It describes all userland-visible aspects |
| @@ -9,70 +11,72 @@ of cgroup including core and specific controller behaviors. All | |||
| 9 | future changes must be reflected in this document. Documentation for | 11 | future changes must be reflected in this document. Documentation for |
| 10 | v1 is available under Documentation/cgroup-v1/. | 12 | v1 is available under Documentation/cgroup-v1/. |
| 11 | 13 | ||
| 12 | CONTENTS | 14 | .. CONTENTS |
| 13 | 15 | ||
| 14 | 1. Introduction | 16 | 1. Introduction |
| 15 | 1-1. Terminology | 17 | 1-1. Terminology |
| 16 | 1-2. What is cgroup? | 18 | 1-2. What is cgroup? |
| 17 | 2. Basic Operations | 19 | 2. Basic Operations |
| 18 | 2-1. Mounting | 20 | 2-1. Mounting |
| 19 | 2-2. Organizing Processes | 21 | 2-2. Organizing Processes |
| 20 | 2-3. [Un]populated Notification | 22 | 2-3. [Un]populated Notification |
| 21 | 2-4. Controlling Controllers | 23 | 2-4. Controlling Controllers |
| 22 | 2-4-1. Enabling and Disabling | 24 | 2-4-1. Enabling and Disabling |
| 23 | 2-4-2. Top-down Constraint | 25 | 2-4-2. Top-down Constraint |
| 24 | 2-4-3. No Internal Process Constraint | 26 | 2-4-3. No Internal Process Constraint |
| 25 | 2-5. Delegation | 27 | 2-5. Delegation |
| 26 | 2-5-1. Model of Delegation | 28 | 2-5-1. Model of Delegation |
| 27 | 2-5-2. Delegation Containment | 29 | 2-5-2. Delegation Containment |
| 28 | 2-6. Guidelines | 30 | 2-6. Guidelines |
| 29 | 2-6-1. Organize Once and Control | 31 | 2-6-1. Organize Once and Control |
| 30 | 2-6-2. Avoid Name Collisions | 32 | 2-6-2. Avoid Name Collisions |
| 31 | 3. Resource Distribution Models | 33 | 3. Resource Distribution Models |
| 32 | 3-1. Weights | 34 | 3-1. Weights |
| 33 | 3-2. Limits | 35 | 3-2. Limits |
| 34 | 3-3. Protections | 36 | 3-3. Protections |
| 35 | 3-4. Allocations | 37 | 3-4. Allocations |
| 36 | 4. Interface Files | 38 | 4. Interface Files |
| 37 | 4-1. Format | 39 | 4-1. Format |
| 38 | 4-2. Conventions | 40 | 4-2. Conventions |
| 39 | 4-3. Core Interface Files | 41 | 4-3. Core Interface Files |
| 40 | 5. Controllers | 42 | 5. Controllers |
| 41 | 5-1. CPU | 43 | 5-1. CPU |
| 42 | 5-1-1. CPU Interface Files | 44 | 5-1-1. CPU Interface Files |
| 43 | 5-2. Memory | 45 | 5-2. Memory |
| 44 | 5-2-1. Memory Interface Files | 46 | 5-2-1. Memory Interface Files |
| 45 | 5-2-2. Usage Guidelines | 47 | 5-2-2. Usage Guidelines |
| 46 | 5-2-3. Memory Ownership | 48 | 5-2-3. Memory Ownership |
| 47 | 5-3. IO | 49 | 5-3. IO |
| 48 | 5-3-1. IO Interface Files | 50 | 5-3-1. IO Interface Files |
| 49 | 5-3-2. Writeback | 51 | 5-3-2. Writeback |
| 50 | 5-4. PID | 52 | 5-4. PID |
| 51 | 5-4-1. PID Interface Files | 53 | 5-4-1. PID Interface Files |
| 52 | 5-5. RDMA | 54 | 5-5. RDMA |
| 53 | 5-5-1. RDMA Interface Files | 55 | 5-5-1. RDMA Interface Files |
| 54 | 5-6. Misc | 56 | 5-6. Misc |
| 55 | 5-6-1. perf_event | 57 | 5-6-1. perf_event |
| 56 | 6. Namespace | 58 | 6. Namespace |
| 57 | 6-1. Basics | 59 | 6-1. Basics |
| 58 | 6-2. The Root and Views | 60 | 6-2. The Root and Views |
| 59 | 6-3. Migration and setns(2) | 61 | 6-3. Migration and setns(2) |
| 60 | 6-4. Interaction with Other Namespaces | 62 | 6-4. Interaction with Other Namespaces |
| 61 | P. Information on Kernel Programming | 63 | P. Information on Kernel Programming |
| 62 | P-1. Filesystem Support for Writeback | 64 | P-1. Filesystem Support for Writeback |
| 63 | D. Deprecated v1 Core Features | 65 | D. Deprecated v1 Core Features |
| 64 | R. Issues with v1 and Rationales for v2 | 66 | R. Issues with v1 and Rationales for v2 |
| 65 | R-1. Multiple Hierarchies | 67 | R-1. Multiple Hierarchies |
| 66 | R-2. Thread Granularity | 68 | R-2. Thread Granularity |
| 67 | R-3. Competition Between Inner Nodes and Threads | 69 | R-3. Competition Between Inner Nodes and Threads |
| 68 | R-4. Other Interface Issues | 70 | R-4. Other Interface Issues |
| 69 | R-5. Controller Issues and Remedies | 71 | R-5. Controller Issues and Remedies |
| 70 | R-5-1. Memory | 72 | R-5-1. Memory |
| 71 | 73 | ||
| 72 | 74 | ||
| 73 | 1. Introduction | 75 | Introduction |
| 74 | 76 | ============ | |
| 75 | 1-1. Terminology | 77 | |
| 78 | Terminology | ||
| 79 | ----------- | ||
| 76 | 80 | ||
| 77 | "cgroup" stands for "control group" and is never capitalized. The | 81 | "cgroup" stands for "control group" and is never capitalized. The |
| 78 | singular form is used to designate the whole feature and also as a | 82 | singular form is used to designate the whole feature and also as a |
| @@ -80,7 +84,8 @@ qualifier as in "cgroup controllers". When explicitly referring to | |||
| 80 | multiple individual control groups, the plural form "cgroups" is used. | 84 | multiple individual control groups, the plural form "cgroups" is used. |
| 81 | 85 | ||
| 82 | 86 | ||
| 83 | 1-2. What is cgroup? | 87 | What is cgroup? |
| 88 | --------------- | ||
| 84 | 89 | ||
| 85 | cgroup is a mechanism to organize processes hierarchically and | 90 | cgroup is a mechanism to organize processes hierarchically and |
| 86 | distribute system resources along the hierarchy in a controlled and | 91 | distribute system resources along the hierarchy in a controlled and |
| @@ -110,12 +115,14 @@ restrictions set closer to the root in the hierarchy can not be | |||
| 110 | overridden from further away. | 115 | overridden from further away. |
| 111 | 116 | ||
| 112 | 117 | ||
| 113 | 2. Basic Operations | 118 | Basic Operations |
| 119 | ================ | ||
| 114 | 120 | ||
| 115 | 2-1. Mounting | 121 | Mounting |
| 122 | -------- | ||
| 116 | 123 | ||
| 117 | Unlike v1, cgroup v2 has only single hierarchy. The cgroup v2 | 124 | Unlike v1, cgroup v2 has only single hierarchy. The cgroup v2 |
| 118 | hierarchy can be mounted with the following mount command. | 125 | hierarchy can be mounted with the following mount command:: |
| 119 | 126 | ||
| 120 | # mount -t cgroup2 none $MOUNT_POINT | 127 | # mount -t cgroup2 none $MOUNT_POINT |
| 121 | 128 | ||
| @@ -160,10 +167,11 @@ cgroup v2 currently supports the following mount options. | |||
| 160 | Delegation section for details. | 167 | Delegation section for details. |
| 161 | 168 | ||
| 162 | 169 | ||
| 163 | 2-2. Organizing Processes | 170 | Organizing Processes |
| 171 | -------------------- | ||
| 164 | 172 | ||
| 165 | Initially, only the root cgroup exists to which all processes belong. | 173 | Initially, only the root cgroup exists to which all processes belong. |
| 166 | A child cgroup can be created by creating a sub-directory. | 174 | A child cgroup can be created by creating a sub-directory:: |
| 167 | 175 | ||
| 168 | # mkdir $CGROUP_NAME | 176 | # mkdir $CGROUP_NAME |
| 169 | 177 | ||
| @@ -190,28 +198,29 @@ moved to another cgroup. | |||
| 190 | A cgroup which doesn't have any children or live processes can be | 198 | A cgroup which doesn't have any children or live processes can be |
| 191 | destroyed by removing the directory. Note that a cgroup which doesn't | 199 | destroyed by removing the directory. Note that a cgroup which doesn't |
| 192 | have any children and is associated only with zombie processes is | 200 | have any children and is associated only with zombie processes is |
| 193 | considered empty and can be removed. | 201 | considered empty and can be removed:: |
| 194 | 202 | ||
| 195 | # rmdir $CGROUP_NAME | 203 | # rmdir $CGROUP_NAME |
| 196 | 204 | ||
| 197 | "/proc/$PID/cgroup" lists a process's cgroup membership. If legacy | 205 | "/proc/$PID/cgroup" lists a process's cgroup membership. If legacy |
| 198 | cgroup is in use in the system, this file may contain multiple lines, | 206 | cgroup is in use in the system, this file may contain multiple lines, |
| 199 | one for each hierarchy. The entry for cgroup v2 is always in the | 207 | one for each hierarchy. The entry for cgroup v2 is always in the |
| 200 | format "0::$PATH". | 208 | format "0::$PATH":: |
| 201 | 209 | ||
| 202 | # cat /proc/842/cgroup | 210 | # cat /proc/842/cgroup |
| 203 | ... | 211 | ... |
| 204 | 0::/test-cgroup/test-cgroup-nested | 212 | 0::/test-cgroup/test-cgroup-nested |
| 205 | 213 | ||
| 206 | If the process becomes a zombie and the cgroup it was associated with | 214 | If the process becomes a zombie and the cgroup it was associated with |
| 207 | is removed subsequently, " (deleted)" is appended to the path. | 215 | is removed subsequently, " (deleted)" is appended to the path:: |
| 208 | 216 | ||
| 209 | # cat /proc/842/cgroup | 217 | # cat /proc/842/cgroup |
| 210 | ... | 218 | ... |
| 211 | 0::/test-cgroup/test-cgroup-nested (deleted) | 219 | 0::/test-cgroup/test-cgroup-nested (deleted) |
| 212 | 220 | ||
| 213 | 221 | ||
| 214 | 2-3. [Un]populated Notification | 222 | [Un]populated Notification |
| 223 | -------------------------- | ||
| 215 | 224 | ||
| 216 | Each non-root cgroup has a "cgroup.events" file which contains | 225 | Each non-root cgroup has a "cgroup.events" file which contains |
| 217 | "populated" field indicating whether the cgroup's sub-hierarchy has | 226 | "populated" field indicating whether the cgroup's sub-hierarchy has |
| @@ -222,7 +231,7 @@ example, to start a clean-up operation after all processes of a given | |||
| 222 | sub-hierarchy have exited. The populated state updates and | 231 | sub-hierarchy have exited. The populated state updates and |
| 223 | notifications are recursive. Consider the following sub-hierarchy | 232 | notifications are recursive. Consider the following sub-hierarchy |
| 224 | where the numbers in the parentheses represent the numbers of processes | 233 | where the numbers in the parentheses represent the numbers of processes |
| 225 | in each cgroup. | 234 | in each cgroup:: |
| 226 | 235 | ||
| 227 | A(4) - B(0) - C(1) | 236 | A(4) - B(0) - C(1) |
| 228 | \ D(0) | 237 | \ D(0) |
| @@ -233,18 +242,20 @@ file modified events will be generated on the "cgroup.events" files of | |||
| 233 | both cgroups. | 242 | both cgroups. |
| 234 | 243 | ||
| 235 | 244 | ||
| 236 | 2-4. Controlling Controllers | 245 | Controlling Controllers |
| 246 | ----------------------- | ||
| 237 | 247 | ||
| 238 | 2-4-1. Enabling and Disabling | 248 | Enabling and Disabling |
| 249 | ~~~~~~~~~~~~~~~~~~~~~~ | ||
| 239 | 250 | ||
| 240 | Each cgroup has a "cgroup.controllers" file which lists all | 251 | Each cgroup has a "cgroup.controllers" file which lists all |
| 241 | controllers available for the cgroup to enable. | 252 | controllers available for the cgroup to enable:: |
| 242 | 253 | ||
| 243 | # cat cgroup.controllers | 254 | # cat cgroup.controllers |
| 244 | cpu io memory | 255 | cpu io memory |
| 245 | 256 | ||
| 246 | No controller is enabled by default. Controllers can be enabled and | 257 | No controller is enabled by default. Controllers can be enabled and |
| 247 | disabled by writing to the "cgroup.subtree_control" file. | 258 | disabled by writing to the "cgroup.subtree_control" file:: |
| 248 | 259 | ||
| 249 | # echo "+cpu +memory -io" > cgroup.subtree_control | 260 | # echo "+cpu +memory -io" > cgroup.subtree_control |
| 250 | 261 | ||
| @@ -256,7 +267,7 @@ are specified, the last one is effective. | |||
| 256 | Enabling a controller in a cgroup indicates that the distribution of | 267 | Enabling a controller in a cgroup indicates that the distribution of |
| 257 | the target resource across its immediate children will be controlled. | 268 | the target resource across its immediate children will be controlled. |
| 258 | Consider the following sub-hierarchy. The enabled controllers are | 269 | Consider the following sub-hierarchy. The enabled controllers are |
| 259 | listed in parentheses. | 270 | listed in parentheses:: |
| 260 | 271 | ||
| 261 | A(cpu,memory) - B(memory) - C() | 272 | A(cpu,memory) - B(memory) - C() |
| 262 | \ D() | 273 | \ D() |
| @@ -276,7 +287,8 @@ controller interface files - anything which doesn't start with | |||
| 276 | "cgroup." are owned by the parent rather than the cgroup itself. | 287 | "cgroup." are owned by the parent rather than the cgroup itself. |
| 277 | 288 | ||
| 278 | 289 | ||
| 279 | 2-4-2. Top-down Constraint | 290 | Top-down Constraint |
| 291 | ~~~~~~~~~~~~~~~~~~~ | ||
| 280 | 292 | ||
| 281 | Resources are distributed top-down and a cgroup can further distribute | 293 | Resources are distributed top-down and a cgroup can further distribute |
| 282 | a resource only if the resource has been distributed to it from the | 294 | a resource only if the resource has been distributed to it from the |
| @@ -287,7 +299,8 @@ the parent has the controller enabled and a controller can't be | |||
| 287 | disabled if one or more children have it enabled. | 299 | disabled if one or more children have it enabled. |
| 288 | 300 | ||
| 289 | 301 | ||
| 290 | 2-4-3. No Internal Process Constraint | 302 | No Internal Process Constraint |
| 303 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| 291 | 304 | ||
| 292 | Non-root cgroups can only distribute resources to their children when | 305 | Non-root cgroups can only distribute resources to their children when |
| 293 | they don't have any processes of their own. In other words, only | 306 | they don't have any processes of their own. In other words, only |
| @@ -314,9 +327,11 @@ children before enabling controllers in its "cgroup.subtree_control" | |||
| 314 | file. | 327 | file. |
| 315 | 328 | ||
| 316 | 329 | ||
| 317 | 2-5. Delegation | 330 | Delegation |
| 331 | ---------- | ||
| 318 | 332 | ||
| 319 | 2-5-1. Model of Delegation | 333 | Model of Delegation |
| 334 | ~~~~~~~~~~~~~~~~~~~ | ||
| 320 | 335 | ||
| 321 | A cgroup can be delegated in two ways. First, to a less privileged | 336 | A cgroup can be delegated in two ways. First, to a less privileged |
| 322 | user by granting write access of the directory and its "cgroup.procs" | 337 | user by granting write access of the directory and its "cgroup.procs" |
| @@ -345,7 +360,8 @@ cgroups in or nesting depth of a delegated sub-hierarchy; however, | |||
| 345 | this may be limited explicitly in the future. | 360 | this may be limited explicitly in the future. |
| 346 | 361 | ||
| 347 | 362 | ||
| 348 | 2-5-2. Delegation Containment | 363 | Delegation Containment |
| 364 | ~~~~~~~~~~~~~~~~~~~~~~ | ||
| 349 | 365 | ||
| 350 | A delegated sub-hierarchy is contained in the sense that processes | 366 | A delegated sub-hierarchy is contained in the sense that processes |
| 351 | can't be moved into or out of the sub-hierarchy by the delegatee. | 367 | can't be moved into or out of the sub-hierarchy by the delegatee. |
| @@ -366,7 +382,7 @@ in from or push out to outside the sub-hierarchy. | |||
| 366 | 382 | ||
| 367 | For an example, let's assume cgroups C0 and C1 have been delegated to | 383 | For an example, let's assume cgroups C0 and C1 have been delegated to |
| 368 | user U0 who created C00, C01 under C0 and C10 under C1 as follows and | 384 | user U0 who created C00, C01 under C0 and C10 under C1 as follows and |
| 369 | all processes under C0 and C1 belong to U0. | 385 | all processes under C0 and C1 belong to U0:: |
| 370 | 386 | ||
| 371 | ~~~~~~~~~~~~~ - C0 - C00 | 387 | ~~~~~~~~~~~~~ - C0 - C00 |
| 372 | ~ cgroup ~ \ C01 | 388 | ~ cgroup ~ \ C01 |
| @@ -386,9 +402,11 @@ namespace of the process which is attempting the migration. If either | |||
| 386 | is not reachable, the migration is rejected with -ENOENT. | 402 | is not reachable, the migration is rejected with -ENOENT. |
| 387 | 403 | ||
| 388 | 404 | ||
| 389 | 2-6. Guidelines | 405 | Guidelines |
| 406 | ---------- | ||
| 390 | 407 | ||
| 391 | 2-6-1. Organize Once and Control | 408 | Organize Once and Control |
| 409 | ~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| 392 | 410 | ||
| 393 | Migrating a process across cgroups is a relatively expensive operation | 411 | Migrating a process across cgroups is a relatively expensive operation |
| 394 | and stateful resources such as memory are not moved together with the | 412 | and stateful resources such as memory are not moved together with the |
| @@ -404,7 +422,8 @@ distribution can be made by changing controller configuration through | |||
| 404 | the interface files. | 422 | the interface files. |
| 405 | 423 | ||
| 406 | 424 | ||
| 407 | 2-6-2. Avoid Name Collisions | 425 | Avoid Name Collisions |
| 426 | ~~~~~~~~~~~~~~~~~~~~~ | ||
| 408 | 427 | ||
| 409 | Interface files for a cgroup and its children cgroups occupy the same | 428 | Interface files for a cgroup and its children cgroups occupy the same |
| 410 | directory and it is possible to create children cgroups which collide | 429 | directory and it is possible to create children cgroups which collide |
| @@ -422,14 +441,16 @@ cgroup doesn't do anything to prevent name collisions and it's the | |||
| 422 | user's responsibility to avoid them. | 441 | user's responsibility to avoid them. |
| 423 | 442 | ||
| 424 | 443 | ||
| 425 | 3. Resource Distribution Models | 444 | Resource Distribution Models |
| 445 | ============================ | ||
| 426 | 446 | ||
| 427 | cgroup controllers implement several resource distribution schemes | 447 | cgroup controllers implement several resource distribution schemes |
| 428 | depending on the resource type and expected use cases. This section | 448 | depending on the resource type and expected use cases. This section |
| 429 | describes major schemes in use along with their expected behaviors. | 449 | describes major schemes in use along with their expected behaviors. |
| 430 | 450 | ||
| 431 | 451 | ||
| 432 | 3-1. Weights | 452 | Weights |
| 453 | ------- | ||
| 433 | 454 | ||
| 434 | A parent's resource is distributed by adding up the weights of all | 455 | A parent's resource is distributed by adding up the weights of all |
| 435 | active children and giving each the fraction matching the ratio of its | 456 | active children and giving each the fraction matching the ratio of its |
| @@ -450,7 +471,8 @@ process migrations. | |||
| 450 | and is an example of this type. | 471 | and is an example of this type. |
| 451 | 472 | ||
| 452 | 473 | ||
| 453 | 3-2. Limits | 474 | Limits |
| 475 | ------ | ||
| 454 | 476 | ||
| 455 | A child can only consume upto the configured amount of the resource. | 477 | A child can only consume upto the configured amount of the resource. |
| 456 | Limits can be over-committed - the sum of the limits of children can | 478 | Limits can be over-committed - the sum of the limits of children can |
| @@ -466,7 +488,8 @@ process migrations. | |||
| 466 | on an IO device and is an example of this type. | 488 | on an IO device and is an example of this type. |
| 467 | 489 | ||
| 468 | 490 | ||
| 469 | 3-3. Protections | 491 | Protections |
| 492 | ----------- | ||
| 470 | 493 | ||
| 471 | A cgroup is protected to be allocated upto the configured amount of | 494 | A cgroup is protected to be allocated upto the configured amount of |
| 472 | the resource if the usages of all its ancestors are under their | 495 | the resource if the usages of all its ancestors are under their |
| @@ -486,7 +509,8 @@ process migrations. | |||
| 486 | example of this type. | 509 | example of this type. |
| 487 | 510 | ||
| 488 | 511 | ||
| 489 | 3-4. Allocations | 512 | Allocations |
| 513 | ----------- | ||
| 490 | 514 | ||
| 491 | A cgroup is exclusively allocated a certain amount of a finite | 515 | A cgroup is exclusively allocated a certain amount of a finite |
| 492 | resource. Allocations can't be over-committed - the sum of the | 516 | resource. Allocations can't be over-committed - the sum of the |
| @@ -505,12 +529,14 @@ may be rejected. | |||
| 505 | type. | 529 | type. |
| 506 | 530 | ||
| 507 | 531 | ||
| 508 | 4. Interface Files | 532 | Interface Files |
| 533 | =============== | ||
| 509 | 534 | ||
| 510 | 4-1. Format | 535 | Format |
| 536 | ------ | ||
| 511 | 537 | ||
| 512 | All interface files should be in one of the following formats whenever | 538 | All interface files should be in one of the following formats whenever |
| 513 | possible. | 539 | possible:: |
| 514 | 540 | ||
| 515 | New-line separated values | 541 | New-line separated values |
| 516 | (when only one value can be written at once) | 542 | (when only one value can be written at once) |
| @@ -545,7 +571,8 @@ can be written at a time. For nested keyed files, the sub key pairs | |||
| 545 | may be specified in any order and not all pairs have to be specified. | 571 | may be specified in any order and not all pairs have to be specified. |
| 546 | 572 | ||
| 547 | 573 | ||
| 548 | 4-2. Conventions | 574 | Conventions |
| 575 | ----------- | ||
| 549 | 576 | ||
| 550 | - Settings for a single feature should be contained in a single file. | 577 | - Settings for a single feature should be contained in a single file. |
| 551 | 578 | ||
| @@ -581,25 +608,25 @@ may be specified in any order and not all pairs have to be specified. | |||
| 581 | with "default" as the value must not appear when read. | 608 | with "default" as the value must not appear when read. |
| 582 | 609 | ||
| 583 | For example, a setting which is keyed by major:minor device numbers | 610 | For example, a setting which is keyed by major:minor device numbers |
| 584 | with integer values may look like the following. | 611 | with integer values may look like the following:: |
| 585 | 612 | ||
| 586 | # cat cgroup-example-interface-file | 613 | # cat cgroup-example-interface-file |
| 587 | default 150 | 614 | default 150 |
| 588 | 8:0 300 | 615 | 8:0 300 |
| 589 | 616 | ||
| 590 | The default value can be updated by | 617 | The default value can be updated by:: |
| 591 | 618 | ||
| 592 | # echo 125 > cgroup-example-interface-file | 619 | # echo 125 > cgroup-example-interface-file |
| 593 | 620 | ||
| 594 | or | 621 | or:: |
| 595 | 622 | ||
| 596 | # echo "default 125" > cgroup-example-interface-file | 623 | # echo "default 125" > cgroup-example-interface-file |
| 597 | 624 | ||
| 598 | An override can be set by | 625 | An override can be set by:: |
| 599 | 626 | ||
| 600 | # echo "8:16 170" > cgroup-example-interface-file | 627 | # echo "8:16 170" > cgroup-example-interface-file |
| 601 | 628 | ||
| 602 | and cleared by | 629 | and cleared by:: |
| 603 | 630 | ||
| 604 | # echo "8:0 default" > cgroup-example-interface-file | 631 | # echo "8:0 default" > cgroup-example-interface-file |
| 605 | # cat cgroup-example-interface-file | 632 | # cat cgroup-example-interface-file |
| @@ -612,12 +639,12 @@ may be specified in any order and not all pairs have to be specified. | |||
| 612 | generated on the file. | 639 | generated on the file. |
| 613 | 640 | ||
| 614 | 641 | ||
| 615 | 4-3. Core Interface Files | 642 | Core Interface Files |
| 643 | -------------------- | ||
| 616 | 644 | ||
| 617 | All cgroup core files are prefixed with "cgroup." | 645 | All cgroup core files are prefixed with "cgroup." |
| 618 | 646 | ||
| 619 | cgroup.procs | 647 | cgroup.procs |
| 620 | |||
| 621 | A read-write new-line separated values file which exists on | 648 | A read-write new-line separated values file which exists on |
| 622 | all cgroups. | 649 | all cgroups. |
| 623 | 650 | ||
| @@ -643,7 +670,6 @@ All cgroup core files are prefixed with "cgroup." | |||
| 643 | should be granted along with the containing directory. | 670 | should be granted along with the containing directory. |
| 644 | 671 | ||
| 645 | cgroup.controllers | 672 | cgroup.controllers |
| 646 | |||
| 647 | A read-only space separated values file which exists on all | 673 | A read-only space separated values file which exists on all |
| 648 | cgroups. | 674 | cgroups. |
| 649 | 675 | ||
| @@ -651,7 +677,6 @@ All cgroup core files are prefixed with "cgroup." | |||
| 651 | the cgroup. The controllers are not ordered. | 677 | the cgroup. The controllers are not ordered. |
| 652 | 678 | ||
| 653 | cgroup.subtree_control | 679 | cgroup.subtree_control |
| 654 | |||
| 655 | A read-write space separated values file which exists on all | 680 | A read-write space separated values file which exists on all |
| 656 | cgroups. Starts out empty. | 681 | cgroups. Starts out empty. |
| 657 | 682 | ||
| @@ -667,23 +692,25 @@ All cgroup core files are prefixed with "cgroup." | |||
| 667 | operations are specified, either all succeed or all fail. | 692 | operations are specified, either all succeed or all fail. |
| 668 | 693 | ||
| 669 | cgroup.events | 694 | cgroup.events |
| 670 | |||
| 671 | A read-only flat-keyed file which exists on non-root cgroups. | 695 | A read-only flat-keyed file which exists on non-root cgroups. |
| 672 | The following entries are defined. Unless specified | 696 | The following entries are defined. Unless specified |
| 673 | otherwise, a value change in this file generates a file | 697 | otherwise, a value change in this file generates a file |
| 674 | modified event. | 698 | modified event. |
| 675 | 699 | ||
| 676 | populated | 700 | populated |
| 677 | |||
| 678 | 1 if the cgroup or its descendants contains any live | 701 | 1 if the cgroup or its descendants contains any live |
| 679 | processes; otherwise, 0. | 702 | processes; otherwise, 0. |
| 680 | 703 | ||
| 681 | 704 | ||
| 682 | 5. Controllers | 705 | Controllers |
| 706 | =========== | ||
| 683 | 707 | ||
| 684 | 5-1. CPU | 708 | CPU |
| 709 | --- | ||
| 685 | 710 | ||
| 686 | [NOTE: The interface for the cpu controller hasn't been merged yet] | 711 | .. note:: |
| 712 | |||
| 713 | The interface for the cpu controller hasn't been merged yet | ||
| 687 | 714 | ||
| 688 | The "cpu" controllers regulates distribution of CPU cycles. This | 715 | The "cpu" controllers regulates distribution of CPU cycles. This |
| 689 | controller implements weight and absolute bandwidth limit models for | 716 | controller implements weight and absolute bandwidth limit models for |
| @@ -691,36 +718,34 @@ normal scheduling policy and absolute bandwidth allocation model for | |||
| 691 | realtime scheduling policy. | 718 | realtime scheduling policy. |
| 692 | 719 | ||
| 693 | 720 | ||
| 694 | 5-1-1. CPU Interface Files | 721 | CPU Interface Files |
| 722 | ~~~~~~~~~~~~~~~~~~~ | ||
| 695 | 723 | ||
| 696 | All time durations are in microseconds. | 724 | All time durations are in microseconds. |
| 697 | 725 | ||
| 698 | cpu.stat | 726 | cpu.stat |
| 699 | |||
| 700 | A read-only flat-keyed file which exists on non-root cgroups. | 727 | A read-only flat-keyed file which exists on non-root cgroups. |
| 701 | 728 | ||
| 702 | It reports the following six stats. | 729 | It reports the following six stats: |
| 703 | 730 | ||
| 704 | usage_usec | 731 | - usage_usec |
| 705 | user_usec | 732 | - user_usec |
| 706 | system_usec | 733 | - system_usec |
| 707 | nr_periods | 734 | - nr_periods |
| 708 | nr_throttled | 735 | - nr_throttled |
| 709 | throttled_usec | 736 | - throttled_usec |
| 710 | 737 | ||
| 711 | cpu.weight | 738 | cpu.weight |
| 712 | |||
| 713 | A read-write single value file which exists on non-root | 739 | A read-write single value file which exists on non-root |
| 714 | cgroups. The default is "100". | 740 | cgroups. The default is "100". |
| 715 | 741 | ||
| 716 | The weight in the range [1, 10000]. | 742 | The weight in the range [1, 10000]. |
| 717 | 743 | ||
| 718 | cpu.max | 744 | cpu.max |
| 719 | |||
| 720 | A read-write two value file which exists on non-root cgroups. | 745 | A read-write two value file which exists on non-root cgroups. |
| 721 | The default is "max 100000". | 746 | The default is "max 100000". |
| 722 | 747 | ||
| 723 | The maximum bandwidth limit. It's in the following format. | 748 | The maximum bandwidth limit. It's in the following format:: |
| 724 | 749 | ||
| 725 | $MAX $PERIOD | 750 | $MAX $PERIOD |
| 726 | 751 | ||
| @@ -729,9 +754,10 @@ All time durations are in microseconds. | |||
| 729 | one number is written, $MAX is updated. | 754 | one number is written, $MAX is updated. |
| 730 | 755 | ||
| 731 | cpu.rt.max | 756 | cpu.rt.max |
| 757 | .. note:: | ||
| 732 | 758 | ||
| 733 | [NOTE: The semantics of this file is still under discussion and the | 759 | The semantics of this file is still under discussion and the |
| 734 | interface hasn't been merged yet] | 760 | interface hasn't been merged yet |
| 735 | 761 | ||
| 736 | A read-write two value file which exists on all cgroups. | 762 | A read-write two value file which exists on all cgroups. |
| 737 | The default is "0 100000". | 763 | The default is "0 100000". |
| @@ -739,7 +765,7 @@ All time durations are in microseconds. | |||
| 739 | The maximum realtime runtime allocation. Over-committing | 765 | The maximum realtime runtime allocation. Over-committing |
| 740 | configurations are disallowed and process migrations are | 766 | configurations are disallowed and process migrations are |
| 741 | rejected if not enough bandwidth is available. It's in the | 767 | rejected if not enough bandwidth is available. It's in the |
| 742 | following format. | 768 | following format:: |
| 743 | 769 | ||
| 744 | $MAX $PERIOD | 770 | $MAX $PERIOD |
| 745 | 771 | ||
| @@ -748,7 +774,8 @@ All time durations are in microseconds. | |||
| 748 | updated. | 774 | updated. |
| 749 | 775 | ||
| 750 | 776 | ||
| 751 | 5-2. Memory | 777 | Memory |
| 778 | ------ | ||
| 752 | 779 | ||
| 753 | The "memory" controller regulates distribution of memory. Memory is | 780 | The "memory" controller regulates distribution of memory. Memory is |
| 754 | stateful and implements both limit and protection models. Due to the | 781 | stateful and implements both limit and protection models. Due to the |
| @@ -770,14 +797,14 @@ following types of memory usages are tracked. | |||
| 770 | The above list may expand in the future for better coverage. | 797 | The above list may expand in the future for better coverage. |
| 771 | 798 | ||
| 772 | 799 | ||
| 773 | 5-2-1. Memory Interface Files | 800 | Memory Interface Files |
| 801 | ~~~~~~~~~~~~~~~~~~~~~~ | ||
| 774 | 802 | ||
| 775 | All memory amounts are in bytes. If a value which is not aligned to | 803 | All memory amounts are in bytes. If a value which is not aligned to |
| 776 | PAGE_SIZE is written, the value may be rounded up to the closest | 804 | PAGE_SIZE is written, the value may be rounded up to the closest |
| 777 | PAGE_SIZE multiple when read back. | 805 | PAGE_SIZE multiple when read back. |
| 778 | 806 | ||
| 779 | memory.current | 807 | memory.current |
| 780 | |||
| 781 | A read-only single value file which exists on non-root | 808 | A read-only single value file which exists on non-root |
| 782 | cgroups. | 809 | cgroups. |
| 783 | 810 | ||
| @@ -785,7 +812,6 @@ PAGE_SIZE multiple when read back. | |||
| 785 | and its descendants. | 812 | and its descendants. |
| 786 | 813 | ||
| 787 | memory.low | 814 | memory.low |
| 788 | |||
| 789 | A read-write single value file which exists on non-root | 815 | A read-write single value file which exists on non-root |
| 790 | cgroups. The default is "0". | 816 | cgroups. The default is "0". |
| 791 | 817 | ||
| @@ -798,7 +824,6 @@ PAGE_SIZE multiple when read back. | |||
| 798 | protection is discouraged. | 824 | protection is discouraged. |
| 799 | 825 | ||
| 800 | memory.high | 826 | memory.high |
| 801 | |||
| 802 | A read-write single value file which exists on non-root | 827 | A read-write single value file which exists on non-root |
| 803 | cgroups. The default is "max". | 828 | cgroups. The default is "max". |
| 804 | 829 | ||
| @@ -811,7 +836,6 @@ PAGE_SIZE multiple when read back. | |||
| 811 | under extreme conditions the limit may be breached. | 836 | under extreme conditions the limit may be breached. |
| 812 | 837 | ||
| 813 | memory.max | 838 | memory.max |
| 814 | |||
| 815 | A read-write single value file which exists on non-root | 839 | A read-write single value file which exists on non-root |
| 816 | cgroups. The default is "max". | 840 | cgroups. The default is "max". |
| 817 | 841 | ||
| @@ -826,21 +850,18 @@ PAGE_SIZE multiple when read back. | |||
| 826 | utility is limited to providing the final safety net. | 850 | utility is limited to providing the final safety net. |
| 827 | 851 | ||
| 828 | memory.events | 852 | memory.events |
| 829 | |||
| 830 | A read-only flat-keyed file which exists on non-root cgroups. | 853 | A read-only flat-keyed file which exists on non-root cgroups. |
| 831 | The following entries are defined. Unless specified | 854 | The following entries are defined. Unless specified |
| 832 | otherwise, a value change in this file generates a file | 855 | otherwise, a value change in this file generates a file |
| 833 | modified event. | 856 | modified event. |
| 834 | 857 | ||
| 835 | low | 858 | low |
| 836 | |||
| 837 | The number of times the cgroup is reclaimed due to | 859 | The number of times the cgroup is reclaimed due to |
| 838 | high memory pressure even though its usage is under | 860 | high memory pressure even though its usage is under |
| 839 | the low boundary. This usually indicates that the low | 861 | the low boundary. This usually indicates that the low |
| 840 | boundary is over-committed. | 862 | boundary is over-committed. |
| 841 | 863 | ||
| 842 | high | 864 | high |
| 843 | |||
| 844 | The number of times processes of the cgroup are | 865 | The number of times processes of the cgroup are |
| 845 | throttled and routed to perform direct memory reclaim | 866 | throttled and routed to perform direct memory reclaim |
| 846 | because the high memory boundary was exceeded. For a | 867 | because the high memory boundary was exceeded. For a |
| @@ -849,13 +870,11 @@ PAGE_SIZE multiple when read back. | |||
| 849 | occurrences are expected. | 870 | occurrences are expected. |
| 850 | 871 | ||
| 851 | max | 872 | max |
| 852 | |||
| 853 | The number of times the cgroup's memory usage was | 873 | The number of times the cgroup's memory usage was |
| 854 | about to go over the max boundary. If direct reclaim | 874 | about to go over the max boundary. If direct reclaim |
| 855 | fails to bring it down, the cgroup goes to OOM state. | 875 | fails to bring it down, the cgroup goes to OOM state. |
| 856 | 876 | ||
| 857 | oom | 877 | oom |
| 858 | |||
| 859 | The number of times the cgroup's memory usage | 878 | The number of times the cgroup's memory usage |
| 860 | reached the limit and allocation was about to fail. | 879 | reached the limit and allocation was about to fail. |
| 861 | 880 | ||
| @@ -864,16 +883,14 @@ PAGE_SIZE multiple when read back. | |||
| 864 | 883 | ||
| 865 | Failed allocation in its turn could be returned into | 884 | Failed allocation in its turn could be returned into |
| 866 | userspace as -ENOMEM or silently ignored in cases like | 885 | userspace as -ENOMEM or silently ignored in cases like |
| 867 | disk readahead. For now OOM in memory cgroup kills | 886 | disk readahead. For now OOM in memory cgroup kills |
| 868 | tasks iff shortage has happened inside page fault. | 887 | tasks iff shortage has happened inside page fault. |
| 869 | 888 | ||
| 870 | oom_kill | 889 | oom_kill |
| 871 | |||
| 872 | The number of processes belonging to this cgroup | 890 | The number of processes belonging to this cgroup |
| 873 | killed by any kind of OOM killer. | 891 | killed by any kind of OOM killer. |
| 874 | 892 | ||
| 875 | memory.stat | 893 | memory.stat |
| 876 | |||
| 877 | A read-only flat-keyed file which exists on non-root cgroups. | 894 | A read-only flat-keyed file which exists on non-root cgroups. |
| 878 | 895 | ||
| 879 | This breaks down the cgroup's memory footprint into different | 896 | This breaks down the cgroup's memory footprint into different |
| @@ -887,73 +904,55 @@ PAGE_SIZE multiple when read back. | |||
| 887 | fixed position; use the keys to look up specific values! | 904 | fixed position; use the keys to look up specific values! |
| 888 | 905 | ||
| 889 | anon | 906 | anon |
| 890 | |||
| 891 | Amount of memory used in anonymous mappings such as | 907 | Amount of memory used in anonymous mappings such as |
| 892 | brk(), sbrk(), and mmap(MAP_ANONYMOUS) | 908 | brk(), sbrk(), and mmap(MAP_ANONYMOUS) |
| 893 | 909 | ||
| 894 | file | 910 | file |
| 895 | |||
| 896 | Amount of memory used to cache filesystem data, | 911 | Amount of memory used to cache filesystem data, |
| 897 | including tmpfs and shared memory. | 912 | including tmpfs and shared memory. |
| 898 | 913 | ||
| 899 | kernel_stack | 914 | kernel_stack |
| 900 | |||
| 901 | Amount of memory allocated to kernel stacks. | 915 | Amount of memory allocated to kernel stacks. |
| 902 | 916 | ||
| 903 | slab | 917 | slab |
| 904 | |||
| 905 | Amount of memory used for storing in-kernel data | 918 | Amount of memory used for storing in-kernel data |
| 906 | structures. | 919 | structures. |
| 907 | 920 | ||
| 908 | sock | 921 | sock |
| 909 | |||
| 910 | Amount of memory used in network transmission buffers | 922 | Amount of memory used in network transmission buffers |
| 911 | 923 | ||
| 912 | shmem | 924 | shmem |
| 913 | |||
| 914 | Amount of cached filesystem data that is swap-backed, | 925 | Amount of cached filesystem data that is swap-backed, |
| 915 | such as tmpfs, shm segments, shared anonymous mmap()s | 926 | such as tmpfs, shm segments, shared anonymous mmap()s |
| 916 | 927 | ||
| 917 | file_mapped | 928 | file_mapped |
| 918 | |||
| 919 | Amount of cached filesystem data mapped with mmap() | 929 | Amount of cached filesystem data mapped with mmap() |
| 920 | 930 | ||
| 921 | file_dirty | 931 | file_dirty |
| 922 | |||
| 923 | Amount of cached filesystem data that was modified but | 932 | Amount of cached filesystem data that was modified but |
| 924 | not yet written back to disk | 933 | not yet written back to disk |
| 925 | 934 | ||
| 926 | file_writeback | 935 | file_writeback |
| 927 | |||
| 928 | Amount of cached filesystem data that was modified and | 936 | Amount of cached filesystem data that was modified and |
| 929 | is currently being written back to disk | 937 | is currently being written back to disk |
| 930 | 938 | ||
| 931 | inactive_anon | 939 | inactive_anon, active_anon, inactive_file, active_file, unevictable |
| 932 | active_anon | ||
| 933 | inactive_file | ||
| 934 | active_file | ||
| 935 | unevictable | ||
| 936 | |||
| 937 | Amount of memory, swap-backed and filesystem-backed, | 940 | Amount of memory, swap-backed and filesystem-backed, |
| 938 | on the internal memory management lists used by the | 941 | on the internal memory management lists used by the |
| 939 | page reclaim algorithm | 942 | page reclaim algorithm |
| 940 | 943 | ||
| 941 | slab_reclaimable | 944 | slab_reclaimable |
| 942 | |||
| 943 | Part of "slab" that might be reclaimed, such as | 945 | Part of "slab" that might be reclaimed, such as |
| 944 | dentries and inodes. | 946 | dentries and inodes. |
| 945 | 947 | ||
| 946 | slab_unreclaimable | 948 | slab_unreclaimable |
| 947 | |||
| 948 | Part of "slab" that cannot be reclaimed on memory | 949 | Part of "slab" that cannot be reclaimed on memory |
| 949 | pressure. | 950 | pressure. |
| 950 | 951 | ||
| 951 | pgfault | 952 | pgfault |
| 952 | |||
| 953 | Total number of page faults incurred | 953 | Total number of page faults incurred |
| 954 | 954 | ||
| 955 | pgmajfault | 955 | pgmajfault |
| 956 | |||
| 957 | Number of major page faults incurred | 956 | Number of major page faults incurred |
| 958 | 957 | ||
| 959 | workingset_refault | 958 | workingset_refault |
| @@ -997,7 +996,6 @@ PAGE_SIZE multiple when read back. | |||
| 997 | Amount of reclaimed lazyfree pages | 996 | Amount of reclaimed lazyfree pages |
| 998 | 997 | ||
| 999 | memory.swap.current | 998 | memory.swap.current |
| 1000 | |||
| 1001 | A read-only single value file which exists on non-root | 999 | A read-only single value file which exists on non-root |
| 1002 | cgroups. | 1000 | cgroups. |
| 1003 | 1001 | ||
| @@ -1005,7 +1003,6 @@ PAGE_SIZE multiple when read back. | |||
| 1005 | and its descendants. | 1003 | and its descendants. |
| 1006 | 1004 | ||
| 1007 | memory.swap.max | 1005 | memory.swap.max |
| 1008 | |||
| 1009 | A read-write single value file which exists on non-root | 1006 | A read-write single value file which exists on non-root |
| 1010 | cgroups. The default is "max". | 1007 | cgroups. The default is "max". |
| 1011 | 1008 | ||
| @@ -1013,7 +1010,8 @@ PAGE_SIZE multiple when read back. | |||
| 1013 | limit, anonymous memory of the cgroup will not be swapped out. | 1010 | limit, anonymous memory of the cgroup will not be swapped out. |
| 1014 | 1011 | ||
| 1015 | 1012 | ||
| 1016 | 5-2-2. Usage Guidelines | 1013 | Usage Guidelines |
| 1014 | ~~~~~~~~~~~~~~~~ | ||
| 1017 | 1015 | ||
| 1018 | "memory.high" is the main mechanism to control memory usage. | 1016 | "memory.high" is the main mechanism to control memory usage. |
| 1019 | Over-committing on high limit (sum of high limits > available memory) | 1017 | Over-committing on high limit (sum of high limits > available memory) |
| @@ -1036,7 +1034,8 @@ memory; unfortunately, memory pressure monitoring mechanism isn't | |||
| 1036 | implemented yet. | 1034 | implemented yet. |
| 1037 | 1035 | ||
| 1038 | 1036 | ||
| 1039 | 5-2-3. Memory Ownership | 1037 | Memory Ownership |
| 1038 | ~~~~~~~~~~~~~~~~ | ||
| 1040 | 1039 | ||
| 1041 | A memory area is charged to the cgroup which instantiated it and stays | 1040 | A memory area is charged to the cgroup which instantiated it and stays |
| 1042 | charged to the cgroup until the area is released. Migrating a process | 1041 | charged to the cgroup until the area is released. Migrating a process |
| @@ -1054,7 +1053,8 @@ POSIX_FADV_DONTNEED to relinquish the ownership of memory areas | |||
| 1054 | belonging to the affected files to ensure correct memory ownership. | 1053 | belonging to the affected files to ensure correct memory ownership. |
| 1055 | 1054 | ||
| 1056 | 1055 | ||
| 1057 | 5-3. IO | 1056 | IO |
| 1057 | -- | ||
| 1058 | 1058 | ||
| 1059 | The "io" controller regulates the distribution of IO resources. This | 1059 | The "io" controller regulates the distribution of IO resources. This |
| 1060 | controller implements both weight based and absolute bandwidth or IOPS | 1060 | controller implements both weight based and absolute bandwidth or IOPS |
| @@ -1063,28 +1063,29 @@ only if cfq-iosched is in use and neither scheme is available for | |||
| 1063 | blk-mq devices. | 1063 | blk-mq devices. |
| 1064 | 1064 | ||
| 1065 | 1065 | ||
| 1066 | 5-3-1. IO Interface Files | 1066 | IO Interface Files |
| 1067 | ~~~~~~~~~~~~~~~~~~ | ||
| 1067 | 1068 | ||
| 1068 | io.stat | 1069 | io.stat |
| 1069 | |||
| 1070 | A read-only nested-keyed file which exists on non-root | 1070 | A read-only nested-keyed file which exists on non-root |
| 1071 | cgroups. | 1071 | cgroups. |
| 1072 | 1072 | ||
| 1073 | Lines are keyed by $MAJ:$MIN device numbers and not ordered. | 1073 | Lines are keyed by $MAJ:$MIN device numbers and not ordered. |
| 1074 | The following nested keys are defined. | 1074 | The following nested keys are defined. |
| 1075 | 1075 | ||
| 1076 | ====== =================== | ||
| 1076 | rbytes Bytes read | 1077 | rbytes Bytes read |
| 1077 | wbytes Bytes written | 1078 | wbytes Bytes written |
| 1078 | rios Number of read IOs | 1079 | rios Number of read IOs |
| 1079 | wios Number of write IOs | 1080 | wios Number of write IOs |
| 1081 | ====== =================== | ||
| 1080 | 1082 | ||
| 1081 | An example read output follows. | 1083 | An example read output follows:: |
| 1082 | 1084 | ||
| 1083 | 8:16 rbytes=1459200 wbytes=314773504 rios=192 wios=353 | 1085 | 8:16 rbytes=1459200 wbytes=314773504 rios=192 wios=353 |
| 1084 | 8:0 rbytes=90430464 wbytes=299008000 rios=8950 wios=1252 | 1086 | 8:0 rbytes=90430464 wbytes=299008000 rios=8950 wios=1252 |
| 1085 | 1087 | ||
| 1086 | io.weight | 1088 | io.weight |
| 1087 | |||
| 1088 | A read-write flat-keyed file which exists on non-root cgroups. | 1089 | A read-write flat-keyed file which exists on non-root cgroups. |
| 1089 | The default is "default 100". | 1090 | The default is "default 100". |
| 1090 | 1091 | ||
| @@ -1098,14 +1099,13 @@ blk-mq devices. | |||
| 1098 | $WEIGHT" or simply "$WEIGHT". Overrides can be set by writing | 1099 | $WEIGHT" or simply "$WEIGHT". Overrides can be set by writing |
| 1099 | "$MAJ:$MIN $WEIGHT" and unset by writing "$MAJ:$MIN default". | 1100 | "$MAJ:$MIN $WEIGHT" and unset by writing "$MAJ:$MIN default". |
| 1100 | 1101 | ||
| 1101 | An example read output follows. | 1102 | An example read output follows:: |
| 1102 | 1103 | ||
| 1103 | default 100 | 1104 | default 100 |
| 1104 | 8:16 200 | 1105 | 8:16 200 |
| 1105 | 8:0 50 | 1106 | 8:0 50 |
| 1106 | 1107 | ||
| 1107 | io.max | 1108 | io.max |
| 1108 | |||
| 1109 | A read-write nested-keyed file which exists on non-root | 1109 | A read-write nested-keyed file which exists on non-root |
| 1110 | cgroups. | 1110 | cgroups. |
| 1111 | 1111 | ||
| @@ -1113,10 +1113,12 @@ blk-mq devices. | |||
| 1113 | device numbers and not ordered. The following nested keys are | 1113 | device numbers and not ordered. The following nested keys are |
| 1114 | defined. | 1114 | defined. |
| 1115 | 1115 | ||
| 1116 | ===== ================================== | ||
| 1116 | rbps Max read bytes per second | 1117 | rbps Max read bytes per second |
| 1117 | wbps Max write bytes per second | 1118 | wbps Max write bytes per second |
| 1118 | riops Max read IO operations per second | 1119 | riops Max read IO operations per second |
| 1119 | wiops Max write IO operations per second | 1120 | wiops Max write IO operations per second |
| 1121 | ===== ================================== | ||
| 1120 | 1122 | ||
| 1121 | When writing, any number of nested key-value pairs can be | 1123 | When writing, any number of nested key-value pairs can be |
| 1122 | specified in any order. "max" can be specified as the value | 1124 | specified in any order. "max" can be specified as the value |
| @@ -1126,24 +1128,25 @@ blk-mq devices. | |||
| 1126 | BPS and IOPS are measured in each IO direction and IOs are | 1128 | BPS and IOPS are measured in each IO direction and IOs are |
| 1127 | delayed if limit is reached. Temporary bursts are allowed. | 1129 | delayed if limit is reached. Temporary bursts are allowed. |
| 1128 | 1130 | ||
| 1129 | Setting read limit at 2M BPS and write at 120 IOPS for 8:16. | 1131 | Setting read limit at 2M BPS and write at 120 IOPS for 8:16:: |
| 1130 | 1132 | ||
| 1131 | echo "8:16 rbps=2097152 wiops=120" > io.max | 1133 | echo "8:16 rbps=2097152 wiops=120" > io.max |
| 1132 | 1134 | ||
| 1133 | Reading returns the following. | 1135 | Reading returns the following:: |
| 1134 | 1136 | ||
| 1135 | 8:16 rbps=2097152 wbps=max riops=max wiops=120 | 1137 | 8:16 rbps=2097152 wbps=max riops=max wiops=120 |
| 1136 | 1138 | ||
| 1137 | Write IOPS limit can be removed by writing the following. | 1139 | Write IOPS limit can be removed by writing the following:: |
| 1138 | 1140 | ||
| 1139 | echo "8:16 wiops=max" > io.max | 1141 | echo "8:16 wiops=max" > io.max |
| 1140 | 1142 | ||
| 1141 | Reading now returns the following. | 1143 | Reading now returns the following:: |
| 1142 | 1144 | ||
| 1143 | 8:16 rbps=2097152 wbps=max riops=max wiops=max | 1145 | 8:16 rbps=2097152 wbps=max riops=max wiops=max |
| 1144 | 1146 | ||
| 1145 | 1147 | ||
| 1146 | 5-3-2. Writeback | 1148 | Writeback |
| 1149 | ~~~~~~~~~ | ||
| 1147 | 1150 | ||
| 1148 | Page cache is dirtied through buffered writes and shared mmaps and | 1151 | Page cache is dirtied through buffered writes and shared mmaps and |
| 1149 | written asynchronously to the backing filesystem by the writeback | 1152 | written asynchronously to the backing filesystem by the writeback |
| @@ -1191,22 +1194,19 @@ patterns. | |||
| 1191 | The sysctl knobs which affect writeback behavior are applied to cgroup | 1194 | The sysctl knobs which affect writeback behavior are applied to cgroup |
| 1192 | writeback as follows. | 1195 | writeback as follows. |
| 1193 | 1196 | ||
| 1194 | vm.dirty_background_ratio | 1197 | vm.dirty_background_ratio, vm.dirty_ratio |
| 1195 | vm.dirty_ratio | ||
| 1196 | |||
| 1197 | These ratios apply the same to cgroup writeback with the | 1198 | These ratios apply the same to cgroup writeback with the |
| 1198 | amount of available memory capped by limits imposed by the | 1199 | amount of available memory capped by limits imposed by the |
| 1199 | memory controller and system-wide clean memory. | 1200 | memory controller and system-wide clean memory. |
| 1200 | 1201 | ||
| 1201 | vm.dirty_background_bytes | 1202 | vm.dirty_background_bytes, vm.dirty_bytes |
| 1202 | vm.dirty_bytes | ||
| 1203 | |||
| 1204 | For cgroup writeback, this is calculated into ratio against | 1203 | For cgroup writeback, this is calculated into ratio against |
| 1205 | total available memory and applied the same way as | 1204 | total available memory and applied the same way as |
| 1206 | vm.dirty[_background]_ratio. | 1205 | vm.dirty[_background]_ratio. |
| 1207 | 1206 | ||
| 1208 | 1207 | ||
| 1209 | 5-4. PID | 1208 | PID |
| 1209 | --- | ||
| 1210 | 1210 | ||
| 1211 | The process number controller is used to allow a cgroup to stop any | 1211 | The process number controller is used to allow a cgroup to stop any |
| 1212 | new tasks from being fork()'d or clone()'d after a specified limit is | 1212 | new tasks from being fork()'d or clone()'d after a specified limit is |
| @@ -1221,17 +1221,16 @@ Note that PIDs used in this controller refer to TIDs, process IDs as | |||
| 1221 | used by the kernel. | 1221 | used by the kernel. |
| 1222 | 1222 | ||
| 1223 | 1223 | ||
| 1224 | 5-4-1. PID Interface Files | 1224 | PID Interface Files |
| 1225 | ~~~~~~~~~~~~~~~~~~~ | ||
| 1225 | 1226 | ||
| 1226 | pids.max | 1227 | pids.max |
| 1227 | |||
| 1228 | A read-write single value file which exists on non-root | 1228 | A read-write single value file which exists on non-root |
| 1229 | cgroups. The default is "max". | 1229 | cgroups. The default is "max". |
| 1230 | 1230 | ||
| 1231 | Hard limit of number of processes. | 1231 | Hard limit of number of processes. |
| 1232 | 1232 | ||
| 1233 | pids.current | 1233 | pids.current |
| 1234 | |||
| 1235 | A read-only single value file which exists on all cgroups. | 1234 | A read-only single value file which exists on all cgroups. |
| 1236 | 1235 | ||
| 1237 | The number of processes currently in the cgroup and its | 1236 | The number of processes currently in the cgroup and its |
| @@ -1246,12 +1245,14 @@ through fork() or clone(). These will return -EAGAIN if the creation | |||
| 1246 | of a new process would cause a cgroup policy to be violated. | 1245 | of a new process would cause a cgroup policy to be violated. |
| 1247 | 1246 | ||
| 1248 | 1247 | ||
| 1249 | 5-5. RDMA | 1248 | RDMA |
| 1249 | ---- | ||
| 1250 | 1250 | ||
| 1251 | The "rdma" controller regulates the distribution and accounting of | 1251 | The "rdma" controller regulates the distribution and accounting of |
| 1252 | RDMA resources. | 1252 | RDMA resources. |
| 1253 | 1253 | ||
| 1254 | 5-5-1. RDMA Interface Files | 1254 | RDMA Interface Files |
| 1255 | ~~~~~~~~~~~~~~~~~~~~ | ||
| 1255 | 1256 | ||
| 1256 | rdma.max | 1257 | rdma.max |
| 1257 | A read-write nested-keyed file that exists for all the cgroups | 1258 | A read-write nested-keyed file that exists for all the cgroups |
| @@ -1264,10 +1265,12 @@ of RDMA resources. | |||
| 1264 | 1265 | ||
| 1265 | The following nested keys are defined. | 1266 | The following nested keys are defined. |
| 1266 | 1267 | ||
| 1268 | ========== ============================= | ||
| 1267 | hca_handle Maximum number of HCA Handles | 1269 | hca_handle Maximum number of HCA Handles |
| 1268 | hca_object Maximum number of HCA Objects | 1270 | hca_object Maximum number of HCA Objects |
| 1271 | ========== ============================= | ||
| 1269 | 1272 | ||
| 1270 | An example for mlx4 and ocrdma device follows. | 1273 | An example for mlx4 and ocrdma device follows:: |
| 1271 | 1274 | ||
| 1272 | mlx4_0 hca_handle=2 hca_object=2000 | 1275 | mlx4_0 hca_handle=2 hca_object=2000 |
| 1273 | ocrdma1 hca_handle=3 hca_object=max | 1276 | ocrdma1 hca_handle=3 hca_object=max |
| @@ -1276,15 +1279,17 @@ of RDMA resources. | |||
| 1276 | A read-only file that describes current resource usage. | 1279 | A read-only file that describes current resource usage. |
| 1277 | It exists for all cgroups except root. | 1280 | It exists for all cgroups except root. |
| 1278 | 1281 | ||
| 1279 | An example for mlx4 and ocrdma device follows. | 1282 | An example for mlx4 and ocrdma device follows:: |
| 1280 | 1283 | ||
| 1281 | mlx4_0 hca_handle=1 hca_object=20 | 1284 | mlx4_0 hca_handle=1 hca_object=20 |
| 1282 | ocrdma1 hca_handle=1 hca_object=23 | 1285 | ocrdma1 hca_handle=1 hca_object=23 |
| 1283 | 1286 | ||
| 1284 | 1287 | ||
| 1285 | 5-6. Misc | 1288 | Misc |
| 1289 | ---- | ||
| 1286 | 1290 | ||
| 1287 | 5-6-1. perf_event | 1291 | perf_event |
| 1292 | ~~~~~~~~~~ | ||
| 1288 | 1293 | ||
| 1289 | perf_event controller, if not mounted on a legacy hierarchy, is | 1294 | perf_event controller, if not mounted on a legacy hierarchy, is |
| 1290 | automatically enabled on the v2 hierarchy so that perf events can | 1295 | automatically enabled on the v2 hierarchy so that perf events can |
| @@ -1292,9 +1297,11 @@ always be filtered by cgroup v2 path. The controller can still be | |||
| 1292 | moved to a legacy hierarchy after v2 hierarchy is populated. | 1297 | moved to a legacy hierarchy after v2 hierarchy is populated. |
| 1293 | 1298 | ||
| 1294 | 1299 | ||
| 1295 | 6. Namespace | 1300 | Namespace |
| 1301 | ========= | ||
| 1296 | 1302 | ||
| 1297 | 6-1. Basics | 1303 | Basics |
| 1304 | ------ | ||
| 1298 | 1305 | ||
| 1299 | cgroup namespace provides a mechanism to virtualize the view of the | 1306 | cgroup namespace provides a mechanism to virtualize the view of the |
| 1300 | "/proc/$PID/cgroup" file and cgroup mounts. The CLONE_NEWCGROUP clone | 1307 | "/proc/$PID/cgroup" file and cgroup mounts. The CLONE_NEWCGROUP clone |
| @@ -1308,7 +1315,7 @@ Without cgroup namespace, the "/proc/$PID/cgroup" file shows the | |||
| 1308 | complete path of the cgroup of a process. In a container setup where | 1315 | complete path of the cgroup of a process. In a container setup where |
| 1309 | a set of cgroups and namespaces are intended to isolate processes the | 1316 | a set of cgroups and namespaces are intended to isolate processes the |
| 1310 | "/proc/$PID/cgroup" file may leak potential system level information | 1317 | "/proc/$PID/cgroup" file may leak potential system level information |
| 1311 | to the isolated processes. For example: | 1318 | to the isolated processes. For example:: |
| 1312 | 1319 | ||
| 1313 | # cat /proc/self/cgroup | 1320 | # cat /proc/self/cgroup |
| 1314 | 0::/batchjobs/container_id1 | 1321 | 0::/batchjobs/container_id1 |
| @@ -1316,14 +1323,14 @@ to the isolated processes. For Example: | |||
| 1316 | The path '/batchjobs/container_id1' can be considered as system-data | 1323 | The path '/batchjobs/container_id1' can be considered as system-data |
| 1317 | and undesirable to expose to the isolated processes. cgroup namespace | 1324 | and undesirable to expose to the isolated processes. cgroup namespace |
| 1318 | can be used to restrict visibility of this path. For example, before | 1325 | can be used to restrict visibility of this path. For example, before |
| 1319 | creating a cgroup namespace, one would see: | 1326 | creating a cgroup namespace, one would see:: |
| 1320 | 1327 | ||
| 1321 | # ls -l /proc/self/ns/cgroup | 1328 | # ls -l /proc/self/ns/cgroup |
| 1322 | lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835] | 1329 | lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835] |
| 1323 | # cat /proc/self/cgroup | 1330 | # cat /proc/self/cgroup |
| 1324 | 0::/batchjobs/container_id1 | 1331 | 0::/batchjobs/container_id1 |
| 1325 | 1332 | ||
| 1326 | After unsharing a new namespace, the view changes. | 1333 | After unsharing a new namespace, the view changes:: |
| 1327 | 1334 | ||
| 1328 | # ls -l /proc/self/ns/cgroup | 1335 | # ls -l /proc/self/ns/cgroup |
| 1329 | lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183] | 1336 | lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183] |
| @@ -1341,7 +1348,8 @@ namespace is destroyed. The cgroupns root and the actual cgroups | |||
| 1341 | remain. | 1348 | remain. |
| 1342 | 1349 | ||
| 1343 | 1350 | ||
| 1344 | 6-2. The Root and Views | 1351 | The Root and Views |
| 1352 | ------------------ | ||
| 1345 | 1353 | ||
| 1346 | The 'cgroupns root' for a cgroup namespace is the cgroup in which the | 1354 | The 'cgroupns root' for a cgroup namespace is the cgroup in which the |
| 1347 | process calling unshare(2) is running. For example, if a process in | 1355 | process calling unshare(2) is running. For example, if a process in |
| @@ -1350,7 +1358,7 @@ process calling unshare(2) is running. For example, if a process in | |||
| 1350 | init_cgroup_ns, this is the real root ('/') cgroup. | 1358 | init_cgroup_ns, this is the real root ('/') cgroup. |
| 1351 | 1359 | ||
| 1352 | The cgroupns root cgroup does not change even if the namespace creator | 1360 | The cgroupns root cgroup does not change even if the namespace creator |
| 1353 | process later moves to a different cgroup. | 1361 | process later moves to a different cgroup:: |
| 1354 | 1362 | ||
| 1355 | # ~/unshare -c # unshare cgroupns in some cgroup | 1363 | # ~/unshare -c # unshare cgroupns in some cgroup |
| 1356 | # cat /proc/self/cgroup | 1364 | # cat /proc/self/cgroup |
| @@ -1364,7 +1372,7 @@ Each process gets its namespace-specific view of "/proc/$PID/cgroup" | |||
| 1364 | 1372 | ||
| 1365 | Processes running inside the cgroup namespace will be able to see | 1373 | Processes running inside the cgroup namespace will be able to see |
| 1366 | cgroup paths (in /proc/self/cgroup) only inside their root cgroup. | 1374 | cgroup paths (in /proc/self/cgroup) only inside their root cgroup. |
| 1367 | From within an unshared cgroupns: | 1375 | From within an unshared cgroupns:: |
| 1368 | 1376 | ||
| 1369 | # sleep 100000 & | 1377 | # sleep 100000 & |
| 1370 | [1] 7353 | 1378 | [1] 7353 |
| @@ -1373,7 +1381,7 @@ From within an unshared cgroupns: | |||
| 1373 | 0::/sub_cgrp_1 | 1381 | 0::/sub_cgrp_1 |
| 1374 | 1382 | ||
| 1375 | From the initial cgroup namespace, the real cgroup path will be | 1383 | From the initial cgroup namespace, the real cgroup path will be |
| 1376 | visible: | 1384 | visible:: |
| 1377 | 1385 | ||
| 1378 | $ cat /proc/7353/cgroup | 1386 | $ cat /proc/7353/cgroup |
| 1379 | 0::/batchjobs/container_id1/sub_cgrp_1 | 1387 | 0::/batchjobs/container_id1/sub_cgrp_1 |
| @@ -1381,7 +1389,7 @@ visible: | |||
| 1381 | From a sibling cgroup namespace (that is, a namespace rooted at a | 1389 | From a sibling cgroup namespace (that is, a namespace rooted at a |
| 1382 | different cgroup), the cgroup path relative to its own cgroup | 1390 | different cgroup), the cgroup path relative to its own cgroup |
| 1383 | namespace root will be shown. For instance, if PID 7353's cgroup | 1391 | namespace root will be shown. For instance, if PID 7353's cgroup |
| 1384 | namespace root is at '/batchjobs/container_id2', then it will see | 1392 | namespace root is at '/batchjobs/container_id2', then it will see:: |
| 1385 | 1393 | ||
| 1386 | # cat /proc/7353/cgroup | 1394 | # cat /proc/7353/cgroup |
| 1387 | 0::/../container_id2/sub_cgrp_1 | 1395 | 0::/../container_id2/sub_cgrp_1 |
| @@ -1390,13 +1398,14 @@ Note that the relative path always starts with '/' to indicate that | |||
| 1390 | it is relative to the cgroup namespace root of the caller. | 1398 | it is relative to the cgroup namespace root of the caller. |
| 1391 | 1399 | ||
| 1392 | 1400 | ||
| 1393 | 6-3. Migration and setns(2) | 1401 | Migration and setns(2) |
| 1402 | ---------------------- | ||
| 1394 | 1403 | ||
| 1395 | Processes inside a cgroup namespace can move into and out of the | 1404 | Processes inside a cgroup namespace can move into and out of the |
| 1396 | namespace root if they have proper access to external cgroups. For | 1405 | namespace root if they have proper access to external cgroups. For |
| 1397 | example, from inside a namespace with cgroupns root at | 1406 | example, from inside a namespace with cgroupns root at |
| 1398 | /batchjobs/container_id1, and assuming that the global hierarchy is | 1407 | /batchjobs/container_id1, and assuming that the global hierarchy is |
| 1399 | still accessible inside cgroupns: | 1408 | still accessible inside cgroupns:: |
| 1400 | 1409 | ||
| 1401 | # cat /proc/7353/cgroup | 1410 | # cat /proc/7353/cgroup |
| 1402 | 0::/sub_cgrp_1 | 1411 | 0::/sub_cgrp_1 |
| @@ -1418,10 +1427,11 @@ namespace. It is expected that the someone moves the attaching | |||
| 1418 | process under the target cgroup namespace root. | 1427 | process under the target cgroup namespace root. |
| 1419 | 1428 | ||
| 1420 | 1429 | ||
| 1421 | 6-4. Interaction with Other Namespaces | 1430 | Interaction with Other Namespaces |
| 1431 | --------------------------------- | ||
| 1422 | 1432 | ||
| 1423 | Namespace specific cgroup hierarchy can be mounted by a process | 1433 | Namespace specific cgroup hierarchy can be mounted by a process |
| 1424 | running inside a non-init cgroup namespace. | 1434 | running inside a non-init cgroup namespace:: |
| 1425 | 1435 | ||
| 1426 | # mount -t cgroup2 none $MOUNT_POINT | 1436 | # mount -t cgroup2 none $MOUNT_POINT |
| 1427 | 1437 | ||
| @@ -1434,27 +1444,27 @@ the view of cgroup hierarchy by namespace-private cgroupfs mount | |||
| 1434 | provides a properly isolated cgroup view inside the container. | 1444 | provides a properly isolated cgroup view inside the container. |
| 1435 | 1445 | ||
| 1436 | 1446 | ||
| 1437 | P. Information on Kernel Programming | 1447 | Information on Kernel Programming |
| 1448 | ================================= | ||
| 1438 | 1449 | ||
| 1439 | This section contains kernel programming information in the areas | 1450 | This section contains kernel programming information in the areas |
| 1440 | where interacting with cgroup is necessary. cgroup core and | 1451 | where interacting with cgroup is necessary. cgroup core and |
| 1441 | controllers are not covered. | 1452 | controllers are not covered. |
| 1442 | 1453 | ||
| 1443 | 1454 | ||
| 1444 | P-1. Filesystem Support for Writeback | 1455 | Filesystem Support for Writeback |
| 1456 | -------------------------------- | ||
| 1445 | 1457 | ||
| 1446 | A filesystem can support cgroup writeback by updating | 1458 | A filesystem can support cgroup writeback by updating |
| 1447 | address_space_operations->writepage[s]() to annotate bio's using the | 1459 | address_space_operations->writepage[s]() to annotate bio's using the |
| 1448 | following two functions. | 1460 | following two functions. |
| 1449 | 1461 | ||
| 1450 | wbc_init_bio(@wbc, @bio) | 1462 | wbc_init_bio(@wbc, @bio) |
| 1451 | |||
| 1452 | Should be called for each bio carrying writeback data and | 1463 | Should be called for each bio carrying writeback data and |
| 1453 | associates the bio with the inode's owner cgroup. Can be | 1464 | associates the bio with the inode's owner cgroup. Can be |
| 1454 | called anytime between bio allocation and submission. | 1465 | called anytime between bio allocation and submission. |
| 1455 | 1466 | ||
| 1456 | wbc_account_io(@wbc, @page, @bytes) | 1467 | wbc_account_io(@wbc, @page, @bytes) |
| 1457 | |||
| 1458 | Should be called for each data segment being written out. | 1468 | Should be called for each data segment being written out. |
| 1459 | While this function doesn't care exactly when it's called | 1469 | While this function doesn't care exactly when it's called |
| 1460 | during the writeback session, it's the easiest and most | 1470 | during the writeback session, it's the easiest and most |
| @@ -1475,7 +1485,8 @@ cases by skipping wbc_init_bio() or using bio_associate_blkcg() | |||
| 1475 | directly. | 1485 | directly. |
| 1476 | 1486 | ||
| 1477 | 1487 | ||
| 1478 | D. Deprecated v1 Core Features | 1488 | Deprecated v1 Core Features |
| 1489 | =========================== | ||
| 1479 | 1490 | ||
| 1480 | - Multiple hierarchies including named ones are not supported. | 1491 | - Multiple hierarchies including named ones are not supported. |
| 1481 | 1492 | ||
| @@ -1489,9 +1500,11 @@ D. Deprecated v1 Core Features | |||
| 1489 | at the root instead. | 1500 | at the root instead. |
| 1490 | 1501 | ||
| 1491 | 1502 | ||
| 1492 | R. Issues with v1 and Rationales for v2 | 1503 | Issues with v1 and Rationales for v2 |
| 1504 | ==================================== | ||
| 1493 | 1505 | ||
| 1494 | R-1. Multiple Hierarchies | 1506 | Multiple Hierarchies |
| 1507 | -------------------- | ||
| 1495 | 1508 | ||
| 1496 | cgroup v1 allowed an arbitrary number of hierarchies and each | 1509 | cgroup v1 allowed an arbitrary number of hierarchies and each |
| 1497 | hierarchy could host any number of controllers. While this seemed to | 1510 | hierarchy could host any number of controllers. While this seemed to |
| @@ -1543,7 +1556,8 @@ how memory is distributed beyond a certain level while still wanting | |||
| 1543 | to control how CPU cycles are distributed. | 1556 | to control how CPU cycles are distributed. |
| 1544 | 1557 | ||
| 1545 | 1558 | ||
| 1546 | R-2. Thread Granularity | 1559 | Thread Granularity |
| 1560 | ------------------ | ||
| 1547 | 1561 | ||
| 1548 | cgroup v1 allowed threads of a process to belong to different cgroups. | 1562 | cgroup v1 allowed threads of a process to belong to different cgroups. |
| 1549 | This didn't make sense for some controllers and those controllers | 1563 | This didn't make sense for some controllers and those controllers |
| @@ -1586,7 +1600,8 @@ misbehaving and poorly abstracted interfaces and kernel exposing and | |||
| 1586 | locked into constructs inadvertently. | 1600 | locked into constructs inadvertently. |
| 1587 | 1601 | ||
| 1588 | 1602 | ||
| 1589 | R-3. Competition Between Inner Nodes and Threads | 1603 | Competition Between Inner Nodes and Threads |
| 1604 | ------------------------------------------- | ||
| 1590 | 1605 | ||
| 1591 | cgroup v1 allowed threads to be in any cgroups which created an | 1606 | cgroup v1 allowed threads to be in any cgroups which created an |
| 1592 | interesting problem where threads belonging to a parent cgroup and its | 1607 | interesting problem where threads belonging to a parent cgroup and its |
| @@ -1605,7 +1620,7 @@ simply weren't available for threads. | |||
| 1605 | 1620 | ||
| 1606 | The io controller implicitly created a hidden leaf node for each | 1621 | The io controller implicitly created a hidden leaf node for each |
| 1607 | cgroup to host the threads. The hidden leaf had its own copies of all | 1622 | cgroup to host the threads. The hidden leaf had its own copies of all |
| 1608 | the knobs with "leaf_" prefixed. While this allowed equivalent | 1623 | the knobs with ``leaf_`` prefixed. While this allowed equivalent |
| 1609 | control over internal threads, it was with serious drawbacks. It | 1624 | control over internal threads, it was with serious drawbacks. It |
| 1610 | always added an extra layer of nesting which wouldn't be necessary | 1625 | always added an extra layer of nesting which wouldn't be necessary |
| 1611 | otherwise, made the interface messy and significantly complicated the | 1626 | otherwise, made the interface messy and significantly complicated the |
| @@ -1626,7 +1641,8 @@ This clearly is a problem which needs to be addressed from cgroup core | |||
| 1626 | in a uniform way. | 1641 | in a uniform way. |
| 1627 | 1642 | ||
| 1628 | 1643 | ||
| 1629 | R-4. Other Interface Issues | 1644 | Other Interface Issues |
| 1645 | ---------------------- | ||
| 1630 | 1646 | ||
| 1631 | cgroup v1 grew without oversight and developed a large number of | 1647 | cgroup v1 grew without oversight and developed a large number of |
| 1632 | idiosyncrasies and inconsistencies. One issue on the cgroup core side | 1648 | idiosyncrasies and inconsistencies. One issue on the cgroup core side |
| @@ -1654,9 +1670,11 @@ cgroup v2 establishes common conventions where appropriate and updates | |||
| 1654 | controllers so that they expose minimal and consistent interfaces. | 1670 | controllers so that they expose minimal and consistent interfaces. |
| 1655 | 1671 | ||
| 1656 | 1672 | ||
| 1657 | R-5. Controller Issues and Remedies | 1673 | Controller Issues and Remedies |
| 1674 | ------------------------------ | ||
| 1658 | 1675 | ||
| 1659 | R-5-1. Memory | 1676 | Memory |
| 1677 | ~~~~~~ | ||
| 1660 | 1678 | ||
| 1661 | The original lower boundary, the soft limit, is defined as a limit | 1679 | The original lower boundary, the soft limit, is defined as a limit |
| 1662 | that is per default unset. As a result, the set of cgroups that | 1680 | that is per default unset. As a result, the set of cgroups that |
