Diffstat (limited to 'Documentation/cgroups/cgroups.txt')
-rw-r--r-- | Documentation/cgroups/cgroups.txt | 143 |
1 files changed, 99 insertions, 44 deletions
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index b34823ff1646..cd67e90003c0 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt | |||
@@ -18,7 +18,8 @@ CONTENTS: | |||
18 | 1.2 Why are cgroups needed ? | 18 | 1.2 Why are cgroups needed ? |
19 | 1.3 How are cgroups implemented ? | 19 | 1.3 How are cgroups implemented ? |
20 | 1.4 What does notify_on_release do ? | 20 | 1.4 What does notify_on_release do ? |
21 | 1.5 How do I use cgroups ? | 21 | 1.5 What does clone_children do ? |
22 | 1.6 How do I use cgroups ? | ||
22 | 2. Usage Examples and Syntax | 23 | 2. Usage Examples and Syntax |
23 | 2.1 Basic Usage | 24 | 2.1 Basic Usage |
24 | 2.2 Attaching processes | 25 | 2.2 Attaching processes |
@@ -109,22 +110,22 @@ university server with various users - students, professors, system | |||
109 | tasks etc. The resource planning for this server could be along the | 110 | tasks etc. The resource planning for this server could be along the |
110 | following lines: | 111 | following lines: |
111 | 112 | ||
112 | CPU : Top cpuset | 113 | CPU : "Top cpuset" |
113 | / \ | 114 | / \ |
114 | CPUSet1 CPUSet2 | 115 | CPUSet1 CPUSet2 |
115 | | | | 116 | | | |
116 | (Profs) (Students) | 117 | (Professors) (Students) |
117 | 118 | ||
118 | In addition (system tasks) are attached to topcpuset (so | 119 | In addition (system tasks) are attached to topcpuset (so |
119 | that they can run anywhere) with a limit of 20% | 120 | that they can run anywhere) with a limit of 20% |
120 | 121 | ||
121 | Memory : Professors (50%), students (30%), system (20%) | 122 | Memory : Professors (50%), Students (30%), system (20%) |
122 | 123 | ||
123 | Disk : Prof (50%), students (30%), system (20%) | 124 | Disk : Professors (50%), Students (30%), system (20%) |
124 | 125 | ||
125 | Network : WWW browsing (20%), Network File System (60%), others (20%) | 126 | Network : WWW browsing (20%), Network File System (60%), others (20%) |
126 | / \ | 127 | / \ |
127 | Prof (15%) students (5%) | 128 | Professors (15%) students (5%) |
128 | 129 | ||
129 | Browsers like Firefox/Lynx go into the WWW network class, while (k)nfsd go | 130 | Browsers like Firefox/Lynx go into the WWW network class, while (k)nfsd go |
130 | into NFS network class. | 131 | into NFS network class. |
@@ -137,11 +138,11 @@ With the ability to classify tasks differently for different resources | |||
137 | the admin can easily set up a script which receives exec notifications | 138 | the admin can easily set up a script which receives exec notifications |
138 | and depending on who is launching the browser he can | 139 | and depending on who is launching the browser he can |
139 | 140 | ||
140 | # echo browser_pid > /mnt/<restype>/<userclass>/tasks | 141 | # echo browser_pid > /sys/fs/cgroup/<restype>/<userclass>/tasks |
141 | 142 | ||
142 | With only a single hierarchy, he now would potentially have to create | 143 | With only a single hierarchy, he now would potentially have to create |
143 | a separate cgroup for every browser launched and associate it with | 144 | a separate cgroup for every browser launched and associate it with |
144 | approp network and other resource class. This may lead to | 145 | appropriate network and other resource class. This may lead to |
145 | proliferation of such cgroups. | 146 | proliferation of such cgroups. |
146 | 147 | ||
147 | Also, let's say that the administrator would like to give enhanced network | 148 | Also, let's say that the administrator would like to give enhanced network |
@@ -152,9 +153,9 @@ apps enhanced CPU power, | |||
152 | With ability to write pids directly to resource classes, it's just a | 153 | With ability to write pids directly to resource classes, it's just a |
153 | matter of : | 154 | matter of : |
154 | 155 | ||
155 | # echo pid > /mnt/network/<new_class>/tasks | 156 | # echo pid > /sys/fs/cgroup/network/<new_class>/tasks |
156 | (after some time) | 157 | (after some time) |
157 | # echo pid > /mnt/network/<orig_class>/tasks | 158 | # echo pid > /sys/fs/cgroup/network/<orig_class>/tasks |
158 | 159 | ||
159 | Without this ability, he would have to split the cgroup into | 160 | Without this ability, he would have to split the cgroup into |
160 | multiple separate ones and then associate the new cgroups with the | 161 | multiple separate ones and then associate the new cgroups with the |
@@ -235,7 +236,8 @@ containing the following files describing that cgroup: | |||
235 | - cgroup.procs: list of tgids in the cgroup. This list is not | 236 | - cgroup.procs: list of tgids in the cgroup. This list is not |
236 | guaranteed to be sorted or free of duplicate tgids, and userspace | 237 | guaranteed to be sorted or free of duplicate tgids, and userspace |
237 | should sort/uniquify the list if this property is required. | 238 | should sort/uniquify the list if this property is required. |
238 | This is a read-only file, for now. | 239 | Writing a thread group id into this file moves all threads in that |
240 | group into this cgroup. | ||
239 | - notify_on_release flag: run the release agent on exit? | 241 | - notify_on_release flag: run the release agent on exit? |
240 | - release_agent: the path to use for release notifications (this file | 242 | - release_agent: the path to use for release notifications (this file |
241 | exists in the top cgroup only) | 243 | exists in the top cgroup only) |
@@ -293,27 +295,39 @@ notify_on_release in the root cgroup at system boot is disabled | |||
293 | value of their parent's notify_on_release setting. The default value of | 295 | value of their parent's notify_on_release setting. The default value of |
294 | a cgroup hierarchy's release_agent path is empty. | 296 | a cgroup hierarchy's release_agent path is empty. |
295 | 297 | ||
296 | 1.5 How do I use cgroups ? | 298 | 1.5 What does clone_children do ? |
299 | --------------------------------- | ||
300 | |||
301 | If the clone_children flag is enabled (1) in a cgroup, then each | ||
302 | cgroup created beneath it will call the post_clone callback of every | ||
303 | subsystem of the newly created cgroup. Usually when this callback is | ||
304 | implemented for a subsystem, it copies the values of the parent | ||
305 | subsystem; this is the case for the cpuset. | ||
306 | |||
307 | 1.6 How do I use cgroups ? | ||
297 | -------------------------- | 308 | -------------------------- |
298 | 309 | ||
299 | To start a new job that is to be contained within a cgroup, using | 310 | To start a new job that is to be contained within a cgroup, using |
300 | the "cpuset" cgroup subsystem, the steps are something like: | 311 | the "cpuset" cgroup subsystem, the steps are something like: |
301 | 312 | ||
302 | 1) mkdir /dev/cgroup | 313 | 1) mount -t tmpfs cgroup_root /sys/fs/cgroup |
303 | 2) mount -t cgroup -ocpuset cpuset /dev/cgroup | 314 | 2) mkdir /sys/fs/cgroup/cpuset |
304 | 3) Create the new cgroup by doing mkdir's and write's (or echo's) in | 315 | 3) mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset |
305 | the /dev/cgroup virtual file system. | 316 | 4) Create the new cgroup by doing mkdir's and write's (or echo's) in |
306 | 4) Start a task that will be the "founding father" of the new job. | 317 | the /sys/fs/cgroup virtual file system. |
307 | 5) Attach that task to the new cgroup by writing its pid to the | 318 | 5) Start a task that will be the "founding father" of the new job. |
308 | /dev/cgroup tasks file for that cgroup. | 319 | 6) Attach that task to the new cgroup by writing its pid to the |
309 | 6) fork, exec or clone the job tasks from this founding father task. | 320 | /sys/fs/cgroup/cpuset/tasks file for that cgroup. |
321 | 7) fork, exec or clone the job tasks from this founding father task. | ||
310 | 322 | ||
311 | For example, the following sequence of commands will setup a cgroup | 323 | For example, the following sequence of commands will setup a cgroup |
312 | named "Charlie", containing just CPUs 2 and 3, and Memory Node 1, | 324 | named "Charlie", containing just CPUs 2 and 3, and Memory Node 1, |
313 | and then start a subshell 'sh' in that cgroup: | 325 | and then start a subshell 'sh' in that cgroup: |
314 | 326 | ||
315 | mount -t cgroup cpuset -ocpuset /dev/cgroup | 327 | mount -t tmpfs cgroup_root /sys/fs/cgroup |
316 | cd /dev/cgroup | 328 | mkdir /sys/fs/cgroup/cpuset |
329 | mount -t cgroup cpuset -ocpuset /sys/fs/cgroup/cpuset | ||
330 | cd /sys/fs/cgroup/cpuset | ||
317 | mkdir Charlie | 331 | mkdir Charlie |
318 | cd Charlie | 332 | cd Charlie |
319 | /bin/echo 2-3 > cpuset.cpus | 333 | /bin/echo 2-3 > cpuset.cpus |
@@ -334,28 +348,41 @@ Creating, modifying, using the cgroups can be done through the cgroup | |||
334 | virtual filesystem. | 348 | virtual filesystem. |
335 | 349 | ||
336 | To mount a cgroup hierarchy with all available subsystems, type: | 350 | To mount a cgroup hierarchy with all available subsystems, type: |
337 | # mount -t cgroup xxx /dev/cgroup | 351 | # mount -t cgroup xxx /sys/fs/cgroup |
338 | 352 | ||
339 | The "xxx" is not interpreted by the cgroup code, but will appear in | 353 | The "xxx" is not interpreted by the cgroup code, but will appear in |
340 | /proc/mounts so may be any useful identifying string that you like. | 354 | /proc/mounts so may be any useful identifying string that you like. |
341 | 355 | ||
356 | Note: Some subsystems do not work without some user input first. For instance, | ||
357 | if cpusets are enabled the user will have to populate the cpus and mems files | ||
358 | for each new cgroup created before that group can be used. | ||
359 | |||
360 | As explained in section `1.2 Why are cgroups needed?' you should create | ||
361 | different hierarchies of cgroups for each single resource or group of | ||
362 | resources you want to control. Therefore, you should mount a tmpfs on | ||
363 | /sys/fs/cgroup and create directories for each cgroup resource or resource | ||
364 | group. | ||
365 | |||
366 | # mount -t tmpfs cgroup_root /sys/fs/cgroup | ||
367 | # mkdir /sys/fs/cgroup/rg1 | ||
368 | |||
342 | To mount a cgroup hierarchy with just the cpuset and memory | 369 | To mount a cgroup hierarchy with just the cpuset and memory |
343 | subsystems, type: | 370 | subsystems, type: |
344 | # mount -t cgroup -o cpuset,memory hier1 /dev/cgroup | 371 | # mount -t cgroup -o cpuset,memory hier1 /sys/fs/cgroup/rg1 |
345 | 372 | ||
346 | To change the set of subsystems bound to a mounted hierarchy, just | 373 | To change the set of subsystems bound to a mounted hierarchy, just |
347 | remount with different options: | 374 | remount with different options: |
348 | # mount -o remount,cpuset,ns hier1 /dev/cgroup | 375 | # mount -o remount,cpuset,blkio hier1 /sys/fs/cgroup/rg1 |
349 | 376 | ||
350 | Now memory is removed from the hierarchy and ns is added. | 377 | Now memory is removed from the hierarchy and blkio is added. |
351 | 378 | ||
352 | Note this will add ns to the hierarchy but won't remove memory or | 379 | Note this will add blkio to the hierarchy but won't remove memory or |
353 | cpuset, because the new options are appended to the old ones: | 380 | cpuset, because the new options are appended to the old ones: |
354 | # mount -o remount,ns /dev/cgroup | 381 | # mount -o remount,blkio /sys/fs/cgroup/rg1 |
355 | 382 | ||
356 | To specify a hierarchy's release_agent: | 383 | To specify a hierarchy's release_agent: |
357 | # mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \ | 384 | # mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \ |
358 | xxx /dev/cgroup | 385 | xxx /sys/fs/cgroup/rg1 |
359 | 386 | ||
360 | Note that specifying 'release_agent' more than once will return failure. | 387 | Note that specifying 'release_agent' more than once will return failure. |
361 | 388 | ||
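The layout this hunk introduces (a tmpfs on /sys/fs/cgroup with one directory and one cgroup mount per resource group) can be sketched as a small dry-run helper. This is only an illustration under stated assumptions: `plan_hierarchies` and its `name:subsystems` argument format are hypothetical, the helper prints the commands rather than running them, and it reuses the directory name as the mount's device string, which the text above says is not interpreted by the cgroup code.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the commands that set up one
# cgroup hierarchy per resource group under a tmpfs root.
# plan_hierarchies and the "name:subsystems" format are hypothetical.

plan_hierarchies() {
    root=$1
    shift
    echo "mount -t tmpfs cgroup_root $root"
    for spec in "$@"; do
        name=${spec%%:*}    # text before the first ':' is the directory name
        subsys=${spec#*:}   # text after it is the comma-separated subsystem list
        echo "mkdir $root/$name"
        echo "mount -t cgroup -o $subsys $name $root/$name"
    done
}

# Print the plan for the rg1 example used in this hunk.
plan_hierarchies /sys/fs/cgroup rg1:cpuset,memory
```

Reviewing the printed plan before running it as root makes it easier to check that each subsystem is bound to exactly one hierarchy.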
@@ -364,17 +391,17 @@ when the hierarchy consists of a single (root) cgroup. Supporting | |||
364 | the ability to arbitrarily bind/unbind subsystems from an existing | 391 | the ability to arbitrarily bind/unbind subsystems from an existing |
365 | cgroup hierarchy is intended to be implemented in the future. | 392 | cgroup hierarchy is intended to be implemented in the future. |
366 | 393 | ||
367 | Then under /dev/cgroup you can find a tree that corresponds to the | 394 | Then under /sys/fs/cgroup/rg1 you can find a tree that corresponds to the |
368 | tree of the cgroups in the system. For instance, /dev/cgroup | 395 | tree of the cgroups in the system. For instance, /sys/fs/cgroup/rg1 |
369 | is the cgroup that holds the whole system. | 396 | is the cgroup that holds the whole system. |
370 | 397 | ||
371 | If you want to change the value of release_agent: | 398 | If you want to change the value of release_agent: |
372 | # echo "/sbin/new_release_agent" > /dev/cgroup/release_agent | 399 | # echo "/sbin/new_release_agent" > /sys/fs/cgroup/rg1/release_agent |
373 | 400 | ||
374 | It can also be changed via remount. | 401 | It can also be changed via remount. |
375 | 402 | ||
376 | If you want to create a new cgroup under /dev/cgroup: | 403 | If you want to create a new cgroup under /sys/fs/cgroup/rg1: |
377 | # cd /dev/cgroup | 404 | # cd /sys/fs/cgroup/rg1 |
378 | # mkdir my_cgroup | 405 | # mkdir my_cgroup |
379 | 406 | ||
380 | Now you want to do something with this cgroup. | 407 | Now you want to do something with this cgroup. |
@@ -416,6 +443,20 @@ You can attach the current shell task by echoing 0: | |||
416 | 443 | ||
417 | # echo 0 > tasks | 444 | # echo 0 > tasks |
418 | 445 | ||
446 | You can use the cgroup.procs file instead of the tasks file to move all | ||
447 | threads in a threadgroup at once. Echoing the pid of any task in a | ||
448 | threadgroup to cgroup.procs causes all tasks in that threadgroup to be | ||
449 | attached to the cgroup. Writing 0 to cgroup.procs moves all tasks | ||
450 | in the writing task's threadgroup. | ||
451 | |||
452 | Note: Since every task is always a member of exactly one cgroup in each | ||
453 | mounted hierarchy, to remove a task from its current cgroup you must | ||
454 | move it into a new cgroup (possibly the root cgroup) by writing to the | ||
455 | new cgroup's tasks file. | ||
456 | |||
457 | Note: If the ns cgroup is active, moving a process to another cgroup can | ||
458 | fail. | ||
459 | |||
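The difference between the two files described in this hunk can be summarised with a small dry-run sketch. The helper name `move_cmd`, its `thread`/`process` modes, and the pid 1234 are hypothetical; the helper only prints the write that would be performed, so it can be tried without a mounted hierarchy.

```shell
#!/bin/sh
# Dry-run sketch: print the write that moves either a single thread
# (tasks) or a whole threadgroup (cgroup.procs) into a cgroup.
# move_cmd and its mode names are hypothetical.

move_cmd() {
    case $1 in
        thread)  file=tasks ;;          # moves only the given task
        process) file=cgroup.procs ;;   # moves every thread in the group
        *) echo "usage: move_cmd thread|process ID CGROUP_DIR" >&2
           return 1 ;;
    esac
    echo "echo $2 > $3/$file"
}

# Move one thread, then the writing task's whole threadgroup (id 0).
move_cmd thread 1234 /sys/fs/cgroup/rg1/my_cgroup
move_cmd process 0 /sys/fs/cgroup/rg1/my_cgroup
```

Because a task is always in exactly one cgroup per hierarchy, the same write is also how a task is removed from its current cgroup.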
419 | 2.3 Mounting hierarchies by name | 460 | 2.3 Mounting hierarchies by name |
420 | -------------------------------- | 461 | -------------------------------- |
421 | 462 | ||
@@ -553,7 +594,7 @@ rmdir() will fail with it. From this behavior, pre_destroy() can be | |||
553 | called multiple times against a cgroup. | 594 | called multiple times against a cgroup. |
554 | 595 | ||
555 | int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, | 596 | int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, |
556 | struct task_struct *task, bool threadgroup) | 597 | struct task_struct *task) |
557 | (cgroup_mutex held by caller) | 598 | (cgroup_mutex held by caller) |
558 | 599 | ||
559 | Called prior to moving a task into a cgroup; if the subsystem | 600 | Called prior to moving a task into a cgroup; if the subsystem |
@@ -562,9 +603,14 @@ task is passed, then a successful result indicates that *any* | |||
562 | unspecified task can be moved into the cgroup. Note that this isn't | 603 | unspecified task can be moved into the cgroup. Note that this isn't |
563 | called on a fork. If this method returns 0 (success) then this should | 604 | called on a fork. If this method returns 0 (success) then this should |
564 | remain valid while the caller holds cgroup_mutex and it is ensured that either | 605 | remain valid while the caller holds cgroup_mutex and it is ensured that either |
565 | attach() or cancel_attach() will be called in future. If threadgroup is | 606 | attach() or cancel_attach() will be called in future. |
566 | true, then a successful result indicates that all threads in the given | 607 | |
567 | thread's threadgroup can be moved together. | 608 | int can_attach_task(struct cgroup *cgrp, struct task_struct *tsk); |
609 | (cgroup_mutex held by caller) | ||
610 | |||
611 | As can_attach, but for operations that must be run once per task to be | ||
612 | attached (possibly many when using cgroup_attach_proc). Called after | ||
613 | can_attach. | ||
568 | 614 | ||
569 | void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, | 615 | void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, |
570 | struct task_struct *task, bool threadgroup) | 616 | struct task_struct *task, bool threadgroup) |
@@ -576,15 +622,24 @@ function, so that the subsystem can implement a rollback. If not, not necessary. | |||
576 | This will be called only for subsystems whose can_attach() operation has | 622 | This will be called only for subsystems whose can_attach() operation has |
577 | succeeded. | 623 | succeeded. |
578 | 624 | ||
625 | void pre_attach(struct cgroup *cgrp); | ||
626 | (cgroup_mutex held by caller) | ||
627 | |||
628 | For any non-per-thread attachment work that needs to happen before | ||
629 | attach_task. Needed by cpuset. | ||
630 | |||
579 | void attach(struct cgroup_subsys *ss, struct cgroup *cgrp, | 631 | void attach(struct cgroup_subsys *ss, struct cgroup *cgrp, |
580 | struct cgroup *old_cgrp, struct task_struct *task, | 632 | struct cgroup *old_cgrp, struct task_struct *task) |
581 | bool threadgroup) | ||
582 | (cgroup_mutex held by caller) | 633 | (cgroup_mutex held by caller) |
583 | 634 | ||
584 | Called after the task has been attached to the cgroup, to allow any | 635 | Called after the task has been attached to the cgroup, to allow any |
585 | post-attachment activity that requires memory allocations or blocking. | 636 | post-attachment activity that requires memory allocations or blocking. |
586 | If threadgroup is true, the subsystem should take care of all threads | 637 | |
587 | in the specified thread's threadgroup. Currently does not support any | 638 | void attach_task(struct cgroup *cgrp, struct task_struct *tsk); |
639 | (cgroup_mutex held by caller) | ||
640 | |||
641 | As attach, but for operations that must be run once per task to be attached, | ||
642 | like can_attach_task. Called before attach. Currently does not support any | ||
588 | subsystem that might need the old_cgrp for every thread in the group. | 643 | subsystem that might need the old_cgrp for every thread in the group. |
589 | 644 | ||
590 | void fork(struct cgroup_subsys *ss, struct task_struct *task) | 645 | void fork(struct cgroup_subsys *ss, struct task_struct *task) |
@@ -608,7 +663,7 @@ always handled well. | |||
608 | void post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp) | 663 | void post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp) |
609 | (cgroup_mutex held by caller) | 664 | (cgroup_mutex held by caller) |
610 | 665 | ||
611 | Called at the end of cgroup_clone() to do any parameter | 666 | Called during cgroup_create() to do any parameter |
612 | initialization which might be required before a task could attach. For | 667 | initialization which might be required before a task could attach. For |
613 | example in cpusets, no task may attach before 'cpus' and 'mems' are set | 668 | example in cpusets, no task may attach before 'cpus' and 'mems' are set |
614 | up. | 669 | up. |