Diffstat (limited to 'Documentation/cgroup-v1/freezer-subsystem.rst')
| -rw-r--r-- | Documentation/cgroup-v1/freezer-subsystem.rst | 127 |
1 files changed, 127 insertions, 0 deletions
diff --git a/Documentation/cgroup-v1/freezer-subsystem.rst b/Documentation/cgroup-v1/freezer-subsystem.rst
new file mode 100644
index 000000000000..582d3427de3f
--- /dev/null
+++ b/Documentation/cgroup-v1/freezer-subsystem.rst
| @@ -0,0 +1,127 @@ | |||
| 1 | ============== | ||
| 2 | Cgroup Freezer | ||
| 3 | ============== | ||
| 4 | |||
| 5 | The cgroup freezer is useful to batch job management systems which start | |
| 6 | and stop sets of tasks in order to schedule the resources of a machine | ||
| 7 | according to the desires of a system administrator. This sort of program | ||
| 8 | is often used on HPC clusters to schedule access to the cluster as a | ||
| 9 | whole. The cgroup freezer uses cgroups to describe the set of tasks to | ||
| 10 | be started/stopped by the batch job management system. It also provides | ||
| 11 | a means to start and stop the tasks composing the job. | ||
| 12 | |||
| 13 | The cgroup freezer will also be useful for checkpointing running groups | ||
| 14 | of tasks. The freezer allows the checkpoint code to obtain a consistent | ||
| 15 | image of the tasks by attempting to force the tasks in a cgroup into a | ||
| 16 | quiescent state. Once the tasks are quiescent, another task can | |
| 17 | walk /proc or invoke a kernel interface to gather information about the | ||
| 18 | quiesced tasks. Checkpointed tasks can be restarted later should a | ||
| 19 | recoverable error occur. This also allows the checkpointed tasks to be | ||
| 20 | migrated between nodes in a cluster by copying the gathered information | ||
| 21 | to another node and restarting the tasks there. | ||
| 22 | |||
| 23 | Sequences of SIGSTOP and SIGCONT are not always sufficient for stopping | ||
| 24 | and resuming tasks in userspace. Both of these signals are observable | ||
| 25 | from within the tasks we wish to freeze. While SIGSTOP cannot be caught, | ||
| 26 | blocked, or ignored, it can be seen by waiting or ptracing parent tasks. | |
| 27 | SIGCONT is especially unsuitable since it can be caught by the task. Any | ||
| 28 | programs designed to watch for SIGSTOP and SIGCONT could be broken by | ||
| 29 | attempting to use SIGSTOP and SIGCONT to stop and resume tasks. We can | ||
| 30 | demonstrate this problem using nested bash shells:: | ||
| 31 | |||
| 32 | $ echo $$ | ||
| 33 | 16644 | ||
| 34 | $ bash | ||
| 35 | $ echo $$ | ||
| 36 | 16690 | ||
| 37 | |||
| 38 | From a second, unrelated bash shell: | ||
| 39 | $ kill -SIGSTOP 16690 | ||
| 40 | $ kill -SIGCONT 16690 | ||
| 41 | |||
| 42 | <at this point 16690 exits and causes 16644 to exit too> | ||
| 43 | |||
| 44 | This happens because bash can observe both signals and choose how it | ||
| 45 | responds to them. | ||
| 46 | |||
| 47 | Another example of a program which catches and responds to these | ||
| 48 | signals is gdb. In fact any program designed to use ptrace is likely to | ||
| 49 | have a problem with this method of stopping and resuming tasks. | ||
| 50 | |||
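The observability of SIGCONT can be seen with a short sketch. It assumes bash is installed; `sigcont_demo` is a hypothetical name, and the inner shell signals itself only to keep the demonstration self-contained. It is an illustration of a task catching SIGCONT, not a reproduction of the nested-shell session above.

```shell
# Sketch: a task that installs a SIGCONT handler notices every resume.
# "sigcont_demo" is a hypothetical helper name used only for illustration.
sigcont_demo() {
    bash -c '
        # The handler runs whenever this shell receives SIGCONT.
        trap "echo resumed: saw SIGCONT" CONT
        # Send SIGCONT to ourselves; a real resume would come from outside.
        kill -CONT $$
        echo done
    '
}
```

Calling `sigcont_demo` prints the handler's message, which shows that a resume delivered with SIGCONT is visible from inside the task. A task resumed by the cgroup freezer receives no such signal.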
| 51 | In contrast, the cgroup freezer uses the kernel freezer code to | ||
| 52 | prevent the freeze/unfreeze cycle from becoming visible to the tasks | ||
| 53 | being frozen. This allows the bash example above and gdb to run as | ||
| 54 | expected. | ||
| 55 | |||
| 56 | The cgroup freezer is hierarchical. Freezing a cgroup freezes all | ||
| 57 | tasks belonging to the cgroup and all its descendant cgroups. Each | ||
| 58 | cgroup has its own state (self-state) and the state inherited from the | ||
| 59 | parent (parent-state). Iff both states are THAWED, the cgroup is | ||
| 60 | THAWED. | ||
| 61 | |||
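The hierarchy rule can be modelled in a few lines of shell. This is only a sketch of the rule as stated above, not a query against a real cgroupfs: `effective_state` is a hypothetical helper, cgroups are written as slash-separated paths, and the extra arguments list the cgroups whose self-state is FROZEN.

```shell
# Sketch: model of the hierarchy rule, not a query of the real cgroupfs.
# Usage: effective_state CGROUP [SELF_FROZEN_CGROUP...]
# Prints THAWED only if neither CGROUP nor any ancestor has a FROZEN self-state.
effective_state() {
    cg=$1
    shift
    for frozen in "$@"; do
        case "$cg/" in
            "$frozen"/*)
                # CGROUP is $frozen itself or one of its descendants.
                echo "FREEZING or FROZEN"
                return 0
                ;;
        esac
    done
    echo THAWED
}
```

For example, `effective_state jobs/batch1 jobs` prints `FREEZING or FROZEN` because the parent-state is inherited from `jobs`, while `effective_state jobs jobs/batch1` prints `THAWED` because freezing a child does not affect its parent.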
| 62 | The following cgroupfs files are created by the cgroup freezer. | |
| 63 | |||
| 64 | * freezer.state: Read-write. | ||
| 65 | |||
| 66 | When read, returns the effective state of the cgroup - "THAWED", | ||
| 67 | "FREEZING" or "FROZEN". This is the combined self and parent-states. | ||
| 68 | If either is freezing, the cgroup is freezing (FREEZING or FROZEN). | |
| 69 | |||
| 70 | A FREEZING cgroup transitions into the FROZEN state when all tasks | |
| 71 | belonging to the cgroup and its descendants become frozen. Note that | ||
| 72 | a cgroup reverts to FREEZING from FROZEN after a new task is added | ||
| 73 | to the cgroup or one of its descendant cgroups until the new task is | ||
| 74 | frozen. | ||
| 75 | |||
| 76 | When written, sets the self-state of the cgroup. Two values are | ||
| 77 | allowed - "FROZEN" and "THAWED". If FROZEN is written, the cgroup, | ||
| 78 | if not already freezing, enters FREEZING state along with all its | ||
| 79 | descendant cgroups. | ||
| 80 | |||
| 81 | If THAWED is written, the self-state of the cgroup is changed to | ||
| 82 | THAWED. Note that the effective state may not change to THAWED if | ||
| 83 | the parent-state is still freezing. If a cgroup's effective state | ||
| 84 | becomes THAWED, all its descendants which are freezing because of | ||
| 85 | the cgroup also leave the freezing state. | ||
| 86 | |||
| 87 | * freezer.self_freezing: Read only. | ||
| 88 | |||
| 89 | Shows the self-state. 0 if the self-state is THAWED; otherwise, 1. | ||
| 90 | This value is 1 iff the last write to freezer.state was "FROZEN". | ||
| 91 | |||
| 92 | * freezer.parent_freezing: Read only. | ||
| 93 | |||
| 94 | Shows the parent-state. 0 if none of the cgroup's ancestors is | ||
| 95 | frozen; otherwise, 1. | ||
| 96 | |||
| 97 | The root cgroup is non-freezable, so the above interface files don't | |
| 98 | exist in it. | |
| 99 | |||
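The three interface files can be read together to see why a cgroup is freezing. The following is a minimal sketch: `freezer_report` is a hypothetical helper, and its argument is assumed to be a freezer cgroup directory such as the `/sys/fs/cgroup/freezer/0` path used in the examples.

```shell
# Sketch: summarize the freezer interface files of one cgroup.
# "freezer_report" is a hypothetical helper name; $1 is the cgroup directory.
freezer_report() {
    cg=$1
    # The root cgroup has no freezer.* files, so bail out early.
    if [ ! -r "$cg/freezer.state" ]; then
        echo "no freezer.state in $cg" >&2
        return 1
    fi
    state=$(cat "$cg/freezer.state")
    self=$(cat "$cg/freezer.self_freezing")
    parent=$(cat "$cg/freezer.parent_freezing")
    echo "state=$state self_freezing=$self parent_freezing=$parent"
    # The effective state is THAWED only when both flags are 0.
    if [ "$self" = 1 ] && [ "$parent" = 1 ]; then
        echo "cause: own self-state and an ancestor"
    elif [ "$self" = 1 ]; then
        echo "cause: FROZEN was written to this cgroup's freezer.state"
    elif [ "$parent" = 1 ]; then
        echo "cause: an ancestor cgroup is freezing"
    else
        echo "cause: none"
    fi
}
```

A cgroup that stays frozen after THAWED has been written to its own freezer.state would report `self_freezing=0 parent_freezing=1`, pointing at an ancestor as the cause.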
| 100 | * Examples of usage:: | ||
| 101 | |||
| 102 | # mkdir /sys/fs/cgroup/freezer | ||
| 103 | # mount -t cgroup -ofreezer freezer /sys/fs/cgroup/freezer | ||
| 104 | # mkdir /sys/fs/cgroup/freezer/0 | ||
| 105 | # echo $some_pid > /sys/fs/cgroup/freezer/0/tasks | ||
| 106 | |||
| 107 | to get status of the freezer subsystem:: | ||
| 108 | |||
| 109 | # cat /sys/fs/cgroup/freezer/0/freezer.state | ||
| 110 | THAWED | ||
| 111 | |||
| 112 | to freeze all tasks in the container:: | ||
| 113 | |||
| 114 | # echo FROZEN > /sys/fs/cgroup/freezer/0/freezer.state | ||
| 115 | # cat /sys/fs/cgroup/freezer/0/freezer.state | ||
| 116 | FREEZING | ||
| 117 | # cat /sys/fs/cgroup/freezer/0/freezer.state | ||
| 118 | FROZEN | ||
| 119 | |||
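Because a read may return FREEZING before the cgroup settles into FROZEN, a script usually has to poll. The sketch below shows one way to do that; `freeze_and_wait` is a hypothetical helper, and the 0.1 second interval and default of 50 attempts are arbitrary choices. It assumes a `sleep` that accepts fractional seconds, which POSIX does not require.

```shell
# Sketch: request a freeze and poll until the cgroup reports FROZEN.
# "freeze_and_wait" is a hypothetical helper; $1 is the cgroup directory,
# $2 an optional number of polling attempts (default 50, chosen arbitrarily).
freeze_and_wait() {
    cg=$1
    tries=${2:-50}
    echo FROZEN > "$cg/freezer.state" || return 1
    while [ "$tries" -gt 0 ]; do
        # Reads return THAWED, FREEZING or FROZEN.
        if [ "$(cat "$cg/freezer.state")" = FROZEN ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 0.1
    done
    echo "timed out: $cg is still $(cat "$cg/freezer.state")" >&2
    return 1
}
```

With the cgroup from the examples this would be called as `freeze_and_wait /sys/fs/cgroup/freezer/0`. A non-zero return means the cgroup did not reach FROZEN within the allowed attempts; the helper leaves it in whatever state it reached.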
| 120 | to unfreeze all tasks in the container:: | ||
| 121 | |||
| 122 | # echo THAWED > /sys/fs/cgroup/freezer/0/freezer.state | ||
| 123 | # cat /sys/fs/cgroup/freezer/0/freezer.state | ||
| 124 | THAWED | ||
| 125 | |||
| 126 | This is the basic mechanism which should do the right thing for user space tasks | |
| 127 | in a simple scenario. | ||
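
For the checkpointing use case, the freeze and thaw steps are typically wrapped around another command. The following is a sketch under stated assumptions rather than a recommended tool: `with_frozen` is a hypothetical helper, the polling limit is arbitrary, and `sleep` is assumed to accept fractional seconds.

```shell
# Sketch: run a command while a cgroup is frozen, then always thaw it.
# "with_frozen" is a hypothetical helper name, not a kernel or libcgroup tool.
# Usage: with_frozen CGROUP_DIR COMMAND [ARGS...]
with_frozen() {
    cg=$1
    shift
    echo FROZEN > "$cg/freezer.state" || return 1
    # Wait for the FREEZING -> FROZEN transition (50 polls is arbitrary).
    n=0
    until [ "$(cat "$cg/freezer.state")" = FROZEN ]; do
        n=$((n + 1))
        if [ "$n" -ge 50 ]; then
            # Give up and undo the freeze request.
            echo THAWED > "$cg/freezer.state"
            return 1
        fi
        sleep 0.1
    done
    # Run the caller's command, e.g. something that walks /proc.
    "$@"
    rc=$?
    # Thaw regardless of whether the command succeeded.
    echo THAWED > "$cg/freezer.state"
    return "$rc"
}
```

The helper returns the exit status of the wrapped command, so a failed checkpoint step remains visible to the caller even though the cgroup has been thawed.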
