3 files changed, 100 insertions, 1 deletions
diff --git a/Documentation/cgroups.txt b/Documentation/cgroups/cgroups.txt
index d9014aa0eb68..d9014aa0eb68 100644
--- a/Documentation/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt
new file mode 100644
index 000000000000..c50ab58b72eb
--- /dev/null
+++ b/Documentation/cgroups/freezer-subsystem.txt
@@ -0,0 +1,99 @@
+        The cgroup freezer is useful to batch job management systems which start
+and stop sets of tasks in order to schedule the resources of a machine
+according to the desires of a system administrator. This sort of program
+is often used on HPC clusters to schedule access to the cluster as a
+whole. The cgroup freezer uses cgroups to describe the set of tasks to
+be started/stopped by the batch job management system. It also provides
+a means to start and stop the tasks composing the job.
+
+        The cgroup freezer will also be useful for checkpointing running groups
+of tasks. The freezer allows the checkpoint code to obtain a consistent
+image of the tasks by attempting to force the tasks in a cgroup into a
+quiescent state. Once the tasks are quiescent, another task can
+walk /proc or invoke a kernel interface to gather information about the
+quiesced tasks. Checkpointed tasks can be restarted later should a
+recoverable error occur. This also allows the checkpointed tasks to be
+migrated between nodes in a cluster by copying the gathered information
+to another node and restarting the tasks there.
+
+        Sequences of SIGSTOP and SIGCONT are not always sufficient for stopping
+and resuming tasks in userspace. Both of these signals are observable
+from within the tasks we wish to freeze. While SIGSTOP cannot be caught,
+blocked, or ignored, it can be seen by waiting or ptracing parent tasks.
+SIGCONT is especially unsuitable since it can be caught by the task. Any
+programs designed to watch for SIGSTOP and SIGCONT could be broken by
+attempting to use SIGSTOP and SIGCONT to stop and resume tasks. We can
+demonstrate this problem using nested bash shells:
+
+        $ echo $$
+        16644
+        $ bash
+        $ echo $$
+        16690
+
+        From a second, unrelated bash shell:
+        $ kill -SIGSTOP 16690
+        $ kill -SIGCONT 16690
+
+        <at this point 16690 exits and causes 16644 to exit too>
+
+        This happens because bash can observe both signals and choose how it
+responds to them.
+
+        Another example of a program which catches and responds to these
+signals is gdb. In fact any program designed to use ptrace is likely to
+have a problem with this method of stopping and resuming tasks.
+
+        In contrast, the cgroup freezer uses the kernel freezer code to
+prevent the freeze/unfreeze cycle from becoming visible to the tasks
+being frozen. This allows the bash example above and gdb to run as
+expected.
+
+        The freezer subsystem in the container filesystem defines a file named
+freezer.state. Writing "FROZEN" to the state file will freeze all tasks in the
+cgroup. Subsequently writing "THAWED" will unfreeze the tasks in the cgroup.
+Reading will return the current state.
+
+* Examples of usage :
+
+   # mkdir /containers
+   # mount -t cgroup -ofreezer freezer /containers
+   # mkdir /containers/0
+   # echo $some_pid > /containers/0/tasks
+
+to get status of the freezer subsystem :
+
+   # cat /containers/0/freezer.state
+   THAWED
+
+to freeze all tasks in the container :
+
+   # echo FROZEN > /containers/0/freezer.state
+   # cat /containers/0/freezer.state
+   FREEZING
+   # cat /containers/0/freezer.state
+   FROZEN
+
+to unfreeze all tasks in the container :
+
+   # echo THAWED > /containers/0/freezer.state
+   # cat /containers/0/freezer.state
+   THAWED
+
+This is the basic mechanism which should do the right thing for user space tasks
+in a simple scenario.
+
+It's important to note that freezing can be incomplete. In that case we return
+EBUSY. This means that some tasks in the cgroup are busy doing something that
+prevents us from completely freezing the cgroup at this time. After EBUSY,
+the cgroup will remain partially frozen -- reflected by freezer.state reporting
+"FREEZING" when read. The state will remain "FREEZING" until one of these
+things happens:
+
+        1) Userspace cancels the freezing operation by writing "THAWED" to
+                the freezer.state file
+        2) Userspace retries the freezing operation by writing "FROZEN" to
+                the freezer.state file (writing "FREEZING" is not legal
+                and returns EIO)
+        3) The tasks that blocked the cgroup from entering the "FROZEN"
+                state disappear from the cgroup's set of tasks.
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index 47e568a9370a..5c86c258c791 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -48,7 +48,7 @@ hooks, beyond what is already present, required to manage dynamic
 job placement on large systems.
 Cpusets use the generic cgroup subsystem described in
-Documentation/cgroup.txt.
+Documentation/cgroups/cgroups.txt.
 Requests by a task, using the sched_setaffinity(2) system call to
 include CPUs in its CPU affinity mask, and using the mbind(2) and
