1 file changed, 69 insertions(+), 1 deletion(-)
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 8b8c28b9864c..f336ede58e62 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -40,6 +40,7 @@ Features:
 - soft limit
 - moving (recharging) account at moving a task is selectable.
 - usage threshold notifier
+ - memory pressure notifier
 - oom-killer disable knob and oom-notifier
 - Root cgroup has no limit controls.
@@ -65,6 +66,7 @@ Brief summary of control files.
 memory.stat                     # show various statistics
 memory.use_hierarchy            # set/show hierarchical account enabled
 memory.force_empty              # trigger forced move charge to parent
+ memory.pressure_level           # set memory pressure notifications
 memory.swappiness               # set/show swappiness parameter of vmscan
                                 (See sysctl's vm.swappiness)
 memory.move_charge_at_immigrate # set/show controls of moving charges
@@ -762,7 +764,73 @@ At reading, current status of OOM is shown.
        under_oom        0 or 1 (if 1, the memory cgroup is under OOM, tasks may
                                 be stopped.)
-11. TODO
+11. Memory Pressure
+
+The pressure level notifications can be used to monitor the memory
+allocation cost; based on the pressure, applications can implement
+different strategies for managing their memory resources. The pressure
+levels are defined as follows:
+
+The "low" level means that the system is reclaiming memory for new
+allocations. Monitoring this reclaiming activity might be useful for
+maintaining the cache level. Upon notification, the program (typically
+an "Activity Manager") might analyze vmstat and act in advance (e.g.
+prematurely shut down unimportant services).
+
+The "medium" level means that the system is experiencing medium memory
+pressure: it might be swapping, paging out active file caches,
+etc. Upon this event, applications may decide to further analyze
+vmstat/zoneinfo/memcg or internal memory usage statistics and free any
+resources that can be easily reconstructed or re-read from disk.
+
+The "critical" level means that the system is actively thrashing: it is
+about to run out of memory (OOM), or the in-kernel OOM killer is even
+about to trigger. Applications should do whatever they can to help the
+system. It might be too late to consult vmstat or any other
+statistics, so it is advisable to take immediate action.
+
+The events are propagated upward until the event is handled, i.e. the
+events are not pass-through. Here is what this means: suppose you have
+three cgroups, A->B->C. You set up an event listener on each of cgroups
+A, B and C, and group C experiences some pressure. In this situation,
+only group C will receive the notification; groups A and B will not
+receive it. This is done to avoid excessive "broadcasting" of messages,
+which disturbs the system and is especially bad when we are low on
+memory or thrashing. So, organize the cgroups wisely, or propagate the
+events manually (or ask us to implement pass-through events, explaining
+why you would need them).
+
+The file memory.pressure_level is only used to set up an eventfd. To
+register a notification, an application must:
+
+- create an eventfd using eventfd(2);
+- open memory.pressure_level;
+- write a string like "<event_fd> <fd of memory.pressure_level> <level>"
+  to cgroup.event_control.
+
+The application will then be notified through the eventfd when memory
+pressure is at the specified level (or higher). Read/write operations on
+memory.pressure_level are not implemented.
+
+Test:
+
+   Here is a small script example that makes a new cgroup, sets up a
+   memory limit, sets up a notification in the cgroup and then makes the
+   new cgroup experience critical pressure:
+
+   # cd /sys/fs/cgroup/memory/
+   # mkdir foo
+   # cd foo
+   # cgroup_event_listener memory.pressure_level low &
+   # echo 8000000 > memory.limit_in_bytes
+   # echo 8000000 > memory.memsw.limit_in_bytes
+   # echo $$ > tasks
+   # dd if=/dev/zero | read x
+
+   (Expect a bunch of notifications, and eventually, the oom-killer will
+   trigger.)
+
+12. TODO
 1. Add support for accounting huge pages (as a separate controller)
 2. Make per-cgroup scanner reclaim not-shared pages first
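
The notification rules described in section 11 (a listener fires when pressure is at its registered level or higher, and an event stops at the first cgroup on the way to the root that handles it) can be modeled in a few lines. This is a simplified Python sketch of the documented behavior, not kernel code. It assumes that a cgroup counts as having "handled" an event when at least one of its listeners matched the level; the function name notified_listeners and the dictionary-based hierarchy are hypothetical.

```python
# Severity order of the documented levels: low < medium < critical.
SEVERITY = {"low": 0, "medium": 1, "critical": 2}


def notified_listeners(path, listeners, current):
    """Model which listeners are signalled for one pressure event.

    path      -- cgroup names from the pressured cgroup up to the root,
                 e.g. ["C", "B", "A"] for the A->B->C example.
    listeners -- mapping of cgroup name -> list of registered levels.
    current   -- the pressure level being reported.
    """
    for cgroup in path:
        # A listener fires when the pressure is at its level or higher.
        fired = [(cgroup, level) for level in listeners.get(cgroup, [])
                 if SEVERITY[current] >= SEVERITY[level]]
        if fired:
            # Handled here, so the event is not passed further up.
            return fired
    return []


if __name__ == "__main__":
    everyone = {"A": ["low"], "B": ["low"], "C": ["low"]}
    # Listeners on A, B and C: only C is told.
    print(notified_listeners(["C", "B", "A"], everyone, "medium"))
    # prints [('C', 'low')]
```

With listeners only on A and B, the same walk returns [('B', 'low')]: the closest ancestor with a matching listener receives the event, and A still does not.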

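The three registration steps listed in section 11 can be sketched in Python; each call maps directly onto eventfd(2), open(2) and write(2) in C. This is a minimal sketch, assuming Linux, Python 3.10 or later (for os.eventfd and os.eventfd_read), and a cgroup v1 memory hierarchy directory such as the /sys/fs/cgroup/memory/foo used in the test script. The helper names registration_line, register and wait_for_events are hypothetical.

```python
import os
import sys

# Levels accepted in the registration string, per section 11.
LEVELS = ("low", "medium", "critical")


def registration_line(event_fd, pressure_fd, level):
    """Build "<event_fd> <fd of memory.pressure_level> <level>"."""
    if level not in LEVELS:
        raise ValueError("unknown pressure level: %r" % (level,))
    return "%d %d %s" % (event_fd, pressure_fd, level)


def register(cgroup_dir, level):
    """Register a pressure listener on cgroup_dir.

    Returns (event_fd, pressure_fd); both are left open for the life of
    the listener. Requires Linux and Python >= 3.10 for os.eventfd.
    """
    event_fd = os.eventfd(0)                       # step 1: eventfd(2)
    try:
        pressure_fd = os.open(                     # step 2: open the file
            os.path.join(cgroup_dir, "memory.pressure_level"), os.O_RDONLY)
    except OSError:
        os.close(event_fd)
        raise
    ctl_fd = os.open(os.path.join(cgroup_dir, "cgroup.event_control"),
                     os.O_WRONLY)
    try:
        # step 3: a single write(2) of the registration string
        line = registration_line(event_fd, pressure_fd, level)
        os.write(ctl_fd, line.encode())
    finally:
        os.close(ctl_fd)
    return event_fd, pressure_fd


def wait_for_events(event_fd, level):
    """Block on the eventfd and report each wake-up."""
    while True:
        # Without EFD_SEMAPHORE, eventfd_read returns the counter (the
        # number of signals since the last read) and resets it to zero.
        count = os.eventfd_read(event_fd)
        print("pressure at %s or higher (%d event(s))" % (level, count))


if __name__ == "__main__":
    cgroup = sys.argv[1] if len(sys.argv) > 1 else "/sys/fs/cgroup/memory/foo"
    level = sys.argv[2] if len(sys.argv) > 2 else "low"
    efd, _pfd = register(cgroup, level)
    wait_for_events(efd, level)
```

Run it (typically as root) with a cgroup directory and a level as arguments, in place of cgroup_event_listener in the test script. Because a "low" listener also fires for medium and critical pressure, an application that needs to tell the levels apart could register one eventfd per level.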