Diffstat (limited to 'Documentation/cgroups/memory.txt')
-rw-r--r-- | Documentation/cgroups/memory.txt | 70 |
1 files changed, 69 insertions, 1 deletions
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 8b8c28b9864c..f336ede58e62 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -40,6 +40,7 @@ Features: | |||
40 | - soft limit | 40 | - soft limit |
41 | - moving (recharging) account at moving a task is selectable. | 41 | - moving (recharging) account at moving a task is selectable. |
42 | - usage threshold notifier | 42 | - usage threshold notifier |
43 | - memory pressure notifier | ||
43 | - oom-killer disable knob and oom-notifier | 44 | - oom-killer disable knob and oom-notifier |
44 | - Root cgroup has no limit controls. | 45 | - Root cgroup has no limit controls. |
45 | 46 | ||
@@ -65,6 +66,7 @@ Brief summary of control files. | |||
65 | memory.stat # show various statistics | 66 | memory.stat # show various statistics |
66 | memory.use_hierarchy # set/show hierarchical account enabled | 67 | memory.use_hierarchy # set/show hierarchical account enabled |
67 | memory.force_empty # trigger forced move charge to parent | 68 | memory.force_empty # trigger forced move charge to parent |
69 | memory.pressure_level # set memory pressure notifications | ||
68 | memory.swappiness # set/show swappiness parameter of vmscan | 70 | memory.swappiness # set/show swappiness parameter of vmscan |
69 | (See sysctl's vm.swappiness) | 71 | (See sysctl's vm.swappiness) |
70 | memory.move_charge_at_immigrate # set/show controls of moving charges | 72 | memory.move_charge_at_immigrate # set/show controls of moving charges |
@@ -762,7 +764,73 @@ At reading, current status of OOM is shown. | |||
762 | under_oom 0 or 1 (if 1, the memory cgroup is under OOM, tasks may | 764 | under_oom 0 or 1 (if 1, the memory cgroup is under OOM, tasks may |
763 | be stopped.) | 765 | be stopped.) |
764 | 766 | ||
765 | 11. TODO | 767 | 11. Memory Pressure |
768 | |||
769 | The pressure level notifications can be used to monitor the memory | ||
770 | allocation cost; based on the pressure, applications can implement | ||
771 | different strategies of managing their memory resources. The pressure | ||
772 | levels are defined as follows: | ||
773 | |||
774 | The "low" level means that the system is reclaiming memory for new | ||
775 | allocations. Monitoring this reclaiming activity might be useful for | ||
776 | maintaining cache level. Upon notification, the program (typically | ||
777 | "Activity Manager") might analyze vmstat and act in advance (e.g. | ||
778 | prematurely shut down unimportant services). | ||
779 | |||
780 | The "medium" level means that the system is experiencing medium memory | ||
781 | pressure; the system might be swapping, paging out active file caches, | ||
782 | etc. Upon this event applications may decide to further analyze | ||
783 | vmstat/zoneinfo/memcg or internal memory usage statistics and free any | ||
784 | resources that can be easily reconstructed or re-read from a disk. | ||
785 | |||
786 | The "critical" level means that the system is actively thrashing; it is | ||
787 | about to run out of memory (OOM), or the in-kernel OOM killer is even on its | ||
788 | way to trigger. Applications should do whatever they can to help the | ||
789 | system. It might be too late to consult with vmstat or any other | ||
790 | statistics, so it's advisable to take immediate action. | ||
791 | |||
792 | The events are propagated upward until the event is handled, i.e. the | ||
793 | events are not pass-through. Here is what this means: for example you have | ||
794 | three cgroups: A->B->C. Now you set up an event listener on cgroups A, B | ||
795 | and C, and suppose group C experiences some pressure. In this situation, | ||
796 | only group C will receive the notification, i.e. groups A and B will not | ||
797 | receive it. This is done to avoid excessive "broadcasting" of messages, | ||
798 | which disturbs the system and which is especially bad if we are low on | ||
799 | memory or thrashing. So, organize the cgroups wisely, or propagate the | ||
800 | events manually (or ask us to implement the pass-through events, | ||
801 | explaining why you would need them). | ||
802 | |||
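The propagation rule described above can be illustrated with a toy model. This is only a sketch of the rule as the text states it (the nearest group with a listener, walking upward from the group under pressure, receives the event). It is not the kernel's implementation, it ignores pressure levels, and the function name `notified_group` is hypothetical.

```python
from typing import Iterable, Optional, Sequence


def notified_group(path_to_root: Sequence[str],
                   listeners: Iterable[str]) -> Optional[str]:
    """Return the single cgroup that receives the event, or None.

    path_to_root lists the cgroup under pressure first, followed by its
    ancestors in order (for A->B->C with pressure in C: ["C", "B", "A"]).
    listeners is the set of cgroups that have a listener registered.
    """
    registered = set(listeners)
    for group in path_to_root:
        if group in registered:
            # The event is handled here and is not passed further up.
            return group
    return None


# With listeners on A, B and C, only C is notified when C is under pressure.
print(notified_group(["C", "B", "A"], {"A", "B", "C"}))
# With no listener on C, the event travels up to B.
print(notified_group(["C", "B", "A"], {"A", "B"}))
```

Under this model, a monitor that wants to see pressure from every group either registers on each leaf group or forwards events itself, which is the "propagate the events manually" option mentioned above.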
803 | The file memory.pressure_level is only used to set up an eventfd. To | ||
804 | register a notification, an application must: | ||
805 | |||
806 | - create an eventfd using eventfd(2); | ||
807 | - open memory.pressure_level; | ||
808 | - write a string like "<event_fd> <fd of memory.pressure_level> <level>" | ||
809 | to cgroup.event_control. | ||
810 | |||
811 | The application will be notified through the eventfd when memory pressure | ||
812 | is at the specified level (or higher). Read/write operations on | ||
813 | memory.pressure_level are not implemented. | ||
814 | |||
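The three registration steps above might look roughly like the following in Python. This is a minimal sketch, not the reference listener: it assumes Python 3.10 or later on Linux (for `os.eventfd`) and a mounted cgroup v1 memory hierarchy, and the helper names `build_registration`, `register_pressure_listener` and `wait_for_event` are hypothetical. The descriptors are deliberately left open for the lifetime of the listener, which is a conservative choice rather than a documented requirement.

```python
import os
import sys
from typing import Tuple

VALID_LEVELS = ("low", "medium", "critical")


def build_registration(event_fd: int, pressure_fd: int, level: str) -> str:
    """Format "<event_fd> <fd of memory.pressure_level> <level>"."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown pressure level: {level!r}")
    return f"{event_fd} {pressure_fd} {level}"


def register_pressure_listener(cgroup_dir: str, level: str) -> Tuple[int, int]:
    """Register for notifications; return (eventfd, pressure_level fd).

    Assumption: cgroup_dir is a cgroup v1 memory cgroup directory, for
    example /sys/fs/cgroup/memory/foo.
    """
    efd = os.eventfd(0)  # step 1: create an eventfd
    # step 2: open memory.pressure_level
    pfd = os.open(os.path.join(cgroup_dir, "memory.pressure_level"),
                  os.O_RDONLY)
    # step 3: write the registration string to cgroup.event_control
    cfd = os.open(os.path.join(cgroup_dir, "cgroup.event_control"),
                  os.O_WRONLY)
    try:
        os.write(cfd, build_registration(efd, pfd, level).encode())
    finally:
        os.close(cfd)
    return efd, pfd


def wait_for_event(efd: int) -> int:
    """Block until the descriptor is readable; return the 8-byte counter."""
    return int.from_bytes(os.read(efd, 8), sys.byteorder)
```

A caller would run `efd, pfd = register_pressure_listener("/sys/fs/cgroup/memory/foo", "medium")` and then loop on `wait_for_event(efd)`, freeing caches or other easily rebuilt resources each time it returns.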
815 | Test: | ||
816 | |||
817 | Here is a small example script that makes a new cgroup, sets up a | ||
818 | memory limit, sets up a notification in the cgroup and then makes the | ||
819 | child cgroup experience critical pressure: | ||
820 | |||
821 | # cd /sys/fs/cgroup/memory/ | ||
822 | # mkdir foo | ||
823 | # cd foo | ||
824 | # cgroup_event_listener memory.pressure_level low & | ||
825 | # echo 8000000 > memory.limit_in_bytes | ||
826 | # echo 8000000 > memory.memsw.limit_in_bytes | ||
827 | # echo $$ > tasks | ||
828 | # dd if=/dev/zero | read x | ||
829 | |||
830 | (Expect a bunch of notifications, and eventually, the oom-killer will | ||
831 | trigger.) | ||
832 | |||
833 | 12. TODO | ||
766 | 834 | ||
767 | 1. Add support for accounting huge pages (as a separate controller) | 835 | 1. Add support for accounting huge pages (as a separate controller) |
768 | 2. Make per-cgroup scanner reclaim not-shared pages first | 836 | 2. Make per-cgroup scanner reclaim not-shared pages first |