author | Johannes Weiner <hannes@cmpxchg.org> | 2014-12-10 18:42:37 -0500 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2014-12-10 20:41:04 -0500 |
commit | 5b1efc027c0b51ca3e76f4e00c83358f8349f543 | |
tree | ef12cdcbfb7ac8ad45b66e6ee5e6cf9f2d566418 /Documentation/cgroups | |
parent | 71f87bee38edddb21d97895fa938744cf3f477bb |
kernel: res_counter: remove the unused API
All memory accounting and limiting has been switched over to the
lockless page counters. Bye, res_counter!
[akpm@linux-foundation.org: update Documentation/cgroups/memory.txt]
[mhocko@suse.cz: ditch the last remainings of res_counter]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'Documentation/cgroups')
-rw-r--r-- | Documentation/cgroups/memory.txt | 17 |
-rw-r--r-- | Documentation/cgroups/resource_counter.txt | 197 |
2 files changed, 8 insertions, 206 deletions
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index f624727ab404..67613ff0270c 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -116,16 +116,16 @@ The memory controller is the first controller developed.
116 | 116 | ||
117 | 2.1. Design | 117 | 2.1. Design |
118 | 118 | ||
119 | The core of the design is a counter called the res_counter. The res_counter | 119 | The core of the design is a counter called the page_counter. The |
120 | tracks the current memory usage and limit of the group of processes associated | 120 | page_counter tracks the current memory usage and limit of the group of |
121 | with the controller. Each cgroup has a memory controller specific data | 121 | processes associated with the controller. Each cgroup has a memory controller |
122 | structure (mem_cgroup) associated with it. | 122 | specific data structure (mem_cgroup) associated with it. |
123 | 123 | ||
124 | 2.2. Accounting | 124 | 2.2. Accounting |
125 | 125 | ||
126 | +--------------------+ | 126 | +--------------------+ |
127 | | mem_cgroup | | 127 | | mem_cgroup | |
128 | | (res_counter) | | 128 | | (page_counter) | |
129 | +--------------------+ | 129 | +--------------------+ |
130 | / ^ \ | 130 | / ^ \ |
131 | / | \ | 131 | / | \ |
@@ -352,9 +352,8 @@ set: | |||
352 | 0. Configuration | 352 | 0. Configuration |
353 | 353 | ||
354 | a. Enable CONFIG_CGROUPS | 354 | a. Enable CONFIG_CGROUPS |
355 | b. Enable CONFIG_RESOURCE_COUNTERS | 355 | b. Enable CONFIG_MEMCG |
356 | c. Enable CONFIG_MEMCG | 356 | c. Enable CONFIG_MEMCG_SWAP (to use swap extension) |
357 | d. Enable CONFIG_MEMCG_SWAP (to use swap extension) | ||
358 | d. Enable CONFIG_MEMCG_KMEM (to use kmem extension) | 357 | d. Enable CONFIG_MEMCG_KMEM (to use kmem extension) |
359 | 358 | ||
360 | 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?) | 359 | 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?) |
diff --git a/Documentation/cgroups/resource_counter.txt b/Documentation/cgroups/resource_counter.txt
deleted file mode 100644
index 762ca54eb929..000000000000
--- a/Documentation/cgroups/resource_counter.txt
+++ /dev/null
@@ -1,197 +0,0 @@
1 | |||
2 | The Resource Counter | ||
3 | |||
4 | The resource counter, declared at include/linux/res_counter.h, | ||
5 | is supposed to facilitate the resource management by controllers | ||
6 | by providing common stuff for accounting. | ||
7 | |||
8 | This "stuff" includes the res_counter structure and routines | ||
9 | to work with it. | ||
10 | |||
11 | |||
12 | |||
13 | 1. Crucial parts of the res_counter structure | ||
14 | |||
15 | a. unsigned long long usage | ||
16 | |||
17 | The usage value shows the amount of a resource that is consumed | ||
18 | by a group at a given time. The units of measurement should be | ||
19 | determined by the controller that uses this counter. E.g. it can | ||
20 | be bytes, items or any other unit the controller operates on. | ||
21 | |||
22 | b. unsigned long long max_usage | ||
23 | |||
24 | The maximal value of the usage over time. | ||
25 | |||
26 | This value is useful when gathering statistical information about | ||
27 | the particular group, as it shows the actual resource requirements | ||
28 | for a particular group, not just some usage snapshot. | ||
29 | |||
30 | c. unsigned long long limit | ||
31 | |||
32 | The maximal allowed amount of resource to consume by the group. In | ||
33 | case the group requests for more resources, so that the usage value | ||
34 | would exceed the limit, the resource allocation is rejected (see | ||
35 | the next section). | ||
36 | |||
37 | d. unsigned long long failcnt | ||
38 | |||
39 | The failcnt stands for "failures counter". This is the number of | ||
40 | resource allocation attempts that failed. | ||
41 | |||
42 | c. spinlock_t lock | ||
43 | |||
44 | Protects changes of the above values. | ||
45 | |||
46 | |||
47 | |||
48 | 2. Basic accounting routines | ||
49 | |||
50 | a. void res_counter_init(struct res_counter *rc, | ||
51 | struct res_counter *rc_parent) | ||
52 | |||
53 | Initializes the resource counter. As usual, should be the first | ||
54 | routine called for a new counter. | ||
55 | |||
56 | The struct res_counter *parent can be used to define a hierarchical | ||
57 | child -> parent relationship directly in the res_counter structure, | ||
58 | NULL can be used to define no relationship. | ||
59 | |||
60 | c. int res_counter_charge(struct res_counter *rc, unsigned long val, | ||
61 | struct res_counter **limit_fail_at) | ||
62 | |||
63 | When a resource is about to be allocated it has to be accounted | ||
64 | with the appropriate resource counter (controller should determine | ||
65 | which one to use on its own). This operation is called "charging". | ||
66 | |||
67 | This is not very important which operation - resource allocation | ||
68 | or charging - is performed first, but | ||
69 | * if the allocation is performed first, this may create a | ||
70 | temporary resource over-usage by the time resource counter is | ||
71 | charged; | ||
72 | * if the charging is performed first, then it should be uncharged | ||
73 | on error path (if the one is called). | ||
74 | |||
75 | If the charging fails and a hierarchical dependency exists, the | ||
76 | limit_fail_at parameter is set to the particular res_counter element | ||
77 | where the charging failed. | ||
78 | |||
79 | d. u64 res_counter_uncharge(struct res_counter *rc, unsigned long val) | ||
80 | |||
81 | When a resource is released (freed) it should be de-accounted | ||
82 | from the resource counter it was accounted to. This is called | ||
83 | "uncharging". The return value of this function indicate the amount | ||
84 | of charges still present in the counter. | ||
85 | |||
86 | The _locked routines imply that the res_counter->lock is taken. | ||
87 | |||
88 | e. u64 res_counter_uncharge_until | ||
89 | (struct res_counter *rc, struct res_counter *top, | ||
90 | unsigned long val) | ||
91 | |||
92 | Almost same as res_counter_uncharge() but propagation of uncharge | ||
93 | stops when rc == top. This is useful when kill a res_counter in | ||
94 | child cgroup. | ||
95 | |||
96 | 2.1 Other accounting routines | ||
97 | |||
98 | There are more routines that may help you with common needs, like | ||
99 | checking whether the limit is reached or resetting the max_usage | ||
100 | value. They are all declared in include/linux/res_counter.h. | ||
101 | |||
102 | |||
103 | |||
104 | 3. Analyzing the resource counter registrations | ||
105 | |||
106 | a. If the failcnt value constantly grows, this means that the counter's | ||
107 | limit is too tight. Either the group is misbehaving and consumes too | ||
108 | many resources, or the configuration is not suitable for the group | ||
109 | and the limit should be increased. | ||
110 | |||
111 | b. The max_usage value can be used to quickly tune the group. One may | ||
112 | set the limits to maximal values and either load the container with | ||
113 | a common pattern or leave one for a while. After this the max_usage | ||
114 | value shows the amount of memory the container would require during | ||
115 | its common activity. | ||
116 | |||
117 | Setting the limit a bit above this value gives a pretty good | ||
118 | configuration that works in most of the cases. | ||
119 | |||
120 | c. If the max_usage is much less than the limit, but the failcnt value | ||
121 | is growing, then the group tries to allocate a big chunk of resource | ||
122 | at once. | ||
123 | |||
124 | d. If the max_usage is much less than the limit, but the failcnt value | ||
125 | is 0, then this group is given too high limit, that it does not | ||
126 | require. It is better to lower the limit a bit leaving more resource | ||
127 | for other groups. | ||
128 | |||
129 | |||
130 | |||
131 | 4. Communication with the control groups subsystem (cgroups) | ||
132 | |||
133 | All the resource controllers that are using cgroups and resource counters | ||
134 | should provide files (in the cgroup filesystem) to work with the resource | ||
135 | counter fields. They are recommended to adhere to the following rules: | ||
136 | |||
137 | a. File names | ||
138 | |||
139 | Field name File name | ||
140 | --------------------------------------------------- | ||
141 | usage usage_in_<unit_of_measurement> | ||
142 | max_usage max_usage_in_<unit_of_measurement> | ||
143 | limit limit_in_<unit_of_measurement> | ||
144 | failcnt failcnt | ||
145 | lock no file :) | ||
146 | |||
147 | b. Reading from file should show the corresponding field value in the | ||
148 | appropriate format. | ||
149 | |||
150 | c. Writing to file | ||
151 | |||
152 | Field Expected behavior | ||
153 | ---------------------------------- | ||
154 | usage prohibited | ||
155 | max_usage reset to usage | ||
156 | limit set the limit | ||
157 | failcnt reset to zero | ||
158 | |||
159 | |||
160 | |||
161 | 5. Usage example | ||
162 | |||
163 | a. Declare a task group (take a look at cgroups subsystem for this) and | ||
164 | fold a res_counter into it | ||
165 | |||
166 | struct my_group { | ||
167 | struct res_counter res; | ||
168 | |||
169 | <other fields> | ||
170 | } | ||
171 | |||
172 | b. Put hooks in resource allocation/release paths | ||
173 | |||
174 | int alloc_something(...) | ||
175 | { | ||
176 | if (res_counter_charge(res_counter_ptr, amount) < 0) | ||
177 | return -ENOMEM; | ||
178 | |||
179 | <allocate the resource and return to the caller> | ||
180 | } | ||
181 | |||
182 | void release_something(...) | ||
183 | { | ||
184 | res_counter_uncharge(res_counter_ptr, amount); | ||
185 | |||
186 | <release the resource> | ||
187 | } | ||
188 | |||
189 | In order to keep the usage value self-consistent, both the | ||
190 | "res_counter_ptr" and the "amount" in release_something() should be | ||
191 | the same as they were in the alloc_something() when the releasing | ||
192 | resource was allocated. | ||
193 | |||
194 | c. Provide the way to read res_counter values and set them (the cgroups | ||
195 | still can help with it). | ||
196 | |||
197 | c. Compile and run :) | ||
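
Note on the replacement: the commit message says accounting moved to "lockless page counters", while the removed document describes a counter protected by a spinlock. The following is a minimal userspace sketch of how a hierarchical charge/uncharge pair can work without a lock, using C11 atomics. It is not the kernel's page_counter code; the names struct counter, counter_init, counter_try_charge, counter_uncharge and counter_uncharge_until are hypothetical and chosen only to mirror the routines documented in the removed file.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace analogue of a hierarchical, lock-free counter. */
struct counter {
	atomic_long usage;      /* units currently charged */
	long limit;             /* maximum allowed usage */
	atomic_long failcnt;    /* number of rejected charge attempts */
	struct counter *parent; /* NULL for the root of the hierarchy */
};

static void counter_init(struct counter *c, struct counter *parent, long limit)
{
	atomic_init(&c->usage, 0);
	atomic_init(&c->failcnt, 0);
	c->limit = limit;
	c->parent = parent;
}

/* Remove n units from c and each ancestor up to, but excluding, stop. */
static void counter_uncharge_until(struct counter *c, struct counter *stop,
				   long n)
{
	for (; c != stop; c = c->parent) {
		long left = atomic_fetch_sub(&c->usage, n) - n;

		/* Uncharging more than was charged is a caller bug. */
		assert(left >= 0);
		(void)left;
	}
}

/* Remove n units from c and all of its ancestors. */
static void counter_uncharge(struct counter *c, long n)
{
	counter_uncharge_until(c, NULL, n);
}

/*
 * Try to charge n units to c and all of its ancestors. The charge is
 * added optimistically and rolled back if a limit is exceeded, so on
 * failure nothing stays charged and *fail_at names the level that
 * rejected the request.
 */
static bool counter_try_charge(struct counter *c, long n,
			       struct counter **fail_at)
{
	for (struct counter *it = c; it != NULL; it = it->parent) {
		long now = atomic_fetch_add(&it->usage, n) + n;

		if (now > it->limit) {
			atomic_fetch_sub(&it->usage, n);   /* undo this level */
			atomic_fetch_add(&it->failcnt, 1);
			counter_uncharge_until(c, it, n);  /* undo levels below */
			if (fail_at)
				*fail_at = it;
			return false;
		}
	}
	return true;
}
```

In this sketch a failed charge briefly raises usage above the limit before the rollback, so a concurrent charger may see the transient value and fail as well. That is the trade-off for not taking a lock around the check and the update.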