diff options
author | Christoph Lameter <clameter@sgi.com> | 2006-09-26 02:31:52 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@g5.osdl.org> | 2006-09-26 11:48:51 -0400 |
commit | 0ff38490c836dc379ff7ec45b10a15a662f4e5f6 (patch) | |
tree | cb42d5d3cace3c8d12f0b304879039c503807981 /Documentation/sysctl | |
parent | 972d1a7b140569084439a81265a0f15b74e924e0 (diff) |
[PATCH] zone_reclaim: dynamic slab reclaim
Currently one can enable slab reclaim by setting an explicit option in
/proc/sys/vm/zone_reclaim_mode. Slab reclaim is then used as a final
option if the freeing of unmapped file backed pages is not enough to free
enough pages to allow a local allocation.
However, that means that the slab can grow excessively and that most memory
of a node may be used by slabs. We have had a case where a machine with
46GB of memory was using 40-42GB for slab. Zone reclaim was effective in
dealing with pagecache pages. However, slab reclaim was only done during
global reclaim (which is a bit rare on NUMA systems).
This patch implements slab reclaim during zone reclaim. Zone reclaim
occurs if there is a danger of an off node allocation. At that point we
1. Shrink the per node page cache if the number of pagecache
pages is more than min_unmapped_ratio percent of pages in a zone.
2. Shrink the slab cache if the number of the nodes reclaimable slab pages
(patch depends on earlier one that implements that counter)
are more than min_slab_ratio (a new /proc/sys/vm tunable).
The shrinking of the slab cache is a bit problematic since it is not node
specific. So we simply calculate what point in the slab we want to reach
(current per node slab use minus the number of pages that neeed to be
allocated) and then repeately run the global reclaim until that is
unsuccessful or we have reached the limit. I hope we will have zone based
slab reclaim at some point which will make that easier.
The default for the min_slab_ratio is 5%
Also remove the slab option from /proc/sys/vm/zone_reclaim_mode.
[akpm@osdl.org: cleanups]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Diffstat (limited to 'Documentation/sysctl')
-rw-r--r-- | Documentation/sysctl/vm.txt | 27 |
1 files changed, 20 insertions, 7 deletions
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index 7cee90223d3..20d0d797f53 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt | |||
@@ -29,6 +29,7 @@ Currently, these files are in /proc/sys/vm: | |||
29 | - drop-caches | 29 | - drop-caches |
30 | - zone_reclaim_mode | 30 | - zone_reclaim_mode |
31 | - min_unmapped_ratio | 31 | - min_unmapped_ratio |
32 | - min_slab_ratio | ||
32 | - panic_on_oom | 33 | - panic_on_oom |
33 | 34 | ||
34 | ============================================================== | 35 | ============================================================== |
@@ -138,7 +139,6 @@ This is value ORed together of | |||
138 | 1 = Zone reclaim on | 139 | 1 = Zone reclaim on |
139 | 2 = Zone reclaim writes dirty pages out | 140 | 2 = Zone reclaim writes dirty pages out |
140 | 4 = Zone reclaim swaps pages | 141 | 4 = Zone reclaim swaps pages |
141 | 8 = Also do a global slab reclaim pass | ||
142 | 142 | ||
143 | zone_reclaim_mode is set during bootup to 1 if it is determined that pages | 143 | zone_reclaim_mode is set during bootup to 1 if it is determined that pages |
144 | from remote zones will cause a measurable performance reduction. The | 144 | from remote zones will cause a measurable performance reduction. The |
@@ -162,18 +162,13 @@ Allowing regular swap effectively restricts allocations to the local | |||
162 | node unless explicitly overridden by memory policies or cpuset | 162 | node unless explicitly overridden by memory policies or cpuset |
163 | configurations. | 163 | configurations. |
164 | 164 | ||
165 | It may be advisable to allow slab reclaim if the system makes heavy | ||
166 | use of files and builds up large slab caches. However, the slab | ||
167 | shrink operation is global, may take a long time and free slabs | ||
168 | in all nodes of the system. | ||
169 | |||
170 | ============================================================= | 165 | ============================================================= |
171 | 166 | ||
172 | min_unmapped_ratio: | 167 | min_unmapped_ratio: |
173 | 168 | ||
174 | This is available only on NUMA kernels. | 169 | This is available only on NUMA kernels. |
175 | 170 | ||
176 | A percentage of the file backed pages in each zone. Zone reclaim will only | 171 | A percentage of the total pages in each zone. Zone reclaim will only |
177 | occur if more than this percentage of pages are file backed and unmapped. | 172 | occur if more than this percentage of pages are file backed and unmapped. |
178 | This is to insure that a minimal amount of local pages is still available for | 173 | This is to insure that a minimal amount of local pages is still available for |
179 | file I/O even if the node is overallocated. | 174 | file I/O even if the node is overallocated. |
@@ -182,6 +177,24 @@ The default is 1 percent. | |||
182 | 177 | ||
183 | ============================================================= | 178 | ============================================================= |
184 | 179 | ||
180 | min_slab_ratio: | ||
181 | |||
182 | This is available only on NUMA kernels. | ||
183 | |||
184 | A percentage of the total pages in each zone. On Zone reclaim | ||
185 | (fallback from the local zone occurs) slabs will be reclaimed if more | ||
186 | than this percentage of pages in a zone are reclaimable slab pages. | ||
187 | This insures that the slab growth stays under control even in NUMA | ||
188 | systems that rarely perform global reclaim. | ||
189 | |||
190 | The default is 5 percent. | ||
191 | |||
192 | Note that slab reclaim is triggered in a per zone / node fashion. | ||
193 | The process of reclaiming slab memory is currently not node specific | ||
194 | and may not be fast. | ||
195 | |||
196 | ============================================================= | ||
197 | |||
185 | panic_on_oom | 198 | panic_on_oom |
186 | 199 | ||
187 | This enables or disables panic on out-of-memory feature. If this is set to 1, | 200 | This enables or disables panic on out-of-memory feature. If this is set to 1, |