author | Martin Hicks <mort@sgi.com> | 2005-06-21 20:14:41 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@ppc970.osdl.org> | 2005-06-21 21:46:14 -0400 |
commit | 753ee728964e5afb80c17659cc6c3a6fd0a42fe0 (patch) | |
tree | 41c9a7700d0858c1f77c5bdaba97e5b636f69b06 /include/linux | |
parent | bfbb38fb808ac23ef44472d05d9bb36edfb49ed0 (diff) |
[PATCH] VM: early zone reclaim
This is the core of the (much simplified) early reclaim. The goal of this
patch is to reclaim some easily-freed pages from a zone before falling back
onto another zone.
One of the major uses of this is NUMA machines. By default the allocator
looks for memory in another zone, which might be off-node, before trying to
reclaim from the current zone.
This adds a zone tunable to enable early zone reclaim. It is selected on a
per-zone basis and is turned on/off via a syscall.
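The fallback order described above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions, not the kernel code: `struct toy_zone`, `toy_watermark_ok()`, `toy_zone_reclaim()`, `toy_alloc()` and the `easy_pages` counter are hypothetical names invented for illustration. Only the `reclaim_pages` flag and the idea of trying a reclaim after a failed watermark check come from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct zone; it holds only what this sketch needs. */
struct toy_zone {
	const char *name;
	long free_pages;	/* pages currently free */
	long watermark;		/* free pages that must remain after an allocation */
	long easy_pages;	/* hypothetical: pages that are cheap to reclaim */
	int reclaim_pages;	/* mirrors the per-zone tunable added by this patch */
};

/* Simplified watermark check: does the zone stay at or above its watermark? */
static bool toy_watermark_ok(const struct toy_zone *z, long nr)
{
	return z->free_pages - nr >= z->watermark;
}

/* Hypothetical reclaim: move up to nr easily-freed pages to the free pool. */
static long toy_zone_reclaim(struct toy_zone *z, long nr)
{
	long got = z->easy_pages < nr ? z->easy_pages : nr;

	z->easy_pages -= got;
	z->free_pages += got;
	return got;
}

/*
 * Walk the zonelist in preference order (local zone first). If a zone
 * fails the watermark check and has reclaim_pages set, try an early
 * reclaim on it before falling back to the next zone.
 * Returns the zone that satisfied the request, or NULL.
 */
static struct toy_zone *toy_alloc(struct toy_zone **zonelist, size_t nr_zones,
				  long nr)
{
	assert(zonelist != NULL && nr > 0);

	for (size_t i = 0; i < nr_zones; i++) {
		struct toy_zone *z = zonelist[i];

		if (!toy_watermark_ok(z, nr) && z->reclaim_pages)
			toy_zone_reclaim(z, nr);

		if (toy_watermark_ok(z, nr)) {
			z->free_pages -= nr;
			return z;
		}
	}
	return NULL;
}
```

With a nearly full local zone followed by a remote zone in the zonelist, `toy_alloc()` picks the remote zone while `reclaim_pages` is 0, and stays on the local zone once the flag is set and enough easily-freed pages are available. The throttling mentioned later in this changelog is deliberately left out of the model.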
Adding some extra throttling on the reclaim was also required (patch
4/4). Without it, the machine would grind to a crawl when doing a "make -j"
kernel build. Even with this patch the system time is higher on
average, but it seems tolerable. Here are some numbers for kernbench
runs on a 2-node, 4-CPU Altix with 8 GB of RAM in the "make -j" run:
configuration | wall | user | sys | %cpu | ctx sw. | sleeps
---|---|---|---|---|---|---
No patch | 1009 | 1384 | 847 | 258 | 298170 | 504402
w/patch, no reclaim | 880 | 1376 | 667 | 288 | 254064 | 396745
w/patch & reclaim | 1079 | 1385 | 926 | 252 | 291625 | 548873
These numbers are the average of 2 runs of 3 "make -j" runs done right
after system boot. Run-to-run variability for "make -j" is huge, so
these numbers aren't terribly useful except to see that with reclaim
the benchmark still finishes in a reasonable amount of time.
I also looked at the NUMA hit/miss stats for the "make -j" runs and the
reclaim doesn't make any difference when the machine is thrashing away.
Doing a "make -j8" on a single node that is filled with page cache pages
takes 700 seconds with reclaim turned on and 735 seconds without reclaim
(due to remote memory accesses).
The simple zone_reclaim syscall program is at
http://www.bork.org/~mort/sgi/zone_reclaim.c
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Diffstat (limited to 'include/linux')
-rw-r--r-- | include/linux/mmzone.h | 6 |
-rw-r--r-- | include/linux/swap.h | 1 |
2 files changed, 7 insertions, 0 deletions
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index beacd931b606..dfc2452ccb10 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -145,6 +145,12 @@ struct zone {
 	int all_unreclaimable; /* All pages pinned */
 
 	/*
+	 * Does the allocator try to reclaim pages from the zone as soon
+	 * as it fails a watermark_ok() in __alloc_pages?
+	 */
+	int reclaim_pages;
+
+	/*
 	 * prev_priority holds the scanning priority for this zone. It is
 	 * defined as the scanning priority at which we achieved our reclaim
 	 * target at the previous try_to_free_pages() or balance_pgdat()
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 3bbc41be9bd0..0d21e682d99d 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -173,6 +173,7 @@ extern void swap_setup(void);
 
 /* linux/mm/vmscan.c */
 extern int try_to_free_pages(struct zone **, unsigned int, unsigned int);
+extern int zone_reclaim(struct zone *, unsigned int, unsigned int);
 extern int shrink_all_memory(int);
 extern int vm_swappiness;
 