author    Martin Hicks <mort@sgi.com>  2005-06-21 20:14:40 -0400
committer Linus Torvalds <torvalds@ppc970.osdl.org>  2005-06-21 21:46:14 -0400
commit    bfbb38fb808ac23ef44472d05d9bb36edfb49ed0 (patch)
tree      19d26b575bf0ff1e2b3ec2c8ee12310154fdead5
parent    295ab93497ec703f7d6eaf0787dd9768b83035fe (diff)
[PATCH] VM: add may_swap flag to scan_control
Here's the next round of these patches.  These are totally different in
an attempt to meet the "simpler" request after the last patches.  For
reference, the earlier threads are:

http://marc.theaimsgroup.com/?l=linux-kernel&m=110839604924587&w=2
http://marc.theaimsgroup.com/?l=linux-mm&m=111461480721249&w=2

This set of patches replaces my other vm- patches that are currently in
-mm, so they're against 2.6.12-rc5-mm1, about half way through the -mm
patchset.

As I said already, this patch set is a lot simpler.  The reclaim is
turned on or off on a per-zone basis using a syscall.  I haven't tested
the x86 syscall, so it might be wrong.  It uses the existing
reclaim/pageout code with the small addition of a may_swap flag to
scan_control (patch 1/4).

I also added __GFP_NORECLAIM (patch 3/4) so that certain allocation
types can be flagged to never cause reclaim.  This was a deficiency in
all of my earlier patch sets: previously, doing a big buffered read
would fill one zone with page cache and then start to reclaim from that
same zone, leaving the other zones untouched.

Adding some extra throttling on the reclaim was also required (patch
4/4).  Without it, the machine would grind to a crawl when doing a
"make -j" kernel build.  Even with this patch the system time is higher
on average, but it seems tolerable.  Here are some numbers for kernbench
runs on a 2-node, 4-CPU, 8GB-RAM Altix in the "make -j" run:

                         wall  user  sys  %cpu  ctx sw.  sleeps
                         ----  ----  ---  ----  -------  ------
No patch                 1009  1384  847   258   298170  504402
w/patch, no reclaim       880  1376  667   288   254064  396745
w/patch & reclaim        1079  1385  926   252   291625  548873

These numbers are the average of 2 runs of 3 "make -j" runs done right
after system boot.  Run-to-run variability for "make -j" is huge, so
these numbers aren't terribly useful except to see that with reclaim
the benchmark still finishes in a reasonable amount of time.

I also looked at the NUMA hit/miss stats for the "make -j" runs, and
the reclaim doesn't make any difference when the machine is thrashing
away.  Doing a "make -j8" on a single node that is filled with page
cache pages takes 700 seconds with reclaim turned on and 735 seconds
without reclaim (due to remote memory accesses).

The simple zone_reclaim syscall program is at
http://www.bork.org/~mort/sgi/zone_reclaim.c

This patch:

This adds an extra switch to the scan_control struct.  It simply lets
the reclaim code know whether it's allowed to swap pages out.  This was
required for a simple per-zone reclaimer: without this addition, pages
would be swapped out as soon as a zone ran out of memory and the early
reclaim kicked in.

Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
 mm/vmscan.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c62cadce0426..6379ddbffd9b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -74,6 +74,9 @@ struct scan_control {
 
 	int may_writepage;
 
+	/* Can pages be swapped as part of reclaim? */
+	int may_swap;
+
 	/* This context's SWAP_CLUSTER_MAX. If freeing memory for
 	 * suspend, we effectively ignore SWAP_CLUSTER_MAX.
 	 * In this context, it doesn't matter that we scan the
@@ -414,7 +417,7 @@ static int shrink_list(struct list_head *page_list, struct scan_control *sc)
 		 * Anonymous process memory has backing store?
 		 * Try to allocate it some swap space here.
 		 */
-		if (PageAnon(page) && !PageSwapCache(page)) {
+		if (PageAnon(page) && !PageSwapCache(page) && sc->may_swap) {
 			if (!add_to_swap(page))
 				goto activate_locked;
 		}
@@ -927,6 +930,7 @@ int try_to_free_pages(struct zone **zones,
 
 	sc.gfp_mask = gfp_mask;
 	sc.may_writepage = 0;
+	sc.may_swap = 1;
 
 	inc_page_state(allocstall);
 
@@ -1027,6 +1031,7 @@ loop_again:
 	total_reclaimed = 0;
 	sc.gfp_mask = GFP_KERNEL;
 	sc.may_writepage = 0;
+	sc.may_swap = 1;
 	sc.nr_mapped = read_page_state(nr_mapped);
 
 	inc_page_state(pageoutrun);