author     Wolfgang Wander <wwc@rentec.com>            2005-06-21 20:14:49 -0400
committer  Linus Torvalds <torvalds@ppc970.osdl.org>   2005-06-21 21:46:16 -0400
commit     1363c3cd8603a913a27e2995dccbd70d5312d8e6 (patch)
tree       405e7fc1ef44678f3ca0a54c536d0457e6e80f45 /fs
parent     e7c8d5c9955a4d2e88e36b640563f5d6d5aba48a (diff)
[PATCH] Avoiding mmap fragmentation
Ingo recently introduced a great speedup for allocating new mmaps using the
free_area_cache pointer, which boosts the specweb SSL benchmark by 4-5% and
yields huge performance increases in thread creation.

The downside is that it leads to fragmentation in the mmap-ed areas (visible
via /proc/self/maps), such that some applications that work fine under 2.4
kernels quickly run out of memory on any 2.6 kernel.

The problem is twofold:

1) free_area_cache is used to continue a search for memory where the last
   search ended. Before the change, new areas were always searched for from
   the base address on. Now new small areas clutter holes of all sizes
   throughout the whole mmap-able region, whereas before small requests
   tended to fill holes near the base, leaving holes far from the base large
   and available for larger requests.

2) free_area_cache is also set to the location of the last munmap-ed area,
   so in scenarios where we allocate e.g. five regions of 1K each and then
   free regions 4, 2, 3 in this order, the next request for 1K is placed
   where the old region 3 was, whereas before it was appended to the still
   active region 1, i.e. at the location of the old region 2. Before we had
   one free region of 2K; now we get two free regions of 1K -> fragmentation.

The patch addresses these issues by introducing yet another cache
descriptor, cached_hole_size, which records the largest known hole size
below the current free_area_cache. When a new request comes in, its size is
compared against cached_hole_size; if the request can be satisfied by a hole
below free_area_cache, the search starts from the base instead.

The results look promising: whereas 2.6.12-rc4 fragments quickly and my
(earlier posted) leakme.c test program terminates after 50000+ iterations
with 96 distinct and fragmented maps in /proc/self/maps, it performs nicely
(as expected) with thread creation: Ingo's test_str02 with 20000 threads
requires 0.7s system time.

Taking out Ingo's patch (un-patch available per request) by deleting all
mentions of free_area_cache from the kernel and always starting the search
for new memory at the respective base, we observe: leakme terminates
successfully with 11 distinct, hardly fragmented areas in /proc/self/maps,
but thread creation is grindingly slow: 30+s(!) system time for Ingo's
test_str02 with 20000 threads.

Now - drumroll ;-) - the appended patch works fine with leakme: it ends with
only 7 distinct areas in /proc/self/maps, and thread creation remains
sufficiently fast at 0.71s for 20000 threads.

Signed-off-by: Wolfgang Wander <wwc@rentec.com>
Credit-to: "Richard Purdie" <rpurdie@rpsys.net>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu> (partly)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
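[Editor's note: the search heuristic described above lives in the
arch_get_unmapped_area() implementations in mm/mmap.c and the per-arch
files, which fall outside this fs-limited diffstat. As a rough sketch of
the logic, here is a small userspace model; BASE, TASK_SIZE, the fixed
vmas[] array and main()'s example layout are illustrative stand-ins for
the kernel's TASK_UNMAPPED_BASE and vm_area_struct chain, not the patch's
actual code.]

#include <stdio.h>

#define BASE      0x10000UL   /* stand-in for TASK_UNMAPPED_BASE */
#define TASK_SIZE 0x100000UL  /* top of the searchable range */
#define MAX_VMAS  64

struct vma { unsigned long start, end; };  /* [start, end), kept sorted */

static struct vma vmas[MAX_VMAS];
static int nvmas;

static unsigned long free_area_cache = BASE;
static unsigned long cached_hole_size; /* largest known hole below the cache */

static unsigned long get_unmapped_area(unsigned long len)
{
	unsigned long addr, start_addr;
	int i;

	/* If a hole below the cache could satisfy the request,
	 * restart from the base instead of the cache. */
	if (len > cached_hole_size) {
		start_addr = addr = free_area_cache;
	} else {
		start_addr = addr = BASE;
		cached_hole_size = 0;
	}

full_search:
	/* Skip regions ending at or below addr (the kernel uses find_vma). */
	for (i = 0; i < nvmas && vmas[i].end <= addr; i++)
		;
	for (;; i++) {
		if (TASK_SIZE - len < addr) {
			/* Ran off the top: retry once from the base. */
			if (start_addr != BASE) {
				start_addr = addr = BASE;
				cached_hole_size = 0;
				goto full_search;
			}
			return 0; /* out of address space */
		}
		if (i == nvmas || addr + len <= vmas[i].start) {
			free_area_cache = addr + len; /* remember where we stopped */
			return addr;
		}
		/* Record the largest hole we skipped over on the way up. */
		if (addr + cached_hole_size < vmas[i].start)
			cached_hole_size = vmas[i].start - addr;
		addr = vmas[i].end;
	}
}

int main(void)
{
	/* Two mapped regions with a 2-page hole between them, and the
	 * cache already pointing above both (as after earlier searches). */
	vmas[0] = (struct vma){ BASE,          BASE + 0x3000 };
	vmas[1] = (struct vma){ BASE + 0x5000, BASE + 0x8000 };
	nvmas = 2;
	free_area_cache = BASE + 0x8000;
	cached_hole_size = 0x2000; /* the hole at BASE + 0x3000 */

	/* 0x1000 fits the known hole, so the search restarts at the
	 * base and reuses it instead of extending the top of the range. */
	printf("placed at %#lx\n", get_unmapped_area(0x1000));
	/* A request larger than any known hole continues from the cache. */
	printf("placed at %#lx\n", get_unmapped_area(0x4000));
	return 0;
}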
Diffstat (limited to 'fs')
-rw-r--r--  fs/binfmt_aout.c      | 1 +
-rw-r--r--  fs/binfmt_elf.c       | 1 +
-rw-r--r--  fs/hugetlbfs/inode.c  | 3 +++
3 files changed, 5 insertions, 0 deletions
diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
index 009b8920c1ff..dd9baabaf016 100644
--- a/fs/binfmt_aout.c
+++ b/fs/binfmt_aout.c
@@ -316,6 +316,7 @@ static int load_aout_binary(struct linux_binprm * bprm, struct pt_regs * regs)
 	current->mm->brk = ex.a_bss +
 		(current->mm->start_brk = N_BSSADDR(ex));
 	current->mm->free_area_cache = current->mm->mmap_base;
+	current->mm->cached_hole_size = 0;
 
 	set_mm_counter(current->mm, rss, 0);
 	current->mm->mmap = NULL;
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index f8f6b6b76179..7976a238f0a3 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -775,6 +775,7 @@ static int load_elf_binary(struct linux_binprm * bprm, struct pt_regs * regs)
 	   change some of these later */
 	set_mm_counter(current->mm, rss, 0);
 	current->mm->free_area_cache = current->mm->mmap_base;
+	current->mm->cached_hole_size = 0;
 	retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP),
 				 executable_stack);
 	if (retval < 0) {
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 2af3338f891b..3a9b6d179cbd 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -122,6 +122,9 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 
 	start_addr = mm->free_area_cache;
 
+	if (len <= mm->cached_hole_size)
+		start_addr = TASK_UNMAPPED_BASE;
+
 full_search:
 	addr = ALIGN(start_addr, HPAGE_SIZE);
 
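[Editor's note: the placement behaviour described in the commit message can
be observed from userspace with a probe along these lines. This is an
illustrative sketch, not part of the patch: regions are page-sized because
mmap works at page granularity (standing in for the message's 1K regions),
and the addresses printed depend on the running kernel's layout, so it is a
way to watch placement rather than a guaranteed reproduction.]

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	void *r[5];
	int i;

	/* Map five page-sized anonymous regions and print where they land. */
	for (i = 0; i < 5; i++) {
		r[i] = mmap(NULL, page, PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (r[i] == MAP_FAILED)
			return 1;
		printf("region %d at %p\n", i + 1, r[i]);
	}

	/* Free regions 4, 2 and 3 (1-based) in that order,
	 * leaving regions 1 and 5 mapped. */
	munmap(r[3], page);
	munmap(r[1], page);
	munmap(r[2], page);

	/* With a bare free_area_cache the next request lands where the old
	 * region 3 was, splitting the hole; with cached_hole_size the search
	 * restarts from the base and the hole next to region 1 can be reused. */
	void *again = mmap(NULL, page, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	printf("new region at %p\n", again);
	return 0;
}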