From 1363c3cd8603a913a27e2995dccbd70d5312d8e6 Mon Sep 17 00:00:00 2001 From: Wolfgang Wander Date: Tue, 21 Jun 2005 17:14:49 -0700 Subject: [PATCH] Avoiding mmap fragmentation Ingo recently introduced a great speedup for allocating new mmaps using the free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and causes huge performance increases in thread creation. The downside of this patch is that it does lead to fragmentation in the mmap-ed areas (visible via /proc/self/maps), such that some applications that work fine under 2.4 kernels quickly run out of memory on any 2.6 kernel. The problem is twofold: 1) the free_area_cache is used to continue a search for memory where the last search ended. Before the change new areas were always searched from the base address on. So now new small areas are cluttering holes of all sizes throughout the whole mmap-able region whereas before small holes tended to close holes near the base leaving holes far from the base large and available for larger requests. 2) the free_area_cache also is set to the location of the last munmap-ed area so in scenarios where we allocate e.g. five regions of 1K each, then free regions 4 2 3 in this order the next request for 1K will be placed in the position of the old region 3, whereas before we appended it to the still active region 1, placing it at the location of the old region 2. Before we had 1 free region of 2K, now we only get two free regions of 1K -> fragmentation. The patch addresses thes issues by introducing yet another cache descriptor cached_hole_size that contains the largest known hole size below the current free_area_cache. If a new request comes in the size is compared against the cached_hole_size and if the request can be filled with a hole below free_area_cache the search is started from the base instead. The results look promising: Whereas 2.6.12-rc4 fragments quickly and my (earlier posted) leakme.c test program terminates after 50000+ iterations with 96 distinct and fragmented maps in /proc/self/maps it performs nicely (as expected) with thread creation, Ingo's test_str02 with 20000 threads requires 0.7s system time. Taking out Ingo's patch (un-patch available per request) by basically deleting all mentions of free_area_cache from the kernel and starting the search for new memory always at the respective bases we observe: leakme terminates successfully with 11 distinctive hardly fragmented areas in /proc/self/maps but thread creating is gringdingly slow: 30+s(!) system time for Ingo's test_str02 with 20000 threads. Now - drumroll ;-) the appended patch works fine with leakme: it ends with only 7 distinct areas in /proc/self/maps and also thread creation seems sufficiently fast with 0.71s for 20000 threads. Signed-off-by: Wolfgang Wander Credit-to: "Richard Purdie" Signed-off-by: Ken Chen Acked-by: Ingo Molnar (partly) Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/fork.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'kernel/fork.c') diff --git a/kernel/fork.c b/kernel/fork.c index f42a17f88699..876b31cd822d 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -194,6 +194,7 @@ static inline int dup_mmap(struct mm_struct * mm, struct mm_struct * oldmm) mm->mmap = NULL; mm->mmap_cache = NULL; mm->free_area_cache = oldmm->mmap_base; + mm->cached_hole_size = ~0UL; mm->map_count = 0; set_mm_counter(mm, rss, 0); set_mm_counter(mm, anon_rss, 0); @@ -322,6 +323,7 @@ static struct mm_struct * mm_init(struct mm_struct * mm) mm->ioctx_list = NULL; mm->default_kioctx = (struct kioctx)INIT_KIOCTX(mm->default_kioctx, *mm); mm->free_area_cache = TASK_UNMAPPED_BASE; + mm->cached_hole_size = ~0UL; if (likely(!mm_alloc_pgd(mm))) { mm->def_flags = 0; -- cgit v1.2.2 From 45918e1a8bfcabc1cb4570b8df276655020eac45 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Tue, 21 Jun 2005 17:15:08 -0700 Subject: [PATCH] dup_mmap: update comment on new vma Remove part of comment on linking new vma in dup_mmap: since anon_vma rmap came in, try_to_unmap_one knows the vma without needing find_vma. But add a comment to note that here vma is inserted without mmap_sem. Signed-off-by: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/fork.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'kernel/fork.c') diff --git a/kernel/fork.c b/kernel/fork.c index 876b31cd822d..a28d11e10877 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -250,8 +250,9 @@ static inline int dup_mmap(struct mm_struct * mm, struct mm_struct * oldmm) /* * Link in the new vma and copy the page table entries: - * link in first so that swapoff can see swap entries, - * and try_to_unmap_one's find_vma find the new vma. + * link in first so that swapoff can see swap entries. + * Note that, exceptionally, here the vma is inserted + * without holding mm->mmap_sem. */ spin_lock(&mm->page_table_lock); *pprev = tmp; -- cgit v1.2.2 From 476d139c218e44e045e4bc6d4cc02b010b343939 Mon Sep 17 00:00:00 2001 From: Nick Piggin Date: Sat, 25 Jun 2005 14:57:29 -0700 Subject: [PATCH] sched: consolidate sbe sbf Consolidate balance-on-exec with balance-on-fork. This is made easy by the sched-domains RCU patches. As well as the general goodness of code reduction, this allows the runqueues to be unlocked during balance-on-fork. schedstats is a problem. Maybe just have balance-on-event instead of distinguishing fork and exec? Signed-off-by: Nick Piggin Acked-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/fork.c | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) (limited to 'kernel/fork.c') diff --git a/kernel/fork.c b/kernel/fork.c index a28d11e10877..2c7806873bfd 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1003,9 +1003,6 @@ static task_t *copy_process(unsigned long clone_flags, p->pdeath_signal = 0; p->exit_state = 0; - /* Perform scheduler related setup */ - sched_fork(p); - /* * Ok, make it visible to the rest of the system. * We dont wake it up yet. @@ -1014,18 +1011,24 @@ static task_t *copy_process(unsigned long clone_flags, INIT_LIST_HEAD(&p->ptrace_children); INIT_LIST_HEAD(&p->ptrace_list); + /* Perform scheduler related setup. Assign this task to a CPU. */ + sched_fork(p, clone_flags); + /* Need tasklist lock for parent etc handling! */ write_lock_irq(&tasklist_lock); /* - * The task hasn't been attached yet, so cpus_allowed mask cannot - * have changed. The cpus_allowed mask of the parent may have - * changed after it was copied first time, and it may then move to - * another CPU - so we re-copy it here and set the child's CPU to - * the parent's CPU. This avoids alot of nasty races. + * The task hasn't been attached yet, so its cpus_allowed mask will + * not be changed, nor will its assigned CPU. + * + * The cpus_allowed mask of the parent may have changed after it was + * copied first time - so re-copy it here, then check the child's CPU + * to ensure it is on a valid CPU (and if not, just force it back to + * parent's CPU). This avoids alot of nasty races. */ p->cpus_allowed = current->cpus_allowed; - set_task_cpu(p, smp_processor_id()); + if (unlikely(!cpu_isset(task_cpu(p), p->cpus_allowed))) + set_task_cpu(p, smp_processor_id()); /* * Check for pending SIGKILL! The new thread should not be allowed -- cgit v1.2.2 From 22e2c507c301c3dbbcf91b4948b88f78842ee6c9 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Mon, 27 Jun 2005 10:55:12 +0200 Subject: [PATCH] Update cfq io scheduler to time sliced design This updates the CFQ io scheduler to the new time sliced design (cfq v3). It provides full process fairness, while giving excellent aggregate system throughput even for many competing processes. It supports io priorities, either inherited from the cpu nice value or set directly with the ioprio_get/set syscalls. The latter closely mimic set/getpriority. This import is based on my latest from -mm. Signed-off-by: Jens Axboe Signed-off-by: Linus Torvalds --- kernel/fork.c | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'kernel/fork.c') diff --git a/kernel/fork.c b/kernel/fork.c index 2c7806873bfd..cdef6cea8900 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1090,6 +1090,11 @@ static task_t *copy_process(unsigned long clone_flags, spin_unlock(¤t->sighand->siglock); } + /* + * inherit ioprio + */ + p->ioprio = current->ioprio; + SET_LINKS(p); if (unlikely(p->ptrace & PT_PTRACED)) __ptrace_link(p, current->parent); -- cgit v1.2.2