author | Peter Zijlstra <a.p.zijlstra@chello.nl> | 2011-05-24 20:11:45 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2011-05-25 11:39:12 -0400 |
commit | d16dfc550f5326a4000f3322582a7c05dec91d7a (patch) | |
tree | 8ee963542705cbf2187777f1d3f2b209cbda827a /fs/exec.c | |
parent | d05f3169c0fbca16132ec7c2be71685c6de638b5 (diff) |
mm: mmu_gather rework
Rework the existing mmu_gather infrastructure.
The direct purpose of these patches was to allow preemptible mmu_gather,
but even without that I think these patches provide an improvement to the
status quo.
The first 9 patches rework the mmu_gather infrastructure. For review
purposes I've split them into generic and per-arch patches, with the last
of those being a generic cleanup.
The next patch provides generic RCU page-table freeing, and the followup
is a patch converting s390 to use this. I've also got 4 patches from
DaveM lined up (not included in this series) that use this to implement
gup_fast() for sparc64.
Then there is one patch that extends the generic mmu_gather batching.
After that follow the mm preemptibility patches, which make part of the mm
a lot more preemptible. They convert i_mmap_lock and anon_vma->lock to
mutexes, which together with the mmu_gather rework makes mmu_gather
preemptible as well.
Making i_mmap_lock a mutex also enables a clean-up of the truncate code.
This also allows for preemptible mmu_notifiers, something that I think
XPMEM wants.
Furthermore, it removes the new and universally detested unmap_mutex.
This patch:
Remove the first obstacle towards a fully preemptible mmu_gather.
The current scheme assumes mmu_gather is always done with preemption
disabled and uses per-cpu storage for the page batches. Change this to
try and allocate a page for batching and in case of failure, use a small
on-stack array to make some progress.
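The allocate-or-fall-back idea described above can be sketched in user-space C. This is only an illustrative model under stated assumptions, not the kernel code: `struct gather`, `gather_init()`, `gather_add()`, `gather_flush()`, `gather_finish()`, the slot counts, and the `fail_alloc` switch are all hypothetical names introduced here, and `malloc()` stands in for a page allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define LOCAL_SLOTS 8    /* small fallback array; the size is an assumption */
#define HEAP_SLOTS  512  /* stand-in for "one page worth of pointers" */

/* Hypothetical model of a gather structure; not the kernel's layout. */
struct gather {
	void **slots;             /* active batch: heap buffer or local[] */
	size_t nr;                /* entries currently queued */
	size_t max;               /* capacity of the active batch */
	size_t flushes;           /* number of times the batch was drained */
	void *local[LOCAL_SLOTS]; /* fallback storage, lives in the caller's frame */
};

/* fail_alloc simulates allocation failure so the fallback can be exercised. */
static void gather_init(struct gather *g, int fail_alloc)
{
	void **buf = fail_alloc ? NULL : malloc(HEAP_SLOTS * sizeof(void *));

	g->nr = 0;
	g->flushes = 0;
	if (buf) {
		g->slots = buf;
		g->max = HEAP_SLOTS;
	} else {
		/* Allocation failed: batch in the small array to keep making progress. */
		g->slots = g->local;
		g->max = LOCAL_SLOTS;
	}
}

/* Drain the batch; a real implementation would free the queued pages here. */
static void gather_flush(struct gather *g)
{
	if (g->nr) {
		g->nr = 0;
		g->flushes++;
	}
}

/* Queue one item and drain the batch as soon as it becomes full. */
static void gather_add(struct gather *g, void *item)
{
	assert(g->nr < g->max);
	g->slots[g->nr++] = item;
	if (g->nr == g->max)
		gather_flush(g);
}

/* Drain whatever is left and release the heap buffer if one was used. */
static void gather_finish(struct gather *g)
{
	gather_flush(g);
	if (g->slots != g->local)
		free(g->slots);
	g->slots = g->local;
	g->max = LOCAL_SLOTS;
}
```

Because the structure, including its fallback array, is owned by the caller rather than by a per-cpu slot, nothing in this model requires the caller to stay on one CPU between `gather_init()` and `gather_finish()`. The smaller fallback batch only means more frequent flushes.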
Preemptible mmu_gather is desired in general and usable once i_mmap_lock
becomes a mutex. Doing it before the mutex conversion saves us from
having to rework the code by moving the mmu_gather bits inside the
pte_lock.
Also avoid flushing the tlb batches from under the pte lock; this is
useful even without the i_mmap_lock conversion as it significantly reduces
pte lock hold times.
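One way to picture the "flush outside the lock" pattern is the following user-space C sketch. It is a hypothetical model, not the kernel's zap path: a plain flag stands in for the pte lock, and `struct zap_state`, `pte_lock_model()`, `pte_unlock_model()`, `queue_entry()`, `flush_batch()`, `zap_range()`, and `BATCH_MAX` are names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 4  /* tiny batch so the example flushes several times */

/* Hypothetical model: a flag stands in for the pte lock. */
struct zap_state {
	int locked;            /* 1 while the modelled lock is held */
	size_t queued;         /* entries waiting in the batch */
	size_t flushes;        /* number of flushes performed */
	size_t locked_flushes; /* flushes that happened with the lock held */
};

static void pte_lock_model(struct zap_state *s)
{
	assert(!s->locked);
	s->locked = 1;
}

static void pte_unlock_model(struct zap_state *s)
{
	assert(s->locked);
	s->locked = 0;
}

/* Drain the batch and record whether the lock was held at the time. */
static void flush_batch(struct zap_state *s)
{
	if (s->locked)
		s->locked_flushes++;
	s->queued = 0;
	s->flushes++;
}

/* Queue one entry; returns 0 when the batch has just become full. */
static int queue_entry(struct zap_state *s)
{
	return ++s->queued < BATCH_MAX;
}

static void zap_range(struct zap_state *s, size_t nr_entries)
{
	size_t done = 0;

	while (done < nr_entries) {
		int need_flush = 0;

		pte_lock_model(s);
		while (done < nr_entries) {
			done++;
			if (!queue_entry(s)) {
				/* Batch is full: stop here instead of flushing under the lock. */
				need_flush = 1;
				break;
			}
		}
		pte_unlock_model(s);

		/* The potentially slow flush runs only after the lock is dropped. */
		if (need_flush)
			flush_batch(s);
	}

	/* Drain any partial batch left over at the end of the range. */
	if (s->queued)
		flush_batch(s);
}
```

In this model the lock is held only while entries are being queued, so its hold time is bounded by the batch size rather than by the cost of a flush.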
[akpm@linux-foundation.org: fix comment tpyo]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Tony Luck <tony.luck@intel.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'fs/exec.c')
-rw-r--r-- | fs/exec.c | 10 |
1 files changed, 5 insertions, 5 deletions
@@ -600,7 +600,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 	unsigned long length = old_end - old_start;
 	unsigned long new_start = old_start - shift;
 	unsigned long new_end = old_end - shift;
-	struct mmu_gather *tlb;
+	struct mmu_gather tlb;
 
 	BUG_ON(new_start > new_end);
 
@@ -626,12 +626,12 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 		return -ENOMEM;
 
 	lru_add_drain();
-	tlb = tlb_gather_mmu(mm, 0);
+	tlb_gather_mmu(&tlb, mm, 0);
 	if (new_end > old_start) {
 		/*
 		 * when the old and new regions overlap clear from new_end.
 		 */
-		free_pgd_range(tlb, new_end, old_end, new_end,
+		free_pgd_range(&tlb, new_end, old_end, new_end,
 			vma->vm_next ? vma->vm_next->vm_start : 0);
 	} else {
 		/*
@@ -640,10 +640,10 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 		 * have constraints on va-space that make this illegal (IA64) -
 		 * for the others its just a little faster.
 		 */
-		free_pgd_range(tlb, old_start, old_end, new_end,
+		free_pgd_range(&tlb, old_start, old_end, new_end,
 			vma->vm_next ? vma->vm_next->vm_start : 0);
 	}
-	tlb_finish_mmu(tlb, new_end, old_end);
+	tlb_finish_mmu(&tlb, new_end, old_end);
 
 	/*
 	 * Shrink the vma to just the new range. Always succeeds.