author Kirill A. Shutemov <kirill.shutemov@linux.intel.com> 2014-05-09 18:37:00 -0400
committer Linus Torvalds <torvalds@linux-foundation.org> 2014-05-11 04:55:48 -0400
commit dd18dbc2d42af75fffa60c77e0f02220bc329829
tree d89c03182d9789f2e1e48a63aacbfbcaacf93950
parent 3551a9280bcb728980a13783ff295e9f0bdedd9a
mm, thp: close race between mremap() and split_huge_page()

It's critical for split_huge_page() (and migration) to catch and freeze
all PMDs on rmap walk.  It gets tricky if there's concurrent fork() or
mremap() since usually we copy/move page table entries on dup_mm() or
move_page_tables() without rmap lock taken.  To get it work we rely on
rmap walk order to not miss any entry.  We expect to see destination VMA
after source one to work correctly.

But after switching rmap implementation to interval tree it's not always
possible to preserve expected walk order.

It works fine for dup_mm() since new VMA has the same vma_start_pgoff()
/ vma_last_pgoff() and explicitly insert dst VMA after src one with
vma_interval_tree_insert_after().

But on move_vma() destination VMA can be merged into adjacent one and as
result shifted left in interval tree.  Fortunately, we can detect the
situation and prevent race with rmap walk by moving page table entries
under rmap lock.  See commit 38a76013ad80.

Problem is that we miss the lock when we move transhuge PMD.  Most
likely this bug caused the crash[1].

[1] http://thread.gmane.org/gmane.linux.kernel.mm/96473

Fixes: 108d6642ad81 ("mm anon rmap: remove anon_vma_moveto_tail")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Michel Lespinasse <walken@google.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Miller <davem@davemloft.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org> [3.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

-rw-r--r-- mm/mremap.c | 9
1 files changed, 8 insertions, 1 deletions
diff --git a/mm/mremap.c b/mm/mremap.c
index 0843feb66f3d..05f1180e9f21 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -194,10 +194,17 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 			break;
 		if (pmd_trans_huge(*old_pmd)) {
 			int err = 0;
-			if (extent == HPAGE_PMD_SIZE)
+			if (extent == HPAGE_PMD_SIZE) {
+				VM_BUG_ON(vma->vm_file || !vma->anon_vma);
+				/* See comment in move_ptes() */
+				if (need_rmap_locks)
+					anon_vma_lock_write(vma->anon_vma);
 				err = move_huge_pmd(vma, new_vma, old_addr,
 						    new_addr, old_end,
 						    old_pmd, new_pmd);
+				if (need_rmap_locks)
+					anon_vma_unlock_write(vma->anon_vma);
+			}
 			if (err > 0) {
 				need_flush = true;
 				continue;
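
The walk-order argument in the commit message can be illustrated with a small userspace toy model. This is only a sketch under stated assumptions, not kernel code: the slot array stands in for VMAs in rmap walk order, and `NSLOTS`, `walk_finds_entry()` and `walk_finds_entry_locked()` are hypothetical names invented for the illustration. The walker visits the slots in index order while one entry is moved from `src` to `dst` just before the walker visits slot `move_at`. When `dst` comes after `src`, the walker finds the entry whenever the move happens. When `dst` comes before `src` (the merged, "shifted left" case), a move that lands between the two visits hides the entry from the walker, unless the move is serialized against the whole walk, which is the role the rmap lock plays in the hunk above.

```c
#include <stdbool.h>

#define NSLOTS 4	/* hypothetical number of "VMAs" in walk order */

/*
 * Toy rmap walk: visit slots 0..NSLOTS-1 in order and report whether the
 * single entry was seen. The entry starts in slot src and is moved to slot
 * dst immediately before the walker visits slot move_at. A move_at value
 * outside 0..NSLOTS-1 means the move never happens during the walk.
 */
static bool walk_finds_entry(int src, int dst, int move_at)
{
	bool slots[NSLOTS] = { false };
	bool found = false;

	slots[src] = true;
	for (int i = 0; i < NSLOTS; i++) {
		if (i == move_at) {
			/* Unlocked move racing with the walk. */
			slots[src] = false;
			slots[dst] = true;
		}
		if (slots[i])
			found = true;
	}
	return found;
}

/*
 * Same walk, but the move is serialized against it, as if both sides took
 * one lock: the move runs either entirely before the walk starts or
 * entirely after it finishes.
 */
static bool walk_finds_entry_locked(int src, int dst, bool move_first)
{
	return walk_finds_entry(src, dst, move_first ? 0 : NSLOTS);
}
```

In this model, `walk_finds_entry(2, 1, 2)` is the only interleaving that misses the entry: the walker has already passed slot 1 when the entry arrives there, and slot 2 is empty by the time it is visited. Both serialized orderings through `walk_finds_entry_locked(2, 1, ...)` find it.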