author		Dave Hansen <dave.hansen@linux.intel.com>	2019-04-19 15:47:47 -0400
committer	Ingo Molnar <mingo@kernel.org>			2019-05-09 04:37:17 -0400
commit		5a28fc94c9143db766d1ba5480cae82d856ad080 (patch)
tree		7e409c919e0683f5bcdc12afb1bb9131ce8045d0
parent		ffa6f55eb6188ee73339cab710fabf30d13110a7 (diff)
x86/mpx, mm/core: Fix recursive munmap() corruption
This is a bit of a mess, to put it mildly. But, it's a bug that only seems to have shown up in 4.20 and wasn't noticed until now, because nobody uses MPX.

MPX has the arch_unmap() hook inside of munmap() because MPX uses bounds tables that protect other areas of memory. When memory is unmapped, there is also a need to unmap the MPX bounds tables. Barring this, unused bounds tables can eat 80% of the address space.

But, the recursive do_munmap() that gets called via arch_unmap() wreaks havoc with __do_munmap()'s state. It can result in freeing populated page tables, accessing bogus VMA state, double-freed VMAs and more. See the "long story" further below for the gory details.

To fix this, call arch_unmap() before __do_munmap() has a chance to do anything meaningful. Also, remove the 'vma' argument and force the MPX code to do its own, independent VMA lookup.

== UML / unicore32 impact ==

Remove the unused 'vma' argument to arch_unmap(). No functional change. I compile-tested this on UML but not unicore32.

== powerpc impact ==

powerpc uses arch_unmap() to watch for munmap() on the VDSO and zeroes out 'current->mm->context.vdso_base'. Moving arch_unmap() makes this happen earlier in __do_munmap(). But, 'vdso_base' seems to only be used in perf and in the signal delivery that happens near the return to userspace. I cannot find any likely impact to powerpc, other than the zeroing happening a little earlier.

powerpc does not use the 'vma' argument and is unaffected by its removal. I compile-tested a 64-bit powerpc defconfig.

== x86 impact ==

For the common success case this is functionally identical to what was there before. For the munmap() failure case, it's possible that some MPX tables will be zapped for memory that continues to be in use. But, this is an extraordinarily unlikely scenario and the harm would be that MPX provides no protection, since the bounds table got reset (zeroed). I can't imagine anyone doing this:

	ptr = mmap();
	// use ptr
	ret = munmap(ptr);
	if (ret)
		// oh, there was an error, I'll
		// keep using ptr.

Because if you're doing munmap(), you are *done* with the memory. There's probably no good data in there _anyway_.

This passes the original reproducer from Richard Biener as well as the existing MPX selftests.

The long story:

munmap() has a couple of pieces:

 1. Find the affected VMA(s)
 2. Split the start/end one(s) if necessary
 3. Pull the VMAs out of the rbtree
 4. Actually zap the memory via unmap_region(), including freeing
    page tables (or queueing them to be freed)
 5. Fix up some of the accounting (like fput()) and actually free
    the VMA itself

This specific ordering was introduced by:

  dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")

during the 4.20 merge window. The previous __do_munmap() code was actually safe because the only thing after arch_unmap() was remove_vma_list(). arch_unmap() could not see 'vma' in the rbtree because it was detached, so it is not even capable of doing operations unsafe for remove_vma_list()'s use of 'vma'.

Richard Biener reported a test that shows this in dmesg:

  [1216548.787498] BUG: Bad rss-counter state mm:0000000017ce560b idx:1 val:551
  [1216548.787500] BUG: non-zero pgtables_bytes on freeing mm: 24576

What triggered this was the recursive do_munmap() called via arch_unmap(). It was freeing page tables that had not been properly zapped.

But, the problem was bigger than this. For one, arch_unmap() can free VMAs. But, the calling __do_munmap() has variables that *point* to VMAs and obviously can't handle them just getting freed while the pointer is still in use.

I tried a couple of things here. First, I tried to fix the page table freeing problem in isolation, but I then found the VMA issue. I also tried having the MPX code return a flag if it modified the rbtree, which would force __do_munmap() to re-walk the rbtree and restart. That spiralled out of control in complexity pretty fast.

Just moving arch_unmap() and accepting that the bonkers failure case might eat some bounds tables seems like the simplest viable fix.

This was also reported in the following kernel bugzilla entry:

  https://bugzilla.kernel.org/show_bug.cgi?id=203123

There are some reports that the bug was triggered by this commit:

  dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")

While that commit certainly made the issues easier to hit, I believe the fundamental issue has been with us as long as MPX itself, thus the Fixes: tag below is for one of the original MPX commits.

[ mingo: Minor edits to the changelog and the patch. ]

Reported-by: Richard Biener <rguenther@suse.de>
Reported-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-um@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: stable@vger.kernel.org
Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Link: http://lkml.kernel.org/r/20190419194747.5E1AD6DC@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
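For illustration only, here is a minimal, self-contained userspace sketch of the munmap()-failure pattern the changelog argues nobody relies on. It is not part of the patch and does not exercise MPX; it just shows the (unlikely) error path after which, with this change, any MPX bounds tables covering the range may already have been zapped:

	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t len = 4096;
		void *ptr = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (ptr == MAP_FAILED) {
			perror("mmap");
			return 1;
		}

		memset(ptr, 0x5a, len);		/* use ptr */

		if (munmap(ptr, len)) {
			/*
			 * The "bonkers" case from the changelog: munmap()
			 * reported an error and the caller keeps going,
			 * still referencing ptr.
			 */
			perror("munmap");
			return 1;
		}
		return 0;
	}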
-rw-r--r--	arch/powerpc/include/asm/mmu_context.h		1
-rw-r--r--	arch/um/include/asm/mmu_context.h		1
-rw-r--r--	arch/unicore32/include/asm/mmu_context.h	1
-rw-r--r--	arch/x86/include/asm/mmu_context.h		6
-rw-r--r--	arch/x86/include/asm/mpx.h			15
-rw-r--r--	arch/x86/mm/mpx.c				10
-rw-r--r--	include/asm-generic/mm_hooks.h			1
-rw-r--r--	mm/mmap.c					15
8 files changed, 25 insertions(+), 25 deletions(-)
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 6ee8195a2ffb..4a6dd3ba0b0b 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -237,7 +237,6 @@ extern void arch_exit_mmap(struct mm_struct *mm);
 #endif
 
 static inline void arch_unmap(struct mm_struct *mm,
-			      struct vm_area_struct *vma,
 			      unsigned long start, unsigned long end)
 {
 	if (start <= mm->context.vdso_base && mm->context.vdso_base < end)
diff --git a/arch/um/include/asm/mmu_context.h b/arch/um/include/asm/mmu_context.h
index fca34b2177e2..9f4b4bb78120 100644
--- a/arch/um/include/asm/mmu_context.h
+++ b/arch/um/include/asm/mmu_context.h
@@ -22,7 +22,6 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
 }
 extern void arch_exit_mmap(struct mm_struct *mm);
 static inline void arch_unmap(struct mm_struct *mm,
-			      struct vm_area_struct *vma,
 			      unsigned long start, unsigned long end)
 {
 }
diff --git a/arch/unicore32/include/asm/mmu_context.h b/arch/unicore32/include/asm/mmu_context.h
index 5c205a9cb5a6..9f06ea5466dd 100644
--- a/arch/unicore32/include/asm/mmu_context.h
+++ b/arch/unicore32/include/asm/mmu_context.h
@@ -88,7 +88,6 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
 }
 
 static inline void arch_unmap(struct mm_struct *mm,
-			      struct vm_area_struct *vma,
 			      unsigned long start, unsigned long end)
 {
 }
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 93dff1963337..9024236693d2 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -278,8 +278,8 @@ static inline void arch_bprm_mm_init(struct mm_struct *mm,
 	mpx_mm_init(mm);
 }
 
-static inline void arch_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
-			      unsigned long start, unsigned long end)
+static inline void arch_unmap(struct mm_struct *mm, unsigned long start,
+			      unsigned long end)
 {
 	/*
 	 * mpx_notify_unmap() goes and reads a rarely-hot
@@ -299,7 +299,7 @@ static inline void arch_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * consistently wrong.
 	 */
 	if (unlikely(cpu_feature_enabled(X86_FEATURE_MPX)))
-		mpx_notify_unmap(mm, vma, start, end);
+		mpx_notify_unmap(mm, start, end);
 }
 
 /*
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index d0b1434fb0b6..143a5c193ed3 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -64,12 +64,15 @@ struct mpx_fault_info {
 };
 
 #ifdef CONFIG_X86_INTEL_MPX
-int mpx_fault_info(struct mpx_fault_info *info, struct pt_regs *regs);
-int mpx_handle_bd_fault(void);
+
+extern int mpx_fault_info(struct mpx_fault_info *info, struct pt_regs *regs);
+extern int mpx_handle_bd_fault(void);
+
 static inline int kernel_managing_mpx_tables(struct mm_struct *mm)
 {
 	return (mm->context.bd_addr != MPX_INVALID_BOUNDS_DIR);
 }
+
 static inline void mpx_mm_init(struct mm_struct *mm)
 {
 	/*
@@ -78,11 +81,10 @@ static inline void mpx_mm_init(struct mm_struct *mm)
 	 */
 	mm->context.bd_addr = MPX_INVALID_BOUNDS_DIR;
 }
-void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
-		      unsigned long start, unsigned long end);
 
-unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len,
-				      unsigned long flags);
+extern void mpx_notify_unmap(struct mm_struct *mm, unsigned long start, unsigned long end);
+extern unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len, unsigned long flags);
+
 #else
 static inline int mpx_fault_info(struct mpx_fault_info *info, struct pt_regs *regs)
 {
@@ -100,7 +102,6 @@ static inline void mpx_mm_init(struct mm_struct *mm)
 {
 }
 static inline void mpx_notify_unmap(struct mm_struct *mm,
-				    struct vm_area_struct *vma,
 				    unsigned long start, unsigned long end)
 {
 }
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index c805db6236b4..7aeb9fe2955f 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -881,9 +881,10 @@ static int mpx_unmap_tables(struct mm_struct *mm,
  * the virtual address region start...end have already been split if
  * necessary, and the 'vma' is the first vma in this range (start -> end).
  */
-void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
-		      unsigned long start, unsigned long end)
+void mpx_notify_unmap(struct mm_struct *mm, unsigned long start,
+		      unsigned long end)
 {
+	struct vm_area_struct *vma;
 	int ret;
 
 	/*
@@ -902,11 +903,12 @@ void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * which should not occur normally. Being strict about it here
 	 * helps ensure that we do not have an exploitable stack overflow.
 	 */
-	do {
+	vma = find_vma(mm, start);
+	while (vma && vma->vm_start < end) {
 		if (vma->vm_flags & VM_MPX)
 			return;
 		vma = vma->vm_next;
-	} while (vma && vma->vm_start < end);
+	}
 
 	ret = mpx_unmap_tables(mm, start, end);
 	if (ret)
diff --git a/include/asm-generic/mm_hooks.h b/include/asm-generic/mm_hooks.h
index 8ac4e68a12f0..6736ed2f632b 100644
--- a/include/asm-generic/mm_hooks.h
+++ b/include/asm-generic/mm_hooks.h
@@ -18,7 +18,6 @@ static inline void arch_exit_mmap(struct mm_struct *mm)
 }
 
 static inline void arch_unmap(struct mm_struct *mm,
-			      struct vm_area_struct *vma,
 			      unsigned long start, unsigned long end)
 {
 }
diff --git a/mm/mmap.c b/mm/mmap.c
index bd7b9f293b39..2d6a6662edb9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2735,9 +2735,17 @@ int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
 		return -EINVAL;
 
 	len = PAGE_ALIGN(len);
+	end = start + len;
 	if (len == 0)
 		return -EINVAL;
 
+	/*
+	 * arch_unmap() might do unmaps itself.  It must be called
+	 * and finish any rbtree manipulation before this code
+	 * runs and also starts to manipulate the rbtree.
+	 */
+	arch_unmap(mm, start, end);
+
 	/* Find the first overlapping VMA */
 	vma = find_vma(mm, start);
 	if (!vma)
@@ -2746,7 +2754,6 @@ int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
 	/* we have start < vma->vm_end */
 
 	/* if it doesn't overlap, we have nothing.. */
-	end = start + len;
 	if (vma->vm_start >= end)
 		return 0;
 
@@ -2816,12 +2823,6 @@ int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
 	/* Detach vmas from rbtree */
 	detach_vmas_to_be_unmapped(mm, vma, prev, end);
 
-	/*
-	 * mpx unmap needs to be called with mmap_sem held for write.
-	 * It is safe to call it before unmap_region().
-	 */
-	arch_unmap(mm, vma, start, end);
-
 	if (downgrade)
 		downgrade_write(&mm->mmap_sem);
 