author | Kirill A. Shutemov <kirill.shutemov@linux.intel.com> | 2017-04-13 17:56:26 -0400 |
---|---|---|
committer | Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 2017-04-21 03:31:18 -0400 |
commit | f584803c49427ba9623adf93d7078cbe9775b027 (patch) | |
tree | 821e41e591a90fe973e3c1c871ba6e66d2297a0d | |
parent | 5ef6f4dec559007650c762b11ca30dd74866299a (diff) |
thp: fix MADV_DONTNEED vs. MADV_FREE race
commit 58ceeb6bec86d9140f9d91d71a710e963523d063 upstream.
Both MADV_DONTNEED and MADV_FREE are handled with down_read(mmap_sem).
It's critical not to clear the pmd, even transiently, while handling
MADV_FREE, to avoid a race with MADV_DONTNEED:
    CPU0:                                        CPU1:
                                                 madvise_free_huge_pmd()
                                                  pmdp_huge_get_and_clear_full()
    madvise_dontneed()
     zap_pmd_range()
      pmd_trans_huge(*pmd) == 0 (without ptl)
      // skip the pmd
                                                  set_pmd_at();
                                                  // pmd is re-established
As a result, MADV_DONTNEED skips the pmd and leaves it uncleared.
This violates the MADV_DONTNEED interface and can result in userspace
misbehaviour.
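For context, the userspace-visible guarantee at stake can be sketched as follows. This is a minimal illustration, not part of the patch: the helper name nonzero_bytes_after_dontneed() is hypothetical, and it relies only on the documented behaviour that, for a private anonymous mapping, accesses after a successful MADV_DONTNEED see zero-filled pages. It runs single-threaded, so it does not reproduce the race; it only shows what a skipped pmd would break.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the patch): dirty an anonymous private
 * mapping, drop it with MADV_DONTNEED, and count the bytes that are
 * still non-zero afterwards. Returns -1 if mmap() or madvise() fails.
 */
static long nonzero_bytes_after_dontneed(size_t len)
{
	unsigned char *buf;
	long nonzero = 0;
	size_t i;

	assert(len > 0);

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0xaa, len);		/* fault in and dirty every page */

	if (madvise(buf, len, MADV_DONTNEED) != 0) {
		munmap(buf, len);
		return -1;
	}

	/* Reads after MADV_DONTNEED should see zero-filled pages. */
	for (i = 0; i < len; i++)
		if (buf[i] != 0)
			nonzero++;

	munmap(buf, len);
	return nonzero;
}
```

If a huge pmd were skipped as in the interleaving above, a caller of a helper like this could observe stale 0xaa bytes instead of zeroes, which is the kind of misbehaviour the commit message refers to.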
Basically it's the same race as with NUMA balancing in
change_huge_pmd(), but a bit simpler to mitigate: we don't need to
preserve the dirty/young flags here, because MADV_FREE clears them anyway.
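The difference between the two approaches can be replayed in a small single-threaded model. This is only a sketch under stated assumptions: the TOY_* flag bits and both functions are invented for illustration, and the model assumes that an invalidated pmd loses its present bit but keeps its huge marker, so a lockless pmd_trans_huge()-style check still sees it. Real pmd layouts and pmdp_invalidate() behaviour are architecture-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Made-up flag bits for a toy pmd; real layouts are per-architecture. */
#define TOY_PRESENT	(1u << 0)
#define TOY_HUGE	(1u << 1)
#define TOY_YOUNG	(1u << 2)
#define TOY_DIRTY	(1u << 3)

/* Toy stand-in for pmd_trans_huge(): only looks at the huge bit. */
static bool toy_trans_huge(uint32_t pmd)
{
	return (pmd & TOY_HUGE) != 0;
}

/*
 * Replay the interleaving from the commit message in a single thread.
 * Returns true if the lockless observer (the MADV_DONTNEED side) would
 * skip the pmd in the middle of the MADV_FREE update.
 */
static bool observer_skips_pmd(bool use_invalidate)
{
	uint32_t pmd = TOY_PRESENT | TOY_HUGE | TOY_YOUNG | TOY_DIRTY;
	uint32_t orig = pmd;
	bool skipped;

	/* MADV_FREE side, step 1: take the entry out of service. */
	if (use_invalidate)
		pmd &= ~TOY_PRESENT;	/* not present, still marked huge */
	else
		pmd = 0;		/* old behaviour: entry fully cleared */

	/* MADV_DONTNEED side: lockless check between the two steps. */
	skipped = !toy_trans_huge(pmd);

	/* MADV_FREE side, step 2: install the old, clean entry. */
	pmd = orig & ~(TOY_YOUNG | TOY_DIRTY);
	assert(toy_trans_huge(pmd));

	return skipped;
}
```

In this model, observer_skips_pmd(false) reports a skipped pmd while observer_skips_pmd(true) does not, which mirrors why the fix avoids fully clearing the entry. The young and dirty bits are dropped in both variants, so nothing has to be carried over from the intermediate state.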
[kirill.shutemov@linux.intel.com: Urgh... Power is special again]
Link: http://lkml.kernel.org/r/20170303102636.bhd2zhtpds4mt62a@black.fi.intel.com
Link: http://lkml.kernel.org/r/20170302151034.27829-4-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-rw-r--r-- | mm/huge_memory.c | 3 |
1 files changed, 1 insertions, 2 deletions
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 917555cf6be0..d5b2b759f76f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1380,8 +1380,7 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		deactivate_page(page);
 
 	if (pmd_young(orig_pmd) || pmd_dirty(orig_pmd)) {
-		orig_pmd = pmdp_huge_get_and_clear_full(tlb->mm, addr, pmd,
-			tlb->fullmm);
+		pmdp_invalidate(vma, addr, pmd);
 		orig_pmd = pmd_mkold(orig_pmd);
 		orig_pmd = pmd_mkclean(orig_pmd);
 