aboutsummaryrefslogtreecommitdiffstats
path: root/mm
diff options
context:
space:
mode:
authorNick Piggin <npiggin@suse.de>2008-06-23 08:30:30 -0400
committerLinus Torvalds <torvalds@linux-foundation.org>2008-06-23 14:28:32 -0400
commit945754a1754f9d4c2974a8241ad4f92fad7f3a6a (patch)
treed310dc918d8094a1a6e00e15b24f7953616d7a82 /mm
parent672ca28e300c17bf8d792a2a7a8631193e580c74 (diff)
mm: fix race in COW logic
There is a race in the COW logic. It contains a shortcut to avoid the COW and reuse the page if we have the sole reference on the page, however it is possible to have two racing do_wp_page()ers with one causing the other to mistakenly believe it is safe to take the shortcut when it is not. This could lead to data corruption. Process 1 and process2 each have a wp pte of the same anon page (ie. one forked the other). The page's mapcount is 2. Then they both attempt to write to it around the same time... proc1 proc2 thr1 proc2 thr2 CPU0 CPU1 CPU3 do_wp_page() do_wp_page() trylock_page() can_share_swap_page() load page mapcount (==2) reuse = 0 pte unlock copy page to new_page pte lock page_remove_rmap(page); trylock_page() can_share_swap_page() load page mapcount (==1) reuse = 1 ptep_set_access_flags (allow W) write private key into page read from page ptep_clear_flush() set_pte_at(pte of new_page) Fix this by moving the page_remove_rmap of the old page after the pte clear and flush. Potentially the entire branch could be moved down here, but in order to stay consistent, I won't (should probably move all the *_mm_counter stuff with one patch). Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Hugh Dickins <hugh@veritas.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'mm')
-rw-r--r--mm/memory.c27
1 files changed, 26 insertions, 1 deletions
diff --git a/mm/memory.c b/mm/memory.c
index 423e0e7c2f73..d14b251a25a6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1785,7 +1785,6 @@ gotten:
1785 page_table = pte_offset_map_lock(mm, pmd, address, &ptl); 1785 page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
1786 if (likely(pte_same(*page_table, orig_pte))) { 1786 if (likely(pte_same(*page_table, orig_pte))) {
1787 if (old_page) { 1787 if (old_page) {
1788 page_remove_rmap(old_page, vma);
1789 if (!PageAnon(old_page)) { 1788 if (!PageAnon(old_page)) {
1790 dec_mm_counter(mm, file_rss); 1789 dec_mm_counter(mm, file_rss);
1791 inc_mm_counter(mm, anon_rss); 1790 inc_mm_counter(mm, anon_rss);
@@ -1807,6 +1806,32 @@ gotten:
1807 lru_cache_add_active(new_page); 1806 lru_cache_add_active(new_page);
1808 page_add_new_anon_rmap(new_page, vma, address); 1807 page_add_new_anon_rmap(new_page, vma, address);
1809 1808
1809 if (old_page) {
1810 /*
1811 * Only after switching the pte to the new page may
1812 * we remove the mapcount here. Otherwise another
1813 * process may come and find the rmap count decremented
1814 * before the pte is switched to the new page, and
1815 * "reuse" the old page writing into it while our pte
1816 * here still points into it and can be read by other
1817 * threads.
1818 *
1819 * The critical issue is to order this
1820 * page_remove_rmap with the ptp_clear_flush above.
1821 * Those stores are ordered by (if nothing else,)
1822 * the barrier present in the atomic_add_negative
1823 * in page_remove_rmap.
1824 *
1825 * Then the TLB flush in ptep_clear_flush ensures that
1826 * no process can access the old page before the
1827 * decremented mapcount is visible. And the old page
1828 * cannot be reused until after the decremented
1829 * mapcount is visible. So transitively, TLBs to
1830 * old page will be flushed before it can be reused.
1831 */
1832 page_remove_rmap(old_page, vma);
1833 }
1834
1810 /* Free the old page.. */ 1835 /* Free the old page.. */
1811 new_page = old_page; 1836 new_page = old_page;
1812 ret |= VM_FAULT_WRITE; 1837 ret |= VM_FAULT_WRITE;