author     Andrea Arcangeli <aarcange@redhat.com>  2016-05-12 18:42:25 -0400
committer  Linus Torvalds <torvalds@linux-foundation.org>  2016-05-12 18:52:50 -0400
commit     6d0a07edd17cfc12fdc1f36de8072fa17cc3666f
tree       a80f20857e658de5aaa8ffa769f32d2f3bf7a9a5 /mm/swapfile.c
parent     7496fea9a6bf644afe360af795b121a77635b37d
mm: thp: calculate the mapcount correctly for THP pages during WP faults
This will provide full accuracy to the mapcount calculation in write protect faults, so page pinning will not get broken by false positive copy-on-writes.

total_mapcount() isn't the right calculation needed in reuse_swap_page(), so this introduces a page_trans_huge_mapcount() that is effectively the fully accurate return value for page_mapcount() when dealing with Transparent Hugepages. However, we only use page_trans_huge_mapcount() during COW faults, where it is strictly needed, due to its higher runtime cost.

This also provides, at practically zero cost, the total_mapcount information, which is needed to know if we can still relocate the page anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we can reuse the page no matter whether a pte or a pmd_trans_huge triggered the fault, but we can only relocate the page anon_vma to the local vma->anon_vma if we're sure it's only this "vma" mapping the whole THP physical range.

Kirill A. Shutemov discovered the problem with moving the page anon_vma to the local vma->anon_vma in a previous version of this patch, and another problem in the way page_move_anon_rmap() was called.

Andrew Morton discovered that CONFIG_SWAP=n wouldn't build in a previous version, because reuse_swap_page must be a macro to call page_trans_huge_mapcount from swap.h, so this uses a macro again instead of an inline function. With this change it is at least a less dangerous usage than before, because "page" is now evaluated only once, while with the previous code reuse_swap_page(page++) would have called page_mapcount on page+1 and would have incremented page twice instead of once.

Dean Luick noticed an uninitialized variable in a previous version that could result in an rmap inefficiency for the non-THP case.
Mike Marciniszyn said:

: Our RDMA tests are seeing an issue with memory locking that bisects to
: commit 61f5d698cc97 ("mm: re-enable THP")
:
: The test program registers two rather large MRs (512M) and RDMA
: writes data to a passive peer using the first and RDMA reads it back
: into the second MR and compares that data. The sizes are chosen randomly
: between 0 and 1024 bytes.
:
: The test will get through a few (<= 4 iterations) and then gets a
: compare error.
:
: Tracing indicates the kernel logical addresses associated with the individual
: pages at registration ARE correct, the data in the "RDMA read response only"
: packets ARE correct.
:
: The "corruption" occurs when the packet crosses two pages that are not physically
: contiguous. The second page reads back as zero in the program.
:
: It looks like the user VA at the point of the compare error no longer points to
: the same physical address as was registered.
:
: This patch totally resolves the issue!

Link: http://lkml.kernel.org/r/1462547040-1737-2-git-send-email-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Tested-by: Josh Collier <josh.d.collier@intel.com>
Cc: Marc Haber <mh+linux-kernel@zugschlus.de>
Cc: <stable@vger.kernel.org> [4.5]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'mm/swapfile.c')
-rw-r--r-- mm/swapfile.c | 13
1 files changed, 7 insertions, 6 deletions
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 83874eced5bf..031713ab40ce 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -922,18 +922,19 @@ out:
  * to it. And as a side-effect, free up its swap: because the old content
  * on disk will never be read, and seeking back there to write new content
  * later would only waste time away from clustering.
+ *
+ * NOTE: total_mapcount should not be relied upon by the caller if
+ * reuse_swap_page() returns false, but it may be always overwritten
+ * (see the other implementation for CONFIG_SWAP=n).
  */
-int reuse_swap_page(struct page *page)
+bool reuse_swap_page(struct page *page, int *total_mapcount)
 {
 	int count;
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	if (unlikely(PageKsm(page)))
-		return 0;
-	/* The page is part of THP and cannot be reused */
-	if (PageTransCompound(page))
-		return 0;
-	count = page_mapcount(page);
+		return false;
+	count = page_trans_huge_mapcount(page, total_mapcount);
 	if (count <= 1 && PageSwapCache(page)) {
 		count += page_swapcount(page);
 		if (count == 1 && !PageWriteback(page)) {