author Nick Piggin <npiggin@suse.de> 2008-02-05 01:29:34 -0500
committer Linus Torvalds <torvalds@woody.linux-foundation.org> 2008-02-05 12:44:19 -0500
commit 0ed361dec36945f3116ee1338638ada9a8920905
tree 3e0fc6319ef49f6cac82e8203a8aa199302ab9c5 /mm/hugetlb.c
parent 62e1c55300f306e06478f460a7eefba085206e0b
mm: fix PageUptodate data race
After running SetPageUptodate, preceding stores to the page contents to actually bring it uptodate may not be ordered with the store to set the page uptodate.

Therefore, another CPU which checks that PageUptodate is true and then reads the page contents can get stale data.

Fix this by having an smp_wmb before SetPageUptodate, and an smp_rmb after PageUptodate.

Many places that test PageUptodate do so with the page locked, and this would be enough to ensure memory ordering in those places if SetPageUptodate were only called while the page is locked. Unfortunately that is not always the case for some filesystems, but it could be an idea for the future.

Also bring the handling of anonymous page uptodateness in line with that of file-backed page management, by marking anon pages as uptodate when they _are_ uptodate, rather than when our implementation requires that they be marked as such. Doing so allows us to get rid of the smp_wmb's in the page copying functions, which were added specifically for anonymous pages for an analogous memory ordering problem. Both file and anonymous pages are now handled with the same barriers.

FAQ:

Q. Why not do this in flush_dcache_page?
A. Firstly, flush_dcache_page handles only one side (the smp_wmb side) of the ordering protocol; we'd still need smp_rmb somewhere. Secondly, hiding away memory barriers in a completely unrelated function is nasty; at least in the PageUptodate macros, they are located together with (half) the operations involved in the ordering. Thirdly, the smp_wmb is only required when first bringing the page uptodate, whereas flush_dcache_page should be called each time it is written to through the kernel mapping. It is logically the wrong place to put it.

Q. Why does this increase my text size / reduce my performance / etc.?
A. Because it is adding the necessary instructions to eliminate the data race.

Q. Can it be improved?
A. Yes, e.g. if you were to create a rule that all SetPageUptodate operations run under the page lock, we could avoid the smp_rmb in places where PageUptodate is queried under the page lock. This requires an audit of all filesystems, and at least some would need reworking. It's great that you're interested; I'm eagerly awaiting your patches.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
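
The ordering protocol described in the message above can be illustrated with a small userspace sketch. This is not the kernel code from this commit: it assumes C11 atomics, and struct demo_page and the demo_* helpers are hypothetical names invented for the example. The release and acquire fences stand in for smp_wmb and smp_rmb respectively; they are somewhat stronger than the kernel barriers, so treat this as an approximation of the pairing rather than an exact equivalent.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_PAGE_SIZE 64

/* Hypothetical stand-in for a page: an "uptodate" flag plus the contents. */
struct demo_page {
	atomic_bool uptodate;
	char data[DEMO_PAGE_SIZE];
};

/*
 * Writer side: called after the contents have been written.
 * The release fence plays the role of smp_wmb(): the earlier stores to
 * page->data are ordered before the store that sets the flag.
 */
static void demo_set_page_uptodate(struct demo_page *page)
{
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&page->uptodate, true, memory_order_relaxed);
}

/*
 * Reader side: test the flag, and only if it is set issue the fence.
 * The acquire fence plays the role of smp_rmb(): later loads of
 * page->data cannot be satisfied before the load of the flag.
 */
static bool demo_page_uptodate(struct demo_page *page)
{
	bool ret = atomic_load_explicit(&page->uptodate, memory_order_relaxed);

	if (ret)
		atomic_thread_fence(memory_order_acquire);
	return ret;
}

/* Bring a page uptodate: fill the contents first, then publish the flag. */
static void demo_fill_page(struct demo_page *page, char byte)
{
	memset(page->data, byte, sizeof(page->data));
	demo_set_page_uptodate(page);
}
```

A reader thread that sees demo_page_uptodate() return true may then read page->data and is guaranteed to observe the bytes written by demo_fill_page(). Without the two fences, a weakly ordered CPU could make the flag visible before the contents, which is the stale-data case the commit message describes.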
Diffstat (limited to 'mm/hugetlb.c')
-rw-r--r-- mm/hugetlb.c 2
1 files changed, 2 insertions, 0 deletions
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index db861d8b6c2..1a5642074e3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -813,6 +813,7 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	spin_unlock(&mm->page_table_lock);
 	copy_huge_page(new_page, old_page, address, vma);
+	__SetPageUptodate(new_page);
 	spin_lock(&mm->page_table_lock);
 
 	ptep = huge_pte_offset(mm, address & HPAGE_MASK);
@@ -858,6 +859,7 @@ retry:
 			goto out;
 		}
 		clear_huge_page(page, address);
+		__SetPageUptodate(page);
 
 		if (vma->vm_flags & VM_SHARED) {
 			int err;