aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux
diff options
context:
space:
mode:
authorMel Gorman <mel@csn.ul.ie>2008-07-24 00:27:25 -0400
committerLinus Torvalds <torvalds@linux-foundation.org>2008-07-24 13:47:16 -0400
commit04f2cbe35699d22dbf428373682ead85ca1240f5 (patch)
tree1987a2c704cc97d8adf603054c9d89d18b9b30e0 /include/linux
parenta1e78772d72b2616ed20e54896e68e0e7044854e (diff)
hugetlb: guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed
After patch 2 in this series, a process that successfully calls mmap() for a MAP_PRIVATE mapping will be guaranteed to successfully fault until a process calls fork(). At that point, the next write fault from the parent could fail due to COW if the child still has a reference. We only reserve pages for the parent but a copy must be made to avoid leaking data from the parent to the child after fork(). Reserves could be taken for both parent and child at fork time to guarantee faults but if the mapping is large it is highly likely we will not have sufficient pages for the reservation, and it is common to fork only to exec() immediatly after. A failure here would be very undesirable. Note that the current behaviour of mainline with MAP_PRIVATE pages is pretty bad. The following situation is allowed to occur today. 1. Process calls mmap(MAP_PRIVATE) 2. Process calls mlock() to fault all pages and makes sure it succeeds 3. Process forks() 4. Process writes to MAP_PRIVATE mapping while child still exists 5. If the COW fails at this point, the process gets SIGKILLed even though it had taken care to ensure the pages existed This patch improves the situation by guaranteeing the reliability of the process that successfully calls mmap(). When the parent performs COW, it will try to satisfy the allocation without using reserves. If that fails the parent will steal the page leaving any children without a page. Faults from the child after that point will result in failure. If the child COW happens first, an attempt will be made to allocate the page without reserves and the child will get SIGKILLed on failure. To summarise the new behaviour: 1. If the original mapper performs COW on a private mapping with multiple references, it will attempt to allocate a hugepage from the pool or the buddy allocator without using the existing reserves. On fail, VMAs mapping the same area are traversed and the page being COW'd is unmapped where found. It will then steal the original page as the last mapper in the normal way. 2. The VMAs the pages were unmapped from are flagged to note that pages with data no longer exist. Future no-page faults on those VMAs will terminate the process as otherwise it would appear that data was corrupted. A warning is printed to the console that this situation occured. 2. If the child performs COW first, it will attempt to satisfy the COW from the pool if there are enough pages or via the buddy allocator if overcommit is allowed and the buddy allocator can satisfy the request. If it fails, the child will be killed. If the pool is large enough, existing applications will not notice that the reserves were a factor. Existing applications depending on the no-reserves been set are unlikely to exist as for much of the history of hugetlbfs, pages were prefaulted at mmap(), allocating the pages at that point or failing the mmap(). [npiggin@suse.de: fix CONFIG_HUGETLB=n build] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'include/linux')
-rw-r--r--include/linux/hugetlb.h8
1 files changed, 5 insertions, 3 deletions
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 185b14c9f021..abbc187193a1 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -23,8 +23,10 @@ int hugetlb_overcommit_handler(struct ctl_table *, int, struct file *, void __us
23int hugetlb_treat_movable_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *); 23int hugetlb_treat_movable_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
24int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *); 24int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
25int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int, int); 25int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int, int);
26void unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long); 26void unmap_hugepage_range(struct vm_area_struct *,
27void __unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long); 27 unsigned long, unsigned long, struct page *);
28void __unmap_hugepage_range(struct vm_area_struct *,
29 unsigned long, unsigned long, struct page *);
28int hugetlb_prefault(struct address_space *, struct vm_area_struct *); 30int hugetlb_prefault(struct address_space *, struct vm_area_struct *);
29int hugetlb_report_meminfo(char *); 31int hugetlb_report_meminfo(char *);
30int hugetlb_report_node_meminfo(int, char *); 32int hugetlb_report_node_meminfo(int, char *);
@@ -74,7 +76,7 @@ static inline unsigned long hugetlb_total_pages(void)
74#define follow_huge_addr(mm, addr, write) ERR_PTR(-EINVAL) 76#define follow_huge_addr(mm, addr, write) ERR_PTR(-EINVAL)
75#define copy_hugetlb_page_range(src, dst, vma) ({ BUG(); 0; }) 77#define copy_hugetlb_page_range(src, dst, vma) ({ BUG(); 0; })
76#define hugetlb_prefault(mapping, vma) ({ BUG(); 0; }) 78#define hugetlb_prefault(mapping, vma) ({ BUG(); 0; })
77#define unmap_hugepage_range(vma, start, end) BUG() 79#define unmap_hugepage_range(vma, start, end, page) BUG()
78#define hugetlb_report_meminfo(buf) 0 80#define hugetlb_report_meminfo(buf) 0
79#define hugetlb_report_node_meminfo(n, buf) 0 81#define hugetlb_report_node_meminfo(n, buf) 0
80#define follow_huge_pmd(mm, addr, pmd, write) NULL 82#define follow_huge_pmd(mm, addr, pmd, write) NULL