author     Andrea Arcangeli <andrea@qumranet.com>  2008-07-28 18:46:29 -0400
committer  Linus Torvalds <torvalds@linux-foundation.org>  2008-07-28 19:30:21 -0400
commit     cddb8a5c14aa89810b40495d94d3d2a0faee6619 (patch)
tree       d0b47b071f7d2dd1d6f9c36084aa8cfcef90d1da /include/linux/mmu_notifier.h
parent     7906d00cd1f687268f0a3599442d113767795ae6 (diff)
mmu-notifiers: core
With KVM/GRU/XPMEM there isn't just the primary CPU MMU pointing to pages. There are secondary MMUs (with secondary sptes and secondary tlbs) too. sptes in the kvm case are shadow pagetables, but when I say spte in mmu-notifier context, I mean "secondary pte". In the GRU case there's no actual secondary pte and there's only a secondary tlb, because the GRU secondary MMU has no knowledge of sptes and every secondary tlb miss event in the MMU always generates a page fault that has to be resolved by the CPU (this is not the case for KVM, where a secondary tlb miss will walk sptes in hardware and refill the secondary tlb transparently to software if the corresponding spte is present).

The same way zap_page_range has to invalidate the pte before freeing the page, the spte (and secondary tlb) must also be invalidated before any page is freed and reused.

Currently we take a page_count pin on every page mapped by sptes, but that means the pages can't be swapped while they're mapped by any spte, because they're part of the guest working set. Furthermore a spte unmap event can immediately lead to a page being freed when the pin is released (so requiring the same complex and relatively slow tlb_gather smp-safe logic we have in zap_page_range, which can be avoided completely if the spte unmap event doesn't require an unpin of the page previously mapped in the secondary MMU).

The mmu notifiers allow kvm/GRU/XPMEM to attach to the tsk->mm and know when the VM is swapping or freeing or doing anything on the primary MMU, so that the secondary MMU code can drop sptes before the pages are freed, avoiding all page pinning and allowing 100% reliable swapping of guest physical address space. Furthermore it spares the code that tears down the secondary MMU mappings from implementing tlb_gather-like logic as in zap_page_range, which would require many IPIs to flush other cpu tlbs for each fixed number of sptes unmapped.

To make an example: if what happens on the primary MMU is a protection downgrade (from writeable to wrprotect), the secondary MMU mappings will be invalidated, and the next secondary-mmu page fault will call get_user_pages; if it calls get_user_pages with write=1 it will trigger a do_wp_page and re-establish an updated spte or secondary-tlb mapping on the copied page. Or it will set up a readonly spte or readonly tlb mapping if it's a guest read, i.e. if it calls get_user_pages with write=0. This is just an example.

This allows mapping any page pointed to by any pte (and in turn visible in the primary CPU MMU) into a secondary MMU (be it a pure tlb like GRU, or a full MMU with both sptes and a secondary tlb like the shadow-pagetable layer in kvm), or a remote DMA in software like XPMEM (hence the need to schedule in XPMEM code to send the invalidate to the remote node, while there is no need to schedule in kvm/GRU as it's an immediate event, like invalidating a primary-mmu pte).

At least for KVM, without this patch it's impossible to swap guests reliably. And having this feature and removing the page pin allows several other optimizations that simplify life considerably.

Dependencies:

1) mm_take_all_locks() to register the mmu notifier when the whole VM isn't doing anything with "mm". This allows mmu notifier users to keep track of whether the VM is in the middle of the invalidate_range_begin/end critical section with an atomic counter increased in range_begin and decreased in range_end.
No secondary MMU page fault is allowed to map any spte or secondary tlb reference while the VM is in the middle of range_begin/end, as any page returned by get_user_pages in that critical section could later be freed immediately without any further ->invalidate_page notification (invalidate_range_begin/end works on ranges and ->invalidate_page isn't called immediately before freeing the page). To stop all page freeing and pagetable overwrites, the mmap_sem must be taken in write mode and all other anon_vma/i_mmap locks must be taken too.

2) It'd be a waste to add branches in the VM if nobody could possibly run KVM/GRU/XPMEM on the kernel, so mmu notifiers will only be enabled if CONFIG_KVM=m/y. In the current kernel kvm won't yet take advantage of mmu notifiers, but this already allows compiling a KVM external module against a kernel with mmu notifiers enabled, and from the next pull from kvm.git we'll start using them. And GRU/XPMEM will also be able to continue development by enabling KVM=m in their config, until they submit all GRU/XPMEM GPLv2 code to the mainline kernel. Then they can also enable MMU_NOTIFIER the same way KVM does it (even if KVM=n). This guarantees nobody selects MMU_NOTIFIER=y if KVM and GRU and XPMEM are all =n.

The mmu_notifier_register call can fail because mm_take_all_locks may be interrupted by a signal and return -EINTR. Because mmu_notifier_register is used during driver startup, a failure can be gracefully handled. Here is an example of the change applied to kvm to register the mmu notifiers. Usually when a driver starts up, other allocations are required anyway and -ENOMEM failure paths exist already.

 struct kvm *kvm_arch_create_vm(void)
 {
        struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
+       int err;

        if (!kvm)
                return ERR_PTR(-ENOMEM);

        INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
+       kvm->arch.mmu_notifier.ops = &kvm_mmu_notifier_ops;
+       err = mmu_notifier_register(&kvm->arch.mmu_notifier, current->mm);
+       if (err) {
+               kfree(kvm);
+               return ERR_PTR(err);
+       }
+
        return kvm;
 }

mmu_notifier_unregister returns void and it's reliable.

The patch also adds a few needed but missing includes that would otherwise prevent the kernel from compiling after these changes on non-x86 archs (x86 didn't need them by luck).

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix mm/filemap_xip.c build]
[akpm@linux-foundation.org: fix mm/mmu_notifier.c build]
Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Kanoj Sarcar <kanojsarcar@yahoo.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Chris Wright <chrisw@redhat.com>
Cc: Marcelo Tosatti <marcelo@kvack.org>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Izik Eidus <izike@qumranet.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
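To make dependency 1 above concrete: a minimal sketch (not part of this patch) of how a secondary-MMU driver could pair the notifier callbacks with an atomic counter so its own fault path never instantiates a spte while an invalidate_range_start/end critical section is running. The my_mmu structure and the my_mmu_drop_range/my_mmu_drop_all/my_mmu_make_spte helpers are hypothetical; only the mmu_notifier API from the header below is real, and a real driver additionally needs its own locking against the fault path (kvm uses a sequence count for that), which is omitted here.

#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>

struct my_mmu {
        struct mmu_notifier mn;
        atomic_t invalidate_count;      /* > 0 while inside range_start/end */
};

/* hypothetical secondary-MMU teardown helpers, provided by the driver */
static void my_mmu_drop_range(struct my_mmu *m, unsigned long start,
                              unsigned long end);
static void my_mmu_drop_all(struct my_mmu *m);
static int my_mmu_make_spte(struct my_mmu *m, unsigned long address);

static void my_mmu_invalidate_range_start(struct mmu_notifier *mn,
                                          struct mm_struct *mm,
                                          unsigned long start, unsigned long end)
{
        struct my_mmu *m = container_of(mn, struct my_mmu, mn);

        atomic_inc(&m->invalidate_count);
        my_mmu_drop_range(m, start, end); /* zap sptes/secondary tlb for [start, end) */
}

static void my_mmu_invalidate_range_end(struct mmu_notifier *mn,
                                        struct mm_struct *mm,
                                        unsigned long start, unsigned long end)
{
        struct my_mmu *m = container_of(mn, struct my_mmu, mn);

        atomic_dec(&m->invalidate_count);
}

static void my_mmu_release(struct mmu_notifier *mn, struct mm_struct *mm)
{
        /* freeze the secondary mmu: no further sptes, no further writes */
        my_mmu_drop_all(container_of(mn, struct my_mmu, mn));
}

static const struct mmu_notifier_ops my_mmu_notifier_ops = {
        .release                = my_mmu_release,
        .invalidate_range_start = my_mmu_invalidate_range_start,
        .invalidate_range_end   = my_mmu_invalidate_range_end,
};

/* secondary-MMU fault path: refuse to map while an invalidate is in flight */
static int my_mmu_fault(struct my_mmu *m, unsigned long address)
{
        if (atomic_read(&m->invalidate_count))
                return -EAGAIN;         /* caller retries after range_end */
        return my_mmu_make_spte(m, address);
}

static int my_mmu_attach(struct my_mmu *m)
{
        m->mn.ops = &my_mmu_notifier_ops;
        atomic_set(&m->invalidate_count, 0);
        /* can fail, e.g. -EINTR if mm_take_all_locks is interrupted */
        return mmu_notifier_register(&m->mn, current->mm);
}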
Diffstat (limited to 'include/linux/mmu_notifier.h')
-rw-r--r--  include/linux/mmu_notifier.h  279
1 files changed, 279 insertions, 0 deletions
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
new file mode 100644
index 000000000000..b77486d152cd
--- /dev/null
+++ b/include/linux/mmu_notifier.h
@@ -0,0 +1,279 @@
#ifndef _LINUX_MMU_NOTIFIER_H
#define _LINUX_MMU_NOTIFIER_H

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/mm_types.h>

struct mmu_notifier;
struct mmu_notifier_ops;

#ifdef CONFIG_MMU_NOTIFIER

/*
 * The mmu_notifier_mm structure is allocated and installed in
 * mm->mmu_notifier_mm inside the mm_take_all_locks() protected
 * critical section and it's released only when mm_count reaches zero
 * in mmdrop().
 */
struct mmu_notifier_mm {
        /* all mmu notifiers registered in this mm are queued in this list */
        struct hlist_head list;
        /* to serialize the list modifications and hlist_unhashed */
        spinlock_t lock;
};

struct mmu_notifier_ops {
        /*
         * Called either by mmu_notifier_unregister or when the mm is
         * being destroyed by exit_mmap, always before all pages are
         * freed. This can run concurrently with other mmu notifier
         * methods (the ones invoked outside the mm context) and it
         * should tear down all secondary mmu mappings and freeze the
         * secondary mmu. If this method isn't implemented you have to
         * be sure that nothing could possibly write to the pages
         * through the secondary mmu by the time the last thread with
         * tsk->mm == mm exits.
         *
         * As a side note: the pages freed after ->release returns could
         * be immediately reallocated by the gart at an alias physical
         * address with a different cache model, so if ->release isn't
         * implemented because all _software_ driven memory accesses
         * through the secondary mmu are terminated by the time the
         * last thread of this mm quits, you also have to be sure that
         * speculative _hardware_ operations can't allocate dirty
         * cachelines in the cpu that could not be snooped and made
         * coherent with the other read and write operations happening
         * through the gart alias address, which would lead to memory
         * corruption.
         */
        void (*release)(struct mmu_notifier *mn,
                        struct mm_struct *mm);

        /*
         * clear_flush_young is called after the VM test-and-clears
         * the young/accessed bitflag in the pte. This way the VM
         * will provide proper aging to the accesses to the page
         * through the secondary MMUs and not only to the ones
         * through the Linux pte.
         */
        int (*clear_flush_young)(struct mmu_notifier *mn,
                                 struct mm_struct *mm,
                                 unsigned long address);

        /*
         * Before this is invoked any secondary MMU is still ok to
         * read/write to the page previously pointed to by the Linux
         * pte because the page hasn't been freed yet and it won't be
         * freed until this returns. If required set_page_dirty has to
         * be called internally to this method.
         */
        void (*invalidate_page)(struct mmu_notifier *mn,
                                struct mm_struct *mm,
                                unsigned long address);

        /*
         * invalidate_range_start() and invalidate_range_end() must be
         * paired and are called only when the mmap_sem and/or the
         * locks protecting the reverse maps are held. The subsystem
         * must guarantee that no additional references are taken to
         * the pages in the range established between the call to
         * invalidate_range_start() and the matching call to
         * invalidate_range_end().
         *
         * Invalidation of multiple concurrent ranges may be
         * optionally permitted by the driver. Either way the
         * establishment of sptes is forbidden in the range passed to
         * invalidate_range_start/end for the whole duration of the
         * invalidate_range_start/end critical section.
         *
         * invalidate_range_start() is called when all pages in the
         * range are still mapped and have at least a refcount of one.
         *
         * invalidate_range_end() is called when all pages in the
         * range have been unmapped and the pages have been freed by
         * the VM.
         *
         * The VM will remove the page table entries and potentially
         * the page between invalidate_range_start() and
         * invalidate_range_end(). If the page must not be freed
         * because of pending I/O or other circumstances then the
         * invalidate_range_start() callback (or the initial mapping
         * by the driver) must make sure that the refcount is kept
         * elevated.
         *
         * If the driver increases the refcount when the pages are
         * initially mapped into an address space then either
         * invalidate_range_start() or invalidate_range_end() may
         * decrease the refcount. If the refcount is decreased on
         * invalidate_range_start() then the VM can free pages as page
         * table entries are removed. If the refcount is only
         * dropped on invalidate_range_end() then the driver itself
         * will drop the last refcount but it must take care to flush
         * any secondary tlb before doing the final free on the
         * page. Pages will no longer be referenced by the linux
         * address space but may still be referenced by sptes until
         * the last refcount is dropped.
         */
        void (*invalidate_range_start)(struct mmu_notifier *mn,
                                       struct mm_struct *mm,
                                       unsigned long start, unsigned long end);
        void (*invalidate_range_end)(struct mmu_notifier *mn,
                                     struct mm_struct *mm,
                                     unsigned long start, unsigned long end);
};

/*
 * The notifier chains are protected by mmap_sem and/or the reverse map
 * semaphores. Notifier chains are only changed when all reverse maps and
 * the mmap_sem locks are taken.
 *
 * Therefore notifier chains can only be traversed when either
 *
 * 1. mmap_sem is held.
 * 2. One of the reverse map locks is held (i_mmap_lock or anon_vma->lock).
 * 3. No other concurrent thread can access the list (release)
 */
struct mmu_notifier {
        struct hlist_node hlist;
        const struct mmu_notifier_ops *ops;
};

static inline int mm_has_notifiers(struct mm_struct *mm)
{
        return unlikely(mm->mmu_notifier_mm);
}

extern int mmu_notifier_register(struct mmu_notifier *mn,
                                 struct mm_struct *mm);
extern int __mmu_notifier_register(struct mmu_notifier *mn,
                                   struct mm_struct *mm);
extern void mmu_notifier_unregister(struct mmu_notifier *mn,
                                    struct mm_struct *mm);
extern void __mmu_notifier_mm_destroy(struct mm_struct *mm);
extern void __mmu_notifier_release(struct mm_struct *mm);
extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
                                            unsigned long address);
extern void __mmu_notifier_invalidate_page(struct mm_struct *mm,
                                           unsigned long address);
extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
                                                  unsigned long start, unsigned long end);
extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
                                                unsigned long start, unsigned long end);

static inline void mmu_notifier_release(struct mm_struct *mm)
{
        if (mm_has_notifiers(mm))
                __mmu_notifier_release(mm);
}

static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm,
                                                 unsigned long address)
{
        if (mm_has_notifiers(mm))
                return __mmu_notifier_clear_flush_young(mm, address);
        return 0;
}

static inline void mmu_notifier_invalidate_page(struct mm_struct *mm,
                                                unsigned long address)
{
        if (mm_has_notifiers(mm))
                __mmu_notifier_invalidate_page(mm, address);
}

static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
                                                       unsigned long start, unsigned long end)
{
        if (mm_has_notifiers(mm))
                __mmu_notifier_invalidate_range_start(mm, start, end);
}

static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm,
                                                     unsigned long start, unsigned long end)
{
        if (mm_has_notifiers(mm))
                __mmu_notifier_invalidate_range_end(mm, start, end);
}

static inline void mmu_notifier_mm_init(struct mm_struct *mm)
{
        mm->mmu_notifier_mm = NULL;
}

static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
{
        if (mm_has_notifiers(mm))
                __mmu_notifier_mm_destroy(mm);
}

/*
 * These two macros will eventually replace ptep_clear_flush.
 * ptep_clear_flush is implemented as a macro itself, so this also is
 * implemented as a macro until ptep_clear_flush is converted to an
 * inline function, to diminish the risk of compilation failure. The
 * invalidate_page method over time can be moved outside the PT lock
 * and these two macros can be later removed.
 */
#define ptep_clear_flush_notify(__vma, __address, __ptep)              \
({                                                                      \
        pte_t __pte;                                                    \
        struct vm_area_struct *___vma = __vma;                          \
        unsigned long ___address = __address;                           \
        __pte = ptep_clear_flush(___vma, ___address, __ptep);           \
        mmu_notifier_invalidate_page(___vma->vm_mm, ___address);        \
        __pte;                                                          \
})

#define ptep_clear_flush_young_notify(__vma, __address, __ptep)        \
({                                                                      \
        int __young;                                                    \
        struct vm_area_struct *___vma = __vma;                          \
        unsigned long ___address = __address;                           \
        __young = ptep_clear_flush_young(___vma, ___address, __ptep);   \
        __young |= mmu_notifier_clear_flush_young(___vma->vm_mm,        \
                                                  ___address);          \
        __young;                                                        \
})

#else /* CONFIG_MMU_NOTIFIER */

static inline void mmu_notifier_release(struct mm_struct *mm)
{
}

static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm,
                                                 unsigned long address)
{
        return 0;
}

static inline void mmu_notifier_invalidate_page(struct mm_struct *mm,
                                                unsigned long address)
{
}

static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
                                                       unsigned long start, unsigned long end)
{
}

static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm,
                                                     unsigned long start, unsigned long end)
{
}

static inline void mmu_notifier_mm_init(struct mm_struct *mm)
{
}

static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
{
}

#define ptep_clear_flush_young_notify ptep_clear_flush_young
#define ptep_clear_flush_notify ptep_clear_flush

#endif /* CONFIG_MMU_NOTIFIER */

#endif /* _LINUX_MMU_NOTIFIER_H */
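
For reference, a usage sketch of the hooks above as seen from the primary-MMU side (illustrative only, not part of the patch): range operations are bracketed by the invalidate_range_start/end wrappers, and single-pte paths switch from ptep_clear_flush to ptep_clear_flush_notify so that ->invalidate_page is also delivered. do_unmap_range() is a hypothetical stand-in for the real pte-teardown code that the rest of the series instruments.

#include <linux/mm.h>
#include <linux/mmu_notifier.h>

/* hypothetical stand-in for the real unmap path (the zap_page_range family) */
static void do_unmap_range(struct mm_struct *mm, unsigned long start,
                           unsigned long end);

static void example_unmap(struct mm_struct *mm, unsigned long start,
                          unsigned long end)
{
        /* secondary MMUs must drop sptes/tlb entries for [start, end) first */
        mmu_notifier_invalidate_range_start(mm, start, end);

        do_unmap_range(mm, start, end);

        /* only after this may the pages in the range be considered freed */
        mmu_notifier_invalidate_range_end(mm, start, end);
}

static pte_t example_clear_pte(struct vm_area_struct *vma,
                               unsigned long address, pte_t *ptep)
{
        /* ptep_clear_flush() plus an ->invalidate_page notification */
        return ptep_clear_flush_notify(vma, address, ptep);
}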