author		Christoph Lameter <clameter@sgi.com>	2006-02-01 06:05:38 -0500
committer	Linus Torvalds <torvalds@g5.osdl.org>	2006-02-01 11:53:16 -0500
commit		a48d07afdf18212de22b959715b16793c5a6e57a (patch)
tree		36d5963c29ceb5c2f6df53036cef5c0d30383dbf
parent		b16664e44c54525be89dc07ad15a13b4eeec5634 (diff)
[PATCH] Direct Migration V9: migrate_pages() extension
Add direct migration support with fall back to swap.

Direct migration support on top of the swap based page migration facility.

This allows the direct migration of anonymous pages and the migration of
file backed pages by dropping the associated buffers (requires writeout).

Fall back to swap out if necessary.

The patch is based on lots of patches from the hotplug project but the code
was restructured, documented and simplified as much as possible.

Note that an additional patch that defines the migrate_page() method for
filesystems is necessary in order to avoid writeback for anonymous and file
backed pages.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-rw-r--r--	Documentation/vm/page_migration	| 129
-rw-r--r--	include/linux/rmap.h		|   4
-rw-r--r--	include/linux/swap.h		|   2
-rw-r--r--	mm/rmap.c			|  21
-rw-r--r--	mm/vmscan.c			| 226
5 files changed, 360 insertions(+), 22 deletions(-)
diff --git a/Documentation/vm/page_migration b/Documentation/vm/page_migration
new file mode 100644
index 000000000000..c52820fcf500
--- /dev/null
+++ b/Documentation/vm/page_migration
@@ -0,0 +1,129 @@
Page migration
--------------

Page migration allows moving the physical location of pages between
nodes in a NUMA system while the process is running. This means that the
virtual addresses that the process sees do not change. However, the
system rearranges the physical location of those pages.

The main intent of page migration is to reduce the latency of memory access
by moving pages near to the processor where the process accessing that memory
is running.

Page migration allows a process to manually relocate the node on which its
pages are located through the MF_MOVE and MF_MOVE_ALL options while setting
a new memory policy. The pages of a process can also be relocated
from another process using the sys_migrate_pages() function call. The
migrate_pages() system call takes two sets of nodes and moves the pages of a
process that are located on the from nodes to the destination nodes.

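As an illustration, here is a minimal userspace sketch of moving another
process's pages from node 0 to node 1 through sys_migrate_pages(). The
syscall has no dedicated glibc wrapper, so syscall(2) is used directly; the
node numbers and the single-word nodemasks are assumptions for the example,
and a kernel built with CONFIG_MIGRATION is required (libnuma's
numa_migrate_pages() wraps the same call):

	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		/* PID to operate on; defaults to the current process */
		pid_t pid = argc > 1 ? atoi(argv[1]) : getpid();

		/* One-word nodemasks: move everything from node 0 to node 1 */
		unsigned long old_nodes = 1UL << 0;
		unsigned long new_nodes = 1UL << 1;
		unsigned long maxnode = sizeof(old_nodes) * 8;

		/* On success, returns the number of pages that could
		   not be moved */
		long ret = syscall(SYS_migrate_pages, pid, maxnode,
				   &old_nodes, &new_nodes);
		if (ret < 0)
			perror("migrate_pages");
		else
			printf("%ld pages could not be moved\n", ret);
		return ret < 0 ? 1 : 0;
	}
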
Manual migration is very useful if, for example, the scheduler has relocated
a process to a processor on a distant node. A batch scheduler or an
administrator may detect the situation and move the pages of the process
nearer to the new processor. At some point in the future we may have
some mechanism in the scheduler that will automatically move the pages.

Larger installations usually partition the system using cpusets into
sections of nodes. Paul Jackson has equipped cpusets with the ability to
move pages when a task is moved to another cpuset. This allows automatic
control over the locality of a process. If a task is moved to a new cpuset
then all its pages are moved with it so that the performance of the
process does not sink dramatically (as is the case today).

All migration techniques preserve the relative location of pages within a
group of nodes, so that the memory allocation pattern a process has generated
survives migration. This is necessary in order to preserve the memory
latencies: processes will run with similar performance after migration.

Page migration occurs in several steps. First comes a high level
description for those trying to use migrate_pages(), and then
a low level description of how the details work.

A. Use of migrate_pages()
-------------------------

1. Remove pages from the LRU.

   Lists of pages to be migrated are generated by scanning over
   pages and moving them into lists. This is done by
   calling isolate_lru_page() or __isolate_lru_page().
   Calling isolate_lru_page() increases the reference count of the page
   so that it cannot vanish under us.

2. Generate a list of newly allocated pages to move the contents
   of the first list to.

3. The migrate_pages() function is called, which attempts
   to do the migration. It returns the moved pages in the
   list specified as the third parameter and the failed
   migrations in the fourth parameter. The first parameter
   will contain the pages that could still be retried.

4. The leftover pages of various types are returned
   to the LRU using putback_lru_pages() or otherwise
   disposed of. The pages will still have the refcount as
   increased by isolate_lru_page()! A condensed sketch of the
   whole sequence follows this list.

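Putting the four steps together, a condensed kernel-side sketch (this is
not a verbatim caller from the patch series; the target_node choice, the
scan that finds candidate pages, and most error handling are omitted or
assumed):

	LIST_HEAD(pagelist);	/* pages isolated in step 1 */
	LIST_HEAD(newlist);	/* target pages allocated in step 2 */
	LIST_HEAD(moved);
	LIST_HEAD(failed);
	struct page *newpage;
	int nr_left;

	/* 1. Take each candidate page off the LRU; this also takes
	 *    a reference on the page. */
	if (isolate_lru_page(page))
		list_add_tail(&page->lru, &pagelist);

	/* 2. Allocate one new page per candidate on the target node. */
	newpage = alloc_pages_node(target_node, GFP_HIGHUSER, 0);
	if (newpage)
		list_add_tail(&newpage->lru, &newlist);

	/* 3. Migrate; pages that may still be retried remain on pagelist. */
	nr_left = migrate_pages(&pagelist, &newlist, &moved, &failed);

	/* 4. Put the survivors back on the LRU, dropping the references
	 *    taken in step 1. */
	putback_lru_pages(&pagelist);
	putback_lru_pages(&failed);
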
B. Operation of migrate_pages()
--------------------------------

migrate_pages() does several passes over its list of pages. A page is moved
if all references to it are removable at the time. The critical section,
steps 6 through 13, is sketched after the step list.

Steps:

1. Lock the page to be migrated.

2. Ensure that writeback is complete.

3. Make sure that the page has an assigned swap cache entry if
   it is an anonymous page. The swap cache reference is necessary
   to preserve the information contained in the page table maps.

4. Prep the new page that we want to move to. It is locked
   and set to not being uptodate so that all accesses to the new
   page immediately block while we are moving references.

5. All the page table references to the page are either dropped (file
   backed pages) or converted to swap references (anonymous pages). This
   should decrease the reference count.

6. The radix tree lock is taken.

7. The refcount of the page is examined and we back out if references
   remain; otherwise we know that we are the only one referencing this page.

8. The radix tree is checked and if it does not contain the pointer to this
   page then we back out.

9. The mapping is checked. If the mapping is gone then a truncate action may
   be in progress and we back out.

10. The new page is prepped with some settings from the old page so that
    accesses to the new page will be discovered to have the correct settings.

11. The radix tree is changed to point to the new page.

12. The reference count of the old page is dropped because the radix tree
    reference to it has been removed.

13. The radix tree lock is dropped.

14. The page contents are copied to the new page.

15. The remaining page flags are copied to the new page.

16. The old page flags are cleared to indicate that the page no longer
    carries any information.

17. Queued up writeback on the new page is triggered.

18. If swap ptes were generated for the page then they are removed again.

19. The locks are dropped from the old and new page.

20. The new page is moved to the LRU.

Christoph Lameter, December 19, 2005.

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 9d6fbeef2104..0f1ea2d6ed86 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -91,7 +91,7 @@ static inline void page_dup_rmap(struct page *page)
  * Called from mm/vmscan.c to handle paging out
  */
 int page_referenced(struct page *, int is_locked);
-int try_to_unmap(struct page *);
+int try_to_unmap(struct page *, int ignore_refs);
 
 /*
  * Called from mm/filemap_xip.c to unmap empty zero page
@@ -111,7 +111,7 @@ unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
 #define anon_vma_link(vma)	do {} while (0)
 
 #define page_referenced(page,l) TestClearPageReferenced(page)
-#define try_to_unmap(page)	SWAP_FAIL
+#define try_to_unmap(page, refs) SWAP_FAIL
 
 #endif /* CONFIG_MMU */
 
diff --git a/include/linux/swap.h b/include/linux/swap.h
index e53fef7051e6..d359fc022433 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -191,6 +191,8 @@ static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
 #ifdef CONFIG_MIGRATION
 extern int isolate_lru_page(struct page *p);
 extern int putback_lru_pages(struct list_head *l);
+extern int migrate_page(struct page *, struct page *);
+extern void migrate_page_copy(struct page *, struct page *);
 extern int migrate_pages(struct list_head *l, struct list_head *t,
 		struct list_head *moved, struct list_head *failed);
 #else
diff --git a/mm/rmap.c b/mm/rmap.c
index d85a99d28c03..13fad5fcdf79 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -52,6 +52,7 @@
 #include <linux/init.h>
 #include <linux/rmap.h>
 #include <linux/rcupdate.h>
+#include <linux/module.h>
 
 #include <asm/tlbflush.h>
 
@@ -541,7 +542,8 @@ void page_remove_rmap(struct page *page)
  * Subfunctions of try_to_unmap: try_to_unmap_one called
  * repeatedly from either try_to_unmap_anon or try_to_unmap_file.
  */
-static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma)
+static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
+				int ignore_refs)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
@@ -564,7 +566,8 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma)
 	 * skipped over this mm) then we should reactivate it.
 	 */
 	if ((vma->vm_flags & VM_LOCKED) ||
-			ptep_clear_flush_young(vma, address, pte)) {
+			(ptep_clear_flush_young(vma, address, pte)
+				&& !ignore_refs)) {
 		ret = SWAP_FAIL;
 		goto out_unmap;
 	}
@@ -698,7 +701,7 @@ static void try_to_unmap_cluster(unsigned long cursor,
 	pte_unmap_unlock(pte - 1, ptl);
 }
 
-static int try_to_unmap_anon(struct page *page)
+static int try_to_unmap_anon(struct page *page, int ignore_refs)
 {
 	struct anon_vma *anon_vma;
 	struct vm_area_struct *vma;
@@ -709,7 +712,7 @@ static int try_to_unmap_anon(struct page *page)
 		return ret;
 
 	list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
-		ret = try_to_unmap_one(page, vma);
+		ret = try_to_unmap_one(page, vma, ignore_refs);
 		if (ret == SWAP_FAIL || !page_mapped(page))
 			break;
 	}
@@ -726,7 +729,7 @@ static int try_to_unmap_anon(struct page *page)
  *
  * This function is only called from try_to_unmap for object-based pages.
  */
-static int try_to_unmap_file(struct page *page)
+static int try_to_unmap_file(struct page *page, int ignore_refs)
 {
 	struct address_space *mapping = page->mapping;
 	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
@@ -740,7 +743,7 @@ static int try_to_unmap_file(struct page *page)
 
 	spin_lock(&mapping->i_mmap_lock);
 	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
-		ret = try_to_unmap_one(page, vma);
+		ret = try_to_unmap_one(page, vma, ignore_refs);
 		if (ret == SWAP_FAIL || !page_mapped(page))
 			goto out;
 	}
@@ -825,16 +828,16 @@ out:
  * SWAP_AGAIN	- we missed a mapping, try again later
  * SWAP_FAIL	- the page is unswappable
  */
-int try_to_unmap(struct page *page)
+int try_to_unmap(struct page *page, int ignore_refs)
 {
 	int ret;
 
 	BUG_ON(!PageLocked(page));
 
 	if (PageAnon(page))
-		ret = try_to_unmap_anon(page);
+		ret = try_to_unmap_anon(page, ignore_refs);
 	else
-		ret = try_to_unmap_file(page);
+		ret = try_to_unmap_file(page, ignore_refs);
 
 	if (!page_mapped(page))
 		ret = SWAP_SUCCESS;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index aa4b80dbe3ad..8f326ce2b690 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -483,7 +483,7 @@ static int shrink_list(struct list_head *page_list, struct scan_control *sc)
 		if (!sc->may_swap)
 			goto keep_locked;
 
-		switch (try_to_unmap(page)) {
+		switch (try_to_unmap(page, 0)) {
 		case SWAP_FAIL:
 			goto activate_locked;
 		case SWAP_AGAIN:
@@ -623,7 +623,7 @@ static int swap_page(struct page *page)
 	struct address_space *mapping = page_mapping(page);
 
 	if (page_mapped(page) && mapping)
-		if (try_to_unmap(page) != SWAP_SUCCESS)
+		if (try_to_unmap(page, 0) != SWAP_SUCCESS)
 			goto unlock_retry;
 
 	if (PageDirty(page)) {
@@ -659,6 +659,154 @@ unlock_retry:
 retry:
 	return -EAGAIN;
 }
+
+/*
+ * Page migration was first developed in the context of the memory hotplug
+ * project. The main authors of the migration code are:
+ *
+ * IWAMOTO Toshihiro <iwamoto@valinux.co.jp>
+ * Hirokazu Takahashi <taka@valinux.co.jp>
+ * Dave Hansen <haveblue@us.ibm.com>
+ * Christoph Lameter <clameter@sgi.com>
+ */
+
+/*
+ * Remove references for a page and establish the new page with the correct
+ * basic settings to be able to stop accesses to the page.
+ */
+static int migrate_page_remove_references(struct page *newpage,
+				struct page *page, int nr_refs)
+{
+	struct address_space *mapping = page_mapping(page);
+	struct page **radix_pointer;
+
+	/*
+	 * Avoid doing any of the following work if the page count
+	 * indicates that the page is in use or truncate has removed
+	 * the page.
+	 */
+	if (!mapping || page_mapcount(page) + nr_refs != page_count(page))
+		return 1;
+
+	/*
+	 * Establish swap ptes for anonymous pages or destroy pte
+	 * maps for files.
+	 *
+	 * In order to reestablish file backed mappings the fault handlers
+	 * will take the radix tree_lock which may then be used to stop
+	 * processes from accessing this page until the new page is ready.
+	 *
+	 * A process accessing via a swap pte (an anonymous page) will take a
+	 * page_lock on the old page which will block the process until the
+	 * migration attempt is complete. At that time the PageSwapCache bit
+	 * will be examined. If the page was migrated then the PageSwapCache
+	 * bit will be clear and the operation to retrieve the page will be
+	 * retried which will find the new page in the radix tree. Then a new
+	 * direct mapping may be generated based on the radix tree contents.
+	 *
+	 * If the page was not migrated then the PageSwapCache bit
+	 * is still set and the operation may continue.
+	 */
+	try_to_unmap(page, 1);
+
+	/*
+	 * Give up if we were unable to remove all mappings.
+	 */
+	if (page_mapcount(page))
+		return 1;
+
+	write_lock_irq(&mapping->tree_lock);
+
+	radix_pointer = (struct page **)radix_tree_lookup_slot(
+						&mapping->page_tree,
+						page_index(page));
+
+	if (!page_mapping(page) || page_count(page) != nr_refs ||
+			*radix_pointer != page) {
+		write_unlock_irq(&mapping->tree_lock);
+		return 1;
+	}
+
+	/*
+	 * Now we know that no one else is looking at the page.
+	 *
+	 * Certain minimal information about a page must be available
+	 * in order for other subsystems to properly handle the page if they
+	 * find it through the radix tree update before we are finished
+	 * copying the page.
+	 */
+	get_page(newpage);
+	newpage->index = page->index;
+	newpage->mapping = page->mapping;
+	if (PageSwapCache(page)) {
+		SetPageSwapCache(newpage);
+		set_page_private(newpage, page_private(page));
+	}
+
+	*radix_pointer = newpage;
+	__put_page(page);
+	write_unlock_irq(&mapping->tree_lock);
+
+	return 0;
+}
+
+/*
+ * Copy the page to its new location
+ */
+void migrate_page_copy(struct page *newpage, struct page *page)
+{
+	copy_highpage(newpage, page);
+
+	if (PageError(page))
+		SetPageError(newpage);
+	if (PageReferenced(page))
+		SetPageReferenced(newpage);
+	if (PageUptodate(page))
+		SetPageUptodate(newpage);
+	if (PageActive(page))
+		SetPageActive(newpage);
+	if (PageChecked(page))
+		SetPageChecked(newpage);
+	if (PageMappedToDisk(page))
+		SetPageMappedToDisk(newpage);
+
+	if (PageDirty(page)) {
+		clear_page_dirty_for_io(page);
+		set_page_dirty(newpage);
+	}
+
+	ClearPageSwapCache(page);
+	ClearPageActive(page);
+	ClearPagePrivate(page);
+	set_page_private(page, 0);
+	page->mapping = NULL;
+
+	/*
+	 * If any waiters have accumulated on the new page then
+	 * wake them up.
+	 */
+	if (PageWriteback(newpage))
+		end_page_writeback(newpage);
+}
+
+/*
+ * Common logic to directly migrate a single page suitable for
+ * pages that do not use PagePrivate.
+ *
+ * Pages are locked upon entry and exit.
+ */
+int migrate_page(struct page *newpage, struct page *page)
+{
+	BUG_ON(PageWriteback(page));	/* Writeback must be complete */
+
+	if (migrate_page_remove_references(newpage, page, 2))
+		return -EAGAIN;
+
+	migrate_page_copy(newpage, page);
+
+	return 0;
+}
+
 /*
  * migrate_pages
  *
@@ -672,11 +820,6 @@ retry:
  * are movable anymore because t has become empty
  * or no retryable pages exist anymore.
  *
- * SIMPLIFIED VERSION: This implementation of migrate_pages
- * is only swapping out pages and never touches the second
- * list. The direct migration patchset
- * extends this function to avoid the use of swap.
- *
  * Return: Number of pages not migrated when "to" ran empty.
  */
 int migrate_pages(struct list_head *from, struct list_head *to,
@@ -697,6 +840,9 @@ redo:
 	retry = 0;
 
 	list_for_each_entry_safe(page, page2, from, lru) {
+		struct page *newpage = NULL;
+		struct address_space *mapping;
+
 		cond_resched();
 
 		rc = 0;
@@ -704,6 +850,9 @@ redo:
 			/* page was freed from under us. So we are done. */
 			goto next;
 
+		if (to && list_empty(to))
+			break;
+
 		/*
 		 * Skip locked pages during the first two passes to give the
 		 * functions holding the lock time to release the page. Later we
@@ -740,12 +889,64 @@ redo:
 			}
 		}
 
+		if (!to) {
+			rc = swap_page(page);
+			goto next;
+		}
+
+		newpage = lru_to_page(to);
+		lock_page(newpage);
+
 		/*
-		 * Page is properly locked and writeback is complete.
+		 * Pages are properly locked and writeback is complete.
 		 * Try to migrate the page.
 		 */
-		rc = swap_page(page);
-		goto next;
+		mapping = page_mapping(page);
+		if (!mapping)
+			goto unlock_both;
+
+		/*
+		 * Trigger writeout if page is dirty
+		 */
+		if (PageDirty(page)) {
+			switch (pageout(page, mapping)) {
+			case PAGE_KEEP:
+			case PAGE_ACTIVATE:
+				goto unlock_both;
+
+			case PAGE_SUCCESS:
+				unlock_page(newpage);
+				goto next;
+
+			case PAGE_CLEAN:
+				; /* try to migrate the page below */
+			}
+		}
+
+		/*
+		 * If we have no buffer or can release the buffer
+		 * then do a simple migration.
+		 */
+		if (!page_has_buffers(page) ||
+				try_to_release_page(page, GFP_KERNEL)) {
+			rc = migrate_page(newpage, page);
+			goto unlock_both;
+		}
+
+		/*
+		 * On early passes with mapped pages simply
+		 * retry. There may be a lock held for some
+		 * buffers that may go away. Later
+		 * swap them out.
+		 */
+		if (pass > 4) {
+			unlock_page(newpage);
+			newpage = NULL;
+			rc = swap_page(page);
+			goto next;
+		}
+
+unlock_both:
+		unlock_page(newpage);
 
 unlock_page:
 		unlock_page(page);
@@ -758,7 +959,10 @@ next:
 			list_move(&page->lru, failed);
 			nr_failed++;
 		} else {
-			/* Success */
+			if (newpage) {
+				/* Successful migration. Return page to LRU */
+				move_to_lru(newpage);
+			}
 			list_move(&page->lru, moved);
 		}
 	}