author     Minchan Kim <minchan@kernel.org>                2016-07-26 18:23:05 -0400
committer  Linus Torvalds <torvalds@linux-foundation.org>  2016-07-26 19:19:19 -0400
commit     bda807d4445414e8e77da704f116bb0880fe0c76 (patch)
tree       4e3b462d23437d7521081758c2005ae0025978f7
parent     c6c919eb90e021fbcfcbfa9dd3d55930cdbb67f9 (diff)
mm: migrate: support non-lru movable page migration
Until now we have allowed migration only for LRU pages, and that was enough for
building high-order pages. Recently, however, embedded systems (e.g., webOS,
Android) use lots of non-movable pages (e.g., zram, GPU memory), so we have seen
several reports about trouble with small high-order allocations. There have been
several efforts to fix the problem (e.g., enhancing the compaction algorithm,
SLUB fallback to 0-order pages, reserved memory, vmalloc and so on), but if a
system has lots of non-movable pages, those solutions are void in the long run.

So this patch adds a facility for turning non-movable pages into movable ones.
To that end, it introduces migration-related functions in struct
address_space_operations as well as some page flags.

If a driver wants to make its own pages movable, it should define three
functions, which are function pointers of struct address_space_operations:

1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);

The VM expects the driver's isolate_page function to return *true* if the
driver isolated the page successfully. On returning true, the VM marks the page
as PG_isolated so that concurrent isolation on several CPUs skips the page. If
a driver cannot isolate the page, it should return *false*.

Once a page is successfully isolated, the VM uses the page.lru fields, so the
driver shouldn't expect the values in those fields to be preserved.

2. int (*migratepage) (struct address_space *mapping,
		struct page *newpage, struct page *oldpage, enum migrate_mode);

After isolation, the VM calls the driver's migratepage with the isolated page.
The job of migratepage is to move the contents of the old page to the new page
and to set up the fields of struct page newpage. Keep in mind that you should
indicate to the VM that the old page is no longer movable, via
__ClearPageMovable() under page lock, if you migrated the old page successfully
and return 0. If the driver cannot migrate the page at the moment, it can
return -EAGAIN. On -EAGAIN, the VM will retry page migration shortly, because
it interprets -EAGAIN as "temporary migration failure". On any error other than
-EAGAIN, the VM gives up on migrating the page without retrying this time.

The driver shouldn't touch the page.lru field while the VM is using it.

3. void (*putback_page)(struct page *);

If migration fails on an isolated page, the VM should return the isolated page
to the driver, so the VM calls the driver's putback_page with the page whose
migration failed. In this function, the driver should put the isolated page
back into its own data structure.
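For illustration only, a driver wiring up these three hooks might look like the
sketch below. The mydrv_* names, the driver lock, and the private page list are
invented for the example and are not part of this patch; error handling is
elided.

	/* Hypothetical driver-side sketch of the three callbacks. */
	static bool mydrv_isolate_page(struct page *page, isolate_mode_t mode)
	{
		struct mydrv *drv = mydrv_from_page(page);	/* invented helper */

		spin_lock(&drv->lock);
		if (mydrv_page_busy(drv, page)) {	/* cannot isolate right now */
			spin_unlock(&drv->lock);
			return false;
		}
		list_del_init(&page->lru);	/* VM owns page->lru from here on */
		spin_unlock(&drv->lock);
		return true;			/* VM will set PG_isolated */
	}

	static int mydrv_migratepage(struct address_space *mapping,
			struct page *newpage, struct page *oldpage,
			enum migrate_mode mode)
	{
		struct mydrv *drv = mydrv_from_page(oldpage);

		if (!spin_trylock(&drv->lock))
			return -EAGAIN;		/* temporary failure; VM retries */

		copy_highpage(newpage, oldpage);	/* move the contents */
		mydrv_replace_page(drv, oldpage, newpage); /* invented helper */

		/* Both pages are locked by the VM at this point. */
		__SetPageMovable(newpage, mapping);
		__ClearPageMovable(oldpage);	/* oldpage is no longer movable */

		spin_unlock(&drv->lock);
		return MIGRATEPAGE_SUCCESS;
	}

	static void mydrv_putback_page(struct page *page)
	{
		struct mydrv *drv = mydrv_from_page(page);

		/* Migration failed: take the page back into our own list. */
		spin_lock(&drv->lock);
		list_add(&page->lru, &drv->pages);
		spin_unlock(&drv->lock);
	}

	static const struct address_space_operations mydrv_aops = {
		.isolate_page	= mydrv_isolate_page,
		.migratepage	= mydrv_migratepage,
		.putback_page	= mydrv_putback_page,
	};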
4. non-lru movable page flags

There are two page flags for supporting non-lru movable pages.

* PG_movable

The driver should use the function below to make a page movable, under page
lock:

	void __SetPageMovable(struct page *page, struct address_space *mapping)

It takes an address_space argument for registering the migration family of
functions that the VM will call. Strictly speaking, PG_movable is not a real
flag in struct page; rather, the VM reuses the lower bits of page->mapping to
represent it:

	#define PAGE_MAPPING_MOVABLE 0x2
	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;

so the driver shouldn't access page->mapping directly. Instead, the driver
should use page_mapping, which masks off the low two bits of page->mapping, so
it gets the right struct address_space.

For testing for a non-lru movable page, the VM provides the __PageMovable
function. However, it doesn't guarantee identification of a non-lru movable
page, because the page->mapping field is unified with other variables in
struct page. Also, if the driver releases the page after isolation by the VM,
page->mapping doesn't have a stable value even though PAGE_MAPPING_MOVABLE is
set (see __ClearPageMovable). But __PageMovable is a cheap way to tell whether
a page is LRU or non-lru movable once the page has been isolated, because LRU
pages can never have PAGE_MAPPING_MOVABLE in page->mapping. It is also good
for just peeking at non-lru movable pages before the more expensive check
under lock_page during pfn scanning to select a victim.

For a guaranteed test, the VM provides the PageMovable function. Unlike
__PageMovable, PageMovable validates page->mapping and
mapping->a_ops->isolate_page under lock_page. The lock_page prevents
page->mapping from being destroyed underneath it.

A driver using __SetPageMovable should clear the flag via __ClearPageMovable,
under page lock, before releasing the page; a short driver-side lifecycle
sketch follows the tag trailer below.

* PG_isolated

To prevent concurrent isolation among several CPUs, the VM marks an isolated
page as PG_isolated under lock_page. So if a CPU encounters a PG_isolated
non-lru movable page, it can skip it. The driver doesn't need to manipulate
the flag, because the VM will set and clear it automatically. Keep in mind
that if the driver sees a PG_isolated page, it means the page has been
isolated by the VM, so it shouldn't touch the page.lru field. PG_isolated is
an alias of the PG_reclaim flag, so the driver shouldn't use that flag for
its own purposes.

[opensource.ganesh@gmail.com: mm/compaction: remove local variable is_lru]
Link: http://lkml.kernel.org/r/20160618014841.GA7422@leo-test
Link: http://lkml.kernel.org/r/1464736881-24886-3-git-send-email-minchan@kernel.org
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: John Einar Reitan <john.reitan@foss.arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
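As promised above, a hypothetical lifecycle sketch for the flag helpers. Here
drv->inode is assumed to be an anonymous inode whose i_mapping->a_ops carry the
three callbacks; that arrangement is an assumption of this example, not
something the patch mandates.

	/* Hypothetical allocation side: register the page as movable. */
	static struct page *mydrv_alloc_page(struct mydrv *drv)
	{
		struct page *page = alloc_page(GFP_KERNEL);

		if (!page)
			return NULL;
		lock_page(page);
		__SetPageMovable(page, drv->inode->i_mapping);
		unlock_page(page);
		return page;
	}

	/* Hypothetical release side: deregister before freeing. */
	static void mydrv_free_page(struct page *page)
	{
		lock_page(page);
		/*
		 * Keeps the PAGE_MAPPING_MOVABLE bit so the VM can tell the
		 * page was released by the driver after isolation.
		 */
		__ClearPageMovable(page);
		unlock_page(page);
		put_page(page);
	}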
-rw-r--r--  Documentation/filesystems/Locking    4
-rw-r--r--  Documentation/filesystems/vfs.txt   11
-rw-r--r--  Documentation/vm/page_migration    107
-rw-r--r--  include/linux/compaction.h          17
-rw-r--r--  include/linux/fs.h                   2
-rw-r--r--  include/linux/ksm.h                  3
-rw-r--r--  include/linux/migrate.h              2
-rw-r--r--  include/linux/mm.h                   1
-rw-r--r--  include/linux/page-flags.h          33
-rw-r--r--  mm/compaction.c                     85
-rw-r--r--  mm/ksm.c                             4
-rw-r--r--  mm/migrate.c                       192
-rw-r--r--  mm/page_alloc.c                      2
-rw-r--r--  mm/util.c                            6
14 files changed, 416 insertions, 53 deletions
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 75eea7ce3d7c..dda6e3f8e203 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -195,7 +195,9 @@ prototypes:
 	int (*releasepage) (struct page *, int);
 	void (*freepage)(struct page *);
 	int (*direct_IO)(struct kiocb *, struct iov_iter *iter);
+	bool (*isolate_page) (struct page *, isolate_mode_t);
 	int (*migratepage)(struct address_space *, struct page *, struct page *);
+	void (*putback_page) (struct page *);
 	int (*launder_page)(struct page *);
 	int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
 	int (*error_remove_page)(struct address_space *, struct page *);
@@ -219,7 +221,9 @@ invalidatepage: yes
 releasepage:		yes
 freepage:		yes
 direct_IO:
+isolate_page:		yes
 migratepage:		yes (both)
+putback_page:		yes
 launder_page:		yes
 is_partially_uptodate:	yes
 error_remove_page:	yes
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index c61a223ef3ff..900360cbcdae 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -592,9 +592,14 @@ struct address_space_operations {
 	int (*releasepage) (struct page *, int);
 	void (*freepage)(struct page *);
 	ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
+	/* isolate a page for migration */
+	bool (*isolate_page) (struct page *, isolate_mode_t);
 	/* migrate the contents of a page to the specified target */
 	int (*migratepage) (struct page *, struct page *);
+	/* put migration-failed page back to right list */
+	void (*putback_page) (struct page *);
 	int (*launder_page) (struct page *);
+
 	int (*is_partially_uptodate) (struct page *, unsigned long,
 					unsigned long);
 	void (*is_dirty_writeback) (struct page *, bool *, bool *);
@@ -747,6 +752,10 @@ struct address_space_operations {
 	and transfer data directly between the storage and the
 	application's address space.
 
+  isolate_page: Called by the VM when isolating a movable non-lru page.
+	If page is successfully isolated, VM marks the page as PG_isolated
+	via __SetPageIsolated.
+
   migrate_page: This is used to compact the physical memory usage.
 	If the VM wants to relocate a page (maybe off a memory card
 	that is signalling imminent failure) it will pass a new page
@@ -754,6 +763,8 @@ struct address_space_operations {
 	transfer any private data across and update any references
 	that it has to the page.
 
+  putback_page: Called by the VM when isolated page's migration fails.
+
   launder_page: Called before freeing a page - it writes back the dirty page. To
 	prevent redirtying the page, it is kept locked during the whole
 	operation.
diff --git a/Documentation/vm/page_migration b/Documentation/vm/page_migration
index fea5c0864170..18d37c7ac50b 100644
--- a/Documentation/vm/page_migration
+++ b/Documentation/vm/page_migration
@@ -142,5 +142,110 @@ Steps:
 20. The new page is moved to the LRU and can be scanned by the swapper
     etc again.
 
-Christoph Lameter, May 8, 2006.
+C. Non-LRU page migration
+-------------------------
+
+Although the original page migration aimed at reducing the latency of
+memory access on NUMA, compaction, which wants to create high-order
+pages, is also a main customer.
+
+The current problem with the implementation is that it is designed to
+migrate only *LRU* pages. However, there are potential non-lru pages
+which could be migrated by drivers, for example, zsmalloc and
+virtio-balloon pages.
+
+For virtio-balloon pages, some parts of the migration code path were
+hooked up and virtio-balloon specific functions were added to intercept
+the migration logic. That approach is too specific to one driver, so
+other drivers who want to make their pages movable would have to add
+their own specific hooks in the migration path.
+
+To overcome the problem, the VM supports non-LRU page migration, which
+provides generic functions for non-LRU movable pages without
+driver-specific hooks in the migration path.
+
+If a driver wants to make its own pages movable, it should define
+three functions which are function pointers of struct
+address_space_operations.
+
+1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);
+
+The VM expects the driver's isolate_page function to return *true*
+if the driver isolated the page successfully. On returning true, the
+VM marks the page as PG_isolated so concurrent isolation on several
+CPUs skips the page. If a driver cannot isolate the page, it should
+return *false*.
+
+Once a page is successfully isolated, the VM uses the page.lru fields,
+so the driver shouldn't expect the values in those fields to be
+preserved.
+
+2. int (*migratepage) (struct address_space *mapping,
+		struct page *newpage, struct page *oldpage, enum migrate_mode);
+
+After isolation, the VM calls the driver's migratepage with the
+isolated page. The job of migratepage is to move the contents of the
+old page to the new page and to set up the fields of struct page
+newpage. Keep in mind that you should indicate to the VM that the old
+page is no longer movable, via __ClearPageMovable() under page lock,
+if you migrated the old page successfully and return 0. If the driver
+cannot migrate the page at the moment, it can return -EAGAIN. On
+-EAGAIN, the VM will retry page migration in a short time because the
+VM interprets -EAGAIN as "temporary migration failure". On returning
+any error except -EAGAIN, the VM will give up on the page migration
+without retrying this time.
+
+The driver shouldn't touch the page.lru field while the VM is using it.
+
+3. void (*putback_page)(struct page *);
+
+If migration fails on an isolated page, the VM should return the
+isolated page to the driver, so the VM calls the driver's putback_page
+with the page whose migration failed. In this function, the driver
+should put the isolated page back into its own data structure.
 
+4. non-lru movable page flags
+
+There are two page flags for supporting non-lru movable pages.
+
+* PG_movable
+
+The driver should use the function below to make a page movable,
+under page lock.
+
+	void __SetPageMovable(struct page *page, struct address_space *mapping)
+
+It takes an address_space argument for registering the migration
+family of functions which will be called by the VM. Strictly speaking,
+PG_movable is not a real flag of struct page. Rather, the VM reuses
+the lower bits of page->mapping to represent it.
+
+	#define PAGE_MAPPING_MOVABLE 0x2
+	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;
+
+so the driver shouldn't access page->mapping directly. Instead, the
+driver should use page_mapping, which masks off the low two bits of
+page->mapping under page lock, so it can get the right struct
+address_space.
+
+For testing for a non-lru movable page, the VM provides the
+__PageMovable function. However, it doesn't guarantee identification
+of a non-lru movable page because the page->mapping field is unified
+with other variables in struct page. Also, if the driver releases the
+page after isolation by the VM, page->mapping doesn't have a stable
+value although it has PAGE_MAPPING_MOVABLE set (look at
+__ClearPageMovable). But __PageMovable is cheap for catching whether
+a page is LRU or non-lru movable once the page has been isolated,
+because LRU pages can never have PAGE_MAPPING_MOVABLE in
+page->mapping. It is also good for just peeking to test for non-lru
+movable pages before the more expensive check with lock_page during
+pfn scanning to select a victim.
+
+For a guaranteed test of a non-lru movable page, the VM provides the
+PageMovable function. Unlike __PageMovable, PageMovable validates
+page->mapping and mapping->a_ops->isolate_page under lock_page. The
+lock_page prevents sudden destruction of page->mapping.
+
+A driver using __SetPageMovable should clear the flag via
+__ClearPageMovable under page lock before releasing the page.
+
+* PG_isolated
+
+To prevent concurrent isolation among several CPUs, the VM marks an
+isolated page as PG_isolated under lock_page. So if a CPU encounters a
+PG_isolated non-lru movable page, it can skip it. The driver doesn't
+need to manipulate the flag because the VM will set/clear it
+automatically. Keep in mind that if the driver sees a PG_isolated
+page, it means the page has been isolated by the VM, so it shouldn't
+touch the page.lru field. PG_isolated is an alias of the PG_reclaim
+flag, so the driver shouldn't use that flag for its own purposes.
+
+Christoph Lameter, May 8, 2006.
+Minchan Kim, Mar 28, 2016.
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index a58c852a268f..c6b47c861cea 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -54,6 +54,9 @@ enum compact_result {
 struct alloc_context; /* in mm/internal.h */
 
 #ifdef CONFIG_COMPACTION
+extern int PageMovable(struct page *page);
+extern void __SetPageMovable(struct page *page, struct address_space *mapping);
+extern void __ClearPageMovable(struct page *page);
 extern int sysctl_compact_memory;
 extern int sysctl_compaction_handler(struct ctl_table *table, int write,
 			void __user *buffer, size_t *length, loff_t *ppos);
@@ -151,6 +154,19 @@ extern void kcompactd_stop(int nid);
 extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx);
 
 #else
+static inline int PageMovable(struct page *page)
+{
+	return 0;
+}
+static inline void __SetPageMovable(struct page *page,
+				struct address_space *mapping)
+{
+}
+
+static inline void __ClearPageMovable(struct page *page)
+{
+}
+
 static inline enum compact_result try_to_compact_pages(gfp_t gfp_mask,
 			unsigned int order, int alloc_flags,
 			const struct alloc_context *ac,
@@ -212,6 +228,7 @@ static inline void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_i
 #endif /* CONFIG_COMPACTION */
 
 #if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
+struct node;
 extern int compaction_register_node(struct node *node);
 extern void compaction_unregister_node(struct node *node);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0c9ebf530d9e..97fe08d17d89 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -402,6 +402,8 @@ struct address_space_operations {
 	 */
 	int (*migratepage) (struct address_space *,
 			struct page *, struct page *, enum migrate_mode);
+	bool (*isolate_page)(struct page *, isolate_mode_t);
+	void (*putback_page)(struct page *);
 	int (*launder_page) (struct page *);
 	int (*is_partially_uptodate) (struct page *, unsigned long,
 					unsigned long);
diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 7ae216a39c9e..481c8c4627ca 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -43,8 +43,7 @@ static inline struct stable_node *page_stable_node(struct page *page)
 static inline void set_page_stable_node(struct page *page,
 					struct stable_node *stable_node)
 {
-	page->mapping = (void *)stable_node +
-				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
+	page->mapping = (void *)((unsigned long)stable_node | PAGE_MAPPING_KSM);
 }
 
 /*
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 9b50325e4ddf..404fbfefeb33 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -37,6 +37,8 @@ extern int migrate_page(struct address_space *,
 		struct page *, struct page *, enum migrate_mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
 		unsigned long private, enum migrate_mode mode, int reason);
+extern bool isolate_movable_page(struct page *page, isolate_mode_t mode);
+extern void putback_movable_page(struct page *page);
 
 extern int migrate_prep(void);
 extern int migrate_prep_local(void);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ece042dfe23c..3e22335a435c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1035,6 +1035,7 @@ static inline pgoff_t page_file_index(struct page *page)
 }
 
 bool page_mapped(struct page *page);
+struct address_space *page_mapping(struct page *page);
 
 /*
  * Return true only if the page has been allocated with
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e5a32445f930..f36dbb3a3060 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -129,6 +129,9 @@ enum pageflags {
 
 	/* Compound pages. Stored in first tail page's flags */
 	PG_double_map = PG_private_2,
+
+	/* non-lru isolated movable page */
+	PG_isolated = PG_reclaim,
 };
 
 #ifndef __GENERATING_BOUNDS_H
@@ -357,29 +360,37 @@ PAGEFLAG(Idle, idle, PF_ANY)
  * with the PAGE_MAPPING_ANON bit set to distinguish it. See rmap.h.
  *
  * On an anonymous page in a VM_MERGEABLE area, if CONFIG_KSM is enabled,
- * the PAGE_MAPPING_KSM bit may be set along with the PAGE_MAPPING_ANON bit;
- * and then page->mapping points, not to an anon_vma, but to a private
+ * the PAGE_MAPPING_MOVABLE bit may be set along with the PAGE_MAPPING_ANON
+ * bit; and then page->mapping points, not to an anon_vma, but to a private
  * structure which KSM associates with that merged page. See ksm.h.
  *
- * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is currently never used.
+ * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is used for a non-lru movable
+ * page and then page->mapping points to a struct address_space.
  *
  * Please note that, confusingly, "page_mapping" refers to the inode
  * address_space which maps the page from disk; whereas "page_mapped"
  * refers to user virtual address space into which the page is mapped.
  */
-#define PAGE_MAPPING_ANON	1
-#define PAGE_MAPPING_KSM	2
-#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)
+#define PAGE_MAPPING_ANON	0x1
+#define PAGE_MAPPING_MOVABLE	0x2
+#define PAGE_MAPPING_KSM	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
+#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
 
-static __always_inline int PageAnonHead(struct page *page)
+static __always_inline int PageMappingFlags(struct page *page)
 {
-	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) != 0;
 }
 
 static __always_inline int PageAnon(struct page *page)
 {
 	page = compound_head(page);
-	return PageAnonHead(page);
+	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+}
+
+static __always_inline int __PageMovable(struct page *page)
+{
+	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
+				PAGE_MAPPING_MOVABLE;
 }
 
 #ifdef CONFIG_KSM
@@ -393,7 +404,7 @@ static __always_inline int PageKsm(struct page *page)
 {
 	page = compound_head(page);
 	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
-				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
+				PAGE_MAPPING_KSM;
 }
 #else
 TESTPAGEFLAG_FALSE(Ksm)
@@ -641,6 +652,8 @@ static inline void __ClearPageBalloon(struct page *page)
 	atomic_set(&page->_mapcount, -1);
 }
 
+__PAGEFLAG(Isolated, isolated, PF_ANY);
+
 /*
  * If network-based swap is enabled, sl*b must keep track of whether pages
  * were allocated from pfmemalloc reserves.
diff --git a/mm/compaction.c b/mm/compaction.c
index 7bc04778f84d..fe95d8d021c3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -81,6 +81,44 @@ static inline bool migrate_async_suitable(int migratetype)
 
 #ifdef CONFIG_COMPACTION
 
+int PageMovable(struct page *page)
+{
+	struct address_space *mapping;
+
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	if (!__PageMovable(page))
+		return 0;
+
+	mapping = page_mapping(page);
+	if (mapping && mapping->a_ops && mapping->a_ops->isolate_page)
+		return 1;
+
+	return 0;
+}
+EXPORT_SYMBOL(PageMovable);
+
+void __SetPageMovable(struct page *page, struct address_space *mapping)
+{
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_BUG_ON_PAGE((unsigned long)mapping & PAGE_MAPPING_MOVABLE, page);
+	page->mapping = (void *)((unsigned long)mapping | PAGE_MAPPING_MOVABLE);
+}
+EXPORT_SYMBOL(__SetPageMovable);
+
+void __ClearPageMovable(struct page *page)
+{
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_BUG_ON_PAGE(!PageMovable(page), page);
+	/*
+	 * Clear the registered address_space value while keeping the
+	 * PAGE_MAPPING_MOVABLE flag so that the VM can catch a page released
+	 * by the driver after isolation. With it, VM migration doesn't try
+	 * to put the page back.
+	 */
+	page->mapping = (void *)((unsigned long)page->mapping &
+				PAGE_MAPPING_MOVABLE);
+}
+EXPORT_SYMBOL(__ClearPageMovable);
+
 /* Do not skip compaction more than 64 times */
 #define COMPACT_MAX_DEFER_SHIFT 6
 
@@ -670,7 +708,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 	/* Time to isolate some pages for migration */
 	for (; low_pfn < end_pfn; low_pfn++) {
-		bool is_lru;
 
 		if (skip_on_failure && low_pfn >= next_skip_pfn) {
 			/*
@@ -733,21 +770,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		}
 
 		/*
-		 * Check may be lockless but that's ok as we recheck later.
-		 * It's possible to migrate LRU pages and balloon pages
-		 * Skip any other type of page
-		 */
-		is_lru = PageLRU(page);
-		if (!is_lru) {
-			if (unlikely(balloon_page_movable(page))) {
-				if (balloon_page_isolate(page)) {
-					/* Successfully isolated */
-					goto isolate_success;
-				}
-			}
-		}
-
-		/*
 		 * Regardless of being on LRU, compound pages such as THP and
 		 * hugetlbfs are not to be compacted. We can potentially save
 		 * a lot of iterations if we skip them at once. The check is
@@ -763,8 +785,37 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			goto isolate_fail;
 		}
 
-		if (!is_lru)
+		/*
+		 * Check may be lockless but that's ok as we recheck later.
+		 * It's possible to migrate LRU and non-lru movable pages.
+		 * Skip any other type of page
+		 */
+		if (!PageLRU(page)) {
+			if (unlikely(balloon_page_movable(page))) {
+				if (balloon_page_isolate(page)) {
+					/* Successfully isolated */
+					goto isolate_success;
+				}
+			}
+
+			/*
+			 * __PageMovable can return false positive so we need
+			 * to verify it under page_lock.
+			 */
+			if (unlikely(__PageMovable(page)) &&
+					!PageIsolated(page)) {
+				if (locked) {
+					spin_unlock_irqrestore(&zone->lru_lock,
+									flags);
+					locked = false;
+				}
+
+				if (isolate_movable_page(page, isolate_mode))
+					goto isolate_success;
+			}
+
 			goto isolate_fail;
+		}
 
 		/*
 		 * Migration will fail if an anonymous page is pinned in memory,
diff --git a/mm/ksm.c b/mm/ksm.c
index 4786b4150f62..35b8aef867a9 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -532,8 +532,8 @@ static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it)
 	void *expected_mapping;
 	unsigned long kpfn;
 
-	expected_mapping = (void *)stable_node +
-					(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
+	expected_mapping = (void *)((unsigned long)stable_node |
+					PAGE_MAPPING_KSM);
 again:
 	kpfn = READ_ONCE(stable_node->kpfn);
 	page = pfn_to_page(kpfn);
diff --git a/mm/migrate.c b/mm/migrate.c
index c74412b381ff..8119fdc563f8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -31,6 +31,7 @@
 #include <linux/vmalloc.h>
 #include <linux/security.h>
 #include <linux/backing-dev.h>
+#include <linux/compaction.h>
 #include <linux/syscalls.h>
 #include <linux/hugetlb.h>
 #include <linux/hugetlb_cgroup.h>
@@ -73,6 +74,81 @@ int migrate_prep_local(void)
 	return 0;
 }
 
+bool isolate_movable_page(struct page *page, isolate_mode_t mode)
+{
+	struct address_space *mapping;
+
+	/*
+	 * Avoid burning cycles with pages that are yet under __free_pages(),
+	 * or just got freed under us.
+	 *
+	 * In case we 'win' a race for a movable page being freed under us and
+	 * raise its refcount preventing __free_pages() from doing its job,
+	 * the put_page() at the end of this block will take care of releasing
+	 * this page, thus avoiding a nasty leakage.
+	 */
+	if (unlikely(!get_page_unless_zero(page)))
+		goto out;
+
+	/*
+	 * Check PageMovable before holding the page lock because the page's
+	 * owner assumes nobody touches the PG_locked bit of a newly allocated
+	 * page, so unconditionally grabbing the lock ruins the owner's side.
+	 */
+	if (unlikely(!__PageMovable(page)))
+		goto out_putpage;
+	/*
+	 * As movable pages are not isolated from LRU lists, concurrent
+	 * compaction threads can race against page migration functions
+	 * as well as against the release of a page.
+	 *
+	 * In order to avoid having an already isolated movable page
+	 * being (wrongly) re-isolated while it is under migration,
+	 * or to avoid attempting to isolate pages being released,
+	 * let's be sure we have the page lock
+	 * before proceeding with the movable page isolation steps.
+	 */
+	if (unlikely(!trylock_page(page)))
+		goto out_putpage;
+
+	if (!PageMovable(page) || PageIsolated(page))
+		goto out_no_isolated;
+
+	mapping = page_mapping(page);
+	VM_BUG_ON_PAGE(!mapping, page);
+
+	if (!mapping->a_ops->isolate_page(page, mode))
+		goto out_no_isolated;
+
+	/* Driver shouldn't use the PG_isolated bit of page->flags */
+	WARN_ON_ONCE(PageIsolated(page));
+	__SetPageIsolated(page);
+	unlock_page(page);
+
+	return true;
+
+out_no_isolated:
+	unlock_page(page);
+out_putpage:
+	put_page(page);
+out:
+	return false;
+}
+
+/* It should be called on a page which is PG_movable */
+void putback_movable_page(struct page *page)
+{
+	struct address_space *mapping;
+
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_BUG_ON_PAGE(!PageMovable(page), page);
+	VM_BUG_ON_PAGE(!PageIsolated(page), page);
+
+	mapping = page_mapping(page);
+	mapping->a_ops->putback_page(page);
+	__ClearPageIsolated(page);
+}
+
 /*
  * Put previously isolated pages back onto the appropriate lists
  * from where they were once taken off for compaction/migration.
@@ -94,10 +170,25 @@ void putback_movable_pages(struct list_head *l)
 		list_del(&page->lru);
 		dec_zone_page_state(page, NR_ISOLATED_ANON +
 				page_is_file_cache(page));
-		if (unlikely(isolated_balloon_page(page)))
+		if (unlikely(isolated_balloon_page(page))) {
 			balloon_page_putback(page);
-		else
+		/*
+		 * We isolated a non-lru movable page so here we can use
+		 * __PageMovable because an LRU page's mapping cannot have
+		 * PAGE_MAPPING_MOVABLE.
+		 */
+		} else if (unlikely(__PageMovable(page))) {
+			VM_BUG_ON_PAGE(!PageIsolated(page), page);
+			lock_page(page);
+			if (PageMovable(page))
+				putback_movable_page(page);
+			else
+				__ClearPageIsolated(page);
+			unlock_page(page);
+			put_page(page);
+		} else {
 			putback_lru_page(page);
+		}
 	}
 }
 
@@ -594,7 +685,7 @@ EXPORT_SYMBOL(migrate_page_copy);
  ***********************************************************/
 
 /*
- * Common logic to directly migrate a single page suitable for
+ * Common logic to directly migrate a single LRU page suitable for
  * pages that do not use PagePrivate/PagePrivate2.
  *
  * Pages are locked upon entry and exit.
@@ -757,33 +848,72 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 			enum migrate_mode mode)
 {
 	struct address_space *mapping;
-	int rc;
+	int rc = -EAGAIN;
+	bool is_lru = !__PageMovable(page);
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
 
 	mapping = page_mapping(page);
-	if (!mapping)
-		rc = migrate_page(mapping, newpage, page, mode);
-	else if (mapping->a_ops->migratepage)
-		/*
-		 * Most pages have a mapping and most filesystems provide a
-		 * migratepage callback. Anonymous pages are part of swap
-		 * space which also has its own migratepage callback. This
-		 * is the most common path for page migration.
-		 */
-		rc = mapping->a_ops->migratepage(mapping, newpage, page, mode);
-	else
-		rc = fallback_migrate_page(mapping, newpage, page, mode);
+
+	if (likely(is_lru)) {
+		if (!mapping)
+			rc = migrate_page(mapping, newpage, page, mode);
+		else if (mapping->a_ops->migratepage)
+			/*
+			 * Most pages have a mapping and most filesystems
+			 * provide a migratepage callback. Anonymous pages
+			 * are part of swap space which also has its own
+			 * migratepage callback. This is the most common path
+			 * for page migration.
+			 */
+			rc = mapping->a_ops->migratepage(mapping, newpage,
+							page, mode);
+		else
+			rc = fallback_migrate_page(mapping, newpage,
+							page, mode);
+	} else {
+		/*
+		 * In case of a non-lru page, it could be released after
+		 * the isolation step. In that case, we shouldn't try
+		 * migration.
+		 */
+		VM_BUG_ON_PAGE(!PageIsolated(page), page);
+		if (!PageMovable(page)) {
+			rc = MIGRATEPAGE_SUCCESS;
+			__ClearPageIsolated(page);
+			goto out;
+		}
+
+		rc = mapping->a_ops->migratepage(mapping, newpage,
+						page, mode);
+		WARN_ON_ONCE(rc == MIGRATEPAGE_SUCCESS &&
+				!PageIsolated(page));
+	}
 
 	/*
 	 * When successful, old pagecache page->mapping must be cleared before
 	 * page is freed; but stats require that PageAnon be left as PageAnon.
 	 */
 	if (rc == MIGRATEPAGE_SUCCESS) {
-		if (!PageAnon(page))
+		if (__PageMovable(page)) {
+			VM_BUG_ON_PAGE(!PageIsolated(page), page);
+
+			/*
+			 * We clear PG_movable under page_lock so any compactor
+			 * cannot try to migrate this page.
+			 */
+			__ClearPageIsolated(page);
+		}
+
+		/*
+		 * Anonymous and movable page->mapping will be cleared by
+		 * free_pages_prepare so don't reset it here for keeping
+		 * the type to work PageAnon, for example.
+		 */
+		if (!PageMappingFlags(page))
 			page->mapping = NULL;
 	}
+out:
 	return rc;
 }
 
@@ -793,6 +923,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	int rc = -EAGAIN;
 	int page_was_mapped = 0;
 	struct anon_vma *anon_vma = NULL;
+	bool is_lru = !__PageMovable(page);
 
 	if (!trylock_page(page)) {
 		if (!force || mode == MIGRATE_ASYNC)
@@ -873,6 +1004,11 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 		goto out_unlock_both;
 	}
 
+	if (unlikely(!is_lru)) {
+		rc = move_to_new_page(newpage, page, mode);
+		goto out_unlock_both;
+	}
+
 	/*
 	 * Corner case handling:
 	 * 1. When a new swap-cache page is read into, it is added to the LRU
@@ -922,7 +1058,8 @@ out:
 	 * list in here.
 	 */
 	if (rc == MIGRATEPAGE_SUCCESS) {
-		if (unlikely(__is_movable_balloon_page(newpage)))
+		if (unlikely(__is_movable_balloon_page(newpage) ||
+				__PageMovable(newpage)))
 			put_page(newpage);
 		else
 			putback_lru_page(newpage);
@@ -963,6 +1100,12 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 		/* page was freed from under us. So we are done. */
 		ClearPageActive(page);
 		ClearPageUnevictable(page);
+		if (unlikely(__PageMovable(page))) {
+			lock_page(page);
+			if (!PageMovable(page))
+				__ClearPageIsolated(page);
+			unlock_page(page);
+		}
 		if (put_new_page)
 			put_new_page(newpage, private);
 		else
@@ -1012,8 +1155,21 @@ out:
 			num_poisoned_pages_inc();
 		}
 	} else {
-		if (rc != -EAGAIN)
-			putback_lru_page(page);
+		if (rc != -EAGAIN) {
+			if (likely(!__PageMovable(page))) {
+				putback_lru_page(page);
+				goto put_new;
+			}
+
+			lock_page(page);
+			if (PageMovable(page))
+				putback_movable_page(page);
+			else
+				__ClearPageIsolated(page);
+			unlock_page(page);
+			put_page(page);
+		}
+put_new:
 		if (put_new_page)
 			put_new_page(newpage, private);
 		else
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f7bb1aef54f2..8b2623683431 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1016,7 +1016,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 			(page + i)->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 		}
 	}
-	if (PageAnonHead(page))
+	if (PageMappingFlags(page))
 		page->mapping = NULL;
 	if (check_free)
 		bad += free_pages_check(page);
diff --git a/mm/util.c b/mm/util.c
index 917e0e3d0f8e..b756ee36f7f0 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -399,10 +399,12 @@ struct address_space *page_mapping(struct page *page)
 	}
 
 	mapping = page->mapping;
-	if ((unsigned long)mapping & PAGE_MAPPING_FLAGS)
+	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
 		return NULL;
-	return mapping;
+
+	return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
 }
+EXPORT_SYMBOL(page_mapping);
 
 /* Slow path of page_mapcount() for compound pages */
 int __page_mapcount(struct page *page)