author Mel Gorman <mgorman@suse.de> 2014-06-04 19:10:31 -0400
committer Linus Torvalds <torvalds@linux-foundation.org> 2014-06-04 19:54:10 -0400
commit 2457aec63745e235bcafb7ef312b182d8682f0fc
tree c658266ed5a8c1acd4f2028c8bf69ab2a7c8ba42 /fs
parent e7470ee89f003634a88e7b5e5a7b65b3025987de
mm: non-atomically mark page accessed during page cache allocation where possible
aops->write_begin may allocate a new page and make it visible only to have mark_page_accessed called almost immediately after. Once the page is visible the atomic operations are necessary, which is noticeable overhead when writing to an in-memory filesystem like tmpfs but should also be noticeable with fast storage. The objective of the patch is to initialise the accessed information with non-atomic operations before the page is visible.

The bulk of filesystems directly or indirectly use grab_cache_page_write_begin or find_or_create_page for the initial allocation of a page cache page. This patch adds an init_page_accessed() helper which behaves like the first call to mark_page_accessed() but may be called before the page is visible and can be done non-atomically.

The primary APIs of concern in this case are the following and are used by most filesystems.

	find_get_page
	find_lock_page
	find_or_create_page
	grab_cache_page_nowait
	grab_cache_page_write_begin

All of them are very similar in detail, so the patch creates a core helper pagecache_get_page() which takes a flags parameter that affects its behavior, such as whether the page should be marked accessed or not. The old API is preserved but is basically a thin wrapper around this core function.

Each of the filesystems is then updated to avoid calling mark_page_accessed when it is known that the VM interfaces have already done the job. There is a slight snag in that the timing of the mark_page_accessed() has now changed, so in rare cases it's possible a page gets to the end of the LRU as PageReferenced whereas previously it might have been repromoted. This is expected to be rare but it's worth the filesystem people thinking about it in case they see a problem with the timing change. It is also the case that some filesystems may be marking pages accessed that previously did not, but it makes sense that filesystems have consistent behaviour in this regard.
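As a rough illustration of the two ideas above (a flags-driven core lookup with thin wrappers, and setting the accessed state non-atomically before the page is published), here is a minimal userspace sketch. It is not the kernel implementation: every `toy_`/`TOY_` name is hypothetical, locking and reference counting are omitted, the cache is a fixed array used from a single thread, and the second-access promotion only loosely mirrors what mark_page_accessed() does.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Toy userspace model; none of these definitions are the kernel's. */
#define TOY_PG_REFERENCED 0x1u	/* page was touched once */
#define TOY_PG_ACTIVE     0x2u	/* page was touched again and promoted */

#define TOY_FGP_ACCESSED  0x1u	/* mark the page accessed during lookup */
#define TOY_FGP_CREAT     0x2u	/* allocate the page if it is missing */

#define TOY_CACHE_SLOTS   8

struct toy_page {
	atomic_uint flags;
};

struct toy_cache {
	struct toy_page storage[TOY_CACHE_SLOTS];	/* backing "allocator" */
	struct toy_page *slot[TOY_CACHE_SLOTS];		/* NULL means not cached */
};

static void toy_cache_init(struct toy_cache *cache)
{
	for (size_t i = 0; i < TOY_CACHE_SLOTS; i++)
		cache->slot[i] = NULL;
}

/* Page is already visible to others, so flag updates must be atomic. */
static void toy_mark_page_accessed(struct toy_page *page)
{
	unsigned int old = atomic_load(&page->flags);

	if (!(old & TOY_PG_ACTIVE) && (old & TOY_PG_REFERENCED)) {
		atomic_fetch_or(&page->flags, TOY_PG_ACTIVE);
		atomic_fetch_and(&page->flags, ~TOY_PG_REFERENCED);
	} else if (!(old & TOY_PG_REFERENCED)) {
		atomic_fetch_or(&page->flags, TOY_PG_REFERENCED);
	}
}

/* Core helper: behaviour is selected by fgp_flags. */
static struct toy_page *toy_pagecache_get_page(struct toy_cache *cache,
					       size_t index,
					       unsigned int fgp_flags)
{
	struct toy_page *page;

	if (index >= TOY_CACHE_SLOTS)
		return NULL;

	page = cache->slot[index];
	if (page) {
		if (fgp_flags & TOY_FGP_ACCESSED)
			toy_mark_page_accessed(page);
		return page;
	}
	if (!(fgp_flags & TOY_FGP_CREAT))
		return NULL;

	page = &cache->storage[index];
	/* Not yet visible: plain (non-atomic) initialisation is enough. */
	atomic_init(&page->flags,
		    (fgp_flags & TOY_FGP_ACCESSED) ? TOY_PG_REFERENCED : 0u);
	cache->slot[index] = page;	/* publish the page */
	return page;
}

/* The old entry points survive as thin wrappers around the core helper. */
static struct toy_page *toy_find_get_page(struct toy_cache *cache, size_t index)
{
	return toy_pagecache_get_page(cache, index, 0);
}

static struct toy_page *toy_find_or_create_page(struct toy_cache *cache,
						size_t index)
{
	return toy_pagecache_get_page(cache, index,
				      TOY_FGP_ACCESSED | TOY_FGP_CREAT);
}
```

In this sketch a caller that obtains a page through toy_find_or_create_page() has no reason to call toy_mark_page_accessed() itself afterwards, which is the kind of redundant call the filesystem hunks in this patch remove.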
The test case used to evaluate this is a simple dd of a large file done multiple times with the file deleted on each iteration. The size of the file is 1/10th physical memory to avoid dirty page balancing. In the async case it will be possible that the workload completes without even hitting the disk and will have variable results but highlight the impact of mark_page_accessed for async IO. The sync results are expected to be more stable. The exception is tmpfs where the normal case is for the "IO" to not hit the disk.

The test machine was single socket and UMA to avoid any scheduling or NUMA artifacts. Throughput and wall times are presented for sync IO, only wall times are shown for async as the granularity reported by dd and the variability is unsuitable for comparison. As async results were variable due to writeback timings, I'm only reporting the maximum figures. The sync results were stable enough to make the mean and stddev uninteresting.

The performance results are reported based on a run with no profiling. Profile data is based on a separate run with oprofile running.

async dd
                          3.15.0-rc3            3.15.0-rc3
                             vanilla           accessed-v2
ext3    Max elapsed     13.9900 (  0.00%)     11.5900 ( 17.16%)
tmpfs   Max elapsed      0.5100 (  0.00%)      0.4900 (  3.92%)
btrfs   Max elapsed     12.8100 (  0.00%)     12.7800 (  0.23%)
ext4    Max elapsed     18.6000 (  0.00%)     13.3400 ( 28.28%)
xfs     Max elapsed     12.5600 (  0.00%)      2.0900 ( 83.36%)

The XFS figure is a bit strange as it managed to avoid a worst case by sheer luck but the average figures looked reasonable.
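The bracketed percentages in the async dd figures above appear to be the relative reduction in maximum elapsed time against the vanilla kernel. A small sketch of that arithmetic follows; the function name is hypothetical and the formula is inferred from the reported numbers rather than stated in the commit.

```c
/*
 * Relative reduction in elapsed time, in percent.
 * Assumption: this is how the bracketed figures were derived;
 * e.g. (13.99 - 11.59) / 13.99 is roughly 17.16%.
 */
static double elapsed_gain_percent(double baseline, double patched)
{
	if (baseline <= 0.0)
		return 0.0;	/* avoid dividing by zero on bad input */
	return (baseline - patched) / baseline * 100.0;
}
```

Applied to the ext3 row (13.99 vs 11.59 seconds) this gives about 17.16%, and to the xfs row (12.56 vs 2.09 seconds) about 83.36%, matching the reported values.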
       samples percentage
ext3     86107    0.9783  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
ext3     23833    0.2710  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed
ext3      5036    0.0573  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
ext4     64566    0.8961  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
ext4      5322    0.0713  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed
ext4      2869    0.0384  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
xfs      62126    1.7675  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
xfs       1904    0.0554  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
xfs        103    0.0030  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed
btrfs    10655    0.1338  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
btrfs     2020    0.0273  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
btrfs      587    0.0079  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed
tmpfs    59562    3.2628  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
tmpfs     1210    0.0696  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
tmpfs       94    0.0054  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed

[akpm@linux-foundation.org: don't run init_page_accessed() against an uninitialised pointer]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Tested-by: Prabhakar Lad <prabhakar.csengg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'fs')
-rw-r--r--  fs/btrfs/extent_io.c   11
-rw-r--r--  fs/btrfs/file.c         5
-rw-r--r--  fs/buffer.c             7
-rw-r--r--  fs/ext4/mballoc.c      14
-rw-r--r--  fs/f2fs/checkpoint.c    3
-rw-r--r--  fs/f2fs/node.c          2
-rw-r--r--  fs/fuse/file.c          2
-rw-r--r--  fs/gfs2/aops.c          1
-rw-r--r--  fs/gfs2/meta_io.c       4
-rw-r--r--  fs/ntfs/attrib.c        1
-rw-r--r--  fs/ntfs/file.c          1
11 files changed, 23 insertions, 28 deletions
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f29a54e454d4..4cd0ac983f91 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4510,7 +4510,8 @@ static void check_buffer_tree_ref(struct extent_buffer *eb)
 	spin_unlock(&eb->refs_lock);
 }
 
-static void mark_extent_buffer_accessed(struct extent_buffer *eb)
+static void mark_extent_buffer_accessed(struct extent_buffer *eb,
+		struct page *accessed)
 {
 	unsigned long num_pages, i;
 
@@ -4519,7 +4520,8 @@ static void mark_extent_buffer_accessed(struct extent_buffer *eb)
 	num_pages = num_extent_pages(eb->start, eb->len);
 	for (i = 0; i < num_pages; i++) {
 		struct page *p = extent_buffer_page(eb, i);
-		mark_page_accessed(p);
+		if (p != accessed)
+			mark_page_accessed(p);
 	}
 }
 
@@ -4533,7 +4535,7 @@ struct extent_buffer *find_extent_buffer(struct btrfs_fs_info *fs_info,
 			       start >> PAGE_CACHE_SHIFT);
 	if (eb && atomic_inc_not_zero(&eb->refs)) {
 		rcu_read_unlock();
-		mark_extent_buffer_accessed(eb);
+		mark_extent_buffer_accessed(eb, NULL);
 		return eb;
 	}
 	rcu_read_unlock();
@@ -4581,7 +4583,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 				spin_unlock(&mapping->private_lock);
 				unlock_page(p);
 				page_cache_release(p);
-				mark_extent_buffer_accessed(exists);
+				mark_extent_buffer_accessed(exists, p);
 				goto free_eb;
 			}
 
@@ -4596,7 +4598,6 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		attach_extent_buffer_page(eb, p);
 		spin_unlock(&mapping->private_lock);
 		WARN_ON(PageDirty(p));
-		mark_page_accessed(p);
 		eb->pages[i] = p;
 		if (!PageUptodate(p))
 			uptodate = 0;
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index ae6af072b635..74272a3f9d9b 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -470,11 +470,12 @@ static void btrfs_drop_pages(struct page **pages, size_t num_pages)
 	for (i = 0; i < num_pages; i++) {
 		/* page checked is some magic around finding pages that
 		 * have been modified without going through btrfs_set_page_dirty
-		 * clear it here
+		 * clear it here. There should be no need to mark the pages
+		 * accessed as prepare_pages should have marked them accessed
+		 * in prepare_pages via find_or_create_page()
 		 */
 		ClearPageChecked(pages[i]);
 		unlock_page(pages[i]);
-		mark_page_accessed(pages[i]);
 		page_cache_release(pages[i]);
 	}
 }
diff --git a/fs/buffer.c b/fs/buffer.c
index 0d3e8d5a2299..eba6e4f621ce 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -227,7 +227,7 @@ __find_get_block_slow(struct block_device *bdev, sector_t block)
 	int all_mapped = 1;
 
 	index = block >> (PAGE_CACHE_SHIFT - bd_inode->i_blkbits);
-	page = find_get_page(bd_mapping, index);
+	page = find_get_page_flags(bd_mapping, index, FGP_ACCESSED);
 	if (!page)
 		goto out;
 
@@ -1366,12 +1366,13 @@ __find_get_block(struct block_device *bdev, sector_t block, unsigned size)
 	struct buffer_head *bh = lookup_bh_lru(bdev, block, size);
 
 	if (bh == NULL) {
+		/* __find_get_block_slow will mark the page accessed */
 		bh = __find_get_block_slow(bdev, block);
 		if (bh)
 			bh_lru_install(bh);
-	}
-	if (bh)
+	} else
 		touch_buffer(bh);
+
 	return bh;
 }
 EXPORT_SYMBOL(__find_get_block);
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c8238a26818c..afe8a133e3d1 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1044,6 +1044,8 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group)
 	 * allocating. If we are looking at the buddy cache we would
 	 * have taken a reference using ext4_mb_load_buddy and that
 	 * would have pinned buddy page to page cache.
+	 * The call to ext4_mb_get_buddy_page_lock will mark the
+	 * page accessed.
 	 */
 	ret = ext4_mb_get_buddy_page_lock(sb, group, &e4b);
 	if (ret || !EXT4_MB_GRP_NEED_INIT(this_grp)) {
@@ -1062,7 +1064,6 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group)
 		ret = -EIO;
 		goto err;
 	}
-	mark_page_accessed(page);
 
 	if (e4b.bd_buddy_page == NULL) {
 		/*
@@ -1082,7 +1083,6 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group)
 		ret = -EIO;
 		goto err;
 	}
-	mark_page_accessed(page);
 err:
 	ext4_mb_put_buddy_page_lock(&e4b);
 	return ret;
@@ -1141,7 +1141,7 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 
 	/* we could use find_or_create_page(), but it locks page
 	 * what we'd like to avoid in fast path ... */
-	page = find_get_page(inode->i_mapping, pnum);
+	page = find_get_page_flags(inode->i_mapping, pnum, FGP_ACCESSED);
 	if (page == NULL || !PageUptodate(page)) {
 		if (page)
 			/*
@@ -1176,15 +1176,16 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 		ret = -EIO;
 		goto err;
 	}
+
+	/* Pages marked accessed already */
 	e4b->bd_bitmap_page = page;
 	e4b->bd_bitmap = page_address(page) + (poff * sb->s_blocksize);
-	mark_page_accessed(page);
 
 	block++;
 	pnum = block / blocks_per_page;
 	poff = block % blocks_per_page;
 
-	page = find_get_page(inode->i_mapping, pnum);
+	page = find_get_page_flags(inode->i_mapping, pnum, FGP_ACCESSED);
 	if (page == NULL || !PageUptodate(page)) {
 		if (page)
 			page_cache_release(page);
@@ -1209,9 +1210,10 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 		ret = -EIO;
 		goto err;
 	}
+
+	/* Pages marked accessed already */
 	e4b->bd_buddy_page = page;
 	e4b->bd_buddy = page_address(page) + (poff * sb->s_blocksize);
-	mark_page_accessed(page);
 
 	BUG_ON(e4b->bd_bitmap_page == NULL);
 	BUG_ON(e4b->bd_buddy_page == NULL);
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 4aa521aa9bc3..c405b8f17054 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -69,7 +69,6 @@ repeat:
 		goto repeat;
 	}
 out:
-	mark_page_accessed(page);
 	return page;
 }
 
@@ -137,13 +136,11 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, int start, int nrpages, int type)
 		if (!page)
 			continue;
 		if (PageUptodate(page)) {
-			mark_page_accessed(page);
 			f2fs_put_page(page, 1);
 			continue;
 		}
 
 		f2fs_submit_page_mbio(sbi, page, blk_addr, &fio);
-		mark_page_accessed(page);
 		f2fs_put_page(page, 0);
 	}
 out:
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index a161e955c4c8..57caa6eaf47b 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -967,7 +967,6 @@ repeat:
 		goto repeat;
 	}
 got_it:
-	mark_page_accessed(page);
 	return page;
 }
 
@@ -1022,7 +1021,6 @@ page_hit:
 		f2fs_put_page(page, 1);
 		return ERR_PTR(-EIO);
 	}
-	mark_page_accessed(page);
 	return page;
 }
 
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f680d2c44e97..903cbc9cd6bd 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1089,8 +1089,6 @@ static ssize_t fuse_fill_write_pages(struct fuse_req *req,
 		tmp = iov_iter_copy_from_user_atomic(page, ii, offset, bytes);
 		flush_dcache_page(page);
 
-		mark_page_accessed(page);
-
 		if (!tmp) {
 			unlock_page(page);
 			page_cache_release(page);
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 5a49b037da81..492123cda64a 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -577,7 +577,6 @@ int gfs2_internal_read(struct gfs2_inode *ip, char *buf, loff_t *pos,
 		p = kmap_atomic(page);
 		memcpy(buf + copied, p + offset, amt);
 		kunmap_atomic(p);
-		mark_page_accessed(page);
 		page_cache_release(page);
 		copied += amt;
 		index++;
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index 2cf09b63a6b4..b984a6e190bc 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -136,7 +136,8 @@ struct buffer_head *gfs2_getbuf(struct gfs2_glock *gl, u64 blkno, int create)
 			yield();
 		}
 	} else {
-		page = find_lock_page(mapping, index);
+		page = find_get_page_flags(mapping, index,
+						FGP_LOCK|FGP_ACCESSED);
 		if (!page)
 			return NULL;
 	}
@@ -153,7 +154,6 @@ struct buffer_head *gfs2_getbuf(struct gfs2_glock *gl, u64 blkno, int create)
 		map_bh(bh, sdp->sd_vfs, blkno);
 
 	unlock_page(page);
-	mark_page_accessed(page);
 	page_cache_release(page);
 
 	return bh;
diff --git a/fs/ntfs/attrib.c b/fs/ntfs/attrib.c
index a27e3fecefaf..250ed5b20c8f 100644
--- a/fs/ntfs/attrib.c
+++ b/fs/ntfs/attrib.c
@@ -1748,7 +1748,6 @@ int ntfs_attr_make_non_resident(ntfs_inode *ni, const u32 data_size)
 	if (page) {
 		set_page_dirty(page);
 		unlock_page(page);
-		mark_page_accessed(page);
 		page_cache_release(page);
 	}
 	ntfs_debug("Done.");
diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index db9bd8a31725..86ddab916b66 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -2060,7 +2060,6 @@ static ssize_t ntfs_file_buffered_write(struct kiocb *iocb,
 		}
 		do {
 			unlock_page(pages[--do_pages]);
-			mark_page_accessed(pages[do_pages]);
 			page_cache_release(pages[do_pages]);
 		} while (do_pages);
 		if (unlikely(status))