author Dave Chinner <dchinner@redhat.com> 2014-09-23 08:55:00 -0400
committer Dave Chinner <david@fromorbit.com> 2014-09-23 08:55:00 -0400
commit 2ebff7bbd785c86e12956388b9e6f6bb8ea5d21e
tree c0a3d0cec84187bdf66c674294d67e1416484996
parent 7abbb8f928e5b7cea1edd077131b2ace665c6712
xfs: flush entire last page of old EOF on truncate up

On a sub-page sized filesystem, truncating a mapped region down leaves us in a world of hurt. We truncate the pagecache, zeroing the newly unused tail, then punch blocks out from under the page. If we then truncate the file back up immediately, we expose that unmapped hole to a dirty page mapped into the user application, and that's where it all goes wrong.

In truncating the page cache, we avoid unmapping the tail page of the cache because it still contains valid data. The problem is that it also contains a hole after the truncate, but nobody told the mm subsystem that. Therefore, if the page is dirty before the truncate, we'll never get a .page_mkwrite callout after we extend the file and the application writes data into the hole on the page. Hence when we come to writing that region of the page, it has no blocks and no delayed allocation reservation and hence we toss the data away.

This patch adds code to the truncate up case to solve it, by ensuring the partial page at the old EOF is always cleaned after we do any zeroing and move the EOF upwards. We can't actually serialise the page writeback and truncate against page faults (yes, that problem AGAIN) so this is really just a best effort and assumes it is extremely unlikely that someone is concurrently writing to the page at the EOF while extending the file.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

Diffstat (limited to 'fs/xfs')
-rw-r--r--  fs/xfs/xfs_iops.c  30
1 files changed, 30 insertions, 0 deletions
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 72129493e9d3..ec6dcdc181ee 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -849,6 +849,36 @@ xfs_setattr_size(
 		return error;
 	truncate_setsize(inode, newsize);
 
+	/*
+	 * The "we can't serialise against page faults" pain gets worse.
+	 *
+	 * If the file is mapped then we have to clean the page at the old EOF
+	 * when extending the file. Extending the file can expose changes the
+	 * underlying page mapping (e.g. from beyond EOF to a hole or
+	 * unwritten), and so on the next attempt to write to that page we need
+	 * to remap it for write. i.e. we need .page_mkwrite() to be called.
+	 * Hence we need to clean the page to clean the pte and so a new write
+	 * fault will be triggered appropriately.
+	 *
+	 * If we do it before we change the inode size, then we can race with a
+	 * page fault that maps the page with exactly the same problem. If we do
+	 * it after we change the file size, then a new page fault can come in
+	 * and allocate space before we've run the rest of the truncate
+	 * transaction. That's kinda grotesque, but it's better than have data
+	 * over a hole, and so that's the lesser evil that has been chosen here.
+	 *
+	 * The real solution, however, is to have some mechanism for locking out
+	 * page faults while a truncate is in progress.
+	 */
+	if (newsize > oldsize && mapping_mapped(VFS_I(ip)->i_mapping)) {
+		error = filemap_write_and_wait_range(
+				VFS_I(ip)->i_mapping,
+				round_down(oldsize, PAGE_CACHE_SIZE),
+				round_up(oldsize, PAGE_CACHE_SIZE) - 1);
+		if (error)
+			return error;
+	}
+
 	tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_SIZE);
 	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
 	if (error)
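
The byte range handed to filemap_write_and_wait_range() in the hunk above can be worked through in userspace. The sketch below is only an illustration of the arithmetic: `demo_round_down`, `demo_round_up`, `struct byte_range` and `old_eof_page_range` are hypothetical stand-ins (not the kernel's macros), and they assume a power-of-two page size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for round_down()/round_up(); power-of-two 'align' only. */
static inline uint64_t demo_round_down(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

static inline uint64_t demo_round_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Inclusive byte range, mirroring the start/end arguments used in the patch. */
struct byte_range {
	uint64_t start;
	uint64_t end;
};

/* Range covering the page that contains the old EOF. */
struct byte_range old_eof_page_range(uint64_t oldsize, uint64_t page_size)
{
	struct byte_range r;

	assert(page_size && (page_size & (page_size - 1)) == 0);
	r.start = demo_round_down(oldsize, page_size);
	r.end = demo_round_up(oldsize, page_size) - 1;
	return r;
}
```

With a 4096-byte page and an old size of 5000, this gives the range 4096 to 8191, i.e. the whole page straddling the old EOF. For an old size that is already page aligned, such as 8192, the same arithmetic yields start 8192 and end 8191; what the kernel does with such a range is outside what this sketch shows.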