Btrfs: call filemap_fdatawrite twice for compression

I removed this in an earlier commit and I was wrong. Because compression can return from filemap_fdatawrite() without having actually set writeback on any of its pages, it can make filemap_fdatawait() do essentially nothing, and then we won't find any ordered extents because they may not have been created yet.

So not only does this make fsync() completely useless, but it will also screw up if you truncate at a non-page-aligned offset, since we zero out the end, then wait on ordered extents, and then drop caches. We can drop the cache before the IO completes, and then when we try to unpin the extent we just wrote we won't find it and everything goes sideways.

So fix this by putting the second call back and adding a giant comment there to keep me from trying to remove it in the future. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
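To make the failure mode concrete, here is a small single-threaded C model of the state the first filemap_fdatawrite() call leaves behind on a compressed inode. It is a hypothetical sketch, not btrfs code: `struct model_page`, `struct model_inode`, `model_fdatawrite_compressed()` and `model_pages_to_wait_on()` are invented for illustration, and the real page cache, compression workers and ordered-extent tree are far more involved. It only shows that after a compressed writeback pass the dirty pages are locked and cleaned but none is under writeback, so a writeback-only wait has nothing to wait for and no ordered extent exists yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified state; not the kernel's structures. */
struct model_page {
	bool dirty, locked, writeback;
};

struct model_inode {
	struct model_page *pages;
	size_t nr_pages;
	bool has_async_extent;  /* stands in for BTRFS_INODE_HAS_ASYNC_EXTENT */
	int ordered_extents;    /* only created once the async worker runs */
};

/*
 * Compressed writeback as the commit describes it: lock and clean every
 * dirty page, note that an async extent is pending, and return. No page is
 * put under writeback and no ordered extent exists yet; the worker thread
 * that would do that is deliberately not modelled here.
 */
static size_t model_fdatawrite_compressed(struct model_inode *inode)
{
	size_t queued = 0;

	assert(inode->pages != NULL || inode->nr_pages == 0);
	for (size_t i = 0; i < inode->nr_pages; i++) {
		struct model_page *p = &inode->pages[i];

		if (!p->dirty)
			continue;
		p->locked = true;
		p->dirty = false;
		queued++;
	}
	if (queued)
		inode->has_async_extent = true;
	return queued;
}

/* Like filemap_fdatawait(), this only has work to do for writeback pages. */
static size_t model_pages_to_wait_on(const struct model_inode *inode)
{
	size_t n = 0;

	for (size_t i = 0; i < inode->nr_pages; i++)
		if (inode->pages[i].writeback)
			n++;
	return n;
}
```

With three pages of which the first and third are dirty, `model_fdatawrite_compressed()` returns 2, yet `model_pages_to_wait_on()` returns 0 and `ordered_extents` is still 0: the state in which fsync() would find nothing to wait on.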
author: Josef Bacik <josef@redhat.com> 2012-06-08 15:26:47 -0400
committer: Chris Mason <chris.mason@oracle.com> 2012-06-14 21:30:54 -0400
commit: 7ddf5a42d311d74fd9f7373cb56def0843c219f8 (patch)
tree: 3b6a46eec858b867db9184d0e8beefe4ed01e9ec /fs/btrfs/ordered-data.c
parent: 8180ef8894fa402443205cff1e23417e8d3434df (diff)
1 file changed, 21 insertions, 1 deletion
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 9e138cdc36c5..643335a4fe3c 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -627,7 +627,27 @@ void btrfs_wait_ordered_range(struct inode *inode, u64 start, u64 len)
        /* start IO across the range first to instantiate any delalloc
         * extents
         */
-        filemap_write_and_wait_range(inode->i_mapping, start, orig_end);
+        filemap_fdatawrite_range(inode->i_mapping, start, orig_end);
+
+        /*
+         * So with compression we will find and lock a dirty page and clear the
+         * first one as dirty, setup an async extent, and immediately return
+         * with the entire range locked but with nobody actually marked with
+         * writeback.  So we can't just filemap_write_and_wait_range() and
+         * expect it to work since it will just kick off a thread to do the
+         * actual work.  So we need to call filemap_fdatawrite_range _again_
+         * since it will wait on the page lock, which won't be unlocked until
+         * after the pages have been marked as writeback and so we're good to go
+         * from there.  We have to do this otherwise we'll miss the ordered
+         * extents and that results in badness.  Please Josef, do not think you
+         * know better and pull this out at some point in the future, it is
+         * right and you are wrong.
+         */
+        if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
+                     &BTRFS_I(inode)->runtime_flags))
+                filemap_fdatawrite_range(inode->i_mapping, start, orig_end);
+
+        filemap_fdatawait_range(inode->i_mapping, start, orig_end);

        end = orig_end;
        found = 0;
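The ordering argument in the new comment (the second filemap_fdatawrite_range() blocks on the page lock, which the async worker only drops after marking the page writeback) can be sketched in user space with POSIX threads. This is a hypothetical model, not kernel code: `struct model`, `compress_worker()` and the `model_*` helpers are invented names. The page lock is modelled as a flag guarded by a mutex and condition variable because, unlike a pthread mutex, the kernel's page lock may be released by a different thread (here the worker) than the one that took it, and the sleeps only widen the race window. Under these assumptions `model_wait_ordered_range()` always sees the ordered extent, because the worker creates it before unlocking the page.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical one-page model of a compressed inode (not btrfs code). */
struct model {
	pthread_mutex_t mu;      /* guards every field below */
	pthread_cond_t cv;       /* broadcast when locked or writeback changes */
	bool locked, dirty, writeback;  /* "locked" stands in for the page lock */
	bool has_async_extent;   /* stands in for BTRFS_INODE_HAS_ASYNC_EXTENT */
	int ordered_extents;
	pthread_t worker;
	bool worker_started;
};

static void model_init(struct model *m)
{
	*m = (struct model){ .dirty = true };  /* everything else false/0 */
	pthread_mutex_init(&m->mu, NULL);
	pthread_cond_init(&m->cv, NULL);
}

static void lock_page(struct model *m)
{
	pthread_mutex_lock(&m->mu);
	while (m->locked)
		pthread_cond_wait(&m->cv, &m->mu);
	m->locked = true;
	pthread_mutex_unlock(&m->mu);
}

/* Async worker: create the ordered extent and set writeback, then unlock. */
static void *compress_worker(void *arg)
{
	struct model *m = arg;
	struct timespec delay = { 0, 20000000L };  /* 20 ms */

	nanosleep(&delay, NULL);                  /* "compress" */
	pthread_mutex_lock(&m->mu);
	assert(m->locked);         /* the writer's page lock is still held */
	m->ordered_extents++;
	m->writeback = true;
	m->locked = false;         /* unlock_page(), from a different thread */
	pthread_cond_broadcast(&m->cv);
	pthread_mutex_unlock(&m->mu);
	nanosleep(&delay, NULL);                  /* IO in flight */
	pthread_mutex_lock(&m->mu);
	m->writeback = false;      /* IO completion ends writeback */
	pthread_cond_broadcast(&m->cv);
	pthread_mutex_unlock(&m->mu);
	return NULL;
}

/* Models filemap_fdatawrite_range() on a compressed inode. */
static void model_fdatawrite(struct model *m)
{
	lock_page(m);
	pthread_mutex_lock(&m->mu);
	if (!m->dirty) {           /* clean page: just unlock and move on */
		m->locked = false;
		pthread_cond_broadcast(&m->cv);
		pthread_mutex_unlock(&m->mu);
		return;
	}
	m->dirty = false;
	m->has_async_extent = true;
	m->worker_started = true;
	pthread_mutex_unlock(&m->mu);
	/* Return with the page still locked; the worker unlocks it later. */
	if (pthread_create(&m->worker, NULL, compress_worker, m) != 0)
		abort();
}

/* Models filemap_fdatawait_range(): only pages under writeback block it. */
static void model_fdatawait(struct model *m)
{
	pthread_mutex_lock(&m->mu);
	while (m->writeback)
		pthread_cond_wait(&m->cv, &m->mu);
	pthread_mutex_unlock(&m->mu);
}

/* The patched sequence; returns how many ordered extents are visible. */
static int model_wait_ordered_range(struct model *m)
{
	int found;

	model_fdatawrite(m);
	if (m->has_async_extent)   /* only this thread ever writes the flag */
		model_fdatawrite(m);   /* blocks until the worker unlocks */
	model_fdatawait(m);
	pthread_mutex_lock(&m->mu);
	found = m->ordered_extents;  /* read before joining the worker */
	pthread_mutex_unlock(&m->mu);
	if (m->worker_started)
		pthread_join(m->worker, NULL);
	return found;
}
```

Removing the second `model_fdatawrite()` call makes the result timing-dependent: `model_fdatawait()` typically returns before the worker has set writeback, and the count comes back 0, which mirrors the bug described in the commit message. The count is read before `pthread_join()` on purpose, so the join cannot hide that race.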