aboutsummaryrefslogtreecommitdiffstats
path: root/fs/btrfs/transaction.c
diff options
context:
space:
mode:
authorChris Mason <chris.mason@oracle.com>2009-03-31 13:27:11 -0400
committerChris Mason <chris.mason@oracle.com>2009-03-31 14:27:58 -0400
commit5a3f23d515a2ebf0c750db80579ca57b28cbce6d (patch)
treee0ffb43dd35f1c3def9a74ec7a6f4470902c9761 /fs/btrfs/transaction.c
parent1a81af4d1d9c60d4313309f937a1fc5567205a87 (diff)
Btrfs: add extra flushing for renames and truncates
Renames and truncates are both common ways to replace old data with new data. The filesystem can make an effort to make sure the new data is on disk before actually replacing the old data. This is especially important for rename, which many application use as though it were atomic for both the data and the metadata involved. The current btrfs code will happily replace a file that is fully on disk with one that was just created and still has pending IO. If we crash after transaction commit but before the IO is done, we'll end up replacing a good file with a zero length file. The solution used here is to create a list of inodes that need special ordering and force them to disk before the commit is done. This is similar to the ext3 style data=ordering, except it is only done on selected files. Btrfs is able to get away with this because it does not wait on commits very often, even for fsync (which use a sub-commit). For renames, we order the file when it wasn't already on disk and when it is replacing an existing file. Larger files are sent to filemap_flush right away (before the transaction handle is opened). For truncates, we order if the file goes from non-zero size down to zero size. This is a little different, because at the time of the truncate the file has no dirty bytes to order. But, we flag the inode so that it is added to the ordered list on close (via release method). We also immediately add it to the ordered list of the current transaction so that we can try to flush down any writes the application sneaks in before commit. Signed-off-by: Chris Mason <chris.mason@oracle.com>
Diffstat (limited to 'fs/btrfs/transaction.c')
-rw-r--r--fs/btrfs/transaction.c11
1 files changed, 11 insertions, 0 deletions
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 9c8f158dd2db..664782c6a2df 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -975,6 +975,8 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
975 int should_grow = 0; 975 int should_grow = 0;
976 unsigned long now = get_seconds(); 976 unsigned long now = get_seconds();
977 977
978 btrfs_run_ordered_operations(root, 0);
979
978 /* make a pass through all the delayed refs we have so far 980 /* make a pass through all the delayed refs we have so far
979 * any runnings procs may add more while we are here 981 * any runnings procs may add more while we are here
980 */ 982 */
@@ -1056,6 +1058,15 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
1056 BUG_ON(ret); 1058 BUG_ON(ret);
1057 } 1059 }
1058 1060
1061 /*
1062 * rename don't use btrfs_join_transaction, so, once we
1063 * set the transaction to blocked above, we aren't going
1064 * to get any new ordered operations. We can safely run
1065 * it here and no for sure that nothing new will be added
1066 * to the list
1067 */
1068 btrfs_run_ordered_operations(root, 1);
1069
1059 smp_mb(); 1070 smp_mb();
1060 if (cur_trans->num_writers > 1 || should_grow) 1071 if (cur_trans->num_writers > 1 || should_grow)
1061 schedule_timeout(timeout); 1072 schedule_timeout(timeout);