author | Chris Mason <chris.mason@oracle.com> | 2009-03-31 13:27:11 -0400 |
---|---|---|
committer | Chris Mason <chris.mason@oracle.com> | 2009-03-31 14:27:58 -0400 |
commit | 5a3f23d515a2ebf0c750db80579ca57b28cbce6d (patch) | |
tree | e0ffb43dd35f1c3def9a74ec7a6f4470902c9761 /fs/btrfs/transaction.c | |
parent | 1a81af4d1d9c60d4313309f937a1fc5567205a87 (diff) |
Btrfs: add extra flushing for renames and truncates

Renames and truncates are both common ways to replace old data with new
data. The filesystem can make an effort to make sure the new data is
on disk before actually replacing the old data.

This is especially important for rename, which many applications use as
though it were atomic for both the data and the metadata involved. The
current btrfs code will happily replace a file that is fully on disk
with one that was just created and still has pending IO.

If we crash after transaction commit but before the IO is done, we'll end
up replacing a good file with a zero length file. The solution used
here is to create a list of inodes that need special ordering and force
them to disk before the commit is done. This is similar to the
ext3 style data=ordered, except it is only done on selected files.

Btrfs is able to get away with this because it does not wait on commits
very often, even for fsync (which uses a sub-commit).

For renames, we order the file when it wasn't already
on disk and when it is replacing an existing file. Larger files
are sent to filemap_flush right away (before the transaction handle is
opened).

For truncates, we order if the file goes from non-zero size down to
zero size. This is a little different, because at the time of the
truncate the file has no dirty bytes to order. But we flag the inode
so that it is added to the ordered list on close (via the release
method). We also immediately add it to the ordered list of the current
transaction so that we can try to flush down any writes the application
sneaks in before commit.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
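
The rename and truncate rules described in the message can be sketched as two small predicates plus the flag-on-truncate, add-on-release behaviour. This is only an illustration under stated assumptions: `struct sketch_inode` and every `sketch_*` or `*_needs_ordering` name below is hypothetical and not taken from the btrfs sources, and the size threshold for the early filemap_flush is left out because the message does not give it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an inode; not the btrfs structure. */
struct sketch_inode {
	long long size;        /* current file size in bytes */
	bool has_pending_io;   /* data written but not yet on disk */
	bool order_on_close;   /* set by a truncate to zero, consumed on release */
	bool on_ordered_list;  /* member of the current transaction's ordered list */
};

/*
 * Rename rule from the message: order the source file only when it is
 * not already on disk and it is replacing an existing file.
 */
static bool rename_needs_ordering(const struct sketch_inode *src,
				  const struct sketch_inode *existing_target)
{
	return existing_target != NULL && src->has_pending_io;
}

/* Truncate rule from the message: order only on non-zero -> zero. */
static bool truncate_needs_ordering(long long old_size, long long new_size)
{
	return old_size > 0 && new_size == 0;
}

/* Truncate: flag the inode for close and add it to the list right away. */
static void sketch_truncate(struct sketch_inode *inode, long long new_size)
{
	assert(new_size >= 0);
	if (truncate_needs_ordering(inode->size, new_size)) {
		inode->order_on_close = true;
		inode->on_ordered_list = true;
	}
	inode->size = new_size;
}

/* Release (close): a flagged inode is put on the ordered list again. */
static void sketch_release(struct sketch_inode *inode)
{
	if (inode->order_on_close) {
		inode->on_ordered_list = true;
		inode->order_on_close = false;
	}
}
```

The flag exists because, at truncate time, there are no dirty bytes yet; the data the application writes afterwards is what has to reach disk before the commit.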
Diffstat (limited to 'fs/btrfs/transaction.c')
-rw-r--r-- | fs/btrfs/transaction.c | 11 |
1 files changed, 11 insertions, 0 deletions
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 9c8f158dd2db..664782c6a2df 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -975,6 +975,8 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 	int should_grow = 0;
 	unsigned long now = get_seconds();
 
+	btrfs_run_ordered_operations(root, 0);
+
 	/* make a pass through all the delayed refs we have so far
 	 * any runnings procs may add more while we are here
 	 */
@@ -1056,6 +1058,15 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 			BUG_ON(ret);
 		}
 
+		/*
+		 * rename don't use btrfs_join_transaction, so, once we
+		 * set the transaction to blocked above, we aren't going
+		 * to get any new ordered operations. We can safely run
+		 * it here and no for sure that nothing new will be added
+		 * to the list
+		 */
+		btrfs_run_ordered_operations(root, 1);
+
 		smp_mb();
 		if (cur_trans->num_writers > 1 || should_grow)
 			schedule_timeout(timeout);