path: root/fs/xfs/xfs_bmap_btree.c
author Dave Chinner <dchinner@redhat.com> 2013-08-29 20:23:45 -0400
committer Ben Myers <bpm@sgi.com> 2013-09-10 13:49:57 -0400
commit 638f44163d57f87d0905fbed7d54202beff916fc
tree becdb2c6ee54e318bd1cb27bd72f3438194674dc /fs/xfs/xfs_bmap_btree.c
parent 21b5c9784bceb8b8e0095f87355f3b138ebac2d0
xfs: recovery of swap extents operations for CRC filesystems
This is the recovery side of the btree block owner change operation performed by swapext on CRC enabled filesystems. We detect that an owner change is needed by the flag that has been placed on the inode log format flag field. Because the inode recovery is being replayed after the buffers that make up the BMBT in the given checkpoint, we can walk all the buffers and directly modify them when we see the flag set on an inode.

Because the inode can be relogged and hence present in multiple checkpoints with the "change owner" flag set, we could do multiple passes across the inode to do this change. While this isn't optimal, we can't ignore the flag, as there may be multiple independent swap extent operations being replayed on the same inode in different checkpoints.

Further, because the owner change operation uses ordered buffers, we might have buffers that are newer on disk than the current checkpoint and so already have the owner changed in them. Hence we cannot just peek at a buffer in the tree and check that it has the correct owner and assume that the change was completed.

So, for the moment just brute force the owner change every time we see an inode with the flag set. Note that we have to be careful here because the owner of the buffers may point to either the old owner or the new owner. Currently the verifier can't verify the owner directly, so there is no failure case here right now. If we verify the owner exactly in future, then we'll have to take this into account.

This was tested in terms of normal operation via xfstests - all of the fsr tests now pass without failure. However, we really need to modify xfs/227 to stress v3 inodes correctly to ensure we fully cover this case for v5 filesystems.
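The brute force approach described above can be pictured with a small userspace model. This is only a sketch: `struct model_block` and `model_change_owner()` are hypothetical names invented for illustration, not kernel structures or functions. It shows why repeated replays are harmless when some blocks already carry the new owner.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for a BMBT block; not the on-disk layout. */
struct model_block {
	uint64_t		owner;
	struct model_block	*child[2];	/* NULL when absent */
};

/*
 * Unconditionally stamp new_owner on every block reachable from root.
 * No block is skipped on the basis of its current owner, so a block that
 * already carries new_owner is simply rewritten with the same value.
 * Returns the number of blocks visited.
 */
static size_t
model_change_owner(struct model_block *root, uint64_t new_owner)
{
	size_t	visited = 1;
	int	i;

	if (!root)
		return 0;

	root->owner = new_owner;
	for (i = 0; i < 2; i++)
		visited += model_change_owner(root->child[i], new_owner);
	return visited;
}
```

Running the walk a second time, as would happen when the flag is seen again in a later checkpoint, visits the same blocks and leaves every owner unchanged.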
In terms of recovery testing, I used a hacked version of xfs_fsr that held the temp inode open for a few seconds before exiting so that the filesystem could be shut down with the owner change recovery flags still set on at least the temp inode. fsr leaves the temp inode unlinked and in btree format, so this was necessary for the owner change to be reliably replayed.

logprint confirmed the tmp inode in the log had the correct flag set:

    INO: cnt:3 total:3 a:0x69e9e0 len:56 a:0x69ea20 len:176 a:0x69eae0 len:88
    INODE: #regs:3 ino:0x44 flags:0x209 dsize:88
                                  ^^^^^

0x200 is set, indicating a data fork owner change needed to be replayed on inode 0x44. A printk in the recovery code confirmed that the inode change was recovered:

    XFS (vdc): Mounting Filesystem
    XFS (vdc): Starting recovery (logdev: internal)
    recovering owner change ino 0x44
    XFS (vdc): Version 5 superblock detected. This kernel L support enabled!
    Use of these features in this kernel is at your own risk!
    XFS (vdc): Ending recovery (logdev: internal)

The script used to test this was:

    $ cat ./recovery-fsr.sh
    #!/bin/bash

    dev=/dev/vdc
    mntpt=/mnt/scratch
    testfile=$mntpt/testfile

    umount $mntpt
    mkfs.xfs -f -m crc=1 $dev
    mount $dev $mntpt
    chmod 777 $mntpt

    for i in `seq 10000 -1 0`; do
            xfs_io -f -d -c "pwrite $(($i * 4096)) 4096" $testfile > /dev/null 2>&1
    done
    xfs_bmap -vp $testfile |head -20

    xfs_fsr -d -v $testfile &
    sleep 10
    /home/dave/src/xfstests-dev/src/godown -f $mntpt
    wait
    umount $mntpt

    xfs_logprint -t $dev |tail -20
    time mount $dev $mntpt
    xfs_bmap -vp $testfile
    umount $mntpt
    $

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
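The flag check on the logprint output can be sketched in a few lines of C. The 0x200 value for the data fork comes from the message above; the macro names, the 0x400 attr fork value and the helper functions are assumptions made for this illustration and are not copied from the kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative flag bits for the inode log format flags field. Only the
 * 0x200 data fork value is taken from the commit message; the names and
 * the attr fork value are assumptions for this sketch.
 */
#define ILOG_DOWNER	0x200	/* data fork owner change on replay */
#define ILOG_AOWNER	0x400	/* attr fork owner change on replay (assumed) */

/* Return true if the logged inode asks for a data fork owner change. */
static bool
ilog_needs_data_owner_change(uint32_t ilf_fields)
{
	return (ilf_fields & ILOG_DOWNER) != 0;
}

/* Return true if the logged inode asks for an attr fork owner change. */
static bool
ilog_needs_attr_owner_change(uint32_t ilf_fields)
{
	return (ilf_fields & ILOG_AOWNER) != 0;
}
```

With the logged value 0x209, the data fork helper returns true and the attr fork helper returns false, which matches the "0x200 is set" reading of the logprint output.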
Diffstat (limited to 'fs/xfs/xfs_bmap_btree.c')
-rw-r--r-- fs/xfs/xfs_bmap_btree.c 26
1 files changed, 18 insertions, 8 deletions
diff --git a/fs/xfs/xfs_bmap_btree.c b/fs/xfs/xfs_bmap_btree.c
index aa2eadd41bab..531b0206cce6 100644
--- a/fs/xfs/xfs_bmap_btree.c
+++ b/fs/xfs/xfs_bmap_btree.c
@@ -932,30 +932,40 @@ xfs_bmdr_maxrecs(
  * we switch forks between inodes. The operation that the caller is doing will
  * determine whether is needs to change owner before or after the switch.
  *
- * For demand paged modification, the fork switch should be done after reading
- * in all the blocks, modifying them and pinning them in the transaction. For
- * modification when the buffers are already pinned in memory, the fork switch
- * can be done before changing the owner as we won't need to validate the owner
- * until the btree buffers are unpinned and writes can occur again.
+ * For demand paged transactional modification, the fork switch should be done
+ * after reading in all the blocks, modifying them and pinning them in the
+ * transaction. For modification when the buffers are already pinned in memory,
+ * the fork switch can be done before changing the owner as we won't need to
+ * validate the owner until the btree buffers are unpinned and writes can occur
+ * again.
+ *
+ * For recovery based ownership change, there is no transactional context and
+ * so a buffer list must be supplied so that we can record the buffers that we
+ * modified for the caller to issue IO on.
  */
 int
 xfs_bmbt_change_owner(
 	struct xfs_trans	*tp,
 	struct xfs_inode	*ip,
 	int			whichfork,
-	xfs_ino_t		new_owner)
+	xfs_ino_t		new_owner,
+	struct list_head	*buffer_list)
 {
 	struct xfs_btree_cur	*cur;
 	int			error;
 
+	ASSERT(tp || buffer_list);
+	ASSERT(!(tp && buffer_list));
 	if (whichfork == XFS_DATA_FORK)
 		ASSERT(ip->i_d.di_format = XFS_DINODE_FMT_BTREE);
 	else
 		ASSERT(ip->i_d.di_aformat = XFS_DINODE_FMT_BTREE);
 
 	cur = xfs_bmbt_init_cursor(ip->i_mount, tp, ip, whichfork);
-	error = xfs_btree_change_owner(cur, new_owner);
+	if (!cur)
+		return ENOMEM;
+
+	error = xfs_btree_change_owner(cur, new_owner, buffer_list);
 	xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
 	return error;
 }
-
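The two new ASSERTs require that a caller pass either a transaction or a buffer list, never both and never neither. A minimal userspace sketch of that contract follows. All `model_*` names are hypothetical stand-ins invented here; the real code works on xfs_buf and list_head, and how it attaches buffers to a transaction or queues them for IO is not shown in this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transaction: only counts buffers attached to it. */
struct model_trans {
	int	nr_attached;
};

/* Hypothetical buffer with a link for the caller's IO list. */
struct model_buf {
	uint64_t		owner;
	struct model_buf	*next_queued;
};

/* Hypothetical list of buffers the caller must issue IO on. */
struct model_buf_list {
	struct model_buf	*head;
	size_t			count;
};

static void
model_set_owner(struct model_trans *tp, struct model_buf_list *buffer_list,
		struct model_buf *bp, uint64_t new_owner)
{
	/* Exactly one context must be supplied, mirroring the two ASSERTs. */
	assert(tp || buffer_list);
	assert(!(tp && buffer_list));

	bp->owner = new_owner;
	if (tp) {
		/* Transactional path: the buffer rides with the transaction. */
		tp->nr_attached++;
	} else {
		/* Recovery path: record the buffer for the caller's IO. */
		bp->next_queued = buffer_list->head;
		buffer_list->head = bp;
		buffer_list->count++;
	}
}
```

Calling `model_set_owner(&tp, NULL, &buf, ino)` models the swapext side, and `model_set_owner(NULL, &list, &buf, ino)` models the recovery side, where the caller later walks `list` to write the buffers out.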