xfs: fix data corruption w/ unaligned reflink ranges

When reflinking sub-file ranges, a data corruption can occur when the source file range includes a partial EOF block. This shares the unknown data beyond EOF into the second file at a position inside EOF, exposing stale data in the second file. XFS only supports whole block sharing, but we still need to support whole file reflink correctly. Hence if the reflink request includes the last block of the souce file, only proceed with the reflink operation if it lands at or past the destination file's current EOF. If it lands within the destination file EOF, reject the entire request with -EINVAL and make the caller go the hard way. This avoids the data corruption vector, but also avoids disruption of returning EINVAL to userspace for the common case of whole file cloning. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
author: Dave Chinner <david@fromorbit.com> 2018-10-05 21:44:39 -0400
committer: Dave Chinner <david@fromorbit.com> 2018-10-05 21:44:39 -0400
commit: b39989009bdb84992915c9869f58094ed5becf10 (patch)
tree: 574f76831c2cd648c830512c18dabba768ac96eb
parent: dceeb47b0ed65e14de53507a8a9c32a90831cfa1 (diff)
1 files changed, 34 insertions, 13 deletions
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index f889398e25d6..42ea7bab9144 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1262,22 +1262,32 @@ xfs_reflink_zero_posteof(
 /*
 * Prepare two files for range cloning.  Upon a successful return both inodes
- * will have the iolock and mmaplock held, the page cache of the out file
+ * will have the iolock and mmaplock held, the page cache of the out file will
- * will be truncated, and any leases on the out file will have been broken.
+ * be truncated, and any leases on the out file will have been broken.  This
- * This function borrows heavily from xfs_file_aio_write_checks.
+ * function borrows heavily from xfs_file_aio_write_checks.
 *
 * The VFS allows partial EOF blocks to "match" for dedupe even though it hasn't
 * checked that the bytes beyond EOF physically match. Hence we cannot use the
 * EOF block in the source dedupe range because it's not a complete block match,
- * hence can introduce a corruption into the file that has it's
+ * hence can introduce a corruption into the file that has it's block replaced.
- * block replaced.
 *
- * Despite this issue, we still need to report that range as successfully
+ * In similar fashion, the VFS file cloning also allows partial EOF blocks to be
- * deduped to avoid confusing userspace with EINVAL errors on completely
+ * "block aligned" for the purposes of cloning entire files.  However, if the
- * matching file data. The only time that an unaligned length will be passed to
+ * source file range includes the EOF block and it lands within the existing EOF
- * us is when it spans the EOF block of the source file, so if we simply mask it
+ * of the destination file, then we can expose stale data from beyond the source
- * down to be block aligned here the we will dedupe everything but that partial
+ * file EOF in the destination file.
- * EOF block.
+ *
+ * XFS doesn't support partial block sharing, so in both cases we have check
+ * these cases ourselves. For dedupe, we can simply round the length to dedupe
+ * down to the previous whole block and ignore the partial EOF block. While this
+ * means we can't dedupe the last block of a file, this is an acceptible
+ * tradeoff for simplicity on implementation.
+ *
+ * For cloning, we want to share the partial EOF block if it is also the new EOF
+ * block of the destination file. If the partial EOF block lies inside the
+ * existing destination EOF, then we have to abort the clone to avoid exposing
+ * stale data in the destination file. Hence we reject these clone attempts with
+ * -EINVAL in this case.
 */
 STATIC int
 xfs_reflink_remap_prep(
@@ -1293,6 +1303,7 @@ xfs_reflink_remap_prep(
        struct inode            *inode_out = file_inode(file_out);
        struct xfs_inode        *dest = XFS_I(inode_out);
        bool                    same_inode = (inode_in == inode_out);
+        u64                     blkmask = i_blocksize(inode_in) - 1;
        ssize_t                 ret;
        /* Lock both files against IO */
@@ -1325,8 +1336,18 @@ xfs_reflink_remap_prep(
         * from the source file so we don't try to dedupe the partial
         * EOF block.
         */
-        if (is_dedupe)
+        if (is_dedupe) {
-                *len &= ~((u64)i_blocksize(inode_in) - 1);
+                *len &= ~blkmask;
+        } else if (*len & blkmask) {
+                /*
+                 * The user is attempting to share a partial EOF block,
+                 * if it's inside the destination EOF then reject it.
+                 */
+                if (pos_out + *len < i_size_read(inode_out)) {
+                        ret = -EINVAL;
+                        goto out_unlock;
+                }
+        }
        /* Attach dquots to dest inode before changing block map */
        ret = xfs_qm_dqattach(dest);
author	Dave Chinner <david@fromorbit.com>	2018-10-05 21:44:39 -0400
committer	Dave Chinner <david@fromorbit.com>	2018-10-05 21:44:39 -0400
commit	b39989009bdb84992915c9869f58094ed5becf10 (patch)
tree	574f76831c2cd648c830512c18dabba768ac96eb
parent	dceeb47b0ed65e14de53507a8a9c32a90831cfa1 (diff)