author		Filipe Manana <fdmanana@suse.com>	2015-10-16 07:34:25 -0400
committer	Filipe Manana <fdmanana@suse.com>	2015-10-16 16:02:53 -0400
commit		0305cd5f7fca85dae392b9ba85b116896eb7c1c7 (patch)
tree		a00e547bbfa1a14068a50322de28486fed6f4cc6
parent		5e6ecb362bd5950a3d8ce19c32829e4f8c7917d9 (diff)
Btrfs: fix truncation of compressed and inlined extents
When truncating a file to a smaller size which consists of an inline extent that is compressed, we did not discard (or make unusable) the data between the new file size and the old file size, wasting metadata space and allowing for the truncated data to be leaked and the data corruption/loss mentioned below.

We were also not correctly decrementing the number of bytes used by the inode, we were setting it to zero, giving a wrong report for callers of the stat(2) syscall. The fsck tool also reported an error about a mismatch between the nbytes of the file versus the real space used by the file.

Now because we weren't discarding the truncated region of the file, it was possible for a caller of the clone ioctl to actually read the data that was truncated, allowing for a security breach without requiring root access to the system, using only standard filesystem operations. The scenario is the following:

1) User A creates a file which consists of an inline and compressed extent with a size of 2000 bytes - the file is not accessible to any other users (no read, write or execution permission for anyone else);

2) The user truncates the file to a size of 1000 bytes;

3) User A makes the file world readable;

4) User B creates a file consisting of an inline extent of 2000 bytes;

5) User B issues a clone operation from user A's file into its own file (using a length argument of 0, clone the whole range);

6) User B now gets to see the 1000 bytes that user A truncated from its file before it made its file world readable. User B also lost the bytes in the range [1000, 2000[ from its own file, but that might be ok if his/her intention was reading stale data from user A that was never supposed to be public.

Note that this contrasts with the case where we truncate a file from 2000 bytes to 1000 bytes and then truncate it back from 1000 to 2000 bytes. In this case reading any byte from the range [1000, 2000[ will return a value of 0x00, instead of the original data.
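The leak described in the scenario above comes down to two interacting behaviors: truncation that only shrinks the logical size while the full extent stays on disk, and a clone operation that copies the whole extent. A minimal toy model in Python illustrates this interaction; all names here (InlineInode, clone_inline, truncate_buggy) are invented for illustration and are not btrfs code:

```python
# Toy model of the bug: truncating a compressed inline extent only
# updated i_size, leaving the full extent data on disk, and the clone
# ioctl copied the whole extent - exposing bytes past the source's EOF.

class InlineInode:
    def __init__(self, data: bytes):
        self.extent = bytes(data)  # full inline extent as stored on disk
        self.i_size = len(data)    # logical file size

    def truncate_buggy(self, new_size: int) -> None:
        # The bug: only i_size shrinks; the extent keeps its stale tail.
        self.i_size = new_size

    def read(self) -> bytes:
        # Normal reads stop at i_size, so the stale tail stays hidden...
        return self.extent[:self.i_size]

def clone_inline(src: InlineInode, dst: InlineInode) -> None:
    # ...but clone copied the entire source extent into the destination,
    # replacing the destination's own inline extent.
    dst.extent = src.extent
    dst.i_size = len(src.extent)

a = InlineInode(b"S" * 2000)   # user A's private file (2000 bytes)
a.truncate_buggy(1000)         # A truncates it to 1000 bytes
b = InlineInode(b"B" * 2000)   # user B's own file
clone_inline(a, b)             # B clones A's now world-readable file
assert b.read()[1000:2000] == b"S" * 1000  # B reads A's truncated data
```

The model also shows why user B loses bytes [1000, 2000[ of its own file: the destination extent is replaced wholesale, not merged.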
This problem exists since the clone ioctl was added and happens both with and without my recent data loss and file corruption fixes for the clone ioctl (patch "Btrfs: fix file corruption and data loss after cloning inline extents").

So fix this by truncating the compressed inline extents as we do for the non-compressed case, which involves decompressing, if the data isn't already in the page cache, compressing the truncated version of the extent, writing the compressed content into the inline extent and then truncating it.

The following test case for fstests reproduces the problem. In order for the test to pass both this fix and my previous fix for the clone ioctl that forbids cloning a smaller inline extent into a larger one, which is titled "Btrfs: fix file corruption and data loss after cloning inline extents", are needed. Without that other fix the test fails in a different way that does not leak the truncated data; instead part of the destination file gets replaced with zeroes (because the destination file has a larger inline extent than the source).

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"
  tmp=/tmp/$$
  status=1	# failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
      rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter

  # real QA test starts here
  _need_to_be_root
  _supported_fs btrfs
  _supported_os Linux
  _require_scratch
  _require_cloner

  rm -f $seqres.full

  _scratch_mkfs >>$seqres.full 2>&1
  _scratch_mount "-o compress"

  # Create our test files. File foo is going to be the source of a clone operation
  # and consists of a single inline extent with an uncompressed size of 512 bytes,
  # while file bar consists of a single inline extent with an uncompressed size of
  # 256 bytes. For our test's purpose, it's important that file bar has an inline
  # extent with a size smaller than foo's inline extent.
  $XFS_IO_PROG -f -c "pwrite -S 0xa1 0 128" \
          -c "pwrite -S 0x2a 128 384" \
          $SCRATCH_MNT/foo | _filter_xfs_io
  $XFS_IO_PROG -f -c "pwrite -S 0xbb 0 256" $SCRATCH_MNT/bar | _filter_xfs_io

  # Now durably persist all metadata and data. We do this to make sure that we get
  # on disk an inline extent with a size of 512 bytes for file foo.
  sync

  # Now truncate our file foo to a smaller size. Because it consists of a
  # compressed and inline extent, btrfs did not shrink the inline extent to the
  # new size (if the extent was not compressed, btrfs would shrink it to 128
  # bytes); it only updated the inode's i_size to 128 bytes.
  $XFS_IO_PROG -c "truncate 128" $SCRATCH_MNT/foo

  # Now clone foo's inline extent into bar.
  # This clone operation should fail with errno EOPNOTSUPP because the source
  # file consists only of an inline extent and the file's size is smaller than
  # the inline extent of the destination (128 bytes < 256 bytes). However the
  # clone ioctl was not prepared to deal with a file that has a size smaller
  # than the size of its inline extent (something that happens only for compressed
  # inline extents), resulting in copying the full inline extent from the source
  # file into the destination file.
  #
  # Note that btrfs' clone operation for inline extents consists of removing the
  # inline extent from the destination inode and copying the inline extent from
  # the source inode into the destination inode, meaning that if the destination
  # inode's inline extent is larger (N bytes) than the source inode's inline
  # extent (M bytes), some bytes (N - M bytes) will be lost from the destination
  # file. Btrfs could copy the source inline extent's data into the destination's
  # inline extent so that we would not lose any data, but that's currently not
  # done due to the complexity that would be needed to deal with such cases
  # (especially when one or both extents are compressed), returning EOPNOTSUPP, as
  # it's normally not a very common case to clone very small files (only case
  # where we get inline extents) and copying inline extents does not save any
  # space (unlike for normal, non-inlined extents).
  $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/foo $SCRATCH_MNT/bar

  # Now because the above clone operation used to succeed, and due to foo's inline
  # extent not being shrunk by the truncate operation, our file bar got the whole
  # inline extent copied from foo, making us lose the last 128 bytes from bar
  # which got replaced by the bytes in range [128, 256[ from foo before foo was
  # truncated - in other words, data loss from bar and being able to read old and
  # stale data from foo that should not be possible to read anymore through normal
  # filesystem operations. Contrast with the case where we truncate a file from a
  # size N to a smaller size M, truncate it back to size N and then read the range
  # [M, N[: we should always get the value 0x00 for all the bytes in that range.
  #
  # We expected the clone operation to fail with errno EOPNOTSUPP and therefore
  # not modify our file bar's data/metadata. So its content should be 256 bytes
  # long with all bytes having the value 0xbb.
  #
  # Without the btrfs bug fix, the clone operation succeeded and resulted in
  # leaking truncated data from foo, the bytes that belonged to its range
  # [128, 256[, and losing data from bar in that same range. So reading the
  # file gave us the following content:
  #
  # 0000000 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1
  # *
  # 0000200 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a
  # *
  # 0000400
  echo "File bar's content after the clone operation:"
  od -t x1 $SCRATCH_MNT/bar

  # Also because foo's inline extent was not shrunk by the truncate
  # operation, btrfs' fsck, which is run by the fstests framework every time a
  # test completes, failed reporting the following error:
  #
  # root 5 inode 257 errors 400, nbytes wrong

  status=0
  exit

Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
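The fix's strategy for a compressed inline extent, as the patch below explains, is to zero the remainder of the last page backing the extent instead of rewriting the extent itself (which would require decompressing and recompressing). The arithmetic involved - kernel-style ALIGN of the new size up to a page boundary, then zeroing [new_size, page_end) - can be sketched as follows; this is a toy model with an assumed 4096-byte page size, not the kernel's btrfs_truncate_page():

```python
# Toy model of the page-tail zeroing done for compressed inline extents:
# rather than shrinking the on-disk extent, zero the range from the new
# file size up to the end of the page containing it.
PAGE_SIZE = 4096  # assumed; the kernel uses PAGE_CACHE_SIZE

def align(x: int, a: int) -> int:
    # Kernel-style ALIGN: round x up to the next multiple of a
    # (a must be a power of two).
    return (x + a - 1) & ~(a - 1)

def zero_tail(data: bytearray, new_size: int) -> bytearray:
    """Zero [new_size, page_end) of the inline data, mirroring what the
    patch does via btrfs_truncate_page(inode, offset, page_end - offset, 0)."""
    page_end = align(new_size, PAGE_SIZE)
    end = min(page_end, len(data))
    data[new_size:end] = b"\x00" * (end - new_size)
    return data

# A 2000-byte inline extent truncated to 1000 bytes: afterwards, bytes
# [1000, 2000) must read back as zeroes, so a later clone cannot leak them.
buf = bytearray(b"\xaa" * 2000)
zero_tail(buf, 1000)
assert buf[:1000] == b"\xaa" * 1000
assert buf[1000:] == b"\x00" * 1000
```

Inline extents always fit within one page, which is why zeroing a single page tail suffices here.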
-rw-r--r--	fs/btrfs/inode.c	82
1 file changed, 68 insertions(+), 14 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 208db4e835f0..cbb4286490a1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4217,6 +4217,47 @@ static int truncate_space_check(struct btrfs_trans_handle *trans,
 
 }
 
+static int truncate_inline_extent(struct inode *inode,
+				  struct btrfs_path *path,
+				  struct btrfs_key *found_key,
+				  const u64 item_end,
+				  const u64 new_size)
+{
+	struct extent_buffer *leaf = path->nodes[0];
+	int slot = path->slots[0];
+	struct btrfs_file_extent_item *fi;
+	u32 size = (u32)(new_size - found_key->offset);
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+
+	fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
+
+	if (btrfs_file_extent_compression(leaf, fi) != BTRFS_COMPRESS_NONE) {
+		loff_t offset = new_size;
+		loff_t page_end = ALIGN(offset, PAGE_CACHE_SIZE);
+
+		/*
+		 * Zero out the remaining of the last page of our inline extent,
+		 * instead of directly truncating our inline extent here - that
+		 * would be much more complex (decompressing all the data, then
+		 * compressing the truncated data, which might be bigger than
+		 * the size of the inline extent, resize the extent, etc).
+		 * We release the path because to get the page we might need to
+		 * read the extent item from disk (data not in the page cache).
+		 */
+		btrfs_release_path(path);
+		return btrfs_truncate_page(inode, offset, page_end - offset, 0);
+	}
+
+	btrfs_set_file_extent_ram_bytes(leaf, fi, size);
+	size = btrfs_file_extent_calc_inline_size(size);
+	btrfs_truncate_item(root, path, size, 1);
+
+	if (test_bit(BTRFS_ROOT_REF_COWS, &root->state))
+		inode_sub_bytes(inode, item_end + 1 - new_size);
+
+	return 0;
+}
+
 /*
  * this can truncate away extent items, csum items and directory items.
  * It starts at a high offset and removes keys until it can't find
@@ -4411,27 +4452,40 @@ search_again:
 			 * special encodings
 			 */
 			if (!del_item &&
-			    btrfs_file_extent_compression(leaf, fi) == 0 &&
 			    btrfs_file_extent_encryption(leaf, fi) == 0 &&
 			    btrfs_file_extent_other_encoding(leaf, fi) == 0) {
-				u32 size = new_size - found_key.offset;
-
-				if (test_bit(BTRFS_ROOT_REF_COWS, &root->state))
-					inode_sub_bytes(inode, item_end + 1 -
-							new_size);
 
 				/*
-				 * update the ram bytes to properly reflect
-				 * the new size of our item
+				 * Need to release path in order to truncate a
+				 * compressed extent. So delete any accumulated
+				 * extent items so far.
 				 */
-				btrfs_set_file_extent_ram_bytes(leaf, fi, size);
-				size =
-				    btrfs_file_extent_calc_inline_size(size);
-				btrfs_truncate_item(root, path, size, 1);
+				if (btrfs_file_extent_compression(leaf, fi) !=
+				    BTRFS_COMPRESS_NONE && pending_del_nr) {
+					err = btrfs_del_items(trans, root, path,
+							      pending_del_slot,
+							      pending_del_nr);
+					if (err) {
+						btrfs_abort_transaction(trans,
+									root,
+									err);
+						goto error;
+					}
+					pending_del_nr = 0;
+				}
+
+				err = truncate_inline_extent(inode, path,
+							     &found_key,
+							     item_end,
+							     new_size);
+				if (err) {
+					btrfs_abort_transaction(trans,
+								root, err);
+					goto error;
+				}
 			} else if (test_bit(BTRFS_ROOT_REF_COWS,
 					    &root->state)) {
-				inode_sub_bytes(inode, item_end + 1 -
-						found_key.offset);
+				inode_sub_bytes(inode, item_end + 1 - new_size);
 			}
 		}
 delete:
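The accounting change in the diff is worth spelling out: the old else-branch subtracted item_end + 1 - found_key.offset (the whole extent) from the inode's byte count, while the fix subtracts only the truncated tail, item_end + 1 - new_size. A few lines of arithmetic with assumed example values (an inline extent of 2000 bytes truncated to 1000) make the difference concrete; this is an illustrative sketch, not kernel code:

```python
# Arithmetic sketch of the truncation accounting (values assumed for
# illustration). Inline extents always start at file offset 0, so
# found_key.offset is 0 and item_end is ram_bytes - 1.
found_key_offset = 0
ram_bytes = 2000
item_end = ram_bytes - 1
new_size = 1000

# New ram_bytes recorded in the file extent item (non-compressed path):
size = new_size - found_key_offset
assert size == 1000

# Bytes subtracted from the inode's nbytes by the fixed code: only the
# truncated tail, so nbytes ends up reflecting the new 1000-byte size.
sub_fixed = item_end + 1 - new_size
assert sub_fixed == 1000

# The old code subtracted the whole extent, driving nbytes to zero -
# the wrong stat(2) output and "nbytes wrong" fsck error the commit
# message describes.
sub_old = item_end + 1 - found_key_offset
assert sub_old == 2000
assert ram_bytes - sub_old == 0
```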