litmus-rt.git - The LITMUS^RT kernel.

	Commit message	Author	Age
*	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs	Linus Torvalds	2011-11-22
\|\	* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
\| \|	  Btrfs: remove free-space-cache.c WARN during log replay
\| \|	  Btrfs: sectorsize align offsets in fiemap
\| \|	  Btrfs: clear pages dirty for io and set them extent mapped
\| \|	  Btrfs: wait on caching if we're loading the free space cache
\| \|	  Btrfs: prefix resize related printks with btrfs:
\| \|	  btrfs: fix stat blocks accounting
\| \|	  Btrfs: avoid unnecessary bitmap search for cluster setup
\| \|	  Btrfs: fix to search one more bitmap for cluster setup
\| \|	  btrfs: mirror_num should be int, not u64
\| \|	  btrfs: Fix up 32/64-bit compatibility for new ioctls
\| \|	  Btrfs: fix barrier flushes
\| \|	  Btrfs: fix tree corruption after multi-thread snapshots and inode_cache flush
\| *	Btrfs: remove free-space-cache.c WARN during log replay	Chris Mason	2011-11-21
\| \|	The log replay code only partially loads block groups, since the block group caching code is able to detect and deal with extents the logging code has pinned down. While the logging code is pinning down block groups, there is a bogus WARN_ON we're hitting if the code wasn't able to find an extent in the cache. This commit removes the warning because it can happen any time there isn't a valid free space cache for that block group.
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Btrfs: sectorsize align offsets in fiemap	Josef Bacik	2011-11-20
\| \|	We've been hitting BUG()'s in btrfs_cont_expand and btrfs_fallocate and anywhere else that calls btrfs_get_extent while running xfstests 13 in a loop. This is because fiemap is calling btrfs_get_extent with non-sectorsize aligned offsets, which will end up adding mappings that are not sectorsize aligned, which will cause problems in some cases for subsequent calls to btrfs_get_extent for similar areas that are sectorsize aligned. With this patch I ran xfstests 13 in a loop for a couple of hours and didn't hit the problem that I could previously hit in at most 20 minutes. Thanks,
\| \|	Signed-off-by: Josef Bacik <josef@redhat.com>
\| *	Btrfs: clear pages dirty for io and set them extent mapped	Josef Bacik	2011-11-20
\| \|	When doing the io_ctl helpers to clean up the free space cache stuff I stopped using our normal prepare_pages stuff, which means I of course forgot to do things like set the pages extent mapped, which will cause us all sorts of wonderful problems. Thanks,
\| \|	Signed-off-by: Josef Bacik <josef@redhat.com>
\| *	Btrfs: wait on caching if we're loading the free space cache	Josef Bacik	2011-11-20
\| \|	We've been hitting panics when running xfstest 13 in a loop for long periods of time. And actually this problem has always existed so we've been hitting these things randomly for a while. Basically what happens is we get a thread coming into the allocator and reading the space cache off of disk and adding the entries to the free space cache as we go. Then we get another thread that comes in and tries to allocate from that block group. Since block_group->cached != BTRFS_CACHE_NO it goes ahead and tries to do the allocation. We do this because if we're doing the old slow way of caching we don't want to hold people up and wait for everything to finish. The problem with this is we could end up discarding the space cache at some arbitrary point in the future, which means we could very well end up allocating space that is either bad, or when the real caching happens it could end up thinking the space isn't in use when it really is and cause all sorts of other problems.
\| \|	The solution is to add a new flag to indicate we are loading the free space cache from disk, and always try to cache the block group if cache->cached != BTRFS_CACHE_FINISHED. That way if we are loading the space cache anybody else who tries to allocate from the block group will have to wait until it's finished to make sure it completes successfully. Thanks,
\| \|	Signed-off-by: Josef Bacik <josef@redhat.com>
\| *	Btrfs: prefix resize related printks with btrfs:	Arnd Hannemann	2011-11-20
\| \|	For the user it is confusing to find something like:
\| \|	  [10197.627710] new size for /dev/mapper/vg0-usr_share is 3221225472
\| \|	in kernel log, because it doesn't point directly to btrfs. This patch prefixes those messages with "btrfs:" like other btrfs related printks.
\| \|	Signed-off-by: Arnd Hannemann <arnd@arndnet.de>
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	btrfs: fix stat blocks accounting	David Sterba	2011-11-20
\| \|	Round inode bytes and delalloc bytes up to real blocksize before converting to sector size. Otherwise, e.g., files smaller than 512 bytes are reported with zero blocks due to incorrect rounding.
\| \|	Signed-off-by: David Sterba <dsterba@suse.cz>
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
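The rounding issue described in this commit can be illustrated with a small arithmetic sketch. This is Python rather than the kernel's C, the function names are hypothetical, and the 4096-byte block size is an assumption chosen for illustration; it only shows why converting straight to 512-byte sectors reports zero blocks for tiny files.

```python
SECTOR_SHIFT = 9  # stat block counts are expressed in 512-byte units


def round_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def blocks_truncating(nbytes):
    """Convert bytes straight to 512-byte sectors (the problem case)."""
    return nbytes >> SECTOR_SHIFT


def blocks_rounded(nbytes, blocksize=4096):
    """Round up to the filesystem block size first, then convert.

    blocksize=4096 is an assumed value, not taken from the commit.
    """
    return round_up(nbytes, blocksize) >> SECTOR_SHIFT
```

With these helpers, a 100-byte file yields zero sectors when truncated directly, but one full block (eight 512-byte sectors under the assumed block size) when rounded up first.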
\| *	Btrfs: avoid unnecessary bitmap search for cluster setup	Li Zefan	2011-11-20
\| \|	setup_cluster_no_bitmap() searches all the extents and bitmaps starting from offset. Therefore if it returns -ENOSPC, all the bitmaps starting from offset are in the bitmaps list, so it's sufficient to search from this list in setup_cluster_bitmap().
\| \|	Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Btrfs: fix to search one more bitmap for cluster setup	Li Zefan	2011-11-20
\| \|	Suppose there are two bitmaps [0, 256], [256, 512] and one extent [100, 120] in the free space cache, and we want to set up a cluster with offset=100, bytes=50. In this case, there will be only one bitmap [256, 512] in the temporary bitmaps list, and then setup_cluster_bitmap() won't search bitmap [0, 256]. The cause is that the list is constructed in setup_cluster_no_bitmap(), and only bitmaps with bitmap_entry->offset >= offset will be added into the list, while the very bitmap that covers offset has bitmap_entry->offset <= offset.
\| \|	Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
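The example in this commit message can be replayed with a short sketch. It is written in Python, not kernel C; the function names and the half-open reading of the intervals are assumptions made for illustration, and the second function only models the idea of also considering the bitmap that covers the offset, not the actual patch.

```python
BITMAP_SPAN = 256  # each bitmap in the commit's example covers 256 units


def bitmaps_at_or_after(bitmap_starts, offset):
    """Model of the described list: only bitmaps whose start is >= offset."""
    return [start for start in bitmap_starts if start >= offset]


def bitmaps_including_cover(bitmap_starts, offset, span=BITMAP_SPAN):
    """Also keep the bitmap whose range [start, start + span) contains offset."""
    return [start for start in bitmap_starts if start + span > offset]
```

For bitmaps starting at 0 and 256 and a request at offset 100, the first helper returns only the bitmap at 256, which is the missed-search case the message describes; the second also returns the bitmap at 0.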
\| *	btrfs: mirror_num should be int, not u64	Jan Schmidt	2011-11-20
\| \|	My previous patch introduced some u64 for failed_mirror variables, this one makes it consistent again.
\| \|	Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	btrfs: Fix up 32/64-bit compatibility for new ioctls	Jeff Mahoney	2011-11-20
\| \|	This patch casts to unsigned long before casting to a pointer and fixes the following warnings:
\| \|	  fs/btrfs/extent_io.c:2289:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
\| \|	  fs/btrfs/ioctl.c:2933:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
\| \|	  fs/btrfs/ioctl.c:2937:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
\| \|	  fs/btrfs/ioctl.c:3020:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
\| \|	  fs/btrfs/scrub.c:275:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
\| \|	  fs/btrfs/backref.c:686:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
\| \|	Signed-off-by: Jeff Mahoney <jeffm@suse.com>
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Btrfs: fix barrier flushes	Chris Mason	2011-11-20
\| \|	When btrfs is writing the super blocks, it sends barrier flushes to make sure writeback caching drives get all the metadata on disk in the right order. But, we have two bugs in the way these are sent down. When doing full commits (not via the tree log), we are sending the barrier down before the last super when it should be going down before the first. In multi-device setups, we should be waiting for the barriers to complete on all devices before writing any of the supers. Both of these bugs can cause corruptions on power failures. We fix it with some new code to send down empty barriers to all devices before writing the first super.
\| \|	Alexandre Oliva found the multi-device bug. Arne Jansen did the async barrier loop.
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| \|	Reported-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
\| *	Btrfs: fix tree corruption after multi-thread snapshots and inode_cache flush	Liu Bo	2011-11-15
\| \|	The btrfs snapshotting code requires that once a root has been snapshotted, we don't change it during a commit. But there are two cases that lead to tree corruptions:
\| \|	1) multi-thread snapshots can commit several snapshots in a transaction, and this may change the src root when processing the following pending snapshots, which leads to corruption of the former snapshots;
\| \|	2) the free inode cache was changing the roots when it root the cache, which leads to corruptions.
\| \|	This fixes things by making sure we force COW the block after we create a snapshot during committing a transaction, then any changes to the roots will result in COW, and we get all the fs roots and snapshot roots to be consistent.
\| \|	Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
\| \|	Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
* \|	new helper: mount_subtree()	Al Viro	2011-11-16
\| \|	takes vfsmount and relative path, does lookup within that vfsmount (possibly triggering automounts) and returns the result as root of subtree suitable for return by ->mount() (i.e. a reference to dentry and an active reference to its superblock grabbed, superblock locked exclusive). btrfs and nfs switched to it instead of open-coding the sucker.
\| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \|	switch create_mnt_ns() to saner calling conventions, fix double mntput() in nfs	Al Viro	2011-11-16
\| \|	Life is much saner if create_mnt_ns(mnt) drops mnt in case of error... Switch it to such calling conventions, switch callers, fix double mntput() in fs/nfs/super.c one.
\| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \|	btrfs: fix double mntput() in mount_subvol()	Al Viro	2011-11-16
\| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \|	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs	Linus Torvalds	2011-11-11
\|\\|	* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
\| \|	  btrfs: rename the option to nospace_cache
\| \|	  Btrfs: handle bio_add_page failure gracefully in scrub
\| \|	  Btrfs: fix deadlock caused by the race between relocation
\| \|	  Btrfs: only map pages if we know we need them when reading the space cache
\| \|	  Btrfs: fix orphan backref nodes
\| \|	  Btrfs: Abstract similar code for btrfs_block_rsv_add{, _noflush}
\| \|	  Btrfs: fix unreleased path in btrfs_orphan_cleanup()
\| \|	  Btrfs: fix no reserved space for writing out inode cache
\| \|	  Btrfs: fix nocow when deleting the item
\| \|	  Btrfs: tweak the delayed inode reservations again
\| \|	  Btrfs: rework error handling in btrfs_mount()
\| \|	  Btrfs: close devices on all error paths in open_ctree()
\| \|	  Btrfs: avoid null dereference and leaks when bailing from open_ctree()
\| \|	  Btrfs: fix subvol_name leak on error in btrfs_mount()
\| \|	  Btrfs: fix memory leak in btrfs_parse_early_options()
\| \|	  Btrfs: fix our reservations for updating an inode when completing io
\| \|	  Btrfs: fix oops on NULL trans handle in btrfs_truncate
\| \|	  btrfs: fix double-free 'tree_root' in 'btrfs_mount()'
\| *	btrfs: rename the option to nospace_cache	David Sterba	2011-11-11
\| \|	Rename no_space_cache option to nospace_cache to be more consistent with the rest, where the simple prefix 'no' is used to negate an option. The option was introduced during the -rc1 cycle and has not been widely used, so the rename is safe.
\| \|	Signed-off-by: David Sterba <dsterba@suse.cz>
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Btrfs: handle bio_add_page failure gracefully in scrub	Arne Jansen	2011-11-11
\| \|	Currently scrub fails with ENOMEM when bio_add_page fails. Unfortunately dm based targets accept only one page per bio, so scrub always fails on them. This patch just submits the current bio when an error is encountered and starts a new one.
\| \|	Signed-off-by: Arne Jansen <sensille@gmx.net>
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Btrfs: fix deadlock caused by the race between relocation	Miao Xie	2011-11-10
\| \|	We cannot do a flushable reservation for the relocation when we create a snapshot, because it may make the transaction commit task and the flush task wait for each other, and then the deadlock happens.
\| \|	Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Btrfs: only map pages if we know we need them when reading the space cache	Josef Bacik	2011-11-10
\| \|	People have been running into a warning when loading space cache because the page is already mapped when trying to read in a bitmap. The way we read in entries and pages is kind of convoluted, so fix it so that io_ctl_read_entry maps the entries if it needs to, and if it hits the end of the page it simply unmaps the page. That way we can unconditionally unmap the io_ctl before reading in the bitmap and we should stop hitting these warnings. Thanks,
\| \|	Signed-off-by: Josef Bacik <josef@redhat.com>
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Btrfs: fix orphan backref nodes	Miao Xie	2011-11-10
\| \|	If the root node of a fs/file tree is in the block group that is being relocated, but the others are not in the other block groups, then when we create a snapshot for this tree between the point where the relocation tree creation ends and the point where ->create_reloc_tree is set to 0, Btrfs will create some backref nodes that are the lowest nodes of the backrefs cache. But we forget to add them into the ->leaves list of the backref cache and deal with them, and in the end they will trigger BUG_ON().
\| \|	  kernel BUG at fs/btrfs/relocation.c:239!
\| \|	This patch fixes it by adding them into the ->leaves list of the backref cache.
\| \|	Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Btrfs: Abstract similar code for btrfs_block_rsv_add{, _noflush}	Miao Xie	2011-11-10
\| \|	btrfs_block_rsv_add{, _noflush}() have similar code, so abstract that code.
\| \|	Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Btrfs: fix unreleased path in btrfs_orphan_cleanup()	Miao Xie	2011-11-10
\| \|	When we ran a stress test for the space relocation, a deadlock happened. By debugging, we found it was caused by forgetting to unlock the read lock of the extent buffers in btrfs_orphan_cleanup() before we end the transaction handle. The transaction commit task waited for the task that called btrfs_orphan_cleanup() to unlock the extent buffer, but that task waited for the commit task to end the transaction commit, and the deadlock happened. Fix it.
\| \|	Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Btrfs: fix no reserved space for writing out inode cache	Miao Xie	2011-11-10
\| \|	The inode cache forgets to reserve space when writing it out. And when we do some stress test, such as synctest, it will trigger WARN_ON() in use_block_rsv().
\| \|	  WARNING: at fs/btrfs/extent-tree.c:5718 btrfs_alloc_free_block+0xbf/0x281 [btrfs]()
\| \|	  ...
\| \|	  Call Trace:
\| \|	  [<ffffffff8104df86>] warn_slowpath_common+0x80/0x98
\| \|	  [<ffffffff8104dfb3>] warn_slowpath_null+0x15/0x17
\| \|	  [<ffffffffa0369c60>] btrfs_alloc_free_block+0xbf/0x281 [btrfs]
\| \|	  [<ffffffff810cbcb8>] ? __set_page_dirty_nobuffers+0xfe/0x108
\| \|	  [<ffffffffa035c040>] __btrfs_cow_block+0x118/0x3b5 [btrfs]
\| \|	  [<ffffffffa035c7ba>] btrfs_cow_block+0x103/0x14e [btrfs]
\| \|	  [<ffffffffa035e4c4>] btrfs_search_slot+0x249/0x6a4 [btrfs]
\| \|	  [<ffffffffa036d086>] btrfs_lookup_inode+0x2a/0x8a [btrfs]
\| \|	  [<ffffffffa03788b7>] btrfs_update_inode+0xaa/0x141 [btrfs]
\| \|	  [<ffffffffa036d7ec>] btrfs_save_ino_cache+0xea/0x202 [btrfs]
\| \|	  [<ffffffffa03a761e>] ? btrfs_update_reloc_root+0x17e/0x197 [btrfs]
\| \|	  [<ffffffffa0373867>] commit_fs_roots+0xaa/0x158 [btrfs]
\| \|	  [<ffffffffa03746a6>] btrfs_commit_transaction+0x405/0x731 [btrfs]
\| \|	  [<ffffffff810690df>] ? wake_up_bit+0x25/0x25
\| \|	  [<ffffffffa039d652>] ? btrfs_log_dentry_safe+0x43/0x51 [btrfs]
\| \|	  [<ffffffffa0381c5f>] btrfs_sync_file+0x16a/0x198 [btrfs]
\| \|	  [<ffffffff81122806>] ? mntput+0x21/0x23
\| \|	  [<ffffffff8112d150>] vfs_fsync_range+0x18/0x21
\| \|	  [<ffffffff8112d170>] vfs_fsync+0x17/0x19
\| \|	  [<ffffffff8112d316>] do_fsync+0x29/0x3e
\| \|	  [<ffffffff8112d348>] sys_fsync+0xb/0xf
\| \|	  [<ffffffff81468352>] system_call_fastpath+0x16/0x1b
\| \|	Sometimes it also triggers the BUG_ON() in the reservation code of the delayed inode. So we must reserve enough space for the inode cache.
\| \|	Note: If we cannot reserve enough space for the inode cache, we will give up writing it out.
\| \|	Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Btrfs: fix nocow when deleting the item	Miao Xie	2011-11-10
\| \|	btrfs_previous_item() just searches the b+ tree and does not COW the nodes or leaves. If we modify its result, the metadata will be broken. Fix it.
\| \|	Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Merge branch 'mount-fixes' of git://github.com/idryomov/btrfs-unstable into integration	Chris Mason	2011-11-10
\| \|\
\| \| *	Btrfs: rework error handling in btrfs_mount()	Ilya Dryomov	2011-11-09
\| \| \|	Commits 6c41761f and 45ea6095 introduced the possibility of NULL pointer dereference on error paths, also we would leave all devices busy and leak fs_info with all sub-structures on error when trying to mount an already mounted fs to a different directory. Fix this by doing all allocations before trying to open any of the devices, adjust error path for mount-already-mounted-fs case.
\| \| \|	Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
\| \| *	Btrfs: close devices on all error paths in open_ctree()	Ilya Dryomov	2011-11-09
\| \| \|	Fix a bug introduced by 7e662854 where we would leave devices busy on certain error paths in open_ctree(). fs_info is guaranteed to be non-NULL now so it's safe to dereference it on all error paths.
\| \| \|	Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
\| \| *	Btrfs: avoid null dereference and leaks when bailing from open_ctree()	Ilya Dryomov	2011-11-09
\| \| \|	Fix bugs introduced by 6c41761f. Firstly, after failing to allocate any of the tree roots (first 'goto fail' in open_ctree()) we would dereference a NULL fs_info pointer in free_fs_info(). Secondly, after failures from init_srcu_struct(), setup_bdi() and new_inode() we would leak all earlier allocated roots: fs_info fields haven't been initialized yet so free_fs_info() is rendered useless. Fix this by initializing fs_info pointer and fs_info fields before any allocations happen.
\| \| \|	Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
\| \| *	Btrfs: fix subvol_name leak on error in btrfs_mount()	Ilya Dryomov	2011-11-09
\| \| \|	btrfs_parse_early_options() can fail due to error while scanning devices (-o device= option), but still strdup() subvol_name string:
\| \| \|	  mount -o subvol=SUBV,device=BAD_DEVICE <dev> <mnt>
\| \| \|	So free subvol_name string on error.
\| \| \|	Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
\| \| *	Btrfs: fix memory leak in btrfs_parse_early_options()	Ilya Dryomov	2011-11-09
\| \| \|	Don't leak subvol_name string in case multiple subvol= options are given. "The latest option is effective" behavior (consistent with subvolid= and subvolrootid= options) is preserved.
\| \| \|	Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
\| * \|	Btrfs: tweak the delayed inode reservations again	Chris Mason	2011-11-10
\| \|/	Josef sent along an incremental to the inode reservation code to make sure we try and fall back to directly updating the inode item if things go horribly wrong. This reworks that patch slightly, adding a fallback function that will always try to update the inode item directly without going through the delayed_inode code.
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Btrfs: fix our reservations for updating an inode when completing io	Josef Bacik	2011-11-08
\| \|	People have been reporting ENOSPC crashes in finish_ordered_io. This is because we try to steal from the delalloc block rsv to satisfy a reservation to update the inode. The problem with this is we don't explicitly save space for updating the inode when doing delalloc. This is kind of a problem and we've gotten away with this because way back when we just stole from the delalloc reserve without any questions, and this worked out fine because generally speaking the leaf had been modified either by the mtime update when we did the original write or because we just updated the leaf when we inserted the file extent item, only on rare occasions had the leaf not actually been modified, and that was still ok because we'd just use a block or two out of the over-reservation that is delalloc.
\| \|	Then came the delayed inode stuff. This is amazing, except it wants a full reservation for updating the inode since it may do it at some point down the road after we've written the blocks and we have to recow everything again. This worked out because the delayed inode stuff just stole from the global reserve, that is until recently when I changed that because it caused other problems.
\| \|	So here we are, we're doing everything right and being screwed for it. So take an extra reservation for the inode at delalloc reservation time and carry it through the life of the delalloc reservation. If we need it we can steal it in the delayed inode stuff. If we have already stolen it try and do a normal metadata reservation. If that fails try to steal from the delalloc reservation. If _that_ fails we'll get a WARN_ON() so I can start thinking of a better way to solve this and in the meantime we'll steal from the global reserve.
\| \|	With this patch I ran xfstests 13 in a loop for a couple of hours and didn't see any problems.
\| \|	Signed-off-by: Josef Bacik <josef@redhat.com>
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Btrfs: fix oops on NULL trans handle in btrfs_truncate	Chris Mason	2011-11-08
\| \|	If we fail to reserve space in the transaction during truncate, we can error out with a NULL trans handle. The cleanup code needs an extra check to make sure we aren't trying to use the bad handle.
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	btrfs: fix double-free 'tree_root' in 'btrfs_mount()'	slyich@gmail.com	2011-11-07
\| \|	On the error path 'tree_root' is freed in 'free_fs_info()'. No need to free it explicitly. Noticed by SLUB in debug mode:
\| \|	Complete reproducer under usermode linux (discovered on real machine):
\| \|	  bdev=/dev/ubda
\| \|	  btr_root=/btr
\| \|	  /mkfs.btrfs $bdev
\| \|	  mount $bdev $btr_root
\| \|	  mkdir $btr_root/subvols/
\| \|	  cd $btr_root/subvols/
\| \|	  /btrfs su cr foo
\| \|	  /btrfs su cr bar
\| \|	  mount $bdev -osubvol=subvols/foo $btr_root/subvols/bar
\| \|	  umount $btr_root/subvols/bar
\| \|	which gives
\| \|	  device fsid 4d55aa28-45b1-474b-b4ec-da912322195e devid 1 transid 7 /dev/ubda
\| \|	  =============================================================================
\| \|	  BUG kmalloc-2048: Object already free
\| \|	  -----------------------------------------------------------------------------
\| \|	  INFO: Allocated in btrfs_mount+0x389/0x7f0 age=0 cpu=0 pid=277
\| \|	  INFO: Freed in btrfs_mount+0x51c/0x7f0 age=0 cpu=0 pid=277
\| \|	  INFO: Slab 0x0000000062886200 objects=15 used=9 fp=0x0000000070b4d2d0 flags=0x4081
\| \|	  INFO: Object 0x0000000070b4d2d0 @offset=21200 fp=0x0000000070b4a968
\| \|	  ...
\| \|	  Call Trace:
\| \|	  70b31948: [<6008c522>] print_trailer+0xe2/0x130
\| \|	  70b31978: [<6008c5aa>] object_err+0x3a/0x50
\| \|	  70b319a8: [<6008e242>] free_debug_processing+0x142/0x2a0
\| \|	  70b319e0: [<600ebf6f>] btrfs_mount+0x55f/0x7f0
\| \|	  70b319f8: [<6008e5c1>] __slab_free+0x221/0x2d0
\| \|	Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
\| \|	Cc: Arne Jansen <sensille@gmx.net>
\| \|	Cc: Chris Mason <chris.mason@oracle.com>
\| \|	Cc: David Sterba <dsterba@suse.cz>
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
* \|	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs	Linus Torvalds	2011-11-06
\|\\|	* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (114 commits)
\| \|	  Btrfs: check for a null fs root when writing to the backup root log
\| \|	  Btrfs: fix race during transaction joins
\| \|	  Btrfs: fix a potential btrfs_bio leak on scrub fixups
\| \|	  Btrfs: rename btrfs_bio multi -> bbio for consistency
\| \|	  Btrfs: stop leaking btrfs_bios on readahead
\| \|	  Btrfs: stop the readahead threads on failed mount
\| \|	  Btrfs: fix extent_buffer leak in the metadata IO error handling
\| \|	  Btrfs: fix the new inspection ioctls for 32 bit compat
\| \|	  Btrfs: fix delayed insertion reservation
\| \|	  Btrfs: ClearPageError during writepage and clean_tree_block
\| \|	  Btrfs: be smarter about committing the transaction in reserve_metadata_bytes
\| \|	  Btrfs: make a delayed_block_rsv for the delayed item insertion
\| \|	  Btrfs: add a log of past tree roots
\| \|	  btrfs: separate superblock items out of fs_info
\| \|	  Btrfs: use the global reserve when truncating the free space cache inode
\| \|	  Btrfs: release metadata from global reserve if we have to fallback for unlink
\| \|	  Btrfs: make sure to flush queued bios if write_cache_pages waits
\| \|	  Btrfs: fix extent pinning bugs in the tree log
\| \|	  Btrfs: make sure btrfs_remove_free_space doesn't leak EAGAIN
\| \|	  Btrfs: don't wait as long for more batches during SSD log commit
\| \|	  ...
\| *	Btrfs: check for a null fs root when writing to the backup root log	Chris Mason	2011-11-06
\| \|	During log replay, we can commit the transaction before the fs_root pointers are set up, so we have to make sure they are not null before trying to use them.
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Btrfs: fix race during transaction joins	Chris Mason	2011-11-06
\| \|	While we're allocating ram for a new transaction, we drop our spinlock. When we get the lock back, we do check to see if a transaction started while we slept, but we don't check to make sure it isn't blocked because a commit has already started.
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Btrfs: fix a potential btrfs_bio leak on scrub fixups	Ilya Dryomov	2011-11-06
\| \|	In case we were able to map less than we wanted (length < PAGE_SIZE clause is true) btrfs_bio is still allocated and we have to free it.
\| \|	Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Btrfs: rename btrfs_bio multi -> bbio for consistency	Ilya Dryomov	2011-11-06
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Btrfs: stop leaking btrfs_bios on readahead	Ilya Dryomov	2011-11-06
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Btrfs: stop the readahead threads on failed mount	Chris Mason	2011-11-06
\| \|	If we don't stop them, they linger around corrupting memory by using pointers to freed things.
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Btrfs: fix extent_buffer leak in the metadata IO error handling	Chris Mason	2011-11-06
\| \|	The scrub readahead branch brought in a new error handling hook, but it was leaking extent_buffer references.
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Btrfs: fix the new inspection ioctls for 32 bit compat	Chris Mason	2011-11-06
\| \|	The new ioctls to follow backrefs are not clean for 32/64 bit compat. This reworks them for u64s everywhere. They are brand new, so there are no problems with changing the interface now.
\| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| *	Merge git://git.jan-o-sch.net/btrfs-unstable into integration	Chris Mason	2011-11-06
\| \|\	Conflicts:
\| \| \|	  fs/btrfs/Makefile
\| \| \|	  fs/btrfs/extent_io.c
\| \| \|	  fs/btrfs/extent_io.h
\| \| \|	  fs/btrfs/scrub.c
\| \| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| \| *	btrfs: integrating raid-repair and scrub-fixup-nodatasum	Jan Schmidt	2011-09-29
\| \| \|	This ties nodatasum fixup in scrub together with raid repair patches. While both series are working fine alone, scrub will report uncorrectable errors if they occur in a nodatasum extent and the page is in the page cache. Previously, we would have triggered readpage to find good data and do the repair. However, readpage wouldn't read anything in the case where the page is up to date in the cache. So, we simply take that good data we have and call repair_io_failure directly (unless the page in the cache is dirty).
\| \| \|	Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
\| \| *	btrfs: Moved repair code from inode.c to extent_io.c	Jan Schmidt	2011-09-29
\| \| \|	The raid-retry code in inode.c can be generalized so that it works for metadata as well. Thus, this patch moves it to extent_io.c and makes the raid-retry code a raid-repair code.
\| \| \|	Repair works this way: Whenever a read error occurs and we have more mirrors to try, note the failed mirror, and retry another. If we find a good one, check if we did note a failure earlier and if so, do not allow the read to complete until after the bad sector was written with the good data we just fetched. As we have the extent locked while reading, no one can change the data in between.
\| \| \|	Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
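The repair flow described in this commit message can be sketched roughly as follows. This is a Python model, not the kernel implementation: `read_with_repair`, the list-of-mirrors representation, and the `is_valid` callback are all hypothetical, and the sketch rewrites every mirror that failed before the good copy was found, which may differ from what the patch does.

```python
def read_with_repair(mirrors, is_valid):
    """Try each mirror in turn; on success, rewrite earlier failed mirrors.

    mirrors  -- mutable list of data copies, one per mirror (assumed model)
    is_valid -- callable returning True when a copy passes verification
    Returns (good_data, list_of_repaired_mirror_indexes).
    """
    failed = []
    for index, data in enumerate(mirrors):
        if is_valid(data):
            # The read only "completes" after the bad copies are rewritten
            # with the good data that was just fetched.
            for bad_index in failed:
                mirrors[bad_index] = data
            return data, failed
        # Note the failed mirror and retry with the next one.
        failed.append(index)
    raise IOError("no mirror returned valid data")
```

In this model a caller gets back both the good data and the indexes that were repaired, so the write-back of the bad copies always happens before the result is returned.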
\| \| *	btrfs: Put mirror_num in bi_bdev	Jan Schmidt	2011-09-29
\| \| \|	The error correction code wants to make sure that only the bad mirror is rewritten. Thus, we need to know which mirror is the bad one. I did not find a more appropriate field than bi_bdev. But I think using this is fine, because it is modified by the block layer, anyway, and should not be read after the bio returned.
\| \| \|	Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
\| \| *	btrfs: Do not use bio->bi_bdev after submission	Jan Schmidt	2011-09-29
\| \| \|	The block layer modifies bio->bi_bdev and bio->bi_sector while working on the bio, they do _not_ come back unmodified in the completion callback. To call add_page, we need at least some bi_bdev set, which is why the code was working, previously. With this patch, we use the latest_bdev from fsinfo instead of the leftover in the bio. This gives us the possibility to use the bi_bdev field for another purpose.
\| \| \|	Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>