aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* Btrfs: fix deadlock in wait_for_more_refsArne Jansen2012-08-28
| | | | | | | | | | | | | | | | Commit a168650c introduced a waiting mechanism to prevent busy waiting in btrfs_run_delayed_refs. This can deadlock with btrfs_run_ordered_operations, where a tree_mod_seq is held while waiting for the io to complete, while the end_io calls btrfs_run_delayed_refs. This whole mechanism is unnecessary. If not enough runnable refs are available to satisfy count, just return as count is more like a guideline than a strict requirement. In case we have to run all refs, commit transaction makes sure that no other threads are working in the transaction anymore, so we just assert here that no refs are blocked. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* btrfs: fix second lock in btrfs_delete_delayed_items()Fengguang Wu2012-08-28
| | | | | | | | Fix a real bug caught by coccinelle. fs/btrfs/delayed-inode.c:1013:1-11: second lock on line 1013 Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
* Btrfs: don't allocate a seperate csums array for direct readsJosef Bacik2012-08-28
| | | | | | | | | | | | We've been allocating a big array for csums instead of storing them in the io_tree like we do for buffered reads because previously we were locking the entire range, so we didn't have an extent state for each sector of the range. But now that we do the range locking as we map the buffers we can limit the mapping lenght to sectorsize and use the private part of the io_tree for our csums. This allows us to avoid an extra memory allocation for direct reads which could incur latency. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* Btrfs: do not strdup non existent stringsJosef Bacik2012-08-28
| | | | | | | | | When we close devices we add back empty devices for some reason that escapes me. In the case of a missing dev we don't allocate an rcu_string for it's name, so check to see if the device has a name and if it doesn't don't bother strdup()'ing it. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* Btrfs: do not use missing devices when showing devnameJosef Bacik2012-08-28
| | | | | | | | | | | | | | | If you do the following mkfs.btrfs /dev/sdb /dev/sdc rmmod btrfs dd if=/dev/zero of=/dev/sdb bs=1M count=1 mount -o degraded /dev/sdc /mnt/btrfs-test the box will panic trying to deref the name for the missing dev since it is the lower numbered devid. So fix show_devname to not use missing devices. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* Btrfs: fix that error value is changed by mistakeStefan Behrens2012-08-28
| | | | | | | | | In iterate_inodes_from_logical() the error result from extent_from_logical() is patched by mistake. Typically ENOENT is patched to EINVAL because (-ENOENT & BTRFS_EXTENT_FLAG_TREE_BLOCK) evaluates to true. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
* Btrfs: lock extents as we map them in DIOJosef Bacik2012-08-28
| | | | | | | | | | | | | | | | | | | | | | A deadlock in xfstests 113 was uncovered by commit d187663ef24cd3d033f0cbf2867e70b36a3a90b8 This is because we would not return EIOCBQUEUED for short AIO reads, instead we'd wait for the DIO to complete and then return the amount of data we transferred, which would allow our stuff to unlock the remaning amount. But with this change this no longer happens, so if we have a short AIO read (for example if we try to read past EOF), we could leave the section from EOF to the end of where we tried to read locked. Fixing this is tricky since there is no clear way to know exactly how much data DIO truly submitted for IO, so to make this less hard on ourselves and less combersome we need to lock the extents as we try to map them, and then we unlock any areas we didn't actually map. This makes us completely safe from deadlocks and reliance on a particular behavior of the DIO code. This also lays the groundwork for allowing us to use the normal csum storage method for reads which means we can remove an allocation. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* Btrfs: fix some endian bugs handling the root timesDan Carpenter2012-08-28
| | | | | | | "trans->transid" is cpu endian but we want to store the data as little endian. "item->ctime.nsec" is only 32 bits, not 64. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
* Btrfs: unlock on error in btrfs_delalloc_reserve_metadata()Dan Carpenter2012-08-28
| | | | | | We should release this mutex before returning the error code. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
* Btrfs: checking for NULL instead of IS_ERRDan Carpenter2012-08-28
| | | | | | add_qgroup_rb() never returns NULL, only error pointers. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
* Btrfs: fix some error codes in btrfs_qgroup_inherit()Dan Carpenter2012-08-28
| | | | | | | These are returning zero when it should be returning a negative error code. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
* Btrfs: fix a misplaced address operator in a conditionStefan Behrens2012-08-28
| | | | | | This should obviously not be "if (&flag)" but "if (flag)". Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
* Btrfs: uninit variable fixes in send/receiveChris Mason2012-07-25
| | | | Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Merge branch 'send-v2' of git://github.com/ablock84/linux-btrfs into for-linusChris Mason2012-07-25
|\ | | | | | | | | | | | | | | | | | | | | | | | | This is the kernel portion of btrfs send/receive Conflicts: fs/btrfs/Makefile fs/btrfs/backref.h fs/btrfs/ctree.c fs/btrfs/ioctl.c fs/btrfs/ioctl.h Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receiveAlexander Block2012-07-25
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces the BTRFS_IOC_SEND ioctl that is required for send. It allows btrfs-progs to implement full and incremental sends. Patches for btrfs-progs will follow. Signed-off-by: Alexander Block <ablock84@googlemail.com> Reviewed-by: David Sterba <dave@jikos.cz> Reviewed-by: Arne Jansen <sensille@gmx.net> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
| * Btrfs: add btrfs_compare_trees functionAlexander Block2012-07-25
| | | | | | | | | | | | | | | | | | | | | | | | This function is used to find the differences between two trees. The tree compare skips whole subtrees if it detects shared tree blocks and thus is pretty fast. Signed-off-by: Alexander Block <ablock84@googlemail.com> Reviewed-by: David Sterba <dave@jikos.cz> Reviewed-by: Arne Jansen <sensille@gmx.net> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
| * Btrfs: introduce subvol uuids and timesAlexander Block2012-07-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces uuids for subvolumes. Each subvolume has it's own uuid. In case it was snapshotted, it also contains parent_uuid. In case it was received, it also contains received_uuid. It also introduces subvolume ctime/otime/stime/rtime. The first two are comparable to the times found in inodes. otime is the origin/creation time and ctime is the change time. stime/rtime are only valid on received subvolumes. stime is the time of the subvolume when it was sent. rtime is the time of the subvolume when it was received. Additionally to the times, we have a transid for each time. They are updated at the same place as the times. btrfs receive uses stransid and rtransid to find out if a received subvolume changed in the meantime. If an older kernel mounts a filesystem with the extented fields, all fields become invalid. The next mount with a new kernel will detect this and reset the fields. Signed-off-by: Alexander Block <ablock84@googlemail.com> Reviewed-by: David Sterba <dave@jikos.cz> Reviewed-by: Arne Jansen <sensille@gmx.net> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
| * Btrfs: make iref_to_path non staticAlexander Block2012-07-25
| | | | | | | | | | | | | | Make iref_to_path non static (needed in send) and rename it to btrfs_iref_to_path Signed-off-by: Alexander Block <ablock84@googlemail.com>
| * Btrfs: add helper for tree enumerationArne Jansen2012-07-25
| | | | | | | | | | | | | | | | | | Often no exact match is wanted but just the next lower or higher item. There's a lot of duplicated code throughout btrfs to deal with the corner cases. This patch adds a helper function that can facilitate searching. Signed-off-by: Arne Jansen <sensille@gmx.net>
| * btrfs: allow cross-subvolume file cloneDavid Sterba2012-07-25
| | | | | | | | | | | | | | | | | | Lift the EXDEV condition and allow different root trees for files being cloned, then pass source inode's root when searching for extents. Cloning is not allowed to cross vfsmounts, ie. when two subvolumes from one filesystem are mounted separately. Signed-off-by: David Sterba <dsterba@suse.cz>
* | Btrfs: add a barrier before a waitqueue_active checkChris Mason2012-07-25
| | | | | | | | | | | | | | We were missing wakeups on the delayed ref waitqueue due to races on waitqueue_active. Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* | Btrfs: call the ordered free operation without any locks heldChris Mason2012-07-25
| | | | | | | | | | | | | | | | | | | | | | | | | | Each ordered operation has a free callback, and this was called with the worker spinlock held. Josef made the free callback also call iput, which we can't do with the spinlock. This drops the spinlock for the free operation and grabs it again before moving through the rest of the list. We'll circle back around to this and find a cleaner way that doesn't bounce the lock around so much. Signed-off-by: Chris Mason <chris.mason@fusionio.com> cc: stable@kernel.org
* | Btrfs: Check INCOMPAT flags on remount and add helper functionMitch Harder2012-07-25
| | | | | | | | | | | | | | | | | | | | | | | | | | In support of the recently added capability to remount with lzo compression, provide a helper function to check the compression INCOMPAT flags when remounting with lzo compression, and set the flags if necessary. Also, implement the new helper function when defragmenting with explicit lzo compression and when setting the default subvolume. Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* | Merge branch 'qgroup' of git://git.jan-o-sch.net/btrfs-unstable into for-linusChris Mason2012-07-25
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: fs/btrfs/ioctl.c fs/btrfs/ioctl.h fs/btrfs/transaction.c fs/btrfs/transaction.h Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * | Btrfs: add qgroup inheritanceArne Jansen2012-07-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When creating a subvolume or snapshot, it is necessary to initialize the qgroup account with a copy of some other (tracking) qgroup. This patch adds parameters to the ioctls to pass the information from which qgroup to inherit. Signed-off-by: Arne Jansen <sensille@gmx.net>
| * | Btrfs: add qgroup ioctlsArne Jansen2012-07-12
| | | | | | | | | | | | | | | | | | | | | Ioctls to control the qgroup feature like adding and removing qgroups and assigning qgroups. Signed-off-by: Arne Jansen <sensille@gmx.net>
| * | Btrfs: hooks to reserve qgroup spaceArne Jansen2012-07-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Like block reserves, reserve a small piece of space on each transaction start and for delalloc. These are the hooks that can actually return EDQUOT to the user. The amount of space reserved is tracked in the transaction handle. Signed-off-by: Arne Jansen <sensille@gmx.net>
| * | Btrfs: hooks for qgroup to record delayed refsJan Schmidt2012-07-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | Hooks into qgroup code to record refs and into transaction commit. This is the main entry point for qgroup. Basically every change in extent backrefs got accounted to the appropriate qgroups. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * | Btrfs: quota tree support and startupArne Jansen2012-07-12
| | | | | | | | | | | | | | | | | | | | | | | | Init the quota tree along with the others on open_ctree and close_ctree. Add the quota tree to the list of well known trees in btrfs_read_fs_root_no_name. Signed-off-by: Arne Jansen <sensille@gmx.net>
| * | Btrfs: call the qgroup accounting functionsJan Schmidt2012-07-12
| | | | | | | | | | | | Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * | Btrfs: qgroup implementation and prototypesArne Jansen2012-07-12
| | | | | | | | | | | | | | | Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * | Btrfs: Test code to change the order of delayed-ref processingArne Jansen2012-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Normally delayed refs get processed in ascending bytenr order. This correlates in most cases to the order added. To expose dependencies on this order, we start to process the tree in the middle instead of the beginning. This code is only effective when SCRAMBLE_DELAYED_REFS is defined. Signed-off-by: Arne Jansen <sensille@gmx.net>
| * | Btrfs: qgroup state and initializationArne Jansen2012-07-10
| | | | | | | | | | | | | | | | | | Add state to fs_info. Signed-off-by: Arne Jansen <sensille@gmx.net>
| * | Btrfs: added helper to create new treesArne Jansen2012-07-10
| | | | | | | | | | | | | | | | | | | | | This creates a brand new tree. Will be used to create the quota tree. Signed-off-by: Arne Jansen <sensille@gmx.net>
| * | Btrfs: check the root passed to btrfs_end_transactionArne Jansen2012-07-10
| | | | | | | | | | | | | | | | | | | | | | | | This patch only add a consistancy check to validate that the same root is passed to start_transaction and end_transaction. Subvolume quota depends on this. Signed-off-by: Arne Jansen <sensille@gmx.net>
| * | Btrfs: add helper for tree enumerationArne Jansen2012-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | Often no exact match is wanted but just the next lower or higher item. There's a lot of duplicated code throughout btrfs to deal with the corner cases. This patch adds a helper function that can facilitate searching. Signed-off-by: Arne Jansen <sensille@gmx.net>
| * | Btrfs: qgroup on-disk formatArne Jansen2012-07-10
| | | | | | | | | | | | | | | | | | | | | Not all features are in use by the current version and thus may change in the future. Signed-off-by: Arne Jansen <sensille@gmx.net>
| * | Btrfs: join tree mod log code with the code holding back delayed refsJan Schmidt2012-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've got two mechanisms both required for reliable backref resolving (tree mod log and holding back delayed refs). You cannot make use of one without the other. So instead of requiring the user of this mechanism to setup both correctly, we join them into a single interface. Additionally, we stop inserting non-blockers into fs_info->tree_mod_seq_list as we did before, which was of no value. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * | Btrfs: fix buffer leak in btrfs_next_old_leafJan Schmidt2012-07-10
| | | | | | | | | | | | | | | | | | | | | When calling btrfs_next_old_leaf, we were leaking an extent buffer in the rare case of using the deadlock avoidance code needed for the tree mod log. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
* | | Btrfs: improve multi-thread buffer readLiu Bo2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While testing with my buffer read fio jobs[1], I find that btrfs does not perform well enough. Here is a scenario in fio jobs: We have 4 threads, "t1 t2 t3 t4", starting to buffer read a same file, and all of them will race on add_to_page_cache_lru(), and if one thread successfully puts its page into the page cache, it takes the responsibility to read the page's data. And what's more, reading a page needs a period of time to finish, in which other threads can slide in and process rest pages: t1 t2 t3 t4 add Page1 read Page1 add Page2 | read Page2 add Page3 | | read Page3 add Page4 | | | read Page4 -----|------------|-----------|-----------|-------- v v v v bio bio bio bio Now we have four bios, each of which holds only one page since we need to maintain consecutive pages in bio. Thus, we can end up with far more bios than we need. Here we're going to a) delay the real read-page section and b) try to put more pages into page cache. With that said, we can make each bio hold more pages and reduce the number of bios we need. Here is some numbers taken from fio results: w/o patch w patch ------------- -------- --------------- READ: 745MB/s +25% 934MB/s [1]: [global] group_reporting thread numjobs=4 bs=32k rw=read ioengine=sync directory=/mnt/btrfs/ [READ] filename=foobar size=2000M invalidate=1 Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* | | Btrfs: make btrfs's allocation smoothly with preallocationLiu Bo2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For backref walking, we've introduce delayed ref's sequence. However, it changes our preallocation behavior. The story is that when we preallocate an extent and then mark it written piece by piece, the ideal case should be that we don't need to COW the extent, which is why we use 'preallocate'. But we may not make use of preallocation, since when we check for cross refs on the extent, we may have two ref entries which have the same content except the sequence value, and we recognize them as cross refs and do COW to allocate another extent. So we end up with several pieces of space instead of an whole extent. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* | | Btrfs: lock the transition from dirty to writeback for an ebJosef Bacik2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a small window where an eb can have no IO bits set on it, which could potentially result in extent_buffer_under_io() returning false when we want it to return true, which could result in not fun things happening. So in order to protect this case we need to hold the refs_lock when we make this transition to make sure we get reliable results out of extent_buffer_udner_io(). Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* | | Btrfs: fix potential race in extent buffer freeingJosef Bacik2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This sounds sort of impossible but it is the only thing I can think of and at the very least it is theoretically possible so here it goes. If we are in try_release_extent_buffer we will check that the ref count on the extent buffer is 1 and not under IO, and then go down and clear the tree ref. If between this check and clearing the tree ref somebody else comes in and grabs a ref on the eb and the marks it dirty before try_release_extent_buffer() does it's tree ref clear we can end up with a dirty eb that will be freed while it is still dirty which will result in a panic. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* | | Btrfs: don't return true in releasepage unless we actually freed the ebJosef Bacik2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I noticed while looking at an extent_buffer race that we will unconditionally return 1 if we get down to release_extent_buffer after clearing the tree ref. However we can easily race in here and get a ref on the eb and not actually free the eb. So make release_extent_buffer return 1 if it free'd the eb and 0 if not so we can be a little kinder to the vm. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* | | Btrfs: suppress printk() if all device I/O stats are zeroStefan Behrens2012-07-23
| | | | | | | | | | | | | | | | | | | | | Code is added to suppress the I/O stats printing at mount time if all statistic values are zero. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
* | | Btrfs: remove unwanted printk() for btrfs device I/O statsStefan Behrens2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | People complained about the annoying kernel log message "btrfs: no dev_stats entry found ... (OK on first mount after mkfs)" everytime a filesystem is mounted for the first time after running mkfs. Since the distribution of the btrfs-progs is not synchronized to the kernel version, mkfs like it is now will be used also in the future. Then this message is not useful to find errors, it is just annoying. This commit removes the printk(). Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
* | | Btrfs: rewrite BTRFS_SETGET_FUNCSLi Zefan2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BTRFS_SETGET_FUNCS macro is used to generate btrfs_set_foo() and btrfs_foo() functions, which read and write specific fields in the extent buffer. The total number of set/get functions is ~200, but in fact we only need 8 functions: 2 for u8 field, 2 for u16, 2 for u32 and 2 for u64. It results in redunction of ~37K bytes. text data bss dec hex filename 629661 12489 216 642366 9cd3e fs/btrfs/btrfs.o.orig 592637 12489 216 605342 93c9e fs/btrfs/btrfs.o Signed-off-by: Li Zefan <lizefan@huawei.com>
* | | Btrfs: zero unused bytes in inode itemLi Zefan2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The otime field is not zeroed, so users will see random otime in an old filesystem with a new kernel which has otime support in the future. The reserved bytes are also not zeroed, and we'll have compatibility issue if we make use of those bytes. Signed-off-by: Li Zefan <lizefan@huawei.com>
* | | Btrfs: kill free_space pointer from inode structureLi Zefan2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | Inodes always allocate free space with BTRFS_BLOCK_GROUP_DATA type, which means every inode has the same BTRFS_I(inode)->free_space pointer. This shrinks struct btrfs_inode by 4 bytes (or 8 bytes on 64 bits). Signed-off-by: Li Zefan <lizefan@huawei.com>
* | | btrfs read error corrected message floods the console during recoveryAnand Jain2012-07-23
| | | | | | | | | | | | | | | | | | Changing printk_in_rcu to printk_ratelimited_in_rcu will suffice Signed-off-by: Josef Bacik <jbacik@fusionio.com>