| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
| |
We cannot just return error before freeing ordered extent and releasing reserved
space when we fail to start a transacion.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This bug is introduced by commit 3b8bde746f6f9bd36a9f05f5f3b6e334318176a9
(Btrfs: lock extents as we map them in DIO).
In dio write, we should unlock the section which we didn't do IO on in case that
we fall back to buffered write. But we need to not only unlock the section
but also cleanup reserved space for the section.
This bug was found while running xfstests 133, with this 133 no longer complains.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can deadlock with freeze right now because we unconditionally start a
transaction in our ->sync_fs() call. To fix this just check and see if we
have a running transaction to commit. This saves us from the deadlock
because at this point we'll have the umount sem for the sb so we're safe
from freezes coming in after we've done our check. With this patch the
freeze xfstests no longer deadlocks. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 442a4f6308e694e0fa6025708bd5e4e424bbf51c added btrfs device
statistic counters for detected IO and checksum errors to Linux 3.5.
The statistic part that counts checksum errors in
end_bio_extent_readpage() can cause a BUG() in a subfunction:
"kernel BUG at fs/btrfs/volumes.c:3762!"
That part is reverted with the current patch.
However, the counting of checksum errors in the scrub context remains
active, and the counting of detected IO errors (read, write or flush
errors) in all contexts remains active.
Cc: stable <stable@vger.kernel.org> # 3.5
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With commit acce952b0, btrfs was changed to flag the filesystem with
BTRFS_SUPER_FLAG_ERROR and switch to read-only mode after a fatal
error happened like a write I/O errors of all mirrors.
In such situations, on unmount, the superblock is written in
btrfs_error_commit_super(). This is done with the intention to be able
to evaluate the error flag on the next mount. A warning is printed
in this case during the next mount and the log tree is ignored.
The issue is that it is possible that the superblock points to a root
that was not written (due to write I/O errors).
The result is that the filesystem cannot be mounted. btrfsck also does
not start and all the other btrfs-progs tools fail to start as well.
However, mount -o recovery is working well and does the right things
to recover the filesystem (i.e., don't use the log root, clear the
free space cache and use the next mountable root that is stored in the
root backup array).
This patch removes the writing of the superblock when
BTRFS_SUPER_FLAG_ERROR is set, and removes the handling of the error
flag in the mount function.
These lines can be used to reproduce the issue (using /dev/sdm):
SCRATCH_DEV=/dev/sdm
SCRATCH_MNT=/mnt
echo 0 25165824 linear $SCRATCH_DEV 0 | dmsetup create foo
ls -alLF /dev/mapper/foo
mkfs.btrfs /dev/mapper/foo
mount /dev/mapper/foo $SCRATCH_MNT
echo bar > $SCRATCH_MNT/foo
sync
echo 0 25165824 error | dmsetup reload foo
dmsetup resume foo
ls -alF $SCRATCH_MNT
touch $SCRATCH_MNT/1
ls -alF $SCRATCH_MNT
sleep 35
echo 0 25165824 linear $SCRATCH_DEV 0 | dmsetup reload foo
dmsetup resume foo
sleep 1
umount $SCRATCH_MNT
btrfsck /dev/mapper/foo
dmsetup remove foo
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Daniel Blueman reported a bug with fio+balance on a ramdisk setup.
Basically what happens is the balance relocates a tree block which will drop
the implicit refs for all of its children and adds a full backref. Once the
block is relocated we have to add the implicit refs back, so when we cow the
block again we add the implicit refs for its children back. The problem
comes when the original drop ref doesn't get run before we add the implicit
refs back. The delayed ref stuff will specifically prefer ADD operations
over DROP to keep us from freeing up an extent that will have references to
it, so we try to add the implicit ref before it is actually removed and we
panic. This worked fine before because the add would have just canceled the
drop out and we would have been fine. But the backref walking work needs to
be able to freeze the delayed ref stuff in time so we have this ever
increasing sequence number that gets attached to all new delayed ref updates
which makes us not merge refs and we run into this issue.
So to fix this we need to merge delayed refs. So everytime we run a
clustered ref we need to try and merge all of its delayed refs. The backref
walking stuff locks the delayed ref head before processing, so if we have it
locked we are safe to merge any refs inside of the sequence number. If
there is no sequence number we can merge all refs. Doing this not only
fixes our bug but keeps the delayed ref code from adding and removing
useless refs and batching together multiple refs into one search instead of
one search per delayed ref, which will really help our commit times. I ran
this with Daniels test and 276 and I haven't seen any problems. Thanks,
Reported-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Subvol delete is a special kind of awful where we use the global reserve to
cover the ENOSPC requirements. The problem is once we're done removing
everything we do a btrfs_update_inode(), which by default will try to do the
delayed update stuff which will use it's own reserve. There will be no
space in this reserve and we'll return ENOSPC. So instead use
btrfs_update_inode_fallback() which will just fallback to updating the inode
item in the case of enospc. This is fine because the global reserve covers
the space requirements for this. With this patch I can now delete a subvol
on a problem image Dave Sterba sent me. Thanks,
Reported-by: David Sterba <dave@jikos.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
|
|
|
|
|
|
|
| |
When we created a new snapshot, the mtime and ctime of its parent directory
were not updated. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With commit
commit d1270cd91f308c9d22b2804720c36ccd32dbc35e
Author: Arne Jansen <sensille@gmx.net>
Date: Tue Sep 13 15:16:43 2011 +0200
Btrfs: put back delayed refs that are too new
I added a window where the delayed_ref's head->ref_mod code can diverge
from the sum of the remaining refs, because we release the head->mutex
in the middle. This leads to btrfs_lookup_extent_info returning wrong
numbers. This patch fixes this by adjusting the head's ref_mod with each
delayed ref we run.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
|
|
|
|
|
|
|
| |
When we split a leaf, we may end up inserting a new root on top of that
leaf. The reflog code was incorrectly assuming the old root was always
a node. This makes sure we skip over leaves.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Arne was complaining about the space cache having mismatching generation
numbers when debugging a deadlock. This is because we can run out of space
in our preallocated range for our space cache if you have a pretty
fragmented amount of space in your pinned space. So just increase the
amount of space we preallocate for space cache so we can be sure to have
enough space. This will only really affect data ranges since their the only
chunks that end up larger than 256MB. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
|
|
|
|
|
|
|
|
|
| |
We need a barrir before calling waitqueue_active otherwise we will miss
wakeups. So in places that do atomic_dec(); then atomic_read() use
atomic_dec_return() which imply a memory barrier (see memory-barriers.txt)
and then add an explicit memory barrier everywhere else that need them.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit a168650c introduced a waiting mechanism to prevent busy waiting in
btrfs_run_delayed_refs. This can deadlock with btrfs_run_ordered_operations,
where a tree_mod_seq is held while waiting for the io to complete, while
the end_io calls btrfs_run_delayed_refs.
This whole mechanism is unnecessary. If not enough runnable refs are
available to satisfy count, just return as count is more like a guideline
than a strict requirement.
In case we have to run all refs, commit transaction makes sure that no
other threads are working in the transaction anymore, so we just assert
here that no refs are blocked.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
|
|
|
|
|
|
|
| |
Fix a real bug caught by coccinelle.
fs/btrfs/delayed-inode.c:1013:1-11: second lock on line 1013
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've been allocating a big array for csums instead of storing them in the
io_tree like we do for buffered reads because previously we were locking the
entire range, so we didn't have an extent state for each sector of the
range. But now that we do the range locking as we map the buffers we can
limit the mapping lenght to sectorsize and use the private part of the
io_tree for our csums. This allows us to avoid an extra memory allocation
for direct reads which could incur latency. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
|
|
|
|
|
|
|
| |
When we close devices we add back empty devices for some reason that escapes
me. In the case of a missing dev we don't allocate an rcu_string for it's
name, so check to see if the device has a name and if it doesn't don't
bother strdup()'ing it. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If you do the following
mkfs.btrfs /dev/sdb /dev/sdc
rmmod btrfs
dd if=/dev/zero of=/dev/sdb bs=1M count=1
mount -o degraded /dev/sdc /mnt/btrfs-test
the box will panic trying to deref the name for the missing dev since it is
the lower numbered devid. So fix show_devname to not use missing devices.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
|
|
|
|
|
|
|
| |
In iterate_inodes_from_logical() the error result from
extent_from_logical() is patched by mistake. Typically ENOENT is
patched to EINVAL because (-ENOENT & BTRFS_EXTENT_FLAG_TREE_BLOCK)
evaluates to true.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A deadlock in xfstests 113 was uncovered by commit
d187663ef24cd3d033f0cbf2867e70b36a3a90b8
This is because we would not return EIOCBQUEUED for short AIO reads, instead
we'd wait for the DIO to complete and then return the amount of data we
transferred, which would allow our stuff to unlock the remaning amount. But
with this change this no longer happens, so if we have a short AIO read (for
example if we try to read past EOF), we could leave the section from EOF to
the end of where we tried to read locked. Fixing this is tricky since there
is no clear way to know exactly how much data DIO truly submitted for IO, so
to make this less hard on ourselves and less combersome we need to lock the
extents as we try to map them, and then we unlock any areas we didn't
actually map. This makes us completely safe from deadlocks and reliance on
a particular behavior of the DIO code. This also lays the groundwork for
allowing us to use the normal csum storage method for reads which means we
can remove an allocation. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
|
|
|
|
|
| |
"trans->transid" is cpu endian but we want to store the data as little
endian. "item->ctime.nsec" is only 32 bits, not 64.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
|
|
|
|
|
|
| |
We should release this mutex before returning the error code.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
|
|
|
|
|
|
| |
add_qgroup_rb() never returns NULL, only error pointers.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
|
|
|
|
|
|
|
| |
These are returning zero when it should be returning a negative error
code.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
|
|
|
|
|
|
| |
This should obviously not be "if (&flag)" but "if (flag)".
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
|
|
|
|
| |
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This is the kernel portion of btrfs send/receive
Conflicts:
fs/btrfs/Makefile
fs/btrfs/backref.h
fs/btrfs/ctree.c
fs/btrfs/ioctl.c
fs/btrfs/ioctl.h
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch introduces the BTRFS_IOC_SEND ioctl that is
required for send. It allows btrfs-progs to implement
full and incremental sends. Patches for btrfs-progs will
follow.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This function is used to find the differences between
two trees. The tree compare skips whole subtrees if it
detects shared tree blocks and thus is pretty fast.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch introduces uuids for subvolumes. Each
subvolume has it's own uuid. In case it was snapshotted,
it also contains parent_uuid. In case it was received,
it also contains received_uuid.
It also introduces subvolume ctime/otime/stime/rtime. The
first two are comparable to the times found in inodes. otime
is the origin/creation time and ctime is the change time.
stime/rtime are only valid on received subvolumes.
stime is the time of the subvolume when it was
sent. rtime is the time of the subvolume when it was
received.
Additionally to the times, we have a transid for each
time. They are updated at the same place as the times.
btrfs receive uses stransid and rtransid to find out
if a received subvolume changed in the meantime.
If an older kernel mounts a filesystem with the
extented fields, all fields become invalid. The next
mount with a new kernel will detect this and reset the
fields.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
|
| |
| |
| |
| |
| |
| |
| | |
Make iref_to_path non static (needed in send) and rename
it to btrfs_iref_to_path
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Often no exact match is wanted but just the next lower or
higher item. There's a lot of duplicated code throughout
btrfs to deal with the corner cases. This patch adds a
helper function that can facilitate searching.
Signed-off-by: Arne Jansen <sensille@gmx.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Lift the EXDEV condition and allow different root trees for files being
cloned, then pass source inode's root when searching for extents.
Cloning is not allowed to cross vfsmounts, ie. when two subvolumes from
one filesystem are mounted separately.
Signed-off-by: David Sterba <dsterba@suse.cz>
|
| |
| |
| |
| |
| |
| |
| | |
We were missing wakeups on the delayed ref waitqueue due
to races on waitqueue_active.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Each ordered operation has a free callback, and this was called with the
worker spinlock held. Josef made the free callback also call iput,
which we can't do with the spinlock.
This drops the spinlock for the free operation and grabs it again before
moving through the rest of the list. We'll circle back around to this
and find a cleaner way that doesn't bounce the lock around so much.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
cc: stable@kernel.org
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In support of the recently added capability to remount with lzo
compression, provide a helper function to check the compression
INCOMPAT flags when remounting with lzo compression, and set
the flags if necessary.
Also, implement the new helper function when defragmenting with
explicit lzo compression and when setting the default subvolume.
Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Conflicts:
fs/btrfs/ioctl.c
fs/btrfs/ioctl.h
fs/btrfs/transaction.c
fs/btrfs/transaction.h
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When creating a subvolume or snapshot, it is necessary
to initialize the qgroup account with a copy of some
other (tracking) qgroup. This patch adds parameters
to the ioctls to pass the information from which qgroup
to inherit.
Signed-off-by: Arne Jansen <sensille@gmx.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Ioctls to control the qgroup feature like adding and
removing qgroups and assigning qgroups.
Signed-off-by: Arne Jansen <sensille@gmx.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Like block reserves, reserve a small piece of space on each
transaction start and for delalloc. These are the hooks that
can actually return EDQUOT to the user.
The amount of space reserved is tracked in the transaction
handle.
Signed-off-by: Arne Jansen <sensille@gmx.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Hooks into qgroup code to record refs and into transaction commit.
This is the main entry point for qgroup. Basically every change in
extent backrefs got accounted to the appropriate qgroups.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Init the quota tree along with the others on open_ctree
and close_ctree. Add the quota tree to the list of well
known trees in btrfs_read_fs_root_no_name.
Signed-off-by: Arne Jansen <sensille@gmx.net>
|
| | |
| | |
| | |
| | | |
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Normally delayed refs get processed in ascending bytenr order. This
correlates in most cases to the order added. To expose dependencies
on this order, we start to process the tree in the middle instead of
the beginning.
This code is only effective when SCRAMBLE_DELAYED_REFS is defined.
Signed-off-by: Arne Jansen <sensille@gmx.net>
|
| | |
| | |
| | |
| | |
| | |
| | | |
Add state to fs_info.
Signed-off-by: Arne Jansen <sensille@gmx.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This creates a brand new tree. Will be used to create
the quota tree.
Signed-off-by: Arne Jansen <sensille@gmx.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch only add a consistancy check to validate that the
same root is passed to start_transaction and end_transaction.
Subvolume quota depends on this.
Signed-off-by: Arne Jansen <sensille@gmx.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Often no exact match is wanted but just the next lower or
higher item. There's a lot of duplicated code throughout
btrfs to deal with the corner cases. This patch adds a
helper function that can facilitate searching.
Signed-off-by: Arne Jansen <sensille@gmx.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Not all features are in use by the current version
and thus may change in the future.
Signed-off-by: Arne Jansen <sensille@gmx.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We've got two mechanisms both required for reliable backref resolving (tree
mod log and holding back delayed refs). You cannot make use of one without
the other. So instead of requiring the user of this mechanism to setup both
correctly, we join them into a single interface.
Additionally, we stop inserting non-blockers into fs_info->tree_mod_seq_list
as we did before, which was of no value.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
|