| Commit message (Collapse) | Author | Age |
... | |
|
|
|
|
|
|
|
|
|
|
|
| |
I hit this a couple times while working on my fsync patch (all my bugs, not
normal operation), but with my new stuff we could have new errors from cases
I have not encountered, so instead of BUG()'ing we should be WARN()'ing so
that we are notified there is a problem but the user doesn't lose their
data. We can easily commit the transaction in the case that the tree
logging fails and still be fine, so let's try and be as nice to the user as
possible. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At least for the vm workload. Currently on fsync we will
1) Truncate all items in the log tree for the given inode if they exist
and
2) Copy all items for a given inode into the log
The problem with this is that for things like VMs you can have lots of
extents from the fragmented writing behavior, and worst yet you may have
only modified a few extents, not the entire thing. This patch fixes this
problem by tracking which transid modified our extent, and then when we do
the tree logging we find all of the extents we've modified in our current
transaction, sort them and commit them. We also only truncate up to the
xattrs of the inode and copy that stuff in normally, and then just drop any
extents in the range we have that exist in the log already. Here are some
numbers of a 50 meg fio job that does random writes and fsync()s after every
write
Original Patched
SATA drive 82KB/s 140KB/s
Fusion drive 431KB/s 2532KB/s
So around 2-6 times faster depending on your hardware. There are a few
corner cases, for example if you truncate at all we have to do it the old
way since there is no way to be sure what is in the log is ok. This
probably could be done smarter, but if you write-fsync-truncate-write-fsync
you deserve what you get. All this work is in RAM of course so if your
inode gets evicted from cache and you read it in and fsync it we'll do it
the slow way if we are still in the same transaction that we last modified
the inode in.
The biggest cool part of this is that it requires no changes to the recovery
code, so if you fsync with this patch and crash and load an old kernel, it
will run the recovery and be a-ok. I have tested this pretty thoroughly
with an fsync tester and everything comes back fine, as well as xfstests.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While working on my fsync patch my fsync tester kept hitting mismatching
md5sums when I would randomly write to a prealloc'ed region, syncfs() and
then write to the prealloced region some more and then fsync() and then
immediately reboot. This is because the tree logging code will skip writing
csums for file extents who's generation is less than the current running
transaction. When we mark extents as written we haven't been updating their
generation so they were always being skipped. This wouldn't happen if you
were to preallocate and then write in the same transaction, but if you for
example prealloced a VM you could definitely run into this problem. This
patch makes my fsync tester happy again. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
|
|
|
|
|
|
|
|
| |
Swinging this pendulum back the other way. We've been allocating chunks up
to 2% of the disk no matter how much we actually have allocated. So instead
fix this calculation to only allocate chunks if we have more than 80% of the
space available allocated. Please test this as it will likely cause all
sorts of ENOSPC problems to pop up suddenly. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is a completely impossible situation to hit where you can preallocate
a file, fsync it, write into the preallocated region, have the transaction
commit twice and then fsync and then immediately lose power and lose all of
the contents of the write. This patch fixes this just so I feel better
about the situation and because it is lightweight, we just update the
last_trans when we finish an ordered IO and we don't update the inode
itself. This way we are completely safe and I feel better. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
|
|
|
| |
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The btrfs send code was assuming the offset of the file item into the
extent translated to bytes on disk. If we're compressed, this isn't
true, and so it was off into extents owned by other files.
It was also improperly handling inline extents. This solves a crash
where we may have gone past the end of the file extent item by not
testing early enough for an inline extent. It also solves problems
where we have a whole between the end of the inline item and the start
of the full extent.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
|
|
|
|
|
|
|
| |
We can't do the deleted/reused logic for top/root inodes as it would
create a stream that tries to delete and recreate the root dir.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
|
| |
We have to ignore inode/space cache objects in send/receive.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
|
| |
We need to pass the root that we determined earlier to iterate_inode_ref.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
|
| |
Used the wrong compare operator here.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
|
|
| |
The previous check was working fine, but this check should be
easier to read. Also, we could theoritically have some exotic
bugs with the previous checks.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
|
| |
Both were leaked in case of error.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
|
| |
A leftover from older code and unused now.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
|
| |
Doing some code cleanups as suggested by Arne.
Changes do not change any logic.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
| |
As the subject already said, add/fix comments.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
|
|
|
| |
Updating send_progress in process_recorded_refs was not correct.
It got updated too early in the cur_inode_new_gen case.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Reported-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
|
|
|
|
| |
Btrfs send/receive uses the aux field to store inode numbers. On
32 bit machines this may become a problem.
Also fix all users of ulist_add and ulist_add_merged.
Reported-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
|
|
|
|
| |
We can't easily use the index of the radix tree for inums as the
radix tree uses 32bit indexes on 32bit kernels. For 32bit kernels,
we now use the lower 32bit of the inum as index and an additional
list to store multiple entries per radix tree entry.
Reported-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
|
| |
When everything is done, name_cache_free is called which however
forgot to call kfree on the cache entries.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
|
|
|
|
| |
If we break, we may miss the clone from send_root which we prefer
over all other clones.
Commit is a result of Arne's review.
Reported-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
|
|
|
| |
Don't have a seperate return path for the mentioned case. Now
we do the same "take lowest inode/offset" logic for all found clones.
Commit is a result of Arne's review.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
|
|
|
| |
Make sure to never get in trouble due to the backref_ctx
which was on the stack before.
Commit is a result of Arne's review.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
|
|
| |
The new name should be easier to understand/read.
Commit is a result of Arne's review.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
| |
use_list is a leftover and unused.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
|
|
| |
We only added the parent for the new position of a moved dir.
We also need to add the old parent of the moved dir.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
|
| |
fs_path_remove is not used at the moment due to a previous patch.
Remove it for now (with #if 0) to avoid compile warnings.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
|
|
| |
We missed that check which resultet in all refs with the same name
being reported as first_ref.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
|
|
|
|
| |
When the current inodes inum is smaller then the inum of the
parent directory strange things were happending due to wrong
path resolution and other bugs. Fix this with a new approach
for the problem.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
|
|
|
|
| |
We need rdev in the next commit.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
IBM reported a deadlock in select_parent(). This was found to be caused
by taking rename_lock when already locked when restarting the tree
traversal.
There are two cases when the traversal needs to be restarted:
1) concurrent d_move(); this can only happen when not already locked,
since taking rename_lock protects against concurrent d_move().
2) racing with final d_put() on child just at the moment of ascending
to parent; rename_lock doesn't protect against this rare race, so it
can happen when already locked.
Because of case 2, we need to be able to handle restarting the traversal
when rename_lock is already held. This patch fixes all three callers of
try_to_ascend().
IBM reported that the deadlock is gone with this patch.
[ I rewrote the patch to be smaller and just do the "goto again" if the
lock was already held, but credit goes to Miklos for the real work.
- Linus ]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"Two small patches:
* One patch to fix the function declarations for
!CONFIG_IOMMU_API. This is causing build errors
in linux-next and should be fixed for v3.6.
* Another patch to fix an IOMMU group related NULL pointer
dereference."
* tag 'iommu-fixes-v3.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Fix wrong assumption in iommu-group specific code
iommu: static inline iommu group stub functions
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The new IOMMU groups code in the AMD IOMMU driver makes the
assumption that there is a pci_dev struct available for all
device-ids listed in the IVRS ACPI table. Unfortunatly this
assumption is not true and so this code causes a NULL
pointer dereference at boot on some systems.
Fix it by making sure the given pointer is never NULL when
passed to the group specific code. The real fix is larger
and will be queued for v3.7.
Reported-by: Florian Dazinger <florian@dazinger.net>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
| |
| |
| |
| |
| | |
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull NVMe driver fixes from Matthew Wilcox:
"Now that actual hardware has been released (don't have any yet
myself), people are starting to want some of these fixes merged."
Willy doesn't have hardware? Guys...
* git://git.infradead.org/users/willy/linux-nvme:
NVMe: Cancel outstanding IOs on queue deletion
NVMe: Free admin queue memory on initialisation failure
NVMe: Use ida for nvme device instance
NVMe: Fix whitespace damage in nvme_init
NVMe: handle allocation failure in nvme_map_user_pages()
NVMe: Fix uninitialized iod compiler warning
NVMe: Do not set IO queue depth beyond device max
NVMe: Set block queue max sectors
NVMe: use namespace id for nvme_get_features
NVMe: replace nvme_ns with nvme_dev for user admin
NVMe: Fix nvme module init when nvme_major is set
NVMe: Set request queue logical block size
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If the device is hot-unplugged while there are active commands, we should
time out the I/Os so that upper layers don't just see the I/Os disappear.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If the adapter fails initialisation, the memory allocated for the
admin queue may not be freed. Split the memory freeing part of
nvme_free_queue() into nvme_free_queue_mem() and call it in the case of
initialisation failure.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Quoc-Son Anh <quoc-sonx.anh@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Commit 5c42ea1643 used spaces instead of tabs.
Also remove the unnecessary initialisation of the 'result' variable.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We should return here and avoid a NULL dereference.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Set the depth for IO queues to the device's maximum supported queue
entries if the requested depth exceeds the device's capabilities.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Set the max hw sectors in a namespace's request queue if the nvme device
has a max data transfer size.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The specification does not provide a use for command dword11 in the NVMe
Get Features command, but does use the NSID for some features.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The function nvme_user_admin_command does not require a namespace to
proceed. Replace with the nvme_dev structure so that it can be called
from contexts that do not have a namespace.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
register_blkdev returns 0 when given a valid major number.
Reported-by:Ross Zwisler <ross.zwisler@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Sets the request queue logical block size with the block size of the
namespace.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Sasha Levin has been running trinity in a KVM tools guest, and was able
to trigger the BUG_ON() at arch/x86/mm/pat.c:279 (verifying the range of
the memory type). The call trace showed that it was mtdchar_mmap() that
created an invalid remap_pfn_range().
The problem is that mtdchar_mmap() does various really odd and subtle
things with the vma page offset etc, and uses the wrong types (and the
wrong overflow) detection for it.
For example, the page offset may well be 32-bit on a 32-bit
architecture, but after shifting it up by PAGE_SHIFT, we need to use a
potentially 64-bit resource_size_t to correctly hold the full value.
Also, we need to check that the vma length plus offset doesn't overflow
before we check that it is smaller than the length of the mtdmap region.
This fixes things up and tries to make the code a bit easier to read.
Reported-and-tested-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Artem Bityutskiy <dedekind1@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pull networking fixes from David S Miller:
1) Netfilter xt_limit module can use uninitialized rules, from Jan
Engelhardt.
2) Wei Yongjun has found several more spots where error pointers were
treated as NULL/non-NULL and vice versa.
3) bnx2x was converted to pci_io{,un}map() but one remaining plain
iounmap() got missed. From Neil Horman.
4) Due to a fence-post type error in initialization of inetpeer entries
(which is where we store the ICMP rate limiting information), we can
erroneously drop ICMPs if the inetpeer was created right around when
jiffies wraps.
Fix from Nicolas Dichtel.
5) smsc75xx resume fix from Steve Glendinnig.
6) LAN87xx smsc chips need an explicit hardware init, from Marek Vasut.
7) qlcnic uses msleep() with locks held, fix from Narendra K.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
netdev: octeon: fix return value check in octeon_mgmt_init_phy()
inetpeer: fix token initialization
qlcnic: Fix scheduling while atomic bug
bnx2: Clean up remaining iounmap
net: phy: smsc: Implement PHY config_init for LAN87xx
smsc75xx: fix resume after device reset
netdev: pasemi: fix return value check in pasemi_mac_phy_init()
team: fix return value check
l2tp: fix return value check
netfilter: xt_limit: have r->cost != 0 case work
|