| Commit message (Collapse) | Author | Age |
... | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch fixes the following problem:
- If we failed to deal with the delayed dir items, we should abort transaction,
just as its comment said. Fix it.
- If root reference or root back reference insertion failed, we should
abort transaction. Fix it.
- Fix the double free problem of pending->inherit.
- Do not restore the trans->rsv if we doesn't change it.
- make the error path more clearly.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
bbio has been malloced in btrfs_map_block() and should be
freed before leaving from the error handling cases.
spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
I noticed this when I was doing the fsync stuff, we allocate split extents if we
drop an extent range that is in the middle of an existing extent. This BUG()'s
if we fail to allocate memory, but the fact is this is just a cache, we will
just regenerate the cache if we need it, the important part is that we free the
range we are given. This can be done without allocations, so if we fail to
allocate splits just skip the splitting stage and free our em and look for more
extents to drop. This also makes btrfs_drop_extent_cache a void since nobody
was checking the return value anyway. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Josef has suggested that this is not necessary. Removing it also avoids
this lockdep splat (after the new sb_internal locking stuff was added):
[ 604.090449] ======================================================
[ 604.114819] [ INFO: possible circular locking dependency detected ]
[ 604.139262] 3.6.0-rc2-ceph-00144-g463b030 #1 Not tainted
[ 604.162193] -------------------------------------------------------
[ 604.186139] btrfs-cleaner/6669 is trying to acquire lock:
[ 604.209555] (sb_internal#2){.+.+..}, at: [<ffffffffa0042b84>] start_transaction+0x124/0x430 [btrfs]
[ 604.257100]
[ 604.257100] but task is already holding lock:
[ 604.300366] (&fs_info->cleanup_work_sem){.+.+..}, at: [<ffffffffa0048002>] btrfs_run_delayed_iputs+0x72/0x130 [btrfs]
[ 604.352989]
[ 604.352989] which lock already depends on the new lock.
[ 604.352989]
[ 604.427104]
[ 604.427104] the existing dependency chain (in reverse order) is:
[ 604.478493]
[ 604.478493] -> #1 (&fs_info->cleanup_work_sem){.+.+..}:
[ 604.529313] [<ffffffff810b2c82>] lock_acquire+0xa2/0x140
[ 604.559621] [<ffffffff81632b69>] down_read+0x39/0x4e
[ 604.589382] [<ffffffffa004db98>] btrfs_lookup_dentry+0x218/0x550 [btrfs]
[ 604.596161] btrfs: unlinked 1 orphans
[ 604.675002] [<ffffffffa006aadd>] create_subvol+0x62d/0x690 [btrfs]
[ 604.708859] [<ffffffffa006d666>] btrfs_mksubvol.isra.52+0x346/0x3a0 [btrfs]
[ 604.772466] [<ffffffffa006d7f2>] btrfs_ioctl_snap_create_transid+0x132/0x190 [btrfs]
[ 604.842245] [<ffffffffa006d8ae>] btrfs_ioctl_snap_create+0x5e/0x80 [btrfs]
[ 604.912852] [<ffffffffa00708ae>] btrfs_ioctl+0x138e/0x1990 [btrfs]
[ 604.951888] [<ffffffff8118e9b8>] do_vfs_ioctl+0x98/0x560
[ 604.989961] [<ffffffff8118ef11>] sys_ioctl+0x91/0xa0
[ 605.026628] [<ffffffff8163d569>] system_call_fastpath+0x16/0x1b
[ 605.064404]
[ 605.064404] -> #0 (sb_internal#2){.+.+..}:
[ 605.126832] [<ffffffff810b25e8>] __lock_acquire+0x1ac8/0x1b90
[ 605.163671] [<ffffffff810b2c82>] lock_acquire+0xa2/0x140
[ 605.200228] [<ffffffff8117dac6>] __sb_start_write+0xc6/0x1b0
[ 605.236818] [<ffffffffa0042b84>] start_transaction+0x124/0x430 [btrfs]
[ 605.274029] [<ffffffffa00431a3>] btrfs_start_transaction+0x13/0x20 [btrfs]
[ 605.340520] [<ffffffffa004ccfa>] btrfs_evict_inode+0x19a/0x330 [btrfs]
[ 605.378720] [<ffffffff811972c8>] evict+0xb8/0x1c0
[ 605.416057] [<ffffffff811974d5>] iput+0x105/0x210
[ 605.452373] [<ffffffffa0048082>] btrfs_run_delayed_iputs+0xf2/0x130 [btrfs]
[ 605.521627] [<ffffffffa003b5e1>] cleaner_kthread+0xa1/0x120 [btrfs]
[ 605.560520] [<ffffffff810791ee>] kthread+0xae/0xc0
[ 605.598094] [<ffffffff8163e744>] kernel_thread_helper+0x4/0x10
[ 605.636499]
[ 605.636499] other info that might help us debug this:
[ 605.636499]
[ 605.736504] Possible unsafe locking scenario:
[ 605.736504]
[ 605.801931] CPU0 CPU1
[ 605.835126] ---- ----
[ 605.867093] lock(&fs_info->cleanup_work_sem);
[ 605.898594] lock(sb_internal#2);
[ 605.931954] lock(&fs_info->cleanup_work_sem);
[ 605.965359] lock(sb_internal#2);
[ 605.994758]
[ 605.994758] *** DEADLOCK ***
[ 605.994758]
[ 606.075281] 2 locks held by btrfs-cleaner/6669:
[ 606.104528] #0: (&fs_info->cleaner_mutex){+.+...}, at: [<ffffffffa003b5d5>] cleaner_kthread+0x95/0x120 [btrfs]
[ 606.165626] #1: (&fs_info->cleanup_work_sem){.+.+..}, at: [<ffffffffa0048002>] btrfs_run_delayed_iputs+0x72/0x130 [btrfs]
[ 606.231297]
[ 606.231297] stack backtrace:
[ 606.287723] Pid: 6669, comm: btrfs-cleaner Not tainted 3.6.0-rc2-ceph-00144-g463b030 #1
[ 606.347823] Call Trace:
[ 606.376184] [<ffffffff8162a77c>] print_circular_bug+0x1fb/0x20c
[ 606.409243] [<ffffffff810b25e8>] __lock_acquire+0x1ac8/0x1b90
[ 606.441343] [<ffffffffa0042b84>] ? start_transaction+0x124/0x430 [btrfs]
[ 606.474583] [<ffffffff810b2c82>] lock_acquire+0xa2/0x140
[ 606.505934] [<ffffffffa0042b84>] ? start_transaction+0x124/0x430 [btrfs]
[ 606.539429] [<ffffffff8132babd>] ? do_raw_spin_unlock+0x5d/0xb0
[ 606.571719] [<ffffffff8117dac6>] __sb_start_write+0xc6/0x1b0
[ 606.603498] [<ffffffffa0042b84>] ? start_transaction+0x124/0x430 [btrfs]
[ 606.637405] [<ffffffffa0042b84>] ? start_transaction+0x124/0x430 [btrfs]
[ 606.670165] [<ffffffff81172e75>] ? kmem_cache_alloc+0xb5/0x160
[ 606.702144] [<ffffffffa0042b84>] start_transaction+0x124/0x430 [btrfs]
[ 606.735562] [<ffffffffa00256a6>] ? block_rsv_add_bytes+0x56/0x80 [btrfs]
[ 606.769861] [<ffffffffa00431a3>] btrfs_start_transaction+0x13/0x20 [btrfs]
[ 606.804575] [<ffffffffa004ccfa>] btrfs_evict_inode+0x19a/0x330 [btrfs]
[ 606.838756] [<ffffffff81634c6b>] ? _raw_spin_unlock+0x2b/0x40
[ 606.872010] [<ffffffff811972c8>] evict+0xb8/0x1c0
[ 606.903800] [<ffffffff811974d5>] iput+0x105/0x210
[ 606.935416] [<ffffffffa0048082>] btrfs_run_delayed_iputs+0xf2/0x130 [btrfs]
[ 606.970510] [<ffffffffa003b5d5>] ? cleaner_kthread+0x95/0x120 [btrfs]
[ 607.005648] [<ffffffffa003b5e1>] cleaner_kthread+0xa1/0x120 [btrfs]
[ 607.040724] [<ffffffffa003b540>] ? btrfs_destroy_delayed_refs.isra.102+0x220/0x220 [btrfs]
[ 607.104740] [<ffffffff810791ee>] kthread+0xae/0xc0
[ 607.137119] [<ffffffff810b379d>] ? trace_hardirqs_on+0xd/0x10
[ 607.169797] [<ffffffff8163e744>] kernel_thread_helper+0x4/0x10
[ 607.202472] [<ffffffff81635430>] ? retint_restore_args+0x13/0x13
[ 607.235884] [<ffffffff81079140>] ? flush_kthread_work+0x1a0/0x1a0
[ 607.268731] [<ffffffff8163e740>] ? gs_change+0x13/0x13
Signed-off-by: Sage Weil <sage@inktank.com>
|
| |
| |
| |
| |
| |
| |
| | |
We expect current->journal_info to point to the trans handle we are
committing.
Signed-off-by: Sage Weil <sage@inktank.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The freeze rwsem is taken by sb_start_intwrite() and dropped during the
commit_ or end_transaction(). In the async case, that happens in a worker
thread. Tell lockdep the calling thread is releasing ownership of the
rwsem and the async thread is picking it up.
XFS plays the same trick in fs/xfs/xfs_aops.c.
Signed-off-by: Sage Weil <sage@inktank.com>
|
| |
| |
| |
| |
| |
| | |
This patch adds hole punching via fallocate. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
I audited all users of btrfs_drop_extents and found that nobody actually uses
the hint_byte argument. I'm sure it was used for something at some point but
it's not used now, and the way the pinning works the disk bytenr would never be
immediately useful anyway so lets just remove it. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This is based on Josef's "Btrfs: turbo charge fsync".
If an inode is a BTRFS_INODE_NODATASUM one, we don't need to look for csum
items any more.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This is based on Josef's "Btrfs: turbo charge fsync".
The current btrfs checks if an inode is in log by comparing
root's last_log_commit to inode's last_sub_trans[2].
But the problem is that this root->last_log_commit is shared among
inodes.
Say we have N inodes to be logged, after the first inode,
root's last_log_commit is updated and the N-1 remained files will
be skipped.
This fixes the bug by keeping a local copy of root's last_log_commit
inside each inode and this local copy will be maintained itself.
[1]: we regard each log transaction as a subset of btrfs's transaction,
i.e. sub_trans
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
|
| |
| |
| |
| |
| |
| |
| | |
If we add a new orphan item, we should increase the atomic counter,
not decrease it. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This is based on Josef's "Btrfs: turbo charge fsync".
The above Josef's patch performs very good in random sync write test,
because we won't have too much extents to merge.
However, it does not performs good on the test:
dd if=/dev/zero of=foobar bs=4k count=12500 oflag=sync
The reason is when we do sequencial sync write, we need to merge the
current extent just with the previous one, so that we can get accumulated
extents to log:
A(4k) --> AA(8k) --> AAA(12k) --> AAAA(16k) ...
So we'll have to flush more and more checksum into log tree, which is the
bottleneck according to my tests.
But we can avoid this by telling fsync the real extents that are needed
to be logged.
With this, I did the above dd sync write test (size=50m),
w/o (orig) w/ (josef's) w/ (this)
SATA 104KB/s 109KB/s 121KB/s
ramdisk 1.5MB/s 1.5MB/s 10.7MB/s (613%)
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We will stop and restart a transaction every time we move to a different leaf
when truncating a file. This is for enospc reasons, but really we could
probably get away with doing this a little better by actually working until we
hit an ENOSPC. So add a ->failfast flag to the block_rsv and set it when we do
truncates which will fail as soon as the block rsv runs out of space, and then
at that point we can stop and restart the transaction and refill the block rsv
and carry on. This will make rm'ing of a file with lots of extents a bit
faster. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This is based on Josef's "Btrfs: turbo charge fsync".
We should cleanup those extents after we've finished logging inode,
otherwise we may do redundant work on them.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
I hit this a couple times while working on my fsync patch (all my bugs, not
normal operation), but with my new stuff we could have new errors from cases
I have not encountered, so instead of BUG()'ing we should be WARN()'ing so
that we are notified there is a problem but the user doesn't lose their
data. We can easily commit the transaction in the case that the tree
logging fails and still be fine, so let's try and be as nice to the user as
possible. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
At least for the vm workload. Currently on fsync we will
1) Truncate all items in the log tree for the given inode if they exist
and
2) Copy all items for a given inode into the log
The problem with this is that for things like VMs you can have lots of
extents from the fragmented writing behavior, and worst yet you may have
only modified a few extents, not the entire thing. This patch fixes this
problem by tracking which transid modified our extent, and then when we do
the tree logging we find all of the extents we've modified in our current
transaction, sort them and commit them. We also only truncate up to the
xattrs of the inode and copy that stuff in normally, and then just drop any
extents in the range we have that exist in the log already. Here are some
numbers of a 50 meg fio job that does random writes and fsync()s after every
write
Original Patched
SATA drive 82KB/s 140KB/s
Fusion drive 431KB/s 2532KB/s
So around 2-6 times faster depending on your hardware. There are a few
corner cases, for example if you truncate at all we have to do it the old
way since there is no way to be sure what is in the log is ok. This
probably could be done smarter, but if you write-fsync-truncate-write-fsync
you deserve what you get. All this work is in RAM of course so if your
inode gets evicted from cache and you read it in and fsync it we'll do it
the slow way if we are still in the same transaction that we last modified
the inode in.
The biggest cool part of this is that it requires no changes to the recovery
code, so if you fsync with this patch and crash and load an old kernel, it
will run the recovery and be a-ok. I have tested this pretty thoroughly
with an fsync tester and everything comes back fine, as well as xfstests.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
While working on my fsync patch my fsync tester kept hitting mismatching
md5sums when I would randomly write to a prealloc'ed region, syncfs() and
then write to the prealloced region some more and then fsync() and then
immediately reboot. This is because the tree logging code will skip writing
csums for file extents who's generation is less than the current running
transaction. When we mark extents as written we haven't been updating their
generation so they were always being skipped. This wouldn't happen if you
were to preallocate and then write in the same transaction, but if you for
example prealloced a VM you could definitely run into this problem. This
patch makes my fsync tester happy again. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Swinging this pendulum back the other way. We've been allocating chunks up
to 2% of the disk no matter how much we actually have allocated. So instead
fix this calculation to only allocate chunks if we have more than 80% of the
space available allocated. Please test this as it will likely cause all
sorts of ENOSPC problems to pop up suddenly. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
There is a completely impossible situation to hit where you can preallocate
a file, fsync it, write into the preallocated region, have the transaction
commit twice and then fsync and then immediately lose power and lose all of
the contents of the write. This patch fixes this just so I feel better
about the situation and because it is lightweight, we just update the
last_trans when we finish an ordered IO and we don't update the inode
itself. This way we are completely safe and I feel better. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| |
| |
| |
| |
| | |
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The btrfs send code was assuming the offset of the file item into the
extent translated to bytes on disk. If we're compressed, this isn't
true, and so it was off into extents owned by other files.
It was also improperly handling inline extents. This solves a crash
where we may have gone past the end of the file extent item by not
testing early enough for an inline extent. It also solves problems
where we have a whole between the end of the inline item and the start
of the full extent.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
We can't do the deleted/reused logic for top/root inodes as it would
create a stream that tries to delete and recreate the root dir.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| | |
We have to ignore inode/space cache objects in send/receive.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| | |
We need to pass the root that we determined earlier to iterate_inode_ref.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| | |
Used the wrong compare operator here.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
The previous check was working fine, but this check should be
easier to read. Also, we could theoritically have some exotic
bugs with the previous checks.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| | |
Both were leaked in case of error.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| | |
A leftover from older code and unused now.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| | |
Doing some code cleanups as suggested by Arne.
Changes do not change any logic.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| | |
As the subject already said, add/fix comments.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Updating send_progress in process_recorded_refs was not correct.
It got updated too early in the cur_inode_new_gen case.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Reported-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Btrfs send/receive uses the aux field to store inode numbers. On
32 bit machines this may become a problem.
Also fix all users of ulist_add and ulist_add_merged.
Reported-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We can't easily use the index of the radix tree for inums as the
radix tree uses 32bit indexes on 32bit kernels. For 32bit kernels,
we now use the lower 32bit of the inum as index and an additional
list to store multiple entries per radix tree entry.
Reported-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| | |
When everything is done, name_cache_free is called which however
forgot to call kfree on the cache entries.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If we break, we may miss the clone from send_root which we prefer
over all other clones.
Commit is a result of Arne's review.
Reported-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Don't have a seperate return path for the mentioned case. Now
we do the same "take lowest inode/offset" logic for all found clones.
Commit is a result of Arne's review.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Make sure to never get in trouble due to the backref_ctx
which was on the stack before.
Commit is a result of Arne's review.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
The new name should be easier to understand/read.
Commit is a result of Arne's review.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| | |
use_list is a leftover and unused.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
We only added the parent for the new position of a moved dir.
We also need to add the old parent of the moved dir.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| | |
fs_path_remove is not used at the moment due to a previous patch.
Remove it for now (with #if 0) to avoid compile warnings.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
We missed that check which resultet in all refs with the same name
being reported as first_ref.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When the current inodes inum is smaller then the inum of the
parent directory strange things were happending due to wrong
path resolution and other bugs. Fix this with a new approach
for the problem.
Reported-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| | |
We need rdev in the next commit.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Move actual pte filling for non-linear file mappings into the new special
vma operation: ->remap_pages().
Filesystems must implement this method to get non-linear mapping support,
if it uses filemap_fault() then generic_file_remap_pages() can be used.
Now device drivers can implement this method and obtain nonlinear vma support.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com> #arch/tile
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs update from Al Viro:
- big one - consolidation of descriptor-related logics; almost all of
that is moved to fs/file.c
(BTW, I'm seriously tempted to rename the result to fd.c. As it is,
we have a situation when file_table.c is about handling of struct
file and file.c is about handling of descriptor tables; the reasons
are historical - file_table.c used to be about a static array of
struct file we used to have way back).
A lot of stray ends got cleaned up and converted to saner primitives,
disgusting mess in android/binder.c is still disgusting, but at least
doesn't poke so much in descriptor table guts anymore. A bunch of
relatively minor races got fixed in process, plus an ext4 struct file
leak.
- related thing - fget_light() partially unuglified; see fdget() in
there (and yes, it generates the code as good as we used to have).
- also related - bits of Cyrill's procfs stuff that got entangled into
that work; _not_ all of it, just the initial move to fs/proc/fd.c and
switch of fdinfo to seq_file.
- Alex's fs/coredump.c spiltoff - the same story, had been easier to
take that commit than mess with conflicts. The rest is a separate
pile, this was just a mechanical code movement.
- a few misc patches all over the place. Not all for this cycle,
there'll be more (and quite a few currently sit in akpm's tree)."
Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
MAX_LFS_FILESIZE should be a loff_t
compat: fs: Generic compat_sys_sendfile implementation
fs: push rcu_barrier() from deactivate_locked_super() to filesystems
btrfs: reada_extent doesn't need kref for refcount
coredump: move core dump functionality into its own file
coredump: prevent double-free on an error path in core dumper
usb/gadget: fix misannotations
fcntl: fix misannotations
ceph: don't abuse d_delete() on failure exits
hypfs: ->d_parent is never NULL or negative
vfs: delete surplus inode NULL check
switch simple cases of fget_light to fdget
new helpers: fdget()/fdput()
switch o2hb_region_dev_write() to fget_light()
proc_map_files_readdir(): don't bother with grabbing files
make get_file() return its argument
vhost_set_vring(): turn pollstart/pollstop into bool
switch prctl_set_mm_exe_file() to fget_light()
switch xfs_find_handle() to fget_light()
switch xfs_swapext() to fget_light()
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
There's no reason to call rcu_barrier() on every
deactivate_locked_super(). We only need to make sure that all delayed rcu
free inodes are flushed before we destroy related cache.
Removing rcu_barrier() from deactivate_locked_super() affects some fast
paths. E.g. on my machine exit_group() of a last process in IPC
namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
All increments and decrements are under the same spinlock - have to be,
since they need to protect the radix_tree it's found in. Just use
int, no need to wank with kref...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | | |
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | | |
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|