| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Add description of d_lock handling to comments over prune_one_dentry().
- It has three callsites - uninline it, saving 200 bytes of text.
Cc: Jan Blunck <jblunck@suse.de>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Olaf Hering <olh@suse.de>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The race is that the shrink_dcache_memory shrinker could get called while a
filesystem is being unmounted, and could try to prune a dentry belonging to
that filesystem.
If it does, then it will call in to iput on the inode while the dentry is
no longer able to be found by the umounting process. If iput takes a
while, generic_shutdown_super could get all the way though
shrink_dcache_parent and shrink_dcache_anon and invalidate_inodes without
ever waiting on this particular inode.
Eventually the superblock gets freed anyway and if the iput tried to touch
it (which some filesystems certainly do), it will lose. The promised
"Self-destruct in 5 seconds" doesn't lead to a nice day.
The race is closed by holding s_umount while calling prune_one_dentry on
someone else's dentry. As a down_read_trylock is used,
shrink_dcache_memory will no longer try to prune the dentry of a filesystem
that is being unmounted, and unmount will not be able to start until any
such active prune_one_dentry completes.
This requires that prune_dcache *knows* which filesystem (if any) it is
doing the prune on behalf of so that it can be careful of other
filesystems. shrink_dcache_memory isn't called it on behalf of any
filesystem, and so is careful of everything.
shrink_dcache_anon is now passed a super_block rather than the s_anon list
out of the superblock, so it can get the s_anon list itself, and can pass
the superblock down to prune_dcache.
If prune_dcache finds a dentry that it cannot free, it leaves it where it
is (at the tail of the list) and exits, on the assumption that some other
thread will be removing that dentry soon. To try to make sure that some
work gets done, a limited number of dnetries which are untouchable are
skipped over while choosing the dentry to work on.
I believe this race was first found by Kirill Korotaev.
Cc: Jan Blunck <jblunck@suse.de>
Acked-by: Kirill Korotaev <dev@openvz.org>
Cc: Olaf Hering <olh@suse.de>
Acked-by: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes the steal_locks() function.
steal_locks() doesn't work correctly with any filesystem that does it's own
lock management, including NFS, CIFS, etc.
In addition it has weird semantics on local filesystems in case tasks
sharing file-descriptor tables are doing POSIX locking operations in
parallel to execve().
The steal_locks() function has an effect on applications doing:
clone(CLONE_FILES)
/* in child */
lock
execve
lock
POSIX locks acquired before execve (by "child", "parent" or any further
task sharing files_struct) will after the execve be owned exclusively by
"child".
According to Chris Wright some LSB/LTP kind of suite triggers without the
stealing behavior, but there's no known real-world application that would
also fail.
Apps using NPTL are not affected, since all other threads are killed before
execve.
Apps using LinuxThreads are only affected if they
- have multiple threads during exec (LinuxThreads doesn't kill other
threads, the app may do it with pthread_kill_other_threads_np())
- rely on POSIX locks being inherited across exec
Both conditions are documented, but not their interaction.
Apps using clone() natively are affected if they
- use clone(CLONE_FILES)
- rely on POSIX locks being inherited across exec
The above scenarios are unlikely, but possible.
If the patch is vetoed, there's a plan B, that involves mostly keeping the
weird stealing semantics, but changing the way lock ownership is handled so
that network and local filesystems work consistently.
That would add more complexity though, so this solution seems to be
preferred by most people.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This race became a cause of oops, and can reproduce by the following.
while true; do
dd if=/dev/zero of=/dev/.static/dev/hdg1 bs=512 count=1000 & sync
done
This race condition was between __sync_single_inode() and iput().
cpu0 (fs's inode) cpu1 (bdev's inode)
----------------- -------------------
close("/dev/hda2")
[...]
__sync_single_inode()
/* copy the bdev's ->i_mapping */
mapping = inode->i_mapping;
generic_forget_inode()
bdev_clear_inode()
/* restre the fs's ->i_mapping */
inode->i_mapping = &inode->i_data;
/* bdev's inode was freed */
destroy_inode(inode);
if (wait) {
/* dereference a freed bdev's mapping->host */
filemap_fdatawait(mapping); /* Oops */
Since __sync_single_inode() is only taking a ref-count of fs's inode, the
another process can be close() and freeing the bdev's inode while writing
fs's inode. So, __sync_signle_inode() accesses the freed ->i_mapping,
oops.
This patch takes a ref-count on the bdev's inode for the fs's inode before
setting a ->i_mapping, and the clear_inode() of the fs's inode does iput() on
the bdev's inode. So if the fs's inode is still living, bdev's inode
shouldn't be freed.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
|
|
|
|
|
| |
Many thanks to Pauline Ng for the detailed bug report and analysis!
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* git://oss.sgi.com:8090/xfs-2.6: (43 commits)
[XFS] Remove files from the build that are now unused.
[XFS] Fix a Makefile issue related to exports.o handling.
[XFS] Remove version 1 directory code. Never functioned on Linux, just
[XFS] Map EFSCORRUPTED to an actual error code, not just a made up one
[XFS] Kill direct access to ->count in valusema(); all we ever use it for
[XFS] Remove unneeded conditional code on NFS export interface related
[XFS] Remove an incorrect use of unlikely() on a relatively likely code
[XFS] Push some common code out of write path into core XFS code for
[XFS] Remove unnecessary local from open_exec dmapi path.
[XFS] Minor XFS documentation updates.
[XFS] Fix broken const use inside local suffix_strtoul routine.
[XFS] Fix nused counter. It's currently getting set to -1 rather than
[XFS] Fix mismerge of the fs_writable cleanup patch causing a freeze/thaw
[XFS] Fix up debug code so that bulkstat wont generate thousands of
[XFS] Remove unused parameter from di2xflags routine.
[XFS] Cleanup a missed porting conversion, and freezing.
[XFS] Resolve a namespace collision on remaining vtypes for FreeBSD
[XFS] Resolve a namespace collision on vnode/vnodeops for FreeBSD porters.
[XFS] Resolve a namespace collision on vfs/vfsops for FreeBSD porters.
[XFS] statvfs component of directory/project quota support, code
...
|
| |\ |
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
pure bloat.
SGI-PV: 952969
SGI-Modid: xfs-linux-melb:xfs-kern:26251a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
(990). Turns out some ye-olde unices used EUCLEAN as
Filesystem-needs-cleaning, so now we use that too.
SGI-PV: 953954
SGI-Modid: xfs-linux-melb:xfs-kern:26286a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
is check if semaphore is actually locked, which can be trivially done in
portable way. Code gets more reabable, while we are at it...
SGI-PV: 953915
SGI-Modid: xfs-linux-melb:xfs-kern:26274a
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
code paths.
SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26250a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
path.
SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26249a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
sharing.
SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26248a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26247a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | | |
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26201a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
getting decremented by 1. Since nused never reaches 0, the "if
(!free->hdr.nused)" check in xfs_dir2_leafn_remove() fails every time and
xfs_dir2_shrink_inode() doesn't get called when it should. This causes
extra blocks to be left on an empty directory and the directory in unable
to be converted back to inline extent mode.
SGI-PV: 951958
SGI-Modid: xfs-linux-melb:xfs-kern:211382a
Signed-off-by: Mandy Kirkconnell <alkirkco@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
test hang.
SGI-PV: 953563
SGI-Modid: xfs-linux-melb:xfs-kern:26182a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
fsstress warnings.
SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26111a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
SGI-PV: 904192
SGI-Modid: xfs-linux-melb:xfs-kern:26110a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
SGI-PV: 953338
SGI-Modid: xfs-linux-melb:xfs-kern:26109a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
porters.
SGI-PV: 953338
SGI-Modid: xfs-linux-melb:xfs-kern:26108a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
SGI-PV: 953338
SGI-Modid: xfs-linux-melb:xfs-kern:26107a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
SGI-PV: 9533338
SGI-Modid: xfs-linux-melb:xfs-kern:26106a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
originally by Glen.
SGI-PV: 932952
SGI-Modid: xfs-linux-melb:xfs-kern:26105a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
interface.
SGI-PV: 953338
SGI-Modid: xfs-linux-melb:xfs-kern:26103a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26102a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26101a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
truncate down followed by delayed allocation (buffered writes) - worst
case scenario for the notorious NULL files problem. This reduces the
window where we are exposed to that problem significantly.
SGI-PV: 917976
SGI-Modid: xfs-linux-melb:xfs-kern:26100a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
SGI-PV: 953338
SGI-Modid: xfs-linux-melb:xfs-kern:26099a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26097a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
shutdown vop flags consistent with sync vop flags declarations too.
SGI-PV: 939911
SGI-Modid: xfs-linux-melb:xfs-kern:26096a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
layers.
SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26095a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
anymore here.
SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26094a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
init_rwsem() has no return value. This is not a problem if init_rwsem()
is a function, but it's a problem if it's a do { ... } while (0) macro.
(which lockdep introduces)
SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26082a
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
logged version of di_next_unlinked which is actually always stored in the
correct ondisk format. This was pointed out to us by Shailendra Tripathi.
And is evident in the xfs qa test of 121.
SGI-PV: 953263
SGI-Modid: xfs-linux-melb:xfs-kern:26044a
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
transaction completion from marking the inode dirty while it is being
cleaned up on it's way out of the system.
SGI-PV: 952967
SGI-Modid: xfs-linux-melb:xfs-kern:26040a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
64bit kernels allow recovery to handle both versions and do the necessary
decoding
SGI-PV: 952214
SGI-Modid: xfs-linux-melb:xfs-kern:26011a
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
transaction within each such operation may involve multiple locking of AGF
buffer. While the freeing extent function has sorted the extents based on
AGF number before entering into transaction, however, when the file system
space is very limited, the allocation of space would try every AGF to get
space allocated, this could potentially cause out-of-order locking, thus
deadlock could happen. This fix mitigates the scarce space for allocation
by setting aside a few blocks without reservation, and avoid deadlock by
maintaining ascending order of AGF locking.
SGI-PV: 947395
SGI-Modid: xfs-linux-melb:xfs-kern:210801a
Signed-off-by: Yingping Lu <yingping@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
SGI-PV: 953061
SGI-Modid: xfs-linux-melb:xfs-kern:25986a
Signed-off-by: Barry Naujok <bnaujok@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
mmap only.
SGI-PV: 952736
SGI-Modid: xfs-linux-melb:xfs-kern:25922a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
SGI-PV: 907752
SGI-Modid: xfs-linux-melb:xfs-kern:25921a
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
SGI-PV: 952291
SGI-Modid: xfs-linux-melb:xfs-kern:209807a
Signed-off-by: Mandy Kirkconnell <alkirkco@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
ATTR_NOLOCK flag, but this was split off some time ago, as ATTR_DMI needed
to be used separately. Two asserts were added to guard correctness of the
code during the transition. These are no longer required.
SGI-PV: 952145
SGI-Modid: xfs-linux-melb:xfs-kern:209633a
Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
SGI-PV: 943272
SGI-Modid: xfs-linux-melb:xfs-kern:25808a
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
SGI-PV: 943272
SGI-Modid: xfs-linux-melb:xfs-kern:25807a
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
SGI-PV: 943272
SGI-Modid: xfs-linux-melb:xfs-kern:25806a
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Nathan Scott <nathans@sgi.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
the range spanned by modifications to the in-core extent map. Add
XFS_BUNMAPI() and XFS_SWAP_EXTENTS() macros that call xfs_bunmapi() and
xfs_swap_extents() via the ioops vector. Change all calls that may modify
the in-core extent map for the data fork to go through the ioops vector.
This allows a cache of extent map data to be kept in sync.
SGI-PV: 947615
SGI-Modid: xfs-linux-melb:xfs-kern:209226a
Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
|