aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* push BKL down into ->put_superChristoph Hellwig2009-06-11
| | | | | | | | | | | | | | | | | Move BKL into ->put_super from the only caller. A couple of filesystems had trivial enough ->put_super (only kfree and NULLing of s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs, hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most of them probably don't need it, but I'd rather sort that out individually. Preferably after all the other BKL pushdowns in that area. [AV: original used to move lock_super() down as well; these changes are removed since we don't do lock_super() at all in generic_shutdown_super() now] [AV: fuse, btrfs and xfs are known to need no damn BKL, exempt] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* No need to do lock_super() for exclusion in generic_shutdown_super()Al Viro2009-06-11
| | | | | | | | | | | | | | We can't run into contention on it. All other callers of lock_super() either hold s_umount (and we have it exclusive) or hold an active reference to superblock in question, which prevents the call of generic_shutdown_super() while the reference is held. So we can replace lock_super(s) with get_fs_excl() in generic_shutdown_super() (and corresponding change for unlock_super(), of course). Since ext4 expects s_lock held for its put_super, take lock_super() into it. The rest of filesystems do not care at all. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Trim a bit of crap from fs.hAl Viro2009-06-11
| | | | | | do_remount_sb() is fs/internal.h fodder, fsync_no_super() is long gone. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Make sure that all callers of remount hold s_umount exclusiveAl Viro2009-06-11
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* enforce ->sync_fs is only called for rw superblockChristoph Hellwig2009-06-11
| | | | | | | | | | | | | | Make sure a superblock really is writeable by checking MS_RDONLY under s_umount. sync_filesystems needed some re-arragement for that, but all but one sync_filesystem caller had the correct locking already so that we could add that check there. cachefiles grew s_umount locking. I've also added a WARN_ON to sync_filesystem to assert this for future callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* cleanup sync_supersChristoph Hellwig2009-06-11
| | | | | | | | | | | Merge the write_super helper into sync_super and move the check for ->write_super earlier so that we can avoid grabbing a reference to a superblock that doesn't have it. While we're at it also add a little comment documenting sync_supers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* dcache: extrace and use d_unlinked()Alexey Dobriyan2009-06-11
| | | | | | | | | d_unlinked() will be used in middle-term to ban checkpointing when opened but unlinked file is detected, and in long term, to detect such situation and special case on it. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* remove ->write_super call in generic_shutdown_superChristoph Hellwig2009-06-11
| | | | | | | | | | | | | | | | | | | | | We just did a full fs writeout using sync_filesystem before, and if that's not enough for the filesystem it can perform it's own writeout in ->put_super, which many filesystems already do. Move a call to foofs_write_super into every foofs_put_super for now to guarantee identical behaviour until it's cleaned up by the individual filesystem maintainers. Exceptions: - affs already has identical copy & pasted code at the beginning of affs_put_super so no need to do it twice. - xfs does the right thing without it and I have changes pending for the xfs tree touching this are so I don't really need conflicts here.. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* qnx4: remove ->write_superChristoph Hellwig2009-06-11
| | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ocfs2: remove ->write_super and stop maintaining ->s_dirtChristoph Hellwig2009-06-11
| | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* gfs2: remove ->write_super and stop maintaining ->s_dirtChristoph Hellwig2009-06-11
| | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ext3: remove ->write_super and stop maintaining ->s_dirtChristoph Hellwig2009-06-11
| | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* btrfs: remove ->write_super and stop maintaining ->s_dirtChristoph Hellwig2009-06-11
| | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* quota: Introduce writeout_quota_sb() (version 4)Jan Kara2009-06-11
| | | | | | | | | Introduce this function which just writes all the quota structures but avoids all the syncing and cache pruning work to expose quota structures to userspace. Use this function from __sync_filesystem when wait == 0. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* quota: cleanup dquota sync functions (version 4)Christoph Hellwig2009-06-11
| | | | | | | | | | | | | | | | | | | | Currently the VFS calls vfs_dq_sync to sync out disk quotas for a given superblock. This is a small wrapper around sync_dquots which for the case of a non-NULL superblock is a small wrapper around quota_sync_sb. Just make quota_sync_sb global (rename it to sync_quota_sb) and call it directly. Also call it directly for those cases in quota.c that have a superblock and leave sync_dquots purely an iterator over sync_quota_sb and remove it's superblock argument. To make this nicer move the check for the lack of a quota_sync method from the callers into sync_quota_sb. [folded build fix from Alexander Beregalov <a.beregalov@gmail.com>] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Rename fsync_super() to sync_filesystem() (version 4)Jan Kara2009-06-11
| | | | | | | | Rename the function so that it better describe what it really does. Also remove the unnecessary include of buffer_head.h. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Move syncing code from super.c to sync.c (version 4)Jan Kara2009-06-11
| | | | | | | | | | Move sync_filesystems(), __fsync_super(), fsync_super() from super.c to sync.c where it fits better. [build fixes folded] Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Make sys_sync() use fsync_super() (version 4)Jan Kara2009-06-11
| | | | | | | | | | | | | | | | | | It is unnecessarily fragile to have two places (fsync_super() and do_sync()) doing data integrity sync of the filesystem. Alter __fsync_super() to accommodate needs of both callers and use it. So after this patch __fsync_super() is the only place where we gather all the calls needed to properly send all data on a filesystem to disk. Nice bonus is that we get a complete livelock avoidance and write_supers() is now only used for periodic writeback of superblocks. sync_blockdevs() introduced a couple of patches ago is gone now. [build fixes folded] Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Make __fsync_super() a static function (version 4)Jan Kara2009-06-11
| | | | | | | | | | __fsync_super() does the same thing as fsync_super(). So change the only caller to use fsync_super() and make __fsync_super() static. This removes unnecessarily duplicated call to sync_blockdev() and prepares ground for the changes to __fsync_super() in the following patches. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Call ->sync_fs() even if s_dirt is 0 (version 4)Jan Kara2009-06-11
| | | | | | | | | | | sync_filesystems() has a condition that if wait == 0 and s_dirt == 0, then ->sync_fs() isn't called. This does not really make much sence since s_dirt is generally used by a filesystem to mean that ->write_super() needs to be called. But ->sync_fs() does different things. I even suspect that some filesystems (btrfs?) sets s_dirt just to fool this logic. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Fix sys_sync() and fsync_super() reliability (version 4)Jan Kara2009-06-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | So far, do_sync() called: sync_inodes(0); sync_supers(); sync_filesystems(0); sync_filesystems(1); sync_inodes(1); This ordering makes it kind of hard for filesystems as sync_inodes(0) need not submit all the IO (for example it skips inodes with I_SYNC set) so e.g. forcing transaction to disk in ->sync_fs() is not really enough. Therefore sys_sync has not been completely reliable on some filesystems (ext3, ext4, reiserfs, ocfs2 and others are hit by this) when racing e.g. with background writeback. A similar problem hits also other filesystems (e.g. ext2) because of write_supers() being called before the sync_inodes(1). Change the ordering of calls in do_sync() - this requires a new function sync_blockdevs() to preserve the property that block devices are always synced after write_super() / sync_fs() call. The same issue is fixed in __fsync_super() function used on umount / remount read-only. [AV: build fixes] Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* remove s_async_listChristoph Hellwig2009-06-11
| | | | | | | | | | Remove the unused s_async_list in the superblock, a leftover of the broken async inode deletion code that leaked into mainline. Having this in the middle of the sync/unmount path is not helpful for the following cleanups. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: move mark_files_ro into file_table.cnpiggin@suse.de2009-06-11
| | | | | | | | | | | This function walks the s_files lock, and operates primarily on the files in a superblock, so it better belongs here (eg. see also fs_may_remount_ro). [AV: ... and it shouldn't be static after that move] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: introduce mnt_clone_writenpiggin@suse.de2009-06-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch speeds up lmbench lat_mmap test by about another 2% after the first patch. Before: avg = 462.286 std = 5.46106 After: avg = 453.12 std = 9.58257 (50 runs of each, stddev gives a reasonable confidence) It does this by introducing mnt_clone_write, which avoids some heavyweight operations of mnt_want_write if called on a vfsmount which we know already has a write count; and mnt_want_write_file, which can call mnt_clone_write if the file is open for write. After these two patches, mnt_want_write and mnt_drop_write go from 7% on the profile down to 1.3% (including mnt_clone_write). [AV: mnt_want_write_file() should take file alone and derive mnt from it; not only all callers have that form, but that's the only mnt about which we know that it's already held for write if file is opened for write] Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: mnt_want_write speedupnpiggin@suse.de2009-06-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch speeds up lmbench lat_mmap test by about 8%. lat_mmap is set up basically to mmap a 64MB file on tmpfs, fault in its pages, then unmap it. A microbenchmark yes, but it exercises some important paths in the mm. Before: avg = 501.9 std = 14.7773 After: avg = 462.286 std = 5.46106 (50 runs of each, stddev gives a reasonable confidence, but there is quite a bit of variation there still) It does this by removing the complex per-cpu locking and counter-cache and replaces it with a percpu counter in struct vfsmount. This makes the code much simpler, and avoids spinlocks (although the msync is still pretty costly, unfortunately). It results in about 900 bytes smaller code too. It does increase the size of a vfsmount, however. It should also give a speedup on large systems if CPUs are frequently operating on different mounts (because the existing scheme has to operate on an atomic in the struct vfsmount when switching between mounts). But I'm most interested in the single threaded path performance for the moment. [AV: minor cleanup] Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Move junk from proc_fs.h to fs/proc/internal.hAl Viro2009-06-11
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch lookup_mnt()Al Viro2009-06-11
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch follow_mount()Al Viro2009-06-11
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch follow_down()Al Viro2009-06-11
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Switch collect_mounts() to struct pathAl Viro2009-06-11
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch follow_up() to struct pathAl Viro2009-06-11
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch rqst_exp_parent()Al Viro2009-06-11
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch rqst_exp_get_by_name()Al Viro2009-06-11
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch exp_parent() to struct pathAl Viro2009-06-11
| | | | | | | ... and lose the always-NULL last argument (non-NULL case had been split off a while ago). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* nfsd struct path use: exp_get_by_name()Al Viro2009-06-11
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Don't bother with check_mnt() in do_add_mount() on shrinkable onesAl Viro2009-06-11
| | | | | | | | These guys are what we add as submounts; checks for "is that attached in our namespace" are simply irrelevant for those and counterproductive for use of private vfsmount trees a-la what NFS folks want. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Make vfs_path_lookup() use starting point as rootAl Viro2009-06-11
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Cache root in nameidataAl Viro2009-06-11
| | | | | | | | | | | | New field: nd->root. When pathname resolution wants to know the root, check if nd->root.mnt is non-NULL; use nd->root if it is, otherwise copy current->fs->root there. After path_walk() is finished, we check if we'd got a cached value in nd->root and drop it. Before calling path_walk() we should either set nd->root.mnt to NULL *or* copy (and pin down) some path to nd->root. In the latter case we won't be looking at current->fs->root at all. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Preparations to caching root in path_walk()Al Viro2009-06-11
| | | | | | | | | | | Split do_path_lookup(), opencode the call from do_filp_open() do_filp_open() is the only caller of do_path_lookup() that cares about root afterwards (it keeps resolving symlinks on O_CREAT path after it'd done LOOKUP_PARENT walk). So when we start caching fs->root in path_walk(), it'll need a different treatment. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Get rid of path_lookup in autofs4Al Viro2009-06-11
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* reiserfs: allow exposing privroot w/ xattrs enabledJeff Mahoney2009-06-11
| | | | | | | This patch adds an -oexpose_privroot option to allow access to the privroot. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds2009-06-11
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (23 commits) Btrfs: fix extent_buffer leak during tree log replay Btrfs: fix oops when btrfs_inherit_iflags called with a NULL dir Btrfs: fix -o nodatasum printk spelling Btrfs: check duplicate backrefs for both data and metadata Btrfs: init worker struct fields before kthread-run Btrfs: pin buffers during write_dev_supers Btrfs: avoid races between super writeout and device list updates Fix btrfs when ACLs are configured out Btrfs: fdatasync should skip metadata writeout Btrfs: remove crc32c.h and use libcrc32c directly. Btrfs: implement FS_IOC_GETFLAGS/SETFLAGS/GETVERSION Btrfs: autodetect SSD devices Btrfs: add mount -o ssd_spread to spread allocations out Btrfs: avoid allocation clusters that are too spread out Btrfs: Add mount -o nossd Btrfs: avoid IO stalls behind congested devices in a multi-device FS Btrfs: don't allow WRITE_SYNC bios to starve out regular writes Btrfs: fix metadata dirty throttling limits Btrfs: reduce mount -o ssd CPU usage Btrfs: balance btree more often ...
| * Btrfs: fix extent_buffer leak during tree log replayChris Mason2009-06-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | During tree log replay, we read in the tree log roots, process them and then free them. A recent change takes an extra reference on the root node of the tree when the root is read in, and stores that reference in root->commit_root. This reference was not being freed, leaving us with one buffer pinned in ram for each subvol with a tree log root after a crash. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fix oops when btrfs_inherit_iflags called with a NULL dirChris Mason2009-06-11
| | | | | | | | | | | | This happens during subvol creation. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fix -o nodatasum printk spellingChris Mason2009-06-11
| | | | | | | | | | | | It was printing nodatacsum, which was not the correct option name. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: check duplicate backrefs for both data and metadataYan Zheng2009-06-11
| | | | | | | | | | | | | | | | | | | | | | | | lookup_inline_extent_backref only checks for duplicate backref for data extents. It assumes backrefs for tree block never conflict. This patch makes lookup_inline_extent_backref check for duplicate backrefs for both data and tree block, so that we can detect potential bug earlier. This is a safety check, strictly speaking it is not required. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: init worker struct fields before kthread-runShin Hong2009-06-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a bug which may result race condition between btrfs_start_workers() and worker_loop(). btrfs_start_workers() executed in a parent thread writes on workers->worker and worker_loop() in a child thread reads workers->worker. However, there is no synchronization enforcing the order of two operations. This patch makes btrfs_start_workers() fill workers->worker before it starts a child thread with worker_loop() Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: pin buffers during write_dev_supersHisashi Hifumi2009-06-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | write_dev_supers is called in sequence. First is it called with wait == 0, which starts IO on all of the super blocks for a given device. Then it is called with wait == 1 to make sure they all reach the disk. It doesn't currently pin the buffers between the two calls, and it also assumes the buffers won't go away between the two calls, leading to an oops if the VM manages to free the buffers in the middle of the sync. This fixes that assumption and updates the code to return an error if things are not up to date when the wait == 1 run is done. Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: avoid races between super writeout and device list updatesChris Mason2009-06-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On multi-device filesystems, btrfs writes supers to all of the devices before considering a sync complete. There wasn't any additional locking between super writeout and the device list management code because device management was done inside a transaction and super writeout only happened with no transation writers running. With the btrfs fsync log and other async transaction updates, this has been racey for some time. This adds a mutex to protect the device list. The existing volume mutex could not be reused due to transaction lock ordering requirements. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Fix btrfs when ACLs are configured outAl Viro2009-06-10
| | | | | | | | | | | | | | | | ... otherwise generic_permission() will allow *anything* for all files you don't own and that have some group permissions. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com>