path: root/fs/namei.c
Commit message (Author, Age)
* fs: brlock vfsmount_lock (Nick Piggin, 2010-08-18)
| Use a brlock for the vfsmount lock. It must be taken for write whenever modifying the mount hash or associated fields, and may be taken for read when performing mount hash lookups.
|
| A new lock is added for the mnt-id allocator, so it doesn't need to take the heavy vfsmount write-lock.
|
| The number of atomics should remain the same for fastpath rlock cases, though code would be slightly slower due to per-cpu access. Scalability will not be much improved in common cases yet, due to other locks (i.e. dcache_lock) getting in the way. However, path lookups crossing mountpoints should be one case where scalability is improved (currently requiring the global lock).
|
| The slowpath is slower due to use of brlock. On a 64 core, 64 socket, 32 node Altix system (high latency to remote nodes), a simple umount microbenchmark (mount --bind mnt mnt2 ; umount mnt2 loop 1000 times) took 6.8s before this patch and 7.1s afterwards, about 5% slower.
|
| Cc: Al Viro <viro@ZenIV.linux.org.uk>
| Signed-off-by: Nick Piggin <npiggin@kernel.dk>
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: remove extra lookup in __lookup_hash (Nick Piggin, 2010-08-18)
| Optimize lookup for create operations, where finding no dentry is often the common case. In cases where it is not, such as unlink, the added overhead is much smaller than the removed.
|
| Also, move comments about __d_lookup raciness to the __d_lookup call site. d_lookup is intuitive; __d_lookup is what needs commenting. So in that same vein, add kerneldoc comments to __d_lookup and clean up some of the comments:
|
| - We are interested in how the RCU lookup works here, particularly with renames. Make that explicit, and point to the document where it is explained in more detail.
| - RCU is pretty standard now, and macros make implementations pretty mindless. If we want to know about RCU barrier details, we look in RCU code.
| - Delete some boring legacy comments because we don't care much about how the code used to work, more about the interesting parts of how it works now. So comments about lazy LRU may be interesting, but would better be done in the LRU or refcount management code.
|
| Signed-off-by: Nick Piggin <npiggin@kernel.dk>
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: dentry allocation consolidation (Nick Piggin, 2010-08-18)
| There are 2 duplicate copies of code in dentry allocation in path lookup. Consolidate them into a single function.
|
| Signed-off-by: Nick Piggin <npiggin@kernel.dk>
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: fix do_lookup false negative (Nick Piggin, 2010-08-18)
| In do_lookup, if we initially find no dentry, we take the directory i_mutex and re-check the lookup. If we find a dentry there, then we revalidate it if needed. However if that revalidate asks for the dentry to be invalidated, we return -ENOENT from do_lookup. What should happen instead is an attempt to allocate and lookup a new dentry.
|
| This is probably not noticed because it is rare. It is only reached if a concurrent create races in first (in which case, the dentry probably won't be invalidated anyway), or if the racy __d_lookup has failed due to a false-negative (which is very rare).
|
| Fix this by removing code and have it use the normal reval path.
|
| Signed-off-by: Nick Piggin <npiggin@kernel.dk>
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: add helpers to get root and pwd (Miklos Szeredi, 2010-08-11)
| Add three helpers that retrieve a refcounted copy of the root and cwd from the supplied fs_struct:
|
|   get_fs_root()
|   get_fs_pwd()
|   get_fs_root_and_pwd()
|
| Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify (Linus Torvalds, 2010-08-10)
|\
| | * 'for-linus' of git://git.infradead.org/users/eparis/notify: (132 commits)
| |   fanotify: use both marks when possible
| |   fsnotify: pass both the vfsmount mark and inode mark
| |   fsnotify: walk the inode and vfsmount lists simultaneously
| |   fsnotify: rework ignored mark flushing
| |   fsnotify: remove global fsnotify groups lists
| |   fsnotify: remove group->mask
| |   fsnotify: remove the global masks
| |   fsnotify: cleanup should_send_event
| |   fanotify: use the mark in handler functions
| |   audit: use the mark in handler functions
| |   dnotify: use the mark in handler functions
| |   inotify: use the mark in handler functions
| |   fsnotify: send fsnotify_mark to groups in event handling functions
| |   fsnotify: Exchange list heads instead of moving elements
| |   fsnotify: srcu to protect read side of inode and vfsmount locks
| |   fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called
| |   fsnotify: use _rcu functions for mark list traversal
| |   fsnotify: place marks on object in order of group memory address
| |   vfs/fsnotify: fsnotify_close can delay the final work in fput
| |   fsnotify: store struct file not struct path
| |   ...
| |
| | Fix up trivial delete/modify conflict in fs/notify/inotify/inotify.c.
| * fsnotify: use unsigned char * for dentry->d_name.name (Eric Paris, 2010-07-28)
| | fsnotify was using char * when it passed around the d_name.name string internally but it is actually an unsigned char *. This patch switches fsnotify to use unsigned and should silence some pointer signedness warnings which have popped out of xfs. I do not add -Wpointer-sign to the fsnotify code as there are still issues with kstrdup and strlen which would pop out needless warnings.
| |
| | Signed-off-by: Eric Paris <eparis@redhat.com>
* | security: make LSMs explicitly mask off permissions (Eric Paris, 2010-08-02)
| | SELinux needs to pass the MAY_ACCESS flag so it can handle auditing correctly. Presently the masking of MAY_* flags is done in the VFS. In order to allow LSMs to decide what flags they care about and what flags they don't, just pass them all and have each LSM mask off what it doesn't need. This patch should contain no functional changes to either the VFS or any LSM.
| |
| | Signed-off-by: Eric Paris <eparis@redhat.com>
| | Acked-by: Stephen D. Smalley <sds@tycho.nsa.gov>
| | Signed-off-by: James Morris <jmorris@namei.org>
* | LSM: Remove unused arguments from security_path_truncate(). (Tetsuo Handa, 2010-08-02)
|/
| When commit be6d3e56a6b9b3a4ee44a0685e39e595073c6f0d "introduce new LSM hooks where vfsmount is available." was proposed, regarding security_path_truncate(), only "struct file *" argument (which AppArmor wanted to use) was removed. But length and time_attrs arguments are not used by TOMOYO nor AppArmor. Thus, let's remove these arguments.
|
| Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
| Acked-by: Nick Piggin <npiggin@suse.de>
| Signed-off-by: James Morris <jmorris@namei.org>
* VFS: fix recent breakage of FS_REVAL_DOT (Neil Brown, 2010-05-27)
| Commit 1f36f774b22a0ceb7dd33eca626746c81a97b6a5 broke FS_REVAL_DOT semantics.
|
| In particular, before this patch, the command
|   ls -l
| in an NFS mounted directory would always check if the directory on the server had changed and if so would flush and refill the pagecache for the dir. After this patch, the same "ls -l" will repeatedly return stale data until the cached attributes for the directory time out.
|
| The following patch fixes this by ensuring the d_revalidate is called by do_last when "." is being looked-up. link_path_walk has already called d_revalidate, but in that case LOOKUP_OPEN is not set so nfs_lookup_verify_inode chooses not to do any validation.
|
| The following patch restores the original behaviour.
|
| Cc: stable@kernel.org
| Signed-off-by: NeilBrown <neilb@suse.de>
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* namei.c : update mnt when it needed (Huang Shijie, 2010-05-21)
| update the mnt of the path when it is not equal to the new one.
|
| Signed-off-by: Huang Shijie <shijie8@gmail.com>
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Fix the regression created by "set S_DEAD on unlink()..." commit (Al Viro, 2010-05-15)
| 1) i_flags simply doesn't work for mount/unlink race prevention; we may have many links to file and rm on one of those obviously shouldn't prevent bind on top of another later on. To fix it the right way we need to mark _dentry_ as unsuitable for mounting upon; new flag (DCACHE_CANT_MOUNT) is protected by d_flags and i_mutex on the inode in question. Set it (with dont_mount(dentry)) in unlink/rmdir/etc., check (with cant_mount(dentry)) in places in namespace.c that used to check for S_DEAD. Setting S_DEAD is still needed in places where we used to set it (for directories getting killed), since we rely on it for readdir/rmdir race prevention.
|
| 2) rename()/mount() protection has another bogosity - we unhash the target before we'd checked that it's not a mountpoint. Fixed.
|
| 3) ancient bogosity in pivot_root() - we locked i_mutex on the right directory, but checked S_DEAD on the different (and wrong) one. Noticed and fixed.
|
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Fix O_NOFOLLOW behavior for paths with trailing slashes (Jan Kara, 2010-05-13)
| According to the specification,
|   mkdir d; ln -s d a; open("a/", O_NOFOLLOW | O_RDONLY)
| should return success, but currently it returns ELOOP. This is a regression caused by the path lookup cleanup patch series.
|
| Fix the code to ignore O_NOFOLLOW in case the provided path has trailing slashes.
|
| Cc: Andrew Morton <akpm@linux-foundation.org>
| Cc: Al Viro <viro@zeniv.linux.org.uk>
| Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de>
| Acked-by: Miklos Szeredi <mszeredi@suse.cz>
| Signed-off-by: Jan Kara <jack@suse.cz>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
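The reproducer in the commit message above can be tried from userspace. The following is only a sketch under stated assumptions: `make_fixture()` and `try_open_nofollow()` are hypothetical helper names, and `/tmp` is assumed to be writable. It builds the `d` / `a -> d` layout in a temporary directory and reports what `open()` returns with `O_NOFOLLOW | O_RDONLY`.

```c
#define _XOPEN_SOURCE 700   /* for mkdtemp(), symlink(), O_NOFOLLOW */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create directory "d" and symlink "a" -> "d" inside a
 * fresh temporary directory whose path is written to `dir`.
 * Returns 0 on success, -1 on failure. */
int make_fixture(char *dir, size_t len)
{
    char path[256];

    assert(dir != NULL);
    snprintf(dir, len, "/tmp/nofollow-XXXXXX");
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof(path), "%s/d", dir);
    if (mkdir(path, 0755) != 0)
        return -1;
    snprintf(path, sizeof(path), "%s/a", dir);
    return symlink("d", path);
}

/* Hypothetical helper: open dir/name with O_NOFOLLOW | O_RDONLY.
 * Returns 0 if open() succeeded, otherwise the errno it failed with. */
int try_open_nofollow(const char *dir, const char *name)
{
    char path[256];
    int fd;

    snprintf(path, sizeof(path), "%s/%s", dir, name);
    fd = open(path, O_NOFOLLOW | O_RDONLY);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

On a kernel carrying this fix, `try_open_nofollow(dir, "a/")` is expected to return 0, while `try_open_nofollow(dir, "a")` is expected to return `ELOOP`, since without the trailing slash the final component is the symlink itself.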
* Restore LOOKUP_DIRECTORY hint handling in final lookup on open() (Al Viro, 2010-03-26)
| Lose want_dir argument, while we are at it - since now nd->flags & LOOKUP_DIRECTORY is equivalent to it.
|
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-next' into for-linus (Jiri Kosina, 2010-03-08)
|\
| | Conflicts:
| |   Documentation/filesystems/proc.txt
| |   arch/arm/mach-u300/include/mach/debug-macro.S
| |   drivers/net/qlge/qlge_ethtool.c
| |   drivers/net/qlge/qlge_main.c
| |   drivers/net/typhoon.c
| * Fix misspellings of "truly" in comments. (Adam Buchbinder, 2010-02-04)
| | Some comments misspell "truly"; this fixes them. No code changes.
| |
| | Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
| | Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* | Fix a dumb typo - use of & instead of && (Al Viro, 2010-03-06)
| | We managed to lose O_DIRECTORY testing due to a stupid typo in commit 1f36f774b2 ("Switch !O_CREAT case to use of do_last()")
| |
| | Reported-by: Walter Sheets <w41ter@gmail.com>
| | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
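To see why a stray `&` can silently drop an O_DIRECTORY test, here is a small illustration of this class of bug. It is not the actual line from fs/namei.c; the function names and the `is_directory` parameter are invented for the example. The point is that `!x` evaluates to 0 or 1, and O_DIRECTORY does not occupy bit 0, so the bitwise AND of the two is always 0.

```c
#define _XOPEN_SOURCE 700   /* for O_DIRECTORY */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* Buggy form (hypothetical): bitwise AND of a flag bit with a 0/1 value.
 * O_DIRECTORY is not bit 0, so the result is always 0 and the check
 * never fires. */
bool must_reject_buggy(int open_flags, bool is_directory)
{
    return (open_flags & O_DIRECTORY) & !is_directory;
}

/* Fixed form (hypothetical): logical AND, true when O_DIRECTORY was
 * requested but the object is not a directory. */
bool must_reject_fixed(int open_flags, bool is_directory)
{
    return (open_flags & O_DIRECTORY) && !is_directory;
}
```

With `O_RDONLY | O_DIRECTORY` and a non-directory, the buggy variant lets the open through while the fixed variant rejects it.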
* | Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 (Linus Torvalds, 2010-03-05)
|\ \
| | | * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
| | |   quota: stop using QUOTA_OK / NO_QUOTA
| | |   dquot: cleanup dquot initialize routine
| | |   dquot: move dquot initialization responsibility into the filesystem
| | |   dquot: cleanup dquot drop routine
| | |   dquot: move dquot drop responsibility into the filesystem
| | |   dquot: cleanup dquot transfer routine
| | |   dquot: move dquot transfer responsibility into the filesystem
| | |   dquot: cleanup inode allocation / freeing routines
| | |   dquot: cleanup space allocation / freeing routines
| | |   ext3: add writepage sanity checks
| | |   ext3: Truncate allocated blocks if direct IO write fails to update i_size
| | |   quota: Properly invalidate caches even for filesystems with blocksize < pagesize
| | |   quota: generalize quota transfer interface
| | |   quota: sb_quota state flags cleanup
| | |   jbd: Delay discarding buffers in journal_unmap_buffer
| | |   ext3: quota_write cross block boundary behaviour
| | |   quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
| | |   quota: split out compat_sys_quotactl support from quota.c
| | |   quota: split out netlink notification support from quota.c
| | |   quota: remove invalid optimization from quota_sync_all
| | |   ...
| | |
| | | Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c
| * | dquot: move dquot initialization responsibility into the filesystem (Christoph Hellwig, 2010-03-04)
| | | Currently various places in the VFS call vfs_dq_init directly. This means we tie the quota code into the VFS. Get rid of that and make the filesystem responsible for the initialization. For most metadata operations this is a straightforward move into the methods, but for truncate and open it's a bit more complicated.
| | |
| | | For truncate we currently only call vfs_dq_init for the sys_truncate case because open already takes care of it for ftruncate and open(O_TRUNC) - the new code causes an additional vfs_dq_init for those which is harmless.
| | |
| | | For open the initialization is moved from do_filp_open into the open method, which means it happens slightly earlier now, and only for regular files. The latter is fine because we don't need to initialize it for operations on special files, and we already do it as part of the namespace operations for directories.
| | |
| | | Add a dquot_file_open helper that filesystems that support generic quotas can use to fill in ->open.
| | |
| | | Signed-off-by: Christoph Hellwig <hch@lst.de>
| | | Signed-off-by: Jan Kara <jack@suse.cz>
* | | Switch !O_CREAT case to use of do_last() (Al Viro, 2010-03-05)
| | | ... and now we have all intents crap well localized
| | |
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Get rid of symlink body copying (Al Viro, 2010-03-05)
| | | Now that nd->last stays around until ->put_link() is called, we can just postpone that ->put_link() in do_filp_open() a bit and don't bother with copying.
| | |
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Finish pulling of -ESTALE handling to upper level in do_filp_open() (Al Viro, 2010-03-05)
| | | Don't bother with path_walk() (and its retry loop); link_path_walk() will do it.
| | |
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Turn do_link spaghetty into a normal loop (Al Viro, 2010-03-05)
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Unify exits in O_CREAT handling (Al Viro, 2010-03-05)
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Kill is_link argument of do_last() (Al Viro, 2010-03-05)
| | | We set it to 1 iff we return NULL
| | |
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Pull handling of LAST_BIND into do_last(), clean up ok: part in do_filp_open() (Al Viro, 2010-03-05)
| | | Note that in case of !O_CREAT we know that nd.root has already been given up
| | |
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Leave mangled flag only for setting nd.intent.open.flag (Al Viro, 2010-03-05)
| | | Nothing else uses it anymore
| | |
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Get rid of passing mangled flag to do_last() (Al Viro, 2010-03-05)
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Don't pass mangled open_flag to finish_open() (Al Viro, 2010-03-05)
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | pull more into do_last() (Al Viro, 2010-03-05)
| | | Handling of LAST_DOT/LAST_ROOT/LAST_DOTDOT/terminating slash can be pulled in as well
| | |
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | bail out with ELOOP earlier in do_link loop (Al Viro, 2010-03-05)
| | | If we'd passed through 32 trailing symlinks already, there's no sense following the 33rd - we'll bail out anyway. Better bugger off earlier.
| | |
| | | It *does* change behaviour, after a fashion - if the 33rd happens to be a procfs-style symlink, original code *would* allow it. This one will not. Cry me a river if that hurts you. Please, do. And post a video of that, while you are at it.
| | |
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
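The early bail-out described in the commit above can be pictured with a toy loop. This is a userspace sketch, not kernel code: `resolve_trailing_symlinks()`, `MAX_TRAILING_SYMLINKS` and the idea of passing the chain length as an integer are all assumptions made for illustration. The limit is checked before the next link is followed, so a 33rd trailing symlink is refused regardless of what kind of link it is.

```c
#include <assert.h>
#include <errno.h>

#define MAX_TRAILING_SYMLINKS 32   /* limit quoted in the commit message */

/* Hypothetical model: `chain_len` is how many symlinks sit in a row at the
 * end of the path. Returns 0 if the chain resolves, -ELOOP if it is too long. */
int resolve_trailing_symlinks(int chain_len)
{
    int followed = 0;

    while (followed < chain_len) {           /* last component is still a symlink */
        if (followed == MAX_TRAILING_SYMLINKS)
            return -ELOOP;                   /* refuse before following the 33rd */
        followed++;                          /* stand-in for following one link */
    }
    return 0;
}
```

In this model a chain of 32 links resolves and a chain of 33 fails with `-ELOOP`.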
* | | pull the common predecessors into do_last() (Al Viro, 2010-03-05)
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | postpone __putname() until after do_last() (Al Viro, 2010-03-05)
| | | Since do_last() doesn't mangle nd->last_name, we can safely postpone __putname() done in handling of trailing symlinks until after the call of do_last()
| | |
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | unroll do_last: loop in do_filp_open() (Al Viro, 2010-03-05)
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Shift releasing nd->root from do_last() to its caller (Al Viro, 2010-03-05)
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | gut do_filp_open() a bit more (do_last separation) (Al Viro, 2010-03-05)
| | | Brute-force separation of stuff reachable from do_last: with the exception of do_link:; just take all that crap to a helper function as-is and have it tell the caller if it has to go to do_link.
| | |
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | beginning to untangle do_filp_open() (Al Viro, 2010-03-05)
| | | That's going to be a long and painful series. The first step: take the stuff reachable from 'ok' label in do_filp_open() into a new helper (finish_open()).
| | |
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Merge branch 'for-fsnotify' into for-linus (Al Viro, 2010-03-03)
|\ \ \
| * | | Lose the first argument of audit_inode_child() (Al Viro, 2010-02-08)
| | | | it's always equal to ->d_name.name of the second argument
| | | |
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | Lose the new_name argument of fsnotify_move() (Al Viro, 2010-02-08)
| | | | it's always new_dentry->d_name.name
| | | |
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | set S_DEAD on unlink() and non-directory rename() victims (Al Viro, 2010-03-03)
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | Clean follow_dotdot() up a bit (Al Viro, 2010-03-03)
| | | | No need to open-code follow_up() in it and locking can be lighter.
| | | |
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | Switch may_open() and break_lease() to passing O_... (Al Viro, 2010-03-03)
| |/ / ... instead of mixing FMODE_ and O_
|/| |
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | fix LOOKUP_FOLLOW on automount "symlinks" (Al Viro, 2010-02-19)
|/ /
| | Make sure that automount "symlinks" are followed regardless of LOOKUP_FOLLOW; it should have no effect on them.
| |
| | Cc: stable@kernel.org
| | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | ima: rename ima_path_check to ima_file_check (Mimi Zohar, 2010-02-07)
| | ima_path_check actually deals with files! call it ima_file_check instead.
| |
| | Signed-off-by: Eric Paris <eparis@redhat.com>
| | Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
| | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | fix ima breakage (Mimi Zohar, 2010-02-07)
|/
| The "Untangling ima mess, part 2 with counters" patch messed up the counters. Based on conversations with Al Viro, this patch streamlines ima_path_check() by removing the counter maintenance. The counters are now updated independently, from measuring the file, in __dentry_open() and alloc_file() by calling ima_counts_get(). ima_path_check() is called from nfsd and do_filp_open(). It also did not measure all files that should have been measured. Reason: ima_path_check() got bogus value passed as mask.
|
| [AV: mea culpa]
| [AV: add missing nfsd bits]
|
| Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Fix the -ESTALE handling in do_filp_open() (Al Viro, 2010-01-14)
| Instead of playing sick games with path saving, cleanups, just retry the entire thing once with LOOKUP_REVAL added. Post-.34 we'll convert all -ESTALE handling in there to that style, rather than playing with many retry loops deep in the call chain.
|
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
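The "retry the entire thing once with LOOKUP_REVAL added" idea from the commit above can be sketched outside the kernel as a wrapper around a lookup callback. Everything here is hypothetical scaffolding: the `lookup_fn` type, `lookup_with_estale_retry()`, `fake_lookup()` and the numeric value given to `LOOKUP_REVAL` are assumptions for illustration and are not taken from fs/namei.c.

```c
#include <assert.h>
#include <errno.h>

#define LOOKUP_REVAL 0x20u   /* illustrative value only */

/* Hypothetical lookup callback: returns 0 or a negative errno. */
typedef int (*lookup_fn)(const char *path, unsigned flags, void *ctx);

/* Run the whole lookup once; if it reports a stale handle, run it exactly
 * one more time from the top with revalidation forced. */
int lookup_with_estale_retry(lookup_fn do_lookup, const char *path,
                             unsigned flags, void *ctx)
{
    int err = do_lookup(path, flags, ctx);

    if (err == -ESTALE && !(flags & LOOKUP_REVAL))
        err = do_lookup(path, flags | LOOKUP_REVAL, ctx);
    return err;
}

/* Fake backend for demonstration: cached data is stale unless the caller
 * forces revalidation. `ctx` points to a call counter. */
int fake_lookup(const char *path, unsigned flags, void *ctx)
{
    int *calls = ctx;

    (void)path;
    ++*calls;
    return (flags & LOOKUP_REVAL) ? 0 : -ESTALE;
}
```

Calling `lookup_with_estale_retry(fake_lookup, "/mnt/x", 0, &calls)` with `calls` starting at 0 succeeds on the second attempt. Keeping the retry at the top level means the inner code does not need its own save-and-restore logic.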
* Fix ACC_MODE() for real (Al Viro, 2010-01-14)
| commit 5300990c0370e804e49d9a59d928c5d53fb73487 had stepped on a rather nasty mess: definitions of ACC_MODE used to be different. Fixed the resulting breakage, converting them to variant that takes O_... value; all callers have that and it actually simplifies life (see tomoyo part of changes).
|
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
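For readers unfamiliar with ACC_MODE(), the variant that takes an O_... value maps the access-mode bits of the open flags to MAY_READ/MAY_WRITE permission bits through a four-entry lookup string. The sketch below reproduces that idea in userspace. The table contents and the MAY_* values are written from memory and should be treated as assumptions, not as a copy of the kernel header; `acc_mode()` is a hypothetical wrapper.

```c
#include <assert.h>
#include <fcntl.h>

/* Illustrative copies of the permission bits (assumed values). */
#define MAY_WRITE 2
#define MAY_READ  4

/* Index a 4-byte table by the access mode:
 * O_RDONLY (0) -> read, O_WRONLY (1) -> write, O_RDWR (2) -> both,
 * and the invalid value 3 is treated as read+write. */
#define ACC_MODE(x) ("\004\002\006\006"[(x) & O_ACCMODE])

/* Hypothetical wrapper so the macro can be called like a function. */
int acc_mode(int open_flags)
{
    return ACC_MODE(open_flags);
}
```

Because the macro masks with `O_ACCMODE`, other flags such as `O_CREAT` do not affect the result.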
* fix autofs/afs/etc. magic mountpoint breakage (Al Viro, 2010-01-14)
| We end up trying to kfree() nd.last.name on open("/mnt/tmp", O_CREAT) if /mnt/tmp is an autofs direct mount. The reason is that nd.last_type is bogus here; we want LAST_BIND for everything of that kind and we get LAST_NORM left over from finding parent directory.
|
| So make sure that it *is* set properly; set to LAST_BIND before doing ->follow_link() - for normal symlinks it will be changed by __vfs_follow_link() and everything else needs it set that way.
|
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* generic_permission: MAY_OPEN is not write access (Serge E. Hallyn, 2009-12-30)
| generic_permission was refusing CAP_DAC_READ_SEARCH-enabled processes from opening DAC-protected files read-only, because do_filp_open adds MAY_OPEN to the open mask.
|
| Ignore MAY_OPEN. After this patch, CAP_DAC_READ_SEARCH is again sufficient to open(fname, O_RDONLY) on a file to which DAC otherwise refuses us read permission.
|
| Reported-by: Mike Kazantsev <mk.fraggod@gmail.com>
| Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
| Tested-by: Mike Kazantsev <mk.fraggod@gmail.com>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
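The failure mode in the commit above comes down to an exact-equality test on a mask that has gained an extra bit. The following sketch models it in userspace; the function names are hypothetical, the MAY_* values are assumed, and the real generic_permission() contains more logic (directory handling, capability checks) than is shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative permission bits (assumed values). */
#define MAY_EXEC  1
#define MAY_WRITE 2
#define MAY_READ  4
#define MAY_OPEN  32

/* Before: a read-only open arrives as MAY_READ | MAY_OPEN, so the
 * "pure read" test fails and the read/search override is skipped. */
bool read_override_applies_buggy(int mask)
{
    return mask == MAY_READ;
}

/* After: strip everything except the rwx bits first, so MAY_OPEN no
 * longer makes a read-only request look like something else. */
bool read_override_applies_fixed(int mask)
{
    mask &= MAY_READ | MAY_WRITE | MAY_EXEC;
    return mask == MAY_READ;
}
```

A request for write access still fails the fixed test, so only genuinely read-only opens benefit from the override.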