aboutsummaryrefslogtreecommitdiffstats
path: root/fs/namespace.c
Commit message (Collapse)AuthorAge
* vfs, freeze: use ACCESS_ONCE() to guard access to ->mnt_flagsMiao Xie2012-12-20
| | | | | | | | The compiler may optimize the while loop and make the check just be done once, so we should use ACCESS_ONCE() to guard access to ->mnt_flags Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* userns: Require CAP_SYS_ADMIN for most uses of setns.Eric W. Biederman2012-12-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andy Lutomirski <luto@amacapital.net> found a nasty little bug in the permissions of setns. With unprivileged user namespaces it became possible to create new namespaces without privilege. However the setns calls were relaxed to only require CAP_SYS_ADMIN in the user nameapce of the targed namespace. Which made the following nasty sequence possible. pid = clone(CLONE_NEWUSER | CLONE_NEWNS); if (pid == 0) { /* child */ system("mount --bind /home/me/passwd /etc/passwd"); } else if (pid != 0) { /* parent */ char path[PATH_MAX]; snprintf(path, sizeof(path), "/proc/%u/ns/mnt"); fd = open(path, O_RDONLY); setns(fd, 0); system("su -"); } Prevent this possibility by requiring CAP_SYS_ADMIN in the current user namespace when joing all but the user namespace. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* proc: Usable inode numbers for the namespace file descriptors.Eric W. Biederman2012-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | Assign a unique proc inode to each namespace, and use that inode number to ensure we only allocate at most one proc inode for every namespace in proc. A single proc inode per namespace allows userspace to test to see if two processes are in the same namespace. This has been a long requested feature and only blocked because a naive implementation would put the id in a global space and would ultimately require having a namespace for the names of namespaces, making migration and certain virtualization tricks impossible. We still don't have per superblock inode numbers for proc, which appears necessary for application unaware checkpoint/restart and migrations (if the application is using namespace file descriptors) but that is now allowd by the design if it becomes important. I have preallocated the ipc and uts initial proc inode numbers so their structures can be statically initialized. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
* userns: fix return value on mntns_install() failureZhao Hongjiang2012-11-19
| | | | | | | Change return value from -EINVAL to -EPERM when the permission check fails. Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
* vfs: Allow unprivileged manipulation of the mount namespace.Eric W. Biederman2012-11-19
| | | | | | | | | | | | | | | | - Add a filesystem flag to mark filesystems that are safe to mount as an unprivileged user. - Add a filesystem flag to mark filesystems that don't need MNT_NODEV when mounted by an unprivileged user. - Relax the permission checks to allow unprivileged users that have CAP_SYS_ADMIN permissions in the user namespace referred to by the current mount namespace to be allowed to mount, unmount, and move filesystems. Acked-by: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* vfs: Only support slave subtrees across different user namespacesEric W. Biederman2012-11-19
| | | | | | | | | | | | Sharing mount subtress with mount namespaces created by unprivileged users allows unprivileged mounts created by unprivileged users to propagate to mount namespaces controlled by privileged users. Prevent nasty consequences by changing shared subtrees to slave subtress when an unprivileged users creates a new mount namespace. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* vfs: Add a user namespace reference from struct mnt_namespaceEric W. Biederman2012-11-19
| | | | | | | This will allow for support for unprivileged mounts in a new user namespace. Acked-by: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* vfs: Add setns support for the mount namespaceEric W. Biederman2012-11-19
| | | | | | | | | | | | | | | | | | | | | | | | | setns support for the mount namespace is a little tricky as an arbitrary decision must be made about what to set fs->root and fs->pwd to, as there is no expectation of a relationship between the two mount namespaces. Therefore I arbitrarily find the root mount point, and follow every mount on top of it to find the top of the mount stack. Then I set fs->root and fs->pwd to that location. The topmost root of the mount stack seems like a reasonable place to be. Bind mount support for the mount namespace inodes has the possibility of creating circular dependencies between mount namespaces. Circular dependencies can result in loops that prevent mount namespaces from every being freed. I avoid creating those circular dependencies by adding a sequence number to the mount namespace and require all bind mounts be of a younger mount namespace into an older mount namespace. Add a helper function proc_ns_inode so it is possible to detect when we are attempting to bind mound a namespace inode. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
* vfs: define struct filename and have getname() return itJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | | getname() is intended to copy pathname strings from userspace into a kernel buffer. The result is just a string in kernel space. It would however be quite helpful to be able to attach some ancillary info to the string. For instance, we could attach some audit-related info to reduce the amount of audit-related processing needed. When auditing is enabled, we could also call getname() on the string more than once and not need to recopy it from userspace. This patchset converts the getname()/putname() interfaces to return a struct instead of a string. For now, the struct just tracks the string in kernel space and the original userland pointer for it. Later, we'll add other information to the struct as it becomes convenient. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* consitify do_mount() argumentsAl Viro2012-10-11
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* do_add_mount()/umount -l racesAl Viro2012-09-22
| | | | | | | | | | | | | | normally we deal with lock_mount()/umount races by checking that mountpoint to be is still in our namespace after lock_mount() has been done. However, do_add_mount() skips that check when called with MNT_SHRINKABLE in flags (i.e. from finish_automount()). The reason is that ->mnt_ns may be a temporary namespace created exactly to contain automounts a-la NFS4 referral handling. It's not the namespace of the caller, though, so check_mnt() would fail here. We still need to check that ->mnt_ns is non-NULL in that case, though. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: Add freezing handling to mnt_want_write() / mnt_drop_write()Jan Kara2012-07-31
| | | | | | | | | | | | | | | | Most of places where we want freeze protection coincides with the places where we also have remount-ro protection. So make mnt_want_write() and mnt_drop_write() (and their _file alternative) prevent freezing as well. For the few cases that are really interested only in remount-ro protection provide new function variants. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Comment mount following codeDavid Howells2012-07-14
| | | | | | | | | Add comments describing what the directions "up" and "down" mean and ref count handling to the VFS mount following family of functions. Signed-off-by: Valerie Aurora <vaurora@redhat.com> (Original author) Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errorsDavid Howells2012-07-14
| | | | | | | | | | | | | | | | | | copy_tree() can theoretically fail in a case other than ENOMEM, but always returns NULL which is interpreted by callers as -ENOMEM. Change it to return an explicit error. Also change clone_mnt() for consistency and because union mounts will add new error cases. Thanks to Andreas Gruenbacher <agruen@suse.de> for a bug fix. [AV: folded braino fix by Dan Carpenter] Original-author: Valerie Aurora <vaurora@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Valerie Aurora <valerie.aurora@gmail.com> Cc: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of magic in proc_namespace.cAl Viro2012-07-14
| | | | | | | | don't rely on proc_mounts->m being the first field; container_of() is there for purpose. No need to bother with ->private, while we are at it - the same container_of will do nicely. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of ->mnt_longtermAl Viro2012-07-14
| | | | | | | | | it's enough to set ->mnt_ns of internal vfsmounts to something distinct from all struct mnt_namespace out there; then we can just use the check for ->mnt_ns != NULL in the fast path of mntput_no_expire() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: umount_tree() might be called on subtree that had never made itAl Viro2012-05-30
| | | | | | | | | | __mnt_make_shortterm() in there undoes the effect of __mnt_make_longterm() we'd done back when we set ->mnt_ns non-NULL; it should not be done to vfsmounts that had never gone through commit_tree() and friends. Kudos to lczerner for catching that one... Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* brlocks/lglocks: API cleanupsAndi Kleen2012-05-29
| | | | | | | | | | | | | | | | | | | | | lglocks and brlocks are currently generated with some complicated macros in lglock.h. But there's no reason to not just use common utility functions and put all the data into a common data structure. In preparation, this patch changes the API to look more like normal function calls with pointers, not magic macros. The patch is rather large because I move over all users in one go to keep it bisectable. This impacts the VFS somewhat in terms of lines changed. But no actual behaviour change. [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of ↵Linus Torvalds2012-01-08
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits) Kconfig: acpi: Fix typo in comment. misc latin1 to utf8 conversions devres: Fix a typo in devm_kfree comment btrfs: free-space-cache.c: remove extra semicolon. fat: Spelling s/obsolate/obsolete/g SCSI, pmcraid: Fix spelling error in a pmcraid_err() call tools/power turbostat: update fields in manpage mac80211: drop spelling fix types.h: fix comment spelling for 'architectures' typo fixes: aera -> area, exntension -> extension devices.txt: Fix typo of 'VMware'. sis900: Fix enum typo 'sis900_rx_bufer_status' decompress_bunzip2: remove invalid vi modeline treewide: Fix comment and string typo 'bufer' hyper-v: Update MAINTAINERS treewide: Fix typos in various parts of the kernel, and fix some comments. clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR gpio: Kconfig: drop unknown symbol 'CS5535_GPIO' leds: Kconfig: Fix typo 'D2NET_V2' sound: Kconfig: drop unknown symbol ARCH_CLPS7500 ... Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new kconfig additions, close to removed commented-out old ones)
| * Merge branch 'master' into for-nextJiri Kosina2011-11-13
| |\ | | | | | | | | | | | | Sync with Linus tree to have 157550ff ("mtd: add GPMI-NAND driver in the config and Makefile") as I have patch depending on that one.
| * | namespace: mnt_want_write: Remove unused label 'out'Kautuk Consul2011-10-29
| | | | | | | | | | | | | | | | | | | | | | | | I was studying the code and I saw that the out label is not being used at all so I removed it and its usage from the function. Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* | | vfs: prevent remount read-only if pending removesMiklos Szeredi2012-01-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If there are any inodes on the super block that have been unlinked (i_nlink == 0) but have not yet been deleted then prevent the remounting the super block read-only. Reported-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: protect remounting superblock read-onlyMiklos Szeredi2012-01-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently remouting superblock read-only is racy in a major way. With the per mount read-only infrastructure it is now possible to prevent most races, which this patch attempts. Before starting the remount read-only, iterate through all mounts belonging to the superblock and if none of them have any pending writes, set sb->s_readonly_remount. This indicates that remount is in progress and no further write requests are allowed. If the remount succeeds set MS_RDONLY and reset s_readonly_remount. If the remounting is unsuccessful just reset s_readonly_remount. This can result in transient EROFS errors, despite the fact the remount failed. Unfortunately hodling off writes is difficult as remount itself may touch the filesystem (e.g. through load_nls()) which would deadlock. A later patch deals with delayed writes due to nlink going to zero. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: keep list of mounts for each superblockMiklos Szeredi2012-01-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | Keep track of vfsmounts belonging to a superblock. List is protected by vfsmount_lock. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: switch ->show_options() to struct dentry *Al Viro2012-01-06
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: trim includes a bitAl Viro2012-01-03
| | | | | | | | | | | | | | | | | | [folded fix for missing magic.h from Tetsuo Handa] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | switch mnt_namespace ->root to struct mountAl Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: take /proc/*/mounts and friends to fs/proc_namespace.cAl Viro2012-01-03
| | | | | | | | | | | | | | | | | | | | | rationale: that stuff is far tighter bound to fs/namespace.c than to the guts of procfs proper. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: opencode mntget() mnt_set_mountpoint()Al Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: spread struct mount - remaining argument of next_mnt()Al Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: move fsnotify junk to struct mountAl Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: move mnt_devnameAl Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: move mnt_list to struct mountAl Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: switch pnode.h macros to struct mount *Al Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: move the rest of int fields to struct mountAl Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: mnt_id/mnt_group_id movedAl Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: mnt_ns moved to struct mountAl Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: spread struct mount - mntput_no_expireAl Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: spread struct mount - do_add_mount and graft_treeAl Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: take mnt_share/mnt_slave/mnt_slave_list and mnt_expire to struct mountAl Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: and now we can make ->mnt_master point to struct mountAl Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: take mnt_master to struct mountAl Viro2012-01-03
| | | | | | | | | | | | | | | | | | make IS_MNT_SLAVE take struct mount * at the same time Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: spread struct mount - remaining argument of mnt_set_mountpoint()Al Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: spread struct mount - propagate_mnt()Al Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: spread struct mount - get_dominating_id / do_make_slaveAl Viro2012-01-03
| | | | | | | | | | | | | | | | | | | | | next pile of horrors, similar to mnt_parent one; this time it's mnt_master. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: take mnt_child/mnt_mounts to struct mountAl Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: all counters taken to struct mountAl Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: spread struct mount - work with countersAl Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: move mnt_mountpoint to struct mountAl Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: now it can be done - make mnt_parent point to struct mountAl Viro2012-01-03
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>