path: root/fs
Commit message | Author | Age
* ceph: clean up header guardsSage Weil2010-08-01
| | | | Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: strip misleading/obsolete version, feature infoSage Weil2010-08-01
| | | | Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: specify supported features in super.hSage Weil2010-08-01
| | | | | | | | Specify the supported/required feature bits in super.h client code instead of using the definitions from the shared kernel/userspace headers (which will go away shortly). Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: clean up fsid mount optionSage Weil2010-08-01
| | | | | | | Specify the fsid mount option in hex, not via the major/minor u64 hackery we had before. Signed-off-by: Sage Weil <sage@newdream.net>
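As an illustration of what "fsid in hex" can mean in practice, here is a minimal userspace sketch that turns a hex string into a 16-byte identifier. The helper name `demo_parse_fsid_hex`, the 16-byte width, and the tolerance for dashes are assumptions made for this example; the commit message does not spell out the accepted syntax.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Convert one hex digit to its value, or return -1 if it is not a hex digit. */
static int demo_hex_val(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = (char)tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/*
 * Hypothetical helper: parse exactly 32 hex digits (dashes are skipped)
 * into a 16-byte id. Returns 0 on success, -1 on malformed input.
 */
int demo_parse_fsid_hex(const char *s, uint8_t out[16])
{
    size_t nibbles = 0;

    for (; *s; s++) {
        int v;

        if (*s == '-')
            continue;
        v = demo_hex_val(*s);
        if (v < 0 || nibbles >= 32)
            return -1;
        if (nibbles % 2 == 0)
            out[nibbles / 2] = (uint8_t)(v << 4);   /* high nibble */
        else
            out[nibbles / 2] |= (uint8_t)v;         /* low nibble */
        nibbles++;
    }
    return nibbles == 32 ? 0 : -1;
}
```

A single hex string avoids splitting the identifier into two separately supplied 64-bit halves, which is the kind of "major/minor" arrangement the commit describes removing.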
* ceph: remove unused 'monport' mount optionSage Weil2010-08-01
| | | | Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: handle ESTALE properly; on receipt send to authority if it wasn'tGreg Farnum2010-08-01
| | | | | Signed-off-by: Greg Farnum <gregf@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: add ceph_get_cap_for_mds function.Greg Farnum2010-08-01
| | | | | Signed-off-by: Greg Farnum <gregf@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: connect to export targets on cap exportSage Weil2010-08-01
| | | | | | | When we get a cap EXPORT message, make sure we are connected to all export targets to ensure we can handle the matching IMPORT. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: connect to export targets if mds is laggySage Weil2010-08-01
| | | | | | | | If an MDS we are talking to may have failed, we need to open sessions to its potential export targets to ensure that any in-progress migration that may have involved some of our caps is properly handled. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: introduce helper to connect to mds export targetsSage Weil2010-08-01
| | | | | | | There are a few cases where we need to open sessions with a given mds's potential export targets. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: only set num_pages in calc_layoutSage Weil2010-08-01
| | | | | | Setting it elsewhere is unnecessary and more fragile. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: do caps accounting per mds_clientYehuda Sadeh2010-08-01
| | | | | | | | | Caps related accounting is now being done per mds client instead of just being global. This prepares ground work for a later revision of the caps preallocated reservation list. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: track laggy state of mds from mdsmapSage Weil2010-08-01
| | | | Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: code cleanupYehuda Sadeh2010-08-01
| | | | | | | Mainly fixing minor issues reported by sparse. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: skip if no auth cap in flush_snapsSage Weil2010-08-01
| | | | | | | | If we have a capsnap but no auth cap (e.g. because it is migrating to another mds), bail out and do nothing for now. Do NOT remove the capsnap from the flush list. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: simplify caps revocation, fix for multimdsSage Weil2010-08-01
| | | | | | | | | | | | | The caps revocation should either initiate writeback, invalidation, or call check_caps to ack or do the dirty work. The primary question is whether we can get away with only checking the auth cap or whether all caps need to be checked. The old code was doing...something else. At the very least, revocations from non-auth MDSs could break by triggering the "check auth cap only" case. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: simplify add_cap_releasesSage Weil2010-08-01
| | | | | | No functional change, aside from more useful debug output. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: drop unused argumentSage Weil2010-08-01
| | | | Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: perform lazy reads when file mode and caps permitSage Weil2010-08-01
| | | | | | | | | If the file mode is marked as "lazy," perform cached/buffered reads when the caps permit it. Adjust the rdcache_gen and invalidation logic accordingly so that we manage our cache based on the FILE_CACHE -or- FILE_LAZYIO cap bits. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: perform lazy writes when file mode and caps permitSage Weil2010-08-01
| | | | | | | If we have marked a file as "lazy" (using the ceph ioctl), perform buffered writes when the MDS caps allow it. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: add LAZYIO ioctl to mark a file description for lazy consistencySage Weil2010-08-01
| | | | | | | | Allow an application to mark a file descriptor for lazy file consistency semantics, allowing buffered reads and writes when multiple clients are accessing the same file. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: request FILE_LAZYIO cap when LAZY file mode is setSage Weil2010-08-01
| | | | | | | Also clean up the file flags -> file mode -> wanted caps functions while we're at it. This resyncs this file with userspace. Signed-off-by: Sage Weil <sage@newdream.net>
* NFS: Fix a typo in include/linux/nfs_fs.hTrond Myklebust2010-08-01
| | | | | | | | | | | | nfs_commit_inode() needs to be defined irrespectively of whether or not we are supporting NFSv3 and NFSv4. Allow the compiler to optimise away code in the NFSv2-only case by converting it into an inlined stub function. Reported-and-tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds2010-07-30
|\
| | * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
| |   NFS: Ensure that writepage respects the nonblock flag
| |   NFS: kswapd must not block in nfs_release_page
| |   nfs: include space for the NUL in root path
| * NFS: Ensure that writepage respects the nonblock flagTrond Myklebust2010-07-30
| | | | | | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFS: kswapd must not block in nfs_release_pageTrond Myklebust2010-07-30
| | | | | | | | | | | | | | | | | | | | | | See https://bugzilla.kernel.org/show_bug.cgi?id=16056 If other processes are blocked waiting for kswapd to free up some memory so that they can make progress, then we cannot allow kswapd to block on those processes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
| * nfs: include space for the NUL in root pathDan Carpenter2010-07-30
| | In root_nfs_name() it does the following:
| |
| |         if (strlen(buf) + strlen(cp) > NFS_MAXPATHLEN) {
| |                 printk(KERN_ERR "Root-NFS: Pathname for remote directory too long.\n");
| |                 return -1;
| |         }
| |         sprintf(nfs_export_path, buf, cp);
| |
| | In the original code if (strlen(buf) + strlen(cp) == NFS_MAXPATHLEN) then the sprintf() would lead to an overflow. Generally the rest of the code assumes that the path can have NFS_MAXPATHLEN (1024) characters and a NUL terminator so the fix is to add space to the nfs_export_path[] buffer.
| |
| | Signed-off-by: Dan Carpenter <error27@gmail.com>
| | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
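The off-by-one described in this commit can be shown with a small userspace sketch. It assumes a tiny stand-in limit (`DEMO_MAXPATHLEN`, not the real NFS_MAXPATHLEN) and a hypothetical `demo_root_name()` that simply concatenates two strings, whereas the kernel code uses `buf` as a format string. The point it illustrates is only the sizing rule: a length check of the form `len > MAX` accepts `len == MAX`, so the destination needs `MAX + 1` bytes.

```c
#include <stdio.h>
#include <string.h>

/* Small stand-in for NFS_MAXPATHLEN so the boundary case is easy to hit. */
enum { DEMO_MAXPATHLEN = 8 };

/* Sized for DEMO_MAXPATHLEN characters plus the NUL terminator. */
char demo_export_path[DEMO_MAXPATHLEN + 1];

/*
 * Hypothetical helper: join prefix and name into demo_export_path.
 * Returns 0 on success, -1 if the combined length exceeds the limit.
 */
int demo_root_name(const char *prefix, const char *name)
{
    if (strlen(prefix) + strlen(name) > DEMO_MAXPATHLEN)
        return -1;
    /* A combined length of exactly DEMO_MAXPATHLEN passes the check above,
     * so the buffer must have one extra byte for the terminator. */
    snprintf(demo_export_path, sizeof(demo_export_path), "%s%s", prefix, name);
    return 0;
}
```

With a buffer of only `DEMO_MAXPATHLEN` bytes, the boundary case would write the terminator one byte past the end when using an unbounded `sprintf()`, which is the overflow the commit fixes by enlarging the buffer.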
* | CIFS: Remove __exit mark from cifs_exit_dns_resolver()David Howells2010-07-30
|/ | | | | | | | | | Remove the __exit mark from cifs_exit_dns_resolver() as it's called by the module init routine in case of error, and so may have been discarded during linkage. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* CRED: Fix get_task_cred() and task_state() to not resurrect dead credentialsDavid Howells2010-07-29
| It's possible for get_task_cred() as it currently stands to 'corrupt' a set of credentials by incrementing their usage count after their replacement by the task being accessed.
|
| What happens is that get_task_cred() can race with commit_creds():
|
|     TASK_1                          TASK_2                          RCU_CLEANER
|     -->get_task_cred(TASK_2)
|     rcu_read_lock()
|     __cred = __task_cred(TASK_2)
|                                     -->commit_creds()
|                                     old_cred = TASK_2->real_cred
|                                     TASK_2->real_cred = ...
|                                     put_cred(old_cred)
|                                       call_rcu(old_cred)
|             [__cred->usage == 0]
|     get_cred(__cred)
|             [__cred->usage == 1]
|     rcu_read_unlock()
|                                                                     -->put_cred_rcu()
|                                                                     [__cred->usage == 1]
|                                                                     panic()
|
| However, since a task's credentials are generally not changed very often, we can reasonably make use of a loop involving reading the creds pointer and using atomic_inc_not_zero() to attempt to increment it if it hasn't already hit zero.
|
| If successful, we can safely return the credentials in the knowledge that, even if the task we're accessing has released them, they haven't gone to the RCU cleanup code.
|
| We then change task_state() in procfs to use get_task_cred() rather than calling get_cred() on the result of __task_cred(), as that suffers from the same problem.
|
| Without this change, a BUG_ON in __put_cred() or in put_cred_rcu() can be tripped when it is noticed that the usage count is not zero as it ought to be, for example:
|
| kernel BUG at kernel/cred.c:168!
| invalid opcode: 0000 [#1] SMP
| last sysfs file: /sys/kernel/mm/ksm/run
| CPU 0
| Pid: 2436, comm: master Not tainted 2.6.33.3-85.fc13.x86_64 #1 0HR330/OptiPlex 745
| RIP: 0010:[<ffffffff81069881>] [<ffffffff81069881>] __put_cred+0xc/0x45
| RSP: 0018:ffff88019e7e9eb8 EFLAGS: 00010202
| RAX: 0000000000000001 RBX: ffff880161514480 RCX: 00000000ffffffff
| RDX: 00000000ffffffff RSI: ffff880140c690c0 RDI: ffff880140c690c0
| RBP: ffff88019e7e9eb8 R08: 00000000000000d0 R09: 0000000000000000
| R10: 0000000000000001 R11: 0000000000000040 R12: ffff880140c690c0
| R13: ffff88019e77aea0 R14: 00007fff336b0a5c R15: 0000000000000001
| FS: 00007f12f50d97c0(0000) GS:ffff880007400000(0000) knlGS:0000000000000000
| CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| CR2: 00007f8f461bc000 CR3: 00000001b26ce000 CR4: 00000000000006f0
| DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
| DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
| Process master (pid: 2436, threadinfo ffff88019e7e8000, task ffff88019e77aea0)
| Stack:
|  ffff88019e7e9ec8 ffffffff810698cd ffff88019e7e9ef8 ffffffff81069b45
| <0> ffff880161514180 ffff880161514480 ffff880161514180 0000000000000000
| <0> ffff88019e7e9f28 ffffffff8106aace 0000000000000001 0000000000000246
| Call Trace:
|  [<ffffffff810698cd>] put_cred+0x13/0x15
|  [<ffffffff81069b45>] commit_creds+0x16b/0x175
|  [<ffffffff8106aace>] set_current_groups+0x47/0x4e
|  [<ffffffff8106ac89>] sys_setgroups+0xf6/0x105
|  [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
| Code: 48 8d 71 ff e8 7e 4e 15 00 85 c0 78 0b 8b 75 ec 48 89 df e8 ef 4a 15 00 48 83 c4 18 5b c9 c3 55 8b 07 8b 07 48 89 e5 85 c0 74 04 <0f> 0b eb fe 65 48 8b 04 25 00 cc 00 00 48 3b b8 58 04 00 00 75
| RIP [<ffffffff81069881>] __put_cred+0xc/0x45
|  RSP <ffff88019e7e9eb8>
| ---[ end trace df391256a100ebdd ]---
|
| Signed-off-by: David Howells <dhowells@redhat.com>
| Acked-by: Jiri Olsa <jolsa@redhat.com>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
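The "increment unless already zero" loop described in this commit can be sketched in userspace with C11 atomics. This is an approximation under stated assumptions, not the kernel code: `demo_cred`, `demo_inc_not_zero()` and `demo_get_cred()` are hypothetical names, the compare-and-swap loop stands in for the kernel's atomic_inc_not_zero(), and RCU is not modelled at all, so nothing here protects the object's memory the way rcu_read_lock() does in the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted credentials object. */
struct demo_cred {
    atomic_int usage;
};

/*
 * Take a reference only if the count has not already dropped to zero.
 * Returns true if the count was incremented.
 */
bool demo_inc_not_zero(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
    }
    return false;
}

/*
 * Re-read the published pointer until a reference can be taken.
 * This only terminates if a dead object (count zero) is eventually
 * replaced in the slot, as the commit describes happening on commit_creds().
 */
struct demo_cred *demo_get_cred(_Atomic(struct demo_cred *) *slot)
{
    struct demo_cred *c;

    do {
        c = atomic_load(slot);
    } while (!demo_inc_not_zero(&c->usage));
    return c;
}
```

The difference from a plain increment is the refusal to move the count from 0 to 1, which is the transition shown in the race diagram that lets a dying object be handed out again.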
* ecryptfs: Bugfix for error related to ecryptfs_hash_bucketsAndre Osterhues2010-07-28
| | | | | | | | | | | | | | | The function ecryptfs_uid_hash wrongly assumes that the second parameter to hash_long() is the number of hash buckets instead of the number of hash bits. This patch fixes that and renames the variable ecryptfs_hash_buckets to ecryptfs_hash_bits to make it clearer. Fixes: CVE-2010-2492 Signed-off-by: Andre Osterhues <aosterhues@escrypt.com> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
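The bits-versus-buckets mix-up is easy to reproduce with a stand-in hash. The sketch below assumes a simple multiplicative hash (`demo_hash`, with a 64-bit golden-ratio constant) that, like the hash_long() interface described above, takes the number of hash bits as its second argument. It is not the kernel's implementation, and the table size of 16 buckets is chosen only for the example.

```c
#include <stdint.h>

/* Hypothetical table geometry: 4 bits of hash select one of 16 buckets. */
enum {
    DEMO_HASH_BITS = 4,
    DEMO_HASH_BUCKETS = 1 << DEMO_HASH_BITS
};

/*
 * Stand-in hash: keep the top 'bits' bits of a multiplicative hash.
 * The result lies in [0, 2^bits). 'bits' must be between 1 and 64.
 */
uint64_t demo_hash(uint64_t val, unsigned bits)
{
    return (val * 0x9E3779B97F4A7C15ULL) >> (64 - bits);
}
```

Calling `demo_hash(uid, DEMO_HASH_BITS)` always yields a valid index below `DEMO_HASH_BUCKETS`. Calling `demo_hash(uid, DEMO_HASH_BUCKETS)` asks for 16 bits of hash and can return values far beyond the 16-entry table, which is the class of out-of-bounds index the commit fixes.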
* Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-clientLinus Torvalds2010-07-28
|\
| | * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
| |   ceph: use complete_all and wake_up_all
| |   ceph: Correct obvious typo of Kconfig variable "CRYPTO_AES"
| |   ceph: fix dentry lease release
| |   ceph: fix leak of dentry in ceph_init_dentry() error path
| |   ceph: fix pg_mapping leak on pg_temp updates
| |   ceph: fix d_release dop for snapdir, snapped dentries
| |   ceph: avoid dcache readdir for snapdir
| * ceph: use complete_all and wake_up_allYehuda Sadeh2010-07-27
| | | | | | | | | | | | | | | | | | | | This fixes an issue triggered by running concurrent syncs. One of the syncs would go through while the other would just hang indefinitely. In any case, we never actually want to wake a single waiter, so the *_all functions should be used. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: Correct obvious typo of Kconfig variable "CRYPTO_AES"Robert P. J. Day2010-07-25
| | | | | | | | | | Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: fix dentry lease releaseSage Weil2010-07-23
| | | | | | | | | | | | | | | | | | When we embed a dentry lease release notification in a request, invalidate our lease so we don't think we still have it. Otherwise we can get all sorts of incorrect client behavior when multiple clients are interacting with the same part of the namespace. Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: fix leak of dentry in ceph_init_dentry() error pathSage Weil2010-07-23
| | | | | | | | | | | | If we fail to allocate a ceph_dentry_info, don't leak the dn reference. Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: fix pg_mapping leak on pg_temp updatesSage Weil2010-07-23
| | | | | | | | | | | | | | Free the ceph_pg_mapping structs when they are removed from the pg_temp rbtree. Also fix a leak in the __insert_pg_mapping() error path. Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: fix d_release dop for snapdir, snapped dentriesSage Weil2010-07-23
| | | | | | | | | | | | | | | | | | | | We need to set the d_release dop for snapdir and snapped dentries so that the ceph_dentry_info struct gets released. We also use the dcache to cache readdir results when possible, which only works if we know when dentries are dropped from the cache. Since we don't use the dcache for readdir in the hidden snapdir, avoid that case in ceph_dentry_release. Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: avoid dcache readdir for snapdirSage Weil2010-07-22
| | | | | | | | | | | | | | | | We should always go to the MDS for readdir on the hidden snapdir. The set of snapshots can change at any time; the client can't trust its cache for that. Signed-off-by: Sage Weil <sage@newdream.net>
* | GFS2: Use kmalloc when possible for ->readdir()Steven Whitehouse2010-07-28
| | | | If we don't need a huge amount of memory in ->readdir() then we can use kmalloc rather than vmalloc to allocate it. This should cut down on the greater overheads associated with vmalloc for smaller directories. We may be able to eliminate vmalloc entirely at some stage, but this is easy to do right away. This also uses GFP_NOFS to avoid any issues with respect to deleting inodes while under a glock, and follows a suggestion from Linus to factor out the alloc/dealloc. I've given this a test with a variety of different sized directories and it seems to work ok. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | 9p: Pass the correct end of buffer to p9stat_readLatchesar Ionkov2010-07-27
| | | | | | | | | | | | | | Pass the correct end of the buffer to p9stat_read. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* | sysfs: allow creating symlinks from untagged to tagged directoriesEric W. Biederman2010-07-26
| | | | Supporting symlinks from untagged to tagged directories is reasonable, and needed to support CONFIG_SYSFS_DEPRECATED. So don't fail a priori; allow that case to work. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | sysfs: sysfs_delete_link handle symlinks from untagged to tagged directories.Eric W. Biederman2010-07-26
| | | | | | | | | | | | | | | | This happens for network devices when SYSFS_DEPRECATED is enabled. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | sysfs: Don't allow the creation of symlinks we can't removeEric W. Biederman2010-07-26
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | Recently my tagged sysfs support revealed a flaw in the device core that a few rare drivers are running into such that we don't always put network devices in a class subdirectory named net/. Since we are not creating the class directory the network devices wind up in a non-tagged directory, but the symlinks to the network devices from /sys/class/net are in a tagged directory. All of which works until we go to remove or rename the symlink. When we remove or rename a symlink we look in the namespace of the target of the symlink. Since the target of the symlink is in a non-tagged sysfs directory we don't have a namespace to look in, and we fail to remove the symlink. Detect this problem up front and simply don't create symlinks we won't be able to remove later. This prevents symlink leakage and fails in a much clearer and more understandable way. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* CIFS: Fix a malicious redirect problem in the DNS lookup codeDavid Howells2010-07-22
| Fix the security problem in the CIFS filesystem DNS lookup code in which a malicious redirect could be installed by a random user by simply adding a result record into one of their keyrings with add_key() and then invoking a CIFS CFS lookup [CVE-2010-2524].
|
| This is done by creating an internal keyring specifically for the caching of DNS lookups. To enforce the use of this keyring, the module init routine creates a set of override credentials with the keyring installed as the thread keyring and instructs request_key() to only install lookup result keys in that keyring.
|
| The override is then applied around the call to request_key().
|
| This has some additional benefits when a kernel service uses this module to request a key:
|
|  (1) The result keys are owned by root, not the user that caused the lookup.
|  (2) The result keys don't pop up in the user's keyrings.
|  (3) The result keys don't come out of the quota of the user that caused the lookup.
|
| The keyring can be viewed as root by doing cat /proc/keys:
|
|     2a0ca6c3 I----- 1 perm 1f030000 0 0 keyring .dns_resolver: 1/4
|
| It can then be listed with 'keyctl list' by root.
|
|     # keyctl list 0x2a0ca6c3
|     1 key in keyring:
|     726766307: --alswrv 0 0 dns_resolver: foo.bar.com
|
| Signed-off-by: David Howells <dhowells@redhat.com>
| Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com>
| Acked-by: Steve French <smfrench@gmail.com>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Fix up trivial spelling errors ('taht' -> 'that')Linus Torvalds2010-07-21
| | | | | | | | | Pointed out by Lucas who found the new one in a comment in setup_percpu.c. And then I fixed the others that I grepped for. Reported-by: Lucas <canolucas@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-clientLinus Torvalds2010-07-20
|\
| | * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
| |   ceph: do not include cap/dentry releases in replayed messages
| |   ceph: reuse request message when replaying against recovering mds
| |   ceph: fix creation of ipv6 sockets
| |   ceph: fix parsing of ipv6 addresses
| |   ceph: fix printing of ipv6 addrs
| |   ceph: add kfree() to error path
| |   ceph: fix leak of mon authorizer
| |   ceph: fix message revocation
| * ceph: do not include cap/dentry releases in replayed messagesSage Weil2010-07-16
| | | | | | | | | | | | | | | | | | Strip the cap and dentry releases from replayed messages. They can cause the shared state to get out of sync because they were generated (with the request message) earlier, and no longer reflect the current client state. Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: reuse request message when replaying against recovering mdsSage Weil2010-07-16
| | | | Replayed rename operations (after an mds failure/recovery) were broken because the request paths were regenerated from the dentry names, which get mangled when d_move() is called. Instead, resend the previous request message when replaying completed operations. Just make sure the REPLAY flag is set and the target ino is filled in. This fixes problems with workloads doing renames when the MDS restarts, where the rename operation appears to succeed but then fails after the MDS restarts (leading to client confusion, app breakage, etc.). Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: fix creation of ipv6 socketsSage Weil2010-07-09
| | | | | | | | | | | | Use the address family from the peer address instead of assuming IPv4. Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: fix parsing of ipv6 addressesSage Weil2010-07-09
| | | | | | | | | | | | | | Check for brackets around the ipv6 address to avoid ambiguity with the port number. Signed-off-by: Sage Weil <sage@newdream.net>
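The ambiguity that brackets resolve can be sketched in userspace C. Without brackets, the last `:` in `::1:6789` could belong either to the address or to a port; with `[::1]:6789` the split is unambiguous. The helper below is hypothetical (`demo_parse_addr` is not the ceph function) and assumes the forms `a.b.c.d[:port]` and `[v6addr][:port]`. It relies on the standard inet_pton() for the address itself.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: parse "a.b.c.d[:port]" or "[v6addr][:port]" into ss.
 * Returns 0 on success, -1 on malformed input. A missing port is stored as 0.
 */
int demo_parse_addr(const char *s, struct sockaddr_storage *ss)
{
    char host[INET6_ADDRSTRLEN];
    const char *end, *port;
    unsigned long p = 0;
    size_t len;
    int v6 = (*s == '[');

    memset(ss, 0, sizeof(*ss));
    if (v6) {
        s++;                          /* skip '[' */
        end = strchr(s, ']');
        if (!end)
            return -1;                /* unterminated bracket */
        port = end + 1;
    } else {
        end = strchr(s, ':');
        if (!end)
            end = s + strlen(s);
        port = end;
    }

    len = (size_t)(end - s);
    if (len == 0 || len >= sizeof(host))
        return -1;
    memcpy(host, s, len);
    host[len] = '\0';

    if (*port == ':') {
        char *rest;

        p = strtoul(port + 1, &rest, 10);
        if (rest == port + 1 || *rest != '\0' || p > 65535)
            return -1;
    } else if (*port != '\0') {
        return -1;                    /* trailing junk after the address */
    }

    if (v6) {
        struct sockaddr_in6 *a = (struct sockaddr_in6 *)ss;

        if (inet_pton(AF_INET6, host, &a->sin6_addr) != 1)
            return -1;
        a->sin6_family = AF_INET6;
        a->sin6_port = htons((uint16_t)p);
    } else {
        struct sockaddr_in *a = (struct sockaddr_in *)ss;

        if (inet_pton(AF_INET, host, &a->sin_addr) != 1)
            return -1;
        a->sin_family = AF_INET;
        a->sin_port = htons((uint16_t)p);
    }
    return 0;
}
```

Once the address is parsed, `ss.ss_family` carries the family of the peer, so a caller can pass it to `socket()` instead of hard-coding `AF_INET`, in the spirit of the socket-creation fix listed just above.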