aboutsummaryrefslogtreecommitdiffstats
path: root/fs/locks.c
Commit message (Collapse)AuthorAge
* Leases can be hidden by flocksPavel Emelyanov2007-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The inode->i_flock list contains the leases, flocks and posix locks in the specified order. However, the flocks are added in the head of this list thus hiding the leases from F_GETLEASE command, from time_out_leases() and other code that expects the leases to come first. The following example will demonstrate this: #define _GNU_SOURCE #include <unistd.h> #include <fcntl.h> #include <stdio.h> #include <sys/file.h> static void show_lease(int fd) { int res; res = fcntl(fd, F_GETLEASE); switch (res) { case F_RDLCK: printf("Read lease\n"); break; case F_WRLCK: printf("Write lease\n"); break; case F_UNLCK: printf("No leases\n"); break; default: printf("Some shit\n"); break; } } int main(int argc, char **argv) { int fd, res; fd = open(argv[1], O_RDONLY); if (fd == -1) { perror("Can't open file"); return 1; } res = fcntl(fd, F_SETLEASE, F_WRLCK); if (res == -1) { perror("Can't set lease"); return 1; } show_lease(fd); if (flock(fd, LOCK_SH) == -1) { perror("Can't flock shared"); return 1; } show_lease(fd); return 0; } The first call to show_lease() will show the write lease set, but the second will show no leases. Fix the flock adding so that the leases always stay in the head of this list. Found during making the flocks pid-namespaces aware. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* rename setlease to generic_setleaseChristoph Hellwig2007-07-31
| | | | | | | | | | | Make it a little more clear that this is the default implementation for the setleast operation. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: Remove slab destructors from kmem_cache_create().Paul Mundt2007-07-19
| | | | | | | | | | | | | | Slab destructors were no longer supported after Christoph's c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* locks: fix vfs_test_lock() commentJ. Bruce Fields2007-07-18
| | | | | | | Thanks to Doug Chapman for pointing out that the comment here is inconsistent with the function prototype. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
* locks: make posix_test_lock() interface more consistentJ. Bruce Fields2007-07-18
| | | | | | | | Since posix_test_lock(), like fcntl() and ->lock(), indicates absence or presence of a conflict lock by setting fl_type to, respectively, F_UNLCK or something other than F_UNLCK, the return value is no longer needed. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
* locks: export setlease to filesystemsJ. Bruce Fields2007-07-18
| | | | | | | Export setlease so it can used by filesystems to implement their lease methods. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
* locks: provide a file lease method enabling cluster-coherent leasesJ. Bruce Fields2007-07-18
| | | | | | | | | | | | Currently leases are only kept locally, so there's no way for a distributed filesystem to enforce them against multiple clients. We're particularly interested in the case of nfsd exporting a cluster filesystem, in which case nfsd needs cluster-coherent leases in order to implement delegations correctly. Also add some documentation. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* locks: rename lease functions to reflect locks.c conventionsJ. Bruce Fields2007-07-18
| | | | | | | | | | | | | | | | | | We've been using the convention that vfs_foo is the function that calls a filesystem-specific foo method if it exists, or falls back on a generic method if it doesn't; thus vfs_foo is what is called when some other part of the kernel (normally lockd or nfsd) wants to get a lock, whereas foo is what filesystems call to use the underlying local functionality as part of their lock implementation. So rename setlease to vfs_setlease (which will call a filesystem-specific setlease after a later patch) and __setlease to setlease. Also, vfs_setlease need only be GPL-exported as long as it's only needed by lockd and nfsd. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
* locks: share more common lease codeJ. Bruce Fields2007-07-18
| | | | | | | | | Share more code between setlease (used by nfsd) and fcntl. Also some minor cleanup. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: Christoph Hellwig <hch@infradead.org>
* locks: clean up lease_alloc()J. Bruce Fields2007-07-18
| | | | | | | Return the newly allocated structure as the return value instead of using a struct ** parameter. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* locks: convert an -EINVAL return to a BUGJ. Bruce Fields2007-07-18
| | | | | | | There's no point trying to return an error in these cases, which all represent bugs in the callers. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* leases: minor break_lease() comment clarificationdavid m. richter2007-07-18
| | | | | | | clarify that break_lease() checks for presence of any lock, not just leases. Signed-off-by: David M. Richter <richterd@citi.umich.edu> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
* Remove SLAB_CTOR_CONSTRUCTORChristoph Lameter2007-05-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: David Howells <dhowells@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@ucw.cz> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* locks: fix F_GETLK regression (failure to find conflicts)J. Bruce Fields2007-05-10
| | | | | | | | | | | | | | | | | | | | | | In 9d6a8c5c213e34c475e72b245a8eb709258e968c we changed posix_test_lock to modify its single file_lock argument instead of taking separate input and output arguments. This makes it no longer safe to set the output lock's fl_type to F_UNLCK before looking for a conflict, since that means searching for a conflict against a lock with type F_UNLCK. This fixes a regression which causes F_GETLK to incorrectly report no conflict on most filesystems (including any filesystem that doesn't do its own locking). Also fix posix_lock_to_flock() to copy the lock type. This isn't strictly necessary, since the caller already does this; but it seems less likely to cause confusion in the future. Thanks to Doug Chapman for the bug report. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: Doug Chapman <doug.chapman@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2007-05-07
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux: gfs2: nfs lock support for gfs2 lockd: add code to handle deferred lock requests lockd: always preallocate block in nlmsvc_lock() lockd: handle test_lock deferrals lockd: pass cookie in nlmsvc_testlock lockd: handle fl_grant callbacks lockd: save lock state on deferral locks: add fl_grant callback for asynchronous lock return nfsd4: Convert NFSv4 to new lock interface locks: add lock cancel command locks: allow {vfs,posix}_lock_file to return conflicting lock locks: factor out generic/filesystem switch from setlock code locks: factor out generic/filesystem switch from test_lock locks: give posix_test_lock same interface as ->lock locks: make ->lock release private data before returning in GETLK case locks: create posix-to-flock helper functions locks: trivial removal of unnecessary parentheses
| * locks: add fl_grant callback for asynchronous lock returnMarc Eshel2007-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Acquiring a lock on a cluster filesystem may require communication with remote hosts, and to avoid blocking lockd or nfsd threads during such communication, we allow the results to be returned asynchronously. When a ->lock() call needs to block, the file system will return -EINPROGRESS, and then later return the results with a call to the routine in the fl_grant field of the lock_manager_operations struct. This differs from the case when ->lock returns -EAGAIN to a blocking lock request; in that case, the filesystem calls fl_notify when the lock is granted, and the caller retries the original lock. So while fl_notify is merely a hint to the caller that it should retry, fl_grant actually communicates the final result of the lock operation (with the lock already acquired in the succesful case). Therefore fl_grant takes a lock, a status and, for the test lock case, a conflicting lock. We also allow fl_grant to return an error to the filesystem, to handle the case where the fl_grant requests arrives after the lock manager has already given up waiting for it. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * locks: add lock cancel commandMarc Eshel2007-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Lock managers need to be able to cancel pending lock requests. In the case where the exported filesystem manages its own locks, it's not sufficient just to call posix_unblock_lock(); we need to let the filesystem know what's happening too. We do this by adding a new fcntl lock command: FL_CANCELLK. Some day this might also be made available to userspace applications that could benefit from an asynchronous locking api. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
| * locks: allow {vfs,posix}_lock_file to return conflicting lockMarc Eshel2007-05-06
| | | | | | | | | | | | | | | | | | | | | | | | The nfsv4 protocol's lock operation, in the case of a conflict, returns information about the conflicting lock. It's unclear how clients can use this, so for now we're not going so far as to add a filesystem method that can return a conflicting lock, but we may as well return something in the local case when it's easy to. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
| * locks: factor out generic/filesystem switch from setlock codeMarc Eshel2007-05-06
| | | | | | | | | | | | | | | | | | | | | | Factor out the code that switches between generic and filesystem-specific lock methods; eventually we want to call this from lock managers (lockd and nfsd) too; currently they only call the generic methods. This patch does that for all the setlk code. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
| * locks: factor out generic/filesystem switch from test_lockJ. Bruce Fields2007-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Factor out the code that switches between generic and filesystem-specific lock methods; eventually we want to call this from lock managers (lockd and nfsd) too; currently they only call the generic methods. This patch does that for test_lock. Note that this hasn't been necessary until recently, because the few filesystems that define ->lock() (nfs, cifs...) aren't exportable via NFS. However GFS (and, in the future, other cluster filesystems) need to implement their own locking to get cluster-coherent locking, and also want to be able to export locking to NFS (lockd and NFSv4). So we accomplish this by factoring out code such as this and exporting it for the use of lockd and nfsd. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
| * locks: give posix_test_lock same interface as ->lockMarc Eshel2007-05-06
| | | | | | | | | | | | | | | | | | posix_test_lock() and ->lock() do the same job but have gratuitously different interfaces. Modify posix_test_lock() so the two agree, simplifying some code in the process. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
| * locks: make ->lock release private data before returning in GETLK caseJ. Bruce Fields2007-05-06
| | | | | | | | | | | | | | | | | | | | | | | | The file_lock argument to ->lock is used to return the conflicting lock when found. There's no reason for the filesystem to return any private information with this conflicting lock, but nfsv4 is. Fix nfsv4 client, and modify locks.c to stop calling fl_release_private for it in this case. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Cc: "Trond Myklebust" <Trond.Myklebust@netapp.com>"
| * locks: create posix-to-flock helper functionsJ. Bruce Fields2007-04-16
| | | | | | | | | | | | | | | | Factor out a bit of messy code by creating posix-to-flock counterparts to the existing flock-to-posix helper functions. Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
| * locks: trivial removal of unnecessary parenthesesJ. Bruce Fields2007-04-16
| | | | | | | | | | | | Remove some unnecessary parentheses. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
* | slab allocators: Remove SLAB_DEBUG_INITIAL flagChristoph Lameter2007-05-07
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by SLAB. I think its purpose was to have a callback after an object has been freed to verify that the state is the constructor state again? The callback is performed before each freeing of an object. I would think that it is much easier to check the object state manually before the free. That also places the check near the code object manipulation of the object. Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was compiled with SLAB debugging on. If there would be code in a constructor handling SLAB_DEBUG_INITIAL then it would have to be conditional on SLAB_DEBUG otherwise it would just be dead code. But there is no such code in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real use of, difficult to understand and there are easier ways to accomplish the same effect (i.e. add debug code before kfree). There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be clear in fs inode caches. Remove the pointless checks (they would even be pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors. This is the last slab flag that SLUB did not support. Remove the check for unimplemented flags from SLUB. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] VFS: change struct file to use struct pathJosef "Jeff" Sipek2006-12-08
| | | | | | | | | | | | | This patch changes struct file to use struct path instead of having independent pointers to struct dentry and struct vfsmount, and converts all users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}. Additionally, it adds two #define's to make the transition easier for users of the f_dentry and f_vfsmnt. Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: remove kmem_cache_tChristoph Lameter2006-12-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: remove SLAB_KERNELChristoph Lameter2006-12-07
| | | | | | | | SLAB_KERNEL is an alias of GFP_KERNEL. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] file: modify struct fown_struct to use a struct pidEric W. Biederman2006-10-02
| | | | | | | | | | File handles can be requested to send sigio and sigurg to processes. By tracking the destination processes using struct pid instead of pid_t we make the interface safe from all potential pid wrap around problems. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] VFS: Use SEEK_{SET, CUR, END} instead of hardcoded valuesJosef 'Jeff' Sipek2006-10-01
| | | | | | | | | VFS: Use SEEK_{SET,CUR,END} instead of hardcoded values Signed-off-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] fcntl(F_SETSIG) fixTrond Myklebust2006-08-14
| | | | | | | | | | | | | | fcntl(F_SETSIG) no longer works on leases because lease_release_private_callback() gets called as the lease is copied in order to initialise it. The problem is that lease_alloc() performs an unnecessary initialisation, which sets the lease_manager_ops. Avoid the problem by allocating the target lease structure using locks_alloc_lock(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* VFS: Add support for the FL_ACCESS flag to flock_lock_file()Trond Myklebust2006-07-05
| | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* VFS: Allow caller to determine if BSD or posix locks were actually freedTrond Myklebust2006-07-05
| | | | | | | | Change posix_lock_file_conf(), and flock_lock_file() so that if called with an F_UNLCK argument, and the FL_EXISTS flag they will indicate whether or not any locks were actually freed by returning 0 or -ENOENT. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* [PATCH] fs/locks.c: make posix_locks_deadlock() staticAdrian Bunk2006-06-23
| | | | | | | | | We can now make posix_locks_deadlock() static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] vfs: add lock owner argument to flush operationMiklos Szeredi2006-06-23
| | | | | | | | | | | | | | | | | | | Pass the POSIX lock owner ID to the flush operation. This is useful for filesystems which don't want to store any locking state in inode->i_flock but want to handle locking/unlocking POSIX locks internally. FUSE is one such filesystem but I think it possible that some network filesystems would need this also. Also add a flag to indicate that a POSIX locking request was generated by close(), so filesystems using the above feature won't send an extra locking request in this case. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] locks: clean up locks_remove_posix()Miklos Szeredi2006-06-23
| | | | | | | | | | | | | locks_remove_posix() can use posix_lock_file() instead of doing the lock removal by hand. posix_lock_file() now does exacly the same. The comment about pids no longer applies, posix_lock_file() takes only the owner into account. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] locks: don't do unnecessary allocationsMiklos Szeredi2006-06-23
| | | | | | | | | | | | | | | | posix_lock_file() always allocates new locks in advance, even if it's easy to determine that no allocations will be needed. Optimize these cases: - FL_ACCESS flag is set - Unlocking the whole range Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] locks: don't unnecessarily fail posix lock operationsMiklos Szeredi2006-06-23
| | | | | | | | | | | | | | | | | | | | | | | | | | posix_lock_file() was too cautious, failing operations on OOM, even if they didn't actually require an allocation. This has the disadvantage, that a failing unlock on process exit could lead to a memory leak. There are two possibilites for this: - filesystem implements .lock() and calls back to posix_lock_file(). On cleanup of files_struct locks_remove_posix() is called which should remove all locks belonging to files_struct. However if filesystem calls posix_lock_file() which fails, then those locks will never be freed. - if a file is closed while a lock is blocked, then after acquiring fcntl_setlk() will undo the lock. But this unlock itself might fail on OOM, again possibly leaking the lock. The solution is to move the checking of the allocations until after it is sure that they will be needed. This will solve the above problem since unlock will always succeed unless it splits an existing region. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] remove steal_locks()Miklos Szeredi2006-06-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the steal_locks() function. steal_locks() doesn't work correctly with any filesystem that does it's own lock management, including NFS, CIFS, etc. In addition it has weird semantics on local filesystems in case tasks sharing file-descriptor tables are doing POSIX locking operations in parallel to execve(). The steal_locks() function has an effect on applications doing: clone(CLONE_FILES) /* in child */ lock execve lock POSIX locks acquired before execve (by "child", "parent" or any further task sharing files_struct) will after the execve be owned exclusively by "child". According to Chris Wright some LSB/LTP kind of suite triggers without the stealing behavior, but there's no known real-world application that would also fail. Apps using NPTL are not affected, since all other threads are killed before execve. Apps using LinuxThreads are only affected if they - have multiple threads during exec (LinuxThreads doesn't kill other threads, the app may do it with pthread_kill_other_threads_np()) - rely on POSIX locks being inherited across exec Both conditions are documented, but not their interaction. Apps using clone() natively are affected if they - use clone(CLONE_FILES) - rely on POSIX locks being inherited across exec The above scenarios are unlikely, but possible. If the patch is vetoed, there's a plan B, that involves mostly keeping the weird stealing semantics, but changing the way lock ownership is handled so that network and local filesystems work consistently. That would add more complexity though, so this solution seems to be preferred by most people. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Matthew Wilcox <willy@debian.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Return error in case flock_lock_file failureKirill Korotaev2006-06-14
| | | | | | | | | If flock_lock_file() failed to allocate flock with locks_alloc_lock() then "error = 0" is returned. Need to return some non-zero. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] fs/locks.c: Fix lease_initTrond Myklebust2006-05-08
| | | | | | | | | | | | | | It is insane to be giving lease_init() the task of freeing the lock it is supposed to initialise, given that the lock is not guaranteed to be allocated on the stack. This causes lockups in fcntl_setlease(). Problem diagnosed by Daniel Hokka Zakrisson <daniel@hozac.com> Also fix a slab leak in __setlease() due to an uninitialised return value. Problem diagnosed by Björn Steinbrink. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Daniel Hokka Zakrisson <daniel@hozac.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Fix file lookup without refDipankar Sarma2006-04-19
| | | | | | | | | | | | | | There are places in the kernel where we look up files in fd tables and access the file structure without holding refereces to the file. So, we need special care to avoid the race between looking up files in the fd table and tearing down of the file in another CPU. Otherwise, one might see a NULL f_dentry or such torn down version of the file. This patch fixes those special places where such a race may happen. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] fs/locks.c: Fix sys_flock() raceTrond Myklebust2006-03-31
| | | | | | | | | | | | | | | | | | | | | | | | sys_flock() currently has a race which can result in a double free in the multi-thread case. Thread 1 Thread 2 sys_flock(file, LOCK_EX) sys_flock(file, LOCK_UN) If Thread 2 removes the lock from inode->i_lock before Thread 1 tests for list_empty(&lock->fl_link) at the end of sys_flock, then both threads will end up calling locks_free_lock for the same lock. Fix is to make flock_lock_file() do the same as posix_lock_file(), namely to make a copy of the request, so that the caller can always free the lock. This also has the side-effect of fixing up a reference problem in the lockd handling of flock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] locks: don't panicMiklos Szeredi2006-03-31
| | | | | | | | Don't panic! Just BUG_ON(). Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] VFS,fs/locks.c,NFSD4: add race_free posix_lock_file_conf() interfaceAndy Adamson2006-03-26
| | | | | | | | | | | | | | | | | | | | Lockd and the NFSv4 server both exercise a race condition where posix_test_lock() is called either before or after posix_lock_file() to deal with a denied lock request due to a conflicting lock. Remove the race condition for the NFSv4 server by adding a new conflicting lock parameter to __posix_lock_file() , changing the name to __posix_lock_file_conf(). Keep posix_lock_file() interface, add posix_lock_conf() interface, both call __posix_lock_file_conf(). [akpm@osdl.org: Put the EXPORT_SYMBOL() where it belongs] Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] VFS,fs/locks.c: cleanup locks_insert_blockJ. Bruce Fields2006-03-26
| | | | | | | | | BUG instead of handling a case that should never happen. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Use __read_mostly on some hot fs variablesEric Dumazet2006-03-26
| | | | | | | | | | | | | | I discovered on oprofile hunting on a SMP platform that dentry lookups were slowed down because d_hash_mask, d_hash_shift and dentry_hashtable were in a cache line that contained inodes_stat. So each time inodes_stats is changed by a cpu, other cpus have to refill their cache line. This patch moves some variables to the __read_mostly section, in order to avoid false sharing. RCU dentry lookups can go full speed. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* lockd: stop abusing file_lock_listChristoph Hellwig2006-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently lockd directly access the file_lock_list from fs/locks.c. It does so to mark locks granted or reclaimable. This is very suboptimal, because a) lockd needs to poke into locks.c internals, and b) it needs to iterate over all locks in the system for marking locks granted or reclaimable. This patch adds lists for granted and reclaimable locks to the nlm_host structure instead, and adds locks to those. nlmclnt_lock: now adds the lock to h_granted instead of setting the NFS_LCK_GRANTED, still O(1) nlmclnt_mark_reclaim: goes away completely, replaced by a list_splice_init. Complexity reduced from O(locks in the system) to O(1) reclaimer: iterates over h_reclaim now, complexity reduced from O(locks in the system) to O(locks per nlm_host) Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* lockd: posix_test_lock() should not call locks_copy_lock()Trond Myklebust2006-03-20
| | | | | | | | The caller of posix_test_lock() should never need to look at the lock private data, so do not copy that information. This also means that there is no need to call the fl_release_private methods. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* locks,lockd: fix race in nlmsvc_testlockAndy Adamson2006-03-20
| | | | | | | | | | | | | | posix_test_lock() returns a pointer to a struct file_lock which is unprotected and can be removed while in use by the caller. Move the conflicting lock from the return to a parameter, and copy the conflicting lock. In most cases the caller ends up putting the copy of the conflicting lock on the stack. On i386, sizeof(struct file_lock) appears to be about 100 bytes. We're assuming that's reasonable. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>