aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* VFS: security/: d_inode() annotationsDavid Howells2015-04-15
| | | | | | | | ... except where that code acts as a filesystem driver, rather than working with dentries given to it. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: security/: d_backing_inode() annotationsDavid Howells2015-04-15
| | | | | | | | most of the ->d_inode uses there refer to the same inode IO would go to, i.e. d_backing_inode() Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: net/: d_inode() annotationsDavid Howells2015-04-15
| | | | | | | socket inodes and sunrpc filesystems - inodes owned by that code Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: net/unix: d_backing_inode() annotationsDavid Howells2015-04-15
| | | | | | | places where we are dealing with S_ISSOCK file creation/lookups. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: kernel/: d_inode() annotationsDavid Howells2015-04-15
| | | | | | | | relayfs and tracefs are dealing with inodes of their own; those two act as filesystem drivers Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: audit: d_backing_inode() annotationsDavid Howells2015-04-15
| | | | | Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Fix up some ->d_inode accesses in the chelsio driverDavid Howells2015-04-15
| | | | | | | | | | | | Fix up some ->d_inode accesses in the chelsio driver. (1) FILE_DATA() should just be replaced with file_inode(). (2) set_debugfs_file_size() should be removed and debugfs_create_file_size() should be used to create the file. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Cachefiles should perform fs modifications on the top layer onlyDavid Howells2015-04-15
| | | | | | | | Cachefiles should perform fs modifications (eg. vfs_unlink()) on the top layer only and should not attempt to alter the lower layer. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: AF_UNIX sockets should call mknod on the top layer onlyDavid Howells2015-04-15
| | | | | | | | AF_UNIX sockets should call mknod on the top layer only and should not attempt to modify the lower layer in a layered filesystem such as overlayfs. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* block: loop: switch to VFS ITER_BVECChristoph Hellwig2015-04-15
| | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* configfs: Fix inconsistent use of file_inode() vs file->f_path.dentry->d_inodeDavid Howells2015-04-15
| | | | | | | | Fix inconsistent use of file_inode() vs file->f_path.dentry->d_inode. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Make pathwalk use d_is_reg() rather than S_ISREG()David Howells2015-04-15
| | | | | | | | | | Make pathwalk use d_is_reg() rather than S_ISREG() to determine whether to honour O_TRUNC. Since this occurs after complete_walk(), the dentry type field cannot change and the inode pointer cannot change as we hold a ref on the dentry, so this should be safe. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Fix up debugfs to use d_is_dir() in place of S_ISDIR()David Howells2015-04-15
| | | | | | | | Fix up debugfs to use d_is_dir(dentry) in place of S_ISDIR(dentry->d_inode->i_mode). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Combine inode checks with d_is_negative() and d_is_positive() in pathwalkDavid Howells2015-04-15
| | | | | | | | | | | | | | | Where we have: if (!dentry->d_inode || d_is_negative(dentry)) { type constructions in pathwalk we should be able to eliminate the check of d_inode and rely solely on the result of d_is_negative() or d_is_positive(). What we do have to take care to do is to read d_inode after calling a d_is_xxx() typecheck function to get the barriering right. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* NFS: Don't use d_inode as a variable nameDavid Howells2015-04-15
| | | | | | | Don't use d_inode as a variable name as it now masks a function name. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Impose ordering on accesses of d_inode and d_flagsDavid Howells2015-04-15
| | | | | | | | | | | | | | | | | | | | | | | | Impose ordering on accesses of d_inode and d_flags to avoid the need to do this: if (!dentry->d_inode || d_is_negative(dentry)) { when this: if (d_is_negative(dentry)) { should suffice. This check is especially problematic if a dentry can have its type field set to something other than DENTRY_MISS_TYPE when d_inode is NULL (as in unionmount). What we really need to do is stick a write barrier between setting d_inode and setting d_flags and a read barrier between reading d_flags and reading d_inode. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Add owner-filesystem positive/negative dentry checksDavid Howells2015-04-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Supply two functions to test whether a filesystem's own dentries are positive or negative (d_really_is_positive() and d_really_is_negative()). The problem is that the DCACHE_ENTRY_TYPE field of dentry->d_flags may be overridden by the union part of a layered filesystem and isn't thus necessarily indicative of the type of dentry. Normally, this would involve a negative dentry (ie. ->d_inode == NULL) having ->d_layer.lower pointed to a lower layer dentry, DCACHE_PINNING_LOWER set and the DCACHE_ENTRY_TYPE field set to something other than DCACHE_MISS_TYPE - but it could also involve, say, a DCACHE_SPECIAL_TYPE being overridden to DCACHE_WHITEOUT_TYPE if a 0,0 chardev is detected in the top layer. However, inside a filesystem, when that fs is looking at its own dentries, it probably wants to know if they are really negative or not - and doesn't care about the fallthrough bits used by the union. To this end, a filesystem should normally use d_really_is_positive/negative() when looking at its own dentries rather than d_is_positive/negative() and should use d_inode() to get at the inode. Anyone looking at someone else's dentries (this includes pathwalk) should use d_is_xxx() and d_backing_inode(). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* nfs: generic_write_checks() shouldn't be done on swapout...Al Viro2015-04-15
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ocfs2: use __generic_file_write_iter()Al Viro2015-04-11
| | | | | | | we can do that now - all we need is to clear IOCB_DIRECT from ->ki_flags in "can't do dio" case. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* mirror O_APPEND and O_DIRECT into iocb->ki_flagsAl Viro2015-04-11
| | | | | | ... avoiding write_iter/fcntl races. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch generic_write_checks() to iocb and iterAl Viro2015-04-11
| | | | | | | | | | | | ... returning -E... upon error and amount of data left in iter after (possible) truncation upon success. Note, that normal case gives a non-zero (positive) return value, so any tests for != 0 _must_ be updated. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Conflicts: fs/ext4/file.c
* ocfs2: move generic_write_checks() before the alignment checksAl Viro2015-04-11
| | | | | | | | | | | | | Alignment checks for dio depend upon the range truncation done by generic_write_checks(). They can be done as soon as we got ocfs2_rw_lock() and that actually makes ocfs2_prepare_inode_for_write() simpler. The only thing to watch out for is restoring the original count in "unlock and redo without dio" case. Position doesn't need to be restored, since we change it only in O_APPEND case and in that case it will be reassigned anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ocfs2_file_write_iter: stop messing with pposAl Viro2015-04-11
| | | | | | it's &iocb->ki_pos; no need to obfuscate. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' into for-nextAl Viro2015-04-11
|\
| * ocfs2: _really_ sync the right rangeAl Viro2015-04-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "ocfs2 syncs the wrong range" had been broken; prior to it the code was doing the wrong thing in case of O_APPEND, all right, but _after_ it we were syncing the wrong range in 100% cases. *ppos, aka iocb->ki_pos is incremented prior to that point, so we are always doing sync on the area _after_ the one we'd written to. Spotted by Joseph Qi <joseph.qi@huawei.com> back in January; unfortunately, I'd missed his mail back then ;-/ Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * ocfs2_file_write_iter: keep return value and current position update in syncAl Viro2015-04-08
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [regression] ocfs2: do *not* increment ->ki_pos twiceAl Viro2015-04-08
| | | | | | | | | | | | | | generic_file_direct_write() already does that. Broken by "ocfs2: do not fallback to buffer I/O write if appending" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * ioctx_alloc(): fix vma (and file) leak on failureAl Viro2015-04-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we fail past the aio_setup_ring(), we need to destroy the mapping. We don't need to care about anybody having found ctx, or added requests to it, since the last failure exit is exactly the failure to make ctx visible to lookups. Reproducer (based on one by Joe Mario <jmario@redhat.com>): void count(char *p) { char s[80]; printf("%s: ", p); fflush(stdout); sprintf(s, "/bin/cat /proc/%d/maps|/bin/fgrep -c '/[aio] (deleted)'", getpid()); system(s); } int main() { io_context_t *ctx; int created, limit, i, destroyed; FILE *f; count("before"); if ((f = fopen("/proc/sys/fs/aio-max-nr", "r")) == NULL) perror("opening aio-max-nr"); else if (fscanf(f, "%d", &limit) != 1) fprintf(stderr, "can't parse aio-max-nr\n"); else if ((ctx = calloc(limit, sizeof(io_context_t))) == NULL) perror("allocating aio_context_t array"); else { for (i = 0, created = 0; i < limit; i++) { if (io_setup(1000, ctx + created) == 0) created++; } for (i = 0, destroyed = 0; i < created; i++) if (io_destroy(ctx[i]) == 0) destroyed++; printf("created %d, failed %d, destroyed %d\n", created, limit - created, destroyed); count("after"); } } Found-by: Joe Mario <jmario@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * fix mremap() vs. ioctx_kill() raceAl Viro2015-04-06
| | | | | | | | | | | | | | | | | | | | | | | | teach ->mremap() method to return an error and have it fail for aio mappings in process of being killed Note that in case of ->mremap() failure we need to undo move_page_tables() we'd already done; we could call ->mremap() first, but then the failure of move_page_tables() would require undoing whatever _successful_ ->mremap() has done, which would be a lot more headache in general. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | udf_file_write_iter: reorder and simplifyAl Viro2015-04-11
| | | | | | | | | | | | it's easier to do generic_write_checks() first Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | fuse: ->direct_IO() doesn't need generic_write_checks()Al Viro2015-04-11
| | | | | | | | | | | | | | | | | | already done by caller. We used to call __fuse_direct_write(), which called generic_write_checks(); now the former got expanded, bringing the latter to the surface. It used to be called all along and calling it from there had been wrong all along... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | ext4_file_write_iter: move generic_write_checks() upAl Viro2015-04-11
| | | | | | | | | | | | simpler that way... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | xfs_file_aio_write_checks: switch to iocb/iov_iterAl Viro2015-04-11
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | generic_write_checks(): drop isblk argumentAl Viro2015-04-11
| | | | | | | | | | | | all remaining callers are passing 0; some just obscure that fact. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | blkdev_write_iter: expand generic_file_checks() call in thereAl Viro2015-04-11
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | lift generic_write_checks() into callers of __generic_file_write_iter()Al Viro2015-04-11
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | __generic_file_write_iter: keep ->ki_pos and return value consistentAl Viro2015-04-11
| | | | | | | | | | | | | | | | | | | | | | A side effect worth noting: in O_APPEND case we set ->ki_pos early, so if it turns out to be an error or a zero-length write, we'll end up with ->ki_pos modified. Safe, since all callers never look at the ->ki_pos after the call of __generic_file_write_iter() returning non-positive, all the way to caller of ->write_iter() and those discard ->ki_pos when getting that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | cifs: fold cifs_iovec_write() into the only callerAl Viro2015-04-11
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | ntfs: move iov_iter_truncate() closer to generic_write_checks()Al Viro2015-04-11
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | new_sync_write(): discard ->ki_pos unless the return value is positiveAl Viro2015-04-11
| | | | | | | | | | | | | | | | | | That allows ->write_iter() instances much more convenient life wrt iocb->ki_pos (and fixes several filesystems with borderline POSIX violations when zero-length write succeeds and changes the current position). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | direct_IO: remove rw from a_ops->direct_IO()Omar Sandoval2015-04-11
| | | | | | | | | | | | | | Now that no one is using rw, remove it completely. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | direct_IO: use iov_iter_rw() instead of rw everywhereOmar Sandoval2015-04-11
| | | | | | | | | | | | | | | | | | | | | | The rw parameter to direct_IO is redundant with iov_iter->type, and treated slightly differently just about everywhere it's used: some users do rw & WRITE, and others do rw == WRITE where they should be doing a bitwise check. Simplify this with the new iov_iter_rw() helper, which always returns either READ or WRITE. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Remove rw from dax_{do_,}io()Omar Sandoval2015-04-11
| | | | | | | | | | | | | | And use iov_iter_rw() instead. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Remove rw from {,__,do_}blockdev_direct_IO()Omar Sandoval2015-04-11
| | | | | | | | | | | | | | | | Most filesystems call through to these at some point, so we'll start here. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | new helper: iov_iter_rw()Omar Sandoval2015-04-11
| | | | | | | | | | | | | | Get either READ or WRITE out of iter->type. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | ->aio_read and ->aio_write removedAl Viro2015-04-11
| | | | | | | | | | | | no remaining users Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | pcm: another weird API abuseAl Viro2015-04-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | readv() and writev() should _not_ ignore all but the first ->iov_len, among other things. Really weird abuse of those syscalls - it expects a vector element per channel, with identical lengths (it actually assumes them to be identical - no checking is done). readv() and writev() are really bad match for that. Unfortunately, userland API is userland API and we can't do anything about them. Converted to ->read_iter/->write_iter. Please, _please_ don't do anything of that kind when designing new interfaces. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | infinibad: weird APIs switched to ->write_iter()Al Viro2015-04-11
| | | | | | | | | | | | | | | | | | | | | | | | Things Not To Do When Writing A Driver, part 1001st: have writev() and write() on the same file doing completely different things. As in, "interpret very different sets of commands". We _can_ handle that, but it's a bloody bad idea. Don't do that in new drivers. Ever. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | kill do_sync_read/do_sync_writeAl Viro2015-04-11
| | | | | | | | | | | | | | | | all remaining instances of aio_{read,write} (all 4 of them) have explicit ->read and ->write resp.; do_sync_read/do_sync_write is never called by __vfs_read/__vfs_write anymore and no other users had been left. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | fuse: use iov_iter_get_pages() for non-splice pathAl Viro2015-04-11
| | | | | | | | | | | | store reference to iter instead of that to iovec Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>