aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAge
* efs: update error msg to not refer to deleted read_inode()Robert P. J. Day2008-04-02
| | | | | | Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* afs: add missing up_write() on returnSven Schnelle2008-04-02
| | | | | | | | | If afs_cell_alloc() fails, afs_cells_sem doesn't get unlocked, which leads to a deadlock. Unlock it before returning. Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cifs: fix misannotationsAl Viro2008-03-30
| | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* NULL noise: fs/*, mm/*, kernel/*Al Viro2008-03-30
| | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* jbd/jbd2 NULL noiseAl Viro2008-03-30
| | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of ↵Linus Torvalds2008-03-28
|\ | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: [PATCH] mnt_expire is protected by namespace_sem, no need for vfsmount_lock [PATCH] do shrink_submounts() for all fs types [PATCH] sanitize locking in mark_mounts_for_expiry() and shrink_submounts() [PATCH] count ghost references to vfsmounts [PATCH] reduce stack footprint in namespace.c
| * [PATCH] mnt_expire is protected by namespace_sem, no need for vfsmount_lockAl Viro2008-03-27
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] do shrink_submounts() for all fs typesAl Viro2008-03-27
| | | | | | | | | | | | | | | | | | ... and take it out of ->umount_begin() instances. Call with all locks already taken (by do_umount()) and leave calling release_mounts() to caller (it will do release_mounts() anyway, so we can just put into the same list). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] sanitize locking in mark_mounts_for_expiry() and shrink_submounts()Al Viro2008-03-27
| | | | | | | | | | | | | | ... and fix a race on access of ->mnt_share et.al. without namespace_sem in the latter. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] count ghost references to vfsmountsAl Viro2008-03-27
| | | | | | | | | | | | | | | | make propagate_mount_busy() exclude references from the vfsmounts that had been isolated by umount_tree() and are just waiting for release_mounts() to dispose of their ->mnt_parent/->mnt_mountpoint. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] reduce stack footprint in namespace.cAl Viro2008-03-27
| | | | | | | | | | | | A lot of places misuse struct nameidata when they need struct path. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | afs: prevent double cell registrationSven Schnelle2008-03-28
| | | | | | | | | | | | | | | | | | | | | | | | kafs doesn't check if the cell already exists - so if you do an echo "add newcell.org 1.2.3.4" >/proc/fs/afs/cells it will try to create this cell again. kobject will also complain about a double registration. To prevent such problems, return -EEXIST in that case. Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | vfs: fix data leak in nobh_write_end()Dmitri Monakhov2008-03-28
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current nobh_write_end() implementation ignore partial writes(copied < len) case if page was fully mapped and simply mark page as Uptodate, which is totally wrong because area [pos+copied, pos+len) wasn't updated explicitly in previous write_begin call. It simply contains garbage from pagecache and result in data leakage. #TEST_CASE_BEGIN: ~~~~~~~~~~~~~~~~ In fact issue triggered by classical testcase open("/mnt/test", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3 ftruncate(3, 409600) = 0 writev(3, [{"a", 1}, {NULL, 4095}], 2) = 1 ##TESTCASE_SOURCE: ~~~~~~~~~~~~~~~~~ #include <stdio.h> #include <stdlib.h> #include <fcntl.h> #include <sys/uio.h> #include <sys/mman.h> #include <errno.h> int main(int argc, char **argv) { int fd, ret; void* p; struct iovec iov[2]; fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC, 0666); ftruncate(fd, 409600); iov[0].iov_base="a"; iov[0].iov_len=1; iov[1].iov_base=NULL; iov[1].iov_len=4096; ret = writev(fd, iov, sizeof(iov)/sizeof(struct iovec)); printf("writev = %d, err = %d\n", ret, errno); return 0; } ##TESTCASE RESULT: ~~~~~~~~~~~~~~~~~~ [root@ts63 ~]# mount | grep mnt2 /dev/mapper/test on /mnt2 type ext2 (rw,nobh) [root@ts63 ~]# /tmp/writev /mnt2/test writev = 1, err = 0 [root@ts63 ~]# hexdump -C /mnt2/test 00000000 61 65 62 6f 6f 74 00 00 f0 b9 b4 59 3a 00 00 00 |aeboot.....Y:...| 00000010 20 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 | .......!.......| 00000020 df df df df df df df df df df df df df df df df |................| 00000030 3a 00 00 00 2a 00 00 00 21 00 00 00 00 00 00 00 |:...*...!.......| 00000040 60 c0 8c 00 00 00 00 00 40 4a 8d 00 00 00 00 00 |`.......@J......| 00000050 00 00 00 00 00 00 00 00 41 00 00 00 00 00 00 00 |........A.......| 00000060 74 69 6d 65 20 64 64 20 69 66 3d 2f 64 65 76 2f |time dd if=/dev/| 00000070 6c 6f 6f 70 30 20 20 6f 66 3d 2f 64 65 76 2f 6e |loop0 of=/dev/n| skip.. 00000f50 00 00 00 00 00 00 00 00 31 00 00 00 00 00 00 00 |........1.......| 00000f60 6d 6b 66 73 2e 65 78 74 33 20 2f 64 65 76 2f 76 |mkfs.ext3 /dev/v| 00000f70 7a 76 67 2f 74 65 73 74 20 2d 62 34 30 39 36 00 |zvg/test -b4096.| 00000f80 a0 fe 8c 00 00 00 00 00 21 00 00 00 00 00 00 00 |........!.......| 00000f90 23 31 32 30 35 39 35 30 34 30 34 00 3a 00 00 00 |#1205950404.:...| 00000fa0 20 00 8d 00 00 00 00 00 21 00 00 00 00 00 00 00 | .......!.......| 00000fb0 d0 cf 8c 00 00 00 00 00 10 d0 8c 00 00 00 00 00 |................| 00000fc0 00 00 00 00 00 00 00 00 41 00 00 00 00 00 00 00 |........A.......| 00000fd0 6d 6f 75 6e 74 20 2f 64 65 76 2f 76 7a 76 67 2f |mount /dev/vzvg/| 00000fe0 74 65 73 74 20 20 2f 76 7a 20 2d 6f 20 64 61 74 |test /vz -o dat| 00000ff0 61 3d 77 72 69 74 65 62 61 63 6b 00 00 00 00 00 |a=writeback.....| 00001000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| As you can see file's page contains garbage from pagecache instead of zeros. #TEST_CASE_END Attached patch: - Add sanity check BUG_ON in order to prevent incorrect usage by caller, This is function invariant because page can has buffers and in no zero *fadata pointer at the same time. - Always attach buffers to page is it is partial write case. - Always switch back to generic_write_end if page has buffers. This is reasonable because if page already has buffer then generic_write_begin was called previously. Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org> Reviewed-by: Nick Piggin <npiggin@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of ↵Linus Torvalds2008-03-25
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: [PATCH] get stack footprint of pathname resolution back to relative sanity [PATCH] double iput() on failure exit in hugetlb [PATCH] double dput() on failure exit in tiny-shmem [PATCH] fix up new filp allocators [PATCH] check for null vfsmount in dentry_open() [PATCH] reiserfs: eliminate private use of struct file in xattr [PATCH] sanitize hppfs hppfs pass vfsmount to dentry_open() [PATCH] restore export of do_kern_mount()
| * [PATCH] get stack footprint of pathname resolution back to relative sanityAl Viro2008-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Somebody had put struct nameidata in stack frame of link_path_walk(). Unfortunately, there are certain realities to deal with: * It's in the middle of recursion. Depth is equal to the nesting depth of symlinks, i.e. up to 8. * struct namiedata is, even if one discards the intent junk, at least 12 pointers + 5 ints. * moreover, adding a stack frame is not free in that situation. * there are fs methods called on top of that, and they also have stack footprint. * kernel stack is not infinite. The thing is, even if one chooses to deal with -ESTALE that way (and it's one hell of an overkill), the only thing that needs to be preserved is vfsmount + dentry, not the entire struct nameidata. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] double iput() on failure exit in hugetlbAl Viro2008-03-19
| | | | | | | | | | | | once we'd done d_instantiate(), we should only do dput(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] fix up new filp allocatorsDave Hansen2008-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some new uses of get_empty_filp() have crept in; switched to alloc_file() to make sure that pieces of initialization won't be missing. We really need to kill get_empty_filp(). [AV] fixed dentry leak on failure exit in anon_inode_getfd() Cc: Erez Zadok <ezk@cs.sunysb.edu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J Bruce Fields" <bfields@fieldses.org> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] check for null vfsmount in dentry_open()Christoph Hellwig2008-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure no-one calls dentry_open with a NULL vfsmount argument and crap out with a stacktrace otherwise. A NULL file->f_vfsmnt has always been problematic, but with the per-mount r/o tracking we can't accept anymore at all. [AV] the last place that passed NULL had been eliminated by the previous patch (reiserfs xattr stuff) Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] reiserfs: eliminate private use of struct file in xattrJeff Mahoney2008-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After several posts and bug reports regarding interaction with the NULL nameidata, here's a patch to clean up the mess with struct file in the reiserfs xattr code. As observed in several of the posts, there's really no need for struct file to exist in the xattr code. It was really only passed around due to the f_op->readdir() and a_ops->{prepare,commit}_write prototypes requiring it. reiserfs_prepare_write() and reiserfs_commit_write() don't actually use the struct file passed to it, and the xattr code uses a private version of reiserfs_readdir() to enumerate the xattr directories. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] sanitize hppfsAl Viro2008-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | * hppfs_iget() and its users are racy; there's no need to pollute icache anyway, new_inode() works fine and is safe, unlike the current kludges (these relied on overwriting ->i_ino before another iget_locked() gets to that one - and did it after unlocking). * merge hppfs_iget()/init_inode()/hppfs_read_inode(), while we are at it. * to pass proper vfsmount to dentry_open() store the reference in hppfs superblock. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --
| * hppfs pass vfsmount to dentry_open()Dave Hansen2008-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here's patch for hppfs that uses vfs_kern_mount to make sure it always has a procfs instance and passed the vfsmount on through the inode private data. Also fixes a procfs file_system_type leak for every attempted hppfs mount. [ jdike - gave this file a style workover, plus deleted hppfs_dentry_ops ] Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] restore export of do_kern_mount()Al Viro2008-03-17
| | | | | | | | | | | | | | | | | | vfs_kern_mount() requires having a reference to fs type, which makes it impossible for module to create procfs, etc. private mount. Open-coding is not an option, since e.g. put_filesystem() is _not_ exported, and for a good reason. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | driver core: debug for bad dev_attr_show() return value.Andrew Morton2008-03-25
| | | | | | | | | | | | | | | | | | | | Try to find the culprit who caused http://bugzilla.kernel.org/show_bug.cgi?id=10150 Cc: <balajirrao@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds2008-03-22
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] Fix mem leak on dfs referral [CIFS] file create with acl support enabled is slow [CIFS] Fix mtime on cp -p when file data cached but written out too late [CIFS] Fix build problem [CIFS] cifs: replace remaining __FUNCTION__ occurrences [CIFS] DFS patch that connects inode with dfs handling ops
| * | [CIFS] Fix mem leak on dfs referralSteve French2008-03-22
| | | | | | | | | | | | | | | Signed-off-by: Igor Mammedov <niallain@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | [CIFS] file create with acl support enabled is slowSteve French2008-03-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Shirish Pargaonkar noted: With cifsacl mount option, when a file is created on the Windows server, exclusive oplock is broken right away because the get cifs acl code again opens the file to obtain security descriptor. The client does not have the newly created file handle or inode in any of its lists yet so it does not respond to oplock break and server waits for its duration and then responds to the second open. This slows down file creation signficantly. The fix is to pass the file descriptor to the get cifsacl code wherever available so that get cifs acl code does not send second open (NT Create ANDX) and oplock is not broken. CC: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6Steve French2008-03-14
| |\ \
| * | | [CIFS] Fix mtime on cp -p when file data cached but written out too lateSteve French2008-03-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kukks noticed that cp -p can write out file data too late, after the timestamp is already set. This was introduced as an unintentional sideeffect of the change in an earlier patch (see below) which fixed some delayed return code propagation. cea218054ad277d6c126890213afde07b4eb1602 Author: Jeff Layton <jlayton@redhat.com> Date: Tue Nov 20 23:19:03 2007 +0000 Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | | [CIFS] Fix build problemSteve French2008-03-11
| | | | | | | | | | | | | | | | Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | | Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6Steve French2008-03-11
| |\ \ \
| * | | | [CIFS] cifs: replace remaining __FUNCTION__ occurrencesHarvey Harrison2008-03-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | | | [CIFS] DFS patch that connects inode with dfs handling opsIgor Mammedov2008-03-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | if DFS junction point Signed-off-by: Igor Mammedov <niallain@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
* | | | | Change pagemap output format to allow for future reporting of huge pagesHans Rosenfeld2008-03-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change pagemap output format to allow for future reporting of huge pages. (Format comment and minor cleanups: mpm@selenic.com) Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2008-03-21
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (46 commits) [NET] ifb: set separate lockdep classes for queue locks [IPV6] KCONFIG: Fix description about IPV6_TUNNEL. [TCP]: Fix shrinking windows with window scaling netpoll: zap_completion_queue: adjust skb->users counter bridge: use time_before() in br_fdb_cleanup() [TG3]: Fix build warning on sparc32. MAINTAINERS: bluez-devel is subscribers-only audit: netlink socket can be auto-bound to pid other than current->pid (v2) [NET]: Fix permissions of /proc/net [SCTP]: Fix a race between module load and protosw access [NETFILTER]: ipt_recent: sanity check hit count [NETFILTER]: nf_conntrack_h323: logical-bitwise & confusion in process_setup() [RT2X00] drivers/net/wireless/rt2x00/rt2x00dev.c: remove dead code, fix warning [IPV4]: esp_output() misannotations [8021Q]: vlan_dev misannotations xfrm: ->eth_proto is __be16 [IPV4]: ipv4_is_lbcast() misannotations [SUNRPC]: net/* NULL noise [SCTP]: fix misannotated __sctp_rcv_asconf_lookup() [PKT_SCHED]: annotate cls_u32 ...
| * | | | | [NET]: Fix permissions of /proc/netAndre Noll2008-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e9720ac ([NET]: Make /proc/net a symlink on /proc/self/net (v3)) broke ganglia and probably other applications that read /proc/net/dev. This is due to the change of permissions of /proc/net that was introduced in that commit. Before: dr-xr-xr-x 5 root root 0 Mar 19 11:30 /proc/net After: dr-xr--r-- 5 root root 0 Mar 19 11:29 /proc/self/net This patch restores the permissions to the old value which makes ganglia happy again. Pavel Emelyanov says: This also broke the postfix, as it was reported in bug #10286 and described in detail by Benjamin. Signed-off-by: Andre Noll <maan@systemlinux.org> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | Merge branch 'master' of ↵David S. Miller2008-03-18
| |\ \ \ \ \ | | | |_|_|/ | | |/| | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
| * | | | | [NET] endianness noise: INADDR_ANYAl Viro2008-03-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | Merge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds2008-03-20
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: nfs: don't ignore return value from nfs_pageio_add_request
| * | | | | | nfs: don't ignore return value from nfs_pageio_add_requestFred Isaman2008-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ignoring the return value from nfs_pageio_add_request can cause deadlocks. In read path: call nfs_pageio_add_request from readpage_async_filler assume at this point that there are requests already in desc, that can't be merged with the current request. so nfs_pageio_doio is fired up to clear out desc. assume something goes wrong in setting up the io, so desc->pg_error is set. This causes nfs_pageio_add_request to return 0, *WITHOUT* adding the original request. BUT, since return code is ignored, readpage_async_filler assumes it has been added, and does nothing further, leaving page locked. do_generic_mapping_read will eventually call lock_page, resulting in deadlock In write path: page is marked dirty by generic_perform_write nfs_writepages is called call nfs_pageio_add_request from nfs_page_async_flush assume at this point that there are requests already in desc, that can't be merged with the current request. so nfs_pageio_doio is fired up to clear out desc. assume something goes wrong in setting up the io, so desc->pg_error is set. This causes nfs_page_async_flush to return 0, *WITHOUT* adding the original request, yet marking the request as locked (PG_BUSY) and in writeback, clearing dirty marks. The next time a write is done to the page, deadlock will result as nfs_write_end calls nfs_update_request Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | | | | | | fs/ufs/balloc.c: fix sparc64 printk warningAndrew Morton2008-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fs/ufs/balloc.c: In function `ufs_change_blocknr': fs/ufs/balloc.c:317: warning: long long unsigned int format, long unsigned int arg (arg 2) fs/ufs/balloc.c:317: warning: long long unsigned int format, long unsigned int arg (arg 3) sector_t is u64 and we don't know what type the architecture uses to implement u64. Cc: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | zisofs: fix readpage() outside i_sizeDave Young2008-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A read request outside i_size will be handled in do_generic_file_read(). So we just return 0 to avoid getting -EIO as normal reading, let do_generic_file_read do the rest. At the same time we need unlock the page to avoid system stuck. Fixes http://bugzilla.kernel.org/show_bug.cgi?id=10227 Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Report-by: Christian Perle <chris@linuxinfotag.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | fs: fix kernel-doc notation warningsRandy Dunlap2008-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix kernel-doc notation warnings in fs/. Warning(mmotm-2008-0314-1449//fs/super.c:560): missing initial short description on line: * mark_files_ro Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line: * lease_get_mtime Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line: * lease_get_mtime Warning(mmotm-2008-0314-1449//fs/namei.c:1368): missing initial short description on line: * lookup_one_len: filesystem helper to lookup single pathname component Warning(mmotm-2008-0314-1449//fs/buffer.c:3221): missing initial short description on line: * bh_uptodate_or_lock: Test whether the buffer is uptodate Warning(mmotm-2008-0314-1449//fs/buffer.c:3240): missing initial short description on line: * bh_submit_read: Submit a locked buffer for reading Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:30): missing initial short description on line: * writeback_acquire: attempt to get exclusive writeback access to a device Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:47): missing initial short description on line: * writeback_in_progress: determine whether there is writeback in progress Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:58): missing initial short description on line: * writeback_release: relinquish exclusive writeback access against a device. Warning(mmotm-2008-0314-1449//include/linux/jbd.h:351): contents before sections Warning(mmotm-2008-0314-1449//include/linux/jbd.h:561): contents before sections Warning(mmotm-2008-0314-1449//fs/jbd/transaction.c:1935): missing initial short description on line: * void journal_invalidatepage() Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | eCryptfs: Swap dput() and mntput()Michael Halcrow2008-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ecryptfs_d_release() is doing a mntput before doing the dput. This patch moves the dput before the mntput. Thanks to Rajouri Jammu for reporting this. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Rajouri Jammu <rajouri.jammu@gmail.com> Cc: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | jbd2: correctly unescape journal data blocksDuane Griffin2008-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a long-standing typo (predating git) that will cause data corruption if a journal data block needs unescaping. At the moment the wrong buffer head's data is being unescaped. To test this case mount a filesystem with data=journal, start creating and deleting a bunch of files containing only JBD2_MAGIC_NUMBER (0xc03b3998), then pull the plug on the device. Without this patch the files will contain zeros instead of the correct data after recovery. Signed-off-by: Duane Griffin <duaneg@dghda.com> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | jbd: correctly unescape journal data blocksDuane Griffin2008-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a long-standing typo (predating git) that will cause data corruption if a journal data block needs unescaping. At the moment the wrong buffer head's data is being unescaped. To test this case mount a filesystem with data=journal, start creating and deleting a bunch of files containing only JFS_MAGIC_NUMBER (0xc03b3998), then pull the plug on the device. Without this patch the files will contain zeros instead of the correct data after recovery. Signed-off-by: Duane Griffin <duaneg@dghda.com> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | ROMFS: Fix up an error in iget removalDavid Howells2008-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix up an error in iget removal in which romfs_lookup() making a successful call to romfs_iget() continues through the negative/error handling (previously the successful case jumped around the negative/error handling case): (1) inode is initialised to NULL at the top of the function, eliminating the need for specific negative-inode handling. This means the positive success handling now flows straight through. (2) Rename the labels to be clearer about what they mean. Also make romfs_lookup()'s result variable of type long so as to avoid 32-bit/64-bit conversions with PTR_ERR() and friends. Based upon a report and patch from Adam Richter. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: "Adam J. Richter" <adam@yggdrasil.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | ext3: fix wrong gfp type under transactionJosef Bacik2008-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are several places where we make allocations with GFP_KERNEL while under a transaction, which could lead to an assertion panic or lockup if under memory pressure. This patch switches these problem areas to use GFP_NOFS to keep these problems from happening. Signed-off-by: Josef Bacik <jbacik@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | quota: add possibly missing iput() when quotaon and quotaoff racesJan Kara2008-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should always put inode we have reference to, even if quota was reenabled in the mean time. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | jbd: fix jbd kernel-doc notationRandy Dunlap2008-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix kernel-doc notation in jbd. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | aio: bad AIO race in aio_complete() leads to process hangQuentin Barnes2008-03-19
|/ / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | My group ran into a AIO process hang on a 2.6.24 kernel with the process sleeping indefinitely in io_getevents(2) waiting for the last wakeup to come and it never would. We ran the tests on x86_64 SMP. The hang only occurred on a Xeon box ("Clovertown") but not a Core2Duo ("Conroe"). On the Xeon, the L2 cache isn't shared between all eight processors, but is L2 is shared between between all two processors on the Core2Duo we use. My analysis of the hang is if you go down to the second while-loop in read_events(), what happens on processor #1: 1) add_wait_queue_exclusive() adds thread to ctx->wait 2) aio_read_evt() to check tail 3) if aio_read_evt() returned 0, call [io_]schedule() and sleep In aio_complete() with processor #2: A) info->tail = tail; B) waitqueue_active(&ctx->wait) C) if waitqueue_active() returned non-0, call wake_up() The way the code is written, step 1 must be seen by all other processors before processor 1 checks for pending events in step 2 (that were recorded by step A) and step A by processor 2 must be seen by all other processors (checked in step 2) before step B is done. The race I believed I was seeing is that steps 1 and 2 were effectively swapped due to the __list_add() being delayed by the L2 cache not shared by some of the other processors. Imagine: proc 2: just before step A proc 1, step 1: adds to ctx->wait, but is not visible by other processors yet proc 1, step 2: checks tail and sees no pending events proc 2, step A: updates tail proc 1, step 3: calls [io_]schedule() and sleeps proc 2, step B: checks ctx->wait, but sees no one waiting, skips wakeup so proc 1 sleeps indefinitely My patch adds a memory barrier between steps A and B. It ensures that the update in step 1 gets seen on processor 2 before continuing. If processor 1 was just before step 1, the memory barrier makes sure that step A (update tail) gets seen by the time processor 1 makes it to step 2 (check tail). Before the patch our AIO process would hang virtually 100% of the time. After the patch, we have yet to see the process ever hang. Signed-off-by: Quentin Barnes <qbarnes+linux@yahoo-inc.com> Reviewed-by: Zach Brown <zach.brown@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: <stable@kernel.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ We should probably disallow that "if (waitqueue_active()) wake_up()" coding pattern, because it's so often buggy wrt memory ordering ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>