aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAge
...
| * epoll: fix spurious lockdep warningsNelson Elhage2011-11-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/890952 commit d8805e633e054c816c47cb6e727c81f156d9253d upstream. epoll can acquire recursively acquire ep->mtx on multiple "struct eventpoll"s at once in the case where one epoll fd is monitoring another epoll fd. This is perfectly OK, since we're careful about the lock ordering, but it causes spurious lockdep warnings. Annotate the recursion using mutex_lock_nested, and add a comment explaining the nesting rules for good measure. Recent versions of systemd are triggering this, and it can also be demonstrated with the following trivial test program: --------------------8<-------------------- int main(void) { int e1, e2; struct epoll_event evt = { .events = EPOLLIN }; e1 = epoll_create1(0); e2 = epoll_create1(0); epoll_ctl(e1, EPOLL_CTL_ADD, e2, &evt); return 0; } --------------------8<-------------------- Reported-by: Paul Bolle <pebolle@tiscali.nl> Tested-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Nelson Elhage <nelhage@nelhage.com> Acked-by: Jason Baron <jbaron@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * CIFS: Fix DFS handling in cifs_get_file_infoPavel Shilovsky2011-11-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/890952 commit 42274bb22afc3e877ae5abed787b0b09d7dede52 upstream. We should call cifs_all_info_to_fattr in rc == 0 case only. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * CIFS: Fix incorrect max RFC1002 write size valuePavel Shilovsky2011-11-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/890952 commit 94443f43404239c2a6dc4252a7cb9e77f5b1eb6e upstream. ..the length field has only 17 bits. Acked-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * cifs, freezer: add wait_event_freezekillable and have cifs use itJeff Layton2011-11-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/24330 CIFS currently uses wait_event_killable to put tasks to sleep while they await replies from the server. That function though does not allow the freezer to run. In many cases, the network interface may be going down anyway, in which case the reply will never come. The client then ends up blocking the computer from suspending. Fix this by adding a new wait_event_freezable variant -- wait_event_freezekillable. The idea is to combine the behavior of wait_event_killable and wait_event_freezable -- put the task to sleep and only allow it to be awoken by fatal signals, but also allow the freezer to do its job. Signed-off-by: Jeff Layton <jlayton@redhat.com> (cherry picked from commit f06ac72e929115f2772c29727152ba0832d641e4) Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * xfs: revert to using a kthread for AIL pushingChristoph Hellwig2011-11-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/881420 commit 0030807c66f058230bcb20d2573bcaf28852e804 upstream Currently we have a few issues with the way the workqueue code is used to implement AIL pushing: - it accidentally uses the same workqueue as the syncer action, and thus can be prevented from running if there are enough sync actions active in the system. - it doesn't use the HIGHPRI flag to queue at the head of the queue of work items At this point I'm not confident enough in getting all the workqueue flags and tweaks right to provide a perfectly reliable execution context for AIL pushing, which is the most important piece in XFS to make forward progress when the log fills. Revert back to use a kthread per filesystem which fixes all the above issues at the cost of having a task struct and stack around for each mounted filesystem. In addition this also gives us much better ways to diagnose any issues involving hung AIL pushing and removes a small amount of code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Stefan Priebe <s.priebe@profihost.ag> Tested-by: Stefan Priebe <s.priebe@profihost.ag> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * xfs: force the log if we encounter pinned buffers in .iop_pushbufChristoph Hellwig2011-11-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/881420 commit 17b38471c3c07a49f0bbc2ecc2e92050c164e226 upstream We need to check for pinned buffers even in .iop_pushbuf given that inode items flush into the same buffers that may be pinned directly due operations on the unlinked inode list operating directly on buffers. To do this add a return value to .iop_pushbuf that tells the AIL push about this and use the existing log force mechanisms to unpin it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Stefan Priebe <s.priebe@profihost.ag> Tested-by: Stefan Priebe <s.priebe@profihost.ag> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * xfs: do not update xa_last_pushed_lsn for locked itemsChristoph Hellwig2011-11-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/881420 commit bc6e588a8971aa74c02e42db4d6e0248679f3738 upstream If an item was locked we should not update xa_last_pushed_lsn and thus skip it when restarting the AIL scan as we need to be able to lock and write it out as soon as possible. Otherwise heavy lock contention might starve AIL pushing too easily, especially given the larger backoff once we moved xa_last_pushed_lsn all the way to the target lsn. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Stefan Priebe <s.priebe@profihost.ag> Tested-by: Stefan Priebe <s.priebe@profihost.ag> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * xfs: use a cursor for bulk AIL insertionDave Chinner2011-11-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/881420 commit 1d8c95a363bf8cd4d4182dd19c01693b635311c2 upstream xfs: use a cursor for bulk AIL insertion Delayed logging can insert tens of thousands of log items into the AIL at the same LSN. When the committing of log commit records occur, we can get insertions occurring at an LSN that is not at the end of the AIL. If there are thousands of items in the AIL on the tail LSN, each insertion has to walk the AIL to find the correct place to insert the new item into the AIL. This can consume large amounts of CPU time and block other operations from occurring while the traversals are in progress. To avoid this repeated walk, use a AIL cursor to record where we should be inserting the new items into the AIL without having to repeat the walk. The cursor infrastructure already provides this functionality for push walks, so is a simple extension of existing code. While this will not avoid the initial walk, it will avoid repeating it tens of thousands of times during a single checkpoint commit. This version includes logic improvements from Christoph Hellwig. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * xfs: start periodic workers laterChristoph Hellwig2011-11-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/881420 commit 2bcf6e970f5a88fa05dced5eeb0326e13d93c4a1 upstream Start the periodic sync workers only after we have finished xfs_mountfs and thus fully set up the filesystem structures. Without this we can call into xfs_qm_sync before the quotainfo strucute is set up if the mount takes unusually long, and probably hit other incomplete states as well. Also clean up the xfs_fs_fill_super error path by using consistent label names, and removing an impossible to reach case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * CIFS: Fix ERR_PTR dereference in cifs_get_rootPavel Shilovsky2011-11-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/881420 commit 5b980b01212199833ee8023770fa4cbf1b85e9f4 upstream. move it to the beginning of the loop. Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Cc: Josh Boyer <jwboyer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * fuse: fix memory leakMiklos Szeredi2011-11-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/881420 commit 5dfcc87fd79dfb96ed155b524337dbd0da4f5993 upstream. kmemleak is reporting that 32 bytes are being leaked by FUSE: unreferenced object 0xe373b270 (size 32): comm "fusermount", pid 1207, jiffies 4294707026 (age 2675.187s) hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<b05517d7>] kmemleak_alloc+0x27/0x50 [<b0196435>] kmem_cache_alloc+0xc5/0x180 [<b02455be>] fuse_alloc_forget+0x1e/0x20 [<b0245670>] fuse_alloc_inode+0xb0/0xd0 [<b01b1a8c>] alloc_inode+0x1c/0x80 [<b01b290f>] iget5_locked+0x8f/0x1a0 [<b0246022>] fuse_iget+0x72/0x1a0 [<b02461da>] fuse_get_root_inode+0x8a/0x90 [<b02465cf>] fuse_fill_super+0x3ef/0x590 [<b019e56f>] mount_nodev+0x3f/0x90 [<b0244e95>] fuse_mount+0x15/0x20 [<b019d1bc>] mount_fs+0x1c/0xc0 [<b01b5811>] vfs_kern_mount+0x41/0x90 [<b01b5af9>] do_kern_mount+0x39/0xd0 [<b01b7585>] do_mount+0x2e5/0x660 [<b01b7966>] sys_mount+0x66/0xa0 This leak report is consistent and happens once per boot on 3.1.0-rc5-dirty. This happens if a FORGET request is queued after the fuse device was released. Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Josh Boyer <jwboyer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * exec: do not call request_module() twice from search_binary_handler()Tetsuo Handa2011-11-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/881420 commit 912193521b719fbfc2f16776febf5232fe8ba261 upstream. Currently, search_binary_handler() tries to load binary loader module using request_module() if a loader for the requested program is not yet loaded. But second attempt of request_module() does not affect the result of search_binary_handler(). If request_module() triggered recursion, calling request_module() twice causes 2 to the power of MAX_KMOD_CONCURRENT (= 50) repetitions. It is not an infinite loop but is sufficient for users to consider as a hang up. Therefore, this patch changes not to call request_module() twice, making 1 to the power of MAX_KMOD_CONCURRENT repetitions in case of recursion. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: Richard Weinberger <richard@nod.at> Tested-by: Richard Weinberger <richard@nod.at> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Maxim Uvarov <muvarov@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * btrfs: fix d_off in the first direntHidetoshi Seto2011-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/868628 commit 3765fefaee2da83f10829fa64a74e6b7360350cb upstream. Since the d_off in the first dirent for "." (that originates from the 4th argument "offset" of filldir() for the 2nd dirent for "..") is wrongly assigned in btrfs_real_readdir(), telldir returns same offset for different locations. | # mkfs.btrfs /dev/sdb1 | # mount /dev/sdb1 fs0 | # cd fs0 | # touch file0 file1 | # ../test | telldir: 0 | readdir: d_off = 2, d_name = "." | telldir: 2 | readdir: d_off = 2, d_name = ".." | telldir: 2 | readdir: d_off = 3, d_name = "file0" | telldir: 3 | readdir: d_off = 2147483647, d_name = "file1" | telldir: 2147483647 To fix this problem, pass filp->f_pos (which is loff_t) instead. | # ../test | telldir: 0 | readdir: d_off = 1, d_name = "." | telldir: 1 | readdir: d_off = 2, d_name = ".." | telldir: 2 | readdir: d_off = 3, d_name = "file0" : At the moment the "offset" for "." is unused because there is no preceding dirent, however it is better to pass filp->f_pos to follow grammatical usage. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Cc: Grazvydas Ignotas <notasas@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * writeback: update dirtied_when for synced inode to prevent livelockWu Fengguang2011-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/868628 commit 94c3dcbb0b0cdfd82cedd21705424d8044edc42c upstream. Explicitly update .dirtied_when on synced inodes, so that they are no longer considered for writeback in the next round. It can prevent both of the following livelock schemes: - while true; do echo data >> f; done - while true; do touch f; done (in theory) The exact livelock condition is, during sync(1): (1) no new inodes are dirtied (2) an inode being actively dirtied On (2), the inode will be tagged and synced with .nr_to_write=LONG_MAX. When finished, it will be redirty_tail()ed because it's still dirty and (.nr_to_write > 0). redirty_tail() won't update its ->dirtied_when on condition (1). The sync work will then revisit it on the next queue_io() and find it eligible again because its old ->dirtied_when predates the sync work start time. We'll do more aggressive "keep writeback as long as we wrote something" logic in wb_writeback(). The "use LONG_MAX .nr_to_write" trick in commit b9543dac5bbc ("writeback: avoid livelocking WB_SYNC_ALL writeback") will no longer be enough to stop sync livelock. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * writeback: introduce .tagged_writepages for the WB_SYNC_NONE sync stageWu Fengguang2011-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/868628 commit 6e6938b6d3130305a5960c86b1a9b21e58cf6144 upstream. sync(2) is performed in two stages: the WB_SYNC_NONE sync and the WB_SYNC_ALL sync. Identify the first stage with .tagged_writepages and do livelock prevention for it, too. Jan's commit f446daaea9 ("mm: implement writeback livelock avoidance using page tagging") is a partial fix in that it only fixed the WB_SYNC_ALL phase livelock. Although ext4 is tested to no longer livelock with commit f446daaea9, it may due to some "redirty_tail() after pages_skipped" effect which is by no means a guarantee for _all_ the file systems. Note that writeback_inodes_sb() is called by not only sync(), they are treated the same because the other callers also need livelock prevention. Impact: It changes the order in which pages/inodes are synced to disk. Now in the WB_SYNC_NONE stage, it won't proceed to write the next inode until finished with the current inode. Acked-by: Jan Kara <jack@suse.cz> CC: Dave Chinner <david@fromorbit.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * teach /proc/$pid/numa_maps about transparent hugepagesDave Hansen2011-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/868628 commit 32ef43848f283e0ef945d3c67e851c143fea3970 upstream. This is modeled after the smaps code. It detects transparent hugepages and then does a single gather_stats() for the page as a whole. This has two benifits: 1. It is more efficient since it does many pages in a single shot. 2. It does not have to break down the huge page. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * break out numa_maps gather_pte_stats() checksDave Hansen2011-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/868628 commit 3200a8aaab0c9ccdc0f59b0dac2d4a47029137fa upstream. gather_pte_stats() does a number of checks on a target page to see whether it should even be considered for statistics. This breaks that code out in to a separate function so that we can use it in the transparent hugepage case in the next patch. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Christoph Lameter <cl@gentwo.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * make /proc/$pid/numa_maps gather_stats() take variable page sizeDave Hansen2011-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/868628 commit eb4866d0066ffd5446751c102d64feb3318d8bd1 upstream. We need to teach the numa_maps code about transparent huge pages. The first step is to teach gather_stats() that the pte it is dealing with might represent more than one page. Note that will we use this in a moment for transparent huge pages since they have use a single pmd_t which _acts_ as a "surrogate" for a bunch of smaller pte_t's. I'm a _bit_ unhappy that this interface counts in hugetlbfs page sizes for hugetlbfs pages and PAGE_SIZE for normal pages. That means that to figure out how many _bytes_ "dirty=1" means, you must first know the hugetlbfs page size. That's easier said than done especially if you don't have visibility in to the mount. But, that's probably a discussion for another day especially since it would change behavior to fix it. But, just in case anyone wonders why this patch only passes a '1' in the hugetlb case... Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * Fix the conflict between rwpidforward and rw mount optionsSteve French2011-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/868628 commit c9c7fa0064f4afe1d040e72f24c2256dd8ac402d upstream. Both these options are started with "rw" - that's why the first one isn't switched on even if it is specified. Fix this by adding a length check for "rw" option check. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * restore pinning the victim dentry in vfs_rmdir()/vfs_rename_dir()Al Viro2011-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/868628 commit 1d2ef5901483004d74947bbf78d5146c24038fe7 upstream. We used to get the victim pinned by dentry_unhash() prior to commit 64252c75a219 ("vfs: remove dget() from dentry_unhash()") and ->rmdir() and ->rename() instances relied on that; most of them don't care, but ones that used d_delete() themselves do. As the result, we are getting rmdir() oopses on NFS now. Just grab the reference before locking the victim and drop it explicitly after unlocking, same as vfs_rename_other() does. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Simon Kirby <sim@hostway.ca> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * fs/9p: Use protocol-defined value for lock/getlock 'type' field.Jim Garlick2011-10-17
| | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/868628 commit 51b8b4fb32271d39fbdd760397406177b2b0fd36 upstream. Signed-off-by: Jim Garlick <garlick@llnl.gov> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * fs/9p: Always ask new inode in lookup for cache mode disabledAneesh Kumar K.V2011-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/868628 commit 73f507171cfa407b19f254aef95cbb058c8180cf upstream. This make sure we don't end up reusing the unlinked inode object. The ideal way is to use inode i_generation. But i_generation is not available in userspace always. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * fs/9p: Add OS dependent open flags in 9p protocolAneesh Kumar K.V2011-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/868628 commit f88657ce3f9713a0c62101dffb0e972a979e77b9 upstream. Some of the flags are OS/arch dependent we add a 9p protocol value which maps to asm-generic/fcntl.h values in Linux Based on the original patch from Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> [extra comments from author as to why this needs to go to stable: Earlier for different operation such as open we used the values of open flag as defined by the OS. But some of these flags such as O_DIRECT are arch dependent. So if we have the 9p client and server running on different architectures, we end up with client sending client architecture value of these open flag and server will try to map these values to what its architecture states. For ex: O_DIRECT on a x86 client maps to #define O_DIRECT 00040000 Where as on sparc server it will maps to #define O_DIRECT 0x100000 Hence we need to map these open flags to OS/arch independent flag values. Getting these changes to an early version of kernel ensures us that we work with different combination of client and server. We should ideally backport this patch to all possible kernel version.] Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * fs/9p: Don't update file type when updating file attributesAneesh Kumar K.V2011-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/868628 commit 45089142b1497dab2327d60f6c71c40766fc3ea4 upstream. We should only update attributes that we can change on stat2inode. Also do file type initialization in v9fs_init_inode. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * fs/9p: Add fid before dentry instantiationAneesh Kumar K.V2011-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/868628 commit 5441ae5eb3614d3c28f77073370738a2820c88e4 upstream. d_instantiate marks the dentry positive. So a parallel lookup and mkdir of the directory can find dentry that doesn't have fid attached. This can result in both the code path doing v9fs_fid_add which results in v9fs_dentry leak. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * 9p: close ACL leaksAl Viro2011-10-17
| | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/868628 commit 1ec95bf34d976b38897d1977b155a544d77b05e7 upstream. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * fs/9p: Always ask new inode in createAneesh Kumar K.V2011-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/868628 commit ed80fcfac2565fa866d93ba14f0e75de17a8223e upstream. This make sure we don't end up reusing the unlinked inode object. The ideal way is to use inode i_generation. But i_generation is not available in userspace always. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * fs/9p: Fix invalid mount options/argsPrem Karat2011-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/868628 commit a2dd43bb0d7b9ce28f8a39254c25840c0730498e upstream. Without this fix, if any invalid mount options/args are passed while mouting the 9p fs, no error (-EINVAL) is returned and default arg value is assigned. This fix returns -EINVAL when an invalid arguement is found while parsing mount options. Signed-off-by: Prem Karat <prem.karat@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * fs/9p: When doing inode lookup compare qid details and inode mode bits.Aneesh Kumar K.V2011-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/868628 commit fd2421f54423f307ecd31bdebdca6bc317e0c492 upstream. This make sure we don't use wrong inode from the inode hash. The inode number of the file deleted is reused by the next file system object created and if we only use inode number for inode hash lookup we could end up with wrong struct inode. Also compare inode generation number. Not all Linux file system provide st_gen in userspace. So it could be 0; Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * Avoid dereferencing a 'request_queue' after last close.NeilBrown2011-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/868628 commit 94007751bb02797ba87bac7aacee2731ac2039a3 upstream. On the last close of an 'md' device which as been stopped, the device is destroyed and in particular the request_queue is freed. The free is done in a separate thread so it might happen a short time later. __blkdev_put calls bdev_inode_switch_bdi *after* ->release has been called. Since commit f758eeabeb96f878c860e8f110f94ec8820822a9 bdev_inode_switch_bdi will dereference the 'old' bdi, which lives inside a request_queue, to get a spin lock. This causes the last close on an md device to sometime take a spin_lock which lives in freed memory - which results in an oops. So move the called to bdev_inode_switch_bdi before the call to ->release. Cc: Christoph Hellwig <hch@lst.de> Cc: Hugh Dickins <hughd@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * hfsplus: Fix kfree of wrong pointers in hfsplus_fill_super() error pathSeth Forshee2011-09-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/854987 Commit 6596528e391a ("hfsplus: ensure bio requests are not smaller than the hardware sectors") changed the pointers used for volume header allocations but failed to free the correct pointers in the error path path of hfsplus_fill_super() and hfsplus_read_wrapper. The second hunk came from a separate patch by Pavel Ivanov. Reported-by: Pavel Ivanov <paivanof@gmail.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Christoph Hellwig <hch@tuxera.com> Cc: <stable@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit f588c960fcaa6fa8bf82930bb819c9aca4eb9347) Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * cifs: fix possible memory corruption in CIFSFindNext, CVE-2011-3191Jeff Layton2011-09-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The name_len variable in CIFSFindNext is a signed int that gets set to the resume_name_len in the cifs_search_info. The resume_name_len however is unsigned and for some infolevels is populated directly from a 32 bit value sent by the server. If the server sends a very large value for this, then that value could look negative when converted to a signed int. That would make that value pass the PATH_MAX check later in CIFSFindNext. The name_len would then be used as a length value for a memcpy. It would then be treated as unsigned again, and the memcpy scribbles over a ton of memory. Fix this by making the name_len an unsigned value in CIFSFindNext. Cc: <stable@kernel.org> Reported-by: Darren Lavender <dcl@hppine99.gbr.hp.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> (cherry-picked from commit c32dfffaf59f73bbcf4472141b851a4dc5db2bf0 cifs-2.6.git) CVE-2011-3191 BugLink: http://bugs.launchpad.net/bugs/834135 Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * UBUNTU: SAUCE: seccomp_filter: add process state reportingWill Drewry2011-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds seccomp and seccomp_filter status reporting to proc. /proc/<pid>/seccomp_filter provides the current seccomp mode and the list of allowed or dynamically filtered system calls. v9: rebase on to bccaeafd7c117acee36e90d37c7e05c19be9e7bf v8: - v7: emit seccomp mode directly v6: - v5: fix typos when mailing the wrong patch series v4: move from rcu guard to mutex guard v3: changed to using filters directly. v2: removed status entry, added seccomp file. (requested by kosaki.motohiro@jp.fujitsu.com) allowed S_IRUGO reading of entries (requested by viro@zeniv.linux.org.uk) added flags got rid of the seccomp_t type dropped seccomp file Signed-off-by: Will Drewry <wad@chromium.org> BUG=chromium-os:14496 TEST=see others in series Change-Id: I86515e56d2da3b3c22ac90856a6ffa71a045c253 Reviewed-on: http://gerrit.chromium.org/gerrit/3242 Reviewed-by: Sonny Rao <sonnyrao@chromium.org> Tested-by: Will Drewry <wad@chromium.org> Signed-off-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * eCryptfs: fix compile errorRoberto Sassu2011-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the compile error reported at the address: https://bugzilla.kernel.org/show_bug.cgi?id=40292 The problem arises when compiling eCryptfs as built-in and the 'encrypted' key type as a module. The patch prevents this combination from being set in the kernel configuration, by fixing the eCryptfs dependencies. Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> Reported-by: David Hill <hilld@binarystorm.net> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> (cherry picked from commit 4b6fee17b1758391281ddf5b00328035573f8be1) Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * eCryptfs: Fix payload_len unitialized variable warningTyler Hicks2011-08-30
| | | | | | | | | | | | | | | | | | | | | | fs/ecryptfs/keystore.c: In function ‘ecryptfs_generate_key_packet_set’: fs/ecryptfs/keystore.c:1991:28: warning: ‘payload_len’ may be used uninitialized in this function [-Wuninitialized] fs/ecryptfs/keystore.c:1976:9: note: ‘payload_len’ was declared here Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> (cherry picked from commit 99b373ff2d1246f64b97a3d449a2fd6018d504e6) Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * hfsplus: ensure bio requests are not smaller than the hardware sectorsSeth Forshee2011-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently all bio requests are 512 bytes, which may fail for media whose physical sector size is larger than this. Ensure these requests are not smaller than the block device logical block size. BugLink: http://bugs.launchpad.net/bugs/734883 Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Christoph Hellwig <hch@lst.de> (cherry picked from commit 6596528e391ad978a6a120142cba97a1d7324cb6) Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * UBUNTU: SAUCE: (no-up) vfs: Add a trace point in the mark_inode_dirty functionArjan van de Ven2011-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [apw@canonical.com: This has no upstream traction but is used by powertop, so its worth carrying.] PowerTOP would like to be able to show who is keeping the disk busy by dirtying data. The most logical spot for this is in the vfs in the mark_inode_dirty() function. Doing this on the block level is not possible because by the time the IO hits the block layer the guilty party can no longer be found ("kjournald" and "pdflush" are not useful answers to "who caused this file to be dirty). The trace point follows the same logic/style as the block_dump code and pretty much dumps the same data, just not to dmesg (and thus to /var/log/messages) but via the trace events streams. Note: This patch was posted to lkml and might potentially go into 2.6.33 but I have not seen which maintainer will take it. Signed-of-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Amit Kucheria <amit.kucheria@canonical.com> Signed-off-by: Andy Whitcroft <apw@canonical.com>
| * UBUNTU: ubuntu: overlayfs -- ovl: make lower mount read-onlyMiklos Szeredi2011-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If a file only existing on the lower fs is operned for O_RDONLY and fchmod/fchown/etc is performed on the open file then this will modify the lower fs, which is not what we want. Copying up at this point is not possible. The best solution is to return an error for this corner case and hope applications are not relying on it. Reported-by: "J. R. Okajima" <hooanon05@yahoo.co.jp> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andy Whitcroft <apw@canonical.com>
| * UBUNTU: ubuntu: overlayfs -- fs: limit filesystem stacking depthMiklos Szeredi2011-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a simple read-only counter to super_block that indicates deep this is in the stack of filesystems. Previously ecryptfs was the only stackable filesystem and it explicitly disallowed multiple layers of itself. Overlayfs, however, can be stacked recursively and also may be stacked on top of ecryptfs or vice versa. To limit the kernel stack usage we must limit the depth of the filesystem stack. Initially the limit is set to 2. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andy Whitcroft <apw@canonical.com>
| * UBUNTU: ubuntu: overlayfs -- ovl: improve stack use of lookup and readdirMiklos Szeredi2011-08-30
| | | | | | | | | | Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andy Whitcroft <apw@canonical.com>
| * UBUNTU: ubuntu: overlayfs -- ovl: fix overlayfs over overlayfsMiklos Szeredi2011-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Overlayfs expects ->permission() to not check for readonliness (which is normally checked by the VFS) and so not return with -EROFS. This is not true of some filesystems, notably overlayfs itself. The following patch should fix this by making sure that if the upper layer is read-only (such as squashfs) then it will mark overlayfs read-only too and by making ovl_permission() only return EROFS in the excpetional case where the upper filesystem became r/o after the overlay was constructed. Reported-by: Jordi Pujol <jordipujolp@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andy Whitcroft <apw@canonical.com>
| * UBUNTU: ubuntu: overlayfs -- overlayfs: implement show_optionsErez Zadok2011-08-30
| | | | | | | | | | | | | | | | | | | | This is useful because of the stacking nature of overlayfs. Users like to find out (via /proc/mounts) which lower/upper directory were used at mount time. Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andy Whitcroft <apw@canonical.com>
| * UBUNTU: ubuntu: overlayfs -- overlayfs: add statfs supportAndy Whitcroft2011-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for statfs to the overlayfs filesystem. As the upper layer is the target of all write operations assume that the space in that filesystem is the space in the overlayfs. There will be some inaccuracy as overwriting a file will copy it up and consume space we were not expecting, but it is better than nothing. Use the upper layer dentry and mount from the overlayfs root inode, passing the statfs call to that filesystem. Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| * UBUNTU: ubuntu: overlayfs -- overlay filesystemMiklos Szeredi2011-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Overlayfs allows one, usually read-write, directory tree to be overlaid onto another, read-only directory tree. All modifications go to the upper, writable layer. This type of mechanism is most often used for live CDs but there's a wide variety of other uses. The implementation differs from other "union filesystem" implementations in that after a file is opened all operations go directly to the underlying, lower or upper, filesystems. This simplifies the implementation and allows native performance in these cases. The dentry tree is duplicated from the underlying filesystems, this enables fast cached lookups without adding special support into the VFS. This uses slightly more memory than union mounts, but dentries are relatively small. Currently inodes are duplicated as well, but it is a possible optimization to share inodes for non-directories. Opening non directories results in the open forwarded to the underlying filesystem. This makes the behavior very similar to union mounts (with the same limitations vs. fchmod/fchown on O_RDONLY file descriptors). Usage: mount -t overlay -olowerdir=/lower,upperdir=/upper overlay /mnt Supported: - all operations Missing: - ensure that filesystems part of the overlay are not modified outside the overlay The following cotributions have been folded into this patch: Neil Brown <neilb@suse.de>: - minimal remount support - use correct seek function for directories - initialise is_real before use - rename ovl_fill_cache to ovl_dir_read Felix Fietkau <nbd@openwrt.org>: - fix a deadlock in ovl_dir_read_merged - fix a deadlock in ovl_remove_whiteouts Erez Zadok <ezk@fsl.cs.sunysb.edu> - fix cleanup after WARN_ON Also thanks to the following people for testing and reporting bugs: Jordi Pujol <jordipujolp@gmail.com> Andy Whitcroft <apw@canonical.com> Michal Suchanek <hramrach@centrum.cz> Felix Fietkau <nbd@openwrt.org> Erez Zadok <ezk@fsl.cs.sunysb.edu> Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andy Whitcroft <apw@canonical.com>
| * UBUNTU: ubuntu: overlayfs -- vfs: introduce clone_private_mount()Miklos Szeredi2011-08-30
| | | | | | | | | | | | | | | | Overlayfs needs a private clone of the mount, so create a function for this and export to modules. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andy Whitcroft <apw@canonical.com>
| * UBUNTU: ubuntu: overlayfs -- vfs: export do_splice_direct() to modulesMiklos Szeredi2011-08-30
| | | | | | | | | | | | | | Export do_splice_direct() to modules. Needed by overlay filesystem. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andy Whitcroft <apw@canonical.com>
| * UBUNTU: ubuntu: overlayfs -- vfs: add i_op->open()Miklos Szeredi2011-08-30
| | | | | | | | | | | | | | | | | | Add a new inode operation i_op->open(). This is for stacked filesystems that want to return a struct file from a different filesystem. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andy Whitcroft <apw@canonical.com>
| * UBUNTU: ubuntu: nx-emu - i386: NX emulationIngo Molnar2011-08-30
| | | | | | | | | | | | | | | | | | | | | | This is old code with some cruft, all originally by Ingo Molnar with much later rebasing by Fedora folks and at least one arcane fix by Roland McGrath a few years ago. No longer uses exec-shield sysctl, merged with disable_nx. Kees Cook fixed boottime NX reporting for various corner cases. Signed-off-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
| * UBUNTU: ubuntu: AUFS -- aufs2-standalone.patch aufs2.1-39Andy Whitcroft2011-08-30
| | | | | | | | Signed-off-by: Andy Whitcroft <apw@canonical.com>
| * UBUNTU: ubuntu: AUFS -- aufs2-base.patch aufs2.1-39Andy Whitcroft2011-08-30
| | | | | | | | Signed-off-by: Andy Whitcroft <apw@canonical.com>