aboutsummaryrefslogtreecommitdiffstats
path: root/fs/nfs
Commit message (Collapse)AuthorAge
* Apply k4412 kernel from HardKernel for ODROID-X.Christopher Kenna2012-09-28
|
* nfs: skip commit in releasepage if we're freeing memory for fs-related reasonsJeff Layton2012-08-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5cf02d09b50b1ee1c2d536c9cf64af5a7d433f56 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mm: compaction: introduce sync-light migration for use by compactionMel Gorman2012-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit a6bc32b899223a877f595ef9ddc1e89ead5072b8 upstream. Stable note: Not tracked in Buzilla. This was part of a series that reduced interactivity stalls experienced when THP was enabled. These stalls were particularly noticable when copying data to a USB stick but the experiences for users varied a lot. This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT mode that avoids writing back pages to backing storage. Async compaction maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT. For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is used. This avoids sync compaction stalling for an excessive length of time, particularly when copying files to a USB stick where there might be a large number of dirty pages backed by a filesystem that does not support ->writepages. [aarcange@redhat.com: This patch is heavily based on Andrea's work] [akpm@linux-foundation.org: fix fs/nfs/write.c build] [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build] Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Dave Jones <davej@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Andy Isaacson <adi@hexapodia.org> Cc: Nai Xia <nai.xia@gmail.com> Cc: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mm: compaction: determine if dirty pages can be migrated without blocking ↵Mel Gorman2012-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | within ->migratepage commit b969c4ab9f182a6e1b2a0848be349f99714947b0 upstream. Stable note: Not tracked in Bugzilla. A fix aimed at preserving page aging information by reducing LRU list churning had the side-effect of reducing THP allocation success rates. This was part of a series to restore the success rates while preserving the reclaim fix. Asynchronous compaction is used when allocating transparent hugepages to avoid blocking for long periods of time. Due to reports of stalling, there was a debate on disabling synchronous compaction but this severely impacted allocation success rates. Part of the reason was that many dirty pages are skipped in asynchronous compaction by the following check; if (PageDirty(page) && !sync && mapping->a_ops->migratepage != migrate_page) rc = -EBUSY; This skips over all mapping aops using buffer_migrate_page() even though it is possible to migrate some of these pages without blocking. This patch updates the ->migratepage callback with a "sync" parameter. It is the responsibility of the callback to fail gracefully if migration would block. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Dave Jones <davej@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Andy Isaacson <adi@hexapodia.org> Cc: Nai Xia <nai.xia@gmail.com> Cc: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4: Map NFS4ERR_SHARE_DENIED into an EACCES error instead of EIOTrond Myklebust2012-06-09
| | | | | | | | | | | | | | | commit fb13bfa7e1bcfdcfdece47c24b62f1a1cad957e9 upstream. If a file OPEN is denied due to a share lock, the resulting NFS4ERR_SHARE_DENIED is currently mapped to the default EIO. This patch adds a more appropriate mapping, and brings Linux into line with what Solaris 10 does. See https://bugzilla.kernel.org/show_bug.cgi?id=43286 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4: Revalidate uid/gid after openJonathan Nieder2012-05-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a shorter (and more appropriate for stable kernels) analog to the following upstream commit: commit 6926afd1925a54a13684ebe05987868890665e2b Author: Trond Myklebust <Trond.Myklebust@netapp.com> Date: Sat Jan 7 13:22:46 2012 -0500 NFSv4: Save the owner/group name string when doing open ...so that we can do the uid/gid mapping outside the asynchronous RPC context. This fixes a bug in the current NFSv4 atomic open code where the client isn't able to determine what the true uid/gid fields of the file are, (because the asynchronous nature of the OPEN call denies it the ability to do an upcall) and so fills them with default values, marking the inode as needing revalidation. Unfortunately, in some cases, the VFS will do some additional sanity checks on the file, and may override the server's decision to allow the open because it sees the wrong owner/group fields. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Without this patch, logging into two different machines with home directories mounted over NFS4 and then running "vim" and typing ":q" in each reliably produces the following error on the second machine: E137: Viminfo file is not writable: /users/system/rtheys/.viminfo This regression was introduced by 80e52aced138 ("NFSv4: Don't do idmapper upcalls for asynchronous RPC calls", merged during the 2.6.32 cycle) --- after the OPEN call, .viminfo has the default values for st_uid and st_gid (0xfffffffe) cached because we do not want to let rpciod wait for an idmapper upcall to fill them in. The fix used in mainline is to save the owner and group as strings and perform the upcall in _nfs4_proc_open outside the rpciod context, which takes about 600 lines. For stable, we can do something similar with a one-liner: make open check for the stale fields and make a (synchronous) GETATTR call to fill them when needed. Trond dictated the patch, I typed it in, and Rik tested it. Addresses http://bugs.debian.org/659111 and https://bugzilla.redhat.com/789298 Reported-by: Rik Theys <Rik.Theys@esat.kuleuven.be> Explained-by: David Flyn <davidf@rd.bbc.co.uk> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Tested-by: Rik Theys <Rik.Theys@esat.kuleuven.be> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4: Ensure that we check lock exclusive/shared type against open modesTrond Myklebust2012-05-07
| | | | | | | | | | | commit 55725513b5ef9d462aa3e18527658a0362aaae83 upstream. Since we may be simulating flock() locks using NFS byte range locks, we can't rely on the VFS having checked the file open mode for us. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4: Ensure that the LOCK code sets exception->inodeTrond Myklebust2012-05-07
| | | | | | | | | | | commit 05ffe24f5290dc095f98fbaf84afe51ef404ccc5 upstream. All callers of nfs4_handle_exception() that need to handle NFS4ERR_OPENMODE correctly should set exception->inode Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nfs: Enclose hostname in brackets when needed in nfs_do_root_mountJan Kara2012-05-07
| | | | | | | | | | | | | | | | | | commit 98a2139f4f4d7b5fcc3a54c7fddbe88612abed20 upstream. When hostname contains colon (e.g. when it is an IPv6 address) it needs to be enclosed in brackets to make parsing of NFS device string possible. Fix nfs_do_root_mount() to enclose hostname properly when needed. NFS code actually does not need this as it does not parse the string passed by nfs_do_root_mount() but the device string is exposed to userspace in /proc/mounts. CC: Josh Boyer <jwboyer@redhat.com> CC: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4: Return the delegation if the server returns NFS4ERR_OPENMODETrond Myklebust2012-04-02
| | | | | | | | | | | | | commit 3114ea7a24d3264c090556a2444fc6d2c06176d4 upstream. If a setattr() fails because of an NFS4ERR_OPENMODE error, it is probably due to us holding a read delegation. Ensure that the recovery routines return that delegation in this case. Reported-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFS: Properly handle the case where the delegation is revokedTrond Myklebust2012-04-02
| | | | | | | | | | | | | | | | | | | commit a1d0b5eebc4fd6e0edb02688b35f17f67f42aea5 upstream. If we know that the delegation stateid is bad or revoked, we need to remove that delegation as soon as possible, and then mark all the stateids that relied on that delegation for recovery. We cannot use the delegation as part of the recovery process. Also note that NFSv4.1 uses a different error code (NFS4ERR_DELEG_REVOKED) to indicate that the delegation was revoked. Finally, ensure that setlk() and setattr() can both recover safely from a revoked delegation. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4: Ensure we throw out bad delegation stateids on NFS4ERR_BAD_STATEIDTrond Myklebust2012-02-29
| | | | | | | | | | | commit b9f9a03150969e4bd9967c20bce67c4de769058f upstream. To ensure that we don't just reuse the bad delegation when we attempt to recover the nfs4_state that received the bad stateid error. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* pnfs-obj: Must return layout on IO errorBoaz Harrosh2012-01-25
| | | | | | | | | | | | | | | | | | | | | | | | | | commit fe0fe83585f88346557868a803a479dfaaa0688a upstream. As mandated by the standard. In case of an IO error, a pNFS objects layout driver must return it's layout. This is because all device errors are reported to the server as part of the layout return buffer. This is implemented the same way PNFS_LAYOUTRET_ON_SETATTR is done, through a bit flag on the pnfs_layoutdriver_type->flags member. The flag is set by the layout driver that wants a layout_return preformed at pnfs_ld_{write,read}_done in case of an error. (Though I have not defined a wrapper like pnfs_ld_layoutret_on_setattr because this code is never called outside of pnfs.c and pnfs IO paths) Without this patch 3.[0-2] Kernels leak memory and have an annoying WARN_ON after every IO error utilizing the pnfs-obj driver. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* pnfs-obj: pNFS errors are communicated on iodata->pnfs_errorBoaz Harrosh2012-01-25
| | | | | | | | | | | | | | | | | | | commit 5c0b4129c07b902b27d3f3ebc087757f534a3abd upstream. Some time along the way pNFS IO errors were switched to communicate with a special iodata->pnfs_error member instead of the regular RPC members. But objlayout was not switched over. Fix that! Without this fix any IO error is hanged, because IO is not switched to MDS and pages are never cleared or read. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* nfs: fix regression in handling of context= option in NFSv4Jeff Layton2012-01-25
| | | | | | | | | | | | | | | | | | | | | | | | | | commit 8a0d551a59ac92d8ff048d6cb29d3a02073e81e8 upstream. Setting the security context of a NFSv4 mount via the context= mount option is currently broken. The NFSv4 codepath allocates a parsed options struct, and then parses the mount options to fill it. It eventually calls nfs4_remote_mount which calls security_init_mnt_opts. That clobbers the lsm_opts struct that was populated earlier. This bug also looks like it causes a small memory leak on each v4 mount where context= is used. Fix this by moving the initialization of the lsm_opts into nfs_alloc_parsed_mount_data. Also, add a destructor for nfs_parsed_mount_data to make it easier to free all of the allocations hanging off of it, and to ensure that the security_free_mnt_opts is called whenever security_init_mnt_opts is. I believe this regression was introduced quite some time ago, probably by commit c02d7adf. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* NFSv4.1: fix backchannel slotid off-by-one bugAndy Adamson2012-01-25
| | | | | | | | | commit 61f2e5106582d02f30b6807e3f9c07463c572ccb upstream. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* NFSv4.1: Ensure that we handle _all_ SEQUENCE status bits.Trond Myklebust2012-01-06
| | | | | | | | | | | commit 111d489f0fb431f4ae85d96851fbf8d3248c09d8 upstream. Currently, the code assumes that the SEQUENCE status bits are mutually exclusive. They are not... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* NFS: Prevent 3.0 from crashing if it receives a partial layoutTrond Myklebust2011-12-09
| | | | | | | | | | | | | | | This is a backport of critical parts of commit 7c24d9489f "NFSv4.1: File layout only supports whole file layouts" It prevents the file layout driver from (incorrectly) using partial layouts, but ignores the part of the referenced commmit that relies on additional machinery to change the LAYOUTGET request based on layout driver. Signed-off-by: Fred Isaman <iisaman@netapp.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* nfs: when attempting to open a directory, fall back on normal lookup (try #5)Jeff Layton2011-11-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 1788ea6e3b2a58cf4fb00206e362d9caff8d86a7 upstream. commit d953126 changed how nfs_atomic_lookup handles an -EISDIR return from an OPEN call. Prior to that patch, that caused the client to fall back to doing a normal lookup. When that patch went in, the code began returning that error to userspace. The d_revalidate codepath however never had the corresponding change, so it was still possible to end up with a NULL ctx->state pointer after that. That patch caused a regression. When we attempt to open a directory that does not have a cached dentry, that open now errors out with EISDIR. If you attempt the same open with a cached dentry, it will succeed. Fix this by reverting the change in nfs_atomic_lookup and allowing attempts to open directories to fall back to a normal lookup Also, add a NFSv4-specific f_ops->open routine that just returns -ENOTDIR. This should never be called if things are working properly, but if it ever is, then the dprintk may help in debugging. To facilitate this, a new file_operations field is also added to the nfs_rpc_ops struct. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* VFS: Fix the remaining automounter semantics regressionsTrond Myklebust2011-11-11
| | | | | | | | | | | | | | | | | | | | commit 815d405ceff0d6964683f033e18b9b23a88fba87 upstream. The concensus seems to be that system calls such as stat() etc should not trigger an automount. Neither should the l* versions. This patch therefore adds a LOOKUP_AUTOMOUNT flag to tag those lookups that _should_ trigger an automount on the last path element. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [ Edited to leave out the cases that are already covered by LOOKUP_OPEN, LOOKUP_DIRECTORY and LOOKUP_CREATE - all of which also fundamentally force automounting for their own reasons - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* nfs: don't try to migrate pages with active requestsJeff Layton2011-11-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 2da956523526e440ef4f4dd174e26f5ac06fe011 upstream. nfs_find_and_lock_request will take a reference to the nfs_page and will then put it if the req is already locked. It's possible though that the reference will be the last one. That put then can kick off a whole series of reference puts: nfs_page nfs_open_context dentry inode If the inode ends up being deleted, then the VFS will call truncate_inode_pages. That function will try to take the page lock, but it was already locked when migrate_page was called. The code deadlocks. Fix this by simply refusing the migration request if PagePrivate is already set, indicating that the page is already associated with an active read or write request. We've had a customer test a backported version of this patch and the preliminary results seem good. Cc: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Harshula Jayasuriya <harshula@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* nfs: don't redirty inode when ncommit == 0 in nfs_commit_unstable_pagesJeff Layton2011-11-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3236c3e1adc0c7ec83eaff1de2d06746b7c5bb28 upstream. commit 420e3646 allowed the kernel to reduce the number of unnecessary commit calls by skipping the commit when there are a large number of outstanding pages. However, the current test in nfs_commit_unstable_pages does not handle the edge condition properly. When ncommit == 0, then that means that the kernel doesn't need to do anything more for the inode. The current test though in the WB_SYNC_NONE case will return true, and the inode will end up being marked dirty. Once that happens the inode will never be clean until there's a WB_SYNC_ALL flush. Fix this by immediately returning from nfs_commit_unstable_pages when ncommit == 0. Mike noticed this problem initially in RHEL5 (2.6.18-based kernel) which has a backported version of 420e3646. The inode cache there was growing very large. The inode cache was unable to be shrunk since the inodes were all marked dirty. Calling sync() would essentially "fix" the problem -- the WB_SYNC_ALL flush would result in the inodes all being marked clean. What I'm not clear on is how big a problem this is in mainline kernels as the writeback code there is very different. Either way, it seems incorrect to re-mark the inode dirty in this case. Reported-by: Mike McLean <mikem@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* Revert "NFS: Ensure that writeback_single_inode() calls write_inode() when ↵Trond Myklebust2011-11-11
| | | | | | | | | | | | | | | | | | | | | | syncing" commit 59b7c05fffba030e5d9e72324691e2f99aa69b79 upstream. This reverts commit b80c3cb628f0ebc241b02e38dd028969fb8026a2. The reverted commit was rendered obsolete by a VFS fix: commit 5547e8aac6f71505d621a612de2fca0dd988b439 (writeback: Update dirty flags in two steps). We now no longer need to worry about writeback_single_inode() missing our marking the inode for COMMIT in 'do_writepages()' call. Reverting this patch, fixes a performance regression in which the inode would continuously get queued to the dirty list, causing the writeback code to unnecessarily try to send a COMMIT. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Simon Kirby <sim@hostway.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* NFSv4.1: Return NFS4ERR_BADSESSION to callbacks during session resetsTrond Myklebust2011-08-29
| | | | | | | | | | | | | | | | | | | | | commit 910ac68a2b80c7de95bc8488734067b1bb15d583 upstream. If the client is in the process of resetting the session when it receives a callback, then returning NFS4ERR_DELAY may cause a deadlock with the DESTROY_SESSION call. Basically, if the client returns NFS4ERR_DELAY in response to the CB_SEQUENCE call, then the server is entitled to believe that the client is busy because it is already processing that call. In that case, the server is perfectly entitled to respond with a NFS4ERR_BACK_CHAN_BUSY to any DESTROY_SESSION call. Fix this by having the client reply with a NFS4ERR_BADSESSION in response to the callback if it is resetting the session. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* NFSv4.1: Fix the callback 'highest_used_slotid' behaviourTrond Myklebust2011-08-29
| | | | | | | | | | | | | | | | | | | | | commit 55a673990ec04cf63005318bcf08c2b0046e5778 upstream. Currently, there is no guarantee that we will call nfs4_cb_take_slot() even though nfs4_callback_compound() will consistently call nfs4_cb_free_slot() provided the cb_process_state has set the 'clp' field. The result is that we can trigger the BUG_ON() upon the next call to nfs4_cb_take_slot(). This patch fixes the above problem by using the slot id that was taken in the CB_SEQUENCE operation as a flag for whether or not we need to call nfs4_cb_free_slot(). It also fixes an atomicity problem: we need to set tbl->highest_used_slotid atomically with the check for NFS4_SESSION_DRAINING, otherwise we end up racing with the various tests in nfs4_begin_drain_session(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* pnfs-obj: Bug when we are running out of bioBoaz Harrosh2011-08-29
| | | | | | | | | | | | | | | | | | | | | | | | commit 20618b21da0796115e81906d24ff1601552701b7 upstream. When we have a situation that the number of pages we want to encode is bigger then the size of the bio. (Which can currently happen only when all IO is going to a single device .e.g group_width==1) then the IO is submitted short and we report back only the amount of bytes we actually wrote/read and all is fine. BUT ... There was a bug that the current length counter was advanced before the fail to add the extra page, and we come to a situation that the CDB length was one-page longer then the actual bio size, which is of course rejected by the osd-target. While here also fix the bio size calculation, in the case that we received more then one group of devices. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* pnfs-obj: Fix the comp_index != 0 caseBoaz Harrosh2011-08-29
| | | | | | | | | | | | | | | | commit 9af7db3228acc286c50e3a0f054ec982efdbc6c6 upstream. There were bugs in the case of partial layout where olo_comp_index is not zero. This used to work and was tested but one of the later cleanup SQUASHMEs broke it and was not tested since. Also add a dprint that specify those received layout parameters. Everything else was already printed. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* NFS: Fix spurious readdir cookie loop messagesTrond Myklebust2011-08-05
| | | | | | | | | | | | | | | | | | | commit 0c0308066ca53fdf1423895f3a42838b67b3a5a8 upstream. If the directory contents change, then we have to accept that the file->f_pos value may shrink if we do a 'search-by-cookie'. In that case, we should turn off the loop detection and let the NFS client try to recover. The patch also fixes a second loop detection bug by ensuring that after turning on the ctx->duped flag, we read at least one new cookie into ctx->dir_cookie before attempting to match with ctx->dup_cookie. Reported-by: Petr Vandrovec <petr@vandrovec.name> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* NFSv4: Don't use the delegation->inode in nfs_mark_return_delegation()Trond Myklebust2011-08-05
| | | | | | | | | | | | | | commit ed1e6211a0a134ff23592c6f057af982ad5dab52 upstream. nfs_mark_return_delegation() is usually called without any locking, and so it is not safe to dereference delegation->inode. Since the inode is only used to discover the nfs_client anyway, it makes more sense to have the callers pass a valid pointer to the nfs_server as a parameter. Reported-by: Ian Kent <raven@themaw.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* pnfs: use lwb as layoutcommit lengthPeng Tao2011-08-05
| | | | | | | | | | | | | commit 3557c6c3be5b2ca0b11365db7f8a813253eb520b upstream. Using NFS4_MAX_UINT64 will break current protocol. [Needed in v3.0] Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* pnfs: let layoutcommit handle a list of lsegPeng Tao2011-08-05
| | | | | | | | | | | | | | | commit a9bae5666d0510ad69bdb437371c9a3e6b770705 upstream. There can be multiple lseg per file, so layoutcommit should be able to handle it. [Needed in v3.0] Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* pnfs: save layoutcommit cred at layout header initPeng Tao2011-08-05
| | | | | | | | | | | | | | | commit 9fa4075878a5faac872a63f4a97ce79c776264e9 upstream. No need to save it for every lseg. No need to save it at every pnfs_set_layoutcommit. [Needed in v3.0] Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* pnfs: save layoutcommit lwb at layout headerPeng Tao2011-08-05
| | | | | | | | | | | | | commit acff5880539fe33897d016c0f3dcf062e67c61b6 upstream. No need to save it for every lseg. [Needed in v3.0] Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds2011-07-13
|\ | | | | | | | | | | | | | | * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: SUNRPC: Fix use of static variable in rpcb_getport_async NFSv4.1: update nfs4_fattr_bitmap_maxsz SUNRPC: Fix a race between work-queue and rpc_killall_tasks pnfs: write: Set mds_offset in the generic layer - it is needed by all LDs
| * NFSv4.1: update nfs4_fattr_bitmap_maxszAndy Adamson2011-07-11
| | | | | | | | | | | | | | | | | | Attribute IDs assigned in RFC 5661 now require three bitmaps. Fixes hitting a BUG_ON in xdr_shrink_bufhead when getting ACLs. Signed-off-by: Andy Adamson <andros@netapp.com> Cc:stable@kernel.org [2.6.39] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * pnfs: write: Set mds_offset in the generic layer - it is needed by all LDsBoaz Harrosh2011-06-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In current pnfs tree, all the layouts set mds_offset in their .write_pagelist member. mds_offset is only used by generic layer and should be handled by it. This patch is for upstream. It is needed in this -rc series to fix a bug in objects layout_commit. I'll send patches for objects and blocks to be squashed into current pnfs tree. TODO: It looks like the read path needs the same patch. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | FS-Cache: Add a helper to bulk uncache pages on an inodeDavid Howells2011-07-07
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an FS-Cache helper to bulk uncache pages on an inode. This will only work for the circumstance where the pages in the cache correspond 1:1 with the pages attached to an inode's page cache. This is required for CIFS and NFS: When disabling inode cookie, we were returning the cookie and setting cifsi->fscache to NULL but failed to invalidate any previously mapped pages. This resulted in "Bad page state" errors and manifested in other kind of errors when running fsstress. Fix it by uncaching mapped pages when we disable the inode cookie. This patch should fix the following oops and "Bad page state" errors seen during fsstress testing. ------------[ cut here ]------------ kernel BUG at fs/cachefiles/namei.c:201! invalid opcode: 0000 [#1] SMP Pid: 5, comm: kworker/u:0 Not tainted 2.6.38.7-30.fc15.x86_64 #1 Bochs Bochs RIP: 0010: cachefiles_walk_to_object+0x436/0x745 [cachefiles] RSP: 0018:ffff88002ce6dd00 EFLAGS: 00010282 RAX: ffff88002ef165f0 RBX: ffff88001811f500 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000282 RBP: ffff88002ce6dda0 R08: 0000000000000100 R09: ffffffff81b3a300 R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff88002ae54840 R13: ffff88002ae54840 R14: ffff880029c29c00 R15: ffff88001811f4b0 FS: 00007f394dd32720(0000) GS:ffff88002ef00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fffcb62ddf8 CR3: 000000001825f000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/u:0 (pid: 5, threadinfo ffff88002ce6c000, task ffff88002ce55cc0) Stack: 0000000000000246 ffff88002ce55cc0 ffff88002ce6dd58 ffff88001815dc00 ffff8800185246c0 ffff88001811f618 ffff880029c29d18 ffff88001811f380 ffff88002ce6dd50 ffffffff814757e4 ffff88002ce6dda0 ffffffff8106ac56 Call Trace: cachefiles_lookup_object+0x78/0xd4 [cachefiles] fscache_lookup_object+0x131/0x16d [fscache] fscache_object_work_func+0x1bc/0x669 [fscache] process_one_work+0x186/0x298 worker_thread+0xda/0x15d kthread+0x84/0x8c kernel_thread_helper+0x4/0x10 RIP cachefiles_walk_to_object+0x436/0x745 [cachefiles] ---[ end trace 1d481c9af1804caa ]--- I tested the uncaching by the following means: (1) Create a big file on my NFS server (104857600 bytes). (2) Read the file into the cache with md5sum on the NFS client. Look in /proc/fs/fscache/stats: Pages : mrk=25601 unc=0 (3) Open the file for read/write ("bash 5<>/warthog/bigfile"). Look in proc again: Pages : mrk=25601 unc=25601 Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de> cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* NFS: Fix decode_secinfo_maxszBryan Schumaker2011-06-21
| | | | | | | I initially did the calculation in bytes, and not words Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4.1: Fix an off-by-one error in pnfs_generic_pg_testTrond Myklebust2011-06-21
| | | | | | And document what is going on there... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4.1: Fix some issues with pnfs_generic_pg_testTrond Myklebust2011-06-21
| | | | | | | | | | | | | | | | 1. If the intention is to coalesce requests 'prev' and 'req' then we have to ensure at least that we have a layout starting at req_offset(prev). 2. If we're only requesting a minimal layout of length desc->pg_count, we need to test the length actually returned by the server before we allow the coalescing to occur. 3. We need to deal correctly with (pgio->lseg == NULL) 4. Fixup the test guarding the pnfs_update_layout. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4.1: file layout must consider pg_bsize for coalescingBenny Halevy2011-06-20
| | | | | | | Otherwise we end up overflowing the rpc buffer size on the receive end. Signed-off-by: Benny Halevy <benny@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* pnfs-obj: No longer needed to take an extra ref at add_deviceBoaz Harrosh2011-06-19
| | | | | | | | | | | | | | | | Andy's last device_cache patches, already take an extra reference on the newly inserted device_id. So we can remove it from obj-io. Without this patch the device_ids are leaked. Andy's patches are not in Linus tree yet. So I'm not sure if they are scheduled for this Kernel or the next. This patch should be added as part of these. CC: Andy Adamson <andros@netapp.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4: Fix a readdir regressionTrond Myklebust2011-06-16
| | | | | | | | | | | | Commit 7ebb9315 (NFS: use secinfo when crossing mountpoints) introduces a regression when decoding an NFSv4 readdir entry that sets the rdattr_error field. By treating the resulting value as if it is a decoding error, the current code may cause us to skip valid readdir entries. Reported-by: Andy Adamson <andros@netapp.com> Cc: stable@kernel.org [2.6.39] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* nfs4.1: mark layout as bad on error path in _pnfs_return_layoutFred Isaman2011-06-15
| | | | | Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* nfs4.1: prevent race that allowed use of freed layout in _pnfs_return_layoutFred Isaman2011-06-15
| | | | | | | | mark_matching_lsegs_invalid could put the last ref to the layout, so the get_layout_hdr needs to be called first. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4.1: need to put_layout_hdr on _pnfs_return_layout error pathBenny Halevy2011-06-15
| | | | | | | | | | We always get a reference on the layout header and we rely on nfs4_layoutreturn_release to put it. If we hit an allocation error before starting the rpc proc we bail out early without dereferncing the layout header properly. Signed-off-by: Benny Halevy <benny@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: (d)printks should use %zd for ssize_t argumentsDavid Howells2011-06-15
| | | | | | | | | | | | | (d)printks should use %zd for ssize_t arguments not %ld, otherwise they might get a warning. I see the following with MN10300. fs/nfs/objlayout/objlayout.c: In function 'objlayout_read_done': fs/nfs/objlayout/objlayout.c:294: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t' Signed-off-by: David Howells <dhowells@redhat.com> cc: Trond Myklebust <Trond.Myklebust@netapp.com> cc: linux-nfs@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4.1: fix break condition in pnfs_find_lsegBenny Halevy2011-06-15
| | | | | | | | | | | The break condition to skip out of the loop got broken when cmp_layout was change. Essentially, we want to stop looking once we know no layout on the remainder of the list can match the first byte of the looked-up range. Reported-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Benny Halevy <benny@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* nfs4.1: fix several problems with _pnfs_return_layoutFred Isaman2011-06-15
| | | | | | | | | | | _pnfs_return_layout had the following problems: - it did not call pnfs_free_lseg_list on all paths - it unintentionally did a forgetful return when there was no outstanding io - it raced with concurrent LAYOUTGETS Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4.1: allow zero fh array in filelayout decode layoutAndy Adamson2011-06-15
| | | | | | Signed-off-by: Andy Adamson <andros@netapp.com> cc:stable@kernel.org [2.6.39] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>