aboutsummaryrefslogtreecommitdiffstats
path: root/fs/nfs/read.c
Commit message (Collapse)AuthorAge
* [PATCH] Really ignore kmem_cache_destroy return valueAlexey Dobriyan2006-09-27
| | | | | | | | | | | | | | | | * Rougly half of callers already do it by not checking return value * Code in drivers/acpi/osl.c does the following to be sure: (void)kmem_cache_destroy(cache); * Those who check it printk something, however, slab_error already printed the name of failed cache. * XFS BUGs on failed kmem_cache_destroy which is not the decision low-level filesystem driver should make. Converted to ignore. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* NFS: Make read() return an ESTALE if the file has been deletedTrond Myklebust2006-09-22
| | | | | | | | | | | Currently, a read() request will return EIO even if the file has been deleted on the server, simply because that is what the VM will return if the call to readpage() fails to update the page. Ensure that readpage() marks the inode as stale if it receives an ESTALE. Then return that error to userland. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Share NFS superblocks per-protocol per-server per-FSIDDavid Howells2006-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The attached patch makes NFS share superblocks between mounts from the same server and FSID over the same protocol. It does this by creating each superblock with a false root and returning the real root dentry in the vfsmount presented by get_sb(). The root dentry set starts off as an anonymous dentry if we don't already have the dentry for its inode, otherwise it simply returns the dentry we already have. We may thus end up with several trees of dentries in the superblock, and if at some later point one of anonymous tree roots is discovered by normal filesystem activity to be located in another tree within the superblock, the anonymous root is named and materialises attached to the second tree at the appropriate point. Why do it this way? Why not pass an extra argument to the mount() syscall to indicate the subpath and then pathwalk from the server root to the desired directory? You can't guarantee this will work for two reasons: (1) The root and intervening nodes may not be accessible to the client. With NFS2 and NFS3, for instance, mountd is called on the server to get the filehandle for the tip of a path. mountd won't give us handles for anything we don't have permission to access, and so we can't set up NFS inodes for such nodes, and so can't easily set up dentries (we'd have to have ghost inodes or something). With this patch we don't actually create dentries until we get handles from the server that we can use to set up their inodes, and we don't actually bind them into the tree until we know for sure where they go. (2) Inaccessible symbolic links. If we're asked to mount two exports from the server, eg: mount warthog:/warthog/aaa/xxx /mmm mount warthog:/warthog/bbb/yyy /nnn We may not be able to access anything nearer the root than xxx and yyy, but we may find out later that /mmm/www/yyy, say, is actually the same directory as the one mounted on /nnn. What we might then find out, for example, is that /warthog/bbb was actually a symbolic link to /warthog/aaa/xxx/www, but we can't actually determine that by talking to the server until /warthog is made available by NFS. This would lead to having constructed an errneous dentry tree which we can't easily fix. We can end up with a dentry marked as a directory when it should actually be a symlink, or we could end up with an apparently hardlinked directory. With this patch we need not make assumptions about the type of a dentry for which we can't retrieve information, nor need we assume we know its place in the grand scheme of things until we actually see that place. This patch reduces the possibility of aliasing in the inode and page caches for inodes that may be accessed by more than one NFS export. It also reduces the number of superblocks required for NFS where there are many NFS exports being used from a server (home directory server + autofs for example). This in turn makes it simpler to do local caching of network filesystems, as it can then be guaranteed that there won't be links from multiple inodes in separate superblocks to the same cache file. Obviously, cache aliasing between different levels of NFS protocol could still be a problem, but at least that gives us another key to use when indexing the cache. This patch makes the following changes: (1) The server record construction/destruction has been abstracted out into its own set of functions to make things easier to get right. These have been moved into fs/nfs/client.c. All the code in fs/nfs/client.c has to do with the management of connections to servers, and doesn't touch superblocks in any way; the remaining code in fs/nfs/super.c has to do with VFS superblock management. (2) The sequence of events undertaken by NFS mount is now reordered: (a) A volume representation (struct nfs_server) is allocated. (b) A server representation (struct nfs_client) is acquired. This may be allocated or shared, and is keyed on server address, port and NFS version. (c) If allocated, the client representation is initialised. The state member variable of nfs_client is used to prevent a race during initialisation from two mounts. (d) For NFS4 a simple pathwalk is performed, walking from FH to FH to find the root filehandle for the mount (fs/nfs/getroot.c). For NFS2/3 we are given the root FH in advance. (e) The volume FSID is probed for on the root FH. (f) The volume representation is initialised from the FSINFO record retrieved on the root FH. (g) sget() is called to acquire a superblock. This may be allocated or shared, keyed on client pointer and FSID. (h) If allocated, the superblock is initialised. (i) If the superblock is shared, then the new nfs_server record is discarded. (j) The root dentry for this mount is looked up from the root FH. (k) The root dentry for this mount is assigned to the vfsmount. (3) nfs_readdir_lookup() creates dentries for each of the entries readdir() returns; this function now attaches disconnected trees from alternate roots that happen to be discovered attached to a directory being read (in the same way nfs_lookup() is made to do for lookup ops). The new d_materialise_unique() function is now used to do this, thus permitting the whole thing to be done under one set of locks, and thus avoiding any race between mount and lookup operations on the same directory. (4) The client management code uses a new debug facility: NFSDBG_CLIENT which is set by echoing 1024 to /proc/net/sunrpc/nfs_debug. (5) Clone mounts are now called xdev mounts. (6) Use the dentry passed to the statfs() op as the handle for retrieving fs statistics rather than the root dentry of the superblock (which is now a dummy). Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Fix Oopsable condition in nfs_readpage_sync()Trond Myklebust2006-09-19
| | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* [PATCH] NFS: large non-page-aligned direct I/O clobbers memoryTrond Myklebust2006-09-08
| | | | | | | | | | | | | | | | | The logic in nfs_direct_read_schedule and nfs_direct_write_schedule can allow data->npages to be one larger than rpages. This causes a page pointer to be written beyond the end of the pagevec in nfs_read_data (or nfs_write_data). Fix this by making nfs_(read|write)_alloc() calculate the size of the pagevec array, and initialise data->npages. Also get rid of the redundant argument to nfs_commit_alloc(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* NFS: Fix issue with EIO on NFS readTrond Myklebust2006-08-24
| | | | | | | | | | The problem is that we may be caching writes that would extend the file and create a hole in the region that we are reading. In this case, we need to detect the eof from the server, ensure that we zero out the pages that are part of the hole and mark them as up to date. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> (cherry picked from 856b603b01b99146918c093969b6cb1b1b0f1c01 commit)
* NFS: make 2 functions staticAdrian Bunk2006-08-03
| | | | | | | | | | nfs_writedata_free() and nfs_readdata_free() can now become static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> (cherry picked from 5e1ce40f0c3c8f67591aff17756930d7a18ceb1a commit)
* Remove obsolete #include <linux/config.h>Jörn Engel2006-06-30
| | | | | Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
* [PATCH] fix static linking of NFSDavid Brownell2006-06-27
| | | | | | | | | | | | | | Builds on ARM report link problems with common configurations like statically linked NFS (for nfsroot). The symptom is that __init section code references __exit section code; that won't work since the exit sections are discarded (since they can never be called). The best fix for these particular cases would be an "__init_or_exit" section annotation. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* NFS: Split fs/nfs/inode.cDavid Howells2006-06-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As fs/nfs/inode.c is rather large, heterogenous and unwieldy, the attached patch splits it up into a number of files: (*) fs/nfs/inode.c Strictly inode specific functions. (*) fs/nfs/super.c Superblock management functions for NFS and NFS4, normal access, clones and referrals. The NFS4 superblock functions _could_ move out into a separate conditionally compiled file, but it's probably not worth it as there're so many common bits. (*) fs/nfs/namespace.c Some namespace-specific functions have been moved here. (*) fs/nfs/nfs4namespace.c NFS4-specific namespace functions (this could be merged into the previous file). This file is conditionally compiled. (*) fs/nfs/internal.h Inter-file declarations, plus a few simple utility functions moved from fs/nfs/inode.c. Additionally, all the in-.c-file externs have been moved here, and those files they were moved from now includes this file. For the most part, the functions have not been changed, only some multiplexor functions have changed significantly. I've also: (*) Added some extra banner comments above some functions. (*) Rearranged the function order within the files to be more logical and better grouped (IMO), though someone may prefer a different order. (*) Reduced the number of #ifdefs in .c files. (*) Added missing __init and __exit directives. Signed-Off-By: David Howells <dhowells@redhat.com>
* NFS: Optimize allocation of nfs_read/write_data structuresChuck Lever2006-06-09
| | | | | | | | | | | | Clean up use of page_array, and fix an off-by-one error noticed by Tom Talpey which causes kmalloc calls in cases where using the page_array is sufficient. Test plan: Normal client functional testing with r/wsize=32768. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Clean up and fix page zeroing when we have short readsTrond Myklebust2006-06-09
| | | | | | | | | | | | | The code that is supposed to zero the uninitialised partial pages when the server returns a short read is currently broken: it looks at the nfs_page wb_pgbase and wb_bytes fields instead of the equivalent nfs_read_data values when deciding where to start truncating the page. Also ensure that we are more careful about setting PG_uptodate before retrying a short read: the retry will change the nfs_read_data args.pgbase and args.count. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* [PATCH] mempool: use mempool_create_slab_pool()Matthew Dobson2006-03-26
| | | | | | | | | | Modify well over a dozen mempool users to call mempool_create_slab_pool() rather than calling mempool_create() with extra arguments, saving about 30 lines of code and increasing readability. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* NFS: Uninline nfs_writedata_(alloc|free) and nfs_readdata_(alloc|free)Trond Myklebust2006-03-20
| | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Cleanup of NFS read codeTrond Myklebust2006-03-20
| | | | | | | | Same callback hierarchy inversion as for the NFS write calls. This patch is not strictly speaking needed by the O_DIRECT code, but avoids confusing differences between the asynchronous read and write code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: add I/O performance countersChuck Lever2006-03-20
| | | | | | | | | | | | | | | | Invoke the byte and event counter macros where we want to count bytes and events. Clean-up: fix a possible NULL dereference in nfs_lock, and simplify nfs_file_open. Test-plan: fsx and iozone on UP and SMP systems, with and without pre-emption. Watch for memory overwrite bugs, and performance loss (significantly more CPU required per op). Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: support large reads and writes on the wireChuck Lever2006-01-06
| | | | | | | | | | | | | | | | | Most NFS server implementations allow up to 64KB reads and writes on the wire. The Solaris NFS server allows up to a megabyte, for instance. Now the Linux NFS client supports transfer sizes up to 1MB, too. This will help reduce protocol and context switch overhead on read/write intensive NFS workloads, and support larger atomic read and write operations on servers that support them. Test-plan: Connectathon and iozone on mount point with wsize=rsize>32768 over TCP. Tests with NFS over UDP to verify the maximum RPC payload size cap. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* RPC: Clean up RPC task structureTrond Myklebust2006-01-06
| | | | | | | | | | Shrink the RPC task structure. Instead of storing separate pointers for task->tk_exit and task->tk_release, put them in a structure. Also pass the user data pointer as a parameter instead of passing it via task->tk_calldata. This enables us to nest callbacks. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4: Fix problem with OPEN_DOWNGRADETrond Myklebust2005-11-04
| | | | | | | | | | | | | | | | | RFC 3530 states that for OPEN_DOWNGRADE "The share_access and share_deny bits specified must be exactly equal to the union of the share_access and share_deny bits specified for some subset of the OPENs in effect for current openowner on the current file. Setattr is currently violating the NFSv4 rules for OPEN_DOWNGRADE in that it may cause a downgrade from OPEN4_SHARE_ACCESS_BOTH to OPEN4_SHARE_ACCESS_WRITE despite the fact that there exists no open file with O_WRONLY access mode. Fix the problem by replacing nfs4_find_state() with a modified version of nfs_find_open_context(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Cleanup initialisation of struct nfs_fattrTrond Myklebust2005-10-27
| | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* [PATCH] NFS: fix client oops when debugging is onNick Wilson2005-09-23
| | | | | | | | | | | | | | nfs_readpage_release() causes an oops while accessing a file with NFS debugging turned on (echo 32767 > /proc/sys/sunrpc/nfs_debug) and a kernel built with CONFIG_DEBUG_SLAB. This patch moves the debugging statement above nfs_release_request() to avoid accessing freed memory. Signed-off-by: Nick Wilson <njw@osdl.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] NFS: Introduce the use of inode->i_lock to protect fields in nfsiChuck Lever2005-08-18
| | | | | | | | | | | | | | | | | | | | Down the road we want to eliminate the use of the global kernel lock entirely from the NFS client. To do this, we need to protect the fields in the nfs_inode structure adequately. Start by serializing updates to the "cache_validity" field. Note this change addresses an SMP hang found by njw@osdl.org, where processes deadlock because nfs_end_data_update and nfs_revalidate_mapping update the "cache_validity" field without proper serialization. Test plan: Millions of fsx ops on SMP clients. Run Nick Wilson's breaknfs program on large SMP clients. Signed-off-by: Chuck Lever <cel@netapp.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] NFS: split nfsi->flags into two fieldsChuck Lever2005-08-18
| | | | | | | | | | | | | | | | Certain bits in nfsi->flags can be manipulated with atomic bitops, and some are better manipulated via logical bitmask operations. This patch splits the flags field into two. The next patch introduces atomic bitops for one of the fields. Test plan: Millions of fsx ops on SMP clients. Signed-off-by: Chuck Lever <cel@netapp.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] NFS: Make searching and waiting on busy writeback requests more ↵Trond Myklebust2005-06-22
| | | | | | | | | efficient. Basically copies the VFS's method for tracking writebacks and applies it to the struct nfs_page. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* Linux-2.6.12-rc2v2.6.12-rc2Linus Torvalds2005-04-16
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!