aboutsummaryrefslogtreecommitdiffstats
path: root/fs
diff options
context:
space:
mode:
authorNeil Brown <neilb@suse.de>2007-02-25 20:48:25 -0500
committerTrond Myklebust <Trond.Myklebust@netapp.com>2007-05-01 01:17:19 -0400
commit83672d392f7bcf556f7920d6715e4174d9373ee0 (patch)
tree43af2f3428bd5a8eec3b089faf5aaef6ff7a4189 /fs
parent1f4eab7e7c1d90dcd8ca4d7c064ee78dfbb345eb (diff)
NFS: Fix directory caching problem - with test case and patch.
Try running this script in an NFS mounted directory (Client relatively recent - 2.6.18 has the problem as does 2.6.20). ------------------------------------------------------ #!/bin/bash # # This script will produce the following errormessage from tar: # # tar: newdir/innerdir/innerfile: file changed as we read it # create dirs rm -rf nfstest mkdir -p nfstest/dir/innerdir # create files (should not be empty) echo "Hello World!" >nfstest/dir/file echo "Hello World!" >nfstest/dir/innerdir/innerfile # problem only happens if we sleep before chmod sleep 1 # change file modes chmod -R a+r nfstest # rename dir mv nfstest/dir nfstest/newdir # tar it tar -cf nfstest/nfstest.tar -C nfstest newdir # restore old dir name mv nfstest/newdir nfstest/dir -------------------------------------------------------- What happens: The 'chmod -R' does a readdir_plus in each directory and the results get cached in the page cache. It then updates the ctime on each file by one second. When this happens, the post-op attributes are used to update the ctime stored on the client to match the value in the kernel. The 'mv' calls shrink_dcache_parent on the directory tree which flushes all the dentries (so a new lookup will be required) but doesn't flush the inodes or pagecache. The 'tar' does a readdir on each directory, but (in the case of 'innerdir' at least) satisfies it from the pagecache and uses the READDIRPLUS data to update all the inodes. In the case of 'innerdir/innerfile', the ctime is out of date. 'tar' then calls 'lstat' on innerdir/innerfile getting an old ctime. It then opens the file (triggering a GETATTR), reads the content, and then calls fstat to see if anything has changed. It finds that ctime has changed and so complains. The problem seems to be that the cache readdirplus info is kept around for too long. My patch below discards pagecache data for directories when dentry_iput is called on them. This effectively removes the symptom which convinces me that I correctly understand the problem. However I'm not convinced that is a proper solution, as there could easily be other races that trigger the same problem without being affected by this 'fix'. One possibility would be to require that readdirplus pagecache data be only used *once* to instantiate an inode. Somehow it should then be invalidated so that if the dentry subsequently disappears, it will cause a new request to the server to fill in the stat data. Another possibility is to compare the cache_change_attribute on the inode with something similar for the readdirplus info and reject the info from readdirplus if it is too old. I haven't tried to implement these and would value other opinions before I do. Thanks, NeilBrown Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Diffstat (limited to 'fs')
-rw-r--r--fs/nfs/dir.c4
1 files changed, 4 insertions, 0 deletions
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index d971547ce609..e59fd31c9a22 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -865,6 +865,10 @@ static int nfs_dentry_delete(struct dentry *dentry)
865static void nfs_dentry_iput(struct dentry *dentry, struct inode *inode) 865static void nfs_dentry_iput(struct dentry *dentry, struct inode *inode)
866{ 866{
867 nfs_inode_return_delegation(inode); 867 nfs_inode_return_delegation(inode);
868 if (S_ISDIR(inode->i_mode))
869 /* drop any readdir cache as it could easily be old */
870 NFS_I(inode)->cache_validity |= NFS_INO_INVALID_DATA;
871
868 if (dentry->d_flags & DCACHE_NFSFS_RENAMED) { 872 if (dentry->d_flags & DCACHE_NFSFS_RENAMED) {
869 lock_kernel(); 873 lock_kernel();
870 drop_nlink(inode); 874 drop_nlink(inode);