alistair23-linux/fs/nfs
Neil Brown 83672d392f NFS: Fix directory caching problem - with test case and patch.
Try running this script in an NFS mounted directory (Client relatively
recent - 2.6.18 has the problem as does 2.6.20).

------------------------------------------------------
#!/bin/bash
#
# This script will produce the following error message from tar:
#
#   tar: newdir/innerdir/innerfile: file changed as we read it

# create dirs
rm -rf nfstest
mkdir -p nfstest/dir/innerdir

# create files (should not be empty)
echo "Hello World!" >nfstest/dir/file
echo "Hello World!" >nfstest/dir/innerdir/innerfile

# problem only happens if we sleep before chmod
sleep 1

# change file modes
chmod -R a+r nfstest

# rename dir
mv nfstest/dir nfstest/newdir

# tar it
tar -cf nfstest/nfstest.tar -C nfstest newdir

# restore old dir name
mv nfstest/newdir nfstest/dir
--------------------------------------------------------

What happens:

The 'chmod -R' does a readdir_plus in each directory and the results
get cached in the page cache.  It then changes the mode of each file,
which moves the ctime forward by one second (hence the sleep).  When
this happens, the post-op attributes are used to update the ctime
stored on the client to match the value on the server.

The 'mv' calls shrink_dcache_parent on the directory tree which
flushes all the dentries (so a new lookup will be required) but
doesn't flush the inodes or pagecache.

The 'tar' does a readdir on each directory, but (in the case of
'innerdir' at least) satisfies it from the pagecache and uses the
READDIRPLUS data to update all the inodes.  In the case of
'innerdir/innerfile', the ctime is out of date.

'tar' then calls 'lstat' on innerdir/innerfile getting an old ctime.
It then opens the file (triggering a GETATTR), reads the content, and
then calls fstat to see if anything has changed.  It finds that ctime
has changed and so complains.
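
The check that tar performs can be approximated from the shell.  This
is only a sketch, not tar's actual code: ctime_changed_during_read is
a made-up helper name, and 'stat -c %Z' assumes GNU coreutils.  Real
tar compares the lstat and fstat results in C.

```shell
#!/bin/bash
# Sketch of tar's "file changed as we read it" check; not tar's code.
# ctime_changed_during_read is a hypothetical helper name.
# Assumes GNU coreutils stat: -c %Z prints ctime in seconds since the epoch.
#
# Returns 0 if the ctime differs before and after reading the file,
#         1 if it is unchanged,
#         2 if the file cannot be examined or read.
ctime_changed_during_read() {
    local file=$1 before after
    before=$(stat -c %Z -- "$file") || return 2   # stands in for tar's lstat
    cat -- "$file" >/dev/null || return 2          # open the file and read it
    after=$(stat -c %Z -- "$file") || return 2    # stands in for tar's fstat
    [ "$before" != "$after" ]
}
```

Calling it on nfstest/newdir/innerdir/innerfile at the point where the
script runs tar might be expected to return 0 on an affected client,
though I have only observed the problem through tar itself.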

The problem seems to be that the cached readdirplus info is kept
around for too long.

My patch below discards pagecache data for directories when
dentry_iput is called on them.  This effectively removes the symptom,
which convinces me that I correctly understand the problem.  However,
I'm not convinced that it is a proper solution, as there could easily
be other races that trigger the same problem without being affected
by this 'fix'.

One possibility would be to require that readdirplus pagecache data be
only used *once* to instantiate an inode.  Somehow it should then be
invalidated so that if the dentry subsequently disappears, it will
cause a new request to the server to fill in the stat data.

Another possibility is to compare the cache_change_attribute on the
inode with something similar for the readdirplus info and reject the
info from readdirplus if it is too old.
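
As a toy model of that comparison (not kernel code): the function name
and the two integer stamps below are invented for illustration.  The
first stands in for a stamp recorded when the readdirplus page was
filled, the second for the inode's cache_change_attribute.

```shell
#!/bin/bash
# Toy model only; readdirplus_attrs_usable is a hypothetical name.
#   $1: stamp recorded when the readdirplus data was cached (assumed)
#   $2: current change stamp of the inode (models cache_change_attribute)
# Succeeds (0) if the cached readdirplus attributes are at least as new
# as the inode's own attributes; fails (1) if they are older and should
# be rejected in favour of a fresh request to the server.
readdirplus_attrs_usable() {
    local cached_stamp=$1 inode_stamp=$2
    [ "$cached_stamp" -ge "$inode_stamp" ]
}
```

In the test case above, the readdirplus data would carry the older
stamp (it was cached before the chmod updated the inode), so a check
of this shape would reject it.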

I haven't tried to implement these and would value other opinions
before I do.

Thanks,
NeilBrown


Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:19 -07:00
callback.c [PATCH] knfsd: SUNRPC: Provide room in svc_rqst for larger addresses 2007-02-12 09:48:36 -08:00
callback.h [PATCH] xdr annotations: fs/nfs/callback* 2006-10-20 10:26:40 -07:00
callback_proc.c [PATCH] fs/nfs/callback* passes error values big-endian 2006-10-20 10:26:40 -07:00
callback_xdr.c [PATCH] knfsd: SUNRPC: Provide room in svc_rqst for larger addresses 2007-02-12 09:48:36 -08:00
client.c NFS: Added support to turn off the NFSv3 READDIRPLUS RPC. 2007-04-30 22:17:16 -07:00
delegation.c [PATCH] fs: Removing useless casts 2006-09-27 08:26:10 -07:00
delegation.h NFS: Rename struct nfs4_client to struct nfs_client 2006-09-22 23:24:31 -04:00
dir.c NFS: Fix directory caching problem - with test case and patch. 2007-04-30 22:17:19 -07:00
direct.c NFS: Fix a buffer overflow in the allocation of struct nfs_read/writedata 2007-04-30 22:17:07 -07:00
file.c [PATCH] mark struct inode_operations const 2 2007-02-12 09:48:46 -08:00
getroot.c NFSv4: Don't require that NFSv4 mount paths begin with '/' 2007-02-03 15:35:05 -08:00
idmap.c [PATCH] nfs: change uses of f_{dentry,vfsmnt} to use f_path 2006-12-08 08:28:41 -08:00
inode.c NFS: Fix an Oops in nfs_setattr() 2007-04-14 21:46:47 -07:00
internal.h NFS: Fix a buffer overflow in the allocation of struct nfs_read/writedata 2007-04-30 22:17:07 -07:00
iostat.h NFSv4: Fix an oops in nfs4_fill_super 2006-03-20 13:44:48 -05:00
Makefile NFS: Share NFS superblocks per-protocol per-server per-FSID 2006-09-22 23:24:37 -04:00
mount_clnt.c SUNRPC: RPC buffer size estimates are too large 2007-04-30 22:17:10 -07:00
namespace.c [PATCH] mark struct inode_operations const 2 2007-02-12 09:48:46 -08:00
nfs2xdr.c SUNRPC: RPC buffer size estimates are too large 2007-04-30 22:17:10 -07:00
nfs3acl.c NFSv3: Client-side nfsacl caching fix 2006-06-09 09:34:11 -04:00
nfs3proc.c NFS: Remove nfs_readpage_sync() 2007-02-03 15:35:06 -08:00
nfs3xdr.c SUNRPC: RPC buffer size estimates are too large 2007-04-30 22:17:10 -07:00
nfs4_fs.h Merge branch 'master' of /home/trondmy/kernel/linux-2.6/ 2007-02-12 22:43:25 -08:00
nfs4namespace.c NFSv4: /proc/mounts displays the wrong server name for referrals 2007-02-03 15:35:10 -08:00
nfs4proc.c Merge branch 'master' of /home/trondmy/kernel/linux-2.6/ 2007-02-12 22:43:25 -08:00
nfs4renewd.c [PATCH] remove many unneeded #includes of sched.h 2007-02-14 08:09:54 -08:00
nfs4state.c NFS: Share NFS superblocks per-protocol per-server per-FSID 2006-09-22 23:24:37 -04:00
nfs4xdr.c SUNRPC: RPC buffer size estimates are too large 2007-04-30 22:17:10 -07:00
nfsroot.c NFS: switch NFSROOT to use new rpcbind client 2007-04-30 22:17:14 -07:00
pagelist.c NFS: Use pgoff_t in structures and functions that pass page cache offsets 2007-04-30 22:17:09 -07:00
proc.c NFS: Remove nfs_readpage_sync() 2007-02-03 15:35:06 -08:00
read.c NFS: Fix a buffer overflow in the allocation of struct nfs_read/writedata 2007-04-30 22:17:07 -07:00
super.c NFS: Added support to turn off the NFSv3 READDIRPLUS RPC. 2007-04-30 22:17:16 -07:00
symlink.c [PATCH] mark struct inode_operations const 2 2007-02-12 09:48:46 -08:00
sysctl.c [PATCH] nfs: fix congestion control 2007-03-16 19:25:05 -07:00
unlink.c NFS: kzalloc conversion in fs/nfs 2006-03-20 13:44:10 -05:00
write.c NFS: Use pgoff_t in structures and functions that pass page cache offsets 2007-04-30 22:17:09 -07:00