From a82416da83722944d78d933301e32e7c5ad70edd Mon Sep 17 00:00:00 2001 From: Nick Piggin Date: Fri, 14 Jan 2011 02:26:53 +0000 Subject: [PATCH 1/6] fs: small rcu-walk documentation fixes Signed-off-by: Nick Piggin --- Documentation/filesystems/porting | 8 ++++---- Documentation/filesystems/vfs.txt | 4 ++-- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting index 07a32b42cf9c..7fcac6393cd0 100644 --- a/Documentation/filesystems/porting +++ b/Documentation/filesystems/porting @@ -365,8 +365,8 @@ must be done in the RCU callback. [recommended] vfs now tries to do path walking in "rcu-walk mode", which avoids atomic operations and scalability hazards on dentries and inodes (see -Documentation/filesystems/path-walk.txt). d_hash and d_compare changes (above) -are examples of the changes required to support this. For more complex +Documentation/filesystems/path-lookup.txt). d_hash and d_compare changes +(above) are examples of the changes required to support this. For more complex filesystem callbacks, the vfs drops out of rcu-walk mode before the fs call, so no changes are required to the filesystem. However, this is costly and loses the benefits of rcu-walk mode. We will begin to add filesystem callbacks that @@ -383,5 +383,5 @@ Documentation/filesystems/vfs.txt for more details. permission and check_acl are inode permission checks that are called on many or all directory inodes on the way down a path walk (to check for -exec permission). These must now be rcu-walk aware (flags & IPERM_RCU). See -Documentation/filesystems/vfs.txt for more details. +exec permission). These must now be rcu-walk aware (flags & IPERM_FLAG_RCU). +See Documentation/filesystems/vfs.txt for more details. diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index fbb324e2bd43..cae6d27c9f5b 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -415,8 +415,8 @@ otherwise noted. permission: called by the VFS to check for access rights on a POSIX-like filesystem. - May be called in rcu-walk mode (flags & IPERM_RCU). If in rcu-walk - mode, the filesystem must check the permission without blocking or + May be called in rcu-walk mode (flags & IPERM_FLAG_RCU). If in rcu-walk + mode, the filesystem must check the permission without blocking or storing to the inode. If a situation is encountered that rcu-walk cannot handle, return From bb20c18db6fbb5e6ba499c76473a487d35073467 Mon Sep 17 00:00:00 2001 From: Nick Piggin Date: Fri, 14 Jan 2011 02:35:53 +0000 Subject: [PATCH 2/6] fs: force_reval_path drop rcu-walk before d_invalidate d_revalidate can return in rcu-walk mode even when it returns 0. We can't just call any old dcache function on rcu-walk dentry (the dentry is unstable, so even through d_lock can safely be taken, the result may no longer be what we expect -- careful re-checks would be required). So just drop rcu in this case. (I missed this conversion when switching to the rcu-walk convention that Linus suggested) Signed-off-by: Nick Piggin --- fs/namei.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/fs/namei.c b/fs/namei.c index 19433cdba011..0f02359ce685 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -583,6 +583,13 @@ void release_open_intent(struct nameidata *nd) fput(nd->intent.open.file); } +/* + * Call d_revalidate and handle filesystems that request rcu-walk + * to be dropped. This may be called and return in rcu-walk mode, + * regardless of success or error. If -ECHILD is returned, the caller + * must return -ECHILD back up the path walk stack so path walk may + * be restarted in ref-walk mode. + */ static int d_revalidate(struct dentry *dentry, struct nameidata *nd) { int status; @@ -673,6 +680,9 @@ force_reval_path(struct path *path, struct nameidata *nd) return 0; if (!status) { + /* Don't d_invalidate in rcu-walk mode */ + if (nameidata_drop_rcu(nd)) + return -ECHILD; d_invalidate(dentry); status = -ESTALE; } From 90dbb77ba48dddb87445d238e84cd137cf97dd98 Mon Sep 17 00:00:00 2001 From: Nick Piggin Date: Fri, 14 Jan 2011 02:36:19 +0000 Subject: [PATCH 3/6] fs: fix dropping of rcu-walk from force_reval_path As J. R. Okajima noted, force_reval_path passes in the same dentry to d_revalidate as the one in the nameidata structure (other callers pass in a child), so the locking breaks. This can oops with a chrooted nfs mount, for example. Similarly there can be other problems with revalidating a dentry which is already in nameidata of the path walk. Signed-off-by: Nick Piggin --- fs/namei.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/fs/namei.c b/fs/namei.c index 0f02359ce685..14c73edca9ce 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -479,6 +479,14 @@ static int nameidata_dentry_drop_rcu(struct nameidata *nd, struct dentry *dentry struct fs_struct *fs = current->fs; struct dentry *parent = nd->path.dentry; + /* + * It can be possible to revalidate the dentry that we started + * the path walk with. force_reval_path may also revalidate the + * dentry already committed to the nameidata. + */ + if (unlikely(parent == dentry)) + return nameidata_drop_rcu(nd); + BUG_ON(!(nd->flags & LOOKUP_RCU)); if (nd->root.mnt) { spin_lock(&fs->lock); From 2c6755988afc003a0332406a134fb6a1626f9b28 Mon Sep 17 00:00:00 2001 From: Nick Piggin Date: Fri, 14 Jan 2011 02:36:43 +0000 Subject: [PATCH 4/6] fs: hlist UP debug fixup Po-Yu Chuang noticed that hlist_bl_set_first could crash on a UP system when LIST_BL_LOCKMASK is 0, because LIST_BL_BUG_ON(!((unsigned long)h->first & LIST_BL_LOCKMASK)); always evaulates to true. Fix the expression, and also avoid a dependency between bit spinlock implementation and list bl code (list code shouldn't know anything except that bit 0 is set when adding and removing elements). Eventually if a good use case comes up, we might use this list to store 1 or more arbitrary bits of data, so it really shouldn't be tied to locking either, but for now they are helpful for debugging. Signed-off-by: Nick Piggin --- include/linux/list_bl.h | 5 +++-- include/linux/rculist_bl.h | 3 ++- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/include/linux/list_bl.h b/include/linux/list_bl.h index 9ee97e7f2be4..b2adbb4b2f73 100644 --- a/include/linux/list_bl.h +++ b/include/linux/list_bl.h @@ -16,7 +16,7 @@ * some fast and compact auxiliary data. */ -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) +#if defined(CONFIG_SMP) #define LIST_BL_LOCKMASK 1UL #else #define LIST_BL_LOCKMASK 0UL @@ -62,7 +62,8 @@ static inline void hlist_bl_set_first(struct hlist_bl_head *h, struct hlist_bl_node *n) { LIST_BL_BUG_ON((unsigned long)n & LIST_BL_LOCKMASK); - LIST_BL_BUG_ON(!((unsigned long)h->first & LIST_BL_LOCKMASK)); + LIST_BL_BUG_ON(((unsigned long)h->first & LIST_BL_LOCKMASK) != + LIST_BL_LOCKMASK); h->first = (struct hlist_bl_node *)((unsigned long)n | LIST_BL_LOCKMASK); } diff --git a/include/linux/rculist_bl.h b/include/linux/rculist_bl.h index b872b493724d..cf1244fbf3b6 100644 --- a/include/linux/rculist_bl.h +++ b/include/linux/rculist_bl.h @@ -11,7 +11,8 @@ static inline void hlist_bl_set_first_rcu(struct hlist_bl_head *h, struct hlist_bl_node *n) { LIST_BL_BUG_ON((unsigned long)n & LIST_BL_LOCKMASK); - LIST_BL_BUG_ON(!((unsigned long)h->first & LIST_BL_LOCKMASK)); + LIST_BL_BUG_ON(((unsigned long)h->first & LIST_BL_LOCKMASK) != + LIST_BL_LOCKMASK); rcu_assign_pointer(h->first, (struct hlist_bl_node *)((unsigned long)n | LIST_BL_LOCKMASK)); } From 657e94b673a805b427903c5628e95348235fad06 Mon Sep 17 00:00:00 2001 From: Nick Piggin Date: Fri, 14 Jan 2011 02:48:39 +0000 Subject: [PATCH 5/6] nfs: add missing rcu-walk check Signed-off-by: Nick Piggin --- fs/nfs/dir.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index d33da530097a..a0d8320bed9c 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1410,11 +1410,15 @@ no_open: static int nfs_open_revalidate(struct dentry *dentry, struct nameidata *nd) { struct dentry *parent = NULL; - struct inode *inode = dentry->d_inode; + struct inode *inode; struct inode *dir; struct nfs_open_context *ctx; int openflags, ret = 0; + if (nd->flags & LOOKUP_RCU) + return -ECHILD; + + inode = dentry->d_inode; if (!is_atomic_open(nd) || d_mountpoint(dentry)) goto no_open; From f20877d94a74557b7c28b4ed8920d834c31e0ea5 Mon Sep 17 00:00:00 2001 From: "J. R. Okajima" Date: Fri, 14 Jan 2011 03:56:04 +0000 Subject: [PATCH 6/6] fs: fix do_last error case when need_reval_dot When open(2) without O_DIRECTORY opens an existing dir, it should return EISDIR. In do_last(), the variable 'error' is initialized EISDIR, but it is changed by d_revalidate() which returns any positive to represent 'the target dir is valid.' Should we keep and return the initialized 'error' in this case. Signed-off-by: Nick Piggin --- fs/namei.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 14c73edca9ce..bc24894c5f14 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2122,11 +2122,13 @@ static struct file *do_last(struct nameidata *nd, struct path *path, dir = nd->path.dentry; case LAST_DOT: if (need_reval_dot(dir)) { - error = d_revalidate(nd->path.dentry, nd); - if (!error) - error = -ESTALE; - if (error < 0) + int status = d_revalidate(nd->path.dentry, nd); + if (!status) + status = -ESTALE; + if (status < 0) { + error = status; goto exit; + } } /* fallthrough */ case LAST_ROOT: