Commit graph

919 commits

Author SHA1 Message Date
Mimi Zohar 9bbb6cad01 ima: rename ima_path_check to ima_file_check
ima_path_check actually deals with files!  call it ima_file_check instead.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-07 03:06:22 -05:00
Mimi Zohar 8eb988c70e fix ima breakage
The "Untangling ima mess, part 2 with counters" patch messed
up the counters.  Based on conversations with Al Viro, this patch
streamlines ima_path_check() by removing the counter maintaince.
The counters are now updated independently, from measuring the file,
in __dentry_open() and alloc_file() by calling ima_counts_get().
ima_path_check() is called from nfsd and do_filp_open().
It also did not measure all files that should have been measured.
Reason: ima_path_check() got bogus value passed as mask.
[AV: mea culpa]
[AV: add missing nfsd bits]

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-07 03:06:22 -05:00
Al Viro 1e41568d73 Take ima_path_check() in nfsd past dentry_open() in nfsd_open()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-07 03:06:22 -05:00
Trond Myklebust aa696a6f34 nfsd: Use vfs_fsync_range() in nfsd_commit
The NFS COMMIT operation allows the client to specify the exact byte range
that it wishes to sync to disk in order to optimise server performance.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-01-29 18:53:11 -05:00
Chuck Lever 37498292aa NFSD: Create PF_INET6 listener in write_ports
Try to create a PF_INET6 listener for NFSD, if IPv6 is enabled in the
kernel.

Make sure nfsd_serv's reference count is decreased if
__write_ports_addxprt() failed to create a listener.  See
__write_ports_addfd().

Our current plan is to rely on rpc.nfsd to create appropriate IPv6
listeners when server-side NFS/IPv6 support is desired.  Legacy
behavior, via the write_threads or write_svc kernel APIs, will remain
the same -- only IPv4 listeners are created.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[bfields@citi.umich.edu: Move error-handling code to end]
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-01-27 17:01:08 -05:00
Chuck Lever 6871790815 SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found"
write_ports() converts svc_create_xprt()'s ENOENT error return to
EPROTONOSUPPORT so that rpc.nfsd (in user space) can report an error
message that makes sense.

It turns out that several of the other kernel APIs rpc.nfsd use can
also return ENOENT from svc_create_xprt(), by way of lockd_up().

On the client side, an NFSv2 or NFSv3 mount request can also return
the result of lockd_up().  This error may also be returned during an
NFSv4 mount request, since the NFSv4 callback service uses
svc_create_xprt() to create the callback listener.  An ENOENT error
return results in a confusing error message from the mount command.

Let's have svc_create_xprt() return EPROTONOSUPPORT instead of ENOENT.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-01-26 17:59:21 -05:00
Ricardo Labiaga 8b8aae4009 nfsd41: Create the recovery entry for the NFSv4.1 client
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-01-14 12:24:46 -05:00
Christoph Hellwig 6a68f89ee1 nfsd: use vfs_fsync for non-directories
Instead of opencoding the fsync calling sequence use vfs_fsync.  This also
gets rid of the useless i_mutex over the data writeout.

Consolidate the remaining special code for syncing directories and document
it's quirks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-01-13 09:42:26 -05:00
Ricardo Labiaga de3cab793c nfsd4: Use FIRST_NFS4_OP in nfsd4_decode_compound()
Since we're checking for LAST_NFS4_OP, use FIRST_NFS4_OP to be consistent.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-01-13 09:42:26 -05:00
Ricardo Labiaga c551866e64 nfsd41: nfsd4_decode_compound() does not recognize all ops
The server incorrectly assumes that the operations in the
array start with value 0.  The first operation (OP_ACCESS)
has a value of 3, causing the check in nfsd4_decode_compound
to be off.

Instead of comparing that the operation number is less than
the number of elements in the array, the server should verify
that it is less than the maximum valid operation number
defined by LAST_NFS4_OP.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-01-13 09:42:26 -05:00
Linus Torvalds 93939f4e5d Merge branch 'for-2.6.33' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.33' of git://linux-nfs.org/~bfields/linux:
  sunrpc: fix peername failed on closed listener
  nfsd: make sure data is on disk before calling ->fsync
  nfsd: fix "insecure" export option
2010-01-06 18:10:15 -08:00
Christoph Hellwig 7211a4e859 nfsd: make sure data is on disk before calling ->fsync
nfsd is not using vfs_fsync, so I missed it when changing the calling
convention during the 2.6.32 window.  This patch fixes it to not only
start the data writeout, but also wait for it to complete before calling
into ->fsync.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-01-06 17:37:26 -05:00
J. Bruce Fields 3d354cbc43 nfsd: fix "insecure" export option
A typo in 12045a6ee9 "nfsd: let "insecure" flag vary by
pseudoflavor" reversed the sense of the "insecure" flag.

Reported-by: Michael Guntsche <mike@it-loops.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-20 20:19:51 -08:00
J. Bruce Fields f69ac2f5a3 nfsd: fix "insecure" export option
A typo in 12045a6ee9 "nfsd: let "insecure" flag vary by
pseudoflavor" reversed the sense of the "insecure" flag.

Reported-by: Michael Guntsche <mike@it-loops.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-20 10:22:58 -05:00
Linus Torvalds bac5e54c29 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (38 commits)
  direct I/O fallback sync simplification
  ocfs: stop using do_sync_mapping_range
  cleanup blockdev_direct_IO locking
  make generic_acl slightly more generic
  sanitize xattr handler prototypes
  libfs: move EXPORT_SYMBOL for d_alloc_name
  vfs: force reval of target when following LAST_BIND symlinks (try #7)
  ima: limit imbalance msg
  Untangling ima mess, part 3: kill dead code in ima
  Untangling ima mess, part 2: deal with counters
  Untangling ima mess, part 1: alloc_file()
  O_TRUNC open shouldn't fail after file truncation
  ima: call ima_inode_free ima_inode_free
  IMA: clean up the IMA counts updating code
  ima: only insert at inode creation time
  ima: valid return code from ima_inode_alloc
  fs: move get_empty_filp() deffinition to internal.h
  Sanitize exec_permission_lite()
  Kill cached_lookup() and real_lookup()
  Kill path_lookup_open()
  ...

Trivial conflicts in fs/direct-io.c
2009-12-16 12:04:02 -08:00
Al Viro 1429b3eca2 Untangling ima mess, part 3: kill dead code in ima
Kill the 'update' argument of ima_path_check(), kill
dead code in ima.

Current rules: ima counters are bumped at the same time
when the file switches from put_filp() fodder to fput()
one.  Which happens exactly in two places - alloc_file()
and __dentry_open().  Nothing else needs to do that at
all.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:47 -05:00
Al Viro b65a9cfc2c Untangling ima mess, part 2: deal with counters
* do ima_get_count() in __dentry_open()
* stop doing that in followups
* move ima_path_check() to right after nameidata_to_filp()
* don't bump counters on it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:47 -05:00
J. Bruce Fields 7663dacd92 nfsd: remove pointless paths in file headers
The new .h files have paths at the top that are now out of date.  While
we're here, just remove all of those from fs/nfsd; they never served any
purpose.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 15:01:47 -05:00
J. Bruce Fields 1557aca790 nfsd: move most of nfsfh.h to fs/nfsd
Most of this can be trivially moved to a private header as well.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 15:01:46 -05:00
J. Bruce Fields 774b147828 nfsd: make V4ROOT exports read-only
I can't see any use for writeable V4ROOT exports.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 15:01:44 -05:00
Steve Dickson 03a816b46d nfsd: restrict filehandles accepted in V4ROOT case
On V4ROOT exports, only accept filehandles that are the *root* of some
export.  This allows mountd to allow or deny access to individual
directories and symlinks on the pseudofilesystem.

Note that the checks in readdir and lookup are not enough, since a
malicious host with access to the network could guess filehandles that
they weren't able to obtain through lookup or readdir.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 14:07:24 -05:00
J. Bruce Fields f2ca7153ca nfsd: allow exports of symlinks
We want to allow exports of symlinks, to allow mountd to communicate to
the kernel which symlinks lead to exports, and hence which symlinks need
to be visible on the pseudofilesystem.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 14:07:24 -05:00
J. Bruce Fields 3227fa41ab nfsd: filter readdir results in V4ROOT case
As with lookup, we treat every boject as a mountpoint and pretend it
doesn't exist if it isn't exported.

The preexisting code here is confusing, but I haven't yet figured out
how to make it clearer.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 14:07:24 -05:00
J. Bruce Fields 82ead7fe41 nfsd: filter lookup results in V4ROOT case
We treat every object as a mountpoint and pretend it doesn't exist if
it isn't exported.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 14:07:23 -05:00
J. Bruce Fields 3b6cee7bc4 nfsd4: don't continue "under" mounts in V4ROOT case
If /A/mount/point/ has filesystem "B" mounted on top of it, and if "A"
is exported, but not "B", then the nfs server has always returned to the
client a filehandle for the mountpoint, instead of for the root of "B",
allowing the client to see the subtree of "A" that would otherwise be
hidden by B.

Disable this behavior in the case of V4ROOT exports; we implement the
path restrictions of V4ROOT exports by treating *every* directory as if
it were a mountpoint, and allowing traversal *only* if the new directory
is exported.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 14:07:23 -05:00
Steve Dickson eb4c86c6a5 nfsd: introduce export flag for v4 pseudoroot
NFSv4 differs from v2 and v3 in that it presents a single unified
filesystem tree, whereas v2 and v3 exported multiple filesystem (whose
roots could be found using a separate mount protocol).

Our original NFSv4 server implementation asked the administrator to
designate a single filesystem as the NFSv4 root, then to mount
filesystems they wished to export underneath.  (Often using bind mounts
of already-existing filesystems.)

This was conceptually simple, and allowed easy implementation, but
created a serious obstacle to upgrading between v2/v3: since the paths
to v4 filesystems were different, administrators would have to adjust
all the paths in client-side mount commands when switching to v4.

Various workarounds are possible.  For example, the administrator could
export "/" and designate it as the v4 root.  However, the security risks
of that approach are obvious, and in any case we shouldn't be requiring
the administrator to take extra steps to fix this problem; instead, the
server should present consistent paths across different versions by
default.

These patches take a modified version of that approach: we provide a new
export option which exports only a subset of a filesystem.  With this
flag, it becomes safe for mountd to export "/" by default, with no need
for additional configuration.

We begin just by defining the new flag.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-15 14:00:40 -05:00
J. Bruce Fields 12045a6ee9 nfsd: let "insecure" flag vary by pseudoflavor
This was an oversight; it should be among the export flags that can be
allowed to vary by pseudoflavor.  This allows an administrator to (for
example) allow auth_sys mounts only from low ports, but allow auth_krb5
mounts to use any port.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-14 19:08:58 -05:00
J. Bruce Fields e8e8753f7a nfsd: new interface to advertise export features
Soon we will add the new V4ROOT flag, and allow the INSECURE flag to
vary by pseudoflavor.  It would be useful for nfs-utils (for example,
for improved exportfs error reporting) to be able to know when this
happens.  Use this new interface for that purpose.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-14 18:51:29 -05:00
Boaz Harrosh 9a74af2133 nfsd: Move private headers to source directory
Lots of include/linux/nfsd/* headers are only used by
nfsd module. Move them to the source directory

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-14 18:12:12 -05:00
Boaz Harrosh 341eb18446 nfsd: Source files #include cleanups
Now that the headers are fixed and carry their own wait, all fs/nfsd/
source files can include a minimal set of headers. and still compile just
fine.

This patch should improve the compilation speed of the nfsd module.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-14 18:12:09 -05:00
J. Bruce Fields 57ecb34feb nfsd4: fix share mode permissions
NFSv4 opens may function as locks denying other NFSv4 users the rights
to open a file.

We're requiring a user to have write permissions before they can deny
write.  We're *not* requiring a user to have write permissions to deny
read, which is if anything a more drastic denial.

What was intended was to require write permissions for DENY_READ.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-12-14 18:06:54 -05:00
J. Bruce Fields 864f0f61f8 nfsd: simplify fh_verify access checks
All nfsd security depends on the security checks in fh_verify, and
especially on nfsd_setuser().

It therefore bothers me that the nfsd_setuser call may be made from
three different places, depending on whether the filehandle has already
been mapped to a dentry, and on whether subtreechecking is in force.

Instead, make an unconditional call in fh_verify(), so it's trivial to
verify that the call always occurs.

That leaves us with a redundant nfsd_setuser() call in the subtreecheck
case--it needs the correct user set earlier in order to check execute
permissions on the path to this filehandle--but I'm willing to accept
that minor inefficiency in the subtreecheck case in return for more
straightforward permission checking.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-11-25 17:55:46 -05:00
J. Bruce Fields 9b8b317d58 Merge commit 'v2.6.32-rc8' into HEAD 2009-11-23 12:34:58 -05:00
Petr Vandrovec 479c2553af Fix memory corruption caused by nfsd readdir+
Commit 8177e6d6df ("nfsd: clean up
readdirplus encoding") introduced single character typo in nfs3 readdir+
implementation.  Unfortunately that typo has quite bad side effects:
random memory corruption, followed (on my box) with immediate
spontaneous box reboot.

Using 'p1' instead of 'p' fixes my Linux box rebooting whenever VMware
ESXi box tries to list contents of my home directory.

Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-14 12:55:55 -08:00
J. Bruce Fields 0a3adadee4 nfsd: make fs/nfsd/vfs.h for common includes
None of this stuff is used outside nfsd, so move it out of the common
linux include directory.

Actually, probably none of the stuff in include/linux/nfsd/nfsd.h really
belongs there, so later we may remove that file entirely.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-11-13 13:23:02 -05:00
Benny Halevy 8c10cbdb4a nfsd: use STATEID_FMT and STATEID_VAL for printing stateids
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-11-05 12:06:29 -05:00
Peter Staubach 1b7e0403c6 nfsd: register NFS_ACL with rpcbind
Modify the NFS server to register the NFS_ACL services with the rpcbind
daemon.  This allows the client to ping for the existence of the NFS_ACL
support via commands such as "rpcinfo -t <server> nfs_acl".

This patch also modifies the NFS_ACL support so that responses to
version 2 NULLPROC requests can be made.

The changelog for the patch which turned off this functionality
mentioned something about not registering the NFS_ACL as being part of
some tradition.  I can't find this tradition and the only other
implementation which supports NFS_ACL does register them with the
rpcbind daemon.

Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-11-04 13:46:37 -05:00
Frank Filz aba24d7158 nfsd: Fix sort_pacl in fs/nfsd/nf4acl.c to actually sort groups
We have been doing some extensive testing of Linux support for ACLs on
NFDS v4. We have noticed that the server rejects ACLs where the groups
are out of order, for example, the following ACL is rejected:

A::OWNER@:rwaxtTcCy
A::user101@domain:rwaxtcy
A::GROUP@:rwaxtcy
A:g:group102@domain:rwaxtcy
A:g:group101@domain:rwaxtcy
A::EVERYONE@:rwaxtcy

Examining the server code, I found that after converting an NFS v4 ACL
to POSIX, sort_pacl is called to sort the user ACEs and group ACEs.
Unfortunately, a minor bug causes the group sort to be skipped.

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-10-27 19:34:44 -04:00
J. Bruce Fields efe0cb6d5a nfsd4.1: common slot allocation size calculation
We do the same calculation in a couple places; use a helper function,
and add a little documentation, in the hopes of preventing bugs like
that fixed in the last patch.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-10-27 19:34:43 -04:00
J. Bruce Fields dd829c4564 nfsd4.1: fix session memory use calculation
Unbalanced calculations on creation and destruction of sessions could
cause our estimate of cache memory used to become negative, sometimes
resulting in spurious SERVERFAULT returns to client CREATE_SESSION
requests.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-10-27 19:34:43 -04:00
J. Bruce Fields e343eb0d60 Merge commit 'v2.6.32-rc5' into for-2.6.33 2009-10-27 18:45:17 -04:00
Alexey Dobriyan 828c09509b const: constify remaining file_operations
[akpm@linux-foundation.org: fix KVM]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-01 16:11:11 -07:00
Andy Adamson ddc04fd4d5 nfsd41: use sv_max_mesg for forechannel max sizes
ca_maxresponsesize and ca_maxrequest size include the RPC header.

sv_max_mesg is sv_max_payolad plus a page for overhead and is used in
svc_init_buffer to allocate server buffer space for both the request and reply.
Note that this means we can service an RPC compound that requires
ca_maxrequestsize (MAXWRITE) or ca_max_responsesize (MAXREAD) but that we do
not support an RPC compound that requires both ca_maxrequestsize and
ca_maxresponsesize.

Signed-off-by: Andy Adamson <andros@netapp.com>
[bfields@citi.umich.edu: more documentation updates]
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-28 12:40:15 -04:00
J. Bruce Fields f39bde24b2 nfsd4: fix error return when pseudoroot missing
We really shouldn't hit this case at all, and forthcoming kernel and
nfs-utils changes should eliminate this case; if it does happen,
consider it a bug rather than reporting an error that doesn't really
make sense for the operation (since there's no reason for a server to be
accepting v4 traffic yet have no root filehandle).

Also move some exp_pseudoroot code into a helper function while we're
here.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-28 12:21:26 -04:00
J. Bruce Fields 289ede453e nfsd: minor nfsd_lookup cleanup
Break out some of nfsd_lookup_dentry into helper functions.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-28 12:07:53 -04:00
J. Bruce Fields fed8381126 nfsd4: cross mountpoints when looking up parents
3c394ddaa7 "nfsd4: nfsv4 clients should
cross mountpoints" forgot to handle lookups of parents directories.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-28 12:07:52 -04:00
Alexey Dobriyan 2bcd57ab61 headers: utsname.h redux
* remove asm/atomic.h inclusion from linux/utsname.h --
   not needed after kref conversion
 * remove linux/utsname.h inclusion from files which do not need it

NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however
due to some personality stuff it _is_ needed -- cowardly leave ELF-related
headers and files alone.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 18:13:10 -07:00
James Morris 88e9d34c72 seq_file: constify seq_operations
Make all seq_operations structs const, to help mitigate against
revectoring user-triggerable function pointers.

This is derived from the grsecurity patch, although generated from scratch
because it's simpler than extracting the changes from there.

Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 07:39:29 -07:00
Linus Torvalds a87e84b5cd Merge branch 'for-2.6.32' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.32' of git://linux-nfs.org/~bfields/linux: (68 commits)
  nfsd4: nfsv4 clients should cross mountpoints
  nfsd: revise 4.1 status documentation
  sunrpc/cache: avoid variable over-loading in cache_defer_req
  sunrpc/cache: use list_del_init for the list_head entries in cache_deferred_req
  nfsd: return success for non-NFS4 nfs4_state_start
  nfsd41: Refactor create_client()
  nfsd41: modify nfsd4.1 backchannel to use new xprt class
  nfsd41: Backchannel: Implement cb_recall over NFSv4.1
  nfsd41: Backchannel: cb_sequence callback
  nfsd41: Backchannel: Setup sequence information
  nfsd41: Backchannel: Server backchannel RPC wait queue
  nfsd41: Backchannel: Add sequence arguments to callback RPC arguments
  nfsd41: Backchannel: callback infrastructure
  nfsd4: use common rpc_cred for all callbacks
  nfsd4: allow nfs4 state startup to fail
  SUNRPC: Defer the auth_gss upcall when the RPC call is asynchronous
  nfsd4: fix null dereference creating nfsv4 callback client
  nfsd4: fix whitespace in NFSPROC4_CLNT_CB_NULL definition
  nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel
  sunrpc/cache: simplify cache_fresh_locked and cache_fresh_unlocked.
  ...
2009-09-22 07:54:33 -07:00
Alexey Dobriyan 7b021967c5 const: make lock_manager_operations const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:25 -07:00
Steve Dickson 3c394ddaa7 nfsd4: nfsv4 clients should cross mountpoints
Allow NFS v4 clients to seamlessly cross mount point without
have to set either the 'crossmnt' or the 'nohide' export
options.

Signed-Off-By: Steve Dickson <steved@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-21 16:02:25 -04:00
Ricardo Labiaga b09333c464 nfsd41: Refactor create_client()
Move common initialization of 'struct nfs4_client' inside create_client().

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>

[nfsd41: Remember the auth flavor to use for callbacks]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-15 20:52:13 -04:00
Alexandros Batsakis 3ddc8bf5f3 nfsd41: modify nfsd4.1 backchannel to use new xprt class
This patch enables the use of the nfsv4.1 backchannel.

Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
[initialize rpc_create_args.bc_xprt too]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-15 20:52:13 -04:00
Ricardo Labiaga 0421b5c55a nfsd41: Backchannel: Implement cb_recall over NFSv4.1
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
[nfsd41: cb_recall callback]
[Share v4.0 and v4.1 back channel xdr]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Share v4.0 and v4.1 back channel xdr]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use nfsd4_cb_sequence for callback minorversion]
[nfsd41: conditionally decode_sequence in nfs4_xdr_dec_cb_recall]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: Backchannel: Add sequence arguments to callback RPC arguments]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
[pulled-in definition of nfsd4_cb_done]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-15 20:52:12 -04:00
Benny Halevy 2af73580b7 nfsd41: Backchannel: cb_sequence callback
Implement the cb_sequence callback conforming to draft-ietf-nfsv4-minorversion1

Note: highest slot id and target highest slot id do not have to be 0
as was previously implemented.  They can be greater than what the
nfs server sent if the client supports a larger slot table on the
backchannel.  At this point we just ignore that.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
[Rework the back channel xdr using the shared v4.0 and v4.1 framework.]
Signed-off-by: Andy Adamson <andros@netapp.com>
[fixed indentation]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use nfsd4_cb_sequence for callback minorversion]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: fix verification of CB_SEQUENCE highest slot id[
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: Backchannel: Remove old backchannel serialization]
[nfsd41: Backchannel: First callback sequence ID should be 1]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: decode_cb_sequence does not need to actually decode ignored fields]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-15 20:49:56 -04:00
Ricardo Labiaga 2a1d1b5938 nfsd41: Backchannel: Setup sequence information
Follows the model used by the NFS client.  Setup the RPC prepare and done
function pointers so that we can populate the sequence information if
minorversion == 1.  rpc_run_task() is then invoked directly just like
existing NFS client operations do.

nfsd4_cb_prepare() determines if the sequence information needs to be setup.
If the slot is in use, it adds itself to the wait queue.

nfsd4_cb_done() wakes anyone sleeping on the callback channel wait queue
after our RPC reply has been received.  It also sets the task message
result pointer to NULL to clearly indicate we're done using it.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
[define and initialize cl_cb_seq_nr here]
[pulled out unused defintion of nfsd4_cb_done]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-15 20:49:56 -04:00
Ricardo Labiaga 199ff35e1c nfsd41: Backchannel: Server backchannel RPC wait queue
RPC callback requests will wait on this wait queue if the backchannel
is out of slots.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-15 20:49:55 -04:00
Ricardo Labiaga 132f97715c nfsd41: Backchannel: Add sequence arguments to callback RPC arguments
Follow the model we use in the client. Make the sequence arguments
part of the regular RPC arguments.  None of the callbacks that are
soon to be implemented expect results that need to be passed back
to the caller, so we don't define a separate RPC results structure.
For session validation, the cb_sequence decoding will use a pointer
to the sequence arguments that are part of the RPC argument.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
[define struct nfsd4_cb_sequence here]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-15 20:49:55 -04:00
Andy Adamson 38524ab38f nfsd41: Backchannel: callback infrastructure
Keep the xprt used for create_session in cl_cb_xprt.
Mark cl_callback.cb_minorversion = 1 and remember
the client provided cl_callback.cb_prog rpc program number.
Use it to probe the callback path.

Use the client's network address to initialize as the
callback's address as expected by the xprt creation
routines.

Define xdr sizes and code nfs4_cb_compound header to be able
to send a null callback rpc.

Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
[get callback minorversion from fore channel's]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: change bc_sock to bc_xprt]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pulled definition for cl_cb_xprt]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: set up backchannel's cb_addr]
[moved rpc_create_args init to "nfsd: modify nfsd4.1 backchannel to use new xprt class"]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-15 20:49:55 -04:00
J. Bruce Fields 80fc015bdf nfsd4: use common rpc_cred for all callbacks
Callbacks are always made using the machine's identity, so we can use a
single auth_generic credential shared among callbacks to all clients and
let the rpc code take care of the rest.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-15 20:49:34 -04:00
J. Bruce Fields 29ab23cc5d nfsd4: allow nfs4 state startup to fail
The failure here is pretty unlikely, but we should handle it anyway.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-15 20:49:33 -04:00
J. Bruce Fields 886e3b7fe6 nfsd4: fix null dereference creating nfsv4 callback client
On setting up the callback to the client, we attempt to use the same
authentication flavor the client did.  We find an rpc cred to use by
calling rpcauth_lookup_credcache(), which assumes that the given
authentication flavor has a credentials cache.  However, this is not
required to be true--in particular, auth_null does not use one.
Instead, we should call the auth's lookup_cred() method.

Without this, a client attempting to mount using nfsv4 and auth_null
triggers a null dereference.

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-15 20:49:33 -04:00
Benny Halevy 4be36ca0ce nfsd4: fix whitespace in NFSPROC4_CLNT_CB_NULL definition
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-13 15:57:39 -04:00
Trond Myklebust ab3bbaa8b2 Merge branch 'nfs-for-2.6.32' 2009-09-11 14:59:37 -04:00
J. Bruce Fields aed100fafb nfsd: fix leak on error in nfsv3 readdir
Note the !dchild->d_inode case can leak the filehandle.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-04 15:48:00 -04:00
J. Bruce Fields 8177e6d6df nfsd: clean up readdirplus encoding
Make the return from compose_entry_fh() zero or an error, even though
the returned error isn't used, just to make the meaning of the return
immediately obvious.

Move some repeated code out of main function into helper.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-04 15:47:40 -04:00
J. Bruce Fields 1be10a88ca nfsd4: filehandle leak or error exit from fh_compose()
A number of callers (nfsd4_encode_fattr(), at least) don't bother to
release the filehandle returned to fh_compose() if fh_compose() returns
an error.  So, modify fh_compose() to release the filehandle before
returning an error.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-04 11:59:32 -04:00
Trond Myklebust 2671a4bf35 NFSd: Fix filehandle leak in exp_pseudoroot() and nfsd4_path()
nfsd4_path() allocates a temporary filehandle and then fails to free it
before the function exits, leaking reference counts to the dentry and
export that it refers to.

Also, nfsd4_lookupp() puts the result of exp_pseudoroot() in a temporary
filehandle which it releases on success of exp_pseudoroot() but not on
failure; fix exp_pseudoroot to ensure that on failure it releases the
filehandle before returning.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-03 16:57:57 -04:00
J. Bruce Fields bc6c53d5a1 nfsd: move fsid_type choice out of fh_compose
More trivial cleanup.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-02 23:54:48 -04:00
J. Bruce Fields 8e498751f2 nfsd: move some of fh_compose into helper functions
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-02 23:53:51 -04:00
David Howells e0e817392b CRED: Add some configurable debugging [try #6]
Add a config option (CONFIG_DEBUG_CREDENTIALS) to turn on some debug checking
for credential management.  The additional code keeps track of the number of
pointers from task_structs to any given cred struct, and checks to see that
this number never exceeds the usage count of the cred struct (which includes
all references, not just those from task_structs).

Furthermore, if SELinux is enabled, the code also checks that the security
pointer in the cred struct is never seen to be invalid.

This attempts to catch the bug whereby inode_has_perm() faults in an nfsd
kernel thread on seeing cred->security be a NULL pointer (it appears that the
credential struct has been previously released):

	http://www.kerneloops.org/oops.php?number=252883

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-09-02 21:29:01 +10:00
Andy Adamson 557ce2646e nfsd41: replace page based DRC with buffer based DRC
Use NFSD_SLOT_CACHE_SIZE size buffers for sessions DRC instead of holding nfsd
pages in cache.

Connectathon testing has shown that 1024 bytes for encoded compound operation
responses past the sequence operation is sufficient, 512 bytes is a little too
small. Set NFSD_SLOT_CACHE_SIZE to 1024.

Allocate memory for the session DRC in the CREATE_SESSION operation
to guarantee that the memory resource is available for caching responses.
Allocate each slot individually in preparation for slot table size negotiation.

Remove struct nfsd4_cache_entry and helper functions for the old page-based
DRC.

The iov_len calculation in nfs4svc_encode_compoundres is now always
correct.  Replay is now done in nfsd4_sequence under the state lock, so
the session ref count is only bumped on non-replay. Clean up the
nfs4svc_encode_compoundres session logic.

The nfsd4_compound_state statp pointer is also not used.
Remove nfsd4_set_statp().

Move useful nfsd4_cache_entry fields into nfsd4_slot.

Signed-off-by: Andy Adamson <andros@netapp.com
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-01 22:24:06 -04:00
Andy Adamson bdac86e215 nfsd41: replace nfserr_resource in pure nfs41 responses
nfserr_resource is not a legal error for NFSv4.1. Replace it with
nfserr_serverfault for EXCHANGE_ID and CREATE_SESSION processing.

We will also need to map nfserr_resource to other errors in routines shared
by NFSv4.0 and NFSv4.1

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-01 22:24:05 -04:00
Andy Adamson a8dfdaeb7a nfsd41: use session maxreqs for sequence target and highest slotid
This fixes a bug in the sequence operation reply.

The sequence operation returns the highest slotid it will accept in the future
in sr_highest_slotid, and the highest slotid it prefers the client to use.
Since we do not re-negotiate the session slot table yet, these should both
always be set to the session ca_maxrequests.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-01 22:24:05 -04:00
Andy Adamson a649637c73 nfsd41: bound forechannel drc size by memory usage
By using the requested ca_maxresponsesize_cached * ca_maxresponses to bound
a forechannel drc request size, clients can tailor a session to usage.

For example, an I/O session (READ/WRITE only) can have a much smaller
ca_maxresponsesize_cached (for only WRITE compound responses) and a lot larger
ca_maxresponses to service a large in-flight data window.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-01 22:24:05 -04:00
Trond Myklebust a06b1261bd NFSD: Fix a bug in the NFSv4 'supported attrs' mandatory attribute
The fact that the filesystem doesn't currently list any alternate
locations does _not_ imply that the fs_locations attribute should be
marked as "unsupported".

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-01 20:00:17 -04:00
Andy Adamson 468de9e54a nfsd41: expand solo sequence check
Compounds consisting of only a sequence operation don't need any
additional caching beyond the sequence information we store in the slot
entry.  Fix nfsd4_is_solo_sequence to identify this case correctly.

The additional check for a failed sequence in nfsd4_store_cache_entry()
is redundant, since the nfsd4_is_solo_sequence call lower down catches
this case.

The final ce_cachethis set in nfsd4_sequence is also redundant.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-08-28 12:20:15 -04:00
Frank Filz d8d0b85b11 nfsd4: remove ACE4_IDENTIFIER_GROUP flag from GROUP@ entry
RFC 3530 says "ACE4_IDENTIFIER_GROUP flag MUST be ignored on entries
with these special identifiers.  When encoding entries with these
special identifiers, the ACE4_IDENTIFIER_GROUP flag SHOULD be set to
zero."  It really shouldn't matter either way, but the point is that
this flag is used to distinguish named users from named groups (since
unix allows a group to have the same name as a user), so it doesn't
really make sense to use it on a special identifier such as this.)

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-08-27 17:35:41 -04:00
Benny Halevy aaf84eb95a nfsd41: renew_client must be called under the state lock
Until we work out the state locking so we can use a spin lock to protect
the cl_lru, we need to take the state_lock to renew the client.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: Do not renew state on error]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: Simplify exit code]
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-08-27 17:17:40 -04:00
Ryusei Yamaguchi ed2d8aed52 knfsd: Replace lock_kernel with a mutex in nfsd pool stats.
lock_kernel() in knfsd was replaced with a mutex. The later
commit 03cf6c9f49 ("knfsd:
add file to export stats about nfsd pools") did not follow
that change. This patch fixes the issue.

Also move the get and put of nfsd_serv to the open and close methods
(instead of start and stop methods) to allow atomic check and increment
of reference count in the open method (where we can still return an
error).

Signed-off-by: Ryusei Yamaguchi <mandel59@gmail.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: Greg Banks <gnb@fmeh.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-08-25 12:39:37 -04:00
Frank Filz 55bb55dca0 nfsd: Fix unnecessary deny bits in NFSv4 ACL
The group deny entries end up denying tcy even though tcy was just
allowed by the allow entry. This appears to be due to:
	ace->access_mask = mask_from_posix(deny, flags);
instead of:
	ace->access_mask = deny_mask_from_posix(deny, flags);

Denying a previously allowed bit has no effect, so this shouldn't affect
behavior, but it's ugly.

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-08-24 20:01:22 -04:00
Jeff Layton fbf4665f41 nfsd: populate sin6_scope_id on callback address with scopeid from rq_addr on SETCLIENTID call
When a SETCLIENTID call comes in, one of the args given is the svc_rqst.
This struct contains an rq_addr field which holds the address that sent
the call. If this is an IPv6 address, then we can use the sin6_scope_id
field in this address to populate the sin6_scope_id field in the
callback address.

AFAICT, the rq_addr.sin6_scope_id is non-zero if and only if the client
mounted the server's link-local address.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-08-21 11:27:44 -04:00
Jeff Layton 7077ecbabd nfsd: add support for NFSv4 callbacks over IPv6
The framework to add this is all in place. Now, add the code to allow
support for establishing a callback channel on an IPv6 socket.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-08-21 11:27:44 -04:00
Jeff Layton aa9a4ec770 nfsd: convert nfs4_cb_conn struct to hold address in sockaddr_storage
...rather than as a separate address and port fields. This will be
necessary for implementing callbacks over IPv6. Also, convert
gen_callback to use the standard rpcuaddr2sockaddr routine rather than
its own private one.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-08-21 11:27:43 -04:00
Jeff Layton 363168b4ea nfsd: make nfs4_client->cl_addr a struct sockaddr_storage
It's currently a __be32, which isn't big enough to hold an IPv6 address.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-08-21 11:27:43 -04:00
J. Bruce Fields e9dc122166 Merge branch 'nfs-for-2.6.32' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 into for-2.6.32-incoming
Conflicts:
	net/sunrpc/cache.c
2009-08-21 11:27:29 -04:00
Trond Myklebust f884dcaead Merge branch 'sunrpc_cache-for-2.6.32' into nfs-for-2.6.32 2009-08-10 17:45:58 -04:00
Trond Myklebust bc74b4f5e6 SUNRPC: Allow the cache_detail to specify alternative upcall mechanisms
For events that are rare, such as referral DNS lookups, it makes limited
sense to have a daemon constantly listening for upcalls on a channel. An
alternative in those cases might simply be to run the app that fills the
cache using call_usermodehelper_exec() and friends.

The following patch allows the cache_detail to specify alternative upcall
mechanisms for these particular cases.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:14:29 -04:00
Trond Myklebust 2da8ca26c6 NFSD: Clean up the idmapper warning...
What part of 'internal use' is so hard to understand?

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:14:26 -04:00
Chuck Lever 4116092b92 NFSD: Support IPv6 addresses in write_failover_ip()
In write_failover_ip(), replace the sscanf() with a call to the common
sunrpc.ko presentation address parser.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:09:40 -04:00
Andy Adamson abfabf8caf nfsd41: encode replay sequence from the slot values
The sequence operation is not cached; always encode the sequence operation on
a replay from the slot table and session values. This simplifies the sessions
replay logic in nfsd4_proc_compound.

If this is a replay of a compound that was specified not to be cached, return
NFS4ERR_RETRY_UNCACHED_REP.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-07-28 16:12:34 -04:00
Andy Adamson c8647947f8 nfsd41: rename nfsd4_enc_uncached_replay
This function is only used for SEQUENCE replay.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-07-28 14:30:36 -04:00
Andy Adamson 49557cc74c nfsd41: Use separate DRC for setclientid
Instead of trying to share the generic 4.1 reply cache code for the
CREATE_SESSION reply cache, it's simpler to handle CREATE_SESSION
separately.

The nfs41 single slot clientid DRC holds the results of create session
processing.  CREATE_SESSION can be preceeded by a SEQUENCE operation
(an embedded CREATE_SESSION) and the create session single slot cache must be
maintained.  nfsd4_replay_cache_entry() and nfsd4_store_cache_entry() do not
implement the replay of an embedded CREATE_SESSION.

The clientid DRC slot does not need the inuse, cachethis or other fields that
the multiple slot session cache uses.  Replace the clientid DRC cache struct
nfs4_slot cache with a new nfsd4_clid_slot cache.  Save the xdr struct
nfsd4_create_session into the cache at the end of processing, and on a replay,
replace the struct for the replay request with the cached version all while
under the state lock.

nfsd4_proc_compound will handle both the solo and embedded CREATE_SESSION case
via the normal use of encode_operation.

Errors that do not change the create session cache:
A create session NFS4ERR_STALE_CLIENTID error means that a client record
(and associated create session slot) could not be found and therefore can't
be changed.  NFSERR_SEQ_MISORDERED errors do not change the slot cache.

All other errors get cached.

Remove the clientid DRC specific check in nfs4svc_encode_compoundres to
put the session only if cstate.session is set which will now always be true.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-07-28 14:30:29 -04:00
Andy Adamson 88e588d56a nfsd41: change check_slot_seqid parameters
For separation of session slot and clientid slot processing.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-07-28 14:30:23 -04:00
Andy Adamson 5261dcf8eb nfsd41: remove redundant forechannel max requests check
This check is done in set_forechannel_maxreqs.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-07-28 14:30:15 -04:00
Andy Adamson 0c193054a4 nfsd41: hange from page to memory based drc limits
NFSD_SLOT_CACHE_SIZE is the size of all encoded operation responses
(excluding the sequence operation) that we want to cache.

For now, keep NFSD_SLOT_CACHE_SIZE at PAGE_SIZE. It will be reduced
when the DRC is changed from page based to memory based.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-07-28 14:30:05 -04:00
Andy Adamson 6a14dd1a4f nfsd41: reserve less memory for DRC
Also remove a slightly misleading comment.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-07-28 14:29:59 -04:00
Andy Adamson b101ebbc39 nfsd41: minor set_forechannel_maxreqs cleanup
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-07-28 14:29:54 -04:00
Andy Adamson be98d1bbd1 nfsd41: reclaim DRC memory on session free
This fixes a leak which would eventually lock out new clients.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-07-28 14:29:48 -04:00
J. Bruce Fields 413d63d710 nfsd: minor write_pool_threads exit cleanup
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-07-28 14:29:41 -04:00
Eric Sesterhenn 2522a776c1 Fix memory leak in write_pool_threads
kmemleak produces the following warning

unreferenced object 0xc9ec02a0 (size 8):
  comm "cat", pid 19048, jiffies 730243
  backtrace:
    [<c01bf970>] create_object+0x100/0x240
    [<c01bfadb>] kmemleak_alloc+0x2b/0x60
    [<c01bcd4b>] __kmalloc+0x14b/0x270
    [<c02fd027>] write_pool_threads+0x87/0x1d0
    [<c02fcc08>] nfsctl_transaction_write+0x58/0x70
    [<c02fcc6f>] nfsctl_transaction_read+0x4f/0x60
    [<c01c2574>] vfs_read+0x94/0x150
    [<c01c297d>] sys_read+0x3d/0x70
    [<c0102d6b>] sysenter_do_call+0x12/0x32
    [<ffffffff>] 0xffffffff

write_pool_threads() only frees nthreads on error paths, in the success case
we leak it.

Signed-off-by: Eric Sesterhenn <eric.sesterhenn@lsexperts.de>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-07-28 14:29:34 -04:00
Andy Adamson 4bd9b0f4af nfsd41: use globals for DRC limits
The version 4.1 DRC memory limit and tracking variables are server wide and
session specific. Replace struct svc_serv fields with globals.
Stop using the svc_serv sv_lock.

Add a spinlock to serialize access to the DRC limit management variables which
change on session creation and deletion (usage counter) or (future)
administrative action to adjust the total DRC memory limit.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-07-14 17:52:40 -04:00
Yu Zhiguo 9208faf297 NFSv4: ACL in operations 'open' and 'create' should be used
ACL in operations 'open' and 'create' is decoded but never be used.
It should be set as the initial ACL for the object according to RFC3530.
If error occurs when setting the ACL, just clear the ACL bit in the
returned attr bitmap.

Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-07-14 12:16:47 -04:00
Alexey Dobriyan 405f55712d headers: smp_lock.h redux
* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
  It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

  This will make hardirq.h inclusion cheaper for every PREEMPT=n config
  (which includes allmodconfig/allyesconfig, BTW)

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-12 12:22:34 -07:00
David Howells 033a666ccb NFSD: Don't hold unrefcounted creds over call to nfsd_setuser()
nfsd_open() gets an unrefcounted pointer to the current process's effective
credentials at the top of the function, then calls nfsd_setuser() via
fh_verify() - which may replace and destroy the current process's effective
credentials - and then passes the unrefcounted pointer to dentry_open() - but
the credentials may have been destroyed by this point.

Instead, the value from current_cred() should be passed directly to
dentry_open() as one of its arguments, rather than being cached in a variable.

Possibly fh_verify() should return the creds to use.

This is a regression introduced by
745ca2475a "CRED: Pass credentials through
dentry_open()".

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-and-Verified-By: Steve Dickson <steved@redhat.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-07-03 10:21:10 -04:00
Linus Torvalds 7e0338c0de Merge branch 'for-2.6.31' of git://fieldses.org/git/linux-nfsd
* 'for-2.6.31' of git://fieldses.org/git/linux-nfsd: (60 commits)
  SUNRPC: Fix the TCP server's send buffer accounting
  nfsd41: Backchannel: minorversion support for the back channel
  nfsd41: Backchannel: cleanup nfs4.0 callback encode routines
  nfsd41: Remove ip address collision detection case
  nfsd: optimise the starting of zero threads when none are running.
  nfsd: don't take nfsd_mutex twice when setting number of threads.
  nfsd41: sanity check client drc maxreqs
  nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct
  NFS: kill off complicated macro 'PROC'
  sunrpc: potential memory leak in function rdma_read_xdr
  nfsd: minor nfsd_vfs_write cleanup
  nfsd: Pull write-gathering code out of nfsd_vfs_write
  nfsd: track last inode only in use_wgather case
  sunrpc: align cache_clean work's timer
  nfsd: Use write gathering only with NFSv2
  NFSv4: kill off complicated macro 'PROC'
  NFSv4: do exact check about attribute specified
  knfsd: remove unreported filehandle stats counters
  knfsd: fix reply cache memory corruption
  knfsd: reply cache cleanups
  ...
2009-06-22 12:55:50 -07:00
Andy Adamson ab52ae6db0 nfsd41: Backchannel: minorversion support for the back channel
Prepare to share backchannel code with NFSv4.1.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
[nfsd41: use nfsd4_cb_sequence for callback minorversion]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-18 18:33:57 -07:00
Andy Adamson ef52bff840 nfsd41: Backchannel: cleanup nfs4.0 callback encode routines
Mimic the client and prepare to share the back channel xdr with NFSv4.1.
Bump the number of operations in each encode routine, then backfill the
number of operations.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-18 18:33:57 -07:00
Mike Sager 6ddbbbfe52 nfsd41: Remove ip address collision detection case
Verified that cthon and pynfs exchange id tests pass (except for the
two expected fails: EID8 and EID50)

Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-18 17:43:53 -07:00
NeilBrown 671e1fcf63 nfsd: optimise the starting of zero threads when none are running.
Currently, if we ask to set then number of nfsd threads to zero when
there are none running, we set up all the sockets and register the
service, and then tear it all down again.
This is pointless.

So detect that case and exit promptly.
(also remove an assignment to 'error' which was never used.

Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Jeff Layton <jlayton@redhat.com>
2009-06-18 09:42:41 -07:00
NeilBrown 82e12fe924 nfsd: don't take nfsd_mutex twice when setting number of threads.
Currently when we write a number to 'threads' in nfsdfs,
we take the nfsd_mutex, update the number of threads, then take the
mutex again to read the number of threads.

Mostly this isn't a big deal.  However if we are write '0', and
portmap happens to be dead, then we can get unpredictable behaviour.
If the nfsd threads all got killed quickly and the last thread is
waiting for portmap to respond, then the second time we take the mutex
we will block waiting for the last thread.
However if the nfsd threads didn't die quite that fast, then there
will be no contention when we try to take the mutex again.

Unpredictability isn't fun, and waiting for the last thread to exit is
pointless, so avoid taking the lock twice.
To achieve this, get nfsd_svc return a non-negative number of active
threads when not returning a negative error.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-18 09:40:31 -07:00
Andy Adamson 5d77ddfbcb nfsd41: sanity check client drc maxreqs
Ensure the client requested maximum requests are between 1 and
NFSD_MAX_SLOTS_PER_SESSION

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-16 17:13:16 -07:00
Alexandros Batsakis 6c18ba9f5e nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct
the change is valid for both the forechannel and the backchannel (currently dummy)

Signed-off-by: Alexandros Batsakis <Alexandros.Batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-16 10:13:45 -07:00
Yu Zhiguo b9081d90f5 NFS: kill off complicated macro 'PROC'
kill off obscure macro 'PROC' of NFSv2&3 in order to make the code more clear.

Among other things, this makes it simpler to grep for callers of these
functions--something which has frequently caused confusion among nfs
developers.

Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-15 19:34:32 -07:00
J. Bruce Fields e4636d535e nfsd: minor nfsd_vfs_write cleanup
There's no need to check host_err >= 0 every time here when we could
check host_err < 0 once, following the usual kernel style.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-15 19:18:34 -07:00
J. Bruce Fields d911df7b8d nfsd: Pull write-gathering code out of nfsd_vfs_write
This is a relatively self-contained piece of code that handles a special
case--move it to its own function.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-15 18:54:05 -07:00
J. Bruce Fields 9d2a3f31d6 nfsd: track last inode only in use_wgather case
Updating last_ino and last_dev probably isn't useful in the !use_wgather
case.

Also remove some pointless ifdef'd-out code.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-15 18:52:47 -07:00
Trond Myklebust 48e03bc515 nfsd: Use write gathering only with NFSv2
NFSv3 and above can use unstable writes whenever they are sending more
than one write, rather than relying on the flaky write gathering
heuristics. More often than not, write gathering is currently getting it
wrong when the NFSv3 clients are sending a single write with FILE_SYNC
for efficiency reasons.

This patch turns off write gathering for NFSv3/v4, and ensures that
it only applies to the one case that can actually benefit: namely NFSv2.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-15 18:14:57 -07:00
J. Bruce Fields 7eef4091a6 Merge commit 'v2.6.30' into for-2.6.31 2009-06-15 18:08:07 -07:00
Al Viro 9393bd07cf switch follow_down()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:01 -04:00
Al Viro bab77ebf51 switch follow_up() to struct path
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:00 -04:00
Al Viro e64c390ca0 switch rqst_exp_parent()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:00 -04:00
Al Viro 91c9fa8f75 switch rqst_exp_get_by_name()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:00 -04:00
Al Viro 5bf3bd2b5c switch exp_parent() to struct path
... and lose the always-NULL last argument (non-NULL case had been
split off a while ago).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:00 -04:00
Al Viro 55430e2ece nfsd struct path use: exp_get_by_name()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:35:59 -04:00
James Morris 0b4ec6e4e0 Merge branch 'master' into next 2009-06-09 09:27:53 +10:00
Yu Zhiguo 0a93a47f04 NFSv4: kill off complicated macro 'PROC'
J. Bruce Fields wrote:
...
> (This is extremely confusing code to track down: note that
> proc->pc_decode is set to nfs4svc_decode_compoundargs() by the PROC()
> macro at the end of fs/nfsd/nfs4proc.c.  Which means, for example, that
> grepping for nfs4svc_decode_compoundargs() gets you nowhere.  Patches to
> kill off that macro would be welcomed....)

the macro 'PROC' is complicated and obscure, it had better
be killed off in order to make the code more clear.

Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-01 18:09:20 -04:00
Yu Zhiguo 3c8e03166a NFSv4: do exact check about attribute specified
Server should return NFS4ERR_ATTRNOTSUPP if an attribute specified is
not supported in current environment.
Operations CREATE, NVERIFY, OPEN, SETATTR and VERIFY should do this check.

This bug is found when do newpynfs tests. The names of the tests that failed
are following:
  CR12 NVF7a NVF7b NVF7c NVF7d NVF7f NVF7r NVF7s
  OPEN15 VF7a VF7b VF7c VF7d VF7f VF7r VF7s

Add function do_check_fattr() to do exact check:
1, Check attribute specified is supported by the NFSv4 server or not.
2, Check FATTR4_WORD0_ACL & FATTR4_WORD0_FS_LOCATIONS are supported
   in current environment or not.
3, Check attribute specified is writable or not.

step 1 and 3 are done in function nfsd4_decode_fattr() but removed
to this function now.

Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-01 18:01:54 -04:00
Mimi Zohar 14dba5331b integrity: nfsd imbalance bug fix
An nfsd exported file is opened/closed by the kernel causing the
integrity imbalance message.

Before a file is opened, there normally is permission checking, which
is done in inode_permission().  However, as integrity checking requires
a dentry and mount point, which is not available in inode_permission(),
the integrity (permission) checking must be called separately.

In order to detect any missing integrity checking calls, we keep track
of file open/closes.  ima_path_check() increments these counts and
does the integrity (permission) checking. As a result, the number of
calls to ima_path_check()/ima_file_free() should be balanced.  An extra
call to fput(), indicates the file could have been accessed without first
calling ima_path_check().

In nfsv3 permission checking is done once, followed by multiple reads,
which do an open/close for each read.  The integrity (permission) checking
call should be in nfsd_permission() after the inode_permission() call, but
as there is no correlation between the number of permission checking and
open calls, the integrity checking call should not increment the counters,
but defer it to when the file is actually opened.

This patch adds:
- integrity (permission) checking for nfsd exported files in nfsd_permission().
- a call to increment counts for files opened by nfsd.

This patch has been updated to return the nfs error types.

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-05-28 09:32:43 +10:00
Wei Yongjun a0d24b295a nfsd: fix hung up of nfs client while sync write data to nfs server
Commit 'Short write in nfsd becomes a full write to the client'
(31dec2538e) broken the sync write.
With the following commands to reproduce:

  $ mount -t nfs -o sync 192.168.0.21:/nfsroot /mnt
  $ cd /mnt
  $ echo aaaa > temp.txt

Then nfs client is hung up.

In SYNC mode the server alaways return the write count 0 to the
client. This is because the value of host_err in nfsd_vfs_write()
will be overwrite in SYNC mode by 'host_err=nfsd_sync(file);',
and then we return host_err(which is now 0) as write count.

This patch fixed the problem.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-27 17:40:06 -04:00
Greg Banks 1dbd0d53f3 knfsd: remove unreported filehandle stats counters
The file nfsfh.c contains two static variables nfsd_nr_verified and
nfsd_nr_put.  These are counters which are incremented as a side
effect of the fh_verify() fh_compose() and fh_put() operations,
i.e. at least twice per NFS call for any non-trivial workload.
Needless to say this makes the cacheline that contains them (and any
other innocent victims) a very hot contention point indeed under high
call-rate workloads on multiprocessor NFS server.  It also turns out
that these counters are not used anywhere.  They're not reported to
userspace, they're not used in logic, they're not even exported from
the object file (let alone the module).  All they do is waste CPU time.

So this patch removes them.

Tests on a 16 CPU Altix A4700 with 2 10gige Myricom cards, configured
separately (no bonding).  Workload is 640 client threads doing directory
traverals with random small reads, from server RAM.

Before
======

Kernel profile:

  %   cumulative   self              self     total
 time   samples   samples    calls   1/call   1/call  name
  6.05   2716.00  2716.00    30406     0.09     1.02  svc_process
  4.44   4706.00  1990.00     1975     1.01     1.01  spin_unlock_irqrestore
  3.72   6376.00  1670.00     1666     1.00     1.00  svc_export_put
  3.41   7907.00  1531.00     1786     0.86     1.02  nfsd_ofcache_lookup
  3.25   9363.00  1456.00    10965     0.13     1.01  nfsd_dispatch
  3.10  10752.00  1389.00     1376     1.01     1.01  nfsd_cache_lookup
  2.57  11907.00  1155.00     4517     0.26     1.03  svc_tcp_recvfrom
  ...
  2.21  15352.00  1003.00     1081     0.93     1.00  nfsd_choose_ofc  <----
  ^^^^

Here the function nfsd_choose_ofc() reads a global variable
which by accident happened to be located in the same cacheline as
nfsd_nr_verified.

Call rate:

nullarbor:~ # pmdumptext nfs3.server.calls
...
Thu Dec 13 00:15:27     184780.663
Thu Dec 13 00:15:28     184885.881
Thu Dec 13 00:15:29     184449.215
Thu Dec 13 00:15:30     184971.058
Thu Dec 13 00:15:31     185036.052
Thu Dec 13 00:15:32     185250.475
Thu Dec 13 00:15:33     184481.319
Thu Dec 13 00:15:34     185225.737
Thu Dec 13 00:15:35     185408.018
Thu Dec 13 00:15:36     185335.764

After
=====

kernel profile:

  %   cumulative   self              self     total
 time   samples   samples    calls   1/call   1/call  name
  6.33   2813.00  2813.00    29979     0.09     1.01  svc_process
  4.66   4883.00  2070.00     2065     1.00     1.00  spin_unlock_irqrestore
  4.06   6687.00  1804.00     2182     0.83     1.00  nfsd_ofcache_lookup
  3.20   8110.00  1423.00    10932     0.13     1.00  nfsd_dispatch
  3.03   9456.00  1346.00     1343     1.00     1.00  nfsd_cache_lookup
  2.62  10622.00  1166.00     4645     0.25     1.01  svc_tcp_recvfrom
[...]
  0.10  42586.00    44.00       74     0.59     1.00  nfsd_choose_ofc  <--- HA!!
  ^^^^

Call rate:

nullarbor:~ # pmdumptext nfs3.server.calls
...
Thu Dec 13 01:45:28     194677.118
Thu Dec 13 01:45:29     193932.692
Thu Dec 13 01:45:30     194294.364
Thu Dec 13 01:45:31     194971.276
Thu Dec 13 01:45:32     194111.207
Thu Dec 13 01:45:33     194999.635
Thu Dec 13 01:45:34     195312.594
Thu Dec 13 01:45:35     195707.293
Thu Dec 13 01:45:36     194610.353
Thu Dec 13 01:45:37     195913.662
Thu Dec 13 01:45:38     194808.675

i.e. about a 5.3% improvement in call rate.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Reviewed-by: David Chinner <dgc@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-27 14:14:03 -04:00
Greg Banks cf0a586cf4 knfsd: fix reply cache memory corruption
Fix a regression in the reply cache introduced when the code was
converted to use proper Linux lists.  When a new entry needs to be
inserted, the case where all the entries are currently being used
by threads is not correctly detected.  This can result in memory
corruption and a crash.  In the current code this is an extremely
unlikely corner case; it would require the machine to have 1024
nfsd threads and all of them to be busy at the same time.  However,
upcoming reply cache changes make this more likely; a crash due to
this problem was actually observed in field.

Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-27 14:14:02 -04:00
Greg Banks fca4217c5b knfsd: reply cache cleanups
Make REQHASH() an inline function.  Rename hash_list to cache_hash.
Fix an obsolete comment.

Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-27 14:14:02 -04:00
J. Bruce Fields 8daed1e549 nfsd: silence lockdep warning
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-11 17:23:14 -04:00
Wang Chen 02cb2858db nfsd: nfs4_stat_init cleanup
Save some loop time.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-06 16:22:41 -04:00
J. Bruce Fields b2c0cea6b1 nfsd4: check for negative dentry before use in nfsv4 readdir
After 2f9092e102 "Fix i_mutex vs.  readdir
handling in nfsd" (and 14f7dd63 "Copy XFS readdir hack into nfsd code"),
an entry may be removed between the first mutex_unlock and the second
mutex_lock. In this case, lookup_one_len() will return a negative
dentry.  Check for this case to avoid a NULL dereference.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reviewed-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Cc: stable@kernel.org
2009-05-06 16:16:36 -04:00
Randy Dunlap 9064caae8f nfsd: use C99 struct initializers
Eliminate 56 sparse warnings like this one:

fs/nfsd/nfs4xdr.c:1331:15: warning: obsolete array initializer, use C99 syntax

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-03 15:09:12 -04:00
J. Bruce Fields 63e4863fab nfsd4: make recall callback an asynchronous rpc
As with the probe, this removes the need for another kthread.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-03 15:08:56 -04:00
Andy Adamson ccecee1e5e nfsd41: slots are freed with session
The session and slots are allocated all in one piece.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-03 14:45:02 -04:00
J. Bruce Fields 3aea09dc91 nfsd4: track recall retries in nfs4_delegation
Move this out of a local variable into the nfs4_delegation object in
preparation for making this an async rpc call (at which point we'll need
any state like this in a common object that's preserved across function
calls).

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-01 20:11:12 -04:00
J. Bruce Fields 6707bd3d42 nfsd4: remove unused dl_trunc
There's no point in keeping this field around--it's always zero.

(Background: the protocol allows you to tell the client that the file is
about to be truncated, as an optimization to save the client from
writing back dirty pages that will just be discarded.  We don't
implement this hint.  If we do some day, adding this field back in will
be the least of the work involved.)

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-01 19:57:46 -04:00
J. Bruce Fields b53d40c507 nfsd4: eliminate struct nfs4_cb_recall
The nfs4_cb_recall struct is used only in nfs4_delegation, so its
pointer to the containing delegation is unnecessary--we could just use
container_of().

But there's no real reason to have this a separate struct at all--just
move these fields to nfs4_delegation.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-01 19:50:00 -04:00
J. Bruce Fields c237dc0303 nfsd4: rename callback struct to cb_conn
I want to use the name for a struct that actually does represent a
single callback.

(Actually, I've never been sure it helps to a separate struct for the
callback information.  Some day maybe those fields could just be dumped
into struct nfs4_client.  I don't know.)

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-01 17:31:44 -04:00
J. Bruce Fields e300a63ce4 nfsd4: replace callback thread by asynchronous rpc
We don't really need a synchronous rpc, and moving to an asynchronous
rpc allows us to do without this extra kthread.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-29 17:10:53 -04:00
J. Bruce Fields 3cef9ab266 nfsd4: lookup up callback cred only once
Lookup the callback cred once and then use it for all subsequent
callbacks.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-29 16:45:03 -04:00
J. Bruce Fields ecdd03b791 nfsd4: create rpc callback client from server thread
The code is a little simpler, and it should be easier to avoid races, if
we just do all rpc client creation/destruction from nfsd or laundromat
threads and do only the rpc calls themselves asynchronously.  The rpc
creation doesn't involve any significant waiting (it doesn't call the
client, for example), so there's no reason not to do this.

Also don't bother destroying the client on failure of the rpc null
probe.  We may want to retry the probe later anyway.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-29 16:44:53 -04:00
J. Bruce Fields e1cab5a589 nfsd4: set cb_client inside setup_callback_client
This is just a minor code simplification.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-29 16:44:47 -04:00
J. Bruce Fields 595947acaa nfsd4: set shorter timeout
We tried to do something overly complicated with the callback rpc
timeouts here.  And they're wrong--the result is that by the time a
single callback times out, it's already too late to tell the client
(using the cb_path_down return to RENEW) that the callback is down.

Use a much shorter, simpler timeout.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-29 16:44:40 -04:00
J. Bruce Fields f64f79ea5f nfsd4: setclientid_confirm callback-change fixes
This setclientid_confirm case should allow the client to change
callbacks, but it currently has a dummy implementation that just turns
off callbacks completely.  That dummy implementation isn't completely
correct either, though:

	- There's no need to remove any client recovery directory in
	  this case.
	- New clientid confirm verifiers should be generated (and
	  returned) in setclientid; there's no need to generate a new
	  one here.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-29 16:44:34 -04:00
J. Bruce Fields b8fd47aefa nfsd: quiet compile warning
Stephen Rothwell said:
"Today's linux-next build (powerpc ppc64_defconfig) produced this new
warning:

fs/nfsd/nfs4state.c: In function 'EXPIRED_STATEID':
fs/nfsd/nfs4state.c:2757: warning: comparison of distinct pointer types lacks a cast

Caused by commit 78155ed75f ("nfsd4:
distinguish expired from stale stateids")."

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
2009-04-29 11:36:17 -04:00