Commit graph

717 commits

Author SHA1 Message Date
Chuck Lever 1cd9cd161c NFSD: Fix BUG during NFSD shutdown processing
The Linux NFS server can be started via a user-space write to
/proc/fs/nfs/threads or to /proc/fs/nfs/portlist.  In the first case,
all default listeners are started (both UDP and TCP).  In the second,
a listener is started only for one specified transport.

The NFS server has to make sure lockd stays up until the last listener
transport goes away.  To support both start-up interfaces, it should
do one lockd_up() for each NFSD listener.

The nfsd_init_socks() function used to do one lockd_up() call for each
svc_create_xprt().  Recently commit
26a4140923 mistakenly changed
nfsd_init_socks() to do only one lockd_up() call even though it still
does two svc_create_xprt() calls.

The end result is a lockd_down() BUG during NFSD shutdown processing
because nfsd_last_threads() does a lockd_down() call for each entry
on the sv_permsocks list, but the start-up code doesn't do a matching
number of lockd_up() calls.

Add a second lockd_up() in nfsd_init_socks() to make sure the number
of lockd_up() calls matches the number of entries on the NFS servers's
sv_permsocks list.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-10-22 13:36:05 -04:00
Chuck Lever 2937391385 NLM: Remove unused argument from svc_addsock() function
Clean up: The svc_addsock() function no longer uses its "proto"
argument, so remove it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-10-04 17:12:27 -04:00
Chuck Lever 26a4140923 NLM: Remove "proto" argument from lockd_up()
Clean up: Now that lockd_up() starts listeners for both transports, the
"proto" argument is no longer needed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-10-04 17:12:27 -04:00
J. Bruce Fields af558e33be nfsd: common grace period control
Rewrite grace period code to unify management of grace period across
lockd and nfsd.  The current code has lockd and nfsd cooperate to
compute a grace period which is satisfactory to them both, and then
individually enforce it.  This creates a slight race condition, since
the enforcement is not coordinated.  It's also more complicated than
necessary.

Here instead we have lockd and nfsd each inform common code when they
enter the grace period, and when they're ready to leave the grace
period, and allow normal locking only after both of them are ready to
leave.

We also expect the locks_start_grace()/locks_end_grace() interface here
to be simpler to build on for future cluster/high-availability work,
which may require (for example) putting individual filesystems into
grace, or enforcing grace periods across multiple cluster nodes.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-10-03 16:19:02 -04:00
Benny Halevy d5b337b487 nfsd: use nfs client rpc callback program
since commit ff7d9756b5
"nfsd: use static memory for callback program and stats"
do_probe_callback uses a static callback program
(NFS4_CALLBACK) rather than the one set in clp->cl_callback.cb_prog
as passed in by the client in setclientid (4.0)
or create_session (4.1).

This patches introduces rpc_create_args.prognumber that allows
overriding program->number when creating rpc_clnt.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 18:13:40 -04:00
Benny Halevy 97eb89bb0e nfsd: do_probe_callback should not clear rpc stats
Now that cb_stats are static (since commit
ff7d9756b5)
there's no need to clear them.

Initially I thought it might make sense to do
that every callback probing but since the stats
are per-program and they are shared between possibly
several client callback instances, zeroing them out
seems like the wrong thing to do.

Note that that commit also introduced a bug
since stats.program is also being cleared in the process
and it is not restored after the memset as it used to be.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 18:13:40 -04:00
Jeff Layton 54a66e5480 knfsd: allocate readahead cache in individual chunks
I had a report from someone building a large NFS server that they were
unable to start more than 585 nfsd threads. It was reported against an
older kernel using the slab allocator, and I tracked it down to the
large allocation in nfsd_racache_init failing.

It appears that the slub allocator handles large allocations better,
but large contiguous allocations can often be problematic. There
doesn't seem to be any reason that the racache has to be allocated as a
single large chunk. This patch breaks this up so that the racache is
built up from separate allocations.

(Thanks also to Takashi Iwai for a bugfix.)

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Takashi Iwai <tiwai@suse.de>
2008-09-29 17:56:59 -04:00
Benny Halevy e31a1b662f nfsd: nfs4xdr decode_stateid helper function
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 17:56:59 -04:00
Benny Halevy 5bf8c6911f nfsd: properly xdr-decode NFS4_OPEN_CLAIM_DELEGATE_CUR stateid
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 17:56:58 -04:00
Benny Halevy 1b6b2257dc nfsd: don't declare p in ENCODE_SEQID_OP_HEAD
After using the encode_stateid helper the "p" pointer declared
by ENCODE_SEQID_OP_HEAD is warned as unused.
In the single site where it is still needed it can be declared
separately using the ENCODE_HEAD macro.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 17:56:58 -04:00
Benny Halevy e2f282b9f0 nfsd: nfs4xdr encode_stateid helper function
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 17:56:58 -04:00
Benny Halevy 5033b77a93 nfsd: fix nfsd4_encode_open buffer space reservation
nfsd4_encode_open first reservation is currently for 36 + sizeof(stateid_t)
while it writes after the stateid a cinfo (20 bytes) and 5 more 4-bytes
words, for a total of 40 + sizeof(stateid_t).

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 17:56:58 -04:00
Benny Halevy c47b2ca42e nfsd: properly xdr-encode deleg stateid returned from open
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 17:56:58 -04:00
Benny Halevy 8e40741494 nfsd: properly xdr-encode stateid4.seqid as uint32_t for cb_recall
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 17:56:57 -04:00
J. Bruce Fields 04716e6621 nfsd: permit unauthenticated stat of export root
RFC 2623 section 2.3.2 permits the server to bypass gss authentication
checks for certain operations that a client may perform when mounting.
In the case of a client that doesn't have some form of credentials
available to it on boot, this allows it to perform the mount unattended.
(Presumably real file access won't be needed until a user with
credentials logs in.)

Being slightly more lenient allows lots of old clients to access
krb5-only exports, with the only loss being a small amount of
information leaked about the root directory of the export.

This affects only v2 and v3; v4 still requires authentication for all
access.

Thanks to Peter Staubach testing against a Solaris client, which
suggesting addition of v3 getattr, to the list, and to Trond for noting
that doing so exposes no additional information.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Peter Staubach <staubach@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
2008-09-29 17:56:56 -04:00
Chuck Lever e851db5b05 SUNRPC: Add address family field to svc_serv data structure
Introduce and initialize an address family field in the svc_serv structure.

This field will determine what family to use for the service's listener
sockets and what families are advertised via the local rpcbind daemon.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 17:56:56 -04:00
J. Bruce Fields 91b80969ba nfsd: fix buffer overrun decoding NFSv4 acl
The array we kmalloc() here is not large enough.

Thanks to Johann Dahm and David Richter for bug report and testing.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: David Richter <richterd@citi.umich.edu>
Tested-by: Johann Dahm <jdahm@umich.edu>
2008-09-01 14:24:24 -04:00
Andy Adamson c228c24bf1 nfsd: fix compound state allocation error handling
Move the cstate_alloc call so that if it fails, the response is setup to
encode the NFS error. The out label now means that the
nfsd4_compound_state has not been allocated.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-01 14:17:48 -04:00
Linus Torvalds b0e0c9e7f6 Merge branch 'for-2.6.27' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.27' of git://linux-nfs.org/~bfields/linux:
  fs/nfsd/export.c: Adjust error handling code involving auth_domain_put
  MAINTAINERS: mention lockd and sunrpc in nfs entries
  lockd: trivial sparse endian annotations
2008-08-12 16:39:22 -07:00
Adrian Bunk f1c7f79b6a [NFSD] uninline nfsd4_op_name()
There doesn't seem to be a compelling reason why nfsd4_op_name() is
marked as "inline":

It's only used in a dprintk(), and as long as it has only one caller
non-ancient gcc versions anyway inline it automatically.

This patch fixes the following compile error with gcc 3.4:

  ...
    CC      fs/nfsd/nfs4proc.o
  nfs4proc.c: In function `nfsd4_proc_compound':
  nfs4proc.c:854: sorry, unimplemented: inlining failed in call to
  nfs4proc.c:897: sorry, unimplemented: called from here
  make[3]: *** [fs/nfsd/nfs4proc.o] Error 1

Reported-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
[ Also made it "const char *"  - Linus]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-08 11:22:19 -07:00
Julia Lawall 53e6d8d182 fs/nfsd/export.c: Adjust error handling code involving auth_domain_put
Once clp is assigned, it never becomes NULL, so we can make a label for it
in the error handling code.  Because the call to path_lookup follows the
call to auth_domain_find, its error handling code should jump to this new
label.

The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@r@
expression x,E;
statement S;
position p1,p2,p3;
@@

(
if ((x = auth_domain_find@p1(...)) == NULL || ...) S
|
x = auth_domain_find@p1(...)
... when != x
if (x == NULL || ...) S
)
<...
if@p3 (...) { ... when != auth_domain_put(x)
                  when != if (x) { ... auth_domain_put(x); ...}
    return@p2 ...;
}
...>
(
return x;
|
return 0;
|
x = E
|
E = x
|
auth_domain_put(x)
)

@exists@
position r.p1,r.p2,r.p3;
expression x;
int ret != 0;
statement S;
@@

* x = auth_domain_find@p1(...)
  <...
* if@p3 (...)
  S
  ...>
* return@p2 \(NULL\|ret\);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-07-30 13:20:20 -04:00
Al Viro 3f8206d496 [PATCH] get rid of indirect users of namei.h
fs.h needs path.h, not namei.h; nfs_fs.h doesn't need it at all.
Several places in the tree needed direct include.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:42 -04:00
Al Viro f419a2e3b6 [PATCH] kill nameidata passing to permission(), rename to inode_permission()
Incidentally, the name that gives hundreds of false positives on grep
is not a good idea...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:31 -04:00
Miklos Szeredi db2e747b14 [patch 5/5] vfs: remove mode parameter from vfs_symlink()
Remove the unused mode parameter from vfs_symlink and callers.

Thanks to Tetsuo Handa for noticing.

CC: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-07-26 20:53:18 -04:00
Miklos Szeredi cc77b1521d lockd: dont return EAGAIN for a permanent error
Fix nlm_fopen() to return NLM_FAILED (or NLM_LCK_DENIED_NOLOCKS) instead
of NLM_LCK_DENIED.  The latter means the lock request failed because of a
conflicting lock (i.e.  a temporary error), which is wrong in this case.

Also fix the client to return ENOLCK instead of EAGAIN if a blocking lock
request returns with NLM_LOCK_DENIED.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: David Teigland <teigland@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:47 -07:00
Linus Torvalds 14b395e35d Merge branch 'for-2.6.27' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.27' of git://linux-nfs.org/~bfields/linux: (51 commits)
  nfsd: nfs4xdr.c do-while is not a compound statement
  nfsd: Use C99 initializers in fs/nfsd/nfs4xdr.c
  lockd: Pass "struct sockaddr *" to new failover-by-IP function
  lockd: get host reference in nlmsvc_create_block() instead of callers
  lockd: minor svclock.c style fixes
  lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_lock
  lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_testlock
  lockd: nlm_release_host() checks for NULL, caller needn't
  file lock: reorder struct file_lock to save space on 64 bit builds
  nfsd: take file and mnt write in nfs4_upgrade_open
  nfsd: document open share bit tracking
  nfsd: tabulate nfs4 xdr encoding functions
  nfsd: dprint operation names
  svcrdma: Change WR context get/put to use the kmem cache
  svcrdma: Create a kmem cache for the WR contexts
  svcrdma: Add flush_scheduled_work to module exit function
  svcrdma: Limit ORD based on client's advertised IRD
  svcrdma: Remove unused wait q from svcrdma_xprt structure
  svcrdma: Remove unneeded spin locks from __svc_rdma_free
  svcrdma: Add dma map count and WARN_ON
  ...
2008-07-20 21:21:46 -07:00
Harvey Harrison 5108b27651 nfsd: nfs4xdr.c do-while is not a compound statement
The WRITEMEM macro produces sparse warnings of the form:
fs/nfsd/nfs4xdr.c:2668:2: warning: do-while statement is not a compound statement

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-07-18 15:18:35 -04:00
J. Bruce Fields ad1060c89c nfsd: Use C99 initializers in fs/nfsd/nfs4xdr.c
Thanks to problem report and original patch from Harvey Harrison.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Benny Halevy <bhalevy@panasas.com>
2008-07-18 15:04:58 -04:00
Chuck Lever 367c8c7bd9 lockd: Pass "struct sockaddr *" to new failover-by-IP function
Pass a more generic socket address type to nlmsvc_unlock_all_by_ip() to
allow for future support of IPv6.  Also provide additional sanity
checking in failover_unlock_ip() when constructing the server's IP
address.

As an added bonus, provide clean kerneldoc comments on related NLM
interfaces which were recently added.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-07-15 16:11:29 -04:00
Olga Kornievskaia b6b6152c46 rpc: bring back cl_chatty
The cl_chatty flag alows us to control whether a given rpc client leaves

	"server X not responding, timed out"

messages in the syslog.  Such messages make sense for ordinary nfs
clients (where an unresponsive server means applications on the
mountpoint are probably hanging), but not for the callback client (which
can fail more commonly, with the only result just of disabling some
optimizations).

Previously cl_chatty was removed, do to lack of users; reinstate it, and
use it for the nfsd's callback client.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:10 -04:00
Benny Halevy e518f0560a nfsd: take file and mnt write in nfs4_upgrade_open
testing with newpynfs revealed this warning:
Jul  3 07:32:50 buml kernel: writeable file with no mnt_want_write()
Jul  3 07:32:50 buml kernel: ------------[ cut here ]------------
Jul  3 07:32:50 buml kernel: WARNING: at /usr0/export/dev/bhalevy/git/linux-pnfs-bh-nfs41/include/linux/fs.h:855 drop_file_write_access+0x6b/0x7e()
Jul  3 07:32:50 buml kernel: Modules linked in: nfsd auth_rpcgss exportfs nfs lockd nfs_acl sunrpc
Jul  3 07:32:50 buml kernel: Call Trace:
Jul  3 07:32:50 buml kernel: 6eaadc88:  [<6002f471>] warn_on_slowpath+0x54/0x8e
Jul  3 07:32:50 buml kernel: 6eaadcc8:  [<601b790d>] printk+0xa0/0x793
Jul  3 07:32:50 buml kernel: 6eaadd38:  [<601b6205>] __mutex_lock_slowpath+0x1db/0x1ea
Jul  3 07:32:50 buml kernel: 6eaadd68:  [<7107d4d5>] nfs4_preprocess_seqid_op+0x2a6/0x31c [nfsd]
Jul  3 07:32:50 buml kernel: 6eaadda8:  [<60078dc9>] drop_file_write_access+0x6b/0x7e
Jul  3 07:32:50 buml kernel: 6eaaddc8:  [<710804e4>] nfsd4_open_downgrade+0x114/0x1de [nfsd]
Jul  3 07:32:50 buml kernel: 6eaade08:  [<71076215>] nfsd4_proc_compound+0x1ba/0x2dc [nfsd]
Jul  3 07:32:50 buml kernel: 6eaade48:  [<71068221>] nfsd_dispatch+0xe5/0x1c2 [nfsd]
Jul  3 07:32:50 buml kernel: 6eaade88:  [<71312f81>] svc_process+0x3fd/0x714 [sunrpc]
Jul  3 07:32:50 buml kernel: 6eaadea8:  [<60039a81>] kernel_sigprocmask+0xf3/0x100
Jul  3 07:32:50 buml kernel: 6eaadee8:  [<7106874b>] nfsd+0x182/0x29b [nfsd]
Jul  3 07:32:50 buml kernel: 6eaadf48:  [<60021cc9>] run_kernel_thread+0x41/0x4a
Jul  3 07:32:50 buml kernel: 6eaadf58:  [<710685c9>] nfsd+0x0/0x29b [nfsd]
Jul  3 07:32:50 buml kernel: 6eaadf98:  [<60021cb0>] run_kernel_thread+0x28/0x4a
Jul  3 07:32:50 buml kernel: 6eaadfc8:  [<60013829>] new_thread_handler+0x72/0x9c
Jul  3 07:32:50 buml kernel:
Jul  3 07:32:50 buml kernel: ---[ end trace 2426dd7cb2fba3bf ]---

Bruce Fields suggested this (Thanks!):
maybe we need to be doing a mnt_want_write on open_upgrade and mnt_put_write on downgrade?

This patch adds a call to mnt_want_write and file_take_write (which is
doing the actual work).

The counter-calls mnt_drop_write a file_release_write are now being properly
called by drop_file_write_access in the exact path printed by the warning
above.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-07-07 15:23:34 -04:00
J. Bruce Fields 4f83aa302f nfsd: document open share bit tracking
It's not immediately obvious from the code why we're doing this.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Benny Halevy <bhalevy@panasas.com>
2008-07-07 15:04:50 -04:00
Benny Halevy 695e12f8d2 nfsd: tabulate nfs4 xdr encoding functions
In preparation for minorversion 1

All encoders now return an nfserr status (typically their
nfserr argument).  Unsupported ops go through nfsd4_encode_operation
too, so use nfsd4_encode_noop to encode nothing for their reply body.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-07-04 16:21:30 -04:00
J. Bruce Fields e86322f611 Merge branch 'for-bfields' of git://linux-nfs.org/~tomtucker/xprt-switch-2.6 into for-2.6.27 2008-07-03 16:24:06 -04:00
Benny Halevy b001a1b6aa nfsd: dprint operation names
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-07-02 19:03:19 -04:00
Benny Halevy f2feb96bc3 nfsd: nfs4 minorversion decoder vectors
Have separate vectors of operation decoders for each minorversion.
Obsolete ops in newer minorversions have default implementation returning
nfserr_opnotsupp.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-07-02 15:58:21 -04:00
Benny Halevy 3c375c6f3a nfsd: unsupported nfs4 ops should fail with nfserr_opnotsupp
nfserr_opnotsupp should be returned for unsupported nfs4 ops
rather than nfserr_op_illegal.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-07-02 15:58:21 -04:00
Benny Halevy 347e0ad9c9 nfsd: tabulate nfs4 xdr decoding functions
In preparation for minorversion 1

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-07-02 15:58:20 -04:00
Benny Halevy 30cff1ffff nfsd: return nfserr_minor_vers_mismatch when compound minorversion != 0
Check minorversion once before decoding any operation and reject with
nfserr_minor_vers_mismatch if != 0 (this still happens in nfsd4_proc_compound).
In this case return a zero length resultdata array as required by RFC3530.

minorversion 1 processing will have its own vector of decoders.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-07-02 15:58:20 -04:00
Miklos Szeredi 07cad1d2a4 nfsd: clean up mnt_want_write calls
Multiple mnt_want_write() calls in the switch statement looks really
ugly.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-07-01 15:22:03 -04:00
Jeff Layton 100766f834 nfsd: treat all shutdown signals as equivalent
knfsd currently uses 2 signal masks when processing requests. A "loose"
mask (SHUTDOWN_SIGS) that it uses when receiving network requests, and
then a more "strict" mask (ALLOWED_SIGS, which is just SIGKILL) that it
allows when doing the actual operation on the local storage.

This is apparently unnecessarily complicated. The underlying filesystem
should be able to sanely handle a signal in the middle of an operation.
This patch removes the signal mask handling from knfsd altogether. When
knfsd is started as a kthread, all signals are ignored. It then allows
all of the signals in SHUTDOWN_SIGS. There's no need to set the mask
as well.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-30 15:27:47 -04:00
Neil Brown 496d6c32d4 nfsd: fix spurious EACCESS in reconnect_path()
Thanks to Frank Van Maarseveen for the original problem report: "A
privileged process on an NFS client which drops privileges after using
them to change the current working directory, will experience incorrect
EACCES after an NFS server reboot. This problem can also occur after
memory pressure on the server, particularly when the client side is
quiet for some time."

This occurs because the filehandle points to a directory whose parents
are no longer in the dentry cache, and we're attempting to reconnect the
directory to its parents without adequate permissions to perform lookups
in the parent directories.

We can therefore fix the problem by acquiring the necessary capabilities
before attempting the reconnection.  We do this only in the
no_subtree_check case, since the documented behavior of the
subtree_check export option requires the server to check that the user
has lookup permissions on all parents.

The subtree_check case still has a problem, since reconnect_path()
unnecessarily requires both read and lookup permissions on all parent
directories.  However, a fix in that case would be more delicate, and
use of subtree_check is already discouraged for other reasons.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Frank van Maarseveen <frankvm@frankvm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-30 15:24:11 -04:00
Miklos Szeredi 8837abcab3 nfsd: rename MAY_ flags
Rename nfsd_permission() specific MAY_* flags to NFSD_MAY_* to make it
clear, that these are not used outside nfsd, and to avoid name and
number space conflicts with the VFS.

[comment from hch: rename MAY_READ, MAY_WRITE and MAY_EXEC as well]

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:02:50 -04:00
NeilBrown 599eb3046a knfsd: nfsd: Handle ERESTARTSYS from syscalls.
OCFS2 can return -ERESTARTSYS from write requests (and possibly
elsewhere) if there is a signal pending.

If nfsd is shutdown (by sending a signal to each thread) while there
is still an IO load from the client, each thread could handle one last
request with a signal pending.  This can result in -ERESTARTSYS
which is not understood by nfserrno() and so is reflected back to
the client as nfserr_io aka -EIO.  This is wrong.

Instead, interpret ERESTARTSYS to mean "try again later" by returning
nfserr_jukebox.  The client will resend and - if the server is
restarted - the write will (hopefully) be successful and everyone will
be happy.

 The symptom that I narrowed down to this was:
    copy a large file via NFS to an OCFS2 filesystem, and restart
    the nfs server during the copy.
    The 'cp' might get an -EIO, and the file will be corrupted -
    presumably holes in the middle where writes appeared to fail.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:02:50 -04:00
Neil Brown c7d106c90e nfsd: fix race in nfsd_nrthreads()
We need the nfsd_mutex before accessing nfsd_serv->sv_nrthreads or we
can't even guarantee nfsd_serv will still be there.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:02:50 -04:00
Jeff Layton a75c5d01e4 sunrpc: remove sv_kill_signal field from svc_serv struct
Since we no longer make any distinction between shutdown signals with
nfsd, then it becomes easier to just standardize on a particular signal
to use to bring it down (SIGINT, in this case).

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:02:49 -04:00
Jeff Layton 9867d76ca1 knfsd: convert knfsd to kthread API
This patch is rather large, but I couldn't figure out a way to break it
up that would remain bisectable. It does several things:

- change svc_thread_fn typedef to better match what kthread_create expects
- change svc_pool_map_set_cpumask to be more kthread friendly. Make it
  take a task arg and and get rid of the "oldmask"
- have svc_set_num_threads call kthread_create directly
- eliminate __svc_create_thread

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:02:49 -04:00
Jeff Layton e096bbc648 knfsd: remove special handling for SIGHUP
The special handling for SIGHUP in knfsd is a holdover from much
earlier versions of Linux where reloading the export table was
more expensive. That facility is not really needed anymore and
to my knowledge, is seldom-used.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:02:49 -04:00
Jeff Layton 3dd98a3bcc knfsd: clean up nfsd filesystem interfaces
Several of the nfsd filesystem interfaces allow changes to parameters
that don't have any effect on a running nfsd service. They are only ever
checked when nfsd is started. This patch fixes it so that changes to
those procfiles return -EBUSY if nfsd is already running to make it
clear that changes on the fly don't work.

The patch should also close some relatively harmless races between
changing the info in those interfaces and starting nfsd, since these
variables are being moved under the protection of the nfsd_mutex.

Finally, the nfsv4recoverydir file always returns -EINVAL if read. This
patch fixes it to return the recoverydir path as expected.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:02:49 -04:00
Neil Brown bedbdd8bad knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking.
This removes the BKL from the RPC service creation codepath. The BKL
really isn't adequate for this job since some of this info needs
protection across sleeps.

Also, add some comments to try and clarify how the locking should work
and to make it clear that the BKL isn't necessary as long as there is
adequate locking between tasks when touching the svc_serv fields.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:02:49 -04:00
Benny Halevy 13b1867cac nfsd: make nfs4xdr WRITEMEM safe against zero count
WRITEMEM zeroes the last word in the destination buffer
for padding purposes, but this must not be done if
no bytes are to be copied, as it would result
in zeroing of the word right before the array.

The current implementation works since it's always called
with non zero nbytes or it follows an encoding of the
string (or opaque) length which, if equal to zero,
can be overwritten with zero.

Nevertheless, it seems safer to check for this case.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:02:48 -04:00
J. Bruce Fields 3b12cd9862 nfsd: add dprintk of compound return
We already print each operation of the compound when debugging is turned
on; printing the result could also help with remote debugging.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:02:48 -04:00
J. Bruce Fields 88dd0be387 nfsd: reorder printk in do_probe_callback to avoid use-after-free
We're currently dereferencing the client after we drop our reference
count to it.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-05-18 19:13:07 -04:00
J. Bruce Fields b55e0ba19c nfsd: remove unnecessary atomic ops
These bit operations don't need to be atomic.  They're all done under a
single big mutex anyway.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-05-18 19:12:54 -04:00
Harvey Harrison 8e24eea728 fs: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:54 -07:00
Denis V. Lunev 9ef2db2630 nfsd: use proc_create to setup de->proc_fops
Use proc_create() to make sure that ->proc_fops be setup before gluing PDE to
main tree.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:20 -07:00
J. Bruce Fields e36cd4a287 nfsd: don't allow setting ctime over v4
Presumably this is left over from earlier drafts of v4, which listed
TIME_METADATA as writeable.  It's read-only in rfc 3530, and shouldn't
be modifiable anyway.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-25 13:00:11 -04:00
J. Bruce Fields 1a747ee0cc locks: don't call ->copy_lock methods on return of conflicting locks
The file_lock structure is used both as a heavy-weight representation of
an active lock, with pointers to reference-counted structures, etc., and
as a simple container for parameters that describe a file lock.

The conflicting lock returned from __posix_lock_file is an example of
the latter; so don't call the filesystem or lock manager callbacks when
copying to it.  This also saves the need for an unnecessary
locks_init_lock in the nfsv4 server.

Thanks to Trond for pointing out the error.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-25 13:00:11 -04:00
Wendy Cheng 17efa372cf lockd: unlock lockd locks held for a certain filesystem
Add /proc/fs/nfsd/unlock_filesystem, which allows e.g.:

shell> echo /mnt/sfs1 > /proc/fs/nfsd/unlock_filesystem

so that a filesystem can be unmounted before allowing a peer nfsd to
take over nfs service for the filesystem.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Cc: Lon Hohberger  <lhh@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

 fs/lockd/svcsubs.c          |   66 +++++++++++++++++++++++++++++++++++++++-----
 fs/nfsd/nfsctl.c            |   65 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/lockd/lockd.h |    7 ++++
 3 files changed, 131 insertions(+), 7 deletions(-)
2008-04-25 13:00:11 -04:00
Wendy Cheng 4373ea84c8 lockd: unlock lockd locks associated with a given server ip
For high-availability NFS service, we generally need to be able to drop
file locks held on the exported filesystem before moving clients to a
new server.  Currently the only way to do that is by shutting down lockd
entirely, which is often undesireable (for example, if you want to
continue exporting other filesystems).

This patch allows the administrator to release all locks held by clients
accessing the client through a given server ip address, by echoing that
address to a new file, /proc/fs/nfsd/unlock_ip, as in:

shell> echo 10.1.1.2 > /proc/fs/nfsd/unlock_ip

The expected sequence of events can be:
1. Tear down the IP address
2. Unexport the path
3. Write IP to /proc/fs/nfsd/unlock_ip to unlock files
4. Signal peer to begin take-over.

For now we only support IPv4 addresses and NFSv2/v3 (NFSv4 locks are not
affected).

Also, if unmounting the filesystem is required, we assume at step 3 that
clients using the given server ip are the only clients holding locks on
the given filesystem; otherwise, an additional patch is required to
allow revoking all locks held by lockd on a given filesystem.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Cc: Lon Hohberger  <lhh@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

 fs/lockd/svcsubs.c          |   66 +++++++++++++++++++++++++++++++++++++++-----
 fs/nfsd/nfsctl.c            |   65 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/lockd/lockd.h |    7 ++++
 3 files changed, 131 insertions(+), 7 deletions(-)
2008-04-25 13:00:10 -04:00
Jeff Layton ca456252db knfsd: clear both setuid and setgid whenever a chown is done
Currently, knfsd only clears the setuid bit if the owner of a file is
changed on a SETATTR call, and only clears the setgid bit if the group
is changed. POSIX says this in the spec for chown():

    "If the specified file is a regular file, one or more of the
     S_IXUSR, S_IXGRP, or S_IXOTH bits of the file mode are set, and the
     process does not have appropriate privileges, the set-user-ID
     (S_ISUID) and set-group-ID (S_ISGID) bits of the file mode shall
     be cleared upon successful return from chown()."

If I'm reading this correctly, then knfsd is doing this wrong. It should
be clearing both the setuid and setgid bit on any SETATTR that changes
the uid or gid. This wasn't really as noticable before, but now that the
ATTR_KILL_S*ID bits are a no-op for the NFS client, it's more evident.

This patch corrects the nfsd_setattr logic so that this occurs. It also
does a bit of cleanup to the function.

There is also one small behavioral change. If a SETATTR call comes in
that changes the uid/gid and the mode, then we now only clear the setgid
bit if the group execute bit isn't set. The setgid bit without a group
execute bit signifies mandatory locking and we likely don't want to
clear the bit in that case. Since there is no call in POSIX that should
generate a SETATTR call like this, then this should rarely happen, but
it's worth noting.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:43 -04:00
Jeff Layton dee3209d99 knfsd: get rid of imode variable in nfsd_setattr
...it's not really needed.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:43 -04:00
Olga Kornievskaia ff7d9756b5 nfsd: use static memory for callback program and stats
There's no need to dynamically allocate this memory, and doing so may
create the possibility of races on shutdown of the rpc client.  (We've
witnessed it only after adding rpcsec_gss support to the server, after
which the rpc code can send destroys calls that expect to still be able
to access the rpc_stats structure after it has been destroyed.)

Such races are in theory possible if the module containing this "static"
memory is removed very quickly after an rpc client is destroyed, but
we haven't seen that happen.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:42 -04:00
J. Bruce Fields 03550fac06 nfsd: move most of fh_verify to separate function
Move the code that actually parses the filehandle and looks up the
dentry and export to a separate function.  This simplifies the reference
counting a little and moves fh_verify() a little closer to the kernel
ideal of small, minimally-indentended functions.  Clean up a few other
minor style sins along the way.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
2008-04-23 16:13:41 -04:00
Felix Blyakher 9167f501c6 nfsd: initialize lease type in nfs4_open_delegation()
While lease is correctly checked by supplying the type argument to
vfs_setlease(), it's stored with fl_type uninitialized. This breaks the
logic when checking the type of the lease.  The fix is to initialize
fl_type.

The old code still happened to function correctly since F_RDLCK is zero,
and we only implement read delegations currently (nor write
delegations).  But that's no excuse for not fixing this.

Signed-off-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:40 -04:00
Harvey Harrison 3ba1514815 nfsd: fix sparse warning in vfs.c
fs/nfsd/vfs.c:991:27: warning: Using plain integer as NULL pointer

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:39 -04:00
Harvey Harrison a254b246ee nfsd: fix sparse warnings
Add extern to nfsd/nfsd.h
fs/nfsd/nfssvc.c:146:5: warning: symbol 'nfsd_nrthreads' was not declared. Should it be static?
fs/nfsd/nfssvc.c:261:5: warning: symbol 'nfsd_nrpools' was not declared. Should it be static?
fs/nfsd/nfssvc.c:269:5: warning: symbol 'nfsd_get_nrthreads' was not declared. Should it be static?
fs/nfsd/nfssvc.c:281:5: warning: symbol 'nfsd_set_nrthreads' was not declared. Should it be static?
fs/nfsd/export.c:1534:23: warning: symbol 'nfs_exports_op' was not declared. Should it be static?

Add include of auth.h
fs/nfsd/auth.c:27:5: warning: symbol 'nfsd_setuser' was not declared. Should it be static?

Make static, move forward declaration closer to where it's needed.
fs/nfsd/nfs4state.c:1877:1: warning: symbol 'laundromat_main' was not declared. Should it be static?

Make static, forward declaration was already marked static.
fs/nfsd/nfs4idmap.c:206:1: warning: symbol 'idtoname_parse' was not declared. Should it be static?
fs/nfsd/vfs.c:1156:1: warning: symbol 'nfsd_create_setattr' was not declared. Should it be static?

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:39 -04:00
Adrian Bunk f2b0dee2ec make nfsd_create_setattr() static
This patch makes the needlessly global nfsd_create_setattr() static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:38 -04:00
Chuck Lever 5ea0dd61f2 NFSD: Remove NFSD_TCP kernel build option
Likewise, distros usually leave CONFIG_NFSD_TCP enabled.

TCP support in the Linux NFS server is stable enough that we can leave it
on always.  CONFIG_NFSD_TCP adds about 10 lines of code, and defaults to
"Y" anyway.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:38 -04:00
J. Bruce Fields c0ce6ec87c nfsd: clarify readdir/mountpoint-crossing code
The code here is difficult to understand; attempt to clarify somewhat by
pulling out one of the more mystifying conditionals into a separate
function.

While we're here, also add lease_time to the list of attributes that we
don't really need to cross a mountpoint to fetch.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Peter Staubach <staubach@redhat.com>
2008-04-23 16:13:38 -04:00
J. Bruce Fields 6a85fa3add nfsd4: kill unnecessary check in preprocess_stateid_op
This condition is always true.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:37 -04:00
J. Bruce Fields 0836f58725 nfsd4: simplify stateid sequencing checks
Pull this common code into a separate function.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:37 -04:00
J. Bruce Fields f3362737be nfsd4: remove unnecessary CHECK_FH check in preprocess_seqid_op
Every caller sets this flag, so it's meaningless.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:37 -04:00
Aurélien Charbon f15364bd4c IPv6 support for NFS server export caches
This adds IPv6 support to the interfaces that are used to express nfsd
exports.  All addressed are stored internally as IPv6; backwards
compatibility is maintained using mapped addresses.

Thanks to Bruce Fields, Brian Haley, Neil Brown and Hideaki Joshifuji
for comments

Signed-off-by: Aurelien Charbon <aurelien.charbon@bull.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Brian Haley <brian.haley@hp.com>
Cc:  YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:36 -04:00
Dave Hansen 2c463e9548 [PATCH] r/o bind mounts: check mnt instead of superblock directly
If we depend on the inodes for writeability, we will not catch the r/o mounts
when implemented.

This patches uses __mnt_want_write().  It does not guarantee that the mount
will stay writeable after the check.  But, this is OK for one of the checks
because it is just for a printk().

The other two are probably unnecessary and duplicate existing checks in the
VFS.  This won't make them better checks than before, but it will make them
detect r/o mounts.

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:27 -04:00
Dave Hansen 18f335aff8 [PATCH] r/o bind mounts: elevate write count for xattr_permission() callers
This basically audits the callers of xattr_permission(), which calls
permission() and can perform writes to the filesystem.

[AV: add missing parts - removexattr() and nfsd posix acls, plug for a leak
spotted by Miklos]

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:15 -04:00
Dave Hansen 9079b1eb17 [PATCH] r/o bind mounts: get write access for vfs_rename() callers
This also uses the little helper in the NFS code to make an if() a little bit
less ugly.  We introduced the helper at the beginning of the series.

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:25:34 -04:00
Dave Hansen 75c3f29de7 [PATCH] r/o bind mounts: write counts for link/symlink
[AV: add missing nfsd pieces]

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:25:34 -04:00
Dave Hansen 463c319726 [PATCH] r/o bind mounts: get callers of vfs_mknod/create/mkdir()
This takes care of all of the direct callers of vfs_mknod().
Since a few of these cases also handle normal file creation
as well, this also covers some calls to vfs_create().

So that we don't have to make three mnt_want/drop_write()
calls inside of the switch statement, we move some of its
logic outside of the switch and into a helper function
suggested by Christoph.

This also encapsulates a fix for mknod(S_IFREG) that Miklos
found.

[AV: merged mkdir handling, added missing nfsd pieces]

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:25:34 -04:00
Dave Hansen 0622753b80 [PATCH] r/o bind mounts: elevate write count for rmdir and unlink.
Elevate the write count during the vfs_rmdir() and vfs_unlink().

[AV: merged rmdir and unlink parts, added missing pieces in nfsd]

Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:25:33 -04:00
Dave Hansen aceaf78da9 [PATCH] r/o bind mounts: create helper to drop file write access
If someone decides to demote a file from r/w to just
r/o, they can use this same code as __fput().

NFS does just that, and will use this in the next
patch.

AV: drop write access in __fput() only after we evict from file list.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Cc: Erez Zadok <ezk@cs.sunysb.edu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J Bruce Fields" <bfields@fieldses.org>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:25:32 -04:00
J. Bruce Fields b663c6fd98 nfsd: fix oops on access from high-numbered ports
This bug was always here, but before my commit 6fa02839bf
("recheck for secure ports in fh_verify"), it could only be triggered by
failure of a kmalloc().  After that commit it could be triggered by a
client making a request from a non-reserved port for access to an export
marked "secure".  (Exports are "secure" by default.)

The result is a struct svc_export with a reference count one too low,
resulting in likely oopses next time the export is accessed.

The reference counting here is not straightforward; a later patch will
clean up fh_verify().

Thanks to Lukas Hejtmanek for the bug report and followup.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Lukas Hejtmanek <xhejtman@ics.muni.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-14 16:49:15 -07:00
Pavel Emelyanov 5216a8e70e Wrap buffers used for rpc debug printks into RPC_IFDEBUG
Sorry for the noise, but here's the v3 of this compilation fix :)

There are some places, which declare the char buf[...] on the stack
to push it later into dprintk(). Since the dprintk sometimes (if the
CONFIG_SYSCTL=n) becomes an empty do { } while (0) stub, these buffers
cause gcc to produce appropriate warnings.

Wrap these buffers with RPC_IFDEBUG macro, as Trond proposed, to
compile them out when not needed.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-21 18:42:29 -05:00
Jan Blunck cf28b4863f d_path: Make d_path() use a struct path
d_path() is used on a <dentry,vfsmount> pair.  Lets use a struct path to
reflect this.

[akpm@linux-foundation.org: fix build in mm/memory.c]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Bryan Wu <bryan.wu@analog.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:17:09 -08:00
Jan Blunck c32c2f63a9 d_path: Make seq_path() use a struct path argument
seq_path() is always called with a dentry and a vfsmount from a struct path.
Make seq_path() take it directly as an argument.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:17:08 -08:00
Jan Blunck e83aece3af Use struct path in struct svc_expkey
I'm embedding struct path into struct svc_expkey.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:17:08 -08:00
Jan Blunck 5477549161 Use struct path in struct svc_export
I'm embedding struct path into struct svc_export.

[akpm@linux-foundation.org: coding-style fixes]
[ezk@cs.sunysb.edu: NFSD: fix wrong mnt_writer count in rename]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:17:08 -08:00
Jan Blunck 1d957f9bf8 Introduce path_put()
* Add path_put() functions for releasing a reference to the dentry and
  vfsmount of a struct path in the right order

* Switch from path_release(nd) to path_put(&nd->path)

* Rename dput_path() to path_put_conditional()

[akpm@linux-foundation.org: fix cifs]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:13:33 -08:00
Jan Blunck 4ac9137858 Embed a struct path into struct nameidata instead of nd->{dentry,mnt}
This is the central patch of a cleanup series. In most cases there is no good
reason why someone would want to use a dentry for itself. This series reflects
that fact and embeds a struct path into nameidata.

Together with the other patches of this series
- it enforced the correct order of getting/releasing the reference count on
  <dentry,vfsmount> pairs
- it prepares the VFS for stacking support since it is essential to have a
  struct path in every place where the stack can be traversed
- it reduces the overall code size:

without patch series:
   text    data     bss     dec     hex filename
5321639  858418  715768 6895825  6938d1 vmlinux

with patch series:
   text    data     bss     dec     hex filename
5320026  858418  715768 6894212  693284 vmlinux

This patch:

Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix cifs]
[akpm@linux-foundation.org: fix smack]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:13:33 -08:00
David Howells e231c2ee64 Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p)
Convert instances of ERR_PTR(PTR_ERR(p)) to ERR_CAST(p) using:

perl -spi -e 's/ERR_PTR[(]PTR_ERR[(](.*)[)][)]/ERR_CAST(\1)/' `grep -rl 'ERR_PTR[(]*PTR_ERR' fs crypto net security`

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:26 -08:00
Andrew Morgan e338d263a7 Add 64-bit capability support to the kernel
The patch supports legacy (32-bit) capability userspace, and where possible
translates 32-bit capabilities to/from userspace and the VFS to 64-bit
kernel space capabilities.  If a capability set cannot be compressed into
32-bits for consumption by user space, the system call fails, with -ERANGE.

FWIW libcap-2.00 supports this change (and earlier capability formats)

 http://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.6/

[akpm@linux-foundation.org: coding-syle fixes]
[akpm@linux-foundation.org: use get_task_comm()]
[ezk@cs.sunysb.edu: build fix]
[akpm@linux-foundation.org: do not initialise statics to 0 or NULL]
[akpm@linux-foundation.org: unused var]
[serue@us.ibm.com: export __cap_ symbols]
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: James Morris <jmorris@namei.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:20 -08:00
J. Bruce Fields 87d26ea777 nfsd: more careful input validation in nfsctl write methods
Neil Brown points out that we're checking buf[size-1] in a couple places
without first checking whether size is zero.

Actually, given the implementation of simple_transaction_get(), buf[-1]
is zero, so in both of these cases the subsequent check of the value of
buf[size-1] will catch this case.

But it seems fragile to depend on that, so add explicit checks for this
case.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: NeilBrown <neilb@suse.de>
2008-02-01 16:42:15 -05:00
J. Bruce Fields f7b8066f9f knfsd: don't bother mapping putrootfh enoent to eperm
Neither EPERM and ENOENT map to valid errors for PUTROOTFH according to
rfc 3530, and, if anything, ENOENT is likely to be slightly more
informative; so don't bother mapping ENOENT to EPERM.  (Probably this
was originally done because one likely cause was that there is an fsid=0
export but that it isn't permitted to this particular client.  Now that
we allow WRONGSEC returns, this is somewhat less likely.)

In the long term we should work to make this situation less likely,
perhaps by turning off nfsv4 service entirely in the absence of the
pseudofs root, or constructing a pseudofilesystem root ourselves in the
kernel as necessary.

Thanks to Benny Halevy <bhalevy@panasas.com> for pointing out this
problem.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Benny Halevy <bhalevy@panasas.com>
2008-02-01 16:42:15 -05:00
Tom Tucker 9571af18fa svc: Add svc_xprt_names service to replace svc_sock_names
Create a transport independent version of the svc_sock_names function.

The toclose capability of the svc_sock_names service can be implemented
using the svc_xprt_find and svc_xprt_close services.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:14 -05:00
Tom Tucker a217813f90 knfsd: Support adding transports by writing portlist file
Update the write handler for the portlist file to allow creating new
listening endpoints on a transport. The general form of the string is:

<transport_name><space><port number>

For example:

echo "tcp 2049" > /proc/fs/nfsd/portlist

This is intended to support the creation of a listening endpoint for
RDMA transports without adding #ifdef code to the nfssvc.c file.

Transports can also be removed as follows:

'-'<transport_name><space><port number>

For example:

echo "-tcp 2049" > /proc/fs/nfsd/portlist

Attempting to add a listener with an invalid transport string results
in EPROTONOSUPPORT and a perror string of "Protocol not supported".

Attempting to remove an non-existent listener (.e.g. bad proto or port)
results in ENOTCONN and a perror string of
"Transport endpoint is not connected"

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:13 -05:00
Tom Tucker 7a18208383 svc: Make close transport independent
Move sk_list and sk_ready to svc_xprt. This involves close because these
lists are walked by svcs when closing all their transports. So I combined
the moving of these lists to svc_xprt with making close transport independent.

The svc_force_sock_close has been changed to svc_close_all and takes a list
as an argument. This removes some svc internals knowledge from the svcs.

This code races with module removal and transport addition.

Thanks to Simon Holm Thøgersen for a compile fix.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Simon Holm Thøgersen <odie@cs.aau.dk>
2008-02-01 16:42:11 -05:00
Tom Tucker d7c9f1ed97 svc: Change services to use new svc_create_xprt service
Modify the various kernel RPC svcs to use the svc_create_xprt service.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:09 -05:00
J. Bruce Fields 8838dc43d6 nfsd4: clean up access_valid, deny_valid checks.
Document these checks a little better and inline, as suggested by Neil
Brown (note both functions have two callers).  Remove an obviously bogus
check while we're there (checking whether unsigned value is negative).

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
2008-02-01 16:42:07 -05:00
J. Bruce Fields 5c002b3bb2 nfsd: allow root to set uid and gid on create
The server silently ignores attempts to set the uid and gid on create.
Based on the comment, this appears to have been done to prevent some
overly-clever IRIX client from causing itself problems.

Perhaps we should remove that hack completely.  For now, at least, it
makes sense to allow root (when no_root_squash is set) to set uid and
gid.

While we're there, since nfsd_create and nfsd_create_v3 share the same
logic, pull that out into a separate function.  And spell out the
individual modifications of ia_valid instead of doing them both at once
inside a conditional.

Thanks to Roger Willcocks <roger@filmlight.ltd.uk> for the bug report
and original patch on which this is based.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:07 -05:00
Frank Filz 406a7ea97d nfsd: Allow AIX client to read dir containing mountpoints
This patch addresses a compatibility issue with a Linux NFS server and
AIX NFS client.

I have exported /export as fsid=0 with sec=krb5:krb5i
I have mount --bind /home onto /export/home
I have exported /export/home with sec=krb5i

The AIX client mounts / -o sec=krb5:krb5i onto /mnt

If I do an ls /mnt, the AIX client gets a permission error. Looking at
the network traceIwe see a READDIR looking for attributes
FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID. The response gives a
NFS4ERR_WRONGSEC which the AIX client is not expecting.

Since the AIX client is only asking for an attribute that is an
attribute of the parent file system (pseudo root in my example), it
seems reasonable that there should not be an error.

In discussing this issue with Bruce Fields, I initially proposed
ignoring the error in nfsd4_encode_dirent_fattr() if all that was being
asked for was FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID, however,
Bruce suggested that we avoid calling cross_mnt() if only these
attributes are requested.

The following patch implements bypassing cross_mnt() if only
FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID are called. Since there
is some complexity in the code in nfsd4_encode_fattr(), I didn't want to
duplicate code (and introduce a maintenance nightmare), so I added a
parameter to nfsd4_encode_fattr() that indicates whether it should
ignore cross mounts and simply fill in the attribute using the passed in
dentry as opposed to it's parent.

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:06 -05:00
J. Bruce Fields 39325bd03f nfsd4: fix bad seqid on lock request incompatible with open mode
The failure to return a stateowner from nfs4_preprocess_seqid_op() means
in the case where a lock request is of a type incompatible with an open
(due to, e.g., an application attempting a write lock on a file open for
read), means that fs/nfsd/nfs4xdr.c:ENCODE_SEQID_OP_TAIL() never bumps
the seqid as it should.  The client, attempting to close the file
afterwards, then gets an (incorrect) bad sequence id error.  Worse, this
prevents the open file from ever being closed, so we leak state.

Thanks to Benny Halevy and Trond Myklebust for analysis, and to Steven
Wilton for the report and extensive data-gathering.

Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Steven Wilton <steven.wilton@team.eftel.com.au>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:06 -05:00
J. Bruce Fields 404ec117be nfsd4: recognize callback channel failure earlier
When the callback channel fails, we inform the client of that by
returning a cb_path_down error the next time it tries to renew its
lease.

If we wait most of a lease period before deciding that a callback has
failed and that the callback channel is down, then we decrease the
chances that the client will find out in time to do anything about it.

So, mark the channel down as soon as we recognize that an rpc has
failed.  However, continue trying to recall delegations anyway, in hopes
it will come back up.  This will prevent more delegations from being
given out, and ensure cb_path_down is returned to renew calls earlier,
while still making the best effort to deliver recalls of existing
delegations.

Also fix a couple comments and remove a dprink that doesn't seem likely
to be useful.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:06 -05:00
J. Bruce Fields 35bba9a37e nfsd4: miscellaneous nfs4state.c style fixes
Fix various minor style violations.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:06 -05:00
J. Bruce Fields 5ec7b46c2f nfsd4: make current_clientid local
Declare this variable in the one function where it's used, and clean up
some minor style problems.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:06 -05:00
J. Bruce Fields 99d965eda7 nfsd: fix encode_entryplus_baggage() indentation
Fix bizarre indentation.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:06 -05:00
J. Bruce Fields 366e0c1d91 nfsd4: kill unneeded cl_confirm check
We generate a unique cl_confirm for every new client; so if we've
already checked that this cl_confirm agrees with the cl_confirm of
unconf, then we already know that it does not agree with the cl_confirm
of conf.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:06 -05:00
J. Bruce Fields f3aba4e5a1 nfsd4: remove unnecessary cl_verifier check from setclientid_confirm
Again, the only way conf and unconf can have the same clientid is if
they were created in the "probable callback update" case of setclientid,
in which case we already know that the cl_verifier fields must agree.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:05 -05:00
J. Bruce Fields f394baad13 nfsd4: kill unnecessary same_name() in setclientid_confirm
If conf and unconf are both found in the lookup by cl_clientid, then
they share the same cl_clientid.  We always create a unique new
cl_clientid field when creating a new client--the only exception is the
"probable callback update" case in setclientid, where we copy the old
cl_clientid from another clientid with the same name.

Therefore two clients with the same cl_client field also always share
the same cl_name field, and a couple of the checks here are redundant.

Thanks to Simon Holm Thøgersen for a compile fix.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Simon Holm Thøgersen <odie@cs.aau.dk>
2008-02-01 16:42:05 -05:00
J. Bruce Fields deda2faa8e nfsd: uniquify cl_confirm values
Using a counter instead of the nanoseconds value seems more likely to
produce a unique cl_confirm.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:05 -05:00
J. Bruce Fields 49ba87811f nfsd: eliminate final bogus case from setclientid logic
We're supposed to generate a different cl_confirm verifier for each new
client, so these to cl_confirm values should never be the same.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:05 -05:00
J. Bruce Fields a186e76747 nfsd4: kill some unneeded setclientid comments
Most of these comments just summarize the code.

The matching of code to the cases described in the RFC may still be
useful, though; add specific section references to make that easier to
follow.  Also update references to the outdated RFC 3010.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:05 -05:00
J. Bruce Fields 1f69f172c7 nfsd: minor fs/nfsd/auth.h cleanup
While we're here, let's remove the redundant (and now wrong) pathname in
the comment, and the #ifdef __KERNEL__'s.

Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:05 -05:00
J. Bruce Fields 2e8138a274 nfsd: move nfsd/auth.h into fs/nfsd
This header is used only in a few places in fs/nfsd, so there seems to
be little point to having it in include/.  (Thanks to Robert Day for
pointing this out.)

Cc: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:05 -05:00
J. Bruce Fields dbf847ecb6 knfsd: allow cache_register to return error on failure
Newer server features such as nfsv4 and gss depend on proc to work, so a
failure to initialize the proc files they need should be treated as
fatal.

Thanks to Andrew Morton for style fix and compile fix in case where
CONFIG_NFSD_V4 is undefined.

Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:05 -05:00
J. Bruce Fields e331f606a8 nfsd: fail init on /proc/fs/nfs/exports creation failure
I assume the reason failure of creation was ignored here was just to
continue support embedded systems that want nfsd but not proc.

However, in cases where proc is supported it would be clearer to fail
entirely than to come up with some features disabled.

Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:04 -05:00
J. Bruce Fields df95a9d4fb knfsd: cache unregistration needn't return error
There's really nothing much the caller can do if cache unregistration
fails.  And indeed, all any caller does in this case is print an error
and continue.  So just return void and move the printk's inside
cache_unregister.

Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:04 -05:00
J. Bruce Fields d5c3428b2c nfsd: fail module init on reply cache init failure
If the reply cache initialization fails due to a kmalloc failure,
currently we try to soldier on with a reduced (or nonexistant) reply
cache.

Better to just fail immediately: the failure is then much easier to
understand and debug, and it could save us complexity in some later
code.  (But actually, it doesn't help currently because the cache is
also turned off in some odd failure cases; we should probably find a
better way to handle those failure cases some day.)

Fix some minor style problems while we're at it, and rename
nfsd_cache_init() to remove the need for a comment describing it.

Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:04 -05:00
J. Bruce Fields 26808d3f10 nfsd: cleanup nfsd module initialization cleanup
Handle the failure case here with something closer to the standard
kernel style.

Doesn't really matter for now, but I'd like to add a few more failure
cases, and then this'll help.

Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:03 -05:00
J. Bruce Fields 46b2589576 knfsd: cleanup nfsd4 properly on module init failure
We forgot to shut down the nfs4 state and idmapping code in this case.

Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:03 -05:00
J. Bruce Fields ca2a05aa7c nfsd: Fix handling of negative lengths in read_buf()
The length "nbytes" passed into read_buf should never be negative, but
we check only for too-large values of "nbytes", not for too-small
values.  Make nbytes unsigned, so it's clear that the former tests are
sufficient.  (Despite this read_buf() currently correctly returns an xdr
error in the case of a negative length, thanks to an unsigned
comparison with size_of() and bounds-checking in kmalloc().  This seems
very fragile, though.)

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:03 -05:00
Chuck Lever a628f66758 NFSD: Fix mixed sign comparison in nfs3svc_decode_symlinkargs
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:03 -05:00
Chuck Lever 9c7544d3a1 NFSD: Use unsigned length argument for decode_pathname
Clean up: path name lengths are unsigned on the wire, negative lengths
are not meaningful natively either.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:03 -05:00
Chuck Lever 5a022fc870 NFSD: Adjust filename length argument of nfsd_lookup
Clean up: adjust the sign of the length argument of nfsd_lookup and
nfsd_lookup_dentry, for consistency with recent changes.  NFSD version
4 callers already pass an unsigned file name length.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:03 -05:00
Chuck Lever ee1a95b3b3 NFSD: Use unsigned length argument for decode_filename
Clean up: file name lengths are unsigned on the wire, negative lengths
are not meaningful natively either.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:02 -05:00
J. Bruce Fields d4395e03fe knfsd: fix broken length check in nfs4idmap.c
Obviously at some point we thought "error" represented the length when
positive.  This appears to be a long-standing typo.

Thanks to Prasad Potluri <pvp@us.ibm.com> for finding the problem and
proposing an earlier version of this patch.

Cc: Steve French <smfltc@us.ibm.com>
Cc: Prasad V Potluri <pvp@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:01 -05:00
Prasad P aefa89d178 nfsd: Fix inconsistent assignment
Dereferenced pointer "dentry" without checking and assigned to inode
in the declaration.

(We could just delete the NULL checks that follow instead, as we never
get to the encode function in this particular case.  But it takes a
little detective work to verify that fact, so it's probably safer to
leave the checks in place.)

Cc: Steve French <smfltc@us.ibm.com>
Signed-off-by: Prasad V Potluri <pvp@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:01 -05:00
J. Bruce Fields 63c86716ea nfsd: move callback rpc_client creation into separate thread
The whole reason to move this callback-channel probe into a separate
thread was because (for now) we don't have an easy way to create the
rpc_client asynchronously.  But I forgot to move the rpc_create() to the
spawned thread.  Doh!  Fix that.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:01 -05:00
J. Bruce Fields 46f8a64bae nfsd4: probe callback channel only once
Our callback code doesn't actually handle concurrent attempts to probe
the callback channel.  Some rethinking of the locking may be required.
However, we can also just move the callback probing to this case.  Since
this is the only time a client is "confirmed" (and since that can only
happen once in the lifetime of a client), this ensures we only probe
once.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:01 -05:00
NeilBrown ba67a39efd knfsd: Allow NFSv2/3 WRITE calls to succeed when krb5i etc is used.
When RPCSEC/GSS and krb5i is used, requests are padded, typically to a multiple
of 8 bytes.  This can make the request look slightly longer than it
really is.

As of

	f34b95689d "The NFSv2/NFSv3 server does not handle zero
		length WRITE request correctly",

the xdr decode routines for NFSv2 and NFSv3 reject requests that aren't
the right length, so krb5i (for example) WRITE requests can get lost.

This patch relaxes the appropriate test and enhances the related comment.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Peter Staubach <staubach@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-13 09:57:57 -08:00
J. Bruce Fields 6fa02839bf nfsd4: recheck for secure ports in fh_verify
As with commit 7fc90ec93a ("knfsd: nfsd:
call nfsd_setuser() on fh_compose(), fix nfsd4 permissions problem")
this is a case where we need to redo a security check in fh_verify()
even though the filehandle already has an associated dentry--if the
filehandle was created by fh_compose() in an earlier operation of the
nfsv4 compound, then we may not have done these checks yet.

Without this fix it is possible, for example, to traverse from an export
without the secure ports requirement to one with it in a single
compound, and bypass the secure port check on the new export.

While we're here, fix up some minor style problems and change a printk()
to a dprintk(), to make it harder for random unprivileged users to spam
the logs.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reviewed-By: NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-12 14:28:08 -08:00
J. Bruce Fields ac8587dcb5 knfsd: fix spurious EINVAL errors on first access of new filesystem
The v2/v3 acl code in nfsd is translating any return from fh_verify() to
nfserr_inval.  This is particularly unfortunate in the case of an
nfserr_dropit return, which is an internal error meant to indicate to
callers that this request has been deferred and should just be dropped
pending the results of an upcall to mountd.

Thanks to Roland <devzero@web.de> for bug report and data collection.

Cc: Roland <devzero@web.de>
Acked-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reviewed-By: NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-12 14:28:08 -08:00
Adrian Bunk 87ae9afdca cleanup asm/scatterlist.h includes
Not architecture specific code should not #include <asm/scatterlist.h>.

This patch therefore either replaces them with
#include <linux/scatterlist.h> or simply removes them if they were
unused.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-11-02 08:47:06 +01:00
Linus Torvalds 69450bb5eb Merge branch 'sg' of git://git.kernel.dk/linux-2.6-block
* 'sg' of git://git.kernel.dk/linux-2.6-block:
  Add CONFIG_DEBUG_SG sg validation
  Change table chaining layout
  Update arch/ to use sg helpers
  Update swiotlb to use sg helpers
  Update net/ to use sg helpers
  Update fs/ to use sg helpers
  [SG] Update drivers to use sg helpers
  [SG] Update crypto/ to sg helpers
  [SG] Update block layer to use sg helpers
  [SG] Add helpers for manipulating SG entries
2007-10-22 19:11:06 -07:00
Jens Axboe 60c74f8193 Update fs/ to use sg helpers
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-22 21:19:55 +02:00
Christoph Hellwig cfaea787c0 exportfs: remove old methods
Now that all filesystems are converted remove support for the old methods.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: <linux-ext4@vger.kernel.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: David Chinner <dgc@sgi.com>
Cc: Timothy Shimmin <tes@sgi.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Chris Mason <mason@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22 08:13:21 -07:00
Christoph Hellwig 6e91ea2bb0 exportfs: add fid type
This patchset is a medium scale rewrite of the export operations interface.
The goal is to make the interface less complex, and easier to understand from
the filesystem side, aswell as preparing generic support for exporting of
64bit inode numbers.

This touches all nfs exporting filesystems, and I've done testing on all of
the filesystems I have here locally (xfs, ext2, ext3, reiserfs, jfs)

This patch:

Add a structured fid type so that we don't have to pass an array of u32 values
around everywhere.  It's a union of possible layouts.

As a start there's only the u32 array and the traditional 32bit inode format,
but there will be more in one of my next patchset when I start to document the
various filehandle formats we have in lowlevel filesystems better.

Also add an enum that gives the various filehandle types human- readable
names.

Note: Some people might think the struct containing an anonymous union is
ugly, but I didn't want to pass around a raw union type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: <linux-ext4@vger.kernel.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: David Chinner <dgc@sgi.com>
Cc: Timothy Shimmin <tes@sgi.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Chris Mason <mason@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22 08:13:19 -07:00
Pavel Emelyanov ba25f9dcc4 Use helpers to obtain task pid in printks
The task_struct->pid member is going to be deprecated, so start
using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in
the kernel.

The first thing to start with is the pid, printed to dmesg - in
this case we may safely use task_pid_nr(). Besides, printks produce
more (much more) than a half of all the explicit pid usage.

[akpm@linux-foundation.org: git-drm went and changed lots of stuff]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:43 -07:00
Jeff Layton 8a0ce7d99a knfsd: only set ATTR_KILL_S*ID if ATTR_MODE isn't being explicitly set
It's theoretically possible for a single SETATTR call to come in that sets the
mode and the uid/gid.  In that case, don't set the ATTR_KILL_S*ID bits since
that would trip the BUG() in notify_change.  Just fix up the mode to have the
same effect.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:22 -07:00
Serge E. Hallyn b53767719b Implement file posix capabilities
Implement file posix capabilities.  This allows programs to be given a
subset of root's powers regardless of who runs them, without having to use
setuid and giving the binary all of root's powers.

This version works with Kaigai Kohei's userspace tools, found at
http://www.kaigai.gr.jp/index.php.  For more information on how to use this
patch, Chris Friedhoff has posted a nice page at
http://www.friedhoff.org/fscaps.html.

Changelog:
	Nov 27:
	Incorporate fixes from Andrew Morton
	(security-introduce-file-caps-tweaks and
	security-introduce-file-caps-warning-fix)
	Fix Kconfig dependency.
	Fix change signaling behavior when file caps are not compiled in.

	Nov 13:
	Integrate comments from Alexey: Remove CONFIG_ ifdef from
	capability.h, and use %zd for printing a size_t.

	Nov 13:
	Fix endianness warnings by sparse as suggested by Alexey
	Dobriyan.

	Nov 09:
	Address warnings of unused variables at cap_bprm_set_security
	when file capabilities are disabled, and simultaneously clean
	up the code a little, by pulling the new code into a helper
	function.

	Nov 08:
	For pointers to required userspace tools and how to use
	them, see http://www.friedhoff.org/fscaps.html.

	Nov 07:
	Fix the calculation of the highest bit checked in
	check_cap_sanity().

	Nov 07:
	Allow file caps to be enabled without CONFIG_SECURITY, since
	capabilities are the default.
	Hook cap_task_setscheduler when !CONFIG_SECURITY.
	Move capable(TASK_KILL) to end of cap_task_kill to reduce
	audit messages.

	Nov 05:
	Add secondary calls in selinux/hooks.c to task_setioprio and
	task_setscheduler so that selinux and capabilities with file
	cap support can be stacked.

	Sep 05:
	As Seth Arnold points out, uid checks are out of place
	for capability code.

	Sep 01:
	Define task_setscheduler, task_setioprio, cap_task_kill, and
	task_setnice to make sure a user cannot affect a process in which
	they called a program with some fscaps.

	One remaining question is the note under task_setscheduler: are we
	ok with CAP_SYS_NICE being sufficient to confine a process to a
	cpuset?

	It is a semantic change, as without fsccaps, attach_task doesn't
	allow CAP_SYS_NICE to override the uid equivalence check.  But since
	it uses security_task_setscheduler, which elsewhere is used where
	CAP_SYS_NICE can be used to override the uid equivalence check,
	fixing it might be tough.

	     task_setscheduler
		 note: this also controls cpuset:attach_task.  Are we ok with
		     CAP_SYS_NICE being used to confine to a cpuset?
	     task_setioprio
	     task_setnice
		 sys_setpriority uses this (through set_one_prio) for another
		 process.  Need same checks as setrlimit

	Aug 21:
	Updated secureexec implementation to reflect the fact that
	euid and uid might be the same and nonzero, but the process
	might still have elevated caps.

	Aug 15:
	Handle endianness of xattrs.
	Enforce capability version match between kernel and disk.
	Enforce that no bits beyond the known max capability are
	set, else return -EPERM.
	With this extra processing, it may be worth reconsidering
	doing all the work at bprm_set_security rather than
	d_instantiate.

	Aug 10:
	Always call getxattr at bprm_set_security, rather than
	caching it at d_instantiate.

[morgan@kernel.org: file-caps clean up for linux/capability.h]
[bunk@kernel.org: unexport cap_inode_killpriv]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Andrew Morgan <morgan@kernel.org>
Signed-off-by: Andrew Morgan <morgan@kernel.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:43:07 -07:00
Dave Hansen a8754beedb r/o bind mounts: create cleanup helper svc_msnfs()
I'm going to be modifying nfsd_rename() shortly to support read-only bind
mounts.  This #ifdef is around the area I'm patching, and it starts to get
really ugly if I just try to add my new code by itself.  Using this little
helper makes things a lot cleaner to use.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:43:05 -07:00
Adrian Bunk cce76f9b96 fs/nfsd/export.c: make 3 functions static
This patch makes the following needlessly global functions static:
- exp_get_by_name()
- exp_parent()
- exp_find()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:10 -07:00
Linus Torvalds 541010e4b8 Merge branch 'locks' of git://linux-nfs.org/~bfields/linux
* 'locks' of git://linux-nfs.org/~bfields/linux:
  nfsd: remove IS_ISMNDLCK macro
  Rework /proc/locks via seq_files and seq_list helpers
  fs/locks.c: use list_for_each_entry() instead of list_for_each()
  NFS: clean up explicit check for mandatory locks
  AFS: clean up explicit check for mandatory locks
  9PFS: clean up explicit check for mandatory locks
  GFS2: clean up explicit check for mandatory locks
  Cleanup macros for distinguishing mandatory locks
  Documentation: move locks.txt in filesystems/
  locks: add warning about mandatory locking races
  Documentation: move mandatory locking documentation to filesystems/
  locks: Fix potential OOPS in generic_setlease()
  Use list_first_entry in locks_wake_up_blocks
  locks: fix flock_lock_file() comment
  Memory shortage can result in inconsistent flocks state
  locks: kill redundant local variable
  locks: reverse order of posix_locks_conflict() arguments
2007-10-15 16:07:40 -07:00
Linus Torvalds f4921aff5b Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (131 commits)
  NFSv4: Fix a typo in nfs_inode_reclaim_delegation
  NFS: Add a boot parameter to disable 64 bit inode numbers
  NFS: nfs_refresh_inode should clear cache_validity flags on success
  NFS: Fix a connectathon regression in NFSv3 and NFSv4
  NFS: Use nfs_refresh_inode() in ops that aren't expected to change the inode
  SUNRPC: Don't call xprt_release in call refresh
  SUNRPC: Don't call xprt_release() if call_allocate fails
  SUNRPC: Fix buggy UDP transmission
  [23/37] Clean up duplicate includes in
  [2.6 patch] net/sunrpc/rpcb_clnt.c: make struct rpcb_program static
  SUNRPC: Use correct type in buffer length calculations
  SUNRPC: Fix default hostname created in rpc_create()
  nfs: add server port to rpc_pipe info file
  NFS: Get rid of some obsolete macros
  NFS: Simplify filehandle revalidation
  NFS: Ensure that nfs_link() returns a hashed dentry
  NFS: Be strict about dentry revalidation when doing exclusive create
  NFS: Don't zap the readdir caches upon error
  NFS: Remove the redundant nfs_reval_fsid()
  NFSv3: Always use directory post-op attributes in nfs3_proc_lookup
  ...

Fix up trivial conflict due to sock_owned_by_user() cleanup manually in
net/sunrpc/xprtsock.c
2007-10-15 10:47:35 -07:00
J. Bruce Fields 5e7fc43642 nfsd: remove IS_ISMNDLCK macro
This macro is only used in one place; in this place it seems simpler to
put open-code it and move the comment to where it's used.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-10-09 18:32:46 -04:00
Pavel Emelyanov a16877ca9c Cleanup macros for distinguishing mandatory locks
The combination of S_ISGID bit set and S_IXGRP bit unset is used to mark the
inode as "mandatory lockable" and there's a macro for this check called
MANDATORY_LOCK(inode).  However, fs/locks.c and some filesystems still perform
the explicit i_mode checking.  Besides, Andrew pointed out, that this macro is
buggy itself, as it dereferences the inode arg twice.

Convert this macro into static inline function and switch its users to it,
making the code shorter and more readable.

The __mandatory_lock() helper is to be used in places where the IS_MANDLOCK()
for superblock is already known to be true.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-10-09 18:32:46 -04:00
J. Bruce Fields a16e92edcd knfsd: query filesystem for NFSv4 getattr of FATTR4_MAXNAME
Without this we always return 2^32-1 as the the maximum namelength.

Thanks to Andreas Gruenbacher for bug report and testing.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Andreas Gruenbacher <agruen@suse.de>
2007-10-09 18:31:57 -04:00
J. Bruce Fields cfdcad4da1 knfsd: nfsv4 delegation recall should take reference on client
It's not enough to take a reference on the delegation object itself; we
need to ensure that the rpc_client won't go away just as we're about to
make an rpc call.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-10-09 18:31:57 -04:00
J. Bruce Fields 1b1a9b3163 knfsd: don't shutdown callbacks until nfsv4 client is freed
If a callback still holds a reference on the client, then it may be
about to perform an rpc call, so it isn't safe to call rpc_shutdown().
(Though rpc_shutdown() does wait for any outstanding rpc's, it can't
know if a new rpc is about to be issued with that client.)

So, wait to shutdown the rpc_client until the reference count on the
client has gone to zero.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-10-09 18:31:57 -04:00
J. Bruce Fields 0272e1fd9f knfsd: let nfsd manage timing out its own leases
Currently there's a race that can cause an oops in generic_setlease.

(In detail: nfsd, when it removes a lease, does so by calling
vfs_setlease() with F_UNLCK and a pointer to the fl_flock field, which
in turn points to nfsd's existing lease; but the first thing the
setlease code does is call time_out_leases().  If the lease happens to
already be beyond the lease break time, that will free the lease and (in
nfsd's release_private callback) set fl_flock to NULL, leading to a NULL
deference soon after in vfs_setlease().)

There are probably other things to fix here too, but it seems inherently
racy to allow either locks.c or nfsd to time out this lease.  Instead
just set the fl_break_time to 0 (preventing locks.c from ever timing out
this lock) and leave it up to nfsd's laundromat thread to deal with it.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-10-09 18:31:57 -04:00
Peter Staubach 40ee5dc6af knfsd: 64 bit ino support for NFS server
Modify the NFS server code to support 64 bit ino's, as
appropriate for the system and the NFS protocol version.

The gist of the changes is to query the underlying file system
for attributes and not just to use the cached attributes in the
inode.  For this specific purpose, the inode only contains an
ino field which unsigned long, which is large enough on 64 bit
platforms, but is not large enough on 32 bit platforms.

I haven't been able to find any reason why ->getattr can't be called
while i_mutex.  The specification indicates that i_mutex is not
required to be held in order to invoke ->getattr, but it doesn't say
that i_mutex can't be held while invoking ->getattr.

I also haven't come to any conclusions regarding the value of
lease_get_mtime() and whether it should or should not be invoked
by fill_post_wcc() too.  I chose not to change this because I
thought that it was safer to leave well enough alone.  If we
decide to make a change, it can be done separately.

Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
2007-10-09 18:31:57 -04:00
J. Bruce Fields c175b83c4c knfsd: remove code duplication in nfsd4_setclientid()
Each branch of this if-then-else has a bunch of duplicated code that we
could just put at the end.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
2007-10-09 18:31:57 -04:00
Andrew Morton 246d95ba05 nfsd warning fix
fs/nfsd/nfsctl.c: In function 'write_filehandle':
fs/nfsd/nfsctl.c:301: warning: 'maxsize' may be used uninitialized in this function

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by:  Neil Brown <neilb@suse.de>
2007-10-09 18:31:57 -04:00
J. Bruce Fields dd4877bfb6 knfsd: fix callback rpc cred
It doesn't make sense to make the callback with credentials that the
client made the setclientid with.  Instead the spec requires that the
callback occur with the credentials the client authenticated *to*.
It probably doesn't matter what we use for auth_unix, and some more
infrastructure will be needed for auth_gss, so let's just remove the
cred lookup for now.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by:  Neil Brown <neilb@suse.de>
2007-10-09 18:31:57 -04:00
J. Bruce Fields e8ff2a8453 knfsd: move nfsv4 slab creation/destruction to module init/exit
We have some slabs that the nfs4 server uses to store state objects.
We're currently creating and destroying those slabs whenever the server
is brought up or down.  That seems excessive; may as well just do that
in module initialization and exit.

Also add some minor header cleanup.  (Thanks to Andrew Morton for that
and a compile fix.)

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by:  Neil Brown <neilb@suse.de>
2007-10-09 18:31:56 -04:00
J. Bruce Fields 2b47eece1f knfsd: spawn kernel thread to probe callback channel
We want to allow gss on the callback channel, so people using krb5 can
still get the benefits of delegations.

But looking up the rpc credential can take some time in that case.  And
we shouldn't delay the response to setclientid_confirm while we wait.

It may be inefficient, but for now the simplest solution is just to
spawn a new thread as necessary for the purpose.

(Thanks to Adrian Bunk for catching a missing static here.)

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Cc: Adrian Bunk <bunk@kernel.org>
2007-10-09 18:31:56 -04:00
J. Bruce Fields c9b6cbe56d knfsd: nfs4 name->id mapping not correctly parsing negative downcall
Note that qword_get() returns length or -1, not an -ERROR.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by:  Neil Brown <neilb@suse.de>
2007-10-09 18:31:56 -04:00
J. Bruce Fields 2fdada03b3 knfsd: demote some printk()s to dprintk()s
To quote a recent mail from Andrew Morton:

	Look: if there's a way in which an unprivileged user can trigger
	a printk we fix it, end of story.

OK.  I assume that goes double for printk()s that might be triggered by
random hosts on the internet.  So, disable some printk()s that look like
they could be triggered by malfunctioning or malicious clients.  For
now, just downgrade them to dprintk()s.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by:  Neil Brown <neilb@suse.de>
2007-10-09 18:31:56 -04:00
J. Bruce Fields 599e0a2290 knfsd: cleanup of nfsd4 cmp_* functions
Benny Halevy suggested renaming cmp_* to same_* to make the meaning of
the return value clearer.

Fix some nearby style deviations while we're at it, including a small
swath of creative indentation in nfs4_preprocess_seqid_op().

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by:  Neil Brown <neilb@suse.de>
2007-10-09 18:31:56 -04:00
J. Bruce Fields 3b398f0ef8 knfsd: delete code made redundant by map_new_errors
I moved this check into map_new_errors, but forgot to delete the
original.  Oops.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by:  Neil Brown <neilb@suse.de>
2007-10-09 18:31:56 -04:00
Christoph Hellwig 9c85fca56b nfsd: fix horrible indentation in nfsd_setattr
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by:  Neil Brown <neilb@suse.de>
2007-10-09 18:31:56 -04:00
J. Bruce Fields 45457e0916 nfsd: tone down inaccurate dprintk
The nfserr_dropit happens routinely on upcalls (so a kmalloc failure is
almost never the actual cause), but I occasionally get a complant from
some tester that's worried because they ran across this message after
turning on debugging to research some unrelated problem.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by:  Neil Brown <neilb@suse.de>
2007-10-09 18:31:54 -04:00
Chuck Lever 817cb9d43d NFSD: Convert printk's to dprintk's in NFSD's nfs4xdr
Due to recent edict to remove or replace printk's that can flood the system
log.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:14 -04:00
Neil Brown b8da0d1c27 knfsd: Validate filehandle type in fsid_source
fsid_source decided where to get the 'fsid' number to
return for a GETATTR based on the type of filehandle.
It can be from the device, from the fsid, or from the
UUID.

It is possible for the filehandle to be inconsistent
with the export information, so make sure the export information
actually has the info implied by the value returned by
fsid_source.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: "Luiz Fernando N. Capitulino" <lcapitulino@gmail.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-10 18:57:47 -07:00
Neil Brown a1033be72c knfsd: Fixed problem with NFS exporting directories which are mounted on.
Recent changes in NFSd cause a directory which is mounted-on
to not appear properly when the filesystem containing it is exported.

*exp_get* now returns -ENOENT rather than NULL and when
  commit 5d3dbbeaf5
removed the NULL checks, it didn't add a check for -ENOENT.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-10 18:57:47 -07:00
J. Bruce Fields 4a4b88317a knfsd: eliminate unnecessary -ENOENT returns on export downcalls
A succesful downcall with a negative result (which indicates that the given
filesystem is not exported to the given user) should not return an error.

Currently mountd is depending on stdio to write these downcalls.  With some
versions of libc this appears to cause subsequent writes to attempt to write
all accumulated data (for which writes previously failed) along with any new
data.  This can prevent the kernel from seeing responses to later downcalls.
Symptoms will be that nfsd fails to respond to certain requests.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31 15:39:38 -07:00
J. Bruce Fields 0a725fc4d3 nfsd4: idmap upcalls should use unsigned uid and gid
We shouldn't be using negative uid's and gid's in the idmap upcalls.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31 15:39:38 -07:00
Jeff Layton 749997e512 knfsd: set the response bitmask for NFS4_CREATE_EXCLUSIVE
RFC 3530 says:

 If the server uses an attribute to store the exclusive create verifier, it
 will signify which attribute by setting the appropriate bit in the attribute
 mask that is returned in the results.

Linux uses the atime and mtime to store the verifier, but sends a zeroed out
bitmask back to the client.  This patch makes sure that we set the correct
bits in the bitmask in this situation.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31 15:39:38 -07:00
Al Viro ca5c8cde93 lockd and nfsd endianness annotation fixes
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-26 11:11:56 -07:00
J. Bruce Fields 3e63516c82 knfsd: fix typo in export display, print uid and gid as unsigned
For display purposes, treat uid's and gid's as unsigned ints for now.
Also fix a typo.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 17:49:14 -07:00
Paul Mundt 20c2df83d2 mm: Remove slab destructors from kmem_cache_create().
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-07-20 10:11:58 +09:00
J. Bruce Fields c7d51402d2 knfsd: clean up EX_RDONLY
Share a little common code, reverse the arguments for consistency, drop the
unnecessary "inline", and lowercase the name.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:52 -07:00
J. Bruce Fields e22841c637 knfsd: move EX_RDONLY out of header
EX_RDONLY is only called in one place; just put it there.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:52 -07:00
J. Bruce Fields 5d3dbbeaf5 nfsd: remove unnecessary NULL checks from nfsd_cross_mnt
We can now assume that rqst_exp_get_by_name() does not return NULL; so clean
up some unnecessary checks.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:52 -07:00
J. Bruce Fields 9a25b96c1f nfsd: return errors, not NULL, from export functions
I converted the various export-returning functions to return -ENOENT instead
of NULL, but missed a few cases.

This particular case could cause actual bugs in the case of a krb5 client that
doesn't match any ip-based client and that is trying to access a filesystem
not exported to krb5 clients.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:52 -07:00
J. Bruce Fields a280df32db nfsd: fix possible read-ahead cache and export table corruption
The value of nperbucket calculated here is too small--we should be rounding up
instead of down--with the result that the index j in the following loop can
overflow the raparm_hash array.  At least in my case, the next thing in memory
turns out to be export_table, so the symptoms I see are crashes caused by the
appearance of four zeroed-out export entries in the first bucket of the hash
table of exports (which were actually entries in the readahead cache, a
pointer to which had been written to the export table in this initialization
code).

It looks like the bug was probably introduced with commit
fce1456a19 ("knfsd: make the readahead params
cache SMP-friendly").

Cc: <stable@kernel.org>
Cc: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:52 -07:00
J. Bruce Fields a9933cea7a locks: rename lease functions to reflect locks.c conventions
We've been using the convention that vfs_foo is the function that calls
a filesystem-specific foo method if it exists, or falls back on a
generic method if it doesn't; thus vfs_foo is what is called when some
other part of the kernel (normally lockd or nfsd) wants to get a lock,
whereas foo is what filesystems call to use the underlying local
functionality as part of their lock implementation.

So rename setlease to vfs_setlease (which will call a
filesystem-specific setlease after a later patch) and __setlease to
setlease.

Also, vfs_setlease need only be GPL-exported as long as it's only needed
by lockd and nfsd.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-07-18 19:14:12 -04:00
J. Bruce Fields 1269bc69b6 knfsd: nfsd: enforce per-flavor id squashing
Allow root squashing to vary per-pseudoflavor, so that you can (for example)
allow root access only when sufficiently strong security is in use.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:08 -07:00
J. Bruce Fields 9091224f3c knfsd: nfsd: allow auth_sys nlm on rpcsec_gss exports
Our clients (like other clients, as far as I know) use only auth_sys for nlm,
even when using rpcsec_gss for the main nfs operations.

Administrators that want to deny non-kerberos-authenticated locking requests
will need to turn off NFS protocol versions less than 4....

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:08 -07:00
J. Bruce Fields 4796f45740 knfsd: nfsd4: secinfo handling without secinfo= option
We could return some sort of error in the case where someone asks for secinfo
on an export without the secinfo= option set--that'd be no worse than what
we've been doing.  But it's not really correct.  So, hack up an approximate
secinfo response in that case--it may not be complete, but it'll tell the
client at least one acceptable security flavor.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:08 -07:00
Andy Adamson dcb488a3b7 knfsd: nfsd4: implement secinfo
Implement the secinfo operation.

(Thanks to Usha Ketineni wrote an earlier version of this support.)

Cc: Usha Ketineni <uketinen@us.ibm.com>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:08 -07:00
J. Bruce Fields 91fe39d35e knfsd: nfsd: display export secinfo information
Add secinfo information to the display in proc/net/sunrpc/nfsd.export/content.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:08 -07:00
J. Bruce Fields ac34cdb03d knfsd: nfsd: factor out code from show_expflags
Factor out some code to be shared by secinfo display code.  Remove some
unnecessary conditional printing of commas where we know the condition is
true.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:08 -07:00
J. Bruce Fields 0ec757df97 knfsd: nfsd4: make readonly access depend on pseudoflavor
Allow readonly access to vary depending on the pseudoflavor, using the flag
passed with each pseudoflavor in the export downcall.  The rest of the flags
are ignored for now, though some day we might also allow id squashing to vary
based on the flavor.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:08 -07:00
Andy Adamson 32c1eb0cd7 knfsd: nfsd4: return nfserr_wrongsec
Make the first actual use of the secinfo information by using it to return
nfserr_wrongsec when an export is found that doesn't allow the flavor used on
this request.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:08 -07:00
J. Bruce Fields 6c0a654dce knfsd: nfsd: factor nfsd_lookup into 2 pieces
Factor nfsd_lookup into nfsd_lookup_dentry, which finds the right dentry and
export, and a second part which composes the filehandle (and which will later
check the security flavor on the new export).

No change in behavior.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:08 -07:00
J. Bruce Fields 2ea2209f07 knfsd: nfsd: use ip-address-based domain in secinfo case
With this patch, we fall back on using the gss/pseudoflavor only if we fail to
find a matching auth_unix export that has a secinfo list.

As long as sec= options aren't used, there's still no change in behavior here
(except possibly for some additional auth_unix cache lookups, whose results
will be ignored).

The sec= option, however, is not actually enforced yet; later patches will add
the necessary checks.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:08 -07:00
J. Bruce Fields 3ab4d8b121 knfsd: nfsd: set rq_client to ip-address-determined-domain
We want it to be possible for users to restrict exports both by IP address and
by pseudoflavor.  The pseudoflavor information has previously been passed
using special auth_domains stored in the rq_client field.  After the preceding
patch that stored the pseudoflavor in rq_pflavor, that's now superfluous; so
now we use rq_client for the ip information, as auth_null and auth_unix do.

However, we keep around the special auth_domain in the rq_gssclient field for
backwards compatibility purposes, so we can still do upcalls using the old
"gss/pseudoflavor" auth_domain if upcalls using the unix domain to give us an
appropriate export.  This allows us to continue supporting old mountd.

In fact, for this first patch, we always use the "gss/pseudoflavor"
auth_domain (and only it) if it is available; thus rq_client is ignored in the
auth_gss case, and this patch on its own makes no change in behavior; that
will be left to later patches.

Note on idmap: I'm almost tempted to just replace the auth_domain in the idmap
upcall by a dummy value--no version of idmapd has ever used it, and it's
unlikely anyone really wants to perform idmapping differently depending on the
where the client is (they may want to perform *credential* mapping
differently, but that's a different matter--the idmapper just handles id's
used in getattr and setattr).  But I'm updating the idmapd code anyway, just
out of general backwards-compatibility paranoia.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:07 -07:00
J. Bruce Fields 0989a78896 knfsd: nfsd: provide export lookup wrappers which take a svc_rqst
Split the callers of exp_get_by_name(), exp_find(), and exp_parent() into
those that are processing requests and those that are doing other stuff (like
looking up filehandles for mountd).

No change in behavior, just a (fairly pointless, on its own) cleanup.

(Note this has the effect of making nfsd_cross_mnt() pass rqstp->rq_client
instead of exp->ex_client into exp_find_by_name().  However, the two should
have the same value at this point.)

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:07 -07:00
J. Bruce Fields 87548c37c8 knfsd: nfsd: remove superfluous assignment from nfsd_lookup
The "err" variable will only be used in the final return, which always happens
after either the preceding

	err = fh_compose(...);

or after the following

	err = nfserrno(host_err);

So the earlier assignment to err is ignored.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:07 -07:00
J. Bruce Fields df547efb03 knfsd: nfsd4: simplify exp_pseudoroot arguments
We're passing three arguments to exp_pseudoroot, two of which are just fields
of the svc_rqst.  Soon we'll want to pass in a third field as well.  So let's
just give up and pass in the whole struct svc_rqst.

Also sneak in some minor style cleanups while we're at it.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:07 -07:00
Andy Adamson e677bfe4d4 knfsd: nfsd4: parse secinfo information in exports downcall
We add a list of pseudoflavors to each export downcall, which will be used
both as a list of security flavors allowed on that export, and (in the order
given) as the list of pseudoflavors to return on secinfo calls.

This patch parses the new downcall information and adds it to the export
structure, but doesn't use it for anything yet.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:07 -07:00
J. Bruce Fields 2d3bb25209 knfsd: nfsd: make all exp_finding functions return -errno's on err
Currently exp_find(), exp_get_by_name(), and friends, return an export on
success, and on failure return:

	errors -EAGAIN (drop this request pending an upcall) or
		-ETIMEDOUT (an upcall has timed out), or
	return NULL, which can mean either that there was a memory allocation
		failure, or that an export was not found, or that a passed-in
		export lacks an auth_domain.

Many callers seem to assume that NULL means that an export was not found,
which may lead to bugs in the case of a memory allocation failure.

Modify these functions to distinguish between the two NULL cases by returning
either -ENOENT or -ENOMEM.  They now never return NULL.  We get to simplify
some code in the process.

We return -ENOENT in the case of a missing auth_domain.  This case should
probably be removed (or converted to a bug) after confirming that it can never
happen.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:07 -07:00
Meelap Shah 47f9940c55 knfsd: nfsd4: don't delegate files that have had conflicts
One more incremental delegation policy improvement: don't give out a
delegation on a file if conflicting access has previously required that a
delegation be revoked on that file.  (In practice we'll forget about the
conflict when the struct nfs4_file is removed on close, so this is of limited
use for now, though it should at least solve a temporary problem with
self-conflicts on write opens from the same client.)

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:07 -07:00
Meelap Shah c2f1a551de knfsd: nfsd4: vary maximum delegation limit based on RAM size
Our original NFSv4 delegation policy was to give out a read delegation on any
open when it was possible to.

Since the lifetime of a delegation isn't limited to that of an open, a client
may quite reasonably hang on to a delegation as long as it has the inode
cached.  This becomes an obvious problem the first time a client's inode cache
approaches the size of the server's total memory.

Our first quick solution was to add a hard-coded limit.  This patch makes a
mild incremental improvement by varying that limit according to the server's
total memory size, allowing at most 4 delegations per megabyte of RAM.

My quick back-of-the-envelope calculation finds that in the worst case (where
every delegation is for a different inode), a delegation could take about
1.5K, which would make the worst case usage about 6% of memory.  The new limit
works out to be about the same as the old on a 1-gig server.

[akpm@linux-foundation.org: Don't needlessly bloat vmlinux]
[akpm@linux-foundation.org: Make it right for highmem machines]
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:07 -07:00
J. Bruce Fields 1e5140279f knfsd: nfsd: remove unused header interface.h
It looks like Al Viro gutted this header file five years ago and it hasn't
been touched since.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:07 -07:00
J. Bruce Fields 4b2ca38ad6 knfsd: nfsd4: fix handling of acl errrors
nfs4_acl_nfsv4_to_posix() returns an error and returns any posix acls
calculated in two caller-provided pointers.  It was setting these pointers to
-errno in some error cases, resulting in nfsd4_set_nfs4_acl() calling
posix_acl_release() with a -errno as an argument.

Fix both the caller and the callee, by modifying nfsd4_set_nfs4_acl() to
stop relying on the passed-in-pointers being left as NULL in the error
case, and by modifying nfs4_acl_nfsv4_to_posix() to stop returning
garbage in those pointers.

Thanks to Alex Soule for reporting the bug.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Cc: Alexander Soule <soule@umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:07 -07:00
Benny Halevy 0ac68d1799 knfsd: nfsd4: fix enc_stateid_sz for nfsd callbacks
enc_stateid_sz should be given in u32 words units, not bytes, so we were
overestimating the buffer space needed here.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:07 -07:00
J. Bruce Fields f7fede4b27 knfsd: nfsd4: silence a compiler warning in ACL code
Silence a compiler warning in the ACL code, and add a comment making clear the
initialization serves no other purpose.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:07 -07:00
Marc Eshel 9a8db97e77 knfsd: lockd: nfsd4: use same grace period for lockd and nfsd4
Both lockd and (in the nfsv4 case) nfsd enforce a "grace period" after reboot,
during which clients may reclaim locks from the previous server instance, but
may not acquire new locks.

Currently the lockd and nfsd enforce grace periods of different lengths.  This
may cause problems when we reboot a server with both v2/v3 and v4 clients.
For example, if the lockd grace period is shorter (as is likely the case),
then a v3 client might acquire a new lock that conflicts with a lock already
held (but not yet reclaimed) by a v4 client.

This patch calculates a lease time that lockd and nfsd can both use.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:07 -07:00
Andrew Morton 12127498c8 nfsd warning fix
gcc-4.3:

fs/nfsd/nfsctl.c: In function 'write_getfs':
fs/nfsd/nfsctl.c:248: warning: cast from pointer to integer of different size

Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:06 -07:00
Christoph Hellwig d37065cd6d knfsd: exportfs: add procedural interface for NFSD
Currently NFSD calls directly into filesystems through the export_operations
structure.  I plan to change this interface in various ways in later patches,
and want to avoid the export of the default operations to NFSD, so this patch
adds two simple exportfs_encode_fh/exportfs_decode_fh helpers for NFSD to call
instead of poking into exportfs guts.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:06 -07:00
Christoph Hellwig a569425512 knfsd: exportfs: add exportfs.h header
currently the export_operation structure and helpers related to it are in
fs.h.  fs.h is already far too large and there are very few places needing the
export bits, so split them off into a separate header.

[akpm@linux-foundation.org: fix cifs build]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:06 -07:00
Rafael J. Wysocki 8314418629 Freezer: make kernel threads nonfreezable by default
Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves.  This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.

It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.

The patch causes all kernel threads to be nonfreezable by default (ie.  to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE.  It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear.  Additionally, it updates documentation to
describe the freezing of tasks more accurately.

[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:02 -07:00
Linus Torvalds 16cefa8c38 Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (122 commits)
  sunrpc: drop BKL around wrap and unwrap
  NFSv4: Make sure unlock is really an unlock when cancelling a lock
  NLM: fix source address of callback to client
  SUNRPC client: add interface for binding to a local address
  SUNRPC server: record the destination address of a request
  SUNRPC: cleanup transport creation argument passing
  NFSv4: Make the NFS state model work with the nosharedcache mount option
  NFS: Error when mounting the same filesystem with different options
  NFS: Add the mount option "nosharecache"
  NFS: Add support for mounting NFSv4 file systems with string options
  NFS: Add final pieces to support in-kernel mount option parsing
  NFS: Introduce generic mount client API
  NFS: Add enums and match tables for mount option parsing
  NFS: Improve debugging output in NFS in-kernel mount client
  NFS: Clean up in-kernel NFS mount
  NFS: Remake nfsroot_mount as a permanent part of NFS client
  SUNRPC: Add a convenient default for the hostname when calling rpc_create()
  SUNRPC: Rename rpcb_getport to be consistent with new rpcb_getport_sync name
  SUNRPC: Rename rpcb_getport_external routine
  SUNRPC: Allow rpcbind requests to be interrupted by a signal.
  ...
2007-07-13 16:46:18 -07:00
Jens Axboe 4fbef206da nfsd: fix nfsd_vfs_read() splice actor setup
When nfsd was transitioned to use splice instead of sendfile() for data
transfers, a line setting the page index was lost. Restore it, so that
nfsd is functional when that path is used.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-13 16:45:43 -07:00
Chuck Lever 43780b87fa SUNRPC: Add a convenient default for the hostname when calling rpc_create()
A couple of callers just use a stringified IP address for the rpc client's
hostname.  Move the logic for constructing this into rpc_create(), so it can
be shared.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:46 -04:00
Trond Myklebust f61534dfd3 SUNRPC: Remove redundant calls to rpciod_up()/rpciod_down()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:30 -04:00
Jens Axboe cac36bb06e pipe: change the ->pin() operation to ->confirm()
The name 'pin' was badly chosen, it doesn't pin a pipe buffer
in the most commonly used sense in the kernel. So change the
name to 'confirm', after debating this issue with Hugh
Dickins a bit.

A good return from ->confirm() means that the buffer is really
there, and that the contents are good.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:15 +02:00
Jens Axboe d6b29d7cee splice: divorce the splice structure/function definitions from the pipe header
We need to move even more stuff into the header so that folks can use
the splice_to_pipe() implementation instead of open-coding a lot of
pipe knowledge (see relay implementation), so move to our own header
file finally.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:14 +02:00
Jens Axboe cf8208d0ea sendfile: convert nfsd to splice_direct_to_actor()
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:14 +02:00
Alexey Dobriyan e8edc6e03a Detach sched.h from mm.h
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.

This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
   getting them indirectly

Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
   they don't need sched.h
b) sched.h stops being dependency for significant number of files:
   on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
   after patch it's only 3744 (-8.3%).

Cross-compile tested on

	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
	alpha alpha-up
	arm
	i386 i386-up i386-defconfig i386-allnoconfig
	ia64 ia64-up
	m68k
	mips
	parisc parisc-up
	powerpc powerpc-up
	s390 s390-up
	sparc sparc-up
	sparc64 sparc64-up
	um-x86_64
	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

as well as my two usual configs.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:18:19 -07:00
NeilBrown b41eeef14d knfsd: avoid Oops if buggy userspace performs confusing filehandle->dentry mapping
When a lookup request arrives, nfsd uses information provided by userspace
(mountd) to find the right filesystem.

It then assumes that the same filehandle type as the incoming filehandle can
be used to create an outgoing filehandle.

However if mountd is buggy, or maybe just being creative, the filesystem may
not support that filesystem type, and the kernel could oops, particularly if
'ex_uuid' is NULL but a FSID_UUID* filehandle type is used.

So add some proper checking that the fsid version/type from the incoming
filehandle is actually supportable, and ignore that information if it isn't
supportable.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:54 -07:00
NeilBrown 072f62ed85 knfsd: various nfsd xdr cleanups
1/ decode_sattr and decode_sattr3 never return NULL, so remove
   several checks for that. ditto for xdr_decode_hyper.

2/ replace some open coded XDR_QUADLEN calls with calls to
   XDR_QUADLEN

3/ in decode_writeargs, simply an 'if' to use a single
   calculation.
   .page_len is the length of that part of the packet that did
   not fit in the first page (the head).
   So the length of the data part is the remainder of the
   head, plus page_len.

3/ other minor cleanups.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:54 -07:00
Christoph Hellwig f725b217b1 knfsd: trivial makefile cleanup
kbuild directly interprets <modulename>-y as objects to build into a module,
no need to assign it to the old foo-objs variable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:54 -07:00
NeilBrown 402acd29e5 knfsd: avoid use of unitialised variables on error path when nfs exports
We need to zero various parts of 'exp' before any 'goto out', otherwise when
we go to free the contents...  we die.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:54 -07:00
Jeff Layton cd123012d9 RPC: add wrapper for svc_reserve to account for checksum
When the kernel calls svc_reserve to downsize the expected size of an RPC
reply, it fails to account for the possibility of a checksum at the end of
the packet.  If a client mounts a NFSv2/3 with sec=krb5i/p, and does I/O
then you'll generally see messages similar to this in the server's ring
buffer:

RPC request reserved 164 but used 208

While I was never able to verify it, I suspect that this problem is also
the root cause of some oopses I've seen under these conditions:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=227726

This is probably also a problem for other sec= types and for NFSv4.  The
large reserved size for NFSv4 compound packets seems to generally paper
over the problem, however.

This patch adds a wrapper for svc_reserve that accounts for the possibility
of a checksum.  It also fixes up the appropriate callers of svc_reserve to
call the wrapper.  For now, it just uses a hardcoded value that I
determined via testing.  That value may need to be revised upward as things
change, or we may want to eventually add a new auth_op that attempts to
calculate this somehow.

Unfortunately, there doesn't seem to be a good way to reliably determine
the expected checksum length prior to actually calculating it, particularly
with schemes like spkm3.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:54 -07:00
Eric W. Biederman 6697164335 nfsd/nfs4state: remove unnecessary daemonize call
Acked-by: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:54 -07:00
Peter Staubach f34b95689d The NFSv2/NFSv3 server does not handle zero length WRITE requests correctly
The NFSv2 and NFSv3 servers do not handle WRITE requests for 0 bytes
correctly.  The specifications indicate that the server should accept the
request, but it should mostly turn into a no-op.  Currently, the server
will return an XDR decode error, which it should not.

Attached is a patch which addresses this issue.  It also adds some boundary
checking to ensure that the request contains as much data as was requested
to be written.  It also correctly handles an NFSv3 request which requests
to write more data than the server has stated that it is prepared to
handle.  Previously, there was some support which looked like it should
work, but wasn't quite right.

Signed-off-by: Peter Staubach <staubach@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:54 -07:00
Adrian Bunk 8842c9655b remove nfs4_acl_add_ace()
nfs4_acl_add_ace() can now be removed.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Neil Brown <neilb@cse.unsw.edu.au>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:54 -07:00
Randy Dunlap e63340ae6b header cleaning: don't include smp_lock.h when not used
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:07 -07:00
Linus Torvalds 2d56d3c43c Merge branch 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux
* 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux:
  gfs2: nfs lock support for gfs2
  lockd: add code to handle deferred lock requests
  lockd: always preallocate block in nlmsvc_lock()
  lockd: handle test_lock deferrals
  lockd: pass cookie in nlmsvc_testlock
  lockd: handle fl_grant callbacks
  lockd: save lock state on deferral
  locks: add fl_grant callback for asynchronous lock return
  nfsd4: Convert NFSv4 to new lock interface
  locks: add lock cancel command
  locks: allow {vfs,posix}_lock_file to return conflicting lock
  locks: factor out generic/filesystem switch from setlock code
  locks: factor out generic/filesystem switch from test_lock
  locks: give posix_test_lock same interface as ->lock
  locks: make ->lock release private data before returning in GETLK case
  locks: create posix-to-flock helper functions
  locks: trivial removal of unnecessary parentheses
2007-05-07 12:34:24 -07:00
Marc Eshel fd85b8170d nfsd4: Convert NFSv4 to new lock interface
Convert NFSv4 to the new lock interface.  We don't define any callback for now,
so we're not taking advantage of the asynchronous feature--that's less critical
for the multi-threaded nfsd then it is for the single-threaded lockd.  But this
does allow a cluster filesystems to export cluster-coherent locking to NFS.

Note that it's cluster filesystems that are the issue--of the filesystems that
define lock methods (nfs, cifs, etc.), most are not exportable by nfsd.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-05-06 20:38:49 -04:00
Marc Eshel 150b393456 locks: allow {vfs,posix}_lock_file to return conflicting lock
The nfsv4 protocol's lock operation, in the case of a conflict, returns
information about the conflicting lock.

It's unclear how clients can use this, so for now we're not going so far as to
add a filesystem method that can return a conflicting lock, but we may as well
return something in the local case when it's easy to.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-05-06 19:23:24 -04:00
Marc Eshel 9d6a8c5c21 locks: give posix_test_lock same interface as ->lock
posix_test_lock() and ->lock() do the same job but have gratuitously
different interfaces.  Modify posix_test_lock() so the two agree,
simplifying some code in the process.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-05-06 17:39:00 -04:00
Chuck Lever 2bea90d43a SUNRPC: RPC buffer size estimates are too large
The RPC buffer size estimation logic in net/sunrpc/clnt.c always
significantly overestimates the requirements for the buffer size.
A little instrumentation demonstrated that in fact rpc_malloc was never
allocating the buffer from the mempool, but almost always called kmalloc.

To compute the size of the RPC buffer more precisely, split p_bufsiz into
two fields; one for the argument size, and one for the result size.

Then, compute the sum of the exact call and reply header sizes, and split
the RPC buffer precisely between the two.  That should keep almost all RPC
buffers within the 2KiB buffer mempool limit.

And, we can finally be rid of RPC_SLACK_SPACE!

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:10 -07:00
J. Bruce Fields 79f6523a16 [PATCH] knfsd: nfsd4: remove superfluous cancel_delayed_work() call
This cancel_delayed_work call is called from a function that is only called
from a piece of code that immediate follows a cancel and destruction of the
workqueue, so it's clearly a mistake.

Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-27 09:05:14 -07:00
Bruce Fields 21315edd48 [PATCH] knfsd: nfsd4: demote "clientid in use" printk to a dprintk
The reused clientid here is a more of a problem for the client than the
server, and the client can report the problem itself if it's serious.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-27 09:05:14 -07:00
Bruce Fields 54c0440949 [PATCH] knfsd: nfsd4: fix inheritance flags on v4 ace derived from posix default ace
A regression introduced in the last set of acl patches removed the
INHERIT_ONLY flag from aces derived from the posix acl.  Fix.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-27 09:05:14 -07:00
NeilBrown 598b9a5637 [PATCH] knfsd: allow nfsd READDIR to return 64bit cookies
->readdir passes lofft_t offsets (used as nfs cookies) to
nfs3svc_encode_entry{,_plus}, but when they pass it on to encode_entry it
becomes an 'off_t', which isn't good.

So filesystems that returned 64bit offsets would lose.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-27 09:05:14 -07:00
Al Viro a033f35a22 [PATCH] include of asm/pgtable.h in nfsfh is bogus
not needed and actually breaks build on frv, while we are at it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-14 15:27:49 -07:00
Greg Banks c9ce228306 [PATCH] Fix a free-wrong-pointer bug in nfs/acl server.
Due to type confusion, when an nfsacl verison 2 'ACCESS' request
finishes and tries to clean up, it calls fh_put on entiredly the
wrong thing and this can cause an oops.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-19 16:13:28 -08:00
J. Bruce Fields 3160a711ef [PATCH] knfsd: nfsd4: fix handling of directories without default ACLs
When setting an ACL that lacks inheritable ACEs on a directory, we should set
a default ACL of zero length, not a default ACL with all bits denied.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-16 08:14:01 -08:00
J. Bruce Fields bec50c47aa [PATCH] knfsd: nfsd4: acls: avoid unnecessary denies
We're inserting deny's between some ACEs in order to enforce posix draft acl
semantics which prevent permissions from accumulating across entries in an
acl.

That's fine, but we're doing that by inserting a deny after *every* allow,
which is overkill.  We shouldn't be adding them in places where they actually
make no difference.

Also replaced some helper functions for creating acl entries; I prefer just
assigning directly to the struct fields--it takes a few more lines, but the
field names provide some documentation that I think makes the result easier
understand.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-16 08:14:01 -08:00
J. Bruce Fields f43daf6787 [PATCH] knfsd: nfsd4: acls: don't return explicit mask
Return just the effective permissions, and forget about the mask.  It isn't
worth the complexity.

WARNING: This breaks backwards compatibility with overly-picky nfsv4->posix
acl translation, as may has been included in some patched versions of libacl.
To our knowledge no such version was every distributed by anyone outside citi.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-16 08:14:01 -08:00
J. Bruce Fields f34f924274 [PATCH] knfsd: nfsd4: fix error return on unsupported acl
We should be returning ATTRNOTSUPP, not NOTSUPP, when acls are unsupported.

Also fix a comment.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-16 08:14:01 -08:00
J. Bruce Fields a4db5fe5df [PATCH] knfsd: nfsd4: fix memory leak on kmalloc failure in savemem
The wrong pointer is being kfree'd in savemem() when defer_free returns with
an error.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-16 08:14:01 -08:00
J. Bruce Fields 28e05dd845 [PATCH] knfsd: nfsd4: represent nfsv4 acl with array instead of linked list
Simplify the memory management and code a bit by representing acls with an
array instead of a linked list.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-16 08:14:01 -08:00
J. Bruce Fields 575a6290f0 [PATCH] knfsd: nfsd4: simplify nfsv4->posix translation
The code that splits an incoming nfsv4 ACL into inheritable and effective
parts can be combined with the the code that translates each to a posix acl,
resulting in simpler code that requires one less pass through the ACL.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-16 08:14:01 -08:00
J. Bruce Fields 7bdfa68c5e [PATCH] knfsd: nfsd4: relax checking of ACL inheritance bits
The rfc allows us to be more permissive about the ACL inheritance bits we
accept:

	"If the server supports a single "inherit ACE" flag that applies to
	both files and directories, the server may reject the request
	(i.e., requiring the client to set both the file and directory
	inheritance flags). The server may also accept the request and
	silently turn on the ACE4_DIRECTORY_INHERIT_ACE flag."

Let's take the latter option--the ACL is a complex attribute that could be
rejected for a wide variety of reasons, and the protocol gives us little
ability to explain the reason for the rejection, so erroring out is a
user-unfriendly last resort.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-16 08:14:01 -08:00
J. Bruce Fields f534a257ac [PATCH] knfsd: nfsd4: fix non-terminated string
The server name is expected to be a null-terminated string, so we can't pass
in the raw client identifier.

What's more, the client identifier is just a binary, not necessarily
printable, blob.  Let's just use the ip address instead.  The server name
appears to exist just to help debugging by making some printk's more
informative.

Note that the string is copies into the rpc client structure, so the pointer
to the local variable does not outlive the function call.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-16 08:14:01 -08:00
Tim Schmielau cd354f1ae7 [PATCH] remove many unneeded #includes of sched.h
After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there.  Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.

To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.

Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm.  I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:54 -08:00
NeilBrown af6a4e280e [PATCH] knfsd: add some new fsid types
Add support for using a filesystem UUID to identify and export point in the
filehandle.

For NFSv2, this UUID is xor-ed down to 4 or 8 bytes so that it doesn't take up
too much room.  For NFSv3+, we use the full 16 bytes, and possibly also a
64bit inode number for exports beneath the root of a filesystem.

When generating an fsid to return in 'stat' information, use the UUID (hashed
down to size) if it is available and a small 'fsid' was not specifically
provided.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:53 -08:00
NeilBrown 982aedfd09 [PATCH] knfsd: tidy up choice of filesystem-identifier when creating a filehandle
If we are using the same version/fsid as a current filehandle, then there is
no need to verify the the numbers are valid for this export, and they must be
(we used them to find this export).

This allows us to simplify the fsid selection code.

Also change "ref_fh_version" and "ref_fh_fsid_type" to "version" and
"fsid_type", as the important thing isn't that they are the version/type of
the reference filehandle, but they are the chosen type for the new filehandle.

And tidy up some indenting.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:53 -08:00
NeilBrown 8971a1016b [PATCH] knfsd: fix return value for writes to some files in 'nfsd' filesystem
Most files in the 'nfsd' filesystem are transactional.  When you write, a
reply is generated that can be read back only on the same 'file'.

If the reply has zero length, the 'write' will incorrectly return a value of
'0' instead of the length that was written.  This causes 'rpc.nfsd' to give an
annoying warning.

This patch fixes the test.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:53 -08:00
Chuck Lever 27459f0940 [PATCH] knfsd: SUNRPC: Provide room in svc_rqst for larger addresses
Expand the rq_addr field to allow it to contain larger addresses.

Specifically, we replace a 'sockaddr_in' with a 'sockaddr_storage', then
everywhere the 'sockaddr_in' was referenced, we use instead an accessor
function (svc_addr_in) which safely casts the _storage to _in.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:36 -08:00
Chuck Lever ad06e4bd62 [PATCH] knfsd: SUNRPC: Add a function to format the address in an svc_rqst for printing
There are loads of places where the RPC server assumes that the rq_addr fields
contains an IPv4 address.  Top among these are error and debugging messages
that display the server's IP address.

Let's refactor the address printing into a separate function that's smart
enough to figure out the difference between IPv4 and IPv6 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:35 -08:00
Chuck Lever 482fb94e1b [PATCH] knfsd: SUNRPC: allow creating an RPC service without registering with portmapper
Sometimes we need to create an RPC service but not register it with the local
portmapper.  NFSv4 delegation callback, for example.

Change the svc_makesock() API to allow optionally creating temporary or
permanent sockets, optionally registering with the local portmapper, and make
it return the ephemeral port of the new socket.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:35 -08:00
Al Viro fc2dd2e51a [PATCH] endianness bug: ntohl() misspelled as >> 24 in fh_verify().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-01 16:17:06 -08:00
NeilBrown 34e9a63b4f [PATCH] knfsd: ratelimit some nfsd messages that are triggered by external events
Also remove {NFSD,RPC}_PARANOIA as having the defines doesn't really add
anything.

The printks covered by RPC_PARANOIA were triggered by badly formatted
packets and so should be ratelimited.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-30 08:26:45 -08:00
NeilBrown a0ad13ef64 [PATCH] knfsd: Fix type mismatch with filldir_t used by nfsd
nfsd defines a type 'encode_dent_fn' which is much like 'filldir_t' except
that the first pointer is 'struct readdir_cd *' rather than 'void *'.  It
then casts encode_dent_fn points to 'filldir_t' as needed.  This hides any
other type mismatches between the two such as the fact that the 'ino' arg
recently changed from ino_t to u64.

So: get rid of 'encode_dent_fn', get rid of the cast of the function type,
change the first arg of various functions from 'struct readdir_cd *' to
'void *', and live with the fact that we have a little less type checking
on the calling of these functions now.  Less internal (to nfsd) checking
offset by more external checking, which is more important.

Thanks to Gabriel Paubert <paubert@iram.es> for discovering this and
providing an initial patch.

Signed-off-by: Gabriel Paubert <paubert@iram.es>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:51:00 -08:00