alistair23-linux

redonkable

Author	SHA1	Message	Date
Jeff Layton	c96223d3b6	nfsd: abstract out the get and set routines into the fault injection ops Now that we've added more granular locking in other places, it's time to address the fault injection code. This code is currently quite reliant on the client_mutex for protection. Start to change this by adding a new set of fault injection op vectors. For now they all use the legacy ones. In later patches we'll add new routines that can deal with more granular locking. Also, move some of the printk routines into the callers to make the results of the operations more uniform. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:02 -04:00
Jeff Layton	294ac32e99	nfsd: protect clid and verifier generation with client_lock The clid counter is a global counter currently. Move it to be a per-net property so that it can be properly protected by the nn->client_lock instead of relying on the client_mutex. The verifier generator is also potentially racy if there are two simultaneous callers. Generate the verifier when we generate the clid value, so it's also created under the client_lock. With this, there's no need to keep two counters as they'd always be in sync anyway, so just use the clientid_counter for both. As Trond points out, what would be best is to eventually move this code to use IDR instead of the hash tables. That would also help ensure uniqueness, but that's probably best done as a separate project. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:02 -04:00
Jeff Layton	fd699b8a48	nfsd: don't destroy clients that are busy It's possible that we'll have an in-progress call on some of the clients while a rogue EXCHANGE_ID or DESTROY_CLIENTID call comes in. Be sure to try and mark the client expired first, so that the refcount is respected. This will only be a problem once the client_mutex is removed. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:01 -04:00
Kinglong Mee	fb94d766af	NFSD: Put the reference of nfs4_file when freeing stid After testing nfs4 lock, I restart the nfsd service, got messages as, [ 5677.403419] nfsd: last server has exited, flushing export cache [ 5677.463728] ============================================================================= [ 5677.463942] BUG nfsd4_files (Tainted: G B OE): Objects remaining in nfsd4_files on kmem_cache_close() [ 5677.464055] ----------------------------------------------------------------------------- [ 5677.464203] INFO: Slab 0xffffea0000233400 objects=28 used=1 fp=0xffff880008cd3d98 flags=0x3ffc0000004080 [ 5677.464318] CPU: 0 PID: 3772 Comm: rmmod Tainted: G B OE 3.16.0-rc2+ #29 [ 5677.464420] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 5677.464538] 0000000000000000 0000000036af2c9f ffff88000ce97d68 ffffffff816eacfa [ 5677.464643] ffffea0000233400 ffff88000ce97e40 ffffffff811cda44 ffffffff00000020 [ 5677.464774] ffff88000ce97e50 ffff88000ce97e00 656a624f00000008 616d657220737463 [ 5677.464875] Call Trace: [ 5677.464925] [<ffffffff816eacfa>] dump_stack+0x45/0x56 [ 5677.464983] [<ffffffff811cda44>] slab_err+0xb4/0xe0 [ 5677.465040] [<ffffffff811d0457>] ? __kmalloc+0x117/0x290 [ 5677.465099] [<ffffffff81100eec>] ? on_each_cpu_cond+0xac/0xf0 [ 5677.465158] [<ffffffff811d1bc0>] ? kmem_cache_close+0x110/0x2e0 [ 5677.465218] [<ffffffff811d1be0>] kmem_cache_close+0x130/0x2e0 [ 5677.465279] [<ffffffff8135a0c1>] ? kobject_cleanup+0x91/0x1b0 [ 5677.465338] [<ffffffff811d22be>] __kmem_cache_shutdown+0xe/0x10 [ 5677.465399] [<ffffffff8119bd28>] kmem_cache_destroy+0x48/0x100 [ 5677.465466] [<ffffffffa05ef78d>] nfsd4_free_slabs+0x2d/0x50 [nfsd] [ 5677.465530] [<ffffffffa05fa987>] exit_nfsd+0x34/0x6ad [nfsd] [ 5677.465589] [<ffffffff81104ac2>] SyS_delete_module+0x162/0x200 [ 5677.465649] [<ffffffff81013b69>] ? do_notify_resume+0x59/0x90 [ 5677.465759] [<ffffffff816f2369>] system_call_fastpath+0x16/0x1b [ 5677.465822] INFO: Object 0xffff880008cd0000 @offset=0 [ 5677.465882] INFO: Allocated in nfsd4_process_open1+0x61/0x350 [nfsd] age=7599 cpu=0 pid=3253 [ 5677.466115] __slab_alloc+0x3b0/0x4b1 [ 5677.466166] kmem_cache_alloc+0x1e4/0x240 [ 5677.466220] nfsd4_process_open1+0x61/0x350 [nfsd] [ 5677.466276] nfsd4_open+0xee/0x860 [nfsd] [ 5677.466329] nfsd4_proc_compound+0x4d7/0x7f0 [nfsd] [ 5677.466384] nfsd_dispatch+0xbb/0x200 [nfsd] [ 5677.466447] svc_process_common+0x453/0x6f0 [sunrpc] [ 5677.466506] svc_process+0x103/0x170 [sunrpc] [ 5677.466559] nfsd+0x117/0x190 [nfsd] [ 5677.466609] kthread+0xd8/0xf0 [ 5677.466656] ret_from_fork+0x7c/0xb0 [ 5677.466775] kmem_cache_destroy nfsd4_files: Slab cache still has objects [ 5677.466839] CPU: 0 PID: 3772 Comm: rmmod Tainted: G B OE 3.16.0-rc2+ #29 [ 5677.466937] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 5677.467049] 0000000000000000 0000000036af2c9f ffff88000ce97eb0 ffffffff816eacfa [ 5677.467150] ffff880020bb2d00 ffff88000ce97ed0 ffffffff8119bdd9 0000000000000000 [ 5677.467250] ffffffffa06065c0 ffff88000ce97ee0 ffffffffa05ef78d ffff88000ce97ef0 [ 5677.467351] Call Trace: [ 5677.467397] [<ffffffff816eacfa>] dump_stack+0x45/0x56 [ 5677.467454] [<ffffffff8119bdd9>] kmem_cache_destroy+0xf9/0x100 [ 5677.467516] [<ffffffffa05ef78d>] nfsd4_free_slabs+0x2d/0x50 [nfsd] [ 5677.467579] [<ffffffffa05fa987>] exit_nfsd+0x34/0x6ad [nfsd] [ 5677.467639] [<ffffffff81104ac2>] SyS_delete_module+0x162/0x200 [ 5677.467765] [<ffffffff81013b69>] ? do_notify_resume+0x59/0x90 [ 5677.467826] [<ffffffff816f2369>] system_call_fastpath+0x16/0x1b Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Fixes: `11b9164ada` "nfsd: Add a struct nfs4_file field to struct nfs4_stid" Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:53:36 -04:00
Jeff Layton	7abea1e8e8	nfsd: don't destroy client if mark_client_expired_locked fails If it fails, it means that the client is in use and so destroying it would be bad. Currently, the client_mutex prevents this from happening but once we remove it, we won't be able to do this. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:26 -04:00
Jeff Layton	97403d95e1	nfsd: move unhash_client_locked call into mark_client_expired_locked All the callers except for the fault injection code call it directly afterward, and in the fault injection case it won't hurt to do so anyway. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:25 -04:00
Jeff Layton	217526e7ec	nfsd: protect the close_lru list and oo_last_closed_stid with client_lock Currently, it's protected by the client_mutex. Move it so that the list and the fields in the openowner are protected by the client_lock. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:24 -04:00
Trond Myklebust	0a880a28f8	nfsd: Add lockdep assertions to document the nfs4_client/session locking Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:23 -04:00
Trond Myklebust	3e339f964b	nfsd: Ensure lookup_clientid() takes client_lock Ensure that the client lookup is done safely under the client_lock, so we're not relying on the client_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:23 -04:00
Trond Myklebust	6b10ad193d	nfsd: Protect nfsd4_destroy_clientid using client_lock ...instead of relying on the client_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:22 -04:00
Jeff Layton	d20c11d86d	nfsd: Protect session creation and client confirm using client_lock In particular, we want to ensure that the move_to_confirmed() is protected by the nn->client_lock spin lock, so that we can use that when looking up the clientid etc. instead of relying on the client_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:21 -04:00
Trond Myklebust	3dbacee6e1	nfsd: Protect unconfirmed client creation using client_lock ...instead of relying on the client_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:20 -04:00
Trond Myklebust	5cc40fd7b6	nfsd: Move create_client() call outside the lock For efficiency reasons, and because we want to use spin locks instead of relying on the client_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:20 -04:00
Trond Myklebust	425510f5c8	nfsd: Don't require client_lock in free_client The struct nfs_client is supposed to be invisible and unreferenced before it gets here. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:19 -04:00
Trond Myklebust	4864af97e0	nfsd: Ensure that the laundromat unhashes the client before releasing locks If we leave the client on the confirmed/unconfirmed tables, and leave the sessions visible on the sessionid_hashtbl, then someone might find them before we've had a chance to destroy them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:18 -04:00
Trond Myklebust	4beb345b37	nfsd: Ensure struct nfs4_client is unhashed before we try to destroy it When we remove the client_mutex protection, we will need to ensure that it can't be found by other threads while we're destroying it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:17 -04:00
J. Bruce Fields	83e452fee8	nfsd4: fix out of date comment Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:16 -04:00
Kinglong Mee	d9499a9571	NFSD: Decrease nfsd_users in nfsd_startup_generic fail A memory allocation failure could cause nfsd_startup_generic to fail, in which case nfsd_users wouldn't be incorrectly left elevated. After nfsd restarts nfsd_startup_generic will then succeed without doing anything--the first consequence is likely nfs4_start_net finding a bad laundry_wq and crashing. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Fixes: `4539f14981` "nfsd: replace boolean nfsd_up flag by users counter" Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:26:09 -04:00
Jeff Layton	4ae098d327	nfsd: rename unhash_generic_stateid to unhash_ol_stateid ...to better match other functions that deal with open/lock stateids. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:31 -04:00
Jeff Layton	d83017f94c	nfsd: don't thrash the cl_lock while freeing an open stateid When we remove the client_mutex, we'll have a potential race between FREE_STATEID and CLOSE. The root of the problem is that we are walking the st_locks list, dropping the spinlock and then trying to release the persistent reference to the lockstateid. In between, a FREE_STATEID call can come along and take the lock, find the stateid and then try to put the reference. That leads to a double put. Fix this by not releasing the cl_lock in order to release each lock stateid. Use put_generic_stateid_locked to unhash them and gather them onto a list, and free_ol_stateid_reaplist to free any that end up on the list. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:31 -04:00
Jeff Layton	2c41beb0e5	nfsd: reduce cl_lock thrashing in release_openowner Releasing an openowner is a bit inefficient as it can potentially thrash the cl_lock if you have a lot of stateids attached to it. Once we remove the client_mutex, it'll also potentially be dangerous to do this. Add some functions to make it easier to defer the part of putting a generic stateid reference that needs to be done outside the cl_lock while doing the parts that must be done while holding it under a single lock. First we unhash each open stateid. Then we call put_generic_stateid_locked which will put the reference to an nfs4_ol_stateid. If it turns out to be the last reference, it'll go ahead and remove the stid from the IDR tree and put it onto the reaplist using the st_locks list_head. Then, after dropping the lock we'll call free_ol_stateid_reaplist to walk the list of stateids that are fully unhashed and ready to be freed, and free each of them. This function can sleep, so it must be done outside any spinlocks. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:30 -04:00
Jeff Layton	fc5a96c3b7	nfsd: close potential race in nfsd4_free_stateid Once we remove the client_mutex, it'll be possible for the sc_type of a lock stateid to change after it's found and checked, but before we can go to destroy it. If that happens, we can end up putting the persistent reference to the stateid more than once, and unhash it more than once. Fix this by unhashing the lock stateid prior to dropping the cl_lock but after finding it. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:29 -04:00
Jeff Layton	3c1c995cc2	nfsd: optimize destroy_lockowner cl_lock thrashing Reduce the cl_lock trashing in destroy_lockowner. Unhash all of the lockstateids on the lockowner's list. Put the reference under the lock and see if it was the last one. If so, then add it to a private list to be destroyed after we drop the lock. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:28 -04:00
Jeff Layton	a819ecc1bb	nfsd: add locking to stateowner release Once we remove the client_mutex, we'll need to properly protect the stateowner reference counts using the cl_lock. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:27 -04:00
Jeff Layton	882e9d25e1	nfsd: clean up and reorganize release_lockowner Do more within the main loop, and simplify the function a bit. Also, there's no need to take a stateowner reference unless we're going to call release_lockowner. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:27 -04:00
Trond Myklebust	d4f0489f38	nfsd: Move the open owner hash table into struct nfs4_client Preparation for removing the client_mutex. Convert the open owner hash table into a per-client table and protect it using the nfs4_client->cl_lock spin lock. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:26 -04:00
Trond Myklebust	c58c6610ec	nfsd: Protect adding/removing lock owners using client_lock Once we remove client mutex protection, we'll need to ensure that stateowner lookup and creation are atomic between concurrent compounds. Ensure that alloc_init_lock_stateowner checks the hashtable under the client_lock before adding a new element. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:25 -04:00
Trond Myklebust	7ffb588086	nfsd: Protect adding/removing open state owners using client_lock Once we remove client mutex protection, we'll need to ensure that stateowner lookup and creation are atomic between concurrent compounds. Ensure that alloc_init_open_stateowner checks the hashtable under the client_lock before adding a new element. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:24 -04:00
Jeff Layton	b401be22b5	nfsd: don't allow CLOSE to proceed until refcount on stateid drops Once we remove client_mutex protection, it'll be possible to have an in-flight operation using an openstateid when a CLOSE call comes in. If that happens, we can't just put the sc_file reference and clear its pointer without risking an oops. Fix this by ensuring that v4.0 CLOSE operations wait for the refcount to drop before proceeding to do so. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:23 -04:00
Jeff Layton	d3134b1049	nfsd: make openstateids hold references to their openowners Change it so that only openstateids hold persistent references to openowners. References can still be held by compounds in progress. With this, we can get rid of NFS4_OO_NEW. It's possible that we will create a new openowner in the process of doing the open, but something later fails. In the meantime, another task could find that openowner and start using it on a successful open. If that occurs we don't necessarily want to tear it down, just put the reference that the failing compound holds. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:23 -04:00
Jeff Layton	5adfd8850b	nfsd: clean up refcounting for lockowners Ensure that lockowner references are only held by lockstateids and operations that are in-progress. With this, we can get rid of release_lockowner_if_empty, which will be racy once we remove client_mutex protection. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:22 -04:00
Trond Myklebust	e4f1dd7fc2	nfsd: Make lock stateid take a reference to the lockowner A necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:21 -04:00
Jeff Layton	8f4b54c53f	nfsd: add an operation for unhashing a stateowner Allow stateowners to be unhashed and destroyed when the last reference is put. The unhashing must be idempotent. In a future patch, we'll add some locking around it, but for now it's only protected by the client_mutex. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:20 -04:00
Jeff Layton	5db1c03feb	nfsd: clean up lockowner refcounting when finding them Ensure that when finding or creating a lockowner, that we get a reference to it. For now, we also take an extra reference when a lockowner is created that can be put when release_lockowner is called, but we'll remove that in a later patch once we change how references are held. Since we no longer destroy lockowners in the event of an error in nfsd4_lock, we must change how the seqid gets bumped in the lk_is_new case. Instead of doing so on creation, do it manually in nfsd4_lock. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:20 -04:00
Jeff Layton	58fb12e6a4	nfsd: Add a mutex to protect the NFSv4.0 open owner replay cache We don't want to rely on the client_mutex for protection in the case of NFSv4 open owners. Instead, we add a mutex that will only be taken for NFSv4.0 state mutating operations, and that will be released once the entire compound is done. Also, ensure that nfsd4_cstate_assign_replay/nfsd4_cstate_clear_replay take a reference to the stateowner when they are using it for NFSv4.0 open and lock replay caching. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:19 -04:00
Jeff Layton	6b180f0b57	nfsd: Add reference counting to state owners The way stateowners are managed today is somewhat awkward. They need to be explicitly destroyed, even though the stateids reference them. This will be particularly problematic when we remove the client_mutex. We may create a new stateowner and attempt to open a file or set a lock, and have that fail. In the meantime, another RPC may come in that uses that same stateowner and succeed. We can't have the first task tearing down the stateowner in that situation. To fix this, we need to change how stateowners are tracked altogether. Refcount them and only destroy them once all stateids that reference them have been destroyed. This patch starts by adding the refcounting necessary to do that. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:18 -04:00
Trond Myklebust	2d3f96689f	nfsd: Migrate the stateid reference into nfs4_find_stateid_by_type() Allow nfs4_find_stateid_by_type to take the stateid reference, while still holding the &cl->cl_lock. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:17 -04:00
Trond Myklebust	fd9110113c	nfsd: Migrate the stateid reference into nfs4_lookup_stateid() Allow nfs4_lookup_stateid to take the stateid reference, instead of having all the callers do so. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:16 -04:00
Trond Myklebust	4cbfc9f704	nfsd: Migrate the stateid reference into nfs4_preprocess_seqid_op Allow nfs4_preprocess_seqid_op to take the stateid reference, instead of having all the callers do so. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:15 -04:00
Trond Myklebust	0667b1e9d8	nfsd: Add reference counting to nfs4_preprocess_confirmed_seqid_op Ensure that all the callers put the open stateid after use. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:15 -04:00
Trond Myklebust	2585fc7958	nfsd: nfsd4_open_confirm() must reference the open stateid Ensure that nfsd4_open_confirm() keeps a reference to the open stateid until it is done working with it. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:14 -04:00
Trond Myklebust	8a0b589d8f	nfsd: Prepare nfsd4_close() for open stateid referencing Prepare nfsd4_close for a future where nfs4_preprocess_seqid_op() hands it a fully referenced open stateid. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:13 -04:00
Trond Myklebust	d6f2bc5dcf	nfsd: nfsd4_process_open2() must reference the open stateid Ensure that nfsd4_process_open2() keeps a reference to the open stateid until it is done working with it. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:12 -04:00
Trond Myklebust	dcd94cc2e7	nfsd: nfsd4_process_open2() must reference the delegation stateid Ensure that nfsd4_process_open2() keeps a reference to the delegation stateid until it is done working with it. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:11 -04:00
Trond Myklebust	67cb1279be	nfsd: Ensure that nfs4_open_delegation() references the delegation stateid Ensure that nfs4_open_delegation() keeps a reference to the delegation stateid until it is done working with it. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:11 -04:00
Trond Myklebust	858cc57336	nfsd: nfsd4_locku() must reference the lock stateid Ensure that nfsd4_locku() keeps a reference to the lock stateid until it is done working with it. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:10 -04:00
Trond Myklebust	3d0fabd5a4	nfsd: Add reference counting to lock stateids Ensure that nfsd4_lock() references the lock stateid while it is manipulating it. Not currently necessary, but will be once the client_mutex is removed. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:09 -04:00
Jeff Layton	1af71cc801	nfsd: ensure atomicity in nfsd4_free_stateid and nfsd4_validate_stateid Hold the cl_lock over the bulk of these functions. In addition to ensuring that they aren't freed prematurely, this will also help prevent a potential race that could be introduced later. Once we remove the client_mutex, it'll be possible for FREE_STATEID and CLOSE to race and for both to try to put the "persistent" reference to the stateid. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:08 -04:00
Jeff Layton	356a95ece7	nfsd: clean up races in lock stateid searching and creation Preparation for removal of the client_mutex. Currently, no lock aside from the client_mutex is held when calling find_lock_state. Ensure that the cl_lock is held by adding a lockdep assertion. Once we remove the client_mutex, it'll be possible for another thread to race in and insert a lock state for the same file after we search but before we insert a new one. Ensure that doesn't happen by redoing the search after allocating a new stid that we plan to insert. If one is found just put the one that was allocated. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:07 -04:00
Jeff Layton	1c755dc1ad	nfsd: Add locking to protect the state owner lists Change to using the clp->cl_lock for this. For now, there's a lot of cl_lock thrashing, but in later patches we'll eliminate that and close the potential races that can occur when releasing the cl_lock while walking the lists. For now, the client_mutex prevents those races. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:07 -04:00
Jeff Layton	b49e084d8c	nfsd: do filp_close in sc_free callback for lock stateids Releasing locks when we unhash the stateid instead of doing so only when the stateid is actually released will be problematic in later patches when we need to protect the unhashing with spinlocks. Move it into the sc_free operation instead. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:19:50 -04:00
Jeff Layton	4770d72201	nfsd4: use cl_lock to synchronize all stateid idr calls Currently, this is serialized by the client_mutex, which is slated for removal. Add finer-grained locking here. Also, do some cleanup around find_stateid to prepare for taking references. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Benny Halevy <bhalevy@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:19:25 -04:00
Trond Myklebust	11b9164ada	nfsd: Add a struct nfs4_file field to struct nfs4_stid All stateids are associated with a nfs4_file. Let's consolidate. Replace delegation->dl_file with the dl_stid.sc_file, and nfs4_ol_stateid->st_file with st_stid.sc_file. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 12:51:34 -04:00
Trond Myklebust	6011695da2	nfsd: Add reference counting to the lock and open stateids When we remove the client_mutex, we'll need to be able to ensure that these objects aren't destroyed while we're not holding locks. Add a ->free() callback to the struct nfs4_stid, so that we can release a reference to the stid without caring about the contents. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 12:43:53 -04:00
Jeff Layton	b3fbfe0e7a	nfsd: print status when nfsd4_open fails to open file it just created It's possible for nfsd to fail opening a file that it has just created. When that happens, we throw a WARN but it doesn't include any info about the error code. Print the status code to give us a bit more info. Our QA group hit some of these warnings under some very heavy stress testing. My suspicion is that they hit the file-max limit, but it's hard to know for sure. Go ahead and add a -ENFILE mapping to nfserr_serverfault to make the error more distinct (and correct). Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 23:08:38 -04:00
Jeff Layton	650ecc8f8f	nfsd: remove dl_fh field from struct nfs4_delegation Now that the nfs4_file has a filehandle in it, we no longer need to keep a per-delegation copy of it. Switch to using the one in the nfs4_file instead. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:58 -04:00
Jeff Layton	f54fe962b8	nfsd: give block_delegation and delegation_blocked its own spinlock The state lock can be fairly heavily contended, and there's no reason that nfs4_file lookups and delegation_blocked should be mutually exclusive. Let's give the new block_delegation code its own spinlock. It does mean that we'll need to take a different lock in the delegation break code, but that's not generally as critical to performance. Cc: Neil Brown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:57 -04:00
Jeff Layton	0b26693c56	nfsd: clean up nfs4_set_delegation Move the alloc_init_deleg call into nfs4_set_delegation and change the function to return a pointer to the delegation or an IS_ERR return. This allows us to skip allocating a delegation if the file has already experienced a lease conflict. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:56 -04:00
Jeff Layton	4cf59221c7	nfsd: clean up arguments to nfs4_open_delegation No need to pass in a net pointer since we can derive that. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:55 -04:00
Jeff Layton	f9416e281e	nfsd: drop unused stp arg to alloc_init_deleg Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:54 -04:00
Trond Myklebust	02a3508dba	nfsd: Convert delegation counter to an atomic_long_t type We want to convert to an atomic type so that we don't need to lock across the call to alloc_init_deleg(). Then convert to a long type so that we match the size of 'max_delegations'. None of this is a problem today, but it will be once we remove client_mutex protection. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:54 -04:00
Jeff Layton	2d4a532d38	nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock Currently, both destroy_revoked_delegation and revoke_delegation manipulate the cl_revoked list without any locking aside from the client_mutex. Ensure that the clp->cl_lock is held when manipulating it, except for the list walking in destroy_client. At that point, the client should no longer be in use, and so it should be safe to walk the list without any locking. That also means that we don't need to do the list_splice_init there either. Also, the fact that revoke_delegation deletes dl_recall_lru list_head without any locking makes it difficult to know whether it's doing so safely in all cases. Move the list_del_init calls into the callers, and add a WARN_ON in the event that t's passed a delegation that has a non-empty list_head. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:53 -04:00
Jeff Layton	4269067696	nfsd: fully unhash delegations when revoking them Ensure that the delegations cannot be found by the laundromat etc once we add them to the various 'revoke' lists. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:52 -04:00
Trond Myklebust	f83388341b	nfsd: simplify stateid allocation and file handling Don't allow stateids to clear the open file pointer until they are being destroyed. In a later patches we'll want to rely on the fact that we have a valid file pointer when dealing with the stateid and this will save us from having to do a lot of NULL pointer checks before doing so. Also, move to allocating stateids with kzalloc and get rid of the explicit zeroing of fields. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:51 -04:00
Jeff Layton	f9c00c3ab4	nfsd: Do not let nfs4_file pin the struct inode Remove the fi_inode field in struct nfs4_file in order to remove the possibility of struct nfs4_file pinning the inode when it does not have any open state. The only place we still need to get to an inode is in check_for_locks, so change it to use find_any_file and use the inode from any that it finds. If it doesn't find one, then just assume there aren't any. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 16:35:24 -04:00
Trond Myklebust	b07c54a4a3	nfsd: nfs4_check_fh - make it actually check the filehandle ...instead of just checking the inode that corresponds to it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 16:35:24 -04:00
Trond Myklebust	ca94321783	nfsd: Use the filehandle to look up the struct nfs4_file instead of inode This makes more sense anyway since an inode pointer value can change even when the filehandle doesn't. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 16:35:24 -04:00
Trond Myklebust	e2cf80d73f	nfsd: Store the filehandle with the struct nfs4_file For use when we may not have a struct inode. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 16:35:23 -04:00
Himangi Saraogi	fc8e5a644c	nfsd4: convert comma to semicolon Replace a comma between expression statements by a semicolon. This changes the semantics of the code, but given the current indentation appears to be what is intended. A simplified version of the Coccinelle semantic patch that performs this transformation is as follows: // <smpl> @r@ expression e1,e2; @@ e1 -, +; e2; // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 14:20:49 -04:00
Jeff Layton	2f6ce8e73c	nfsd: ensure that st_access_bmap and st_deny_bmap are initialized to 0 Open stateids must be initialized with the st_access_bmap and st_deny_bmap set to 0, so that nfs4_get_vfs_file can properly record their state in old_access_bmap and old_deny_bmap. This bug was introduced in commit `baeb4ff0e5` (nfsd: make deny mode enforcement more efficient and close races in it) and was causing the refcounts to end up incorrect when nfs4_get_vfs_file returned an error after bumping the refcounts. This made it impossible to unmount the underlying filesystem after running pynfs tests that involve deny modes. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 14:20:47 -04:00
Kinglong Mee	f98bac5a30	NFSD: Fix crash encoding lock reply on 32-bit Commit `8c7424cff6` "nfsd4: don't try to encode conflicting owner if low on space" forgot to free conf->data in nfsd4_encode_lockt and before sign conf->data to NULL in nfsd4_encode_lock_denied, causing a leak. Worse, kfree() can be called on an uninitialized pointer in the case of a succesful lock (or one that fails for a reason other than a conflict). (Note that lock->lk_denied.ld_owner.data appears it should be zero here, until you notice that it's one arm of a union the other arm of which is written to in the succesful case by the memcpy(&lock->lk_resp_stateid, &lock_stp->st_stid.sc_stateid, sizeof(stateid_t)); in nfsd4_lock(). In the 32-bit case this overwrites ld_owner.data.) Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Fixes: `8c7424cff6` ""nfsd4: don't try to encode conflicting owner if low on space" Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 10:31:56 -04:00
Jeff Layton	d55a166c96	nfsd: bump dl_time when unhashing delegation There's a potential race between a lease break and DELEGRETURN call. Suppose a lease break comes in and queues the workqueue job for a delegation, but it doesn't run just yet. Then, a DELEGRETURN comes in finds the delegation and calls destroy_delegation on it to unhash it and put its primary reference. Next, the workqueue job runs and queues the delegation back onto the del_recall_lru list, issues the CB_RECALL and puts the final reference. With that, the final reference to the delegation is put, but it's still on the LRU list. When we go to unhash a delegation, it's because we intend to get rid of it soon afterward, so we don't want lease breaks to mess with it once that occurs. Fix this by bumping the dl_time whenever we unhash a delegation, to ensure that lease breaks don't monkey with it. I believe this is a regression due to commit `02e1215f9f` (nfsd: Avoid taking state_lock while holding inode lock in nfsd_break_one_deleg). Prior to that, the state_lock was held in the lm_break callback itself, and that would have prevented this race. Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-22 15:34:47 -04:00
Trond Myklebust	72c0b0fb9f	nfsd: Move the delegation reference counter into the struct nfs4_stid We will want to add reference counting to the lock stateid and open stateids too in later patches. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-21 17:03:00 -04:00
Jeff Layton	417c6629b2	nfsd: fix race that grants unrecallable delegation If nfs4_setlease succesfully acquires a new delegation, then another task breaks the delegation before we reach hash_delegation_locked, then the breaking task will see an empty fi_delegations list and do nothing. The client will receive an open reply incorrectly granting a delegation and will never receive a recall. Move more of the delegation fields to be protected by the fi_lock. It's more granular than the state_lock and in later patches we'll want to be able to rely on it in addition to the state_lock. Attempt to acquire a delegation. If that succeeds, take the spinlocks and then check to see if the file has had a conflict show up since then. If it has, then we assume that the lease is no longer valid and that we shouldn't hand out a delegation. There's also one more potential (but very unlikely) problem. If the lease is broken before the delegation is hashed, then it could leak. In the event that the fi_delegations list is empty, reset the fl_break_time to jiffies so that it's cleaned up ASAP by the normal lease handling code. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-21 16:31:17 -04:00
J. Bruce Fields	57a3714421	nfsd4: CREATE_SESSION should update backchannel immediately nfsd4_probe_callback kicks off some work that will eventually run nfsd4_process_cb_update and update the session flags. In theory we could process a following SEQUENCE call before that update happens resulting in flags that don't accurately represent, for example, the lack of a backchannel. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-21 12:30:50 -04:00
Chuck Lever	3c45ddf823	svcrdma: Select NFSv4.1 backchannel transport based on forward channel The current code always selects XPRT_TRANSPORT_BC_TCP for the back channel, even when the forward channel was not TCP (eg, RDMA). When a 4.1 mount is attempted with RDMA, the server panics in the TCP BC code when trying to send CB_NULL. Instead, construct the transport protocol number from the forward channel transport or'd with XPRT_TRANSPORT_BC. Transports that do not support bi-directional RPC will not have registered a "BC" transport, causing create_backchannel_client() to fail immediately. Fixes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=265 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-18 11:35:45 -04:00
J. Bruce Fields	5d6031ca74	nfsd4: zero op arguments beyond the 8th compound op The first 8 ops of the compound are zeroed since they're a part of the argument that's zeroed by the memset(rqstp->rq_argp, 0, procp->pc_argsize); in svc_process_common(). But we handle larger compounds by allocating the memory on the fly in nfsd4_decode_compound(). Other than code recently fixed by `01529e3f81` "NFSD: Fix memory leak in encoding denied lock", I don't know of any examples of code depending on this initialization. But it definitely seems possible, and I'd rather be safe. Compounds this long are unusual so I'm much more worried about failure in this poorly tested cases than about an insignificant performance hit. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-17 16:20:39 -04:00
Jeff Layton	ae4b884fc6	nfsd: silence sparse warning about accessing credentials sparse says: fs/nfsd/auth.c:31:38: warning: incorrect type in argument 1 (different address spaces) fs/nfsd/auth.c:31:38: expected struct cred const cred fs/nfsd/auth.c:31:38: got struct cred const [noderef] <asn:4>real_cred Add a new accessor for the ->real_cred and use that to fetch the pointer. Accessing current->real_cred directly is actually quite safe since we know that they can't go away so this is mostly a cosmetic fixup to silence sparse. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-17 16:15:35 -04:00
Trond Myklebust	b0fc29d6fc	nfsd: Ensure stateids remain unique until they are freed Add an extra delegation state to allow the stateid to remain in the idr tree until the last reference has been released. This will be necessary to ensure uniqueness once the client_mutex is removed. [jlayton: reset the sc_type under the state_lock in unhash_delegation] Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-16 21:39:51 -04:00
Jeff Layton	d564fbec7a	nfsd: nfs4_alloc_init_lease should take a nfs4_file arg No need to pass the delegation pointer in here as it's only used to get the nfs4_file pointer. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-16 21:35:25 -04:00
Jeff Layton	02e1215f9f	nfsd: Avoid taking state_lock while holding inode lock in nfsd_break_one_deleg state_lock is a heavily contended global lock. We don't want to grab that while simultaneously holding the inode->i_lock. Add a new per-nfs4_file lock that we can use to protect the per-nfs4_file delegation list. Hold that while walking the list in the break_deleg callback and queue the workqueue job for each one. The workqueue job can then take the state_lock and do the list manipulations without the i_lock being held prior to starting the rpc call. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-16 21:06:12 -04:00
Jeff Layton	e8051c837b	nfsd: eliminate nfsd4_init_callback It's just an obfuscated INIT_WORK call. Just make the work_func_t a non-static symbol and use a normal INIT_WORK call. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-16 14:18:58 -04:00
Kinglong Mee	d5d5c304b1	NFSD: Fix bad checking of space for padding in splice read Note that the caller has already reserved space for count and eof, so xdr->p has already moved past them, only the padding remains. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Fixes `dc97618ddd` (nfsd4: separate splice and readv cases) Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-11 15:19:25 -04:00
Kinglong Mee	35e634b83c	NFSD: Check acl returned from get_acl/posix_acl_from_mode Commit `4ac7249ea5` (nfsd: use get_acl and ->set_acl) don't check the acl returned from get_acl()/posix_acl_from_mode(). Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-11 15:03:53 -04:00
Jeff Layton	a46cb7f287	nfsd: cleanup and rename nfs4_check_open Rename it to better describe what it does, and have it just return the stateid instead of a __be32 (which is now always nfs_ok). Also, do the search for an existing stateid after the delegation check, to reduce cleanup if the delegation check returns error. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-11 11:06:38 -04:00
Jeff Layton	baeb4ff0e5	nfsd: make deny mode enforcement more efficient and close races in it The current enforcement of deny modes is both inefficient and scattered across several places, which makes it hard to guarantee atomicity. The inefficiency is a problem now, and the lack of atomicity will mean races once the client_mutex is removed. First, we address the inefficiency. We have to track deny modes on a per-stateid basis to ensure that open downgrades are sane, but when the server goes to enforce them it has to walk the entire list of stateids and check against each one. Instead of doing that, maintain a per-nfs4_file deny mode. When a file is opened, we simply set any deny bits in that mode that were specified in the OPEN call. We can then use that unified deny mode to do a simple check to see whether there are any conflicts without needing to walk the entire stateid list. The only time we'll need to walk the entire list of stateids is when a stateid that has a deny mode on it is being released, or one is having its deny mode downgraded. In that case, we must walk the entire list and recalculate the fi_share_deny field. Since deny modes are pretty rare today, this should be very rare under normal workloads. To address the potential for races once the client_mutex is removed, protect fi_share_deny with the fi_lock. In nfs4_get_vfs_file, check to make sure that any deny mode we want to apply won't conflict with existing access. If that's ok, then have nfs4_file_get_access check that new access to the file won't conflict with existing deny modes. If that also passes, then get file access references, set the correct access and deny bits in the stateid, and update the fi_share_deny field. If opening the file or truncating it fails, then unwind the whole mess and return the appropriate error. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-11 11:06:32 -04:00
Jeff Layton	7214e8600e	nfsd: always hold the fi_lock when bumping fi_access refcounts Once we remove the client_mutex, there's an unlikely but possible race that could occur. It will be possible for nfs4_file_put_access to race with nfs4_file_get_access. The refcount will go to zero (briefly) and then bumped back to one. If that happens we set ourselves up for a use-after-free and the potential for a lock to race onto the i_flock list as a filp is being torn down. Ensure that we can safely bump the refcount on the file by holding the fi_lock whenever that's done. The only place it currently isn't is in get_lock_access. In order to ensure atomicity with finding the file, use the find_*_file_locked variants and then call get_lock_access to get new access references on the nfs4_file under the same lock. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-11 11:06:17 -04:00
Jeff Layton	3b84240a7b	nfsd: clean up reset_union_bmap_deny Fix the "deny" argument type, and start the loop at 1. The 0 iteration is always a noop. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-11 11:06:11 -04:00
Jeff Layton	6eb3a1d096	nfsd: set stateid access and deny bits in nfs4_get_vfs_file Cleanup -- ensure that the stateid bits are set at the same time that the file access refcounts are incremented. Keeping them coherent like this makes it easier to ensure that we account for all of the references. Since the initialization of the st_*_bmap fields is done when it's hashed, we go ahead and hash the stateid before getting access to the file and unhash it if that function returns error. This will be necessary anyway in a follow-on patch that will overhaul deny mode handling. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-11 11:06:05 -04:00
Jeff Layton	c11c591fe6	nfsd: shrink st_access_bmap and st_deny_bmap We never use anything above bit #3, so an unsigned long for each is wasteful. Shrink them to a char each, and add some WARN_ON_ONCE calls if we try to set or clear bits that would go outside those sizes. Note too that because atomic bitops work on unsigned longs, we have to abandon their use here. That shouldn't be a problem though since we don't really care about the atomicity in this code anyway. Using them was just a convenient way to flip bits. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-11 11:06:04 -04:00
Jeff Layton	6d338b51eb	nfsd: remove nfs4_file_put_fd ...and replace it with a simple swap call. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-11 11:05:57 -04:00
Jeff Layton	1265965172	nfsd: refactor nfs4_file_get_access and nfs4_file_put_access Have them take NFS4_SHARE_ACCESS_* flags instead of an open mode. This spares the callers from having to convert it themselves. This also allows us to simplify these functions as we no longer need to do the access_to_omode conversion in either one. Note too that this patch eliminates the WARN_ON in __nfs4_file_get_access. It's valid for now, but in a later patch we'll be bumping the refcounts prior to opening the file in order to close some races, at which point we'll need to remove it anyway. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-11 11:03:23 -04:00
Trond Myklebust	e20fcf1e65	nfsd: clean up helper __release_lock_stateid Use filp_close instead of open coding. filp_close does a bit more than just release the locks and put the filp. It also calls ->flush and dnotify_flush, both of which should be done here anyway. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-10 15:05:26 -04:00
Trond Myklebust	de18643dce	nfsd: Add locking to the nfs4_file->fi_fds[] array Preparation for removal of the client_mutex, which currently protects this array. While we don't actually need the find_*_file_locked variants just yet, a later patch will. So go ahead and add them now to reduce future churn in this code. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-10 15:05:26 -04:00
Trond Myklebust	1d31a2531a	nfsd: Add fine grained protection for the nfs4_file->fi_stateids list Access to this list is currently serialized by the client_mutex. Add finer grained locking around this list in preparation for its removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-10 15:05:25 -04:00
Jeff Layton	d6c249b4d4	nfsd: reduce some spinlocking in put_client_renew No need to take the lock unless the count goes to 0. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-10 13:41:00 -04:00
Jeff Layton	dff1399f8a	nfsd: close potential race between delegation break and laundromat Bruce says: There's also a preexisting expire_client/laundromat vs break race: - expire_client/laundromat adds a delegation to its local reaplist using the same dl_recall_lru field that a delegation uses to track its position on the recall lru and drops the state lock. - a concurrent break_lease adds the delegation to the lru. - expire/client/laundromat then walks it reaplist and sees the lru head as just another delegation on the list.... Fix this race by checking the dl_time under the state_lock. If we find that it's not 0, then we know that it has already been queued to the LRU list and that we shouldn't queue it again. In the case of destroy_client, we must also ensure that we don't hit similar races by ensuring that we don't move any delegations to the reaplist with a dl_time of 0. Just bump the dl_time by one before we drop the state_lock. We're destroying the delegations anyway, so a 1s difference there won't matter. The fault injection code also requires a bit of surgery here: First, in the case of nfsd_forget_client_delegations, we must prevent the same sort of race vs. the delegation break callback. For that, we just increment the dl_time to ensure that a delegation callback can't race in while we're working on it. We can't do that for nfsd_recall_client_delegations, as we need to have it actually queue the delegation, and that won't happen if we increment the dl_time. The state lock is held over that function, so we don't need to worry about these sorts of races there. There is one other potential bug nfsd_recall_client_delegations though. Entries on the victims list are not dequeued before calling nfsd_break_one_deleg. That's a potential list corruptor, so ensure that we do that there. Reported-by: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-10 13:40:51 -04:00
Kinglong Mee	01529e3f81	NFSD: Fix memory leak in encoding denied lock Commit `8c7424cff6` (nfsd4: don't try to encode conflicting owner if low on space) forgot free conf->data in nfsd4_encode_lockt and before sign conf->data to NULL in nfsd4_encode_lock_denied. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-09 20:55:08 -04:00
Trond Myklebust	0fe492db60	nfsd: Convert nfs4_check_open_reclaim() to work with lookup_clientid() lookup_clientid is preferable to find_confirmed_client since it's able to use the cached client in the compound state. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-09 20:55:07 -04:00
Trond Myklebust	2d91e8953c	nfsd: Always use lookup_clientid() in nfsd4_process_open1 In later patches, we'll be moving the stateowner table into the nfs4_client, and by doing this we ensure that we have a cached nfs4_client pointer. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-09 20:55:06 -04:00
Trond Myklebust	13d6f66b08	nfsd: Convert nfsd4_process_open1() to work with lookup_clientid() ...and have alloc_init_open_stateowner just use the cstate->clp pointer instead of passing in a clp separately. This allows us to use the cached nfs4_client pointer in the cstate instead of having to look it up again. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-09 20:55:05 -04:00
Jeff Layton	4b24ca7d30	nfsd: Allow struct nfsd4_compound_state to cache the nfs4_client We want to use the nfsd4_compound_state to cache the nfs4_client in order to optimise away extra lookups of the clid. In the v4.0 case, we use this to ensure that we only have to look up the client at most once per compound for each call into lookup_clientid. For v4.1+ we set the pointer in the cstate during SEQUENCE processing so we should never need to do a search for it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-09 20:55:04 -04:00
Jeff Layton	62814d6a9b	nfsd: add a nfserrno mapping for -E2BIG to nfserr_fbig I saw this pop up with some pynfs testing: [ 123.609992] nfsd: non-standard errno: -7 ...and -7 is -E2BIG. I think what happened is that XFS returned -E2BIG due to some xattr operations with the ACL10 pynfs TEST (I guess it has limited xattr size?). Add a better mapping for that error since it's possible that we'll need it. How about we convert it to NFSERR_FBIG? As Bruce points out, they both have "BIG" in the name so it must be good. Also, turn the printk in this function into a WARN() so that we can get a bit more information about situations that don't have proper mappings. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-09 20:55:03 -04:00
Jeff Layton	722b620d18	nfsd: properly convert return from commit_metadata to __be32 Commit 2a7420c03e504 (nfsd: Ensure that nfsd_create_setattr commits files to stable storage), added a couple of calls to commit_metadata, but doesn't convert their return codes to __be32 in the appropriate places. Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-09 20:55:02 -04:00
Trond Myklebust	2dd6e458c3	nfsd: Cleanup - Let nfsd4_lookup_stateid() take a cstate argument The cstate already holds information about the session, and hence the client id, so it makes more sense to pass that information rather than the current practice of passing a 'minor version' number. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-09 20:55:01 -04:00
Trond Myklebust	d4e19e7027	nfsd: Don't get a session reference without a client reference If the client were to disappear from underneath us while we're holding a session reference, things would be bad. This cleanup helps ensure that it cannot, which will be a possibility when the client_mutex is removed. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-09 20:55:00 -04:00
Jeff Layton	fd44907c2d	nfsd: clean up nfsd4_release_lockowner Now that we know that we won't have several lockowners with the same, owner->data, we can simplify nfsd4_release_lockowner and get rid of the lo_list in the process. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-09 20:54:59 -04:00
Trond Myklebust	b3c32bcd9c	nfsd: NFSv4 lock-owners are not associated to a specific file Just like open-owners, lock-owners are associated with a name, a clientid and, in the case of minor version 0, a sequence id. There is no association to a file. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-09 20:54:58 -04:00
Jeff Layton	c53530da4d	nfsd: Allow lockowners to hold several stateids A lockowner can have more than one lock stateid. For instance, if a process has more than one file open and has locks on both, then the same lockowner has more than one stateid associated with it. Change it so that this reality is better reflected by the objects that nfsd uses. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-09 20:54:57 -04:00
Trond Myklebust	3c87b9b7c0	nfsd: lock owners are not per open stateid In the NFSv4 spec, lock stateids are per-file objects. Lockowners are not. This patch replaces the current list of lock owners in the open stateids with a list of lock stateids. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:37 -04:00
Trond Myklebust	acf9295b1c	nfsd: clean up nfsd4_close_open_stateid Minor cleanup that should introduce no behavioral changes. Currently this function just unhashes the stateid and leaves the caller to do the work of the CLOSE processing. Change nfsd4_close_open_stateid so that it handles doing all of the work of closing a stateid. Move the handling of the unhashed stateid into it instead of doing that work in nfsd4_close. This will help isolate some coming changes to stateid handling from nfsd4_close. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:36 -04:00
Jeff Layton	db24b3b4b2	nfsd: declare v4.1+ openowners confirmed on creation There's no need to confirm an openowner in v4.1 and above, so we can go ahead and set NFS4_OO_CONFIRMED when we create openowners in those versions. This will also be necessary when we remove the client_mutex, as it'll be possible for two concurrent opens to race in versions >4.0. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:35 -04:00
Trond Myklebust	b607664ee7	nfsd: Cleanup nfs4svc_encode_compoundres Move the slot return, put session etc into a helper in fs/nfsd/nfs4state.c instead of open coding in nfs4svc_encode_compoundres. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:34 -04:00
Trond Myklebust	e17f99b728	nfsd: nfs4_preprocess_seqid_op should only set *stpp on success Not technically a bugfix, since nothing tries to use the return pointer if this function doesn't return success, but it could be a problem with some coming changes. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:33 -04:00
Jeff Layton	5b8db00bae	nfsd: add a new /proc/fs/nfsd/max_connections file Currently, the maximum number of connections that nfsd will allow is based on the number of threads spawned. While this is fine for a default, there really isn't a clear relationship between the two. The number of threads corresponds to the number of concurrent requests that we want to allow the server to process at any given time. The connection limit corresponds to the maximum number of clients that we want to allow the server to handle. These are two entirely different quantities. Break the dependency on increasing threads in order to allow for more connections, by adding a new per-net parameter that can be set to a non-zero value. The default is still to base it on the number of threads, so there should be no behavior change for anyone who doesn't use it. Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:32 -04:00
Trond Myklebust	0f3a24b43b	nfsd: Ensure that nfsd_create_setattr commits files to stable storage Since nfsd_create_setattr strips the mode from the struct iattr, it is quite possible that it will optimise away the call to nfsd_setattr altogether. If this is the case, then we never call commit_metadata() on the newly created file. Also ensure that both nfsd_setattr() and nfsd_create_setattr() fail when the call to commit_metadata fails. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:31 -04:00
Kinglong Mee	1e444f5bc0	NFSD: Remove iattr parameter from nfsd_symlink() Commit `db2e747b14` (vfs: remove mode parameter from vfs_symlink()) have remove mode parameter from vfs_symlink. So that, iattr isn't needed by nfsd_symlink now, just remove it. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:31 -04:00
Trond Myklebust	950e0118d0	nfsd: Protect addition to the file_hashtbl Current code depends on the client_mutex to guarantee a single struct nfs4_file per inode in the file_hashtbl and make addition atomic with respect to lookup. Rely instead on the state_Lock, to make it easier to stop taking the client_mutex here later. To prevent an i_lock/state_lock inversion, change nfsd4_init_file to use ihold instead if igrab. That's also more efficient anyway as we definitely hold a reference to the inode at that point. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:30 -04:00
Christoph Hellwig	7e6a72e5f1	nfsd: fix file access refcount leak when nfsd4_truncate fails nfsd4_process_open2 will currently will get access to the file, and then call nfsd4_truncate to (possibly) truncate it. If that operation fails though, then the access references will never be released as the nfs4_ol_stateid is never initialized. Fix by moving the nfsd4_truncate call into nfs4_get_vfs_file, ensuring that the refcounts are properly put if the truncate fails. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:29 -04:00
Kinglong Mee	1055414fe1	NFSD: Avoid warning message when compile at i686 arch fs/nfsd/nfs4xdr.c: In function 'nfsd4_encode_readv': >> fs/nfsd/nfs4xdr.c:3137:148: warning: comparison of distinct pointer types lacks a cast [enabled by default] thislen = min(len, ((void )xdr->end - (void )xdr->p)); Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:28 -04:00
J. Bruce Fields	d5e2338324	nfsd4: replace defer_free by svcxdr_tmpalloc Avoid an extra allocation for the tmpbuf struct itself, and stop ignoring some allocation failures. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:27 -04:00
J. Bruce Fields	bcaab953b1	nfsd4: remove nfs4_acl_new This is a not-that-useful kmalloc wrapper. And I'd like one of the callers to actually use something other than kmalloc. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:27 -04:00
J. Bruce Fields	29c353b3fe	nfsd4: define svcxdr_dupstr to share some common code Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:26 -04:00
J. Bruce Fields	ce043ac826	nfsd4: remove unused defer_free argument `28e05dd845` "knfsd: nfsd4: represent nfsv4 acl with array instead of linked list" removed the last user that wanted a custom free function. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:25 -04:00
J. Bruce Fields	7fb84306f5	nfsd4: rename cr_linkname->cr_data The name of a link is currently stored in cr_name and cr_namelen, and the content in cr_linkname and cr_linklen. That's confusing. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:24 -04:00
J. Bruce Fields	52ee04330f	nfsd: let nfsd_symlink assume null-terminated data Currently nfsd_symlink has a weird hack to serve callers who don't null-terminate symlink data: it looks ahead at the next byte to see if it's zero, and copies it to a new buffer to null-terminate if not. That means callers don't have to null-terminate, but they do have to ensure that the byte following the end of the data is theirs to read. That's a bit subtle, and the NFSv4 code actually got this wrong. So let's just throw out that code and let callers pass null-terminated strings; we've already fixed them to do that. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:23 -04:00
J. Bruce Fields	0aeae33f5d	nfsd: make NFSv2 null terminate symlink data It's simple enough for NFSv2 to null-terminate the symlink data. A bit weird (it depends on knowing that we've already read the following byte, which is either padding or part of the mode), but no worse than the conditional kstrdup it otherwise relies on in nfsd_symlink(). Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:23 -04:00
J. Bruce Fields	b829e9197a	nfsd: fix rare symlink decoding bug An NFS operation that creates a new symlink includes the symlink data, which is xdr-encoded as a length followed by the data plus 0 to 3 bytes of zero-padding as required to reach a 4-byte boundary. The vfs, on the other hand, wants null-terminated data. The simple way to handle this would be by copying the data into a newly allocated buffer with space for the final null. The current nfsd_symlink code tries to be more clever by skipping that step in the (likely) case where the byte following the string is already 0. But that assumes that the byte following the string is ours to look at. In fact, it might be the first byte of a page that we can't read, or of some object that another task might modify. Worse, the NFSv4 code tries to fix the problem by actually writing to that byte. In the NFSv2/v3 cases this actually appears to be safe: - nfs3svc_decode_symlinkargs explicitly null-terminates the data (after first checking its length and copying it to a new page). - NFSv2 limits symlinks to 1k. The buffer holding the rpc request is always at least a page, and the link data (and previous fields) have maximum lengths that prevent the request from reaching the end of a page. In the NFSv4 case the CREATE op is potentially just one part of a long compound so can end up on the end of a page if you're unlucky. The minimal fix here is to copy and null-terminate in the NFSv4 case. The nfsd_symlink() interface here seems too fragile, though. It should really either do the copy itself every time or just require a null-terminated string. Reported-by: Jeff Layton <jlayton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:22 -04:00
Kinglong Mee	c3a4561796	nfsd: Fix bad reserving space for encoding rdattr_error Introduced by commit `561f0ed498` (nfsd4: allow large readdirs). Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-07 14:16:31 -04:00
Avi Kivity	69bbd9c7b9	nfs: fix nfs4d readlink truncated packet XDR requires 4-byte alignment; nfs4d READLINK reply writes out the padding, but truncates the packet to the padding-less size. Fix by taking the padding into consideration when truncating the packet. Symptoms: # ll /mnt/ ls: cannot read symbolic link /mnt/test: Input/output error total 4 -rw-r--r--. 1 root root 0 Jun 14 01:21 123456 lrwxrwxrwx. 1 root root 6 Jul 2 03:33 test drwxr-xr-x. 1 root root 0 Jul 2 23:50 tmp drwxr-xr-x. 1 root root 60 Jul 2 23:44 tree Signed-off-by: Avi Kivity <avi@cloudius-systems.com> Fixes: `476a7b1f4b` (nfsd4: don't treat readlink like a zero-copy operation) Reviewed-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-02 17:37:13 -04:00
J. Bruce Fields	76f47128f9	nfsd: fix rare symlink decoding bug An NFS operation that creates a new symlink includes the symlink data, which is xdr-encoded as a length followed by the data plus 0 to 3 bytes of zero-padding as required to reach a 4-byte boundary. The vfs, on the other hand, wants null-terminated data. The simple way to handle this would be by copying the data into a newly allocated buffer with space for the final null. The current nfsd_symlink code tries to be more clever by skipping that step in the (likely) case where the byte following the string is already 0. But that assumes that the byte following the string is ours to look at. In fact, it might be the first byte of a page that we can't read, or of some object that another task might modify. Worse, the NFSv4 code tries to fix the problem by actually writing to that byte. In the NFSv2/v3 cases this actually appears to be safe: - nfs3svc_decode_symlinkargs explicitly null-terminates the data (after first checking its length and copying it to a new page). - NFSv2 limits symlinks to 1k. The buffer holding the rpc request is always at least a page, and the link data (and previous fields) have maximum lengths that prevent the request from reaching the end of a page. In the NFSv4 case the CREATE op is potentially just one part of a long compound so can end up on the end of a page if you're unlucky. The minimal fix here is to copy and null-terminate in the NFSv4 case. The nfsd_symlink() interface here seems too fragile, though. It should really either do the copy itself every time or just require a null-terminated string. Reported-by: Jeff Layton <jlayton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-27 16:10:46 -04:00
Jeff Layton	d4c8e34fe8	nfsd: properly handle embedded newlines in fault_injection input Currently rpc_pton() fails to handle the case where you echo an address into the file, as it barfs on the newline. Ensure that we NULL out the first occurrence of any newline. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-23 11:31:38 -04:00
Jeff Layton	f7ce5d2842	nfsd: fix return of nfs4_acl_write_who AFAICT, the only way to hit this error is to pass this function a bogus "who" value. In that case, we probably don't want to return -1 as that could get sent back to the client. Turn this into nfserr_serverfault, which is a more appropriate error for a server bug like this. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-23 11:31:38 -04:00
Jeff Layton	94ec938b61	nfsd: add appropriate __force directives to filehandle generation code The filehandle structs all use host-endian values, but will sometimes stuff big-endian values into those fields. This is OK since these values are opaque to the client, but it confuses sparse. Add __force to make it clear that we are doing this intentionally. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-23 11:31:37 -04:00
Jeff Layton	e2afc81919	nfsd: nfsd_splice_read and nfsd_readv should return __be32 The callers expect a __be32 return and the functions they call return __be32, so having these return int is just wrong. Also, nfsd_finish_read can be made static. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-23 11:31:37 -04:00
Jeff Layton	b3d8d1284a	nfsd: clean up sparse endianness warnings in nfscache.c We currently hash the XID to determine a hash bucket to use for the reply cache entry, which is fed into hash_32 without byte-swapping it. Add __force to make sparse happy, and add some comments to explain why. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-23 11:31:37 -04:00
Jeff Layton	f419992c1f	nfsd: add __force to opaque verifier field casts sparse complains that we're stuffing non-byte-swapped values into __be32's here. Since they're supposed to be opaque, it doesn't matter much. Just add __force to make sparse happy. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-23 11:31:37 -04:00
Kinglong Mee	bf18f163e8	NFSD: Using exp_get for export getting Don't using cache_get besides export.h, using exp_get for export. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-23 11:31:36 -04:00
Kinglong Mee	0da22a919d	NFSD: Using path_get when assigning path for export Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-23 11:31:36 -04:00
Kinglong Mee	f15a5cf912	SUNRPC/NFSD: Change to type of bool for rq_usedeferral and rq_splice_ok rq_usedeferral and rq_splice_ok are used as 0 and 1, just defined to bool. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-23 11:31:36 -04:00
Kinglong Mee	3c7aa15d20	NFSD: Using min/max/min_t/max_t for calculate Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-23 11:31:36 -04:00
Kinglong Mee	f41c5ad2ff	NFSD: fix bug for readdir of pseudofs Commit `561f0ed498` (nfsd4: allow large readdirs) introduces a bug about readdir the root of pseudofs. Call xdr_truncate_encode() revert encoded name when skipping. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-17 16:42:48 -04:00
NeilBrown	6282cd5655	NFSD: Don't hand out delegations for 30 seconds after recalling them. If nfsd needs to recall a delegation for some reason it implies that there is contention on the file, so further delegations should not be handed out. The current code fails to do so, and the result is effectively a live-lock under some workloads: a client attempting a conflicting operation on a read-delegated file receives NFS4ERR_DELAY and retries the operation, but by the time it retries the server may already have given out another delegation. We could simply avoid delegations for (say) 30 seconds after any recall, but this is probably too heavy handed. We could keep a list of inodes (or inode numbers or filehandles) for recalled delegations, but that requires memory allocation and searching. The approach taken here is to use a bloom filter to record the filehandles which are currently blocked from delegation, and to accept the cost of a few false positives. We have 2 bloom filters, each of which is valid for 30 seconds. When a delegation is recalled the filehandle is added to one filter and will remain disabled for between 30 and 60 seconds. We keep a count of the number of filehandles that have been added, so when that count is zero we can bypass all other tests. The bloom filters have 256 bits and 3 hash functions. This should allow a couple of dozen blocked filehandles with minimal false positives. If many more filehandles are all blocked at once, behaviour will degrade towards rejecting all delegations for between 30 and 60 seconds, then resetting and allowing new delegations. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-17 16:42:47 -04:00
J. Bruce Fields	48385408b4	nfsd4: fix FREE_STATEID lockowner leak `27b11428b7` ("nfsd4: remove lockowner when removing lock stateid") introduced a memory leak. Cc: stable@vger.kernel.org Reported-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-09 17:13:54 -04:00
Jeff Layton	1b19453d1c	nfsd: don't halt scanning the DRC LRU list when there's an RC_INPROG entry Currently, the DRC cache pruner will stop scanning the list when it hits an entry that is RC_INPROG. It's possible however for a call to take a very long time. In that case, we don't want it to block other entries from being pruned if they are expired or we need to trim the cache to get back under the limit. Fix the DRC cache pruner to just ignore RC_INPROG entries. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-06 19:22:49 -04:00
J. Bruce Fields	542d1ab3c7	nfsd4: kill READ64 Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-06 19:22:48 -04:00
J. Bruce Fields	06553991e7	nfsd4: kill READ32 While we're here, let's kill off a couple of the read-side macros. Leaving the more complicated ones alone for now. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-06 19:22:47 -04:00
J. Bruce Fields	05638dc73a	nfsd4: simplify server xdr->next_page use The rpc code makes available to the NFS server an array of pages to encod into. The server represents its reply as an xdr buf, with the head pointing into the first page in that array, the pages ** array starting just after that, and the tail (if any) sharing any leftover space in the page used by the head. While encoding, we use xdr_stream->page_ptr to keep track of which page we're currently using. Currently we set xdr_stream->page_ptr to buf->pages, which makes the head a weird exception to the rule that page_ptr always points to the page we're currently encoding into. So, instead set it to buf->pages - 1 (the page actually containing the head), and remove the need for a little unintuitive logic in xdr_get_next_encode_buffer() and xdr_truncate_encode. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-06 19:22:46 -04:00
Benny Halevy	3fb87d13ce	nfsd4: hash deleg stateid only on successful nfs4_set_delegation We don't want the stateid to be found in the hash table before the delegation is granted. Currently this is protected by the client_mutex, but we want to break that up and this is a necessary step toward that goal. Signed-off-by: Benny Halevy <bhalevy@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-04 15:42:04 -04:00
Benny Halevy	cdc9750500	nfsd4: rename recall_lock to state_lock ...as the name is a bit more descriptive and we've started using it for other purposes. Signed-off-by: Benny Halevy <bhalevy@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-04 15:42:04 -04:00
Jeff Layton	7025005d5e	nfsd: remove unneeded zeroing of fields in nfsd4_proc_compound The memset of resp in svc_process_common should ensure that these are already zeroed by the time they get here. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-04 15:42:03 -04:00
Jeff Layton	ba5378b66f	nfsd: fix setting of NFS4_OO_CONFIRMED in nfsd4_open In the NFS4_OPEN_CLAIM_PREVIOUS case, we should only mark it confirmed if the nfs4_check_open_reclaim check succeeds. In the NFS4_OPEN_CLAIM_DELEG_PREV_FH and NFS4_OPEN_CLAIM_DELEGATE_PREV cases, I see no point in declaring the openowner confirmed when the operation is going to fail anyway, and doing so might allow the client to game things such that it wouldn't need to confirm a subsequent open with the same owner. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-04 15:42:02 -04:00
Benny Halevy	931ee56c67	nfsd4: use recall_lock for delegation hashing This fixes a bug in the handling of the fi_delegations list. nfs4_setlease does not hold the recall_lock when adding to it. The client_mutex is held, which prevents against concurrent list changes, but nfsd_break_deleg_cb does not hold while walking it. New delegations could theoretically creep onto the list while we're walking it there. Signed-off-by: Benny Halevy <bhalevy@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-04 15:41:52 -04:00
Jeff Layton	a832e7ae8b	nfsd: fix laundromat next-run-time calculation The laundromat uses two variables to calculate when it should next run, but one is completely ignored at the end of the run. Merge the two and rename the variable to be more descriptive of what it does. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 20:25:28 -04:00
Jeff Layton	da2ebce6a0	nfsd: make nfsd4_encode_fattr static sparse says: CHECK fs/nfsd/nfs4xdr.c fs/nfsd/nfs4xdr.c:2043:1: warning: symbol 'nfsd4_encode_fattr' was not declared. Should it be static? Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 20:25:28 -04:00
Kinglong Mee	a48fd0f9f7	SUNRPC/NFSD: Remove using of dprintk with KERN_WARNING When debugging, rpc prints messages from dprintk(KERN_WARNING ...) with "^A4" prefixed, [ 2780.339988] ^A4nfsd: connect from unprivileged port: 127.0.0.1, port=35316 Trond tells, > dprintk != printk. We have NEVER supported dprintk(KERN_WARNING...) This patch removes using of dprintk with KERN_WARNING. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 20:25:28 -04:00
Christoph Hellwig	b52bd7bccc	nfsd: remove unused function nfsd_read_file Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:27 -04:00
Christoph Hellwig	12337901d6	nfsd: getattr for FATTR4_WORD0_FILES_AVAIL needs the statfs buffer Note nobody's ever noticed because the typical client probably never requests FILES_AVAIL without also requesting something else on the list. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:26 -04:00
Kinglong Mee	be69da8052	NFSD: Error out when getting more than one fsloc/secinfo/uuid Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:25 -04:00
Kinglong Mee	1f53146da9	NFSD: Using type of uint32_t for ex_nflavors instead of int ex_nflavors can't be negative number, just defined by uint32_t. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:24 -04:00
Kinglong Mee	f0db79d54b	NFSD: Add missing comment of "expiry" in expkey_parse() Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:24 -04:00
Kinglong Mee	e6d615f742	NFSD: Remove typedef of svc_client and svc_export in export.c No need for a typedef wrapper for svc_export or svc_client, remove them. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:23 -04:00
Kinglong Mee	a30ae94c07	NFSD: Cleanup unneeded including net/ipv6.h Commit `49b28684fd` ("nfsd: Remove deprecated nfsctl system call and related code") removed the only use of ipv6_addr_set_v4mapped(), so net/ipv6.h is unneeded now. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:22 -04:00
Kinglong Mee	61a27f08a6	NFSD: Cleanup unused variable in nfsd_setuser() Commit `8f6c5ffc89` ("kernel/groups.c: remove return value of set_groups") removed the last use of "ret". Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:21 -04:00
Kinglong Mee	0faed901c6	NFSD: remove unneeded linux/user_namespace.h include After commit `4c1e1b34d5` ("nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids") using kuid/kgid for ex_anon_uid/ex_anon_gid, user_namespace.h is not needed. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:20 -04:00
Kinglong Mee	94eb36892d	NFSD: Adds macro EX_UUID_LEN for exports uuid's length Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:19 -04:00
Kinglong Mee	0d63790c36	NFSD: Helper function for parsing uuid Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:19 -04:00
Kinglong Mee	a1f05514b0	NFS4: Avoid NULL reference or double free in nfsd4_fslocs_free() If fsloc_parse() failed at kzalloc(), fs/nfsd/export.c 411 412 fsloc->locations = kzalloc(fsloc->locations_count 413 * sizeof(struct nfsd4_fs_location), GFP_KERNEL); 414 if (!fsloc->locations) 415 return -ENOMEM; svc_export_parse() will call nfsd4_fslocs_free() with fsloc->locations = NULL, so that, "kfree(fsloc->locations[i].path);" will cause a crash. If fsloc_parse() failed after that, fsloc_parse() will call nfsd4_fslocs_free(), and svc_export_parse() will call it again, so that, a double free is caused. This patch checks the fsloc->locations, and set to NULL after it be freed. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:18 -04:00
J. Bruce Fields	a5cddc885b	nfsd4: better reservation of head space for krb5 RPC_MAX_AUTH_SIZE is scattered around several places. Better to set it once in the auth code, where this kind of estimate should be made. And while we're at it we can leave it zero when we're not using krb5i or krb5p. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:17 -04:00
J. Bruce Fields	d05d5744ef	nfsd4: kill write32, write64 And switch a couple other functions from the encode(&p,...) convention to the p = encode(p,...) convention mostly used elsewhere. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:16 -04:00
J. Bruce Fields	0c0c267ba9	nfsd4: kill WRITEMEM Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:15 -04:00
J. Bruce Fields	b64c7f3bdf	nfsd4: kill WRITE64 Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:14 -04:00
J. Bruce Fields	c373b0a428	nfsd4: kill WRITE32 These macros just obscure what's going on. Adopt the convention of the client-side code. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:13 -04:00
J. Bruce Fields	c8f13d9775	nfsd4: really fix nfs4err_resource in 4.1 case encode_getattr, for example, can return nfserr_resource to indicate it ran out of buffer space. That's not a legal error in the 4.1 case. And in the 4.1 case, if we ran out of buffer space, we should have exceeded a session limit too. (Note in `1bc49d83c3` "nfsd4: fix nfs4err_resource in 4.1 case" we originally tried fixing this error return before fixing the problem that we could error out while we still had lots of available space. The result was to trade one illegal error for another in those cases. We decided that was helpful, so reverted the change in `fc208d026b`, and are only reinstating it now that we've elimited almost all of those cases.) Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:13 -04:00
J. Bruce Fields	b042098063	nfsd4: allow exotic read compounds I'm not sure why a client would want to stuff multiple reads in a single compound rpc, but it's legal for them to do it, and we should really support it. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:12 -04:00
J. Bruce Fields	fec25fa4ad	nfsd4: more read encoding cleanup More cleanup, no change in functionality. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:11 -04:00
J. Bruce Fields	34a78b488f	nfsd4: read encoding cleanup Trivial cleanup, no change in functionality. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:10 -04:00
J. Bruce Fields	dc97618ddd	nfsd4: separate splice and readv cases The splice and readv cases are actually quite different--for example the former case ignores the array of vectors we build up for the latter. It is probably clearer to separate the two cases entirely. There's some code duplication between the split out encoders, but this is only temporary and will be fixed by a later patch. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:09 -04:00
J. Bruce Fields	02fe470774	nfsd4: nfsd_vfs_read doesn't use file handle parameter Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:09 -04:00
J. Bruce Fields	b0e35fda82	nfsd4: turn off zero-copy-read in exotic cases We currently allow only one read per compound, with operations before and after whose responses will require no more than about a page to encode. While we don't expect clients to violate those limits any time soon, this limitation isn't really condoned by the spec, so to future proof the server we should lift the limitation. At the same time we'd like to continue to support zero-copy reads. Supporting multiple zero-copy-reads per compound would require a new data structure to replace struct xdr_buf, which can represent only one set of included pages. So for now we plan to modify encode_read() to support either zero-copy or non-zero-copy reads, and use some heuristics at the start of the compound processing to decide whether a zero-copy read will work. This will allow us to support more exotic compounds without introducing a performance regression in the normal case. Later patches handle those "exotic compounds", this one just makes sure zero-copy is turned off in those cases. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:08 -04:00
J. Bruce Fields	ccae70a9ee	nfsd4: estimate sequence response size Otherwise a following patch would turn off all 4.1 zero-copy reads. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:07 -04:00
J. Bruce Fields	b86cef60da	nfsd4: better estimate of getattr response size We plan to use this estimate to decide whether or not to allow zero-copy reads. Currently we're assuming all getattr's are a page, which can be both too small (ACLs e.g. may be arbitrarily long) and too large (after an upcoming read patch this will unnecessarily prevent zero copy reads in any read compound also containing a getattr). Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:06 -04:00
J. Bruce Fields	476a7b1f4b	nfsd4: don't treat readlink like a zero-copy operation There's no advantage to this zero-copy-style readlink encoding, and it unnecessarily limits the kinds of compounds we can handle. (In practice I can't see why a client would want e.g. multiple readlink calls in a comound, but it's probably a spec violation for us not to handle it.) Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:05 -04:00
J. Bruce Fields	3b29970909	nfsd4: enforce rd_dircount As long as we're here, let's enforce the protocol's limit on the number of directory entries to return in a readdir. I don't think anyone's ever noticed our lack of enforcement, but maybe there's more of a chance they will now that we allow larger readdirs. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:04 -04:00
J. Bruce Fields	561f0ed498	nfsd4: allow large readdirs Currently we limit readdir results to a single page. This can result in a performance regression compared to NFSv3 when reading large directories. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:03 -04:00
J. Bruce Fields	32aaa62ede	nfsd4: use session limits to release send buffer reservation Once we know the limits the session places on the size of the rpc, we can also use that information to release any unnecessary reserved reply buffer space. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:02 -04:00
J. Bruce Fields	47ee529864	nfsd4: adjust buflen to session channel limit We can simplify session limit enforcement by restricting the xdr buflen to the session size. Also fix a preexisting bug: we should really have been taking into account the auth-required space when comparing against session limits, which are limits on the size of the entire rpc reply, including any krb5 overhead. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:02 -04:00
J. Bruce Fields	30596768b3	nfsd4: fix buflen calculation after read encoding We don't necessarily want to assume that the buflen is the same as the number of bytes available in the pages. We may have some reason to set it to something less (for example, later patches will use a smaller buflen to enforce session limits). So, calculate the buflen relative to the previous buflen instead of recalculating it from scratch. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:00 -04:00
J. Bruce Fields	89ff884ebb	nfsd4: nfsd4_check_resp_size should check against whole buffer Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:31:59 -04:00
J. Bruce Fields	6ff9897d2b	nfsd4: minor encode_read cleanup Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:31:58 -04:00
J. Bruce Fields	4f0cefbf38	nfsd4: more precise nfsd4_max_reply It will turn out to be useful to have a more accurate estimate of reply size; so, piggyback on the existing op reply-size estimators. Also move nfsd4_max_reply to nfs4proc.c to get easier access to struct nfsd4_operation and friends. (Thanks to Christoph Hellwig for pointing out that simplification.) Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:31:57 -04:00
J. Bruce Fields	8c7424cff6	nfsd4: don't try to encode conflicting owner if low on space I ran into this corner case in testing: in theory clients can provide state owners up to 1024 bytes long. In the sessions case there might be a risk of this pushing us over the DRC slot size. The conflicting owner isn't really that important, so let's humor a client that provides a small maxresponsize_cached by allowing ourselves to return without the conflicting owner instead of outright failing the operation. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:31:55 -04:00
J. Bruce Fields	f5236013a2	nfsd4: convert 4.1 replay encoding Limits on maxresp_sz mean that we only ever need to replay rpc's that are contained entirely in the head. The one exception is very small zero-copy reads. That's an odd corner case as clients wouldn't normally ask those to be cached. in any case, this seems a little more robust. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:31:55 -04:00
J. Bruce Fields	2825a7f907	nfsd4: allow encoding across page boundaries After this we can handle for example getattr of very large ACLs. Read, readdir, readlink are still special cases with their own limits. Also we can't handle a new operation starting close to the end of a page. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:31:54 -04:00
J. Bruce Fields	a8095f7e80	nfsd4: size-checking cleanup Better variable name, some comments, etc. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:31:53 -04:00
J. Bruce Fields	ea8d7720b2	nfsd4: remove redundant encode buffer size checking Now that all op encoders can handle running out of space, we no longer need to check the remaining size for every operation; only nonidempotent operations need that check, and that can be done by nfsd4_check_resp_size. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:31:52 -04:00
J. Bruce Fields	67492c9903	nfsd4: nfsd4_check_resp_size needn't recalculate length We're keeping the length updated as we go now, so there's no need for the extra calculation here. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:31:51 -04:00
J. Bruce Fields	4e21ac4b6f	nfsd4: reserve space before inlining 0-copy pages Once we've included page-cache pages in the encoding it's difficult to remove them and restart encoding. (xdr_truncate_encode doesn't handle that case.) So, make sure we'll have adequate space to finish the operation first. For now COMPOUND_SLACK_SPACE checks should prevent this case happening, but we want to remove those checks. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:31:50 -04:00
J. Bruce Fields	d0a381dd0e	nfsd4: teach encoders to handle reserve_space failures We've tried to prevent running out of space with COMPOUND_SLACK_SPACE and special checking in those operations (getattr) whose result can vary enormously. However: - COMPOUND_SLACK_SPACE may be difficult to maintain as we add more protocol. - BUG_ON or page faulting on failure seems overly fragile. - Especially in the 4.1 case, we prefer not to fail compounds just because the returned result came close to session limits. (Though perfect enforcement here may be difficult.) - I'd prefer encoding to be uniform for all encoders instead of having special exceptions for encoders containing, for example, attributes. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:31:49 -04:00
J. Bruce Fields	082d4bd72a	nfsd4: "backfill" using write_bytes_to_xdr_buf Normally xdr encoding proceeds in a single pass from start of a buffer to end, but sometimes we have to write a few bytes to an earlier position. Use write_bytes_to_xdr_buf for these cases rather than saving a pointer to write to. We plan to rewrite xdr_reserve_space to handle encoding across page boundaries using a scratch buffer, and don't want to risk writing to a pointer that was contained in a scratch buffer. Also it will no longer be safe to calculate lengths by subtracting two pointers, so use xdr_buf offsets instead. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:31:49 -04:00

... 2 3 4 5 6 ...

2195 Commits (160a117f0864871ae1bab26554a985a1d2861afd)