Commit graph

4770 commits

Author SHA1 Message Date
Trond Myklebust cb06793517 pNFS/flexfiles: Fix ff_layout_add_ds_error_locked()
When we're merging an old entry into our new entry, we want to ensure that
we add the list entry in the correct place.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-07 13:41:58 -05:00
NeilBrown 7a0566b38c NFSv4: Add missing nfs_put_lock_context()
Otherwise the lock context won't be freed when we're done with it.

From: NeilBrown <neilb@suse.com>
Fixes: 5bd3f817 ("NFSv4: change nfs4_select_rw_stateid to take a lock_context inplace of lock_owner")
Signed-off-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-07 13:41:58 -05:00
Trond Myklebust 362fb578a5 pNFS: Release NFS_LAYOUT_RETURN when invalidating the layout stateid
Ensure we release the NFS_LAYOUT_RETURN lock when we invalidate the
layout stateid, so that processes and RPC tasks that are waiting on
the layout return can continue.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-05 22:52:01 -05:00
Trond Myklebust d94cbf6c73 NFSv4.1: Don't schedule lease recovery in nfs4_schedule_session_recovery()
If the session has an error, then we want to start by recovering the
session, as any SEQUENCE we send is going to fail with a session
error.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-04 19:34:38 -05:00
Trond Myklebust 2cf10cdd48 NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE
In the case where SEQUENCE receives a NFS4ERR_BADSESSION or
NFS4ERR_DEADSESSION error, we just want to report the session as needing
recovery, and then we want to retry the operation.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-04 19:26:40 -05:00
Trond Myklebust 1cd9cb05f9 NFS: Only look at the change attribute cache state in nfs_check_verifier
When looking at whether or not our dcache is valid, we really don't care
about the general state of the directory attribute cache. Instead, we
we only care about the state of the change attribute.

This fixes a performance issue when the client is responsible for
changing the directory contents; a number of NFSv4 operations will
atomically update the directory change attribute, but may not return
all the other attributes.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-04 18:34:34 -05:00
Trond Myklebust 9310b224f2 NFS: Fix incorrect size revalidation when holding a delegation
We should only care about checking the attributes if the page cache
is marked as dubious (using NFS_INO_REVAL_PAGECACHE) and the
NFS_INO_REVAL_FORCED flag is set.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-04 18:08:40 -05:00
Trond Myklebust 10727772b9 NFS: Fix incorrect mapping revalidation when holding a delegation
We should only care about checking the attributes if the page cache
is marked as dubious (using NFS_INO_REVAL_PAGECACHE) and the
NFS_INO_REVAL_FORCED flag is set.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-04 16:50:09 -05:00
Trond Myklebust 230bc962a6 pNFS/flexfiles: Support sending layoutstats in layoutreturn
Add the ability to send an array of layoutstats entries as part of
layoutreturn.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-03 15:37:46 -05:00
Trond Myklebust 422c93c881 pNFS/flexfiles: Minor refactoring before adding iostats to layoutreturn
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-03 15:37:45 -05:00
Trond Myklebust 2f8220c16e NFS: Fix up read of mirror stats
Need to lock while reading in order to ensure 64-bit reads are correct.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-03 15:37:44 -05:00
Trond Myklebust 08e2e5bc6c pNFS/flexfiles: Clean up layoutstats
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-03 15:37:44 -05:00
Trond Myklebust 5b9b3c855a pNFS/flexfiles: Refactor encoding of the layoutreturn payload
Add the layout error payload to the flexfiles layoutreturn private
data, and set up the encoding mechanisms. This is a refactoring in
preparation for adding the layout iostats payload.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-03 15:37:43 -05:00
Trond Myklebust 287bd3e954 pNFS: Add a layoutreturn callback to performa layout-private setup
Add a callback to allow the flexfiles layout driver to initialise the
layout private payload.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-03 13:12:16 -05:00
Trond Myklebust 4d796d751c pNFS: Allow layout drivers to manage private data in struct nfs4_layoutreturn
Cleanup to allow layout drivers to attach private data to layoutreturn,
and manage the data.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-02 23:37:45 -05:00
Trond Myklebust 06946c6a3d pNFS/flexfiles: Only send layoutstats updates for mirrors that were updated
If there have been no reads or writes to a given mirror since the last
layoutstats update, then don't resend the same data.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-02 11:42:58 -05:00
Trond Myklebust 46c98c6d1b pNFS/flexfiles: Don't attempt to send layoutstats if there are no entries
If the list of mirrors is empty, then don't send an RPC call.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-02 11:42:58 -05:00
Trond Myklebust 1bcf4c5c59 NFS: Allow getattr to also report readdirplus cache hits
If the use called stat() on an 'ls -l' workload, and the attribute
cache was successfully revalidate by READDIRPLUS, then we want to
report that back so that the readdir code continues to use
readdirplus.

Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-02 11:42:51 -05:00
Trond Myklebust 63519fbc67 NFS: Be more targeted about readdirplus use when doing lookup/revalidation
There is little point in setting NFS_INO_ADVISE_RDPLUS in nfs_lookup and
nfs_lookup_revalidate() unless a process is actually doing readdir on the
parent directory.
Furthermore, there is little point in using readdirplus if we're trying
to revalidate a negative dentry.

Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-02 11:42:51 -05:00
Trond Myklebust 79f687a3de NFS: Fix a performance regression in readdir
Ben Coddington reports that commit 311324ad17, by adding the function
nfs_dir_mapping_need_revalidate() that checks page cache validity on
each call to nfs_readdir() causes a performance regression when
the directory is being modified.

If the directory is changing while we're iterating through the directory,
POSIX does not require us to invalidate the page cache unless the user
calls rewinddir(). However, we still do want to ensure that we use
readdirplus in order to avoid a load of stat() calls when the user
is doing an 'ls -l' workload.

The fix should be to invalidate the page cache immediately when we're
setting the NFS_INO_ADVISE_RDPLUS bit.

Reported-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 311324ad17 ("NFS: Be more aggressive in using readdirplus...")
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-02 11:42:50 -05:00
Wei Yongjun f36ab161be NFS: fix typo in parameter description
Fix typo in parameter description.

Fixes: 5405fc44c3 ("NFSv4.x: Add kernel parameter to control the
callback server")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 18:00:11 -05:00
NeilBrown d51fdb87a6 NFS: discard nfs_lockowner structure.
It now has only one field and is only used in one structure.
So replaced it in that structure by the field it contains.

Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:58:13 -05:00
NeilBrown 8d42443166 NFSv4: enhance nfs4_copy_lock_stateid to use a flock stateid if there is one
A process can have two possible lock owner for a given open file:
a per-process Posix lock owner and a per-open-file flock owner
Use both of these when searching for a suitable stateid to use.

With this patch, READ/WRITE requests will use the correct stateid
if a flock lock is active.

Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:58:05 -05:00
NeilBrown 1739347549 NFSv4: change nfs4_select_rw_stateid to take a lock_context inplace of lock_owner
The only time that a lock_context is not immediately available is in
setattr, and now that it has an open_context, it can easily find one
with nfs_get_lock_context.
This removes the need for the on-stack nfs_lockowner.

This change is preparation for correctly support flock stateids.

Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:57:56 -05:00
NeilBrown 29b59f9416 NFSv4: change nfs4_do_setattr to take an open_context instead of a nfs4_state.
The open_context can always lead directly to the state, and is always easily
available, so this is a straightforward change.
Doing this makes more information available to _nfs4_do_setattr() for use
in the next patch.

Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:57:45 -05:00
NeilBrown 532d4def2f NFSv4: add flock_owner to open context
An open file description (struct file) in a given process can be
associated with two different lock owners.

It can have a Posix lock owner which will be different in each process
that has a fd on the file.
It can have a Flock owner which will be the same in all processes.

When searching for a lock stateid to use, we need to consider both of these
owners

So add a new "flock_owner" to the "nfs_open_context" (of which there
is one for each open file description).

This flock_owner does not need to be reference-counted as there is a
1-1 relation between 'struct file' and nfs open contexts,
and it will never be part of a list of contexts.  So there is no need
for a 'flock_context' - just the owner is enough.

The io_count included in the (Posix) lock_context provides no
guarantee that all read-aheads that could use the state have
completed, so not supporting it for flock locks in not a serious
problem.  Synchronization between flock and read-ahead can be added
later if needed.

When creating an open_context for a non-openning create call, we don't have
a 'struct file' to pass in, so the lock context gets initialized with
a NULL owner, but this will never be used.

The flock_owner is not used at all in this patch, that will come later.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:57:27 -05:00
NeilBrown b184b5c38e NFS: remove l_pid field from nfs_lockowner
this field is not used in any important way and probably should
have been removed by

Commit: 8003d3c4aa ("nfs4: treat lock owners as opaque values")

which removed the pid argument from nfs4_get_lock_state.

Except in unusual and uninteresting cases, two threads with the same
->tgid will have the same ->files pointer, so keeping them both
for comparison brings no benefit.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:57:07 -05:00
Anna Schumaker 4d3b55d3c7 NFS: Remove unused argument from nfs_direct_write_complete()
This parameter hasn't been used since 2a009ec9 (Linux 3.13-rc3), so
let's remove it from this function.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:50:37 -05:00
Anna Schumaker 7d38de3ffa NFS: Remove unused authflavour parameter from nfs_get_client()
This parameter hasn't been used since f8407299 (Linux 3.11-rc2), so
let's remove it from this function and callers.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:46:32 -05:00
J. Bruce Fields ced85a7568 nfs: fix false positives in nfs40_walk_client_list()
It's possible that two different servers can return the same (clientid,
verifier) pair purely by coincidence.  Both are 64-bit values, but
depending on the server implementation, they can be highly predictable
and collisions may be quite likely, especially when there are lots of
servers.

So, check for this case.  If the clientid and verifier both match, then
we actually know they *can't* be the same server, since a new
SETCLIENTID to an already-known server should have changed the verifier.

This helps fix a bug that could cause the client to mount a filesystem
from the wrong server.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Yongcheng Yang <yoyang@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:43:31 -05:00
Trond Myklebust b85f562049 pNFS: Skip invalid stateids when doing a bulk destroy
If the layout stateid is already invalid, we have no work to do.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:51 -05:00
Trond Myklebust 29ade5db12 pNFS: Wait on outstanding layoutreturns to complete in pnfs_roc()
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:50 -05:00
Trond Myklebust abb3e1c877 pNFS: Don't mark the layout as freed if the last lseg is marked for return
Address another memory leak.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:50 -05:00
Trond Myklebust 4aab97327f pNFS: Sync the layout state bits in pnfs_cache_lseg_for_layoutreturn
Ensure that the layout state bits are synced when we cache a layout
segment for layoutreturn using an appropriate call to
pnfs_set_plh_return_info.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:49 -05:00
Trond Myklebust 24408f5282 pNFS: Fix bugs in _pnfs_return_layout
We need to honour the NFS_LAYOUT_RETURN_REQUESTED bit regardless of
whether or not there are layout segments pending.
Furthermore, we should ensure that we leave the plh_return_segs list
empty.

This patch fixes a memory leak of the layout segments on plh_return_segs.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:49 -05:00
Trond Myklebust fe1cf9469d pNFS: Clear all layout segment state in pnfs_mark_layout_stateid_invalid
When the layout state is invalidated, then so is the layout segment
state, and hence we do need to clean up the state bits.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:48 -05:00
Trond Myklebust 53e6fc86ab pNFS: Prevent unnecessary layoutreturns after delegreturn
If we cannot grab the inode or superblock, then we cannot pin the
layout header, and so we cannot send a layoutreturn as part of an
async delegreturn call. In this case, we currently end up sending
an extra layoutreturn after the delegreturn. Since the layout was
implicitly returned by the delegreturn, that just gets a BAD_STATEID.

The fix is to simply complete the return-on-close immediately.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:48 -05:00
Trond Myklebust 1c5bd76d17 pNFS: Enable layoutreturn operation for return-on-close
Amend the pnfs return on close helper functions to enable sending the
layoutreturn op in CLOSE/DELEGRETURN. This closes a potential race between
CLOSE/DELEGRETURN and parallel OPEN calls to the same file, and allows the
client and the server to agree on whether or not there is an outstanding
layout.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:47 -05:00
Trond Myklebust 828ed9ec1b pNFS: Clean up - add a helper to initialise struct layoutreturn_args
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:47 -05:00
Trond Myklebust 586f1c39da NFSv4: Add encode/decode of the layoutreturn op in DELEGRETURN
Add XDR encoding for the layoutreturn op, and storage for the layoutreturn
arguments to the DELEGRETURN compound.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:46 -05:00
Trond Myklebust cf80516579 NFSv4: Add encode/decode of the layoutreturn op in CLOSE
Add XDR encoding for the layoutreturn op, and storage for the layoutreturn
arguments to the CLOSE compound.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:46 -05:00
Trond Myklebust d8434d4c54 NFSv4: Fix missing operation accounting in NFS4_dec_delegreturn_sz
We need to account for the reply to the PUTFH operation in the
DELEGRETURN compound.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:45 -05:00
Trond Myklebust 69820d22c5 pNFS: Don't mark layout segments invalid on layoutreturn in pnfs_roc
The layoutreturn call will take care of invalidating the layout segments
once the call is successful.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:45 -05:00
Trond Myklebust 94e5c571fc pNFS: Get rid of unnecessary layout parameter in encode_layoutreturn callback
The parameter is already present in the "args" structure.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:44 -05:00
Trond Myklebust 0cdc329ec9 pNFS: Skip checking for return-on-close if the layout is invalid
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:44 -05:00
Trond Myklebust e685d237e6 pNFS: Remove spurious wake up in pnfs_layout_remove_lseg()
There is no change to the value of NFS_LAYOUT_RETURN, so we should
not be waking up the RPC call.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:43 -05:00
Trond Myklebust 2a974425e5 NFSv4: Ignore LAYOUTRETURN result if the layout doesn't match or is invalid
Fix a potential race with CB_LAYOUTRECALL in which the server recalls the
remaining layout segments while our LAYOUTRETURN is still in transit.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:43 -05:00
Trond Myklebust 68f744797e pNFS: Do not free layout segments that are marked for return
We may want to process and transmit layout stat information for the
layout segments that are being returned, so we should defer freeing
them until after the layoutreturn has completed.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:42 -05:00
Trond Myklebust 7b410d9ce4 pNFS: Delay getting the layout header in CB_LAYOUTRECALL handlers
Instead of grabbing the layout, we want to get the inode so that we
can reduce races between layoutget and layoutrecall when the server
does not support call referring.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:42 -05:00
Trond Myklebust 17822b207f pNFS: consolidate the different range intersection tests
Both pnfs.c and the flexfiles code have their own versions of the
range intersection testing, and the "end_offset" helper.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:41 -05:00
Trond Myklebust ee284e35d8 pNFS: Fix race in pnfs_wait_on_layoutreturn
We must put the task to sleep while holding the inode->i_lock in order
to ensure atomicity with the test for NFS_LAYOUT_RETURN.

Fixes: 500d701f33 ("NFS41: make close wait for layoutreturn")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:41 -05:00
Trond Myklebust 6604b203fb pNFS: On error, do not send LAYOUTGET until the LAYOUTRETURN has completed
If there is an I/O error, we should not call LAYOUTGET until the
LAYOUTRETURN that reports the error is complete.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.8+
2016-12-01 17:21:40 -05:00
Trond Myklebust 9888d837f3 pNFS: Force a retry of LAYOUTGET if the stateid doesn't match our cache
If the server sends us a completely new stateid, and the client thinks
it already holds a layout, then force a retry of the LAYOUTGET after
invalidating the existing layout in order to avoid corruption due to
races.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:40 -05:00
Trond Myklebust ae5a459d5f pNFS: Clear NFS_LAYOUT_RETURN_REQUESTED when invalidating the layout stateid
We must ensure that we don't schedule a layoutreturn if the layout stateid
has been marked as invalid.

Fixes: 2a59a04116 ("pNFS: Fix pnfs_set_layout_stateid() to clear...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.8+
2016-12-01 17:21:39 -05:00
Trond Myklebust 7b650994ab pNFS: Don't clear the layout stateid if a layout return is outstanding
If we no longer hold any layout segments, we're normally expected to
consider the layout stateid to be invalid. However we cannot assume this
if we're about to, or in the process of sending a layoutreturn.

Fixes: 334a8f3711 ("pNFS: Don't forget the layout stateid if...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.8+
2016-12-01 17:21:39 -05:00
Trond Myklebust 54e4a0dfa2 pNFS: Fix a deadlock between read resends and layoutreturn
We must not call nfs_pageio_init_read() on a new nfs_pageio_descriptor
while holding a reference to a layout segment, as that can deadlock
pnfs_update_layout().

Fixes: d67ae825a5 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.0+
2016-12-01 17:21:38 -05:00
Fred Isaman 9a837856cf NFSv4.1: Fix regression in callback retry handling
When initializing a freshly created slot for the calllback channel,
the seq_nr needs to be 0, not 1.  Otherwise validate_seqid
and nfs4_slot_wait_on_seqid get confused and believe that the
mpty slot corresponds to a previously sent reply.

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:38 -05:00
Trond Myklebust 1ad13dbc85 NFSv4: Optimise away forced revalidation when we know the attributes are OK
The NFS_INO_REVAL_FORCED flag needs to be set if we just got a delegation,
and we see that there might still be some ambiguity as to whether or not
our attribute or data cache are valid.
In practice, this means that a call to nfs_check_inode_attributes() will
have noticed a discrepancy between cached attributes and measured ones,
so let's move the setting of NFS_INO_REVAL_FORCED to there.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:37 -05:00
Trond Myklebust 3ecefc9295 NFSv4: Don't request close-to-open attribute when holding a delegation
If holding a delegation, we do not need to ask the server to return
close-to-open cache consistency attributes as part of the CLOSE
compound.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:37 -05:00
Trond Myklebust 3947b74d0f NFSv4: Don't request a GETATTR on open_downgrade.
If we're not closing the file completely, there is no need to request
close-to-open attributes.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:36 -05:00
Trond Myklebust 1cc1baf14b NFSv4: Don't ask for the change attribute when reclaiming state
We don't need to ask for the change attribute when returning a delegation
or recovering from a server reboot, and it could actually cause us to
obtain an incorrect value if we're using a pNFS flavour that requires
LAYOUTCOMMIT.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:36 -05:00
Trond Myklebust 536585ccf9 NFSv4: Don't check file access when reclaiming state
If we're reclaiming state after a reboot, or as part of returning a
delegation, we don't need to check access modes again.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:35 -05:00
David S. Miller 0b42f25d2f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
udplite conflict is resolved by taking what 'net-next' did
which removed the backlog receive method assignment, since
it is no longer necessary.

Two entries were added to the non-priv ethtool operations
switch statement, one in 'net' and one in 'net-next, so
simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-26 23:42:21 -05:00
Linus Torvalds 10b9dd5686 NFS client bugfixes for Linux 4.9 part 4
Stable Bugfixes:
 - Hide array-bounds warning
 
 Bugfixes:
 - Keep a reference on lock states while checking
 - Handle NFS4ERR_OLD_STATEID in nfs4_reclaim_open_state
 - Don't call close if the open stateid has already been cleared
 - Fix CLOSE rases with OPEN
 - Fix a regression in DELEGRETURN
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJYNhGKAAoJENfLVL+wpUDrGgEP/0okAGQfb7yHVNYjDpMmVh7u
 6T1Vh+xbIMsGmuLXPOJH3FRFDnPWCrZO77K+l1y5oMl1fW/hA5h07yt0g0wT94+u
 if1wunZ6bak6KFeevo4xphpqXCjLhwpe801SbBcJPY6D6YxMckobHR8NcuzTjFab
 Kc9OAjnpIzS2lJBThaeyavGGnrlhNvH+Le+zEgMv/bSBTiPSymLlpj12a88cuHRF
 hx2vBao3UuR1vaTaZ5Zdp954DtNXNo7Pikye11cvVJVhesNwpZe37SszcRZ1U6P4
 o4LnYf/ImkjDrcRyvFRxc6bu/Q1jLBuAYZjB4oMcx7YQW8rJqcS/UkEpGzOfER3i
 3NQXFqacIAGhULfJxF8W0vPGzKM74koa0HRRI34C10qZAPe06Iy8slkdIjM4t2IX
 ASJI+uyrbIqTQ/x3FObWlqvw4TCOntYFpOsHF6G8M0uj+tX+3iXjpmwDGsJDVyFE
 y+egnnVn9LmGGfg1SBU2VBKL2945e/VAWfHtDGmJYgEwNDiqtutoIMDn+szESX60
 yGLPJdIL3O7pTWmDXdSSpUJZ+wqa90rrU34kGmk3njydaNHeA1SEhcNTi2Ha5ALb
 NcVD0omnhrZUFE5MRY0OtmHRwhsaa9CYlMyqzb5SEeb46Z3KUm1KX9qEy4I4rZHG
 C4MlTY5AScHqqNXmT8Pu
 =YhQv
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.9-4' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client bugfixes from Anna Schumaker:
 "Most of these fix regressions or races, but there is one patch for
  stable that Arnd sent me

  Stable bugfix:
   - Hide array-bounds warning

  Bugfixes:
   - Keep a reference on lock states while checking
   - Handle NFS4ERR_OLD_STATEID in nfs4_reclaim_open_state
   - Don't call close if the open stateid has already been cleared
   - Fix CLOSE rases with OPEN
   - Fix a regression in DELEGRETURN"

* tag 'nfs-for-4.9-4' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFSv4.x: hide array-bounds warning
  NFSv4.1: Keep a reference on lock states while checking
  NFSv4.1: Handle NFS4ERR_OLD_STATEID in nfs4_reclaim_open_state
  NFSv4: Don't call close if the open stateid has already been cleared
  NFSv4: Fix CLOSE races with OPEN
  NFSv4.1: Fix a regression in DELEGRETURN
2016-11-23 14:43:40 -08:00
Arnd Bergmann d55b352b01 NFSv4.x: hide array-bounds warning
A correct bugfix introduced a harmless warning that shows up with gcc-7:

fs/nfs/callback.c: In function 'nfs_callback_up':
fs/nfs/callback.c:214:14: error: array subscript is outside array bounds [-Werror=array-bounds]

What happens here is that the 'minorversion == 0' check tells the
compiler that we assume minorversion can be something other than 0,
but when CONFIG_NFS_V4_1 is disabled that would be invalid and
result in an out-of-bounds access.

The added check for IS_ENABLED(CONFIG_NFS_V4_1) tells gcc that this
really can't happen, which makes the code slightly smaller and also
avoids the warning.

The bugfix that introduced the warning is marked for stable backports,
we want this one backported to the same releases.

Fixes: 98b0f80c23 ("NFSv4.x: Fix a refcount leak in nfs_callback_up_net")
Cc: stable@vger.kernel.org # v3.7+
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-11-22 16:11:44 -05:00
Benjamin Coddington d75a6a0e39 NFSv4.1: Keep a reference on lock states while checking
While walking the list of lock_states, keep a reference on each
nfs4_lock_state to be checked, otherwise the lock state could be removed
while the check performs TEST_STATEID and possible FREE_STATEID.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-11-21 11:58:39 -05:00
Benjamin Coddington d41cbfc9a6 NFSv4.1: Handle NFS4ERR_OLD_STATEID in nfs4_reclaim_open_state
Now that we're doing TEST_STATEID in nfs4_reclaim_open_state(), we can have
a NFS4ERR_OLD_STATEID returned from nfs41_open_expired() .  Instead of
marking state recovery as failed, mark the state for recovery again.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-11-18 14:27:27 -05:00
Trond Myklebust 5cc7861eb5 NFSv4: Don't call close if the open stateid has already been cleared
Ensure we test to see if the open stateid is actually set, before we
send a CLOSE.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-11-18 14:18:02 -05:00
Trond Myklebust 3e7dfb1659 NFSv4: Fix CLOSE races with OPEN
If the reply to a successful CLOSE call races with an OPEN to the same
file, we can end up scribbling over the stateid that represents the
new open state.
The race looks like:

  Client				Server
  ======				======

  CLOSE stateid A on file "foo"
					CLOSE stateid A, return stateid C
  OPEN file "foo"
					OPEN "foo", return stateid B
  Receive reply to OPEN
  Reset open state for "foo"
  Associate stateid B to "foo"

  Receive CLOSE for A
  Reset open state for "foo"
  Replace stateid B with C

The fix is to examine the argument of the CLOSE, and check for a match
with the current stateid "other" field. If the two do not match, then
the above race occurred, and we should just ignore the CLOSE.

Reported-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-11-18 13:35:58 -05:00
Trond Myklebust 23ea44c215 NFSv4.1: Fix a regression in DELEGRETURN
We don't want to call nfs4_free_revoked_stateid() in the case where
the delegreturn was successful.

Reported-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-11-18 13:35:54 -05:00
Alexey Dobriyan c7d03a00b5 netns: make struct pernet_operations::id unsigned int
Make struct pernet_operations::id unsigned.

There are 2 reasons to do so:

1)
This field is really an index into an zero based array and
thus is unsigned entity. Using negative value is out-of-bound
access by definition.

2)
On x86_64 unsigned 32-bit data which are mixed with pointers
via array indexing or offsets added or subtracted to pointers
are preffered to signed 32-bit data.

"int" being used as an array index needs to be sign-extended
to 64-bit before being used.

	void f(long *p, int i)
	{
		g(p[i]);
	}

  roughly translates to

	movsx	rsi, esi
	mov	rdi, [rsi+...]
	call 	g

MOVSX is 3 byte instruction which isn't necessary if the variable is
unsigned because x86_64 is zero extending by default.

Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:

	static inline void *net_generic(const struct net *net, int id)
	{
		...
		ptr = ng->ptr[id - 1];
		...
	}

And this function is used a lot, so those sign extensions add up.

Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):

	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

Unfortunately some functions actually grow bigger.
This is a semmingly random artefact of code generation with register
allocator being used differently. gcc decides that some variable
needs to live in new r8+ registers and every access now requires REX
prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
used which is longer than [r8]

However, overall balance is in negative direction:

	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
	function                                     old     new   delta
	nfsd4_lock                                  3886    3959     +73
	tipc_link_build_proto_msg                   1096    1140     +44
	mac80211_hwsim_new_radio                    2776    2808     +32
	tipc_mon_rcv                                1032    1058     +26
	svcauth_gss_legacy_init                     1413    1429     +16
	tipc_bcbase_select_primary                   379     392     +13
	nfsd4_exchange_id                           1247    1260     +13
	nfsd4_setclientid_confirm                    782     793     +11
		...
	put_client_renew_locked                      494     480     -14
	ip_set_sockfn_get                            730     716     -14
	geneve_sock_add                              829     813     -16
	nfsd4_sequence_done                          721     703     -18
	nlmclnt_lookup_host                          708     686     -22
	nfsd4_lockt                                 1085    1063     -22
	nfs_get_client                              1077    1050     -27
	tcf_bpf_init                                1106    1076     -30
	nfsd4_encode_fattr                          5997    5930     -67
	Total: Before=154856051, After=154854321, chg -0.00%

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18 10:59:15 -05:00
Linus Torvalds ef5beed998 NFS client bugfixes for Linux 4.9
Bugfixes:
 - Trim extra slashes in v4 nfs_paths to fix tools that use this
 - Fix a -Wmaybe-uninitialized warnings
 - Fix suspicious RCU usages
 - Fix Oops when mounting multiple servers at once
 - Suppress a false-positive pNFS error
 - Fix a DMAR failure in NFS over RDMA
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJYJOCbAAoJENfLVL+wpUDrbO0QAIkcxdUu2iQeOrk07VP48kDE
 UEfJTal8vbW/KtKyL9bIeRa1qCvYpSJXnnKcR/Uo5VHE5nMz/5omoJofWf5Zg0UM
 iEHyZfOsuGFieBbl1NBaLjEd6MCoJYpmWFUj+3drZ8zqSdqDTL+JgrP7k3XEU2Mx
 glKb7U0AKoclm3h1MKyCyo5TgDVeI5TOhi+i3VVw2IN79VY2CUp4lHWMY4vloghp
 h+GuJWeVFS1nBpfCF9PpTU6LdHDfg4o/J5+DrP+IjIffD1XGzGEjfFR0BX5HyDcN
 PgOSF3fc7uVOOUIBEAqHUHY/7XiKlv6TEMRPdM8ALVoCXZ6hPSSFxq8JBJSWoVEp
 r11ts66VgYxdQgHbs51Y5AaKudLBwU60KosWuddbdZVb4YPM0cn5WQzVezrpoQYu
 k4rfrpt+LFv23NGfIJa6JaTSFBzM+YXmggEGUI8TI/YUFSN+wEp4uzLB4r19nqAP
 ff32iunzV9Z5edpPQFDCf3/1HAhzrL5KWo7E8EvijpdQKZl5k5CnUJxbG22lh4ct
 QIyYg51LjhCayzbRH8Mu+TKUFT29ORlcSp851BotLjT8ZdUetWXcFab93nAkQI7g
 sMREml4DvcXWy8qFAOzi8mX1ddTBumxBfOD0m3skPg+odxwsl/KiwjLCRwfTrgwS
 jfSXsXmrwTniPCDWgKg3
 =hFod
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.9-3' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client bugfixes from Anna Schumaker:
 "Most of these fix regressions in 4.9, and none are going to stable
  this time around.

  Bugfixes:
   - Trim extra slashes in v4 nfs_paths to fix tools that use this
   - Fix a -Wmaybe-uninitialized warnings
   - Fix suspicious RCU usages
   - Fix Oops when mounting multiple servers at once
   - Suppress a false-positive pNFS error
   - Fix a DMAR failure in NFS over RDMA"

* tag 'nfs-for-4.9-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  xprtrdma: Fix DMAR failure in frwr_op_map() after reconnect
  fs/nfs: Fix used uninitialized warn in nfs4_slot_seqid_in_use()
  NFS: Don't print a pNFS error if we aren't using pNFS
  NFS: Ignore connections that have cl_rpcclient uninitialized
  SUNRPC: Fix suspicious RCU usage
  NFSv4.1: work around -Wmaybe-uninitialized warning
  NFS: Trim extra slash in v4 nfs_path
2016-11-11 09:15:30 -08:00
Shuah Khan 0ac84b72c0 fs/nfs: Fix used uninitialized warn in nfs4_slot_seqid_in_use()
Fix the following warn:

fs/nfs/nfs4session.c: In function ‘nfs4_slot_seqid_in_use’:
fs/nfs/nfs4session.c:203:54: warning: ‘cur_seq’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  if (nfs4_slot_get_seqid(tbl, slotid, &cur_seq) == 0 &&
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
      cur_seq == seq_nr && test_bit(slotid, tbl->used_slots))
      ~~~~~~~~~~~~~~~~~

Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-11-07 16:11:30 -05:00
Anna Schumaker 192747166a NFS: Don't print a pNFS error if we aren't using pNFS
We used to check for a valid layout type id before verifying pNFS flags
as an indicator for if we are using pNFS.  This changed in 3132e49ece
with the introduction of multiple layout types, since now we are passing
an array of ids instead of just one.  Since then, users have been seeing
a KERN_ERR printk show up whenever mounting NFS v4 without pNFS.  This
patch restores the original behavior of exiting set_pnfs_layoutdriver()
early if we aren't using pNFS.

Fixes 3132e49ece ("pnfs: track multiple layout types in fsinfo
structure")
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-11-07 16:11:30 -05:00
Petr Vandrovec 8ef3295530 NFS: Ignore connections that have cl_rpcclient uninitialized
cl_rpcclient starts as ERR_PTR(-EINVAL), and connections like that
are floating freely through the system.  Most places check whether
pointer is valid before dereferencing it, but newly added code
in nfs_match_client does not.

Which causes crashes when more than one NFS mount point is present.

Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-11-07 16:11:29 -05:00
Arnd Bergmann 68a564006a NFSv4.1: work around -Wmaybe-uninitialized warning
A bugfix introduced a harmless gcc warning in nfs4_slot_seqid_in_use
if we enable -Wmaybe-uninitialized again:

fs/nfs/nfs4session.c:203:54: error: 'cur_seq' may be used uninitialized in this function [-Werror=maybe-uninitialized]

gcc is not smart enough to conclude that the IS_ERR/PTR_ERR pair
results in a nonzero return value here. Using PTR_ERR_OR_ZERO()
instead makes this clear to the compiler.

The warning originally did not appear in v4.8 as it was globally
disabled, but the bugfix that introduced the warning got backported
to stable kernels which again enable it, and this is now the only
warning in the v4.7 builds.

Fixes: e09c978aae ("NFSv4.1: Fix Oopsable condition in server callback races")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-10-24 13:54:43 -04:00
Benjamin Coddington 86a6c211d6 NFS: Trim extra slash in v4 nfs_path
A NFSv4 mount of a subdirectory will show an extra slash (as in
'server://path') in proc's mountinfo which will not match the device name
and path.  This can cause problems for programs searching for the mount.
Fix this by checking for a leading slash in the dentry path, if so trim
away any trailing slashes in the device name.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-10-24 12:06:01 -04:00
Linus Torvalds 02593ac680 NFS client bugfixes for Linux 4.9
Stable bugfix:
 - Fix last_write_offset incorrectly set to page boundary
 
 Other bugfix:
 - Fix missing-braces warning
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJYCnklAAoJENfLVL+wpUDrv5MQALGRrZyyvQVCGwHt8BhiZDMp
 5OAB1B7mFF0yf/L7j5rLUEvXs6+YyGVHTRrqWlAm1Mq7aqqGjW3YcE260KOJwse3
 sk0eZ8mj92Bbm19ktRRGJCWeeCi16BsywIJEIbFFLs0ssKltSJMMnhE/8gyZ3Oj1
 /TQ0jFCsAGxErr9GVny9FiVa4mlgkauOEfY/QJsgMHH7FBYftBU7mo7rH43RaxQ7
 XLLv9XTe/WFCedxAa0uY/SikmAplLCpShOHCnCvveOF4WhKdx1gjaCnp6nZSMMP8
 Hyd49AZHfxlQWK3B6amhHtI5iU2/tyNl8aFN49PXUdbN1VqJoSFnMCZgc/BWKIsQ
 NGpUuQSTqz2qnMtHC3sErWfi2/c9kNDn9R3DPkTJPtZKoE0+FHnnxlhTWl9YSvju
 iW4hisaDbldmP2davoMeKKDIrP9g+z0+8akcZx4lSoVEhswVtsDzpFpGETL8bM6Y
 0002b8UU0qj4QVLUoW1HNCad5/H0G3ir0utXr+//OduQb2SMAilQmscltOcFXzfe
 TzR6YD7RP2RZs/t5fqnxlvBB2kYkSa8vWC/dJdVC5MC0nq6L1yO1n6L0p+E7Keck
 9S2fWi89WnGN4guKxtIOo58vbr6wAcA+g6zM35WwLDgxtklQHZCKOTPJEsPbnlSr
 DpeZFTwLeG7/SENBFZhI
 =VadI
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.9-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client bugfixes from Anna Schumaker:
 "Just two bugfixes this time:

  Stable bugfix:
   - Fix last_write_offset incorrectly set to page boundary

  Other bugfix:
   - Fix missing-braces warning"

* tag 'nfs-for-4.9-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  nfs4: fix missing-braces warning
  pnfs/blocklayout: fix last_write_offset incorrectly set to page boundary
2016-10-21 19:06:59 -07:00
Arnd Bergmann 83aa3e0f79 nfs4: fix missing-braces warning
A bugfix introduced a harmless warning for update_open_stateid:

fs/nfs/nfs4proc.c:1548:2: error: missing braces around initializer [-Werror=missing-braces]

Removing the zero in the initializer will do the right thing here
and initialize the entire structure to zero.

Fixes: 1393d9612b ("NFSv4: Fix a race when updating an open_stateid")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-10-19 14:39:15 -04:00
Linus Torvalds c4a86165d1 NFS client updates for Linux 4.9
Highlights include:
 
 Stable bugfixes:
 - sunrpc: fix writ espace race causing stalls
 - NFS: Fix inode corruption in nfs_prime_dcache()
 - NFSv4: Don't report revoked delegations as valid in
   nfs_have_delegation()
 - NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is
   invalid
 - NFSv4: Open state recovery must account for file permission changes
 - NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
 
 Features:
 - Add support for tracking multiple layout types with an ordered list
 - Add support for using multiple backchannel threads on the client
 - Add support for pNFS file layout session trunking
 - Delay xprtrdma use of DMA API (for device driver removal)
 - Add support for xprtrdma remote invalidation
 - Add support for larger xprtrdma inline thresholds
 - Use a scatter/gather list for sending xprtrdma RPC calls
 - Add support for the CB_NOTIFY_LOCK callback
 - Improve hashing sunrpc auth_creds by using both uid and gid
 
 Bugfixes:
 - Fix xprtrdma use of DMA API
 - Validate filenames before adding to the dcache
 - Fix corruption of xdr->nwords in xdr_copy_to_scratch
 - Fix setting buffer length in xdr_set_next_buffer()
 - Don't deadlock the state manager on the SEQUENCE status flags
 - Various delegation and stateid related fixes
 - Retry operations if an interrupted slot receives EREMOTEIO
 - Make nfs boot time y2038 safe
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJX/+ZfAAoJENfLVL+wpUDr5MUP/16s2Kp9ZZZZ7ICi3yrHOzb0
 9WpCOmbKUIELXl8YgkxlvPUYMzTQTIc32TwbVgdFV0g41my/0+O3z3+IiTrUGxH5
 8LgouMWBZ9KKmyUB//+KQAXr3j/bvDdF6Li6wJfz8a2o+9xT4oTkK1+Js8p0kn6e
 HNKfRknfCKwvE+j4tPCLfs2RX5qDyBFILXwWhj1fAbmT3rbnp+QqkXD4mWUrXb9z
 DBgxciXRhOkOQQAD2KQBFd2kUqWDZ5ED23b+aYsu9D3VCW45zitBqQFAxkQWL0hp
 x8Mp+MDCxlgdEaGQPUmUiDtPkG1X9ZxUJCAwaJWWsZaItwR2Il+en2sETctnTZ1X
 0IAxZVFdolzSeLzIfNx3OG32JdWJdaNjUzkIZam8gO6i1f6PAmK4alR0J3CT31nJ
 /OEN76o1E7acGWRMmj+MAZ2U5gPfR7EitOzyE8ZUPcHgyeGMiynjwi56WIpeSvT2
 F/Sp5kRe5+D5gtnYuppGp7Srp5vYdtFaz1zgPDUKpDLcxfDweO8AHGjJf3Zmrunx
 X24yia4A14CnfcUy4vKpISXRykmkG/3Z0tpWwV53uXZm4nlQfRc7gPibiW7Ay521
 af8sDoItW98K3DK5NQU7IUn83ua1TStzpoqlAEafRw//g9zPMTbhHvNvOyrRfrcX
 kjWn6hNblMu9M34JOjtu
 =XOrF
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "Highlights include:

  Stable bugfixes:
   - sunrpc: fix writ espace race causing stalls
   - NFS: Fix inode corruption in nfs_prime_dcache()
   - NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
   - NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
   - NFSv4: Open state recovery must account for file permission changes
   - NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic

  Features:
   - Add support for tracking multiple layout types with an ordered list
   - Add support for using multiple backchannel threads on the client
   - Add support for pNFS file layout session trunking
   - Delay xprtrdma use of DMA API (for device driver removal)
   - Add support for xprtrdma remote invalidation
   - Add support for larger xprtrdma inline thresholds
   - Use a scatter/gather list for sending xprtrdma RPC calls
   - Add support for the CB_NOTIFY_LOCK callback
   - Improve hashing sunrpc auth_creds by using both uid and gid

  Bugfixes:
   - Fix xprtrdma use of DMA API
   - Validate filenames before adding to the dcache
   - Fix corruption of xdr->nwords in xdr_copy_to_scratch
   - Fix setting buffer length in xdr_set_next_buffer()
   - Don't deadlock the state manager on the SEQUENCE status flags
   - Various delegation and stateid related fixes
   - Retry operations if an interrupted slot receives EREMOTEIO
   - Make nfs boot time y2038 safe"

* tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (100 commits)
  NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
  fs: nfs: Make nfs boot time y2038 safe
  sunrpc: replace generic auth_cred hash with auth-specific function
  sunrpc: add RPCSEC_GSS hash_cred() function
  sunrpc: add auth_unix hash_cred() function
  sunrpc: add generic_auth hash_cred() function
  sunrpc: add hash_cred() function to rpc_authops struct
  Retry operation on EREMOTEIO on an interrupted slot
  pNFS: Fix atime updates on pNFS clients
  sunrpc: queue work on system_power_efficient_wq
  NFSv4.1: Even if the stateid is OK, we may need to recover the open modes
  NFSv4: If recovery failed for a specific open stateid, then don't retry
  NFSv4: Fix retry issues with nfs41_test/free_stateid
  NFSv4: Open state recovery must account for file permission changes
  NFSv4: Mark the lock and open stateids as invalid after freeing them
  NFSv4: Don't test open_stateid unless it is set
  NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid
  NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation
  NFSv4: Fix a race when updating an open_stateid
  NFSv4: Fix a race in nfs_inode_reclaim_delegation()
  ...
2016-10-13 21:28:20 -07:00
Benjamin Coddington a3f9d1b58a pnfs/blocklayout: fix last_write_offset incorrectly set to page boundary
Commit 41963c10c4 sets the block layout's
last written byte to the offset of the end of the extent rather than the
end of the write which incorrectly updates the inode's size for
partial-page writes.

Fixes: 41963c10c4 ("pnfs/blocklayout: update last_write_offset atomically with extents")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org # 4.8+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-10-13 16:42:53 -04:00
Linus Torvalds 101105b171 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 ">rename2() work from Miklos + current_time() from Deepa"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: Replace current_fs_time() with current_time()
  fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
  fs: Replace CURRENT_TIME with current_time() for inode timestamps
  fs: proc: Delete inode time initializations in proc_alloc_inode()
  vfs: Add current_time() api
  vfs: add note about i_op->rename changes to porting
  fs: rename "rename2" i_op to "rename"
  vfs: remove unused i_op->rename
  fs: make remaining filesystems use .rename2
  libfs: support RENAME_NOREPLACE in simple_rename()
  fs: support RENAME_NOREPLACE for local filesystems
  ncpfs: fix unused variable warning
2016-10-10 20:16:43 -07:00
Linus Torvalds 97d2116708 Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs xattr updates from Al Viro:
 "xattr stuff from Andreas

  This completes the switch to xattr_handler ->get()/->set() from
  ->getxattr/->setxattr/->removexattr"

* 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: Remove {get,set,remove}xattr inode operations
  xattr: Stop calling {get,set,remove}xattr inode operations
  vfs: Check for the IOP_XATTR flag in listxattr
  xattr: Add __vfs_{get,set,remove}xattr helpers
  libfs: Use IOP_XATTR flag for empty directory handling
  vfs: Use IOP_XATTR flag for bad-inode handling
  vfs: Add IOP_XATTR inode operations flag
  vfs: Move xattr_resolve_name to the front of fs/xattr.c
  ecryptfs: Switch to generic xattr handlers
  sockfs: Get rid of getxattr iop
  sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
  kernfs: Switch to generic xattr handlers
  hfs: Switch to generic xattr handlers
  jffs2: Remove jffs2_{get,set,remove}xattr macros
  xattr: Remove unnecessary NULL attribute name check
2016-10-10 17:11:50 -07:00
Linus Torvalds b66484cd74 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - fsnotify updates

 - ocfs2 updates

 - all of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (127 commits)
  console: don't prefer first registered if DT specifies stdout-path
  cred: simpler, 1D supplementary groups
  CREDITS: update Pavel's information, add GPG key, remove snail mail address
  mailmap: add Johan Hovold
  .gitattributes: set git diff driver for C source code files
  uprobes: remove function declarations from arch/{mips,s390}
  spelling.txt: "modeled" is spelt correctly
  nmi_backtrace: generate one-line reports for idle cpus
  arch/tile: adopt the new nmi_backtrace framework
  nmi_backtrace: do a local dump_stack() instead of a self-NMI
  nmi_backtrace: add more trigger_*_cpu_backtrace() methods
  min/max: remove sparse warnings when they're nested
  Documentation/filesystems/proc.txt: add more description for maps/smaps
  mm, proc: fix region lost in /proc/self/smaps
  proc: fix timerslack_ns CAP_SYS_NICE check when adjusting self
  proc: add LSM hook checks to /proc/<tid>/timerslack_ns
  proc: relax /proc/<tid>/timerslack_ns capability requirements
  meminfo: break apart a very long seq_printf with #ifdefs
  seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char
  proc: faster /proc/*/status
  ...
2016-10-07 21:38:00 -07:00
Andreas Gruenbacher fd50ecaddf vfs: Remove {get,set,remove}xattr inode operations
These inode operations are no longer used; remove them.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-07 21:48:36 -04:00
Huang Ying 8cd797887a mm: remove page_file_index
After using the offset of the swap entry as the key of the swap cache,
the page_index() becomes exactly same as page_file_index().  So the
page_file_index() is removed and the callers are changed to use
page_index() instead.

Link: http://lkml.kernel.org/r/1473270649-27229-2-git-send-email-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:28 -07:00
Al Viro 82c156f853 switch generic_file_splice_read() to use of ->read_iter()
... and kill the ->splice_read() instances that can be switched to it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05 18:23:56 -04:00
Jeff Layton 3f807e5ae5 NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
The caller of rpc_run_task also gets a reference that must be put.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org # 4.2+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-10-04 16:30:54 -04:00
Deepa Dinamani 2f86e0919a fs: nfs: Make nfs boot time y2038 safe
boot_time is represented as a struct timespec.
struct timespec and CURRENT_TIME are not y2038 safe.
Overall, the plan is to use timespec64 and ktime_t for
all internal kernel representation of timestamps.
CURRENT_TIME will also be removed.

boot_time is used to construct the nfs client boot verifier.

Use ktime_t to represent boot_time and ktime_get_real() for
the boot_time value.

Following Trond's request https://lkml.org/lkml/2016/6/9/22 ,
use ktime_t instead of converting to struct timespec64.

Use higher and lower 32 bit parts of ktime_t for the boot
verifier.

Use the lower 32 bit part of ktime_t for the authsys_parms
stamp field.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-10-04 16:20:26 -04:00
Olga Kornievskaia a865880e20 Retry operation on EREMOTEIO on an interrupted slot
If an operation got interrupted, then since we don't know if the
server processed it on not, we keep the seq#. Upon reuse of slot
and seq# if we get reply from the cache (ie EREMOTEIO) then we
need to retry the operation after bumping the seq#

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-29 12:31:48 -04:00
Trond Myklebust bfc505ded0 pNFS: Fix atime updates on pNFS clients
Fix the code so that we always mark the atime as invalid in nfs4_read_done().
Currently, the expectation appears to be that the pNFS drivers should always
do this, with the result that most of them don't.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:35:36 -04:00
Trond Myklebust 8a64c4ef10 NFSv4.1: Even if the stateid is OK, we may need to recover the open modes
TEST_STATEID only tells you that you have a valid open stateid. It doesn't
tell the client anything about whether or not it holds the required share
locks.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
[Anna: Wrap nfs_open_stateid_recover_openmode in CONFIG_NFS_V4_1 checks]
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:35:31 -04:00
Trond Myklebust 7ebeb7fe74 NFSv4: If recovery failed for a specific open stateid, then don't retry
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:35:27 -04:00
Trond Myklebust 76e8a1bd14 NFSv4: Fix retry issues with nfs41_test/free_stateid
_nfs41_free_stateid() needs to be cached by the session, but
nfs41_test_stateid() may return NFS4ERR_RETRY_UNCACHED_REP (in which
case we should just retry).

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:35:23 -04:00
Trond Myklebust 304020fe48 NFSv4: Open state recovery must account for file permission changes
If the file permissions change on the server, then we may not be able to
recover open state. If so, we need to ensure that we mark the file
descriptor appropriately.

Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:35:19 -04:00
Trond Myklebust 67dd483026 NFSv4: Mark the lock and open stateids as invalid after freeing them
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:35:15 -04:00
Trond Myklebust b134fc4a53 NFSv4: Don't test open_stateid unless it is set
We need to test the NFS_OPEN_STATE flag for whether or not the
open_stateid is valid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:35:11 -04:00
Trond Myklebust 272289a3df NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid
If we're not yet sure that all state has expired or been revoked, we
should try to do a minimal recovery on just the one stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:35:07 -04:00
Trond Myklebust 7f04883146 NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation
Don't rely on nfs_inode_detach_delegation() succeeding. That can race...

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:35:04 -04:00
Trond Myklebust 1393d9612b NFSv4: Fix a race when updating an open_stateid
If we're replacing an old stateid which has a different 'other' field,
then we probably need to free the old stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:35:00 -04:00
Trond Myklebust b1a318de9b NFSv4: Fix a race in nfs_inode_reclaim_delegation()
If we race with a delegreturn before taking the spin lock, we
currently end up dropping the delegation stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:34:54 -04:00
Trond Myklebust 9c27869d3f NFSv4: Pass the stateid to the exception handler in nfs4_read/write_done_cb
The actual stateid used in the READ or WRITE can represent a delegation,
a lock or a stateid, so it is useful to pass it as an argument to the
exception handler when an expired/revoked response is received from the
server. It also ensures that we don't re-label the state as needing
recovery if that has already occurred.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:34:50 -04:00
Trond Myklebust 26f474432a NFSv4.1: nfs4_layoutget_handle_exception handle revoked state
Handle revoked open/lock/delegation stateids when LAYOUTGET tells us
the state was revoked.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:34:46 -04:00
Trond Myklebust d7f3e4bfe7 NFSv4: nfs4_handle_setlk_error() handle expiration as revoke case
If the server tells us our stateid has expired, then handle that as if
it was revoked.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:34:42 -04:00
Trond Myklebust 404ea3569a NFSv4: nfs4_handle_delegation_recall_error() handle expiration as revoke case
If the server tells us our stateid has expired, then handle that as if
it was revoked.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:34:38 -04:00
Trond Myklebust 6c2d8f8d30 NFSv4: nfs_inode_find_state_and_recover() should check all stateids
Modify the helper nfs_inode_find_state_and_recover() so that it
can check all open/lock/delegation state trackers on that inode for
whether or not they need are affected by a revoked stateid error.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:34:35 -04:00
Trond Myklebust 059b43e974 NFSv4: Ensure we don't re-test revoked and freed stateids
This fixes a potential infinite loop in nfs_reap_expired_delegations.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:34:31 -04:00
Trond Myklebust 26d36301bd NFSv4.1: Ensure we call FREE_STATEID if needed on close/delegreturn/locku
If a server returns NFS4ERR_ADMIN_REVOKED, NFS4ERR_DELEG_REVOKED
or NFS4ERR_EXPIRED on a call to close, open_downgrade, delegreturn, or
locku, we should call FREE_STATEID before attempting to recover.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:34:27 -04:00
Trond Myklebust f0b0bf8826 NFSv4.1: FREE_STATEID can be asynchronous
Nothing should need to be serialised with FREE_STATEID on the client,
so let's make the RPC call always asynchronous. Also constify the
stateid argument.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:34:23 -04:00
Trond Myklebust c5896fc862 NFSv4.1: Ensure we always run TEST/FREE_STATEID on locks
Right now, we're only running TEST/FREE_STATEID on the locks if
the open stateid recovery succeeds. The protocol requires us to
always do so.
The fix would be to move the call to TEST/FREE_STATEID and do it
before we attempt open recovery.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:34:12 -04:00
Trond Myklebust f7a62adad0 NFSv4.1: Allow revoked stateids to skip the call to TEST_STATEID
In some cases (e.g. when the SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED sequence
flag is set) we may already know that the stateid was revoked and that the
only valid operation we can call is FREE_STATEID. In those cases, allow
the stateid to carry the information in the type field, so that we skip
the redundant call to TEST_STATEID.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:34:01 -04:00
Trond Myklebust 63d63cbf5e NFSv4.1: Don't recheck delegations that have already been checked
Ensure we don't spam the server with test_stateid() calls for
delegations that have already been checked.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:33:55 -04:00
Trond Myklebust bb3d1a3b24 NFSv4.1: Deal with server reboots during delegation expiration recovery
Ensure that if the server reboots while we're testing and recovering
from revoked delegations, we exit to allow the state manager to
handle matters.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:33:49 -04:00
Trond Myklebust 45870d6909 NFSv4.1: Test delegation stateids when server declares "some state revoked"
According to RFC5661, if any of the SEQUENCE status bits
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED,
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, SEQ4_STATUS_ADMIN_STATE_REVOKED,
or SEQ4_STATUS_RECALLABLE_STATE_REVOKED are set, then we need to use
TEST_STATEID to figure out which stateids have been revoked, so we
can acknowledge the loss of state using FREE_STATEID.

While we already do this for open and lock state, we have not been doing
so for all the delegations.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:33:44 -04:00
Trond Myklebust 41020b671a NFSv4.x: Allow callers of nfs_remove_bad_delegation() to specify a stateid
Allow the callers of nfs_remove_bad_delegation() to specify the stateid
that needs to be marked as bad.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:33:37 -04:00
Trond Myklebust 4586f6e283 NFSv4.1: Add a helper function to deal with expired stateids
In NFSv4.1 and newer, if the server decides to revoke some or all of
the protocol state, the client is required to iterate through all the
stateids that it holds and call TEST_STATEID to determine which stateids
still correspond to valid state, and then call FREE_STATEID on the
others.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:33:21 -04:00
Trond Myklebust 43912bbbae NFSv4.1: Allow test_stateid to handle session errors without waiting
If the server crashes while we're testing stateids for validity, then
we want to initiate session recovery. Usually, we will be calling from
a state manager thread, though, so we don't really want to wait.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:32:59 -04:00
Trond Myklebust 4c8e544746 NFSv4.1: Don't check delegations that are already marked as revoked
If the delegation has been marked as revoked, we don't have to test
it, because we should already have called FREE_STATEID on it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Olek Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:32:41 -04:00
Trond Myklebust aa05c87f23 NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
We must not allow the use of delegations that have been revoked or are
being returned.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Fixes: 869f9dfa4d ("NFSv4: Fix races between nfs_remove_bad_delegation()...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v3.19+
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:32:31 -04:00
Trond Myklebust b3f9e72390 NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
If the delegation is revoked, then it can't be used for caching.

Fixes: 869f9dfa4d ("NFSv4: Fix races between nfs_remove_bad_delegation()...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v3.19+
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:32:12 -04:00
Trond Myklebust 7dc72d5f7a NFS: Fix inode corruption in nfs_prime_dcache()
Due to inode number reuse in filesystems, we can end up corrupting the
inode on our client if we apply the file attributes without ensuring that
the filehandle matches.
Typical symptoms include spurious "mode changed" reports in the syslog.

We still do want to ensure that we don't invalidate the dentry if the
inode number matches, but we don't have a filehandle.

Fixes: fa9233699c ("NFS: Don't require a filehandle to refresh...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.0+
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:31:52 -04:00
Trond Myklebust 0a014a44a5 NFSv4.1: Don't deadlock the state manager on the SEQUENCE status flags
As described in RFC5661, section 18.46, some of the status flags exist
in order to tell the client when it needs to acknowledge the existence of
revoked state on the server and/or to recover state.
Those flags will then remain set until the recovery procedure is done.

In order to avoid looping, the client therefore needs to ignore
those particular flags while recovering.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-27 14:31:27 -04:00
Miklos Szeredi 2773bf00ae fs: rename "rename2" i_op to "rename"
Generated patch:

sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2`
sed -i "s/\brename2\b/rename/g" `git grep -wl rename2`

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-27 11:03:58 +02:00
Miklos Szeredi 1cd66c93ba fs: make remaining filesystems use .rename2
This is trivial to do:

 - add flags argument to foo_rename()
 - check if flags is zero
 - assign foo_rename() to .rename2 instead of .rename

This doesn't mean it's impossible to support RENAME_NOREPLACE for these
filesystems, but it is not trivial, like for local filesystems.
RENAME_NOREPLACE must guarantee atomicity (i.e. it shouldn't be possible
for a file to be created on one host while it is overwritten by rename on
another host).

Filesystems converted:

9p, afs, ceph, coda, ecryptfs, kernfs, lustre, ncpfs, nfs, ocfs2, orangefs.

After this, we can get rid of the duplicate interfaces for rename.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: David Howells <dhowells@redhat.com> [AFS]
Acked-by: Mike Marshall <hubcap@omnibond.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Mark Fasheh <mfasheh@suse.com>
2016-09-27 11:03:58 +02:00
Daniel Wagner 2a446a5d99 NFS: cache_lib: use complete() instead of complete_all()
There is only one waiter for the completion, therefore there
is no need to use complete_all(). Let's make that clear by
using complete() instead of complete_all().

The generic caching code from sunrpc is calling revisit() only once.

The usage pattern of the completion is:

waiter context                          waker context

do_cache_lookup_wait()
  nfs_cache_defer_req_alloc()
    init_completion()
  do_cache_lookup()
  nfs_cache_wait_for_upcall()
    wait_for_completion_timeout()

					nfs_dns_cache_revisit()
					  complete()

  nfs_cache_defer_req_put()

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-23 09:40:12 -04:00
Daniel Wagner 024de8f1ad NFS: direct: use complete() instead of complete_all()
There is only one waiter for the completion, therefore there
is no need to use complete_all(). Let's make that clear by
using complete() instead of complete_all().

nfs_file_direct_write() or nfs_file_direct_read() allocated a request
object via nfs_direct_req_alloc(), which initializes the
completion. The request object then is freed later in the exit path.
Between the initialization and the release either
nfs_direct_write_schedule_iovec() resp
nfs_direct_read_schedule_iovec() are called which will asynchronously
process the request. The calling function waits via nfs_direct_wait()
till the async work has been done. Thus there is only one waiter on
the completion.

nfs_direct_pgio_init() and nfs_direct_read_completion() are passed via
function pointers to nfs pageio. The first function does a ref
counting (get_dreq() and put_dreq()) which ensures that
nfs_direct_read_completion() and nfs_direct_read_schedule_iovec() only
call the completion path once.

The usage pattern of the completion is:

waiter context                          waker context

nfs_file_direct_write()
  dreq = nfs_direct_req_alloc()
    init_completion()
  nfs_direct_write_schedule_iovec()
  nfs_direct_wait()
    wait_for_completion_killable()

                                        nfs_direct_write_schedule_work()
                                          nfs_direct_complete()
                                            complete()

nfs_file_direct_read()
  dreq = nfs_direct_req_all()
    init_completion()
  nfs_direct_read_schedule_iovec()
  nfs_direct_wait()
    wait_for_completion_killable()
                                        nfs_direct_read_schedule_iovec()
                                          nfs_direct_complete()
                                            complete()

                                        nfs_direct_read_completion()
                                          nfs_direct_complete()
                                            complete()

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-23 09:14:16 -04:00
Trond Myklebust 78d04af499 NFS: nfs_prime_dcache must validate the filename
Before we try to stash it in the dcache, we need to at least check
that the filename passed to us by the server is non-empty and doesn't
contain any illegal '\0' or '/' characters.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-22 17:02:03 -04:00
Jeff Layton a1d617d8f1 nfs: allow blocking locks to be awoken by lock callbacks
Add a waitqueue head to the client structure. Have clients set a wait
on that queue prior to requesting a lock from the server. If the lock
is blocked, then we can use that to wait for wakeups.

Note that we do need to do this "manually" since we need to set the
wait on the waitqueue prior to requesting the lock, but requesting a
lock can involve activities that can block.

However, only do that for NFSv4.1 locks, either by compiling out
all of the waitqueue handling when CONFIG_NFS_V4_1 is disabled, or
skipping all of it at runtime if we're dealing with v4.0, or v4.1
servers that don't send lock callbacks.

Note too that even when we expect to get a lock callback, RFC5661
section 20.11.4 is pretty clear that we still need to poll for them,
so we do still sleep on a timeout. We do however always poll at the
longest interval in that case.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
[Anna: nfs4_retry_setlk() "status" should default to -ERESTARTSYS]
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-22 15:54:27 -04:00
Jeff Layton d2f3a7f918 nfs: move nfs4 lock retry attempt loop to a separate function
This also consolidates the waiting logic into a single function,
instead of having it spread across two like it is now.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-22 13:56:04 -04:00
Jeff Layton 1ea67dbd98 nfs: move nfs4_set_lock_state call into caller
We need to have this info set up before adding the waiter to the
waitqueue, so move this out of the _nfs4_proc_setlk and into the
caller. That's more efficient anyway since we don't need to do
this more than once if we end up waiting on the lock.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-22 13:56:04 -04:00
Jeff Layton db783688d4 nfs: add handling for CB_NOTIFY_LOCK in client
For now, the callback doesn't do anything. Support for that will be
added in later patches.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-22 13:56:04 -04:00
Jeff Layton a8ce377a5d nfs: track whether server sets MAY_NOTIFY_LOCK flag
We want to handle the two cases differently, such that we poll more
aggressively when we don't expect a callback.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-22 13:56:04 -04:00
Jeff Layton 66f570ab73 nfs: use safe, interruptible sleeps when waiting to retry LOCK
We actually want to use TASK_INTERRUPTIBLE sleeps when we're in the
process of polling for a NFSv4 lock. If there is a signal pending when
the task wakes up, then we'll be returning an error anyway. So, we might
as well wake up immediately for non-fatal signals as well. That allows
us to return to userland more quickly in that case, but won't change the
error that userland sees.

Also, there is no need to use the *_unsafe sleep variants here, as no
vfs-layer locks should be held at this point.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-22 13:56:04 -04:00
Jeff Layton 75575ddf29 nfs: eliminate pointless and confusing do_vfs_lock wrappers
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-22 13:56:04 -04:00
Jeff Layton b60475c940 nfs: the length argument to read_buf should be unsigned
Since it gets passed through to xdr_inline_decode, we might as well
have read_buf expect what it expects -- a size_t.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-22 13:56:04 -04:00
Chao Yu f844cd0d76 nfs: cover ->migratepage with CONFIG_MIGRATION
It will be more clean to use CONFIG_MIGRATION to cover nfs' private
.migratepage in nfs_file_aops like we do in other part of nfs
operations.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-20 09:29:39 -04:00
Jeff Layton ca440c383a pnfs: add a new mechanism to select a layout driver according to an ordered list
Currently, the layout driver selection code always chooses the first one
from the list. That's not really ideal however, as the server can send
the list of layout types in any order that it likes. It's up to the
client to select the best one for its needs.

This patch adds an ordered list of preferred driver types and has the
selection code sort the list of available layout drivers according to it.
Any unrecognized layout type is sorted to the end of the list.

For now, the order of preference is hardcoded, but it should be possible
to make this configurable in the future.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19 13:11:13 -04:00
Andy Adamson 04fa2c6bb5 NFS pnfs data server multipath session trunking
Try all multipath addresses for a data server. The first address that
successfully connects and creates a session is the DS mount address.
All subsequent addresses are tested for session trunking and
added as aliases.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19 13:08:37 -04:00
Andy Adamson ad0849a7ef NFS test session trunking with exchange id
Use an async exchange id call to test for session trunking

To conform with RFC 5661 section 18.35.4, the Non-Update on
Existing Clientid case, save the exchange id verifier in
cl_confirm and use it for the session trunking exhange id test.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19 13:08:36 -04:00
Andy Adamson 04ea1b3e6d NFS add xprt switch addrs test to match client
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19 13:08:36 -04:00
Andy Adamson ba84db96aa NFS detect session trunking
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19 13:08:36 -04:00
Andy Adamson e7b7cbf662 NFS refactor nfs4_check_serverowner_major_id
For session trunking, to compare nfs41_exchange_id_res with
existing nfs_client

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19 13:08:36 -04:00
Andy Adamson 8e548edb40 NFS refactor nfs4_match_clientids
For session trunking, to compare nfs41_exchange_id_res with
exiting nfs_client.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19 13:08:36 -04:00
Andy Adamson 8d89bd70bc NFS setup async exchange_id
Testing an rpc_xprt for session trunking should not delay application
progress over already established transports.
Setup exchange_id to be able to be an async call to test an rpc_xprt
for session trunking use.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19 13:08:36 -04:00
Trond Myklebust 5405fc44c3 NFSv4.x: Add kernel parameter to control the callback server
Add support for the kernel parameter nfs.callback_nr_threads to set
the number of threads that will be assigned to the callback channel.

Add support for the kernel parameter nfs.nfs.max_session_cb_slots
to set the maximum size of the callback channel slot table.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19 13:08:36 -04:00
Trond Myklebust bb6aeba736 NFSv4.x: Switch to using svc_set_num_threads() to manage the callback threads
This will allow us to bump the number of callback threads at will.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19 13:08:36 -04:00
Trond Myklebust 3b01c11ee8 NFSv4.x: Fix up the global tracking of the callback server
Ensure that the nfs_callback_info[] array correctly tracks the
struct svc_serv.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19 13:08:36 -04:00
Trond Myklebust d002526886 SUNRPC: Initialise struct svc_serv backchannel fields during __svc_create()
Clean up.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19 13:08:36 -04:00
Trond Myklebust f4b52bb084 NFSv4.x: Set up struct svc_serv_ops for the callback channel
In order to manage the threads using svc_set_num_threads, we need to
fill in a few extra fields.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19 13:08:36 -04:00
Jeff Layton 3132e49ece pnfs: track multiple layout types in fsinfo structure
Current NFSv4.1/pNFS client assumes that MDS supports only one layout
type. While it's true for most existing servers, nevertheless, this can
be change in the near future.

For now, this patch just plumbs in the ability to track a list of
layouts in the fsinfo structure. The existing behavior of the client
is preserved, by having it just select the first entry in the list.

Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-09-19 13:08:35 -04:00
Trond Myklebust b519d408ea NFSv4.1: Fix the CREATE_SESSION slot number accounting
Ensure that we conform to the algorithm described in RFC5661, section
18.36.4 for when to bump the sequence id. In essence we do it for all
cases except when the RPC call timed out, or in case of the server returning
NFS4ERR_DELAY or NFS4ERR_STALE_CLIENTID.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
2016-09-11 14:56:44 -04:00
Trond Myklebust 334a8f3711 pNFS: Don't forget the layout stateid if there are outstanding LAYOUTGETs
If there are outstanding LAYOUTGET rpc calls, then we want to ensure that
we keep the layout stateid around so we that don't inadvertently pick up
an old/misordered sequence id.
The race is as follows:

Client				Server
======				======
LAYOUTGET(seqid)
LAYOUTGET(seqid)
				return LAYOUTGET(seqid+1)
				return LAYOUTGET(seqid+2)
process LAYOUTGET(seqid+2)
	forget layout
process LAYOUTGET(seqid+1)

If it forgets the layout stateid before processing seqid+1, then
the client will not check the layout->plh_barrier, and so will set
the stateid with seqid+1.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-09-04 12:59:00 -04:00
Trond Myklebust 52ec7be2e2 pNFS: Clear out all layout segments if the server unsets lrp->res.lrs_present
If the server fails to set lrp->res.lrs_present in the LAYOUTRETURN reply,
then that means it believes the client holds no more layout state for that
file, and that the layout stateid is now invalid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-09-03 12:10:38 -04:00
Trond Myklebust 2a59a04116 pNFS: Fix pnfs_set_layout_stateid() to clear NFS_LAYOUT_INVALID_STID
If the layout was marked as invalid, we want to ensure to initialise
the layout header fields correctly.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-09-03 12:10:37 -04:00
Trond Myklebust bf0291dd22 pNFS: Ensure LAYOUTGET and LAYOUTRETURN are properly serialised
According to RFC5661, the client is responsible for serialising
LAYOUTGET and LAYOUTRETURN to avoid ambiguity. Consider the case
where we send both in parallel.

Client					Server
======					======
LAYOUTGET(seqid=X)
LAYOUTRETURN(seqid=X)
					LAYOUTGET return seqid=X+1
					LAYOUTRETURN return seqid=X+2
Process LAYOUTRETURN
          Forget layout stateid
Process LAYOUTGET
          Set seqid=X+1

The client processes the layoutget/layoutreturn in the wrong order,
and since the result of the layoutreturn was to clear the only
existing layout segment, the client forgets the layout stateid.

When the LAYOUTGET comes in, it is treated as having a completely
new stateid, and so the client sets the wrong sequence id...

Fix is to check if there are outstanding LAYOUTGET requests
before we send the LAYOUTRETURN (note that LAYOUGET will already
wait if it sees an outstanding LAYOUTRETURN).

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-09-03 12:10:37 -04:00
Trond Myklebust c49edecd51 NFS: Fix error reporting in nfs_file_write()
When doing O_DSYNC writes, the actual write errors are reported through
generic_write_sync(), so we must test the result.

Reported-by: J. R. Okajima <hooanon05g@gmail.com>
Fixes: 18290650b1 ("NFS: Move buffered I/O locking into nfs_file_write()")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-09-03 12:10:36 -04:00
Trond Myklebust 98b0f80c23 NFSv4.x: Fix a refcount leak in nfs_callback_up_net
On error, the callers expect us to return without bumping
nn->cb_users[].

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v3.7+
2016-08-30 09:26:57 -04:00
Benjamin Coddington 52442f9b11 NFS4: Avoid migration loops
If a server returns itself as a location while migrating, the client may
end up getting stuck attempting to migrate twice to the same server.  Catch
this by checking if the nfs_client found is the same as the existing
client.  For the other two callers to nfs4_set_client, the nfs_client will
always be ERR_PTR(-EINVAL).

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-30 09:26:32 -04:00
Trond Myklebust 3dc147359e pNFS/flexfiles: Fix an Oopsable condition when connection to the DS fails
If the attempt to connect to a DS fails inside ff_layout_pg_init_read or
ff_layout_pg_init_write, then we currently end up clearing the layout
segment carried by the struct nfs_pageio_descriptor, causing an Oops
when we later call into ff_layout_read_pagelist/ff_layout_write_pagelist.

The fix is to ensure we return the layout and then retry.

Fixes: 446ca21953 ("pNFS/flexfiles: When initing reads or writes, we...")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-29 15:21:16 -04:00
Trond Myklebust d138027a82 NFSv4.1: Remove obsolete and incorrrect assignment in nfs4_callback_sequence
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-28 14:23:27 -04:00
Trond Myklebust 2e80dbe7ac NFSv4.1: Close callback races for OPEN, LAYOUTGET and LAYOUTRETURN
Defer freeing the slot until after we have processed the results from
OPEN and LAYOUTGET. This means that the server can rely on the
mechanism in RFC5661 Section 2.10.6.3 to ensure that replies to an
OPEN or LAYOUTGET/RETURN RPC call don't race with the callbacks that
apply to them.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-28 14:23:27 -04:00
Trond Myklebust 07e8dcbda7 NFSv4.1: Defer bumping the slot sequence number until we free the slot
For operations like OPEN or LAYOUTGET, which return recallable state
(i.e. delegations and layouts) we want to enable the mechanism for
resolving recall races in RFC5661 Section 2.10.6.3.
To do so, we will want to defer bumping the slot's sequence number until
we have finished processing the RPC results.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-28 14:23:26 -04:00
Trond Myklebust 045d2a6d07 NFSv4.1: Delay callback processing when there are referring triples
If CB_SEQUENCE tells us that the processing of this request depends on
the completion of one or more referring triples (see RFC 5661 Section
2.10.6.3), delay the callback processing until after the RPC requests
being referred to have completed.
If we end up delaying for more than 1/2 second, then fall back to
returning NFS4ERR_DELAY in reply to the callback.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-28 14:23:26 -04:00
Trond Myklebust e09c978aae NFSv4.1: Fix Oopsable condition in server callback races
The slot table hasn't been an array since v3.7. Ensure that we
use nfs4_lookup_slot() to access the slot correctly.

Fixes: 87dda67e73 ("NFSv4.1: Allow SEQUENCE to resize the slot table...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v3.8+
2016-08-28 14:23:22 -04:00
Benjamin Coddington 41963c10c4 pnfs/blocklayout: update last_write_offset atomically with extents
Block/SCSI layout write completion may add committable extents to the
extent tree before updating the layout's last-written byte under the inode
lock.  If a sync happens before this value is updated, then
prepare_layoutcommit may find and encode these extents which would produce
a LAYOUTCOMMIT request whose encoded extents are larger than the request's
loca_length.

Fix this by using a last-written byte value that is updated atomically with
the extent tree so that commitable extents always match.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-23 11:41:38 -04:00
Trond Myklebust b88fa69eaa pNFS: The client must not do I/O to the DS if it's lease has expired
Ensure that the client conforms to the normative behaviour described in
RFC5661 Section 12.7.2: "If a client believes its lease has expired,
it MUST NOT send I/O to the storage device until it has validated its
lease."

So ensure that we wait for the lease to be validated before using
the layout.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v3.20+
2016-08-23 11:27:01 -04:00
Trond Myklebust 9a0fe86745 pNFS: Handle NFS4ERR_OLD_STATEID correctly in LAYOUTSTAT calls
We normally want to update the stateid and then retry,

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-19 16:27:31 -04:00
Trond Myklebust 15d03055cf pNFS/flexfiles: Set reasonable default retrans values for the data channel
Prior to this patch, the retrans value was set at 5, meaning that we
could see a maximum retransmission timeout value of more than 6 minutes.
That's a tad high for NFSv3 where the protocol does allow the server to
drop requests at any time.

Since this is a data channel, let's just set retrans to 0, and the default
timeout to 60s. The user can continue to adjust these defaults using the
dataserver_retrans and dataserver_timeo module parameters.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-16 11:16:19 -04:00
Trond Myklebust a956beda19 NFS: Allow the mount option retrans=0
We should allow retrans=0 as just meaning that every timeout is a major
timeout, and that there is no increment in the timeout value.

For instance, this means that we would allow TCP users to specify a
flat timeout value of 60s, by specifying "timeo=600,retrans=0" in their
mount option string.

Siged-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-16 11:00:06 -04:00
Trond Myklebust 1c8d477a77 pNFS/flexfiles: Fix layoutstat periodic reporting
Putting the periodicity timer in the mirror instances is causing
non-scalable reporting behaviour and missed reporting intervals.
When you recall layouts and/or implement client side mirroring, it
leads to consecutive reports with only a few ms between RPC calls.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Fixes: d0379a5d06 ("pNFS/flexfiles: Support server-supplied...")
2016-08-14 23:01:10 -04:00
Linus Torvalds 9909170065 NFS client bugfixes for Linux 4.8
Highlights include:
 
 - Stable patch from Olga to fix RPCSEC_GSS upcalls when the same user needs
   multiple different security services (e.g. krb5i and krb5p).
 - Stable patch to fix a regression introduced by the use of SO_REUSEPORT,
   and that prevented the use of multiple different NFS versions to the
   same server.
 - TCP socket reconnection timer fixes.
 - Patch from Neil to disable the use of IPv6 temporary addresses.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXrh03AAoJEGcL54qWCgDyp4EQALwZpmYCxWJE5xSHW95Fs124
 HYM8g4LznOfs3/ohInb1ja2FaQqUy0XEk3pSjNKfyYgjuwB4qJSOpnAqoIKxJFGB
 h4582leYZOZYMMCGslS2I4zcElBYO1WjnKNyb7MpZjCHmN0AdFfIcOXd2K7eL9hM
 /poImcs5KfMGIEJqmKqMUxmJ3RjxpK3LySQAes/Y5odOiHC4SGJdGUmSeuPGTbQd
 YjFWVHRFU6kVAzPd2Jl46Sgy6SpDaVz82HodXCSY+8lklmIkbIsVqJs0VWo3WkfL
 r5WLQ3PzZvloQ7o/E9tZGiB/LEi7roa51hYsG4sleN6Kap5vwyWg0QIKjqyJdFxB
 JmFanlCMfae3zNz4cusvgu1okvMnNqO4uRXJIAKfk64k775N9ebY7TXAZUK4/UbY
 4nxCHcxygamP/k/8HYFpc4964tMaimIs9JUdojad5a3dzffwXcgEC/0HPUih9R+i
 DO/cbVtWeDkmQPLrUqFfOAbmQdyAjELrv48d5BVIst49uuCULU2LlDlVLiAvaZvq
 s2YNmr7lkHowvgaH4ShL89wuyyD14Xu5/f49oFBFNKEQay9YthQ8s3XmdZBG7Zl0
 oyA1XJjWEq3p8nvPGIqFD26w75ppUbAWLTHsyoU0YfEYrZJrF9jPxowI7WlHgfVo
 Io79x1sbgTrckjG+osAf
 =UHph
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

   - Stable patch from Olga to fix RPCSEC_GSS upcalls when the same user
     needs multiple different security services (e.g.  krb5i and krb5p).

   - Stable patch to fix a regression introduced by the use of
     SO_REUSEPORT, and that prevented the use of multiple different NFS
     versions to the same server.

   - TCP socket reconnection timer fixes.

   - Patch from Neil to disable the use of IPv6 temporary addresses"

* tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Cap the transport reconnection timer at 1/2 lease period
  NFSv4: Cleanup the setting of the nfs4 lease period
  SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout
  SUNRPC: Fix reconnection timeouts
  NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKED
  SUNRPC: disable the use of IPv6 temporary addresses.
  SUNRPC: allow for upcalls for same uid but different gss service
  SUNRPC: Fix up socket autodisconnect
  SUNRPC: Handle EADDRNOTAVAIL on connection failures
2016-08-12 12:32:24 -07:00
Linus Torvalds 835c92d43b Merge branch 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull qstr constification updates from Al Viro:
 "Fairly self-contained bunch - surprising lot of places passes struct
  qstr * as an argument when const struct qstr * would suffice; it
  complicates analysis for no good reason.

  I'd prefer to feed that separately from the assorted fixes (those are
  in #for-linus and with somewhat trickier topology)"

* 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  qstr: constify instances in adfs
  qstr: constify instances in lustre
  qstr: constify instances in f2fs
  qstr: constify instances in ext2
  qstr: constify instances in vfat
  qstr: constify instances in procfs
  qstr: constify instances in fuse
  qstr constify instances in fs/dcache.c
  qstr: constify instances in nfs
  qstr: constify instances in ocfs2
  qstr: constify instances in autofs4
  qstr: constify instances in hfs
  qstr: constify instances in hfsplus
  qstr: constify instances in logfs
  qstr: constify dentry_init_security
2016-08-06 09:49:02 -04:00
Trond Myklebust 8d480326c3 NFSv4: Cap the transport reconnection timer at 1/2 lease period
We don't want to miss a lease period renewal due to the TCP connection
failing to reconnect in a timely fashion. To ensure this doesn't happen,
cap the reconnection timer so that we retry the connection attempt
at least every 1/2 lease period.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-05 19:22:22 -04:00
Trond Myklebust fb10fb67ad NFSv4: Cleanup the setting of the nfs4 lease period
Make a helper function nfs4_set_lease_period() and have
nfs41_setup_state_renewal() and nfs4_do_fsinfo() use it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-05 19:13:08 -04:00
Trond Myklebust 206b3bb574 NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKED
We should handle those errors in the same way we handle the other
stateid errors: by invalidating the faulty layout stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-05 12:18:10 -04:00
Linus Torvalds 7f155c7026 NFS client updates for Linux 4.8
Highlights include:
 
 Stable bugfixes:
  - nfs: don't create zero-length requests
  - Several LAYOUTGET bugfixes
 
 Features:
  - Several performance related features
    - More aggressive caching when we can rely on close-to-open cache
      consistency
    - Remove serialisation of O_DIRECT reads and writes
    - Optimise several code paths to not flush to disk unnecessarily. However
      allow for the idiosyncracies of pNFS for those layout types that need
      to issue a LAYOUTCOMMIT before the metadata can be updated on the server.
    - SUNRPC updates to the client data receive path
  - pNFS/SCSI support RH/Fedora dm-mpath device nodes
  - pNFS files/flexfiles can now use unprivileged ports when the generic NFS
    mount options allow it.
 
 Bugfixes:
  - Don't use RDMA direct data placement together with data integrity or
    privacy security flavours
  - Remove the RDMA ALLPHYSICAL memory registration mode as it has potential
    security holes.
  - Several layout recall fixes to improve NFSv4.1 protocol compliance.
  - Fix an Oops in the pNFS files and flexfiles connection setup to the DS
  - Allow retry of operations that used a returned delegation stateid
  - Don't mark the inode as revalidated if a LAYOUTCOMMIT is outstanding
  - Fix writeback races in nfs4_copy_range() and nfs42_proc_deallocate()
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXnSq8AAoJEGcL54qWCgDyn8cP/RCHLekUCq7Klh+NAnEsvuBi
 C7w9YpVHaC83/8Q0tR6LyFShSBJBWi/clWwO0IEomkNK/MuO77v4iyPujtEyqowK
 0+eWFh/e8CsTf7mNGoi0avrHAZDB3deSuOQeYbwnNWHmd7qKVkB6tHus8LQjk852
 eqwYmZ4kVr+eaCO6MttCCxJHf6datPnsbe0stiC9MpxmCzsdpZmFptfauidsFX+p
 0U1IHi/ABN6zIFoc4R0iXXbaDb8ErxGw32SWIb8cnnWwdlSD8I0+Jqxs4opp23LY
 lAm9E0vtDJ49bJBllYl0dUmizdhJ3+NefK4aqPh5H5h3Csub+MLIsuQv/+r2AOhH
 qLBi5kThpspPhGHZ40VDmfV825+csUPTc8WkDaNLvb4f4UGIPakK/KBrBtxiqn+P
 0etvYiWBuoBaqRVQpstawnyDdnBK0IMF/3LAULo+ozo7iTkpaZmOALYgPcBUYw2f
 d6pxZGeNN0GwWfjDmoUDGC07OpO/CSN5WouArgKsp5+VhjzPxjyaZLCnUhzHzXiM
 RV1oBytEs/iw2BLXX809noM9mqHYkdgSVmrZ9OvvDMslcLHaslpq6eaJKZSWqV2J
 fAws6rbcZdTFSnbAWr0OSxct6w6BijEjc3/uk+wWRtw9nkOhFqtlxI3y7k4odpW9
 wVcEmRNkxfA0LlYNXWuL
 =WNyE
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable bugfixes:
   - nfs: don't create zero-length requests

   - several LAYOUTGET bugfixes

  Features:
   - several performance related features

   - more aggressive caching when we can rely on close-to-open
     cache consistency

   - remove serialisation of O_DIRECT reads and writes

   - optimise several code paths to not flush to disk unnecessarily.

     However allow for the idiosyncracies of pNFS for those layout
     types that need to issue a LAYOUTCOMMIT before the metadata can
     be updated on the server.

   - SUNRPC updates to the client data receive path

   - pNFS/SCSI support RH/Fedora dm-mpath device nodes

   - pNFS files/flexfiles can now use unprivileged ports when
     the generic NFS mount options allow it.

  Bugfixes:
   - Don't use RDMA direct data placement together with data
     integrity or privacy security flavours

   - Remove the RDMA ALLPHYSICAL memory registration mode as
     it has potential security holes.

   - Several layout recall fixes to improve NFSv4.1 protocol
     compliance.

   - Fix an Oops in the pNFS files and flexfiles connection
     setup to the DS

   - Allow retry of operations that used a returned delegation
      stateid

   - Don't mark the inode as revalidated if a LAYOUTCOMMIT is
     outstanding

   - Fix writeback races in nfs4_copy_range() and
     nfs42_proc_deallocate()"

* tag 'nfs-for-4.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (104 commits)
  pNFS: Actively set attributes as invalid if LAYOUTCOMMIT is outstanding
  NFSv4: Clean up lookup of SECINFO_NO_NAME
  NFSv4.2: Fix warning "variable ‘stateids’ set but not used"
  NFSv4: Fix warning "no previous prototype for ‘nfs4_listxattr’"
  SUNRPC: Fix a compiler warning in fs/nfs/clnt.c
  pNFS: Remove redundant smp_mb() from pnfs_init_lseg()
  pNFS: Cleanup - do layout segment initialisation in one place
  pNFS: Remove redundant stateid invalidation
  pNFS: Remove redundant pnfs_mark_layout_returned_if_empty()
  pNFS: Clear the layout metadata if the server changed the layout stateid
  pNFS: Cleanup - don't open code pnfs_mark_layout_stateid_invalid()
  NFS: pnfs_mark_matching_lsegs_return() should match the layout sequence id
  pNFS: Do not set plh_return_seq for non-callback related layoutreturns
  pNFS: Ensure layoutreturn acts as a completion for layout callbacks
  pNFS: Fix CB_LAYOUTRECALL stateid verification
  pNFS: Always update the layout barrier seqid on LAYOUTGET
  pNFS: Always update the layout stateid if NFS_LAYOUT_INVALID_STID is set
  pNFS: Clear the layout return tracking on layout reinitialisation
  pNFS: LAYOUTRETURN should only update the stateid if the layout is valid
  nfs: don't create zero-length requests
  ...
2016-07-30 16:33:25 -07:00
Linus Torvalds c624c86615 This is mostly clean ups and small fixes. Some of the more visible
changes are:
 
  . The function pid code uses the event pid filtering logic
  . [ku]probe events have access to current->comm
  . trace_printk now has sample code
  . PCI devices now trace physical addresses
  . stack tracing has less unnessary functions traced
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXl+d2AAoJEKKk/i67LK/83QEH/RDJ0mcfFVsuEeOnZZrZXABm
 4Rxk4FE5UAD+TSrVycwwzcbQab1iPK63mMdYvIBvaOiIC6/OJaEVM7jzZxnNGqmr
 pj0H8bxwOr58pe5pfnP92ow5qTLLzsXraWNl5sRXhSSHON7CXpGVzkErB58GmMYd
 8p6d9ziifQjo8X2O6XC9rGAvYLY5kEkVvyfuE1hI7muNTeOjyOT4EqpkNzxdBk+I
 QkGZGsk3Xhc8II9nu8FPWkaD26TatGJoZtZmVWHOzfsb3HNzG4RXla+WVOQ5u1HV
 noVyB1CJHhkO5CEBPdYIqwBWPQU4B9HfG4gVcUpDDVRxfzMpnEcKi1uwe+uDjfs=
 =XFcv
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "This is mostly clean ups and small fixes.  Some of the more visible
  changes are:

   - The function pid code uses the event pid filtering logic
   - [ku]probe events have access to current->comm
   - trace_printk now has sample code
   - PCI devices now trace physical addresses
   - stack tracing has less unnessary functions traced"

* tag 'trace-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  printk, tracing: Avoiding unneeded blank lines
  tracing: Use __get_str() when manipulating strings
  tracing, RAS: Cleanup on __get_str() usage
  tracing: Use outer () on __get_str() definition
  ftrace: Reduce size of function graph entries
  tracing: Have HIST_TRIGGERS select TRACING
  tracing: Using for_each_set_bit() to simplify trace_pid_write()
  ftrace: Move toplevel init out of ftrace_init_tracefs()
  tracing/function_graph: Fix filters for function_graph threshold
  tracing: Skip more functions when doing stack tracing of events
  tracing: Expose CPU physical addresses (resource values) for PCI devices
  tracing: Show the preempt count of when the event was called
  tracing: Add trace_printk sample code
  tracing: Choose static tp_printk buffer by explicit nesting count
  tracing: expose current->comm to [ku]probe events
  ftrace: Have set_ftrace_pid use the bitmap like events do
  tracing: Move pid_list write processing into its own function
  tracing: Move the pid_list seq_file functions to be global
  tracing: Move filtered_pid helper functions into trace.c
  tracing: Make the pid filtering helper functions global
2016-07-28 18:20:09 -07:00
Linus Torvalds 1c88e19b0f Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "The rest of MM"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (101 commits)
  mm, compaction: simplify contended compaction handling
  mm, compaction: introduce direct compaction priority
  mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations
  mm, page_alloc: make THP-specific decisions more generic
  mm, page_alloc: restructure direct compaction handling in slowpath
  mm, page_alloc: don't retry initial attempt in slowpath
  mm, page_alloc: set alloc_flags only once in slowpath
  lib/stackdepot.c: use __GFP_NOWARN for stack allocations
  mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB
  mm, kasan: account for object redzone in SLUB's nearest_obj()
  mm: fix use-after-free if memory allocation failed in vma_adjust()
  zsmalloc: Delete an unnecessary check before the function call "iput"
  mm/memblock.c: fix index adjustment error in __next_mem_range_rev()
  mem-hotplug: alloc new page from a nearest neighbor node when mem-offline
  mm: optimize copy_page_to/from_iter_iovec
  mm: add cond_resched() to generic_swapfile_activate()
  Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"
  mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode
  mm: hwpoison: remove incorrect comments
  make __section_nr() more efficient
  ...
2016-07-28 16:36:48 -07:00
Mel Gorman 11fb998986 mm: move most file-based accounting to the node
There are now a number of accounting oddities such as mapped file pages
being accounted for on the node while the total number of file pages are
accounted on the zone.  This can be coped with to some extent but it's
confusing so this patch moves the relevant file-based accounted.  Due to
throttling logic in the page allocator for reliable OOM detection, it is
still necessary to track dirty and writeback pages on a per-zone basis.

[mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting]
  Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net
Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 16:07:41 -07:00
Linus Torvalds 6784725ab0 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "Assorted cleanups and fixes.

  Probably the most interesting part long-term is ->d_init() - that will
  have a bunch of followups in (at least) ceph and lustre, but we'll
  need to sort the barrier-related rules before it can get used for
  really non-trivial stuff.

  Another fun thing is the merge of ->d_iput() callers (dentry_iput()
  and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all
  except the one in __d_lookup_lru())"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
  fs/dcache.c: avoid soft-lockup in dput()
  vfs: new d_init method
  vfs: Update lookup_dcache() comment
  bdev: get rid of ->bd_inodes
  Remove last traces of ->sync_page
  new helper: d_same_name()
  dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends()
  vfs: clean up documentation
  vfs: document ->d_real()
  vfs: merge .d_select_inode() into .d_real()
  unify dentry_iput() and dentry_unlink_inode()
  binfmt_misc: ->s_root is not going anywhere
  drop redundant ->owner initializations
  ufs: get rid of redundant checks
  orangefs: constify inode_operations
  missed comment updates from ->direct_IO() prototype change
  file_inode(f)->i_mapping is f->f_mapping
  trim fsnotify hooks a bit
  9p: new helper - v9fs_parent_fid()
  debugfs: ->d_parent is never NULL or negative
  ...
2016-07-28 12:59:05 -07:00
Linus Torvalds 554828ee0d Merge branch 'salted-string-hash'
This changes the vfs dentry hashing to mix in the parent pointer at the
_beginning_ of the hash, rather than at the end.

That actually improves both the hash and the code generation, because we
can move more of the computation to the "static" part of the dcache
setup, and do less at lookup runtime.

It turns out that a lot of other hash users also really wanted to mix in
a base pointer as a 'salt' for the hash, and so the slightly extended
interface ends up working well for other cases too.

Users that want a string hash that is purely about the string pass in a
'salt' pointer of NULL.

* merge branch 'salted-string-hash':
  fs/dcache.c: Save one 32-bit multiply in dcache lookup
  vfs: make the string hashes salt the hash
2016-07-28 12:26:31 -07:00
Benjamin Coddington 944171cbf4 pNFS: Actively set attributes as invalid if LAYOUTCOMMIT is outstanding
A LAYOUTCOMMIT then subsequent GETATTR may both return the same attributes,
and in that case NFS_INO_INVALID_ATTR is never set on the second pass
through nfs_update_inode().  The existing check to skip the clearing of
NFS_INO_INVALID_ATTR if a LAYOUTCOMMIT is outstanding does not help in this
case (see commit 10b7e9ad44: "pNFS: Don't mark the inode as revalidated
if a LAYOUTCOMMIT is outstanding").  We know that if a LAYOUTCOMMIT is
outstanding then attributes will need upating, so always set
NFS_INO_INVALID_ATTR.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-28 14:49:08 -04:00
Linus Torvalds d05d7f4079 Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:

   - the big change is the cleanup from Mike Christie, cleaning up our
     uses of command types and modified flags.  This is what will throw
     some merge conflicts

   - regression fix for the above for btrfs, from Vincent

   - following up to the above, better packing of struct request from
     Christoph

   - a 2038 fix for blktrace from Arnd

   - a few trivial/spelling fixes from Bart Van Assche

   - a front merge check fix from Damien, which could cause issues on
     SMR drives

   - Atari partition fix from Gabriel

   - convert cfq to highres timers, since jiffies isn't granular enough
     for some devices these days.  From Jan and Jeff

   - CFQ priority boost fix idle classes, from me

   - cleanup series from Ming, improving our bio/bvec iteration

   - a direct issue fix for blk-mq from Omar

   - fix for plug merging not involving the IO scheduler, like we do for
     other types of merges.  From Tahsin

   - expose DAX type internally and through sysfs.  From Toshi and Yigal

* 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
  block: Fix front merge check
  block: do not merge requests without consulting with io scheduler
  block: Fix spelling in a source code comment
  block: expose QUEUE_FLAG_DAX in sysfs
  block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
  Btrfs: fix comparison in __btrfs_map_block()
  block: atari: Return early for unsupported sector size
  Doc: block: Fix a typo in queue-sysfs.txt
  cfq-iosched: Charge at least 1 jiffie instead of 1 ns
  cfq-iosched: Fix regression in bonnie++ rewrite performance
  cfq-iosched: Convert slice_resid from u64 to s64
  block: Convert fifo_time from ulong to u64
  blktrace: avoid using timespec
  block/blk-cgroup.c: Declare local symbols static
  block/bio-integrity.c: Add #include "blk.h"
  block/partition-generic.c: Remove a set-but-not-used variable
  block: bio: kill BIO_MAX_SIZE
  cfq-iosched: temporarily boost queue priority for idle classes
  block: drbd: avoid to use BIO_MAX_SIZE
  block: bio: remove BIO_MAX_SECTORS
  ...
2016-07-26 15:03:07 -07:00
Trond Myklebust 698c937b0d NFSv4: Clean up lookup of SECINFO_NO_NAME
Use the minor version ops cached in struct nfs_client instead of looking
them up again.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-26 10:59:23 -04:00
Trond Myklebust 6fdf339b0c NFSv4.2: Fix warning "variable ‘stateids’ set but not used"
Replace it with a test for whether or not the sent a stateid in violation
of what we asked for.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 17:36:06 -04:00
Trond Myklebust 139978239b NFSv4: Fix warning "no previous prototype for ‘nfs4_listxattr’"
Make it static

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 17:35:56 -04:00
Trond Myklebust 1592c4d62a Merge branch 'nfs-rdma' 2016-07-24 17:09:02 -04:00
Trond Myklebust 668f455dac Merge branch 'pnfs' 2016-07-24 17:08:59 -04:00
Trond Myklebust 362745268c Merge branch 'writeback' 2016-07-24 17:08:31 -04:00
Trond Myklebust 01d7b29f0e pNFS: Remove redundant smp_mb() from pnfs_init_lseg()
It's not visible yet, and won't be until after we grab the inode->i_lock.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:43 -04:00
Trond Myklebust 119cef97a4 pNFS: Cleanup - do layout segment initialisation in one place
...instead of splitting the initialisation over init_lseg() and
pnfs_layout_process().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:42 -04:00
Trond Myklebust 28c1acffea pNFS: Remove redundant stateid invalidation
The layout stateid will be invalidated once it holds no more layout
segments anyway.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:42 -04:00
Trond Myklebust f71dfe8fc9 pNFS: Remove redundant pnfs_mark_layout_returned_if_empty()
That's already being taken care of in pnfs_layout_remove_lseg().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:41 -04:00
Trond Myklebust d9b61708fe pNFS: Clear the layout metadata if the server changed the layout stateid
If the server changed the layout stateid's "other" field, then
we should treat the old layout as being completely gone. In that
case, we want to clear the metadata such as scheduled layoutreturns.

Do this by calling pnfs_mark_layout_stateid_invalid().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:41 -04:00
Trond Myklebust 5f46be049b pNFS: Cleanup - don't open code pnfs_mark_layout_stateid_invalid()
Ensure nfs42_layoutstat_done() layoutget don't open code layout stateid
invalidation.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:40 -04:00
Trond Myklebust e036f46453 NFS: pnfs_mark_matching_lsegs_return() should match the layout sequence id
When determining which layout segments to return, we do want
pnfs_mark_matching_lsegs_return to check that they match the layout
sequence id. This ensures that we don't waste time if the server
is replaying a layout recall that has already been satisfied.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:40 -04:00
Trond Myklebust 2d6cf5ab0b pNFS: Do not set plh_return_seq for non-callback related layoutreturns
In cases where we need to send a layoutreturn in order to propagate
an error, we should not tie that to a specific layout stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:39 -04:00
Trond Myklebust e5fd1904b8 pNFS: Ensure layoutreturn acts as a completion for layout callbacks
When we return NFS_OK to the CB_LAYOUTRECALL, we are required to
send a layoutreturn that "completes" that layout recall request, using
the correct stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:39 -04:00
Trond Myklebust 793b7fe558 pNFS: Fix CB_LAYOUTRECALL stateid verification
We want to evaluate in this order:

If the client holds no layout for this inode, then return
NFS4ERR_NOMATCHING_LAYOUT; it probably forgot the layout.

If the client finds the inode among the list of layouts, but the corresponding
stateid has not yet been initialised, then return NFS4ERR_DELAY to ask the
server to retry once the outstanding LAYOUTGET is complete.

If the current layout stateid's "other" field does not match the recalled
stateid, return NFS4ERR_BAD_STATEID.

If already processing a layout recall with a newer stateid, return
NFS4ERR_OLD_STATEID. This can only happens for servers that are
non-compliant with the NFSv4.1 protocol.

If already processing a layout recall with an older stateid, return
NFS4ERR_DELAY to ask the server to retry once the outstanding
LAYOUTRETURN is complete. Again, this is technically incompliant with
the NFSv4.1 protocol.

If the current layout sequence id is newer than the recalled stateid's
sequence id, return NFS4ERR_OLD_STATEID. This too implies protocol
non-compliance.

If the current layout sequence id is older than the recalled stateid's
sequence id+1, return NFS4ERR_DELAY.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:38 -04:00
Trond Myklebust ecebb80bf3 pNFS: Always update the layout barrier seqid on LAYOUTGET
Currently, pnfs_set_layout_stateid() will update the layout sequence
id barrier only if the stateid itself is newer than the current
layout stateid. However in a situation where multiple LAYOUTGET calls
and a LAYOUTRETURN raced, it is entirely possible for one of the
LAYOUTGET to set the current stateid to something newer than the
LAYOUTRETURN that needs to set the barrier.

The fix is to allow the "update_barrier" flag to force a check as to
whether or not the barrier needs to be updated.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:38 -04:00
Trond Myklebust 13bede18de pNFS: Always update the layout stateid if NFS_LAYOUT_INVALID_STID is set
If the layout stateid is invalid, then pnfs_set_layout_stateid() must
always initialise it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:25 -04:00
Trond Myklebust 8e0acf9046 pNFS: Clear the layout return tracking on layout reinitialisation
Ensure that we don't carry over layoutreturn info from a previous
incarnation of this layout.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 12:51:49 -04:00
Trond Myklebust 45fcc7bca7 pNFS: LAYOUTRETURN should only update the stateid if the layout is valid
If the layout was completely returned, then ignore the returned layout
stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 12:51:49 -04:00
Trond Myklebust dc05973b28 Merge commit 'e7bdea7750eb'
Needed in order to work on top of pNFS changes in Linus' upstream kernel.
2016-07-24 12:51:10 -04:00
Benjamin Coddington 149a4fddd0 nfs: don't create zero-length requests
NFS doesn't expect requests with wb_bytes set to zero and may make
unexpected decisions about how to handle that request at the page IO layer.
Skip request creation if we won't have any wb_bytes in the request.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Weston Andros Adamson <dros@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-22 15:15:16 -04:00
Artem Savkov 297fae4d0b Fix NULL pointer dereference in bl_free_device().
When bl_parse_deviceid() fails in bl_alloc_deviceid_node() on
blkdev_get_by_*() step we get an pnfs_block_dev struct that is
uninitialized except for bdev field which is set to whatever error
blkdev_get_by_*() returns.  bl_free_device() then tries to call
blkdev_put() if bdev is not 0 resulting in a wrong pointer dereference.

Fixing this by setting bdev in struct pnfs_block_dev only if we didn't
get an error from blkdev_get_by_*().

Signed-off-by: Artem Savkov <asavkov@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-22 15:14:21 -04:00
Trond Myklebust e033fb51eb pNFS/files: filelayout_write_done_cb must call nfs_writeback_update_inode()
All write callbacks are required to call nfs_writeback_update_inode() upon
success to ensure that file size changes are recorded, and the attribute
cache is invalidated.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-21 09:46:42 -04:00
Al Viro beffb8feb6 qstr: constify instances in nfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-07-20 23:30:06 -04:00
Tigran Mkrtchyan b224f7cb63 nfs4: flexfiles: respect noresvport when establishing connections to DSes
Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-19 16:23:25 -04:00
Tigran Mkrtchyan 3fc75f1208 nfs4: clnt: respect noresvport when establishing connections to DSes
result:

$ mount -o vers=4.1 dcache-lab007:/ /pnfs
$ cp /etc/profile /pnfs
tcp        0      0 131.169.185.68:1005     131.169.191.141:32049   ESTABLISHED
tcp        0      0 131.169.185.68:751      131.169.191.144:2049    ESTABLISHED
$

$ mount -o vers=4.1,noresvport dcache-lab007:/ /pnfs
$ cp /etc/profile /pnfs
tcp        0      0 131.169.185.68:34894    131.169.191.141:32049   ESTABLISHED
tcp        0      0 131.169.185.68:35722    131.169.191.144:2049    ESTABLISHED
$

Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-19 16:23:25 -04:00
Benjamin Coddington d9c0ce0e45 pnfs/blocklayout: put deviceid node after releasing bl_ext_lock
The last put of deviceid nodes for SCSI layouts may sleep, so we shouldn't
hold any spinlocks.  Make sure we put them outside the bl_ext_lock.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-19 16:23:24 -04:00
Scott Mayhew ce52914eb7 sunrpc: move NO_CRKEY_TIMEOUT to the auth->au_flags
A generic_cred can be used to look up a unx_cred or a gss_cred, so it's
not really safe to use the the generic_cred->acred->ac_flags to store
the NO_CRKEY_TIMEOUT flag.  A lookup for a unx_cred triggered while the
KEY_EXPIRE_SOON flag is already set will cause both NO_CRKEY_TIMEOUT and
KEY_EXPIRE_SOON to be set in the ac_flags, leaving the user associated
with the auth_cred to be in a state where they're perpetually doing 4K
NFS_FILE_SYNC writes.

This can be reproduced as follows:

1. Mount two NFS filesystems, one with sec=krb5 and one with sec=sys.
They do not need to be the same export, nor do they even need to be from
the same NFS server.  Also, v3 is fine.
$ sudo mount -o v3,sec=krb5 server1:/export /mnt/krb5
$ sudo mount -o v3,sec=sys server2:/export /mnt/sys

2. As the normal user, before accessing the kerberized mount, kinit with
a short lifetime (but not so short that renewing the ticket would leave
you within the 4-minute window again by the time the original ticket
expires), e.g.
$ kinit -l 10m -r 60m

3. Do some I/O to the kerberized mount and verify that the writes are
wsize, UNSTABLE:
$ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1

4. Wait until you're within 4 minutes of key expiry, then do some more
I/O to the kerberized mount to ensure that RPC_CRED_KEY_EXPIRE_SOON gets
set.  Verify that the writes are 4K, FILE_SYNC:
$ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1

5. Now do some I/O to the sec=sys mount.  This will cause
RPC_CRED_NO_CRKEY_TIMEOUT to be set:
$ dd if=/dev/zero of=/mnt/sys/file bs=1M count=1

6. Writes for that user will now be permanently 4K, FILE_SYNC for that
user, regardless of which mount is being written to, until you reboot
the client.  Renewing the kerberos ticket (assuming it hasn't already
expired) will have no effect.  Grabbing a new kerberos ticket at this
point will have no effect either.

Move the flag to the auth->au_flags field (which is currently unused)
and rename it slightly to reflect that it's no longer associated with
the auth_cred->ac_flags.  Add the rpc_auth to the arg list of
rpcauth_cred_key_to_expire and check the au_flags there too.  Finally,
add the inode to the arg list of nfs_ctx_key_to_expire so we can
determine the rpc_auth to pass to rpcauth_cred_key_to_expire.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-19 16:23:24 -04:00
Steve Dickson e68fd7c807 mount: use sec= that was specified on the command line
When older servers return RPC_AUTH_NULL, it means the
rpc creds will be ignored. In that case use the sec=
that was specified instead of setting sec=null

Fixes Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1112983
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-19 16:23:23 -04:00
Trond Myklebust f7db0b2838 pNFS: Fix LAYOUTGET handling of NFS4ERR_BAD_STATEID and NFS4ERR_EXPIRED
We want to recover the open stateid if there is no layout stateid
and/or the stateid argument matches an open stateid.
Otherwise throw out the existing layout and recover from scratch, as
the layout stateid is bad.

Fixes: 183d9e7b11 ("pnfs: rework LAYOUTGET retry handling")
Cc: stable@vger.kernel.org # 4.7
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2016-07-19 16:23:23 -04:00
Trond Myklebust 66b53f3258 pNFS: Handle NFS4ERR_RECALLCONFLICT correctly in LAYOUTGET
Instead of giving up altogether and falling back to doing I/O
through the MDS, which may make the situation worse, wait for
2 lease periods for the callback to resolve itself, and then
try destroying the existing layout.

Only if this was an attempt at getting a first layout, do we
give up altogether, as the server is clearly crazy.

Fixes: 183d9e7b11 ("pnfs: rework LAYOUTGET retry handling")
Cc: stable@vger.kernel.org # 4.7
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2016-07-19 16:23:22 -04:00
Trond Myklebust e85d7ee420 pNFS: Separate handling of NFS4ERR_LAYOUTTRYLATER and RECALLCONFLICT
They are not the same error, and need to be handled differently.

Fixes: 183d9e7b11 ("pnfs: rework LAYOUTGET retry handling")
Cc: stable@vger.kernel.org # 4.7
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2016-07-19 16:23:22 -04:00
Trond Myklebust 56b38a1f7c pNFS: Fix post-layoutget error handling in pnfs_update_layout()
The non-retry error path is currently broken and ends up releasing the
reference to the layout twice. It also can end up clearing the
NFS_LAYOUT_FIRST_LAYOUTGET flag twice, causing a race.

In addition, the retry path will fail to decrement the plh_outstanding
counter.

Fixes: 183d9e7b11 ("pnfs: rework LAYOUTGET retry handling")
Cc: stable@vger.kernel.org # 4.7
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2016-07-19 16:22:47 -04:00
Trond Myklebust 10b7e9ad44 pNFS: Don't mark the inode as revalidated if a LAYOUTCOMMIT is outstanding
We know that the attributes will need updating if there is still a
LAYOUTCOMMIT outstanding.

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-18 00:51:01 -04:00
Daniel Bristot de Oliveira 752d596b39 tracing: Use __get_str() when manipulating strings
Use __get_str(str) rather than __get_dynamic_array(str) when
deadling with strings.

It is just a code cleanup, no changes on tracepoint ABI.

Link: http://lkml.kernel.org/r/ea260df91817411cca2a1f3db2abd88860094788.1467407618.git.bristot@redhat.com

Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-nfs@vger.kernel.org
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-07-15 15:52:20 -04:00
Kinglong Mee c77efc1e78 nfs/blocklayout: Check max uuids and devices before decoding
Avoid nfs return uuids/devices larger than maximum.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-15 15:51:11 -04:00
Kinglong Mee ecc2b88c4a nfs/blocklayout: Make sure calculate signature length aligned
Avoid a bad nfs server return an unaligned length of signature.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-15 15:51:11 -04:00
Christoph Hellwig 11487ddbdb nfs/blocklayout: support RH/Fedora dm-mpath device nodes
Instead of reusing the wwn-* names for multipath devices nodes RHEL and
Fedora introduce new dm-mpath-uuid-* nodes with a slightly different
naming scheme.  Try these names first to ensure we always get a
multipath-capable device if it exists.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-15 15:46:08 -04:00
Christoph Hellwig d702d41ed4 nfs/blocklayout: refactor open-by-wwn
The current code works with the standard udev/systemd names, but we'll have
to add another method in the next patch.  Refactor it into a separate helper
to make room for the new variant.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-15 15:46:08 -04:00
Christoph Hellwig 0173ca0544 nfs/blocklayout: use proper fmode for opening block devices
This was fixed for the original block layout code a while ago, but also
needs to be fixed for the SCSI layout path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-15 15:46:08 -04:00
Trond Myklebust 8b7d9d09b2 NFSv4: Revert "Truncating file opens should also sync O_DIRECT writes"
We're not holding any locks, so both nfs_wb_all() and inode_dio_wait()
are unenforcible and have livelock potential. Just limit ourselves to
flushing out the data.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-14 12:42:40 -04:00
Chuck Lever a4e187d83d NFS: Don't drop CB requests with invalid principals
Before commit 778be232a2 ("NFS do not find client in NFSv4
pg_authenticate"), the Linux callback server replied with
RPC_AUTH_ERROR / RPC_AUTH_BADCRED, instead of dropping the CB
request. Let's restore that behavior so the server has a chance to
do something useful about it, and provide a warning that helps
admins correct the problem.

Fixes: 778be232a2 ("NFS do not find client in NFSv4 ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-07-11 15:50:43 -04:00
Trond Myklebust 9a773e7c8d NFS nfs_vm_page_mkwrite: Don't freeze me, Bro...
Prevent filesystem freezes while handling the write page fault.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:08 -04:00
Trond Myklebust e95fc4a069 NFSv4.2: llseek(SEEK_HOLE) and llseek(SEEK_DATA) don't require data sync
We want to ensure that we write the cached data to the server, but
don't require it be synced to disk. If the server reboots, we will
get a stateid error, which will cause us to retry anyway.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:08 -04:00
Trond Myklebust 837bb1d752 NFSv4.2: Fix writeback races in nfs4_copy_file_range
We need to ensure that any writes to the destination file are serialised
with the copy, meaning that the writeback has to occur under the inode lock.

Also relax the writeback requirement on the source, and rely on the
stateid checking to tell us if the source rebooted. Add the helper
nfs_filemap_write_and_wait_range() to call pnfs_sync_inode() as
is appropriate for pNFS servers that may need a layoutcommit.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:07 -04:00
Trond Myklebust 1e564d3dbd NFSv4.2: Fix a race in nfs42_proc_deallocate()
When punching holes in a file, we want to ensure the operation is
serialised w.r.t. other writes, meaning that we want to call
nfs_sync_inode() while holding the inode lock.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:07 -04:00
Trond Myklebust 79566ef018 NFS: Getattr doesn't require data sync semantics
When retrieving stat() information, NFS unfortunately does require us to
sync writes to disk in order to ensure that mtime and ctime are up to
date. However we shouldn't have to ensure that those writes are persisted.

Relaxing that requirement does mean that we may see an mtime/ctime change
if the server reboots and forces us to replay all writes.

The exception to this rule are pNFS clients that are required to send
layoutcommit, however that is dealt with by the call to pnfs_sync_inode()
in _nfs_revalidate_inode().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:06 -04:00
Trond Myklebust 651b0e7029 NFS: Do not aggressively cache file attributes in the case of O_DIRECT
A file that is open for O_DIRECT is by definition not obeying
close-to-open cache consistency semantics, so let's not cache
the attributes too aggressively either.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:06 -04:00
Trond Myklebust be527494e0 NFS: Remove unused function nfs_revalidate_mapping_protected()
Clean up...

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:05 -04:00
Trond Myklebust f508d46ae4 NFS: Remove redundant waits for O_DIRECT in fsync() and write_begin()
We're now waiting immediately after taking the locks, so waiting
in fsync() and write_begin() is either redundant or potentially
subject to livelock (if not holding the lock).

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:05 -04:00
Trond Myklebust f7b5c340ac NFS: Cleanup nfs_direct_complete()
There is only one caller that sets the "write" argument to true,
so just move the call to nfs_zap_mapping() and get rid of the
now redundant argument.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:04 -04:00
Trond Myklebust a5864c999d NFS: Do not serialise O_DIRECT reads and writes
Allow dio requests to be scheduled in parallel, but ensuring that they
do not conflict with buffered I/O.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:04 -04:00
Trond Myklebust 18290650b1 NFS: Move buffered I/O locking into nfs_file_write()
Preparation for the patch that de-serialises O_DIRECT reads and
writes.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:03 -04:00
Trond Myklebust 89698b24d2 NFS Cleanup: move call to generic_write_checks() into fs/nfs/direct.c
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:03 -04:00
Trond Myklebust 2f3c7d87a3 NFS: Remove racy size manipulations in O_DIRECT
On success, the RPC callbacks will ensure that we make the appropriate calls
to nfs_writeback_update_inode()

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:02 -04:00
Trond Myklebust a5314a7492 NFS: Ensure we reset the write verifier 'committed' value on resend.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:02 -04:00
Trond Myklebust 8fc3c38627 NFS: Fix O_DIRECT verifier problems
We should not be interested in looking at the value of the stable field,
since that could take any value.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:01 -04:00
Trond Myklebust 6712007734 pNFS: pnfs_layoutcommit_outstanding() is no longer used when !CONFIG_NFS_V4_1
Cleanup...

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:01 -04:00
Trond Myklebust ac46bd374c pNFS: Ensure we layoutcommit before revalidating attributes
If we need to update the cached attributes, then we'd better make
sure that we also layoutcommit first. Otherwise, the server may have stale
attributes.

Prior to this patch, the revalidation code tried to "fix" this problem by
simply disabling attributes that would be affected by the layoutcommit.
That approach breaks nfs_writeback_check_extend(), leading to a file size
corruption.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:08:27 -04:00
Trond Myklebust 2e18d4d822 pNFS: Files and flexfiles always need to commit before layoutcommit
So ensure that we mark the layout for commit once the write is done,
and then ensure that the commit to ds is finished before sending
layoutcommit.

Note that by doing this, we're able to optimise away the commit
for the case of servers that don't need layoutcommit in order to
return updated attributes.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:08:01 -04:00
Trond Myklebust bc28e1c2e3 pNFS/flexfiles: Clean up calls to pnfs_set_layoutcommit()
Let's just have one place where we check ff_layout_need_layoutcommit().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 18:52:26 -04:00
Trond Myklebust c001c87a63 pNFS/flexfiles: Fix layoutcommit after a commit to DS
We should always do a layoutcommit after commit to DS, except if
the layout segment we're using has set FF_FLAGS_NO_LAYOUTCOMMIT.

Fixes: d67ae825a5 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 18:52:26 -04:00
Trond Myklebust 73e6c5d854 pNFS/files: Fix layoutcommit after a commit to DS
According to the errata
https://www.rfc-editor.org/errata_search.php?rfc=5661&eid=2751
we should always send layout commit after a commit to DS.

Fixes: bc7d4b8fd0 ("nfs/filelayout: set layoutcommit...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 18:52:25 -04:00
Al Viro c94c09535c nfs_atomic_open(): prevent parallel nfs_lookup() on a negative hashed
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-07-05 16:02:31 -04:00
Al Viro 00699ad857 Use the right predicate in ->atomic_open() instances
->atomic_open() can be given an in-lookup dentry *or* a negative one
found in dcache.  Use d_in_lookup() to tell one from another, rather
than d_unhashed().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-07-05 16:02:23 -04:00
Trond Myklebust 8487c479e2 NFSv4: Allow retry of operations that used a returned delegation stateid
Fix up nfs4_do_handle_exception() so that it can check if the operation
that received the NFS4ERR_BAD_STATEID was using a defunct delegation.
Apply that to the case of SETATTR, which will currently return EIO
in some cases where this happens.

Reported-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-30 15:29:57 -04:00