Commit graph

447 commits

Author SHA1 Message Date
Trond Myklebust c74dfe97c1 NFS: Add mount option 'softreval'
Add a mount option 'softreval' that allows attribute revalidation 'getattr'
calls to time out, and causes them to fall back to using the cached
attributes.
The use case for this option is for ensuring that we can still (slowly)
traverse paths and use cached information even when the server is down.
Once the server comes back up again, the getattr calls start succeeding,
and the caches will revalidate as usual.

The 'softreval' mount option is automatically enabled if you have
specified 'softerr'.  It can be turned off using the options
'nosoftreval', or 'hard'.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Scott Mayhew ce8866f091 NFS: Attach supplementary error information to fs_context.
Split out from commit "NFS: Add fs_context support."

Add wrappers nfs_errorf(), nfs_invalf(), and nfs_warnf() which log error
information to the fs_context.  Convert some printk's to use these new
wrappers instead.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
Scott Mayhew 62a55d088c NFS: Additional refactoring for fs_context conversion
Split out from commit "NFS: Add fs_context support."

This patch adds additional refactoring for the conversion of NFS to use
fs_context, namely:

 (*) Merge nfs_mount_info and nfs_clone_mount into nfs_fs_context.
     nfs_clone_mount has had several fields removed, and nfs_mount_info
     has been removed altogether.
 (*) Various functions now take an fs_context as an argument instead
     of nfs_mount_info, nfs_fs_context, etc.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
David Howells f2aedb713c NFS: Add fs_context support.
Add filesystem context support to NFS, parsing the options in advance and
attaching the information to struct nfs_fs_context.  The highlights are:

 (*) Merge nfs_mount_info and nfs_clone_mount into nfs_fs_context.  This
     structure represents NFS's superblock config.

 (*) Make use of the VFS's parsing support to split comma-separated lists

 (*) Pin the NFS protocol module in the nfs_fs_context.

 (*) Attach supplementary error information to fs_context.  This has the
     downside that these strings must be static and can't be formatted.

 (*) Remove the auxiliary file_system_type structs since the information
     necessary can be conveyed in the nfs_fs_context struct instead.

 (*) Root mounts are made by duplicating the config for the requested mount
     so as to have the same parameters.  Submounts pick up their parameters
     from the parent superblock.

[AV -- retrans is u32, not string]
[SM -- Renamed cfg to ctx in a few functions in an earlier patch]
[SM -- Moved fs_context mount option parsing to an earlier patch]
[SM -- Moved fs_context error logging to a later patch]
[SM -- Fixed printks in nfs4_try_get_tree() and nfs4_get_referral_tree()]
[SM -- Added is_remount_fc() helper]
[SM -- Deferred some refactoring to a later patch]
[SM -- Fixed referral mounts, which were broken in the original patch]
[SM -- Fixed leak of nfs_fattr when fs_context is freed]

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
Scott Mayhew 38465f5d1a NFS: rename nfs_fs_context pointer arg in a few functions
Split out from commit "NFS: Add fs_context support."

Rename cfg to ctx in nfs_init_server(), nfs_verify_authflavors(),
and nfs_request_mount().  No functional changes.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
David Howells e558100fda NFS: Do some tidying of the parsing code
Do some tidying of the parsing code, including:

 (*) Returning 0/error rather than true/false.

 (*) Putting the nfs_fs_context pointer first in some arg lists.

 (*) Unwrap some lines that will now fit on one line.

 (*) Provide unioned sockaddr/sockaddr_storage fields to avoid casts.

 (*) nfs_parse_devname() can paste its return values directly into the
     nfs_fs_context struct as that's where the caller puts them.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
David Howells 5eb005caf5 NFS: Rename struct nfs_parsed_mount_data to struct nfs_fs_context
Rename struct nfs_parsed_mount_data to struct nfs_fs_context and rename
pointers to it to "ctx".  At some point this will be pointed to by an
fs_context struct's fs_private pointer.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
David Howells 9954bf92c0 NFS: Move mount parameterisation bits into their own file
Split various bits relating to mount parameterisation out from
fs/nfs/super.c into their own file to form the basis of filesystem context
handling for NFS.

No other changes are made to the code beyond removing 'static' qualifiers.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
Al Viro adf2314fe6 nfs: get rid of ->set_security()
it's always either nfs_set_sb_security() or nfs_clone_sb_security(),
the choice being controlled by mount_info->cloned != NULL.  No need
to add methods, especially when both instances live right next to
the caller and are never accessed anywhere else.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro ba8b614806 nfs_clone_sb_security(): simplify the check for server bogosity
We used to check ->i_op for being nfs_dir_inode_operations.  With
separate inode_operations for v3 and v4 that became bogus, but
rather than going for protocol-dependent comparison we could've
just checked ->i_fop instead; _that_ is the same for all protocol
versions.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro ab88dca311 nfs: get rid of mount_info ->fill_super()
The only possible values are nfs_fill_super and nfs_clone_super.  The
latter is used only when crossing into a submount and it is almost
identical to the former; the only differences are
	* ->s_time_gran unconditionally set to 1 (even for v2 mounts).
Regression dating back to 2012, actually.
	* ->s_blocksize/->s_blocksize_bits set to that of parent.

Rather than messing with the method, stash ->s_blocksize_bits in
mount_info in submount case and after the (now unconditional)
call of nfs_fill_super() override ->s_blocksize/->s_blocksize_bits
if that has been set.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro 0c38f2131d nfs: don't pass nfs_subversion to ->create_server()
pick it from mount_info

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro 1bc3a2cbf2 nfs: unexport nfs_fs_mount_common()
Make it static, even.  And remove a stale extern of (long-gone)
nfs_xdev_mount_common() from internal.h, while we are at it.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro 82eaed2bee nfs: merge xdev and remote file_system_type
they are identical now...

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro a55d3297be nfs: don't bother passing nfs_subversion to ->try_mount() and nfs_fs_mount_common()
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro 6a3f7a399e nfs: stash nfs_subversion reference into nfs_mount_info
That will allow to get rid of passing those references around in
quite a few places.  Moreover, that will allow to merge xdev and
remote file_system_type.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro 250d69f6a4 nfs: lift setting mount_info from nfs_xdev_mount()
Do it in nfs_do_submount() instead.  As a side benefit, nfs_clone_data
doesn't need ->fh and ->fattr anymore.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro d0b779d47c nfs: stash server into struct nfs_mount_info
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro 444a52960c saner calling conventions for nfs_fs_mount_common()
Allow it to take ERR_PTR() for server and return ERR_CAST() of it in
such case.  All callers used to open-code that...

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
YueHaibing 9c91fa36b6 NFS: remove unneeded semicolon
remove unneeded semicolon.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-18 10:37:58 +01:00
Linus Torvalds 972a2bf7df NFS Client Updates for Linux 5.3
Stable bugfixes:
 - Dequeue the request from the receive queue while we're re-encoding # v4.20+
 - Fix buffer handling of GSS MIC without slack # 5.1
 
 Features:
 - Increase xprtrdma maximum transport header and slot table sizes
 - Add support for nfs4_call_sync() calls using a custom rpc_task_struct
 - Optimize the default readahead size
 - Enable pNFS filelayout LAYOUTGET on OPEN
 
 Other bugfixes and cleanups:
 - Fix possible null-pointer dereferences and memory leaks
 - Various NFS over RDMA cleanups
 - Various NFS over RDMA comment updates
 - Don't receive TCP data into a reset request buffer
 - Don't try to parse incomplete RPC messages
 - Fix congestion window race with disconnect
 - Clean up pNFS return-on-close error handling
 - Fixes for NFS4ERR_OLD_STATEID handling
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl2NC04ACgkQ18tUv7Cl
 QOs4Tg//bAlGs+dIKixAmeMKmTd6I34laUnuyV/12yPQDgo6bryLrTngfe2BYvmG
 2l+8H7yHfR4/gQE4vhR0c15xFgu6pvjBGR0/nNRaXienIPXO4xsQkcaxVA7SFRY2
 HjffZwyoBfjyRps0jL+2sTsKbRtSkf9Dn+BONRgesg51jK1jyWkXqXpmgi4uMO4i
 ojpTrW81dwo7Yhv08U2A/Q1ifMJ8F9dVYuL5sm+fEbVI/Nxoz766qyB8rs8+b4Xj
 3gkfyh/Y1zoMmu6c+r2Q67rhj9WYbDKpa6HH9yX1zM/RLTiU7czMX+kjuQuOHWxY
 YiEk73NjJ48WJEep3odess1q/6WiAXX7UiJM1SnDFgAa9NZMdfhqMm6XduNO1m60
 sy0i8AdxdQciWYexOXMsBuDUCzlcoj4WYs1QGpY3uqO1MznQS/QUfu65fx8CzaT5
 snm6ki5ivqXH/js/0Z4MX2n/sd1PGJ5ynMkekxJ8G3gw+GC/oeSeGNawfedifLKK
 OdzyDdeiel5Me1p4I28j1WYVLHvtFmEWEU9oytdG0D/rjC/pgYgW/NYvAao8lQ4Z
 06wdcyAM66ViAPrbYeE7Bx4jy8zYRkiw6Y3kIbLgrlMugu3BhIW5Mi3BsgL4f4am
 KsqkzUqPZMCOVwDuUILSuPp4uHaR+JTJttywiLniTL6reF5kTiA=
 =4Ey6
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "Stable bugfixes:
   - Dequeue the request from the receive queue while we're re-encoding
     # v4.20+
   - Fix buffer handling of GSS MIC without slack # 5.1

  Features:
   - Increase xprtrdma maximum transport header and slot table sizes
   - Add support for nfs4_call_sync() calls using a custom
     rpc_task_struct
   - Optimize the default readahead size
   - Enable pNFS filelayout LAYOUTGET on OPEN

  Other bugfixes and cleanups:
   - Fix possible null-pointer dereferences and memory leaks
   - Various NFS over RDMA cleanups
   - Various NFS over RDMA comment updates
   - Don't receive TCP data into a reset request buffer
   - Don't try to parse incomplete RPC messages
   - Fix congestion window race with disconnect
   - Clean up pNFS return-on-close error handling
   - Fixes for NFS4ERR_OLD_STATEID handling"

* tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (53 commits)
  pNFS/filelayout: enable LAYOUTGET on OPEN
  NFS: Optimise the default readahead size
  NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
  NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
  NFSv4: Fix OPEN_DOWNGRADE error handling
  pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid
  NFSv4: Add a helper to increment stateid seqids
  NFSv4: Handle RPC level errors in LAYOUTRETURN
  NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
  NFSv4: Clean up pNFS return-on-close error handling
  pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
  NFS: remove unused check for negative dentry
  NFSv3: use nfs_add_or_obtain() to create and reference inodes
  NFS: Refactor nfs_instantiate() for dentry referencing callers
  SUNRPC: Fix congestion window race with disconnect
  SUNRPC: Don't try to parse incomplete RPC messages
  SUNRPC: Rename xdr_buf_read_netobj to xdr_buf_read_mic
  SUNRPC: Fix buffer handling of GSS MIC without slack
  SUNRPC: RPC level errors should always set task->tk_rpc_status
  SUNRPC: Don't receive TCP data into a request buffer that has been reset
  ...
2019-09-26 12:20:14 -07:00
Trond Myklebust c128e57551 NFS: Optimise the default readahead size
In the years since the max readahead size was fixed in NFS, a number of
things have happened:
- Users can now set the value directly using /sys/class/bdi
- NFS max supported block sizes have increased by several orders of
  magnitude from 64K to 1MB.
- Disk access latencies are orders of magnitude faster due to SSD + NVME.

In particular note that if the server is advertising 1MB as the optimal
read size, as that will set the readahead size to 15MB.
Let's therefore adjust down, and try to default to VM_READAHEAD_PAGES.
However let's inform the VM about our preferred block size so that it
can choose to round up in cases where that makes sense.

Reported-by: Alkis Georgopoulos <alkisg@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-09-24 15:58:20 -04:00
Deepa Dinamani 1fcb79c1b2 fs: nfs: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

The time formats for various verious is detailed in the
RFCs as below:

https://tools.ietf.org/html/rfc7862(time metadata)
https://tools.ietf.org/html/rfc7530:

nfstime4

   struct nfstime4 {
           int64_t         seconds;
           uint32_t        nseconds;
   };

https://tools.ietf.org/html/rfc1094

          struct timeval {
              unsigned int seconds;
              unsigned int useconds;
          };

https://tools.ietf.org/html/rfc1813

struct nfstime3 {
         uint32   seconds;
         uint32   nseconds;
      };

Use the limits as per the RFC.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: trond.myklebust@hammerspace.com
Cc: anna.schumaker@netapp.com
Cc: linux-nfs@vger.kernel.org
2019-08-30 07:27:18 -07:00
Trond Myklebust dea1bb35c5 NFS: Fix regression whereby fscache errors are appearing on 'nofsc' mounts
People are reporing seeing fscache errors being reported concerning
duplicate cookies even in cases where they are not setting up fscache
at all. The rule needs to be that if fscache is not enabled, then it
should have no side effects at all.

To ensure this is the case, we disable fscache completely on all superblocks
for which the 'fsc' mount option was not set. In order to avoid issues
with '-oremount', we also disable the ability to turn fscache on via
remount.

Fixes: f1fe29b4a0 ("NFS: Use i_writecount to control whether...")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200145
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Steve Dickson <steved@redhat.com>
Cc: David Howells <dhowells@redhat.com>
2019-08-04 22:35:41 -04:00
Linus Torvalds 18253e034d Merge branch 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull dcache and mountpoint updates from Al Viro:
 "Saner handling of refcounts to mountpoints.

  Transfer the counting reference from struct mount ->mnt_mountpoint
  over to struct mountpoint ->m_dentry. That allows us to get rid of the
  convoluted games with ordering of mount shutdowns.

  The cost is in teaching shrink_dcache_{parent,for_umount} to cope with
  mixed-filesystem shrink lists, which we'll also need for the Slab
  Movable Objects patchset"

* 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  switch the remnants of releasing the mountpoint away from fs_pin
  get rid of detach_mnt()
  make struct mountpoint bear the dentry reference to mountpoint, not struct mount
  Teach shrink_dcache_parent() to cope with mixed-filesystem shrink lists
  fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt()
  __detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymore
  nfs: dget_parent() never returns NULL
  ceph: don't open-code the check for dead lockref
2019-07-20 09:15:51 -07:00
Markus Elfring 1c316e39a0 NFS: Replace 16 seq_printf() calls by seq_puts()
Some strings should be put into a sequence.
Thus use the corresponding function “seq_puts”.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-13 11:47:49 -04:00
Markus Elfring 9bcaa35c68 NFS: Use seq_putc() in nfs_show_stats()
A single character (line break) should be put into a sequence.
Thus use the corresponding function “seq_putc”.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-13 11:47:49 -04:00
Dave Wysochanski 1a7441b282 NFSv4: Add lease_time and lease_expired to 'nfs4:' line of mountstats
On the NFS client there is no low-impact way to determine the nfs4
lease time or whether the lease is expired, so add these to mountstats
with times displayed in seconds.

If the lease is not expired, display lease_expired=0. Otherwise,
display lease_expired=seconds_since_expired, similar to 'age:' line
in mountstats.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-06 14:54:52 -04:00
Trond Myklebust fd87c8b73a NFS: Display the "nconnect" mount option if it is set.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2019-07-06 14:54:50 -04:00
Trond Myklebust 28cc5cd8c6 NFS: Add a mount option to specify number of TCP connections to use
Allow the user to specify that the client should use multiple connections
to the server. For the moment, this functionality will be limited to
TCP and to NFSv4.x (x>0).

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2019-07-06 14:54:50 -04:00
Al Viro 1cfb7072c1 nfs: dget_parent() never returns NULL
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-04 18:58:37 -04:00
Thomas Gleixner 457c899653 treewide: Add SPDX license identifier for missed files
Add SPDX license identifiers to all files which:

 - Have no license information of any form

 - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
   initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 10:50:45 +02:00
Linus Torvalds 06cbd26d31 NFS client updates for Linux 5.2
Stable bugfixes:
 - Fall back to MDS if no deviceid is found rather than aborting   # v4.11+
 - NFS4: Fix v4.0 client state corruption when mount
 
 Features:
 - Much improved handling of soft mounts with NFS v4.0
   - Reduce risk of false positive timeouts
   - Faster failover of reads and writes after a timeout
   - Added a "softerr" mount option to return ETIMEDOUT instead of
     EIO to the application after a timeout
 - Increase number of xprtrdma backchannel requests
 - Add additional xprtrdma tracepoints
 - Improved send completion batching for xprtrdma
 
 Other bugfixes and cleanups:
 - Return -EINVAL when NFS v4.2 is passed an invalid dedup mode
 - Reduce usage of GFP_ATOMIC pages in SUNRPC
 - Various minor NFS over RDMA cleanups and bugfixes
 - Use the correct container namespace for upcalls
 - Don't share superblocks between user namespaces
 - Various other container fixes
 - Make nfs_match_client() killable to prevent soft lockups
 - Don't mark all open state for recovery when handling recallable state revoked flag
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlzUjdcACgkQ18tUv7Cl
 QOsUiw/+OirzlZI7XeHfpZ/CwS7A+tSk3AAg9PDS1gjbfylER0g++GpA08tXnmDt
 JdUnBKYC5ujLyAqxN1j7QK+EvmXZQro8rucJxhEdPJMIQDC65fQQnmW7efl2bAEv
 CAWNDCf9Xe4g6X8LSR5jrnaMV4kuOQBYX4wqrrmaV8I+g/A/GKXW262KWnAv+w1M
 Y1ZlX+d1Gm8hODXhvqz4lldW6bkyrpWpU9BKUtYSYnSR0x1fam6PLPuCTm74fEDR
 N/Tgy5XvJi4xgti4SOZ/dI2O/Oqu6ut81PEPlhs8sTX04G8bLhr+hl3rSksCZFlu
 Afz9Hcnxg6XYB3Va7j7AO67H5SbyX4Zyj5cRMipXQE7Ebc1iXo5lu3vdhAEOAtNx
 fdNJlqD86MC/XWbtM+DfWlD+KjtpZ+lkxN+xuMgC/kVaPTeFI7nEWM796hJP/4no
 EYtnSLbSpJyH6F7wH9IL5V2EJYFxbzTvnPSTxV+QNZ0HgF17gTY0AGmQBzDE5bF0
 tfQteOG6MYXMHg64pTEzjlowlXOWdnE5TnuaFpt64/yP+hVznZMepBMSkxZO1xYt
 jc1wQlJkv/SyVH7cMGsj5lw3A6zwTrLManDUUmrLjIsVVmh4dk8WKlNtWQmvf1v6
 nFBklUa2GzH8LWKRT2ftNGcUeEiCuw/QF9oE5T/V7/7SQ/wmmvA=
 =skb2
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.2-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "Highlights include:

  Stable bugfixes:
   - Fall back to MDS if no deviceid is found rather than aborting   # v4.11+
   - NFS4: Fix v4.0 client state corruption when mount

  Features:
   - Much improved handling of soft mounts with NFS v4.0:
       - Reduce risk of false positive timeouts
       - Faster failover of reads and writes after a timeout
       - Added a "softerr" mount option to return ETIMEDOUT instead of
         EIO to the application after a timeout
   - Increase number of xprtrdma backchannel requests
   - Add additional xprtrdma tracepoints
   - Improved send completion batching for xprtrdma

  Other bugfixes and cleanups:
   - Return -EINVAL when NFS v4.2 is passed an invalid dedup mode
   - Reduce usage of GFP_ATOMIC pages in SUNRPC
   - Various minor NFS over RDMA cleanups and bugfixes
   - Use the correct container namespace for upcalls
   - Don't share superblocks between user namespaces
   - Various other container fixes
   - Make nfs_match_client() killable to prevent soft lockups
   - Don't mark all open state for recovery when handling recallable
     state revoked flag"

* tag 'nfs-for-5.2-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (69 commits)
  SUNRPC: Rebalance a kref in auth_gss.c
  NFS: Fix a double unlock from nfs_match,get_client
  nfs: pass the correct prototype to read_cache_page
  NFSv4: don't mark all open state for recovery when handling recallable state revoked flag
  SUNRPC: Fix an error code in gss_alloc_msg()
  SUNRPC: task should be exit if encode return EKEYEXPIRED more times
  NFS4: Fix v4.0 client state corruption when mount
  PNFS fallback to MDS if no deviceid found
  NFS: make nfs_match_client killable
  lockd: Store the lockd client credential in struct nlm_host
  NFS: When mounting, don't share filesystems between different user namespaces
  NFS: Convert NFSv2 to use the container user namespace
  NFSv4: Convert the NFS client idmapper to use the container user namespace
  NFS: Convert NFSv3 to use the container user namespace
  SUNRPC: Use namespace of listening daemon in the client AUTH_GSS upcall
  SUNRPC: Use the client user namespace when encoding creds
  NFS: Store the credential of the mount process in the nfs_server
  SUNRPC: Cache cred of process creating the rpc_client
  xprtrdma: Remove stale comment
  xprtrdma: Update comments that reference ib_drain_qp
  ...
2019-05-09 14:33:15 -07:00
Linus Torvalds 168e153d5e Merge branch 'work.icache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs inode freeing updates from Al Viro:
 "Introduction of separate method for RCU-delayed part of
  ->destroy_inode() (if any).

  Pretty much as posted, except that destroy_inode() stashes
  ->free_inode into the victim (anon-unioned with ->i_fops) before
  scheduling i_callback() and the last two patches (sockfs conversion
  and folding struct socket_wq into struct socket) are excluded - that
  pair should go through netdev once davem reopens his tree"

* 'work.icache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (58 commits)
  orangefs: make use of ->free_inode()
  shmem: make use of ->free_inode()
  hugetlb: make use of ->free_inode()
  overlayfs: make use of ->free_inode()
  jfs: switch to ->free_inode()
  fuse: switch to ->free_inode()
  ext4: make use of ->free_inode()
  ecryptfs: make use of ->free_inode()
  ceph: use ->free_inode()
  btrfs: use ->free_inode()
  afs: switch to use of ->free_inode()
  dax: make use of ->free_inode()
  ntfs: switch to ->free_inode()
  securityfs: switch to ->free_inode()
  apparmor: switch to ->free_inode()
  rpcpipe: switch to ->free_inode()
  bpf: switch to ->free_inode()
  mqueue: switch to ->free_inode()
  ufs: switch to ->free_inode()
  coda: switch to ->free_inode()
  ...
2019-05-07 10:57:05 -07:00
Al Viro ca1a199e3b nfs{,4}: switch to ->free_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:25 -04:00
Trond Myklebust 3b7eb5e35d NFS: When mounting, don't share filesystems between different user namespaces
If two different containers that share the same network namespace attempt
to mount the same filesystem, we should not allow them to share the same
super block if they do not share the same user namespace, since the
user mappings on the wire will need to differ.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-26 17:39:42 -04:00
Trond Myklebust 91a575e1a9 NFS: Add a mount option "softerr" to allow clients to see ETIMEDOUT errors
Add a mount option that exposes the ETIMEDOUT errors that occur during
soft timeouts to the application. This allows aware applications to
distinguish between server disk IO errors and client timeout errors.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25 14:18:14 -04:00
Tetsuo Handa 7c2bd9a398 NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
syzbot is reporting uninitialized value at rpc_sockaddr2uaddr() [1]. This
is because syzbot is setting AF_INET6 to "struct sockaddr_in"->sin_family
(which is embedded into user-visible "struct nfs_mount_data" structure)
despite nfs23_validate_mount_data() cannot pass sizeof(struct sockaddr_in6)
bytes of AF_INET6 address to rpc_sockaddr2uaddr().

Since "struct nfs_mount_data" structure is user-visible, we can't change
"struct nfs_mount_data" to use "struct sockaddr_storage". Therefore,
assuming that everybody is using AF_INET family when passing address via
"struct nfs_mount_data"->addr, reject if its sin_family is not AF_INET.

[1] https://syzkaller.appspot.com/bug?id=599993614e7cbbf66bc2656a919ab2a95fb5d75c

Reported-by: syzbot <syzbot+047a11c361b872896a4f@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-04-11 15:23:48 -04:00
Eric W. Biederman 40cc394be1 fs/nfs: Fix nfs_parse_devname to not modify it's argument
In the rare and unsupported case of a hostname list nfs_parse_devname
will modify dev_name.  There is no need to modify dev_name as the all
that is being computed is the length of the hostname, so the computed
length can just be shorted.

Fixes: dc04589827 ("NFS: Use common device name parsing logic for NFSv4 and NFSv2/v3")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 17:33:55 -05:00
Yao Liu 80ff001724 nfs: Fix NULL pointer dereference of dev_name
There is a NULL pointer dereference of dev_name in nfs_parse_devname()

The oops looks something like:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  ...
  RIP: 0010:nfs_fs_mount+0x3b6/0xc20 [nfs]
  ...
  Call Trace:
   ? ida_alloc_range+0x34b/0x3d0
   ? nfs_clone_super+0x80/0x80 [nfs]
   ? nfs_free_parsed_mount_data+0x60/0x60 [nfs]
   mount_fs+0x52/0x170
   ? __init_waitqueue_head+0x3b/0x50
   vfs_kern_mount+0x6b/0x170
   do_mount+0x216/0xdc0
   ksys_mount+0x83/0xd0
   __x64_sys_mount+0x25/0x30
   do_syscall_64+0x65/0x220
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fix this by adding a NULL check on dev_name

Signed-off-by: Yao Liu <yotta.liu@ucloud.cn>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-28 12:08:30 -05:00
Linus Torvalds 505b050fdf Merge branch 'mount.part1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs mount API prep from Al Viro:
 "Mount API prereqs.

  Mostly that's LSM mount options cleanups. There are several minor
  fixes in there, but nothing earth-shattering (leaks on failure exits,
  mostly)"

* 'mount.part1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (27 commits)
  mount_fs: suppress MAC on MS_SUBMOUNT as well as MS_KERNMOUNT
  smack: rewrite smack_sb_eat_lsm_opts()
  smack: get rid of match_token()
  smack: take the guts of smack_parse_opts_str() into a new helper
  LSM: new method: ->sb_add_mnt_opt()
  selinux: rewrite selinux_sb_eat_lsm_opts()
  selinux: regularize Opt_... names a bit
  selinux: switch away from match_token()
  selinux: new helper - selinux_add_opt()
  LSM: bury struct security_mnt_opts
  smack: switch to private smack_mnt_opts
  selinux: switch to private struct selinux_mnt_opts
  LSM: hide struct security_mnt_opts from any generic code
  selinux: kill selinux_sb_get_mnt_opts()
  LSM: turn sb_eat_lsm_opts() into a method
  nfs_remount(): don't leak, don't ignore LSM options quietly
  btrfs: sanitize security_mnt_opts use
  selinux; don't open-code a loop in sb_finish_set_opts()
  LSM: split ->sb_set_mnt_opts() out of ->sb_kern_mount()
  new helper: security_sb_eat_lsm_opts()
  ...
2019-01-05 13:25:58 -08:00
Chuck Lever 0dfbb5f05e NFS: Make "port=" mount option optional for RDMA mounts
Having to specify "proto=rdma,port=20049" is cumbersome.

RFC 8267 Section 6.3 requires NFSv4 clients to use "the alternative
well-known port number", which is 20049. Make the use of the well-
known port number automatic, just as it is for NFS/TCP and port
2049.

For NFSv2/3, Section 4.2 allows clients to simply choose 20049 as
the default or use rpcbind. I don't know of an NFS/RDMA server
implementation that registers it's NFS/RDMA service with rpcbind,
so automatically choosing 20049 seems like the better choice. The
other widely-deployed NFS/RDMA client, Solaris, also uses 20049
as the default port.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-02 12:05:17 -05:00
Chris Perl 594d1644cd NFS: nfs_compare_mount_options always compare auth flavors.
This patch removes the check from nfs_compare_mount_options to see if a
`sec' option was passed for the current mount before comparing auth
flavors and instead just always compares auth flavors.

Consider the following scenario:

You have a server with the address 192.168.1.1 and two exports /export/a
and /export/b.  The first export supports `sys' and `krb5' security, the
second just `sys'.

Assume you start with no mounts from the server.

The following results in EIOs being returned as the kernel nfs client
incorrectly thinks it can share the underlying `struct nfs_server's:

$ mkdir /tmp/{a,b}
$ sudo mount -t nfs -o vers=3,sec=krb5 192.168.1.1:/export/a /tmp/a
$ sudo mount -t nfs -o vers=3          192.168.1.1:/export/b /tmp/b
$ df >/dev/null
df: ‘/tmp/b’: Input/output error

Signed-off-by: Chris Perl <cperl@janestreet.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-21 13:38:45 -05:00
Al Viro 757cbe597f LSM: new method: ->sb_add_mnt_opt()
Adding options to growing mnt_opts.  NFS kludge with passing
context= down into non-text-options mount switched to it, and
with that the last use of ->sb_parse_opts_str() is gone.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21 11:50:02 -05:00
Al Viro 204cc0ccf1 LSM: hide struct security_mnt_opts from any generic code
Keep void * instead, allocate on demand (in parse_str_opts, at the
moment).  Eventually both selinux and smack will be better off
with private structures with several strings in those, rather than
this "counter and two pointers to dynamically allocated arrays"
ugliness.  This commit allows to do that at leisure, without
disrupting anything outside of given module.

Changes:
	* instead of struct security_mnt_opt use an opaque pointer
initialized to NULL.
	* security_sb_eat_lsm_opts(), security_sb_parse_opts_str() and
security_free_mnt_opts() take it as var argument (i.e. as void **);
call sites are unchanged.
	* security_sb_set_mnt_opts() and security_sb_remount() take
it by value (i.e. as void *).
	* new method: ->sb_free_mnt_opts().  Takes void *, does
whatever freeing that needs to be done.
	* ->sb_set_mnt_opts() and ->sb_remount() might get NULL as
mnt_opts argument, meaning "empty".

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21 11:48:34 -05:00
Al Viro 6a0440e5b7 nfs_remount(): don't leak, don't ignore LSM options quietly
* if mount(2) passes something like "context=foo" with MS_REMOUNT
in flags (/sbin/mount.nfs will _not_ do that - you need to issue
the syscall manually), you'll get leaked copies for LSM options.
The reason is that instead of nfs_{alloc,free}_parsed_mount_data()
nfs_remount() uses kzalloc/kfree, which lacks the needed cleanup.

* selinux options are not changed on remount (as for any other
fs), but in case of NFS the failure is quiet - they are not compared
to what we used to have, with complaint in case of attempted changes.
Trivially fixed by converting to use of security_sb_remount().

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21 11:47:19 -05:00
Al Viro f5c0c26d90 new helper: security_sb_eat_lsm_opts()
combination of alloc_secdata(), security_sb_copy_data(),
security_sb_parse_opt_str() and free_secdata().

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21 11:46:00 -05:00
Dan Carpenter 379ebf0796 NFS: silence a harmless uninitialized variable warning
kstrtoul() can return -ERANGE so Smatch complains that "num" can be
uninitialized.  We check that it's within bounds so it's not a huge
deal.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-07-31 12:53:40 -04:00
Dave Wysochanski 016583d703 sunrpc: Change rpc_print_iostats to rpc_clnt_show_stats and handle rpc_clnt clones
The existing rpc_print_iostats has a few shortcomings.  First, the naming
is not consistent with other functions in the kernel that display stats.
Second, it is really displaying stats for an rpc_clnt structure as it
displays both xprt stats and per-op stats.  Third, it does not handle
rpc_clnt clones, which is important for the one in-kernel tree caller
of this function, the NFS client's nfs_show_stats function.

Fix all of the above by renaming the rpc_print_iostats to
rpc_clnt_show_stats and looping through any rpc_clnt clones via
cl_parent.

Once this interface is fixed, this addresses a problem with NFSv4.
Before this patch, the /proc/self/mountstats always showed incorrect
counts for NFSv4 lease and session related opcodes such as SEQUENCE,
RENEW, SETCLIENTID, CREATE_SESSION, etc.  These counts were always 0
even though many ops would go over the wire.  The reason for this is
there are multiple rpc_clnt structures allocated for any given NFSv4
mount, and inside nfs_show_stats() we callled into rpc_print_iostats()
which only handled one of them, nfs_server->client.  Fix these counts
by calling sunrpc's new rpc_clnt_show_stats() function, which handles
cloned rpc_clnt structs and prints the stats together.

Note that one side-effect of the above is that multiple mounts from
the same NFS server will show identical counts in the above ops due
to the fact the one rpc_clnt (representing the NFSv4 client state)
is shared across mounts.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-07-31 12:53:35 -04:00
Eric W. Biederman 95dd77580c fs: Teach path_connected to handle nfs filesystems with multiple roots.
On nfsv2 and nfsv3 the nfs server can export subsets of the same
filesystem and report the same filesystem identifier, so that the nfs
client can know they are the same filesystem.  The subsets can be from
disjoint directory trees.  The nfsv2 and nfsv3 filesystems provides no
way to find the common root of all directory trees exported form the
server with the same filesystem identifier.

The practical result is that in struct super s_root for nfs s_root is
not necessarily the root of the filesystem.  The nfs mount code sets
s_root to the root of the first subset of the nfs filesystem that the
kernel mounts.

This effects the dcache invalidation code in generic_shutdown_super
currently called shrunk_dcache_for_umount and that code for years
has gone through an additional list of dentries that might be dentry
trees that need to be freed to accomodate nfs.

When I wrote path_connected I did not realize nfs was so special, and
it's hueristic for avoiding calling is_subdir can fail.

The practical case where this fails is when there is a move of a
directory from the subtree exposed by one nfs mount to the subtree
exposed by another nfs mount.  This move can happen either locally or
remotely.  With the remote case requiring that the move directory be cached
before the move and that after the move someone walks the path
to where the move directory now exists and in so doing causes the
already cached directory to be moved in the dcache through the magic
of d_splice_alias.

If someone whose working directory is in the move directory or a
subdirectory and now starts calling .. from the initial mount of nfs
(where s_root == mnt_root), then path_connected as a heuristic will
not bother with the is_subdir check.  As s_root really is not the root
of the nfs filesystem this heuristic is wrong, and the path may
actually not be connected and path_connected can fail.

The is_subdir function might be cheap enough that we can call it
unconditionally.  Verifying that will take some benchmarking and
the result may not be the same on all kernels this fix needs
to be backported to.  So I am avoiding that for now.

Filesystems with snapshots such as nilfs and btrfs do something
similar.  But as the directory tree of the snapshots are disjoint
from one another and from the main directory tree rename won't move
things between them and this problem will not occur.

Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Fixes: 397d425dc2 ("vfs: Test for and handle paths that are unreachable from their mnt_root")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-15 18:48:38 -04:00