alistair23-linux

redonkable

Author	SHA1	Message	Date
Trond Myklebust	fa332941c0	NFSv4: Fix another potential state manager deadlock Don't hold the NFSv4 sequence id while we check for open permission. The call to ACCESS may block due to reboot recovery. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-04-09 13:19:35 -04:00
Trond Myklebust	7a8203d8cb	NFS: Ensure that NFS file unlock waits for readahead to complete Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-04-08 22:12:42 -04:00
Trond Myklebust	577b42327d	NFS: Add functionality to allow waiting on all outstanding reads to complete This will later allow NFS locking code to wait for readahead to complete before releasing byte range locks. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-04-08 22:12:33 -04:00
Trond Myklebust	bc7a05ca51	NFSv4: Handle timeouts correctly when probing for lease validity When we send a RENEW or SEQUENCE operation in order to probe if the lease is still valid, we want it to be able to time out since the lease we are probing is likely to time out too. Currently, because we use soft mount semantics for these RPC calls, the return value is EIO, which causes the state manager to exit with an "unhandled error" message. This patch changes the call semantics, so that the RPC layer returns ETIMEDOUT instead of EIO. We then have the state manager default to a simple retry instead of exiting. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-04-08 18:01:59 -04:00
Trond Myklebust	826e001308	NFSv4: Fix CB_RECALL_ANY to only return delegations that are not in use Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-04-05 17:03:57 -04:00
Trond Myklebust	b02ba0b660	NFSv4: Clean up nfs_expire_all_delegations Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-04-05 17:03:56 -04:00
Trond Myklebust	5c31e2368f	NFSv4: Fix nfs_server_return_all_delegations If the state manager thread is already running, we may end up racing with it in nfs_client_return_marked_delegations. Better to just allow the state manager thread to do the job. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-04-05 17:03:56 -04:00
Trond Myklebust	b757144fd7	NFSv4: Be less aggressive about returning delegations for open files Currently, if the application that holds the file open isn't doing I/O, we may end up returning the delegation. This means that we can no longer cache the file as aggressively, and often also that we multiply the state that both the server and the client needs to track. This patch adds a check for open files to the routine that scans for delegations that are unreferenced. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-04-05 17:03:55 -04:00
Trond Myklebust	db4f2e637f	NFSv4: Clean up delegation recall error handling Unify the error handling in nfs4_open_delegation_recall and nfs4_lock_delegation_recall. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-04-05 17:03:55 -04:00
Trond Myklebust	be76b5b68d	NFSv4: Clean up nfs4_open_delegation_recall Make it symmetric with nfs4_lock_delegation_recall Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-04-05 17:03:54 -04:00
Trond Myklebust	4a706fa09f	NFSv4: Clean up nfs4_lock_delegation_recall All error cases are handled by the switch() statement, meaning that the call to nfs4_handle_exception() is unreachable. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-04-05 17:03:54 -04:00
Trond Myklebust	8b6cc4d6f8	NFSv4: Handle NFS4ERR_DELAY and NFS4ERR_GRACE in nfs4_open_delegation_recall A server shouldn't normally return NFS4ERR_GRACE if the client holds a delegation, since no conflicting lock reclaims can be granted, however the spec does not require the server to grant the open in this instance Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org	2013-04-05 17:03:53 -04:00
Trond Myklebust	dbb21c25a3	NFSv4: Handle NFS4ERR_DELAY and NFS4ERR_GRACE in nfs4_lock_delegation_recall A server shouldn't normally return NFS4ERR_GRACE if the client holds a delegation, since no conflicting lock reclaims can be granted, however the spec does not require the server to grant the lock in this instance. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org	2013-04-05 17:03:53 -04:00
Jeff Layton	25d280aad8	nfs: allow the v4.1 callback thread to freeze The v4.1 callback thread has set_freezable() at the top, but it doesn't ever try to freeze within the loop. Have it call try_to_freeze() at the top of the loop. If a freeze event occurs, recheck kthread_should_stop() after thawing. Reported-by: Yanchuan Nian <ycnian@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-04-05 17:03:52 -04:00
Trond Myklebust	7b1f1fd184	NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list It is unsafe to use list_for_each_entry_safe() here, because when we drop the nn->nfs_client_lock, we pin the _current_ list entry and ensure that it stays in the list, but we don't do the same for the _next_ list entry. Use of list_for_each_entry() is therefore the correct thing to do. Also fix the refcounting in nfs41_walk_client_list(). Finally, ensure that the nfs_client has finished being initialised and, in the case of NFSv4.1, that the session is set up. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Bryan Schumaker <bjschuma@netapp.com> Cc: stable@vger.kernel.org [>= 3.7]	2013-04-05 16:59:19 -04:00
Trond Myklebust	b193d59a48	NFSv4: Fix a memory leak in nfs4_discover_server_trunking When we assign a new rpc_client to clp->cl_rpcclient, we need to destroy the old one. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org [>=3.7]	2013-04-05 16:59:15 -04:00
Trond Myklebust	845cbceb22	NFSv4: Don't clear the machine cred when client establish returns EACCES The expected behaviour is that the client will decide at mount time whether or not to use a krb5i machine cred, or AUTH_NULL. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Bryan Schumaker <bjschuma@netapp.com>	2013-04-05 15:37:04 -04:00
Trond Myklebust	ea33e6c3e7	NFSv4: Fix issues in nfs4_discover_server_trunking - Ensure that we exit with ENOENT if the call to ops->get_clid_cred() fails. - Handle the case where ops->detect_trunking() exits with an unexpected error, and return EIO. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-04-05 13:22:50 -04:00
Trond Myklebust	23631227a6	NFSv4: Fix the fallback to AUTH_NULL if krb5i is not available If the rpcsec_gss_krb5 module cannot be loaded, the attempt to create an rpc_client in nfs4_init_client will currently fail with an EINVAL. Fix is to retry with AUTH_NULL. Regression introduced by the commit "NFS: Use "krb5i" to establish NFSv4 state whenever possible" Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Bryan Schumaker <bjschuma@netapp.com>	2013-04-04 17:01:25 -04:00
Chuck Lever	4580a92d44	NFS: Use server-recommended security flavor by default (NFSv3) Since commit `ec88f28d` in 2009, checking if the user-specified flavor is in the server's flavor list has been the source of a few noticeable regressions (now fixed), but there is one that is still vexing. An NFS server can list AUTH_NULL in its flavor list, which suggests a client should try to mount the server with the flavor of the client's choice, but the server will squash all accesses. In some cases, our client fails to mount a server because of this check, when the mount could have proceeded successfully. Skip this check if the user has specified "sec=" on the mount command line. But do consult the server-provided flavor list to choose a security flavor if no sec= option is specified on the mount command. If a server lists Kerberos pseudoflavors before "sys" in its export options, our client now chooses Kerberos over AUTH_UNIX for mount points, when no security flavor is specified by the mount command. This could be surprising to some administrators or users, who would then need to have Kerberos credentials to access the export. Or, a client administrator may not have enabled rpc.gssd. In this case, auth_rpcgss.ko might still be loadable, which is enough for the new logic to choose Kerberos over AUTH_UNIX. But the mount would fail since no GSS context can be created without rpc.gssd running. To retain the use of AUTH_UNIX by default: o The server administrator can ensure that "sys" is listed before Kerberos flavors in its export security options (see exports(5)), o The client administrator can explicitly specify "sec=sys" on its mount command line (see nfs(5)), o The client administrator can use "Sec=sys" in an appropriate section of /etc/nfsmount.conf (see nfsmount.conf(5)), or o The client administrator can blacklist auth_rpcgss.ko. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-04-04 17:01:01 -04:00
Jeff Layton	094f7b69ea	selinux: make security_sb_clone_mnt_opts return an error on context mismatch I had the following problem reported a while back. If you mount the same filesystem twice using NFSv4 with different contexts, then the second context= option is ignored. For instance: # mount server:/export /mnt/test1 # mount server:/export /mnt/test2 -o context=system_u:object_r:tmp_t:s0 # ls -dZ /mnt/test1 drwxrwxrwt. root root system_u:object_r:nfs_t:s0 /mnt/test1 # ls -dZ /mnt/test2 drwxrwxrwt. root root system_u:object_r:nfs_t:s0 /mnt/test2 When we call into SELinux to set the context of a "cloned" superblock, it will currently just bail out when it notices that we're reusing an existing superblock. Since the existing superblock is already set up and presumably in use, we can't go overwriting its context with the one from the "original" sb. Because of this, the second context= option in this case cannot take effect. This patch fixes this by turning security_sb_clone_mnt_opts into an int return operation. When it finds that the "new" superblock that it has been handed is already set up, it checks to see whether the contexts on the old superblock match it. If it does, then it will just return success, otherwise it'll return -EBUSY and emit a printk to tell the admin why the second mount failed. Note that this patch may cause casualties. The NFSv4 code relies on being able to walk down to an export from the pseudoroot. If you mount filesystems that are nested within one another with different contexts, then this patch will make those mounts fail in new and "exciting" ways. For instance, suppose that /export is a separate filesystem on the server: # mount server:/ /mnt/test1 # mount salusa:/export /mnt/test2 -o context=system_u:object_r:tmp_t:s0 mount.nfs: an incorrect mount option was specified ...with the printk in the ring buffer. Because we might eventually walk down to /mnt/test1/export, the mount is denied due to this patch. The second mount needs the pseudoroot superblock, but that's already present with the wrong context. OTOH, if we mount these in the reverse order, then both mounts work, because the pseudoroot superblock created when mounting /export is discarded once that mount is done. If we then however try to walk into that directory, the automount fails for the similar reasons: # cd /mnt/test1/scratch/ -bash: cd: /mnt/test1/scratch: Device or resource busy The story I've gotten from the SELinux folks that I've talked to is that this is desirable behavior. In SELinux-land, mounting the same data under different contexts is wrong -- there can be only one. Cc: Steve Dickson <steved@redhat.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com>	2013-04-02 11:30:13 +11:00
Chuck Lever	4edaa30888	NFS: Use "krb5i" to establish NFSv4 state whenever possible Currently our client uses AUTH_UNIX for state management on Kerberos NFS mounts in some cases. For example, if the first mount of a server specifies "sec=sys," the SETCLIENTID operation is performed with AUTH_UNIX. Subsequent mounts using stronger security flavors can not change the flavor used for lease establishment. This might be less security than an administrator was expecting. Dave Noveck's migration issues draft recommends the use of an integrity-protecting security flavor for the SETCLIENTID operation. Let's ignore the mount's sec= setting and use krb5i as the default security flavor for SETCLIENTID. If our client can't establish a GSS context (eg. because it doesn't have a keytab or the server doesn't support Kerberos) we fall back to using AUTH_NULL. For an operation that requires a machine credential (which never represents a particular user) AUTH_NULL is as secure as AUTH_UNIX. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-29 15:45:22 -04:00
Chuck Lever	c4eafe1135	NFS: Try AUTH_UNIX when PUTROOTFH gets NFS4ERR_WRONGSEC Most NFSv4 servers implement AUTH_UNIX, and administrators will prefer this over AUTH_NULL. It is harmless for our client to try this flavor in addition to the flavors mandated by RFC 3530/5661. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-29 15:45:09 -04:00
Chuck Lever	9a744ba398	NFS: Use static list of security flavors during root FH lookup recovery If the Linux NFS client receives an NFS4ERR_WRONGSEC error while trying to look up an NFS server's root file handle, it retries the lookup operation with various security flavors to see what flavor the NFS server will accept for pseudo-fs access. The list of flavors the client uses during retry consists only of flavors that are currently registered in the kernel RPC client. This list may not include any GSS pseudoflavors if auth_rpcgss.ko has not yet been loaded. Let's instead use a static list of security flavors that the NFS standard requires the server to implement (RFC 3530bis, section 3.2.1). The RPC client should now be able to load support for these dynamically; if not, they are skipped. Recovery behavior here is prescribed by RFC 3530bis, section 15.33.5: > For LOOKUPP, PUTROOTFH and PUTPUBFH, the client will be unable to > use the SECINFO operation since SECINFO requires a current > filehandle and none exist for these two [sic] operations. Therefore, > the client must iterate through the security triples available at > the client and reattempt the PUTROOTFH or PUTPUBFH operation. In > the unfortunate event none of the MANDATORY security triples are > supported by the client and server, the client SHOULD try using > others that support integrity. Failing that, the client can try > using AUTH_NONE, but because such forms lack integrity checks, > this puts the client at risk. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-29 15:44:58 -04:00
Chuck Lever	83ca7f5ab3	NFS: Avoid PUTROOTFH when managing leases Currently, the compound operation the Linux NFS client sends to the server to confirm a client ID looks like this: { SETCLIENTID_CONFIRM; PUTROOTFH; GETATTR(lease_time) } Once the lease is confirmed, it makes sense to know how long before the client will have to renew it. And, performing these operations in the same compound saves a round trip. Unfortunately, this arrangement assumes that the security flavor used for establishing a client ID can also be used to access the server's pseudo-fs. If the server requires a different security flavor to access its pseudo-fs than it allowed for the client's SETCLIENTID operation, the PUTROOTFH in this compound fails with NFS4ERR_WRONGSEC. Even though the SETCLIENTID_CONFIRM succeeded, our client's trunking detection logic interprets the failure of the compound as a failure by the server to confirm the client ID. As part of server trunking detection, the client then begins another SETCLIENTID pass with the same nfs4_client_id. This fails with NFS4ERR_CLID_INUSE because the first SETCLIENTID/SETCLIENTID_CONFIRM already succeeded in confirming that client ID -- it was the PUTROOTFH operation that caused the SETCLIENTID_CONFIRM compound to fail. To address this issue, separate the "establish client ID" step from the "accessing the server's pseudo-fs root" step. The first access of the server's pseudo-fs may require retrying the PUTROOTFH operation with different security flavors. This access is done in nfs4_proc_get_rootfh(). That leaves the matter of how to retrieve the server's lease time. nfs4_proc_fsinfo() already retrieves the lease time value, though none of its callers do anything with the retrieved value (nor do they mark the lease as "renewed"). Note that NFSv4.1 state recovery invokes nfs4_proc_get_lease_time() using the lease management security flavor. This may cause some heartburn if that security flavor isn't the same as the security flavor the server requires for accessing the pseudo-fs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-29 15:44:49 -04:00
Chuck Lever	2ed4b95b7e	NFS: Clean up nfs4_proc_get_rootfh The long lines with no vertical white space make this function difficult for humans to read. Add a proper documenting comment while we're here. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-29 15:44:12 -04:00
Chuck Lever	75bc8821bd	NFS: Handle missing rpc.gssd when looking up root FH When rpc.gssd is not running, any NFS operation that needs to use a GSS security flavor of course does not work. If looking up a server's root file handle results in an NFS4ERR_WRONGSEC, nfs4_find_root_sec() is called to try a bunch of security flavors until one works or all reasonable flavors have been tried. When rpc.gssd isn't running, this loop seems to fail immediately after rpcauth_create() craps out on the first GSS flavor. When the rpcauth_create() call in nfs4_lookup_root_sec() fails because rpc.gssd is not available, nfs4_lookup_root_sec() unconditionally returns -EIO. This prevents nfs4_find_root_sec() from retrying any other flavors; it drops out of its loop and fails immediately. Having nfs4_lookup_root_sec() return -EACCES instead allows nfs4_find_root_sec() to try all flavors in its list. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-29 15:43:55 -04:00
Chuck Lever	9568c5e9a6	SUNRPC: Introduce rpcauth_get_pseudoflavor() A SECINFO reply may contain flavors whose kernel module is not yet loaded by the client's kernel. A new RPC client API, called rpcauth_get_pseudoflavor(), is introduced to do proper checking for support of a security flavor. When this API is invoked, the RPC client now tries to load the module for each flavor first before performing the "is this supported?" check. This means if a module is available on the client, but has not been loaded yet, it will be loaded and registered automatically when the SECINFO reply is processed. The new API can take a full GSS tuple (OID, QoP, and service). Previously only the OID and service were considered. nfs_find_best_sec() is updated to verify all flavors requested in a SECINFO reply, including AUTH_NULL and AUTH_UNIX. Previously these two flavors were simply assumed to be supported without consulting the RPC client. Note that the replaced version of nfs_find_best_sec() can return RPC_AUTH_MAXFLAVOR if the server returns a recognized OID but an unsupported "service" value. nfs_find_best_sec() now returns RPC_AUTH_UNIX in this case. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-29 15:43:07 -04:00
Chuck Lever	fb15b26f8b	SUNRPC: Define rpcsec_gss_info structure The NFSv4 SECINFO procedure returns a list of security flavors. Any GSS flavor also has a GSS tuple containing an OID, a quality-of- protection value, and a service value, which specifies a particular GSS pseudoflavor. For simplicity and efficiency, I'd like to return each GSS tuple from the NFSv4 SECINFO XDR decoder and pass it straight into the RPC client. Define a data structure that is visible to both the NFS client and the RPC client. Take structure and field names from the relevant standards to avoid confusion. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-29 15:42:56 -04:00
Trond Myklebust	809b426c7f	NFSv4: Fix Oopses in the fs_locations code If the server sends us a pathname with more components than the client limit of NFS4_PATHNAME_MAXCOMPONENTS, more server entries than the client limit of NFS4_FS_LOCATION_MAXSERVERS, or sends a total number of fs_locations entries than the client limit of NFS4_FS_LOCATIONS_MAXENTRIES then we will currently Oops because the limit checks are done _after_ we've decoded the data into the arrays. Reported-by: fanchaoting<fanchaoting@cn.fujitsu.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-28 16:22:17 -04:00
Trond Myklebust	91876b13b8	NFSv4: Fix another reboot recovery race If the open_context for the file is not yet fully initialised, then open recovery cannot succeed, and since nfs4_state_find_open_context returns an ENOENT, we end up treating the file as being irrecoverable. What we really want to do, is just defer the recovery until later. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-28 16:22:16 -04:00
Trond Myklebust	6e3cf24152	NFSv4: Add a mapping for NFS4ERR_FILE_OPEN in nfs4_map_errors With unlink is an asynchronous operation in the sillyrename case, it expects nfs4_async_handle_error() to map the error correctly. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-27 12:44:40 -04:00
Linus Torvalds	5d538483ea	NFS client bugfixes for Linux 3.9 - Fix an NFSv4 idmapper regression - Fix an Oops in the pNFS blocks client - Fix up various issues with pNFS layoutcommit - Ensure correct read ordering of variables in rpc_wake_up_task_queue_locked -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQIcBAABAgAGBQJRUedyAAoJEGcL54qWCgDyar0P/2pTT/yxX8ejTu5DmY7e4PYJ jhPG2AEqY/yMLn9GvB375VIs1L8tuY50+3NFhWZFjyNbEU3GV+5Y+kPpBtAgYiSI VyIXiJ/xMtXdYJMYuE/nh5jbcqJsHwGjpcIaSd5BuWzQUaoUYvLulxWd4QN8mmaT 5SuzmgV+7WIqV6RjlaYF82srcOKAjwemcrfRkCNzzJr6aT39gH2YdYFbDaTr7qhU fw0x3QlI7887vSNQcfaGbC1+jr6oe8wRCneOR0tceU/8bcj6zlUDk5HxqSOc28mA jUQieoVRggcM4s5DFpNcuwW6qCPZOmzv/OFD6oqnhyyonPOrue+7zaoujZmGNmjx dT2V/jQehanYD25WpDO8OyFXUeYE4x9bgHKsszhBTwr4x5D8ceEJ1sugcOPiTTxu tflbbuWbt+BguvXp4p8QayUj0V2cplM/nOovWyUG+BH46sz3Dtv46NOgJeO2a29g T6jayxmKCxvtPKtG0j34BzLngiKabZTSEhFms6Qarp9lwWvHWrR9KWGuDBNvy1Ts GMBN8P6Ib40yVi6Pwlj5Jpy6yLKVklHtJQpactr63AZmYrF4bBBSom+MWAh3X1iO QtF0x9Z1bBkXY2Q/u+3vWMxQtEPeW+pSiloj8aiceFAt33zKM+1bLofDhEw0s2fI wJEHYsGyGtDQINgP0v1e =OPbZ -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: - Fix an NFSv4 idmapper regression - Fix an Oops in the pNFS blocks client - Fix up various issues with pNFS layoutcommit - Ensure correct read ordering of variables in rpc_wake_up_task_queue_locked * tag 'nfs-for-3.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_locked NFSv4.1: Add a helper pnfs_commit_and_return_layout NFSv4.1: Always clear the NFS_INO_LAYOUTCOMMIT in layoutreturn NFSv4.1: Fix a race in pNFS layoutcommit pnfs-block: removing DM device maybe cause oops when call dev_remove NFSv4: Fix the string length returned by the idmapper	2013-03-26 14:23:45 -07:00
Trond Myklebust	ccb46e2063	NFSv4.1: Use CLAIM_DELEG_CUR_FH opens when available Now that we do CLAIM_FH opens, we may run into situations where we get a delegation but don't have perfect knowledge of the file path. When returning the delegation, we might therefore not be able to us CLAIM_DELEGATE_CUR opens to convert the delegation into OPEN stateids and locks. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-25 12:04:11 -04:00
Trond Myklebust	49f9a0fafd	NFSv4.1: Enable open-by-filehandle Sometimes, we actually _want_ to do open-by-filehandle, for instance when recovering opens after a network partition, or when called from nfs4_file_open. Enable that functionality using a new capability NFS_CAP_ATOMIC_OPEN_V1, and which is only enabled for NFSv4.1 servers that support it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-25 12:04:11 -04:00
Trond Myklebust	d9fc6619ca	NFSv4.1: Add xdr support for CLAIM_FH and CLAIM_DELEG_CUR_FH opens Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-25 12:04:11 -04:00
Trond Myklebust	4a1c089345	NFSv4: Clean up nfs4_opendata_alloc in preparation for NFSv4.1 open modes Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-25 12:04:11 -04:00
Trond Myklebust	3b66486c4c	NFSv4.1: Select the "most recent locking state" for read/write/setattr stateids Follow the practice described in section 8.2.2 of RFC5661: When sending a read/write or setattr stateid, set the seqid field to zero in order to signal that the NFS server should apply the most recent locking state. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-25 12:04:11 -04:00
Trond Myklebust	39c6daae70	NFSv4: Prepare for minorversion-specific nfs_server capabilities Clean up the setting of the nfs_server->caps, by shoving it all into nfs4_server_common_setup(). Then add an 'initial capabilities' field into struct nfs4_minor_version_ops. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-25 12:04:11 -04:00
Trond Myklebust	5521abfdcf	NFSv4: Resend the READ/WRITE RPC call if a stateid change causes an error Adds logic to ensure that if the server returns a BAD_STATEID, or other state related error, then we check if the stateid has already changed. If it has, then rather than start state recovery, we should just resend the failed RPC call with the new stateid. Allow nfs4_select_rw_stateid to notify that the stateid is unstable by having it return -EWOULDBLOCK if an RPC is underway that might change the stateid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-25 12:04:10 -04:00
Trond Myklebust	9b20614988	NFSv4: The stateid must remain the same for replayed RPC calls If we replay a READ or WRITE call, we should not be changing the stateid. Currently, we may end up doing so, because the stateid is only selected at xdr encode time. This patch ensures that we select the stateid after we get an NFSv4.1 session slot, and that we keep that same stateid across retries. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-25 12:04:10 -04:00
Trond Myklebust	8c86899f62	NFS: __nfs_find_lock_context needs to check ctx->lock_context for a match too Currently, we're forcing an unnecessary duplication of the initial nfs_lock_context in calls to nfs_get_lock_context, since __nfs_find_lock_context ignores the ctx->lock_context. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-25 12:04:10 -04:00
Trond Myklebust	c58c844187	NFS: Don't accept more reads/writes if the open context recovery failed If the state recovery failed, we want to ensure that the application doesn't try to use the same file descriptor for more reads or writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-25 12:04:10 -04:00
Trond Myklebust	5d422301f9	NFSv4: Fail I/O if the state recovery fails irrevocably If state recovery fails with an ESTALE or a ENOENT, then we shouldn't keep retrying. Instead, mark the stateid as being invalid and fail the I/O with an EIO error. For other operations such as POSIX and BSD file locking, truncate etc, fail with an EBADF to indicate that this file descriptor is no longer valid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-25 12:04:10 -04:00
Trond Myklebust	240286725d	NFSv4.1: Add a helper pnfs_commit_and_return_layout In order to be able to safely return the layout in nfs4_proc_setattr, we need to block new uses of the layout, wait for all outstanding users of the layout to complete, commit the layout and then return it. This patch adds a helper in order to do all this safely. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Boaz Harrosh <bharrosh@panasas.com>	2013-03-21 10:31:21 -04:00
Trond Myklebust	2495680434	NFSv4.1: Always clear the NFS_INO_LAYOUTCOMMIT in layoutreturn Note that clearing NFS_INO_LAYOUTCOMMIT is tricky, since it requires you to also clear the NFS_LSEG_LAYOUTCOMMIT bits from the layout segments. The only two sites that need to do this are the ones that call pnfs_return_layout() without first doing a layout commit. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Benny Halevy <bhalevy@tonian.com> Cc: stable@vger.kernel.org	2013-03-21 10:31:21 -04:00
Trond Myklebust	a073dbff35	NFSv4.1: Fix a race in pNFS layoutcommit We need to clear the NFS_LSEG_LAYOUTCOMMIT bits atomically with the NFS_INO_LAYOUTCOMMIT bit, otherwise we may end up with situations where the two are out of sync. The first half of the problem is to ensure that pnfs_layoutcommit_inode clears the NFS_LSEG_LAYOUTCOMMIT bit through pnfs_list_write_lseg. We still need to keep the reference to those segments until the RPC call is finished, so in order to make it clear _where_ those references come from, we add a helper pnfs_list_write_lseg_done() that cleans up after pnfs_list_write_lseg. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Benny Halevy <bhalevy@tonian.com> Cc: stable@vger.kernel.org	2013-03-21 10:31:19 -04:00
fanchaoting	4376c94618	pnfs-block: removing DM device maybe cause oops when call dev_remove when pnfs block using device mapper,if umounting later,it maybe cause oops. we apply "1 + sizeof(bl_umount_request)" memory for msg->data, the memory maybe overflow when we do "memcpy(&dataptr [sizeof(bl_msg)], &bl_umount_request, sizeof(bl_umount_request))", because the size of bl_msg is more than 1 byte. Signed-off-by: fanchaoting<fanchaoting@cn.fujitsu.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-03-21 10:11:06 -04:00
Trond Myklebust	cf4ab538f1	NFSv4: Fix the string length returned by the idmapper Functions like nfs_map_uid_to_name() and nfs_map_gid_to_group() are expected to return a string without any terminating NUL character. Regression introduced by commit `57e62324e4` (NFS: Store the legacy idmapper result in the keyring). Reported-by: Dave Chiluk <dave.chiluk@canonical.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Bryan Schumaker <bjschuma@netapp.com> Cc: stable@vger.kernel.org [>=3.4]	2013-03-20 16:45:16 -04:00
Eric W. Biederman	fa7614ddd6	fs: Readd the fs module aliases. I had assumed that the only use of module aliases for filesystems prior to "fs: Limit sys_mount to only request filesystem modules." was in request_module. It turns out I was wrong. At least mkinitcpio in Arch linux uses these aliases. So readd the preexising aliases, to keep from breaking userspace. Userspace eventually will have to follow and use the same aliases the kernel does. So at some point we may be delete these aliases without problems. However that day is not today. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-03-12 18:55:21 -07:00
Eric W. Biederman	7f78e03513	fs: Limit sys_mount to only request filesystem modules. Modify the request_module to prefix the file system type with "fs-" and add aliases to all of the filesystems that can be built as modules to match. A common practice is to build all of the kernel code and leave code that is not commonly needed as modules, with the result that many users are exposed to any bug anywhere in the kernel. Looking for filesystems with a fs- prefix limits the pool of possible modules that can be loaded by mount to just filesystems trivially making things safer with no real cost. Using aliases means user space can control the policy of which filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf with blacklist and alias directives. Allowing simple, safe, well understood work-arounds to known problematic software. This also addresses a rare but unfortunate problem where the filesystem name is not the same as it's module name and module auto-loading would not work. While writing this patch I saw a handful of such cases. The most significant being autofs that lives in the module autofs4. This is relevant to user namespaces because we can reach the request module in get_fs_type() without having any special permissions, and people get uncomfortable when a user specified string (in this case the filesystem type) goes all of the way to request_module. After having looked at this issue I don't think there is any particular reason to perform any filtering or permission checks beyond making it clear in the module request that we want a filesystem module. The common pattern in the kernel is to call request_module() without regards to the users permissions. In general all a filesystem module does once loaded is call register_filesystem() and go to sleep. Which means there is not much attack surface exposed by loading a filesytem module unless the filesystem is mounted. In a user namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT, which most filesystems do not set today. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Reported-by: Kees Cook <keescook@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-03-03 19:36:31 -08:00
Linus Torvalds	8d05b3771d	NFS client bugfixes for Linux 3.9 - Don't allow NFS silly-renamed files to be deleted - Don't start the retransmission timer when out of socket space - Fix a couple of pnfs-related Oopses. - Fix one more NFSv4 state recovery deadlock - Don't loop forever when LAYOUTGET returns NFS4ERR_LAYOUTTRYLATER -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQIcBAABAgAGBQJRMpMhAAoJEGcL54qWCgDy4BMP/0Zl7Ei7x9bJSb1C1lpPSo5p Lr9XoHLYqhPcAwKUXQfgM5IkC69bE62bD5esmdDqkgZYqnmGE0E4LG6MsbsMmvzk yug5WOxmjOFee7Bdpd8B86Z0qsa4l2TkQu2h9G3zE36P2rPKQaNzpteIjhis5UEQ EfNyLoBdFcuUSh4ztMVZOzbAyDcbNfsyl03XVmlv+Qn/o0l42Zjth0qwOP60bjuM zJF1CkHi5NLbXEhmOev9mA6UYz6zWRbiA/Yu92pomtXVDtOtzWpUniBIcf/S1ZH/ V8Gj6bWdHHyFCa2PjhY1/QdLBOPRPdxpAAJk+q48AKmzyiOU6g3lIHBp5ai1WZNI 1C+SYxABE/EJgq9SoQYGqq6SUiolrFulqnFHXF0jHF+ifdjoHjSRmpGQAoyoZ0k1 aSl+Ojqx7QHibJd8GZBavWc3upRDzhHDRRB3tkQCENi+hryBZxEyeS2Z54NmBRUN tsOuyac6rtknZdD8Do4DMt9uc9u1DWicaiZbLfkP1VL1Angh6NKSA7qbmH6giLBS 9Y+DPcIk5e34uKQ21WTxFydGD+SMg0EMnOmfr6EYXWEHBhKNYVR+cHyH0mAF6RzX enU2g0H2m+3vUQqajPUP0DV/eLGtdsvWvMjiskc3KX90CWfHmV2C8GFSxjV2OkT1 vG1KFrICO6DR2943Udit =FMtb -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: "We've just concluded another Connectathon interoperability testing week, and so here are the fixes for the bugs that were discovered: - Don't allow NFS silly-renamed files to be deleted - Don't start the retransmission timer when out of socket space - Fix a couple of pnfs-related Oopses. - Fix one more NFSv4 state recovery deadlock - Don't loop forever when LAYOUTGET returns NFS4ERR_LAYOUTTRYLATER" * tag 'nfs-for-3.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: One line comment fix NFSv4.1: LAYOUTGET EDELAY loops timeout to the MDS SUNRPC: add call to get configured timeout PNFS: set the default DS timeout to 60 seconds NFSv4: Fix another open/open_recovery deadlock nfs: don't allow nfs_find_actor to match inodes of the wrong type NFSv4.1: Hold reference to layout hdr in layoutget pnfs: fix resend_to_mds for directio SUNRPC: Don't start the retransmission timer when out of socket space NFS: Don't allow NFS silly-renamed files to be deleted, no signal	2013-03-02 16:46:07 -08:00
Linus Torvalds	b6669737d3	Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linux Pull nfsd changes from J Bruce Fields: "Miscellaneous bugfixes, plus: - An overhaul of the DRC cache by Jeff Layton. The main effect is just to make it larger. This decreases the chances of intermittent errors especially in the UDP case. But we'll need to watch for any reports of performance regressions. - Containerized nfsd: with some limitations, we now support per-container nfs-service, thanks to extensive work from Stanislav Kinsbursky over the last year." Some notes about conflicts, since there were two non-data semantic conflicts here: - idr_remove_all() had been added by a memory leak fix, but has since become deprecated since idr_destroy() does it for us now. - xs_local_connect() had been added by this branch to make AF_LOCAL connections be synchronous, but in the meantime Trond had changed the calling convention in order to avoid a RCU dereference. There were a couple of more obvious actual source-level conflicts due to the hlist traversal changes and one just due to code changes next to each other, but those were trivial. * 'for-3.9' of git://linux-nfs.org/~bfields/linux: (49 commits) SUNRPC: make AF_LOCAL connect synchronous nfsd: fix compiler warning about ambiguous types in nfsd_cache_csum svcrpc: fix rpc server shutdown races svcrpc: make svc_age_temp_xprts enqueue under sv_lock lockd: nlmclnt_reclaim(): avoid stack overflow nfsd: enable NFSv4 state in containers nfsd: disable usermode helper client tracker in container nfsd: use proper net while reading "exports" file nfsd: containerize NFSd filesystem nfsd: fix comments on nfsd_cache_lookup SUNRPC: move cache_detail->cache_request callback call to cache_read() SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function SUNRPC: rework cache upcall logic SUNRPC: introduce cache_detail->cache_request callback NFS: simplify and clean cache library NFS: use SUNRPC cache creation and destruction helper for DNS cache nfsd4: free_stid can be static nfsd: keep a checksum of the first 256 bytes of request sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer sunrpc: fix comment in struct xdr_buf definition ...	2013-02-28 18:02:55 -08:00
Weston Andros Adamson	3000512137	NFSv4.1: LAYOUTGET EDELAY loops timeout to the MDS The client will currently try LAYOUTGETs forever if a server is returning NFS4ERR_LAYOUTTRYLATER or NFS4ERR_RECALLCONFLICT - even if the client no longer needs the layout (ie process killed, unmounted). This patch uses the DS timeout value (module parameter 'dataserver_timeo' via rpc layer) to set an upper limit of how long the client tries LATOUTGETs in this situation. Once the timeout is reached, IO is redirected to the MDS. This also changes how the client checks if a layout is on the clp list to avoid a double list_add. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-02-28 17:41:35 -08:00
Weston Andros Adamson	eb97cbb459	PNFS: set the default DS timeout to 60 seconds The client should have 60 second default timeouts for DS operations, not 6 seconds. NFS4_DEF_DS_TIMEO is used as "timeout in tenths of a second" in nfs_init_timeout_values (and is not used anywhere else). This matches up with the description of the module param dataserver_timeo. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-02-28 17:35:00 -08:00
Trond Myklebust	7aa262b522	NFSv4: Fix another open/open_recovery deadlock If we don't release the open seqid before we wait for state recovery, then we may end up deadlocking the state recovery thread. This patch addresses a new deadlock that was introduced by commit `c21443c2c7` (NFSv4: Fix a reboot recovery race when opening a file) Reported-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-02-28 16:19:59 -08:00
Sasha Levin	b67bfe0d42	hlist: drop the node parameter from iterators I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S \| hlist_for_each_entry_continue(a, - b, c) S \| hlist_for_each_entry_from(a, - b, c) S \| hlist_for_each_entry_rcu(a, - b, c, d) S \| hlist_for_each_entry_rcu_bh(a, - b, c, d) S \| hlist_for_each_entry_continue_rcu_bh(a, - b, c) S \| for_each_busy_worker(a, c, - b, d) S \| ax25_uid_for_each(a, - b, c) S \| ax25_for_each(a, - b, c) S \| inet_bind_bucket_for_each(a, - b, c) S \| sctp_for_each_hentry(a, - b, c) S \| sk_for_each(a, - b, c) S \| sk_for_each_rcu(a, - b, c) S \| sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S \| sk_for_each_safe(a, - b, c, d) S \| sk_for_each_bound(a, - b, c) S \| hlist_for_each_entry_safe(a, - b, c, d, e) S \| hlist_for_each_entry_continue_rcu(a, - b, c) S \| nr_neigh_for_each(a, - b, c) S \| nr_neigh_for_each_safe(a, - b, c, d) S \| nr_node_for_each(a, - b, c) S \| nr_node_for_each_safe(a, - b, c, d) S \| - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S \| - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S \| for_each_host(a, - b, c) S \| for_each_host_safe(a, - b, c, d) S \| for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:24 -08:00
Tejun Heo	d687031265	nfs4client: convert to idr_alloc() Convert to the much saner new idr interface. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:20 -08:00
Tejun Heo	d97bec801d	nfs: idr_destroy() no longer needs idr_remove_all() idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop reference to idr_remove_all(). Note that the code wasn't completely correct before because idr_remove() on all entries doesn't necessarily release all idr_layers which could lead to memory leak. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:14 -08:00
Jeff Layton	f6488c9ba5	nfs: don't allow nfs_find_actor to match inodes of the wrong type Benny Halevy reported the following oops when testing RHEL6: <7>nfs_update_inode: inode 892950 mode changed, 0040755 to 0100644 <1>BUG: unable to handle kernel NULL pointer dereference at (null) <1>IP: [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs] <4>PGD 81448a067 PUD 831632067 PMD 0 <4>Oops: 0000 [#1] SMP <4>last sysfs file: /sys/kernel/mm/redhat_transparent_hugepage/enabled <4>CPU 6 <4>Modules linked in: fuse bonding 8021q garp ebtable_nat ebtables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi softdog bridge stp llc xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_round_robin dm_multipath objlayoutdriver2(U) nfs(U) lockd fscache auth_rpcgss nfs_acl sunrpc vhost_net macvtap macvlan tun kvm_intel kvm be2net igb dca ptp pps_core microcode serio_raw sg iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] <4> <4>Pid: 6332, comm: dd Not tainted 2.6.32-358.el6.x86_64 #1 HP ProLiant DL170e G6 /ProLiant DL170e G6 <4>RIP: 0010:[<ffffffffa02a52c5>] [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs] <4>RSP: 0018:ffff88081458bb98 EFLAGS: 00010292 <4>RAX: ffffffffa02a52b0 RBX: 0000000000000000 RCX: 0000000000000003 <4>RDX: ffffffffa02e45a0 RSI: ffff88081440b300 RDI: ffff88082d5f5760 <4>RBP: ffff88081458bba8 R08: 0000000000000000 R09: 0000000000000000 <4>R10: 0000000000000772 R11: 0000000000400004 R12: 0000000040000008 <4>R13: ffff88082d5f5760 R14: ffff88082d6e8800 R15: ffff88082f12d780 <4>FS: 00007f728f37e700(0000) GS:ffff8800456c0000(0000) knlGS:0000000000000000 <4>CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b <4>CR2: 0000000000000000 CR3: 0000000831279000 CR4: 00000000000007e0 <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 <4>Process dd (pid: 6332, threadinfo ffff88081458a000, task ffff88082fa0e040) <4>Stack: <4> 0000000040000008 ffff88081440b300 ffff88081458bbf8 ffffffff81182745 <4><d> ffff88082d5f5760 ffff88082d6e8800 ffff88081458bbf8 ffffffffffffffea <4><d> ffff88082f12d780 ffff88082d6e8800 ffffffffa02a50a0 ffff88082d5f5760 <4>Call Trace: <4> [<ffffffff81182745>] __fput+0xf5/0x210 <4> [<ffffffffa02a50a0>] ? do_open+0x0/0x20 [nfs] <4> [<ffffffff81182885>] fput+0x25/0x30 <4> [<ffffffff8117e23e>] __dentry_open+0x27e/0x360 <4> [<ffffffff811c397a>] ? inotify_d_instantiate+0x2a/0x60 <4> [<ffffffff8117e4b9>] lookup_instantiate_filp+0x69/0x90 <4> [<ffffffffa02a6679>] nfs_intent_set_file+0x59/0x90 [nfs] <4> [<ffffffffa02a686b>] nfs_atomic_lookup+0x1bb/0x310 [nfs] <4> [<ffffffff8118e0c2>] __lookup_hash+0x102/0x160 <4> [<ffffffff81225052>] ? selinux_inode_permission+0x72/0xb0 <4> [<ffffffff8118e76a>] lookup_hash+0x3a/0x50 <4> [<ffffffff81192a4b>] do_filp_open+0x2eb/0xdd0 <4> [<ffffffff8104757c>] ? __do_page_fault+0x1ec/0x480 <4> [<ffffffff8119f562>] ? alloc_fd+0x92/0x160 <4> [<ffffffff8117de79>] do_sys_open+0x69/0x140 <4> [<ffffffff811811f6>] ? sys_lseek+0x66/0x80 <4> [<ffffffff8117df90>] sys_open+0x20/0x30 <4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b <4>Code: 65 48 8b 04 25 c8 cb 00 00 83 a8 44 e0 ff ff 01 5b 41 5c c9 c3 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 9e a0 00 00 00 <48> 8b 3b e8 13 0c f7 ff 48 89 df e8 ab 3d ec e0 48 83 c4 08 31 <1>RIP [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs] <4> RSP <ffff88081458bb98> <4>CR2: 0000000000000000 I think this is ultimately due to a bug on the server. The client had previously found a directory dentry. It then later tried to do an atomic open on a new (regular file) dentry. The attributes it got back had the same filehandle as the previously found directory inode. It then tried to put the filp because it failed the aops tests for O_DIRECT opens, and oopsed here because the ctx was still NULL. Obviously the root cause here is a server issue, but we can take steps to mitigate this on the client. When nfs_fhget is called, we always know what type of inode it is. In the event that there's a broken or malicious server on the other end of the wire, the client can end up crashing because the wrong ops are set on it. Have nfs_find_actor check that the inode type is correct after checking the fileid. The fileid check should rarely ever match, so it should only rarely ever get to this check. In the case where we have a broken server, we may see two different inodes with the same i_ino, but the client should be able to cope with them without crashing. This should fix the oops reported here: https://bugzilla.redhat.com/show_bug.cgi?id=913660 Reported-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-02-27 17:28:20 -08:00
Linus Torvalds	d895cb1af1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle and ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...	2013-02-26 20:16:07 -08:00
Jeff Layton	ecf3d1f1aa	vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op The following set of operations on a NFS client and server will cause server# mkdir a client# cd a server# mv a a.bak client# sleep 30 # (or whatever the dir attrcache timeout is) client# stat . stat: cannot stat `.': Stale NFS file handle Obviously, we should not be getting an ESTALE error back there since the inode still exists on the server. The problem is that the lookup code will call d_revalidate on the dentry that "." refers to, because NFS has FS_REVAL_DOT set. nfs_lookup_revalidate will see that the parent directory has changed and will try to reverify the dentry by redoing a LOOKUP. That of course fails, so the lookup code returns ESTALE. The problem here is that d_revalidate is really a bad fit for this case. What we really want to know at this point is whether the inode is still good or not, but we don't really care what name it goes by or whether the dcache is still valid. Add a new d_op->d_weak_revalidate operation and have complete_walk call that instead of d_revalidate. The intent there is to allow for a "weaker" d_revalidate that just checks to see whether the inode is still good. This is also gives us an opportunity to kill off the FS_REVAL_DOT special casing. [AV: changed method name, added note in porting, fixed confusion re having it possibly called from RCU mode (it won't be)] Cc: NeilBrown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-26 02:46:09 -05:00
Weston Andros Adamson	a47970ff78	NFSv4.1: Hold reference to layout hdr in layoutget This fixes an oops where a LAYOUTGET is in still in the rpciod queue, but the requesting processes has been killed. Without this, killing the process does the final pnfs_put_layout_hdr() and sets NFS_I(inode)->layout to NULL while the LAYOUTGET rpc task still references it. Example oops: BUG: unable to handle kernel NULL pointer dereference at 0000000000000080 IP: [<ffffffffa01bd586>] pnfs_choose_layoutget_stateid+0x37/0xef [nfsv4] PGD 7365b067 PUD 7365d067 PMD 0 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: nfs_layout_nfsv41_files nfsv4 auth_rpcgss nfs lockd sunrpc ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ip6table_filter ip6_tables ppdev e1000 i2c_piix4 i2c_core shpchp parport_pc parport crc32c_intel aesni_intel xts aes_x86_64 lrw gf128mul ablk_helper cryptd mptspi scsi_transport_spi mptscsih mptbase floppy autofs4 CPU 0 Pid: 27, comm: kworker/0:1 Not tainted 3.8.0-dros_cthon2013+ #4 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform RIP: 0010:[<ffffffffa01bd586>] [<ffffffffa01bd586>] pnfs_choose_layoutget_stateid+0x37/0xef [nfsv4] RSP: 0018:ffff88007b0c1c88 EFLAGS: 00010246 RAX: ffff88006ed36678 RBX: 0000000000000000 RCX: 0000000ea877e3bc RDX: ffff88007a729da8 RSI: 0000000000000000 RDI: ffff88007a72b958 RBP: ffff88007b0c1ca8 R08: 0000000000000002 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007a72b958 R13: ffff88007a729da8 R14: 0000000000000000 R15: ffffffffa011077e FS: 0000000000000000(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000080 CR3: 00000000735f8000 CR4: 00000000001407f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/0:1 (pid: 27, threadinfo ffff88007b0c0000, task ffff88007c2fa0c0) Stack: ffff88006fc05388 ffff88007a72b908 ffff88007b240900 ffff88006fc05388 ffff88007b0c1cd8 ffffffffa01a2170 ffff88007b240900 ffff88007b240900 ffff88007b240970 ffffffffa011077e ffff88007b0c1ce8 ffffffffa0110791 Call Trace: [<ffffffffa01a2170>] nfs4_layoutget_prepare+0x7b/0x92 [nfsv4] [<ffffffffa011077e>] ? __rpc_atrun+0x15/0x15 [sunrpc] [<ffffffffa0110791>] rpc_prepare_task+0x13/0x15 [sunrpc] Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Weston Andros Adamson <dros@netapp.com> Cc: stable@kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-02-25 18:32:59 -08:00
Linus Torvalds	94f2f14234	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace and namespace infrastructure changes from Eric W Biederman: "This set of changes starts with a few small enhnacements to the user namespace. reboot support, allowing more arbitrary mappings, and support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the user namespace root. I do my best to document that if you care about limiting your unprivileged users that when you have the user namespace support enabled you will need to enable memory control groups. There is a minor bug fix to prevent overflowing the stack if someone creates way too many user namespaces. The bulk of the changes are a continuation of the kuid/kgid push down work through the filesystems. These changes make using uids and gids typesafe which ensures that these filesystems are safe to use when multiple user namespaces are in use. The filesystems converted for 3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The changes for these filesystems were a little more involved so I split the changes into smaller hopefully obviously correct changes. XFS is the only filesystem that remains. I was hoping I could get that in this release so that user namespace support would be enabled with an allyesconfig or an allmodconfig but it looks like the xfs changes need another couple of days before it they are ready." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits) cifs: Enable building with user namespaces enabled. cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t cifs: Convert struct cifs_sb_info to use kuids and kgids cifs: Modify struct smb_vol to use kuids and kgids cifs: Convert struct cifsFileInfo to use a kuid cifs: Convert struct cifs_fattr to use kuid and kgids cifs: Convert struct tcon_link to use a kuid. cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t cifs: Convert from a kuid before printing current_fsuid cifs: Use kuids and kgids SID to uid/gid mapping cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc cifs: Use BUILD_BUG_ON to validate uids and gids are the same size cifs: Override unmappable incoming uids and gids nfsd: Enable building with user namespaces enabled. nfsd: Properly compare and initialize kuids and kgids nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids nfsd: Modify nfsd4_cb_sec to use kuids and kgids nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion nfsd: Convert nfsxdr to use kuids and kgids nfsd: Convert nfs3xdr to use kuids and kgids ...	2013-02-25 16:00:49 -08:00
Benny Halevy	78f33277f9	pnfs: fix resend_to_mds for directio Pass the directio request on pageio_init to clean up the API. Percolate pg_dreq from original nfs_pageio_descriptor to the pnfs_{read,write}_done_resend_to_mds and use it on respective call to nfs_pageio_init_{read,write} on the newly created nfs_pageio_descriptor. Reproduced by command: mount -o vers=4.1 server:/ /mnt dd bs=128k count=8 if=/dev/zero of=/mnt/dd.out oflag=direct BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 IP: [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs] PGD 34786067 PUD 34794067 PMD 0 Oops: 0002 [#1] SMP Modules linked in: nfs_layout_nfsv41_files nfsv4 nfs nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc btrfs zlib_deflate libcrc32c ipv6 autofs4 CPU 1 Pid: 259, comm: kworker/1:2 Not tainted 3.8.0-rc6 #2 Bochs Bochs RIP: 0010:[<ffffffffa021a3a8>] [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs] RSP: 0018:ffff880038f8fa68 EFLAGS: 00010206 RAX: ffffffffa021a6a9 RBX: ffff880038f8fb48 RCX: 00000000000a0000 RDX: ffffffffa021e616 RSI: ffff8800385e9a40 RDI: 0000000000000028 RBP: ffff880038f8fa68 R08: ffffffff81ad6720 R09: ffff8800385e9510 R10: ffffffffa0228450 R11: ffff880038e87418 R12: ffff8800385e9a40 R13: ffff8800385e9a70 R14: ffff880038f8fb38 R15: ffffffffa0148878 FS: 0000000000000000(0000) GS:ffff88003e400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000028 CR3: 0000000034789000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/1:2 (pid: 259, threadinfo ffff880038f8e000, task ffff880038302480) Stack: ffff880038f8fa78 ffffffffa021a6bf ffff880038f8fa88 ffffffffa021bb82 ffff880038f8fae8 ffffffffa021f454 ffff880038f8fae8 ffffffff8109689d ffff880038f8fab8 ffffffff00000006 0000000000000000 ffff880038f8fb48 Call Trace: [<ffffffffa021a6bf>] nfs_direct_pgio_init+0x16/0x18 [nfs] [<ffffffffa021bb82>] nfs_pgheader_init+0x6a/0x6c [nfs] [<ffffffffa021f454>] nfs_generic_pg_writepages+0x51/0xf8 [nfs] [<ffffffff8109689d>] ? mark_held_locks+0x71/0x99 [<ffffffffa0148878>] ? rpc_release_resources_task+0x37/0x37 [sunrpc] [<ffffffffa021bc25>] nfs_pageio_doio+0x1a/0x43 [nfs] [<ffffffffa021be7c>] nfs_pageio_complete+0x16/0x2c [nfs] [<ffffffffa02608be>] pnfs_write_done_resend_to_mds+0x95/0xc5 [nfsv4] [<ffffffffa0148878>] ? rpc_release_resources_task+0x37/0x37 [sunrpc] [<ffffffffa028e27f>] filelayout_reset_write+0x8c/0x99 [nfs_layout_nfsv41_files] [<ffffffffa028e5f9>] filelayout_write_done_cb+0x4d/0xc1 [nfs_layout_nfsv41_files] [<ffffffffa024587a>] nfs4_write_done+0x36/0x49 [nfsv4] [<ffffffffa021f996>] nfs_writeback_done+0x53/0x1cc [nfs] [<ffffffffa021fb1d>] nfs_writeback_done_common+0xe/0x10 [nfs] [<ffffffffa028e03d>] filelayout_write_call_done+0x28/0x2a [nfs_layout_nfsv41_files] [<ffffffffa01488a1>] rpc_exit_task+0x29/0x87 [sunrpc] [<ffffffffa014a0c9>] __rpc_execute+0x11d/0x3cc [sunrpc] [<ffffffff810969dc>] ? trace_hardirqs_on_caller+0x117/0x173 [<ffffffffa014a39f>] rpc_async_schedule+0x27/0x32 [sunrpc] [<ffffffffa014a378>] ? __rpc_execute+0x3cc/0x3cc [sunrpc] [<ffffffff8105f8c1>] process_one_work+0x226/0x422 [<ffffffff8105f7f4>] ? process_one_work+0x159/0x422 [<ffffffff81094757>] ? lock_acquired+0x210/0x249 [<ffffffffa014a378>] ? __rpc_execute+0x3cc/0x3cc [sunrpc] [<ffffffff810600d8>] worker_thread+0x126/0x1c4 [<ffffffff8105ffb2>] ? manage_workers+0x240/0x240 [<ffffffff81064ef8>] kthread+0xb1/0xb9 [<ffffffff81064e47>] ? __kthread_parkme+0x65/0x65 [<ffffffff815206ec>] ret_from_fork+0x7c/0xb0 [<ffffffff81064e47>] ? __kthread_parkme+0x65/0x65 Code: 00 83 38 02 74 12 48 81 4b 50 00 00 01 00 c7 83 60 07 00 00 01 00 00 00 48 89 df e8 55 fe ff ff 5b 41 5c 5d c3 66 90 55 48 89 e5 <f0> ff 07 5d c3 55 48 89 e5 f0 ff 0f 0f 94 c0 84 c0 0f 95 c0 0f RIP [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs] RSP <ffff880038f8fa68> CR2: 0000000000000028 Signed-off-by: Benny Halevy <bhalevy@tonian.com> Cc: stable@kernel.org [>= 3.6] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-02-24 10:07:36 -05:00
Al Viro	496ad9aa8e	new helper: file_inode(file) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-22 23:31:31 -05:00
Trond Myklebust	5a7a613a47	NFS: Don't allow NFS silly-renamed files to be deleted, no signal Commit `73ca100` broke the code that prevents the client from deleting a silly renamed dentry. This affected "delete on last close" semantics as after that commit, nothing prevented removal of silly-renamed files. As a result, a process holding a file open could easily get an ESTALE on the file in a directory where some other process issued 'rm -rf some_dir_containing_the_file' twice. Before the commit, any attempt at unlinking silly renamed files would fail inside may_delete() with -EBUSY because of the DCACHE_NFSFS_RENAMED flag. The following testcase demonstrates the problem: tail -f /nfsmnt/dir/file & rm -rf /nfsmnt/dir rm -rf /nfsmnt/dir # second removal does not fail, 'tail' process receives ESTALE The problem with the above commit is that it unhashes the old and new dentries from the lookup path, even in the normal case when a signal is not encountered and it would have been safe to call d_move. Unfortunately the old dentry has the special DCACHE_NFSFS_RENAMED flag set on it. Unhashing has the side-effect that future lookups call d_alloc(), allocating a new dentry without the special flag for any silly-renamed files. As a result, subsequent calls to unlink silly renamed files do not fail but allow the removal to go through. This will result in ESTALE errors for any other process doing operations on the file. To fix this, go back to using d_move on success. For the signal case, it's unclear what we may safely do beyond d_drop. Reported-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org	2013-02-22 14:55:34 -05:00
fanchaoting	5a12cca697	umount oops when remove blocklayoutdriver first now pnfs client uses block layout, maybe we can remove blocklayoutdriver first. if we umount later, it can cause oops in unset_pnfs_layoutdriver. because nfss->pnfs_curr_ld->clear_layoutdriver is invalid. reproduce it: modprobe blocklayoutdriver mount -t nfs4 -o minorversion=1 pnfsip:/ /mnt/ rmmod blocklayoutdriver umount /mnt then you can see following CPU 0 Pid: 17023, comm: umount.nfs4 Tainted: GF O 3.7.0-rc6-pnfs #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform RIP: 0010:[<ffffffffa04cfe6d>] [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4] RSP: 0018:ffff8800022d9e48 EFLAGS: 00010286 RAX: ffffffffa04a1b00 RBX: ffff88000b013800 RCX: 0000000000000001 RDX: ffffffff81ae8ee0 RSI: ffff880001ee94b8 RDI: ffff88000b013800 RBP: ffff8800022d9e58 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff880001ee9400 R13: ffff8800105978c0 R14: 00007fff25846c08 R15: 0000000001bba550 FS: 00007f45ae7f0700(0000) GS:ffff880012c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffffa04a1b38 CR3: 0000000002c0c000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process umount.nfs4 (pid: 17023, threadinfo ffff8800022d8000, task ffff880006e48aa0) Stack: ffff8800105978c0 ffff88000b013800 ffff8800022d9e78 ffffffffa04cd0ce ffff8800022d9e78 ffff88000b013800 ffff8800022d9ea8 ffffffffa04755a7 ffff8800022d9ea8 ffff880002f96400 ffff88000b013800 ffff880002f96400 Call Trace: [<ffffffffa04cd0ce>] nfs4_destroy_server+0x1e/0x30 [nfsv4] [<ffffffffa04755a7>] nfs_free_server+0xb7/0x150 [nfs] [<ffffffffa047d4d5>] nfs_kill_super+0x35/0x40 [nfs] [<ffffffff81178d35>] deactivate_locked_super+0x45/0x70 [<ffffffff8117986a>] deactivate_super+0x4a/0x70 [<ffffffff81193ee2>] mntput_no_expire+0xd2/0x130 [<ffffffff81194d62>] sys_umount+0x72/0xe0 [<ffffffff8154af59>] system_call_fastpath+0x16/0x1b Code: 06 e1 b8 ea ff ff ff eb 9e 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 8b 87 80 03 00 00 48 89 fb 48 85 c0 74 29 <48> 8b 40 38 48 85 c0 74 02 ff d0 48 8b 03 3e ff 48 04 0f 94 c2 RIP [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4] RSP <ffff8800022d9e48> CR2: ffffffffa04a1b38 ---[ end trace 29f75aaedda058bf ]--- Signed-off-by: fanchaoting<fanchaoting@cn.fujitsu.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org	2013-02-17 15:40:15 -05:00
Tim Gardner	96aa1549af	nfs: remove kfree() redundant null checks smatch analysis: fs/nfs/getroot.c:130 nfs_get_root() info: redundant null check on name calling kfree() fs/nfs/unlink.c:272 nfs_async_unlink() info: redundant null check on devname_garbage calling kfree() Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-02-17 15:27:21 -05:00
Weston Andros Adamson	085b7a45c6	NFSv4.1: Don't decode skipped layoutgets layoutget's prepare hook can call rpc_exit with status = NFS4_OK (0). Because of this, nfs4_proc_layoutget can't depend on a 0 status to mean that the RPC was successfully sent, received and parsed. To fix this, use the result's len member to see if parsing took place. This fixes the following OOPS -- calling xdr_init_decode() with a buffer length 0 doesn't set the stream's 'p' member and ends up using uninitialized memory in filelayout_decode_layout. BUG: unable to handle kernel paging request at 0000000000008050 IP: [<ffffffff81282e78>] memcpy+0x18/0x120 PGD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/0000:02:01.0/irq CPU 1 Modules linked in: nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl autofs4 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mirror dm_region_hash dm_log dm_mod ppdev parport_pc parport snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 microcode vmware_balloon i2c_piix4 i2c_core sg shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptspi mptscsih mptbase scsi_transport_spi [last unloaded: speedstep_lib] Pid: 1665, comm: flush-0:22 Not tainted 2.6.32-356-test-2 #2 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform RIP: 0010:[<ffffffff81282e78>] [<ffffffff81282e78>] memcpy+0x18/0x120 RSP: 0018:ffff88003dfab588 EFLAGS: 00010206 RAX: ffff88003dc42000 RBX: ffff88003dfab610 RCX: 0000000000000009 RDX: 000000003f807ff0 RSI: 0000000000008050 RDI: ffff88003dc42000 RBP: ffff88003dfab5b0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000024 R13: ffff88003dc42000 R14: ffff88003f808030 R15: ffff88003dfab6a0 FS: 0000000000000000(0000) GS:ffff880003420000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000008050 CR3: 000000003bc92000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process flush-0:22 (pid: 1665, threadinfo ffff88003dfaa000, task ffff880037f77540) Stack: ffffffffa0398ac1 ffff8800397c5940 ffff88003dfab610 ffff88003dfab6a0 <d> ffff88003dfab5d0 ffff88003dfab680 ffffffffa01c150b ffffea0000d82e70 <d> 000000508116713b 0000000000000000 0000000000000000 0000000000000000 Call Trace: [<ffffffffa0398ac1>] ? xdr_inline_decode+0xb1/0x120 [sunrpc] [<ffffffffa01c150b>] filelayout_decode_layout+0xeb/0x350 [nfs_layout_nfsv41_files] [<ffffffffa01c17fc>] filelayout_alloc_lseg+0x8c/0x3c0 [nfs_layout_nfsv41_files] [<ffffffff8150e6ce>] ? __wait_on_bit+0x7e/0x90 Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org	2013-02-17 15:24:16 -05:00
Stanislav Kinsbursky	21cd1254d3	SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function Passing this pointer is redundant since it's stored on cache_detail structure, which is also passed to sunrpc_cache_pipe_upcall () function. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2013-02-15 10:43:47 -05:00
Stanislav Kinsbursky	73fb847a44	SUNRPC: introduce cache_detail->cache_request callback This callback will allow to simplify upcalls in further patches in this series. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2013-02-15 10:43:45 -05:00
Stanislav Kinsbursky	462b8f6bf1	NFS: simplify and clean cache library This is a cleanup patch. Such helpers like nfs_cache_init() and nfs_cache_destroy() are redundant, because they are just a wrappers around sunrpc_init_cache_detail() and sunrpc_destroy_cache_detail() respectively. So let's remove them completely and move corresponding logic to nfs_cache_register_net() and nfs_cache_unregister_net() respectively (since they are called together anyway). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2013-02-15 10:43:36 -05:00
Stanislav Kinsbursky	483479c26a	NFS: use SUNRPC cache creation and destruction helper for DNS cache This cache was the first containerized and doesn't use net-aware cache creation and destruction helpers. This is a cleanup patch which just makes code looks clearer and reduce amount of lines of code. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2013-02-15 10:43:12 -05:00
Trond Myklebust	fd9a8d7160	NFSv4.1: Fix bulk recall and destroy of layouts The current code in pnfs_destroy_all_layouts() assumes that removing the layout from the server->layouts list is sufficient to make it invisible to other processes. This ignores the fact that most users access the layout through the nfs_inode->layout... There is further breakage due to lack of reference counting of the layouts, meaning that the whole thing Oopses at the drop of a hat. The code in initiate_bulk_draining() is almost correct, and can be used as a model for pnfs_destroy_all_layouts(), so move that code to pnfs.c, and refactor the code to allow us to choose between a single filesystem bulk recall, and a recall of all layouts. Also note that initiate_bulk_draining() currently calls iput() while holding locks. Fix that too. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org	2013-02-14 13:22:50 -05:00
Eric W. Biederman	9ff593c473	nfs: kuid and kgid conversions for nfs/inode.c - Use uid_eq and gid_eq when comparing kuids and kgids. - Use make_kuid(&init_user_ns, -2) and make_kgid(&init_user_ns, -2) as the initial uid and gid on nfs inodes, instead of using the typeunsafe value of -2. Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-02-13 06:15:33 -08:00
Eric W. Biederman	e5782076e7	nfs: Convert nfs4xdr to use kuids and kgids When reading uids and gids off the wire convert them to kuids and kgids. When putting kuids and kgids onto the wire first convert them to uids and gids the other side will understand. When printing kuids and kgids convert them to values in the initial user namespace then use normal printf formats. Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-02-13 06:15:32 -08:00
Eric W. Biederman	57a38dae2a	nfs: Convert nfs3xdr to use kuids and kgids When reading uids and gids off the wire convert them to kuids and kgids. When putting kuids and kgids onto the wire first convert them to uids and gids the other side will understand. Add an additional failure mode incoming for uids or gids that are invalid. Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-02-13 06:15:31 -08:00
Eric W. Biederman	cfa0898d4f	nfs: Convert nfs2xdr to use kuids and kgids When reading uids and gids off the wire convert them to kuids and kgids. When putting kuids and kgids onto the wire first convert them to uids and gids the other side will understand. Add an additional failure mode for incoming uid or gids that are invalid. Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-02-13 06:15:30 -08:00
Eric W. Biederman	9f309c86cf	nfs: Convert idmap to use kuids and kgids Convert nfs_map_name_to_uid to return a kuid_t value. Convert nfs_map_name_to_gid to return a kgid_t value. Convert nfs_map_uid_to_name to take a kuid_t paramater. Convert nfs_map_gid_to_name to take a kgid_t paramater. Tweak nfs_fattr_map_owner_to_name to use a kuid_t intermediate value. Tweak nfs_fattr_map_group_to_name to use a kgid_t intermediate value. Which makes these functions properly handle kuids and kgids, including erroring of the generated kuid or kgid is invalid. Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-02-13 06:15:29 -08:00
Eric W. Biederman	4e963d4f3e	nfs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring alloc Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-02-13 06:15:27 -08:00
Trond Myklebust	c8da19b986	NFSv4.1: Fix an ABBA locking issue with session and state serialisation Ensure that if nfs_wait_on_sequence() causes our rpc task to wait for an NFSv4 state serialisation lock, then we also drop the session slot. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org	2013-02-11 19:04:25 -05:00
Trond Myklebust	c21443c2c7	NFSv4: Fix a reboot recovery race when opening a file If the server reboots after it has replied to our OPEN, but before we call nfs4_opendata_to_nfs4_state(), then the reboot recovery thread will not see a stateid for this open, and so will fail to recover it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-02-11 15:33:14 -05:00
Trond Myklebust	65b62a29f7	NFSv4: Ensure delegation recall and byte range lock removal don't conflict Add a mutex to the struct nfs4_state_owner to ensure that delegation recall doesn't conflict with byte range lock removal. Note that we nest the new mutex _outside_ the state manager reclaim protection (nfsi->rwsem) in order to avoid deadlocks. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-02-11 15:33:13 -05:00
Trond Myklebust	37380e4264	NFSv4: Fix up the return values of nfs4_open_delegation_recall Adjust the return values so that they return EAGAIN to the caller in cases where we might want to retry the delegation recall after the state recovery has run. Note that we can't wait and retry in this routine, because the caller may be the state manager thread. If delegation recall fails due to a session or reboot related issue, also ensure that we mark the stateid as delegated so that nfs_delegation_claim_opens can find it again later. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-02-11 15:33:13 -05:00
Trond Myklebust	d25be546a8	NFSv4.1: Don't lose locks when a server reboots during delegation return If the server reboots while we are converting a delegation into OPEN/LOCK stateids as part of a delegation return, the current code will simply exit with an error. This causes us to lose both delegation state and locking state (i.e. locking atomicity). Deal with this by exposing the delegation stateid during delegation return, so that we can recover the delegation, and then resume open/lock recovery. Note that not having to hold the nfs_inode->rwsem across the calls to nfs_delegation_claim_opens() also fixes a deadlock against the NFSv4.1 reboot recovery code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-02-11 15:33:12 -05:00
Trond Myklebust	9a99af494b	NFSv4.1: Prevent deadlocks between state recovery and file locking We currently have a deadlock in which the state recovery thread ends up blocking due to one of the locks which it is trying to recover holding the nfs_inode->rwsem. The situation is as follows: the state recovery thread is scheduled in order to recover from a reboot. It immediately drains the session, forcing all ordinary NFSv4.1 calls to nfs41_setup_sequence() to be put to sleep. This includes the file locking process that holds the nfs_inode->rwsem. When the thread gets to nfs4_reclaim_locks(), it tries to grab a write lock on nfs_inode->rwsem, and boom... Fix is to have the lock drop the nfs_inode->rwsem while it is doing RPC calls. We use a sequence lock in order to signal to the locking process whether or not a state recovery thread has run on that inode, in which case it should retry the lock. Reported-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-02-11 15:33:12 -05:00
Trond Myklebust	c137afabe3	NFSv4: Allow the state manager to mark an open_owner as being recovered This patch adds a seqcount_t lock for use by the state manager to signal that an open owner has been recovered. This mechanism will be used by the delegation, open and byte range lock code in order to figure out if they need to replay requests due to collisions with lock recovery. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-02-11 15:33:11 -05:00
Jeff Layton	5976687a2b	sunrpc: move address copy/cmp/convert routines and prototypes from clnt.h to addr.h These routines are used by server and client code, so having them in a separate header would be best. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2013-02-05 09:41:14 -05:00
Trond Myklebust	322b2b9032	Revert "NFS: add nfs_sb_deactive_async to avoid deadlock" This reverts commit `324d003b0c`. The deadlock turned out to be caused by a workqueue limitation that has now been worked around in the RPC code (see comment in rpc_free_task). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-02-01 10:13:48 -05:00
Trond Myklebust	c489ee290b	NFSv4.1: Handle NFS4ERR_DELAY when resetting the NFSv4.1 session NFS4ERR_DELAY is a legal reply when we call DESTROY_SESSION. It usually means that the server is busy handling an unfinished RPC request. Just sleep for a second and then retry. We also need to be able to handle the NFS4ERR_BACK_CHAN_BUSY return value. If the NFS server has outstanding callbacks, we just want to similarly sleep & retry. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org	2013-01-30 17:45:15 -05:00
Trond Myklebust	ab22541782	NFS: Don't silently fail setattr() requests on mountpoints Ensure that any setattr and getattr requests for junctions and/or mountpoints are sent to the server. Ever since commit `0ec26fd069` (vfs: automount should ignore LOOKUP_FOLLOW), we have silently dropped any setattr requests to a server-side mountpoint. For referrals, we have silently dropped both getattr and setattr requests. This patch restores the original behaviour for setattr on mountpoints, and tries to do the same for referrals, provided that we have a filehandle... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org	2013-01-30 17:41:04 -05:00
Trond Myklebust	65436ec0c8	NFSv4.1: Ensure that nfs41_walk_client_list() does start lease recovery We do need to start the lease recovery thread prior to waiting for the client initialisation to complete in NFSv4.1. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Ben Greear <greearb@candelatech.com> Cc: stable@vger.kernel.org [>=3.7]	2013-01-27 15:51:41 -05:00
Trond Myklebust	202c312dba	NFSv4: Fix NFSv4 trunking discovery If walking the list in nfs4[01]_walk_client_list fails, then the most likely explanation is that the server dropped the clientid before we actually managed to confirm it. As long as our nfs_client is the very last one in the list to be tested, the caller can be assured that this is the case when the final return value is NFS4ERR_STALE_CLIENTID. Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org [>=3.7] Tested-by: Ben Greear <greearb@candelatech.com>	2013-01-27 15:51:28 -05:00
Trond Myklebust	4ae19c2dd7	NFSv4: Fix NFSv4 reference counting for trunked sessions The reference counting in nfs4_init_client assumes wongly that it is safe for nfs4_discover_server_trunking() to return a pointer to a nfs_client prior to bumping the reference count. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Ben Greear <greearb@candelatech.com> Cc: stable@vger.kernel.org [>=3.7]	2013-01-27 15:51:15 -05:00
Trond Myklebust	dee972b967	NFS: Fix error reporting in nfs_xdev_mount Currently, nfs_xdev_mount converts all errors from clone_server() to ENOMEM, which can then leak to userspace (for instance to 'mount'). Fix that. Also ensure that if nfs_fs_mount_common() returns an error, we don't dprintk(0)... The regression originated in commit `3d176e3fe4` (NFS: Use nfs_fs_mount_common() for xdev mounts) Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>= 3.5]	2013-01-27 15:51:15 -05:00
Nickolai Zeldovich	ecf0eb9edb	nfs: avoid dereferencing null pointer in initiate_bulk_draining Fix an inverted null pointer check in initiate_bulk_draining(). Signed-off-by: Nickolai Zeldovich <nickolai@csail.mit.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>= 3.7]	2013-01-05 14:26:51 -05:00
Trond Myklebust	6db6dd7d3f	NFS: Ensure that we free the rpc_task after read and write cleanups are done This patch ensures that we free the rpc_task after the cleanup callbacks are done in order to avoid a deadlock problem that can be triggered if the callback needs to wait for another workqueue item to complete. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Weston Andros Adamson <dros@netapp.com> Cc: Tejun Heo <tj@kernel.org> Cc: Bruce Fields <bfields@fieldses.org> Cc: stable@vger.kernel.org [>= 3.5]	2013-01-04 12:59:10 -05:00
Xi Wang	e25fbe380c	nfs: fix null checking in nfs_get_option_str() The following null pointer check is broken. option = match_strdup(args); return !option; The pointer `option' must be non-null, and thus `!option' is always false. Use `!option' instead. The bug was introduced in commit `c5cb09b6f8` ("Cleanup: Factor out some cut-and-paste code."). Signed-off-by: Xi Wang <xi.wang@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-01-04 10:54:43 -05:00
Yanchuan Nian	39e88fcfb1	pnfs: Increase the refcount when LAYOUTGET fails the first time The layout will be set unusable if LAYOUTGET fails. Is it reasonable to increase the refcount iff LAYOUTGET fails the first time? Signed-off-by: Yanchuan Nian <ycnian@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>= 3.7]	2013-01-04 10:50:42 -05:00
Weston Andros Adamson	f8d9a897d4	NFS: Fix access to suid/sgid executables nfs_open_permission_mask() should only check MAY_EXEC for files that are opened with __FMODE_EXEC. Also fix NFSv4 access-in-open path in a similar way -- openflags must be used because fmode will not always have FMODE_EXEC set. This patch fixes https://bugzilla.kernel.org/show_bug.cgi?id=49101 Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org	2013-01-03 17:06:27 -05:00
Trond Myklebust	c4271c6e37	NFS: Kill fscache warnings when mounting without -ofsc The fscache code will currently bleat a "non-unique superblock keys" warning even if the user is mounting without the 'fsc' option. There should be no reason to even initialise the superblock cache cookie unless we're planning on using fscache for something, so ensure that we check for the NFS_OPTION_FSCACHE flag before calling into the fscache code. Reported-by: Paweł Sikora <pawel.sikora@agmk.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: David Howells <dhowells@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-21 08:32:09 -08:00
David Howells	c129c29347	NFS: Provide stub nfs_fscache_wait_on_invalidate() for when CONFIG_NFS_FSCACHE=n Provide a stub nfs_fscache_wait_on_invalidate() function for when CONFIG_NFS_FSCACHE=n lest the following error appear: fs/nfs/inode.c: In function 'nfs_invalidate_mapping': fs/nfs/inode.c:887:2: error: implicit declaration of function 'nfs_fscache_wait_on_invalidate' [-Werror=implicit-function-declaration] cc1: some warnings being treated as errors Reported-by: kbuild test robot <fengguang.wu@intel.com> Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-21 08:06:48 -08:00
David Howells	a4ff146881	NFS4: Open files for fscaching nfs4_file_open() should open files for fscaching. Signed-off-by: David Howells <dhowells@redhat.com>	2012-12-20 22:19:42 +00:00
David Howells	8c209ce721	NFS: nfs_migrate_page() does not wait for FS-Cache to finish with a page nfs_migrate_page() does not wait for FS-Cache to finish with a page, probably leading to the following bad-page-state: BUG: Bad page state in process python-bin pfn:17d39b page:ffffea00053649e8 flags:004000000000100c count:0 mapcount:0 mapping:(null) index:38686 (Tainted: G B ---------------- ) Pid: 31053, comm: python-bin Tainted: G B ---------------- 2.6.32-71.24.1.el6.x86_64 #1 Call Trace: [<ffffffff8111bfe7>] bad_page+0x107/0x160 [<ffffffff8111ee69>] free_hot_cold_page+0x1c9/0x220 [<ffffffff8111ef19>] __pagevec_free+0x59/0xb0 [<ffffffff8104b988>] ? flush_tlb_others_ipi+0x128/0x130 [<ffffffff8112230c>] release_pages+0x21c/0x250 [<ffffffff8115b92a>] ? remove_migration_pte+0x28a/0x2b0 [<ffffffff8115f3f8>] ? mem_cgroup_get_reclaim_stat_from_page+0x18/0x70 [<ffffffff81122687>] ____pagevec_lru_add+0x167/0x180 [<ffffffff811226f8>] __lru_cache_add+0x58/0x70 [<ffffffff81122731>] lru_cache_add_lru+0x21/0x40 [<ffffffff81123f49>] putback_lru_page+0x69/0x100 [<ffffffff8115c0bd>] migrate_pages+0x13d/0x5d0 [<ffffffff81122687>] ? ____pagevec_lru_add+0x167/0x180 [<ffffffff81152ab0>] ? compaction_alloc+0x0/0x370 [<ffffffff8115255c>] compact_zone+0x4cc/0x600 [<ffffffff8111cfac>] ? get_page_from_freelist+0x15c/0x820 [<ffffffff810672f4>] ? check_preempt_wakeup+0x1c4/0x3c0 [<ffffffff8115290e>] compact_zone_order+0x7e/0xb0 [<ffffffff81152a49>] try_to_compact_pages+0x109/0x170 [<ffffffff8111e94d>] __alloc_pages_nodemask+0x5ed/0x850 [<ffffffff814c9136>] ? thread_return+0x4e/0x778 [<ffffffff81150d43>] alloc_pages_vma+0x93/0x150 [<ffffffff81167ea5>] do_huge_pmd_anonymous_page+0x135/0x340 [<ffffffff814cb6f6>] ? rwsem_down_read_failed+0x26/0x30 [<ffffffff81136755>] handle_mm_fault+0x245/0x2b0 [<ffffffff814ce383>] do_page_fault+0x123/0x3a0 [<ffffffff814cbdf5>] page_fault+0x25/0x30 nfs_migrate_page() calls nfs_fscache_release_page() which doesn't actually wait - even if __GFP_WAIT is set. The reason that doesn't wait is that fscache_maybe_release_page() might deadlock the allocator as the work threads writing to the cache may all end up sleeping on memory allocation. However, I wonder if that is actually a problem. There are a number of things I can do to deal with this: (1) Make nfs_migrate_page() wait. (2) Make fscache_maybe_release_page() honour the __GFP_WAIT flag. (3) Set a timeout around the wait. (4) Make nfs_migrate_page() return an error if the page is still busy. For the moment, I'll select (2) and (4). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeff Layton <jlayton@redhat.com>	2012-12-20 22:12:03 +00:00
David Howells	de242c0b8b	NFS: Use FS-Cache invalidation Use the new FS-Cache invalidation facility from NFS to deal with foreign changes being detected on the server rather than attempting to retire the old cookie and get a new one. The problem with the old method was that NFS did not wait for all outstanding storage and retrieval ops on the cache to complete. There was no automatic wait between the calls to ->readpages() and calls to invalidate_inode_pages2() as the latter can only wait on locked pages that have been added to the pagecache (which they haven't yet on entry to ->readpages()). This was leading to oopses like the one below when an outstanding read got cut off from its cookie by a premature release. BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 IP: [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache] PGD 15889067 PUD 15890067 PMD 0 Oops: 0000 [#1] SMP CPU 0 Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc Pid: 4544, comm: tar Not tainted 3.1.0-rc4-fsdevel+ #1064 /DG965RY RIP: 0010:[<ffffffffa0075118>] [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache] RSP: 0018:ffff8800158799e8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8800070d41e0 RCX: ffff8800083dc1b0 RDX: 0000000000000000 RSI: ffff880015879960 RDI: ffff88003e627b90 RBP: ffff880015879a28 R08: 0000000000000002 R09: 0000000000000002 R10: 0000000000000001 R11: ffff880015879950 R12: ffff880015879aa4 R13: 0000000000000000 R14: ffff8800083dc158 R15: ffff880015879be8 FS: 00007f671e9d87c0(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000000000a8 CR3: 000000001587f000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process tar (pid: 4544, threadinfo ffff880015878000, task ffff880015875040) Stack: ffffffffa00b1759 ffff8800070dc158 ffff8800000213da ffff88002a286508 ffff880015879aa4 ffff880015879be8 0000000000000001 ffff88002a2866e8 ffff880015879a88 ffffffffa00b20be 00000000000200da ffff880015875040 Call Trace: [<ffffffffa00b1759>] ? nfs_fscache_wait_bit+0xd/0xd [nfs] [<ffffffffa00b20be>] __nfs_readpages_from_fscache+0x7e/0x13f [nfs] [<ffffffff81095fe7>] ? __alloc_pages_nodemask+0x156/0x662 [<ffffffffa0098763>] nfs_readpages+0xee/0x187 [nfs] [<ffffffff81098a5e>] __do_page_cache_readahead+0x1be/0x267 [<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267 [<ffffffff81098d7b>] ra_submit+0x1c/0x20 [<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a [<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a [<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e [<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs] [<ffffffff810c22c4>] do_sync_read+0xba/0xfa [<ffffffff810a62c9>] ? might_fault+0x4e/0x9e [<ffffffff81177a47>] ? security_file_permission+0x7b/0x84 [<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8 [<ffffffff810c29a4>] vfs_read+0xaa/0x13a [<ffffffff810c2a79>] sys_read+0x45/0x6c [<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b Reported-by: Mark Moseley <moseleymark@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>	2012-12-20 22:06:33 +00:00
Linus Torvalds	2d4dce0070	NFS client updates for Linux 3.8 Features include: - Full audit of BUG_ON asserts in the NFS, SUNRPC and lockd client code Remove altogether where possible, and replace with WARN_ON_ONCE and appropriate error returns where not. - NFSv4.1 client adds session dynamic slot table management. There is matching server side code that has been submitted to Bruce for consideration. Together, this code allows the server to dynamically manage the amount of memory it allocates to the duplicate request cache for each client. It will constantly resize those caches to reserve more memory for clients that are hot while shrinking caches for those that are quiescent. In addition, there are assorted bugfixes for the generic NFS write code, fixes to deal with the drop_nlink() warnings, and yet another fix for NFSv4 getacl. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQz8VNAAoJEGcL54qWCgDy7iYQAKbr7AAZOcZPoJigzakZ7nMi UKYulGbFais2Llwzw1e+U5RzmorTSbvl7/m8eS7pDf3auYw/t4xtXjKSGZUNxaE1 q2hNKgVwodMbScYdkZXvKKNckS93oPDttrmEyzjKanqey+1E3HSklvOvikN0ihte B/G1OtA7Qpcr92bPrLK+PjDqarCBUI4g42dYbZOBrZnXKTRtzUqsuKPu7WjpPiof SHE5b1Emt7oUxgcijWGcvYCQ8voZdeSCnSksH3DgvORlutwdhUD3Yg8KyEfFZdyc 6C59ozXRLiHkV3c+jMhJzDkQXR9bYHrnK3tlq4G8v1NdJxRktQliZeqecRvip/Wz rAxfE6fnPDEvKsCpZb3+5yTAt+aZwzEhRg1fFC9qfGOp+oRa+CWw5kJCyIFHwJu6 4LOlubQAf6rnIsja1L8D0FdeqHUa1+wy61On5kgVYS5JGtoBsQHpa1zTwdOxPmsR 2XTMYGNCEabvpKpO9+5xQbUzkFExPTesw47ygXiUuDT/snaarpV3/f05SSCaWZkX R8QsGEOXTIh8/S+UxARGpc7H6xi1PdBM5nBziHVzjEdHgZRF4wGFaJe2CirMjSJO Df5GEd5Z/8VCGWs+1w7HD5EaQ2n0wbt5daCE80Y2jRBr7NMYnY+ciF8/GktLpHsn Zq1bXGOdr3UZ92LXuzL9 =G3N9 -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Features include: - Full audit of BUG_ON asserts in the NFS, SUNRPC and lockd client code. Remove altogether where possible, and replace with WARN_ON_ONCE and appropriate error returns where not. - NFSv4.1 client adds session dynamic slot table management. There is matching server side code that has been submitted to Bruce for consideration. Together, this code allows the server to dynamically manage the amount of memory it allocates to the duplicate request cache for each client. It will constantly resize those caches to reserve more memory for clients that are hot while shrinking caches for those that are quiescent. In addition, there are assorted bugfixes for the generic NFS write code, fixes to deal with the drop_nlink() warnings, and yet another fix for NFSv4 getacl." * tag 'nfs-for-3.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (106 commits) SUNRPC: continue run over clients list on PipeFS event instead of break NFS: Don't use SetPageError in the NFS writeback code SUNRPC: variable 'svsk' is unused in function bc_send_request SUNRPC: Handle ECONNREFUSED in xs_local_setup_socket NFSv4.1: Deal effectively with interrupted RPC calls. NFSv4.1: Move the RPC timestamp out of the slot. NFSv4.1: Try to deal with NFS4ERR_SEQ_MISORDERED. NFS: nfs_lookup_revalidate should not trust an inode with i_nlink == 0 NFS: Fix calls to drop_nlink() NFS: Ensure that we always drop inodes that have been marked as stale nfs: Remove unused list nfs4_clientid_list nfs: Remove duplicate function declaration in internal.h NFS: avoid NULL dereference in nfs_destroy_server SUNRPC handle EKEYEXPIRED in call_refreshresult SUNRPC set gss gc_expiry to full lifetime nfs: fix page dirtying in NFS DIO read codepath nfs: don't zero out the rest of the page if we hit the EOF on a DIO READ NFSv4.1: Be conservative about the client highest slotid NFSv4.1: Handle NFS4ERR_BADSLOT errors correctly nfs: don't extend writes to cover entire page if pagecache is invalid ...	2012-12-18 09:36:34 -08:00
Andrew Morton	965c8e59cf	lseek: the "whence" argument is called "whence" But the kernel decided to call it "origin" instead. Fix most of the sites. Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:12 -08:00
Linus Torvalds	2a74dbb9a8	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "A quiet cycle for the security subsystem with just a few maintenance updates." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: Smack: create a sysfs mount point for smackfs Smack: use select not depends in Kconfig Yama: remove locking from delete path Yama: add RCU to drop read locking drivers/char/tpm: remove tasklet and cleanup KEYS: Use keyring_alloc() to create special keyrings KEYS: Reduce initial permissions on keys KEYS: Make the session and process keyrings per-thread seccomp: Make syscall skipping and nr changes more consistent key: Fix resource leak keys: Fix unreachable code KEYS: Add payload preparsing opportunity prior to key instantiate or update	2012-12-16 15:40:50 -08:00
Trond Myklebust	ada8e20d04	NFS: Don't use SetPageError in the NFS writeback code The writeback code is already capable of passing errors back to user space by means of the open_context->error. In the case of ENOSPC, Neil Brown is reporting seeing 2 errors being returned. Neil writes: "e.g. if /mnt2/ if an nfs mounted filesystem that has no space then strace dd if=/dev/zero conv=fsync >> /mnt2/afile count=1 reported Input/output error and the relevant parts of the strace output are: write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512 fsync(1) = -1 EIO (Input/output error) close(1) = -1 ENOSPC (No space left on device)" Neil then shows that the duplication of error messages appears to be due to the use of the PageError() mechanism, which causes filemap_fdatawait_range to return the extra EIO. The regression was introduced by commit `7b281ee026` (NFS: fsync() must exit with an error if page writeback failed). Fix this by removing the call to SetPageError(), and just relying on open_context->error reporting the ENOSPC back to fsync(). Reported-by: Neil Brown <neilb@suse.de> Tested-by: Neil Brown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [3.6+]	2012-12-15 17:12:14 -05:00
Trond Myklebust	ac20d163fc	NFSv4.1: Deal effectively with interrupted RPC calls. If an RPC call is interrupted, assume that the server hasn't processed the RPC call so that the next time we use the slot, we know that if we get a NFS4ERR_SEQ_MISORDERED or NFS4ERR_SEQ_FALSE_RETRY, we just have to bump the sequence number. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-15 15:39:59 -05:00
Trond Myklebust	8e63b6a8ad	NFSv4.1: Move the RPC timestamp out of the slot. Shave a few bytes off the slot table size by moving the RPC timestamp into the sequence results. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-15 15:21:52 -05:00
Trond Myklebust	e879444084	NFSv4.1: Try to deal with NFS4ERR_SEQ_MISORDERED. If the server returns NFS4ERR_SEQ_MISORDERED, it could be a sign that the slot was retired at some point. Retry the attempt after reinitialising the slot sequence number to 1. Also add a handler for NFS4ERR_SEQ_FALSE_RETRY. Just bump the slot sequence number and retry... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-15 14:49:09 -05:00
Trond Myklebust	65a0c14954	NFS: nfs_lookup_revalidate should not trust an inode with i_nlink == 0 If the inode has no links, then we should force a new lookup. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-14 17:51:40 -05:00
Trond Myklebust	1f018458b3	NFS: Fix calls to drop_nlink() It is almost always wrong for NFS to call drop_nlink() after removing a file. What we really want is to mark the inode's attributes for revalidation, and we want to ensure that the VFS drops it if we're reasonably sure that this is the final unlink(). Do the former using the usual cache validity flags, and the latter by testing if inode->i_nlink == 1, and clearing it in that case. This also fixes the following warning reported by Neil Brown and Jeff Layton (among others). [634155.004438] WARNING: at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.5.0/lin [634155.004442] Hardware name: Latitude E6510 [634155.004577] crc_itu_t crc32c_intel snd_hwdep snd_pcm snd_timer snd soundcor [634155.004609] Pid: 13402, comm: bash Tainted: G W 3.5.0-36-desktop # [634155.004611] Call Trace: [634155.004630] [<ffffffff8100444a>] dump_trace+0xaa/0x2b0 [634155.004641] [<ffffffff815a23dc>] dump_stack+0x69/0x6f [634155.004653] [<ffffffff81041a0b>] warn_slowpath_common+0x7b/0xc0 [634155.004662] [<ffffffff811832e4>] drop_nlink+0x34/0x40 [634155.004687] [<ffffffffa05bb6c3>] nfs_dentry_iput+0x33/0x70 [nfs] [634155.004714] [<ffffffff8118049e>] dput+0x12e/0x230 [634155.004726] [<ffffffff8116b230>] __fput+0x170/0x230 [634155.004735] [<ffffffff81167c0f>] filp_close+0x5f/0x90 [634155.004743] [<ffffffff81167cd7>] sys_close+0x97/0x100 [634155.004754] [<ffffffff815c3b39>] system_call_fastpath+0x16/0x1b [634155.004767] [<00007f2a73a0d110>] 0x7f2a73a0d10f Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [3.3+]	2012-12-14 17:45:11 -05:00
Trond Myklebust	eed9935745	NFS: Ensure that we always drop inodes that have been marked as stale There is no need to cache stale inodes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-14 14:36:36 -05:00
Yanchuan Nian	48d7a57693	nfs: Remove unused list nfs4_clientid_list This list was designed to store struct nfs4_client in the client side. But nfs4_client was obsolete and has been removed from the source code. So remove the unused list. Signed-off-by: Yanchuan Nian <ycnian@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-13 10:40:09 -05:00
Yanchuan Nian	aaea7d2f78	nfs: Remove duplicate function declaration in internal.h Remove duplicate function declaration in internal.h Signed-off-by: Yanchuan Nian <ycnian@gmail.com> [Trond: Added nfs_pageio_init_read, which suffered from the same problem] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-13 10:38:54 -05:00
NeilBrown	f259613a1e	NFS: avoid NULL dereference in nfs_destroy_server In rare circumstances, nfs_clone_server() of a v2 or v3 server can get an error between setting server->destory (to nfs_destroy_server), and calling nfs_start_lockd (which will set server->nlm_host). If this happens, nfs_clone_server will call nfs_free_server which will call nfs_destroy_server and thence nlmclnt_done(NULL). This causes the NULL to be dereferenced. So add a guard to only call nlmclnt_done() if ->nlm_host is not NULL. The other guards there are irrelevant as nlm_host can only be non-NULL if one of these flags are set - so remove those tests. (Thanks to Trond for this suggestion). This is suitable for any stable kernel since 2.6.25. Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-12 23:55:56 -05:00
Andy Adamson	eb96d5c97b	SUNRPC handle EKEYEXPIRED in call_refreshresult Currently, when an RPCSEC_GSS context has expired or is non-existent and the users (Kerberos) credentials have also expired or are non-existent, the client receives the -EKEYEXPIRED error and tries to refresh the context forever. If an application is performing I/O, or other work against the share, the application hangs, and the user is not prompted to refresh/establish their credentials. This can result in a denial of service for other users. Users are expected to manage their Kerberos credential lifetimes to mitigate this issue. Move the -EKEYEXPIRED handling into the RPC layer. Try tk_cred_retry number of times to refresh the gss_context, and then return -EACCES to the application. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-12 15:36:02 -05:00
Jeff Layton	be7e985804	nfs: fix page dirtying in NFS DIO read codepath The NFS DIO code will dirty pages that catch read responses in order to handle the case where someone is doing DIO reads into an mmapped buffer. The existing code doesn't really do the right thing though since it doesn't take into account the case where we might be attempting to read past the EOF. Fix the logic in that code to only dirty pages that ended up receiving data from the read. Note too that it really doesn't matter if NFS_IOHDR_ERROR is set or not. All that matters is if the page was altered by the read. Cc: Fred Isaman <iisaman@netapp.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-12 12:56:19 -05:00
Jeff Layton	67fad106a2	nfs: don't zero out the rest of the page if we hit the EOF on a DIO READ Eryu provided a test program that would segfault when attempting to read past the EOF on file that was opened O_DIRECT. The buffer given to the read() call was on the stack, and when he attempted to read past it it would scribble over the rest of the stack page. If we hit the end of the file on a DIO READ request, then we don't want to zero out the rest of the buffer. These aren't pagecache pages after all, and there's no guarantee that the buffers that were passed in represent entire pages. Cc: <stable@vger.kernel.org> # v3.5+ Cc: Fred Isaman <iisaman@netapp.com> Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-12 12:56:09 -05:00
Trond Myklebust	b0ef9647a0	NFSv4.1: Be conservative about the client highest slotid If the server sends us a target that looks like an outlier, but is lower than the existing target, then respect it anyway. However defer actually updating the generation counter until we get a target that doesn't look like an outlier. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-11 12:29:10 -05:00
Trond Myklebust	8556307374	NFSv4.1: Handle NFS4ERR_BADSLOT errors correctly Most (all) NFS4ERR_BADSLOT errors are due to the client failing to respect the server's sr_highest_slotid limit. This mainly happens due to reordered RPC requests. The way to handle it is simply to drop the slot that we're using, and retry using the new highest_slotid limits. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-11 10:31:12 -05:00
Trond Myklebust	7ce0171d4f	Merge branch 'bugfixes' into nfs-for-next	2012-12-11 09:16:26 -05:00
Jeff Layton	81d9bce530	nfs: don't extend writes to cover entire page if pagecache is invalid Jian reported that the following sequence would leave "testfile" with corrupt data: # mount localhost:/export /mnt/nfs/ -o vers=3 # echo abc > /mnt/nfs/testfile; echo def >> /export/testfile; echo ghi >> /mnt/nfs/testfile # cat -v /export/testfile abc ^@^@^@^@ghi While there's no locking involved here, the operations are serialized, so CTO should prevent corruption. The first write to the file is fine and writes 4 bytes. The file is then extended on the server. When it's reopened a GETATTR is issued and the size change is noticed. This causes NFS_INO_INVALID_DATA to be set on the file. Because the file is opened for write only, nfs_want_read_modify_write() returns 0 to nfs_write_begin(). nfs_updatepage then calls nfs_write_pageuptodate() to see if it should extend the nfs_page to cover the whole page. NFS_INO_INVALID_DATA is still set on the file at that point, but that flag is ignored and nfs_pageuptodate erroneously extends the write to cover the whole page, with the write done on the server side filled in with zeroes. This patch just has that function check for NFS_INO_INVALID_DATA in addition to NFS_INO_REVAL_PAGECACHE. This fixes the bug, but looking over the code, I wonder if we might have a similar bug in nfs_revalidate_size(). The difference between those two flags is very subtle, so it seems like we ought to be checking for NFS_INO_INVALID_DATA in most of the places that we look for NFS_INO_REVAL_PAGECACHE. I believe this is regression introduced by commit `8d197a568`. The code did check for NFS_INO_INVALID_DATA prior to that patch. Original bug report is here: https://bugzilla.redhat.com/show_bug.cgi?id=885743 Cc: <stable@vger.kernel.org> # 3.5+ Reported-by: Jian Li <jiali@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-11 09:14:51 -05:00
Sven Wegener	7d3e91a89b	NFSv4: Check for buffer length in __nfs4_get_acl_uncached Commit `1f1ea6c` "NFSv4: Fix buffer overflow checking in __nfs4_get_acl_uncached" accidently dropped the checking for too small result buffer length. If someone uses getxattr on "system.nfs4_acl" on an NFSv4 mount supporting ACLs, the ACL has not been cached and the buffer suplied is too short, we still copy the complete ACL, resulting in kernel and user space memory corruption. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Cc: stable@kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-11 09:14:50 -05:00
Trond Myklebust	1fa8064429	NFSv4.1: Try to eliminate outliers when updating target_highest_slotid Look for sudden changes in the first and second derivatives in order to eliminate outlier changes to target_highest_slotid (which are due to out-of-order RPC replies). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:53 +01:00
Trond Myklebust	b75ad4cda5	NFSv4.1: Ensure smooth handover of slots from one task to the next waiting Currently, we see a lot of bouncing for the value of highest_used_slotid due to the fact that slots are getting freed, instead of getting instantly transmitted to the next waiting task. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:52 +01:00
Trond Myklebust	1e1093c7fd	NFSv4.1: Don't mess with task priorities in nfs41_setup_sequence We want to preserve the rpc_task priority for things like writebacks, that may have differing levels of urgency. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:51 +01:00
Bryan Schumaker	104287cd4e	NFS: Remove _nfs_call_sync_session All it does is pass its arguments through to another function. Let's cut out the middleman... Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:51 +01:00
Trond Myklebust	8fe72bac8d	NFSv4: Clean up handling of privileged operations Privileged rpc calls are those that are run by the state recovery thread, in cases where we're trying to recover the system after a server reboot or a network partition. In those cases, we want to fence off all other rpc calls (see nfs4_begin_drain_session()) so that they don't end up using stateids or clientids that are in the process of being recovered. Prior to this patch, we had to set up special callback functions in order to declare an rpc call as being privileged. By adding a new field to the sequence arguments, this patch simplifies things considerably, and allows us to declare the rpc call as privileged before it is run. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:50 +01:00
Trond Myklebust	275e7e20aa	NFSv4.1: Remove the 'FIFO' behaviour for nfs41_setup_sequence It is more important to preserve the task priority behaviour, which ensures that things like reclaim writes take precedence over background and kupdate writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:50 +01:00
Trond Myklebust	7b939a3f44	NFSv4.1: Clean up nfs41_setup_sequence Move all the sleep-and-exit cases into a single section of code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:49 +01:00
Trond Myklebust	fd0c09537a	NFSv4: Simplify the NFSv4/v4.1 synchronous call switch We shouldn't need to pass the 'cache_reply' parameter if we initialise the sequence_args/sequence_res in the caller. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:49 +01:00
Trond Myklebust	d9afbd1b08	NFSv4.1: Simplify the sequence setup Nobody calls nfs4_setup_sequence or nfs41_setup_sequence without also calling rpc_call_start() on success. This commit therefore folds the rpc_call_start call into nfs41_setup_sequence(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:48 +01:00
Trond Myklebust	6ba7db3420	NFSv4.1: Use nfs41_setup_sequence where appropriate There is no point in using nfs4_setup_sequence or nfs4_sequence_done in pure NFSv4.1 functions. We already know that those have sessions... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:48 +01:00
Trond Myklebust	c10e449827	NFSv4.1: Ping server when our session table limits are too high If the server requests a lower target_highest_slotid, then ensure that we ping it with at least one RPC call containing an appropriate SEQUENCE op. This ensures that the server won't need to send a recall callback in order to shrink the slot table. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:47 +01:00
Trond Myklebust	0ca3f4825a	NFSv4.1: Set the maximum slot table size to 1024 slots This means that we end up statically allocating 128 bytes for the bitmap on each slot table. For a server that supports 1MB write and read I/O sizes this means that we can completely fill the maximum 1GB TCP send/receive windows. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:47 +01:00
Trond Myklebust	76e697ba7e	NFSv4.1: Move slot table and session struct definitions to nfs4session.h Clean up. Gather NFSv4.1 slot definitions in fs/nfs/nfs4session.h. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:46 +01:00
Trond Myklebust	73e39aaa83	NFSv4.1: Cleanup move session slot management to fs/nfs/nfs4session.c NFSv4.1 session management is getting complex enough to deserve a separate file. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:45 +01:00
Trond Myklebust	3302127967	NFSv4: Move nfs4_wait_clnt_recover and nfs4_client_recover_expired_lease nfs4_wait_clnt_recover and nfs4_client_recover_expired_lease are both generic state related functions. As such, they belong in nfs4state.c, and not nfs4proc.c Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:45 +01:00
Trond Myklebust	5d63360dd8	NFSv4.1: Clean up session draining Coalesce nfs4_check_drain_bc_complete and nfs4_check_drain_fc_complete into a single function that can be called when the slot table is known to be empty, then change nfs4_callback_free_slot() and nfs4_free_slot() to use it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:44 +01:00
Trond Myklebust	69d206b5b3	NFSv4.1: If slot allocation fails due to OOM, retry more quickly If the NFSv4.1 session slot allocation fails due to an ENOMEM condition, then set the task->tk_timeout to 1/4 second to ensure that we do retry the slot allocation more quickly. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:44 +01:00
Trond Myklebust	ac0748359a	NFSv4.1: CB_RECALL_SLOT must schedule a sequence op after updating targets RFC5661 requires us to make sure that the server knows we've updated our slot table size by sending at least one SEQUENCE op containing the new 'highest_slotid' value. We can do so using the 'CHECK_LEASE' functionality of the state manager. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:43 +01:00
Trond Myklebust	afa296103e	NFSv4.1: Remove the state manager code to resize the slot table The state manager no longer needs any special machinery to stop the session flow and resize the slot table. It is all done on the fly by the SEQUENCE op code now. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:43 +01:00
Trond Myklebust	87dda67e73	NFSv4.1: Allow SEQUENCE to resize the slot table on the fly Instead of an array of slots, use a singly linked list of slots that can be dynamically appended to or shrunk. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:42 +01:00
Trond Myklebust	97e548a93d	NFSv4.1: Support dynamic resizing of the session slot table Allow the server to control the size of the session slot table by adjusting the value of sr_target_max_slots in the reply to the SEQUENCE operation. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:42 +01:00
Trond Myklebust	1b285ff16a	NFSv4.1: Allow the server to recall all but one slot If the server wants to leave us with only one slot, or it wants to "shrink" our slot table to something larger than we have now, then so be it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:42 +01:00
Trond Myklebust	d5fb4ce33e	NFSv4.1: Don't confuse target_highest_slotid and max_slots in cb_recall_slot Don't confuse the table size and the target_highest_slotid... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:41 +01:00
Trond Myklebust	ce008c4bb9	NFSv4.1: Fix nfs4_callback_recallslot to work with dynamic slot allocation Ensure that the NFSv4.1 CB_RECALL_SLOT callback updates the slot table target max slotid safely. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:37 +01:00
Trond Myklebust	da0507b7c9	NFSv4.1: Reset the sequence number for slots that have been deallocated When the server tells us that it is dynamically resizing the session replay cache, we should reset the sequence number for those slots that have been deallocated. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:30:17 +01:00
Trond Myklebust	464ee9f966	NFSv4.1: Ensure that the client tracks the server target_highest_slotid Dynamic slot allocation in NFSv4.1 depends on the client being able to track the server's target value for the highest slotid in the slot table. See the reference in Section 2.10.6.1 of RFC5661. To avoid ordering problems in the case where 2 SEQUENCE replies contain conflicting updates to this target value, we also introduce a generation counter, to track whether or not an RPC containing a SEQUENCE operation was launched before or after the last update. Also rename the nfs4_slot_table target_max_slots field to 'target_highest_slotid' to avoid confusion with a slot table size or number of slots. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-12-06 00:29:47 +01:00
Al Viro	c44600c9d1	nfs_lookup_revalidate(): fix a leak We are leaking fattr and fhandle if we decide that dentry is not to be invalidated, after all (e.g. happens to be a mountpoint). Just free both before that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-11-29 22:04:36 -05:00
Al Viro	696199f8cc	don't do blind d_drop() in nfs_prime_dcache() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-11-29 22:00:51 -05:00
Trond Myklebust	f4af6e2abc	NFSv4.1: Clean up nfs4_free_slot Change the argument to take the pointer to the slot, instead of just the slotid. We know that the new value of highest_used_slot must be less than the current value. No need to scan the whole table. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-26 17:49:53 -05:00
Trond Myklebust	2dc03b7f00	NFSv4.1: Simplify slot allocation Clean up the NFSv4.1 slot allocation by replacing nfs_find_slot() with a function nfs_alloc_slot() that returns a pointer to the nfs4_slot instead of an offset into the slot table. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-26 17:49:52 -05:00
Trond Myklebust	2b2fa71723	NFSv4.1: Simplify struct nfs4_sequence_args too Replace the session pointer + slotid with a pointer to the allocated slot. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-26 17:49:52 -05:00
Trond Myklebust	df2fabffba	NFSv4.1: Label each entry in the session slot tables with its slot number Instead of doing slot table pointer gymnastics every time we want to know which slot we're using. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-26 17:49:51 -05:00
Trond Myklebust	e3725ec015	NFSv4.1: Shrink struct nfs4_sequence_res by moving the session pointer Move the session pointer into the slot table, then have struct nfs4_slot point to that slot table. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-26 17:49:04 -05:00
Yanchuan Nian	4c10021008	nfs: Fix wrong slab cache in nfs_commit_mempool The slab cache in nfs_commit_mempool is wrong, and I think it is just a slip. I tested it on a x86-32 machine, the size of nfs_write_header is 544, and the size of nfs_commit_data is 408, so it works fine. It is also true that sizeof(struct nfs_write_header) > sizeof(struct nfs_commit_data) on other platforms in my opinoin. Just fix it. Signed-off-by: Yanchuan Nian <ycnian@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-25 11:59:33 -05:00
Jim Rees	d751f748b3	NFS: Reduce stack use in encode_exchange_id() encode_exchange_id() uses more stack space than necessary, giving a compile time warning. Reduce the size of the static buffer for implementation name. Signed-off-by: Jim Rees <rees@umich.edu> Reviewed-by: "Adamson, Dros" <Weston.Adamson@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-21 22:59:29 -05:00
Trond Myklebust	fe20d7d5ee	NFSv4: Fix a compile time warning when #undef CONFIG_NFS_V4_1 The function nfs4_get_machine_cred_locked is used by NFSv4.0 routines too. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-21 22:59:28 -05:00
Trond Myklebust	933602e368	NFSv4.1: Shrink struct nfs4_sequence_res by moving sr_renewal_time Store the renewal time inside the session slot instead. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-21 09:29:53 -05:00
Trond Myklebust	9216106a84	NFSv4.1: clean up nfs4_recall_slot to use nfs4_alloc_slots Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-21 09:29:53 -05:00
Trond Myklebust	2d473d378e	NFSv4.1: nfs4_alloc_slots doesn't need zeroing All that memory is going to be initialised to non-zero by nfs4_add_and_init_slots anyway. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-21 09:29:52 -05:00
Trond Myklebust	43095d3972	NFSv4.1: We must bump the clientid sequence number after CREATE_SESSION We must always bump the clientid sequence number after a successful call to CREATE_SESSION on the server. The result of nfs4_verify_channel_attrs() is irrelevant to that requirement. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-21 09:29:52 -05:00
Trond Myklebust	688a9024e2	NFSv4.1: Adjust CREATE_SESSION arguments when mounting a new filesystem If we're mounting a new filesystem, ensure that the session has negotiated large enough request and reply sizes to match the wsize and rsize mount arguments. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-21 09:29:51 -05:00
Trond Myklebust	ae72ae6760	NFSv4.1: Don't confuse CREATE_SESSION arguments and results Don't store the target request and response sizes in the same variables used to store the server's replies to those targets. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-21 09:29:51 -05:00
Trond Myklebust	5df904aeb0	NFSv4.1: Handle session reset and bind_conn_to_session before lease check We can't send a SEQUENCE op unless the session is OK, so it is pointless to handle the CHECK_LEASE state before we've dealt with SESSION_RESET and BIND_CONN_TO_SESSION. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-21 09:28:42 -05:00
Bryan Schumaker	6bdb5f213c	NFS: Add sequence_priviliged_ops for nfs4_proc_sequence() If I mount an NFS v4.1 server to a single client multiple times and then run xfstests over each mountpoint I usually get the client into a state where recovery deadlocks. The server informs the client of a cb_path_down sequence error, the client then does a bind_connection_to_session and checks the status of the lease. I found that bind_connection_to_session sets the NFS4_SESSION_DRAINING flag on the client, but this flag is never unset before nfs4_check_lease() reaches nfs4_proc_sequence(). This causes the client to deadlock, halting all NFS activity to the server. nfs4_proc_sequence() is only called by the state manager, so I can change it to run in privileged mode to bypass the NFS4_SESSION_DRAINING check and avoid the deadlock. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org	2012-11-20 23:34:54 -05:00
Trond Myklebust	28d79ea33f	NFS: Remove the BUG_ON() in the mount code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-04 14:43:39 -05:00
Trond Myklebust	f48407ddd4	NFS: Remove BUG_ON()s in the fs/nfs/inode.c Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-04 14:43:39 -05:00
Trond Myklebust	4ea8fed593	NFSv4: Get rid of unnecessary BUG_ON()s Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-04 14:43:39 -05:00
Trond Myklebust	deed85e760	NFS: Remove BUG_ON() calls from the generic writeback code ...and ensure that we set the return value for nfs_page_async_flush() to zero! (Reported-by: Dros Adamson) Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-04 14:43:39 -05:00
Trond Myklebust	bc5a89b337	NFSv4.1: Remove assertion BUG_ON()s from the files and generic layout code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-04 14:43:39 -05:00
Trond Myklebust	eba24e1fe5	NFSv4.1: Remove unused function last_byte_offset Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-04 14:43:38 -05:00
Trond Myklebust	d3edcf9614	NFSv4: Remove the BUG_ON() from nfs4_get_lease_time_prepare()... An EAGAIN return value would be unexpected, but there is no reason to BUG... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-04 14:43:38 -05:00
Trond Myklebust	7fc388460e	NFS: Remove asserts from the NFS XDR code Convert the ones that are not trivial to check into WARN_ON_ONCE(). Remove checks for things such as NFS2_MAXPATHLEN, which are trivially done by the caller. Add a comment to the case of nfs3_xdr_enc_setacl3args. What is being done there is just wrong... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-04 14:43:38 -05:00
Trond Myklebust	1fea73a865	NFS: Get rid of unnecessary asserts If the nfs_client fails to initialise correctly, then it will return an error condition. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-04 14:43:38 -05:00
Weston Andros Adamson	998f40b550	NFS4: nfs4_opendata_access should return errno Return errno - not an NFS4ERR_. This worked because NFS4ERR_ACCESS == EACCES. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-02 18:51:54 -04:00
Trond Myklebust	f9b1ef5f06	NFSv4: Initialise the NFSv4.1 slot table highest_used_slotid correctly Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-11-01 12:02:03 -04:00
Weston Andros Adamson	324d003b0c	NFS: add nfs_sb_deactive_async to avoid deadlock Use nfs_sb_deactive_async instead of nfs_sb_deactive when in a workqueue context. This avoids a deadlock where rpc_shutdown_client loops forever in a workqueue kworker context, trying to kill all RPC tasks associated with the client, while one or more of these tasks have already been assigned to the same kworker (and will never run rpc_exit_task). This approach is needed because RPC tasks that have already been assigned to a kworker by queue_work cannot be canceled, as explained in the comment for workqueue.c:insert_wq_barrier. Signed-off-by: Weston Andros Adamson <dros@netapp.com> [Trond: add module_get/put.] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-31 16:26:26 -04:00
Ben Hutchings	97a5486826	nfs: Show original device name verbatim in /proc//mount{s,info} Since commit `c7f404b` ('vfs: new superblock methods to override /proc//mount{s,info}'), nfs_path() is used to generate the mounted device name reported back to userland. nfs_path() always generates a trailing slash when the given dentry is the root of an NFS mount, but userland may expect the original device name to be returned verbatim (as it used to be). Make this canonicalisation optional and change the callers accordingly. [jrnieder@gmail.com: use flag instead of bool argument] Reported-and-tested-by: Chris Hiestand <chiestand@salk.edu> Reference: http://bugs.debian.org/669314 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: <stable@vger.kernel.org> # v2.6.39+ Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-31 16:26:26 -04:00
Scott Mayhew	acce94e68a	nfsv3: Make v3 mounts fail with ETIMEDOUTs instead EIO on mountd timeouts In very busy v3 environment, rpc.mountd can respond to the NULL procedure but not the MNT procedure in a timely manner causing the MNT procedure to time out. The problem is the mount system call returns EIO which causes the mount to fail, instead of ETIMEDOUT, which would cause the mount to be retried. This patch sets the RPC_TASK_SOFT\|RPC_TASK_TIMEOUT flags to the rpc_call_sync() call in nfs_mount() which causes ETIMEDOUT to be returned on timed out connections. Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org	2012-10-31 16:26:25 -04:00
Yanchuan Nian	7175fe9015	nfs: Check whether a layout pointer is NULL before free it The new layout pointer in pnfs_find_alloc_layout() may be NULL because of out of memory. we must do some check work, otherwise pnfs_free_layout_hdr() will go wrong because it can not deal with a NULL pointer. Signed-off-by: Yanchuan Nian <ycnian@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-31 16:26:25 -04:00
NeilBrown	8d96b10639	NFS: fix bug in legacy DNS resolver. The DNS resolver's use of the sunrpc cache involves a 'ttl' number (relative) rather that a timeout (absolute). This confused me when I wrote commit `c5b29f885a` "sunrpc: use seconds since boot in expiry cache" and I managed to break it. The effect is that any TTL is interpreted as 0, and nothing useful gets into the cache. This patch removes the use of get_expiry() - which really expects an expiry time - and uses get_uint() instead, treating the int correctly as a ttl. This fixes a regression that has been present since 2.6.37, causing certain NFS accesses in certain environments to incorrectly fail. Reported-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-31 16:25:59 -04:00
Trond Myklebust	2b1bc308f4	NFSv4: nfs4_locku_done must release the sequence id If the state recovery machinery is triggered by the call to nfs4_async_handle_error() then we can deadlock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org	2012-10-31 15:10:04 -04:00
Trond Myklebust	2240a9e2d0	NFSv4.1: We must release the sequence id when we fail to get a session slot If we do not release the sequence id in cases where we fail to get a session slot, then we can deadlock if we hit a recovery scenario. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org	2012-10-31 15:08:18 -04:00
Bryan Schumaker	399f11c3d8	NFS: Wait for session recovery to finish before returning Currently, we will schedule session recovery and then return to the caller of nfs4_handle_exception. This works for most cases, but causes a hang on the following test case: Client Server ------ ------ Open file over NFS v4.1 Write to file Expire client Try to lock file The server will return NFS4ERR_BADSESSION, prompting the client to schedule recovery. However, the client will continue placing lock attempts and the open recovery never seems to be scheduled. The simplest solution is to wait for session recovery to run before retrying the lock. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org	2012-10-31 13:13:28 -04:00
Trond Myklebust	e9b7e91745	NFSv4: Fix the return value for nfs_callback_start_svc returning PTR_ERR(cb_info->task) just after we have set it to NULL looks like a typo... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>	2012-10-16 13:14:42 -04:00
Trond Myklebust	2e928e4878	NFSv4.1: Declare osd_pri_2_pnfs_err(), objio_init_read/write to be static Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-16 12:38:00 -04:00
Trond Myklebust	d6aa6a81d4	NFSv4: fs/nfs/nfs4getroot.c needs to include "internal.h" Fix a warning about "no previous prototype for ‘nfs4_get_rootfh’" Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-16 12:37:59 -04:00
Trond Myklebust	1813badd98	NFSv4.1: Use kcalloc() to allocate zeroed arrays instead of kzalloc() Don't circumvent the array size checks. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-15 10:49:43 -04:00
Trond Myklebust	d527e5c15d	NFSv4.1: Do not call pnfs_return_layout() from an rpciod context Move the call to pnfs_return_layout() to the read and write rpc_release() callbacks, so that it gets called from nfsiod, which is a more appropriate context. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-15 10:49:43 -04:00
Trond Myklebust	8fcdc31b3d	NFSv4.1: Kill nfs4_ds_disconnect() There is nothing to prevent another thread from dereferencing ds->ds_clp during or after the call to nfs4_ds_disconnect(), and Oopsing due to the resulting NULL pointer. Instead, we should just rely on filelayout_mark_devid_invalid() to keep us out of trouble by avoiding that deviceid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-15 10:49:42 -04:00
Linus Torvalds	bd81ccea85	Merge branch 'for-3.7' of git://linux-nfs.org/~bfields/linux Pull nfsd update from J Bruce Fields: "Another relatively quiet cycle. There was some progress on my remaining 4.1 todo's, but a couple of them were just of the form "check that we do X correctly", so didn't have much affect on the code. Other than that, a bunch of cleanup and some bugfixes (including an annoying NFSv4.0 state leak and a busy-loop in the server that could cause it to peg the CPU without making progress)." * 'for-3.7' of git://linux-nfs.org/~bfields/linux: (46 commits) UAPI: (Scripted) Disintegrate include/linux/sunrpc UAPI: (Scripted) Disintegrate include/linux/nfsd nfsd4: don't allow reclaims of expired clients nfsd4: remove redundant callback probe nfsd4: expire old client earlier nfsd4: separate session allocation and initialization nfsd4: clean up session allocation nfsd4: minor free_session cleanup nfsd4: new_conn_from_crses should only allocate nfsd4: separate connection allocation and initialization nfsd4: reject bad forechannel attrs earlier nfsd4: enforce per-client sessions/no-sessions distinction nfsd4: set cl_minorversion at create time nfsd4: don't pin clientids to pseudoflavors nfsd4: fix bind_conn_to_session xdr comment nfsd4: cast readlink() bug argument NFSD: pass null terminated buf to kstrtouint() nfsd: remove duplicate init in nfsd4_cb_recall nfsd4: eliminate redundant nfs4_free_stateid fs/nfsd/nfs4idmap.c: adjust inconsistent IS_ERR and PTR_ERR ...	2012-10-13 10:53:54 +09:00
J. Bruce Fields	a9ca4043d0	Merge Trond's bugfixes Merge branch 'bugfixes' of git://linux-nfs.org/~trondmy/nfs-2.6 into for-3.7-incoming. Mainly needed for Bryan's "SUNRPC: Set alloc_slot for backchannel tcp ops", without which the 4.1 server oopses.	2012-10-11 12:41:05 -04:00
Linus Torvalds	df632d3ce7	NFS client updates for Linux 3.7 Features include: - Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1 Aside from the issues discussed at the LKS, distros are shipping NFSv4.1 with all the trimmings. - Fix fdatasync()/fsync() for the corner case of a server reboot. - NFSv4 OPEN access fix: finally distinguish correctly between open-for-read and open-for-execute permissions in all situations. - Ensure that the TCP socket is closed when we're in CLOSE_WAIT - More idmapper bugfixes - Lots of pNFS bugfixes and cleanups to remove unnecessary state and make the code easier to read. - In cases where a pNFS read or write fails, allow the client to resume trying layoutgets after two minutes of read/write-through-mds. - More net namespace fixes to the NFSv4 callback code. - More net namespace fixes to the NFSv3 locking code. - More NFSv4 migration preparatory patches. Including patches to detect network trunking in both NFSv4 and NFSv4.1 - pNFS block updates to optimise LAYOUTGET calls. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQdMvBAAoJEGcL54qWCgDyV84P/0XvcEXj6kdMv9EiWfRczo7r iAwAIhiEmG1agtZa6v+Gso2MYRQbkGyJi0LKIwzGqNUi0BLQGQCoV93kB0ITVpiN g7poDTnPyoItW1oJCtC48/Mx0G5C1yrHSwFAJrXmtzDF1mwd/BIQReafYp6x+/TU Mvwm7au3Y2ySRBEDmY4zyBERHXGt//JmsZ9Ays6jewQg5ZOyjDQKoeHVYaaeJoF0 A0tQGcBSNdySagI5dt4SlkuO7AClhzVHlilep2dsBu/TLS0F2pEdHXvM2W0koZmM uazaIpzd2F7TfokTYExgsyKsqpkzpDf1kebN4Y1+Ioi7Yy30dQrX6lNaUNcOmOJQ xx694HDHV90KdRBVSFhOIHMTBRcls68hBcWib3MXWHTKX6HVgnFMwhwxGH0MRezf 3rmXoqn+CO1j5WeQmA3BqdVbHSZHi913TKEwE/qoW4pmOFhv5I2flXWQS/Rwvdng 2xDCe6TlvhMS92IpyvNEIicXLRSm+DUAmoAfSqqlifZIAEM5R29e/wCAsmVprO3B LPHyUoIMO6SZ1PL6Rk20+6qQfvCK7U/ChULsUL/zb7R88Pc3sFE2BeAvZVATsvH3 +FJWTz43fwUBoMhPsn8xSBLn/fq6az5C19syz6Fpu3DZ4X0EwyVWifiFk6HgcxZD J8ajEl+dNZeFE8rkwykX =uBk7 -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Features include: - Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1 Aside from the issues discussed at the LKS, distros are shipping NFSv4.1 with all the trimmings. - Fix fdatasync()/fsync() for the corner case of a server reboot. - NFSv4 OPEN access fix: finally distinguish correctly between open-for-read and open-for-execute permissions in all situations. - Ensure that the TCP socket is closed when we're in CLOSE_WAIT - More idmapper bugfixes - Lots of pNFS bugfixes and cleanups to remove unnecessary state and make the code easier to read. - In cases where a pNFS read or write fails, allow the client to resume trying layoutgets after two minutes of read/write- through-mds. - More net namespace fixes to the NFSv4 callback code. - More net namespace fixes to the NFSv3 locking code. - More NFSv4 migration preparatory patches. Including patches to detect network trunking in both NFSv4 and NFSv4.1 - pNFS block updates to optimise LAYOUTGET calls." * tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (113 commits) pnfsblock: cleanup nfs4_blkdev_get NFS41: send real read size in layoutget NFS41: send real write size in layoutget NFS: track direct IO left bytes NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked() NFSv4.1: Ensure that the layout sequence id stays 'close' to the current NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close code NFSv4 set open access operation call flag in nfs4_init_opendata_res NFSv4.1: Remove the dependency on CONFIG_EXPERIMENTAL NFSv4 reduce attribute requests for open reclaim NFSv4: nfs4_open_done first must check that GETATTR decoded a file type NFSv4.1: Deal with wraparound when updating the layout "barrier" seqid NFSv4.1: Deal with wraparound issues when updating the layout stateid NFSv4.1: Always set the layout stateid if this is the first layoutget NFSv4.1: Fix another refcount issue in pnfs_find_alloc_layout NFSv4: don't put ACCESS in OPEN compound if O_EXCL NFSv4: don't check MAY_WRITE access bit in OPEN NFS: Set key construction data for the legacy upcall NFSv4.1: don't do two EXCHANGE_IDs on mount NFS: nfs41_walk_client_list(): re-lock before iterating ...	2012-10-10 23:52:35 +09:00
J. Bruce Fields	f474af7051	UAPI Disintegration 2012-10-09 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIVAwUAUHPmWxOxKuMESys7AQKN4w//XDwALfbf0MXIw+gwyRiUtJe9mGexvI6X 1R4FWU9a3ImzEZP4cWnmPGT2wmC/x007DcIvx8cyvbdlSuqtR2i/DC+HbWabiLRn nJS7Eer1BJvLv5dn6NmXMEz7yB4Z46+frcmBs3WQeR0sqBMDm+rjQzCqECznO8Jc VtCbox+VR2DuWcM++YECTblYEH3Z+doDXUN2eBaD8L9x3klPbPXD7OcRyOnry8w+ ynmUTKKyH4+hpxDakYrObPIg+vFCxb4QRck1mlgA4wbvb3eqjhM0oOCYJ8GvmILA vdFYztWCjkiuOl5djtXBlsClX8SAMOBYlRed+R1GvjNCSR+WCWrFJJ2F8qoQ1w87 9ts2/8qrozS8luTB475SkT2uLdJkIUKX89Oh+dWeE8YkbPnRPj5lNAdtNY5QSyDq VaRpIo+YfmZygyvHJQlAXBuZ0mvzcPzArfcPgSVTD3B7xTEGVu/45V7SnQX5os/V v39ySPXMdGOIdvK51gw7OtZl64uqrEKu39PyYDX/GUADflp/CHD0J7PJrQePbsH9 AQolVZDIxTfKqYQnUdL8+C8Zc24RowEzz3c2+aO89MSzwGqev3q8sXRVbW/Iqryg p+V3nHe+ipKcga5tOBlPr9KDtDd7j3xN2yaIwf5/QyO1OHBpjAZP1gjSVDcUcwpi svYy4kPn3PA= =etoL -----END PGP SIGNATURE----- nfs: disintegrate UAPI for nfs This is to complete part of the Userspace API (UAPI) disintegration for which the preparatory patches were pulled recently. After these patches, userspace headers will be segregated into: include/uapi/linux/.../foo.h for the userspace interface stuff, and: include/linux/.../foo.h for the strictly kernel internal stuff. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-10-09 18:35:22 -04:00
Konstantin Khlebnikov	0b173bc4da	mm: kill vma flag VM_CAN_NONLINEAR Move actual pte filling for non-linear file mappings into the new special vma operation: ->remap_pages(). Filesystems must implement this method to get non-linear mapping support, if it uses filemap_fault() then generic_file_remap_pages() can be used. Now device drivers can implement this method and obtain nonlinear vma support. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> #arch/tile Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:17 +09:00
Peng Tao	af283885b7	pnfsblock: cleanup nfs4_blkdev_get It is not needed at all and it is messing with return values... Reported-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-08 19:32:40 -04:00
Peng Tao	1fd937bd75	NFS41: send real read size in layoutget For buffer read, use offst-to-isize. For direct read, use dreq->bytes_left. Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-08 19:32:34 -04:00
Peng Tao	6296556f0b	NFS41: send real write size in layoutget For buffer write, block layout client scan inode mapping to find next hole and use offset-to-hole as layoutget length. Object layout client uses offset-to-isize as layoutget length. For direct write, both block layout and object layout use dreq->bytes_left. Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-08 19:32:22 -04:00
Peng Tao	35754bc00e	NFS: track direct IO left bytes Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-08 19:31:55 -04:00
Trond Myklebust	19c54abab7	NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked() Split it into two functions, one which checks if layoutgets are blocked, and one which checks if the layout stateid has expired. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-05 16:56:58 -07:00
Trond Myklebust	22aaf71495	NFSv4.1: Ensure that the layout sequence id stays 'close' to the current Clamp the layout barrier sequence id to the current sequence id minus the maximum number of outstanding layoutget requests. Also ensure that we correctly initialise lo->plh_barrier if there are no layout segments associated to this layout header. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-04 16:57:48 -07:00
Trond Myklebust	0f35ad6f68	NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-04 16:28:17 -07:00
Andy Adamson	5f65753033	NFSv4 set open access operation call flag in nfs4_init_opendata_res nfs4_open_recover_helper zeros the nfs4_opendata result structures, removing the result access_request information which leads to an XDR decode error. Move the setting of the result access_request field to nfs4_init_opendata_res which sets all the other required nfs4_opendata result fields and is shared between the open and recover open paths. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-03 17:10:28 -07:00
Trond Myklebust	8544a9dc18	NFSv4.1: Remove the dependency on CONFIG_EXPERIMENTAL CONFIG_EXPERIMENTAL is deprecated and, regardless of that, this code is being enabled in most newer distributions. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-03 10:54:50 -07:00
Linus Torvalds	aab174f0df	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs update from Al Viro: - big one - consolidation of descriptor-related logics; almost all of that is moved to fs/file.c (BTW, I'm seriously tempted to rename the result to fd.c. As it is, we have a situation when file_table.c is about handling of struct file and file.c is about handling of descriptor tables; the reasons are historical - file_table.c used to be about a static array of struct file we used to have way back). A lot of stray ends got cleaned up and converted to saner primitives, disgusting mess in android/binder.c is still disgusting, but at least doesn't poke so much in descriptor table guts anymore. A bunch of relatively minor races got fixed in process, plus an ext4 struct file leak. - related thing - fget_light() partially unuglified; see fdget() in there (and yes, it generates the code as good as we used to have). - also related - bits of Cyrill's procfs stuff that got entangled into that work; _not_ all of it, just the initial move to fs/proc/fd.c and switch of fdinfo to seq_file. - Alex's fs/coredump.c spiltoff - the same story, had been easier to take that commit than mess with conflicts. The rest is a separate pile, this was just a mechanical code movement. - a few misc patches all over the place. Not all for this cycle, there'll be more (and quite a few currently sit in akpm's tree)." Fix up trivial conflicts in the android binder driver, and some fairly simple conflicts due to two different changes to the sock_alloc_file() interface ("take descriptor handling from sock_alloc_file() to callers" vs "net: Providing protocol type via system.sockprotoname xattr of /proc/PID/fd entries" adding a dentry name to the socket) * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits) MAX_LFS_FILESIZE should be a loff_t compat: fs: Generic compat_sys_sendfile implementation fs: push rcu_barrier() from deactivate_locked_super() to filesystems btrfs: reada_extent doesn't need kref for refcount coredump: move core dump functionality into its own file coredump: prevent double-free on an error path in core dumper usb/gadget: fix misannotations fcntl: fix misannotations ceph: don't abuse d_delete() on failure exits hypfs: ->d_parent is never NULL or negative vfs: delete surplus inode NULL check switch simple cases of fget_light to fdget new helpers: fdget()/fdput() switch o2hb_region_dev_write() to fget_light() proc_map_files_readdir(): don't bother with grabbing files make get_file() return its argument vhost_set_vring(): turn pollstart/pollstop into bool switch prctl_set_mm_exe_file() to fget_light() switch xfs_find_handle() to fget_light() switch xfs_swapext() to fget_light() ...	2012-10-02 20:25:04 -07:00
Kirill A. Shutemov	8c0a853770	fs: push rcu_barrier() from deactivate_locked_super() to filesystems There's no reason to call rcu_barrier() on every deactivate_locked_super(). We only need to make sure that all delayed rcu free inodes are flushed before we destroy related cache. Removing rcu_barrier() from deactivate_locked_super() affects some fast paths. E.g. on my machine exit_group() of a last process in IPC namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-10-02 21:35:55 -04:00
Andy Adamson	e23008ec81	NFSv4 reduce attribute requests for open reclaim We currently make no distinction in attribute requests between normal OPENs and OPEN with CLAIM_PREVIOUS. This offers more possibility of failures in the GETATTR response which foils OPEN reclaim attempts. Reduce the requested attributes to the bare minimum needed to update the reclaim open stateid and split nfs4_opendata_to_nfs4_state processing accordingly. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-02 18:12:25 -07:00
Trond Myklebust	807d66d802	NFSv4: nfs4_open_done first must check that GETATTR decoded a file type ...before it can check the validity of that file type. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-02 17:09:00 -07:00
Trond Myklebust	25a1a6211d	NFSv4.1: Deal with wraparound when updating the layout "barrier" seqid ...and fix a bug in pnfs_set_layout_stateid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-02 17:04:33 -07:00
Trond Myklebust	5a65503f3d	NFSv4.1: Deal with wraparound issues when updating the layout stateid ...and add a helper function. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-02 16:47:14 -07:00
Trond Myklebust	038d649376	NFSv4.1: Always set the layout stateid if this is the first layoutget If the list of layout segments is empty, we must unconditionally set the layout stateid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-02 16:38:41 -07:00
Trond Myklebust	251ec410c4	NFSv4.1: Fix another refcount issue in pnfs_find_alloc_layout Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-02 15:41:05 -07:00
Weston Andros Adamson	ae2bb03236	NFSv4: don't put ACCESS in OPEN compound if O_EXCL Don't put an ACCESS op in OPEN compound if O_EXCL, because ACCESS will return permission denied for all bits until close. Fixes a regression due to commit `6168f62c` (NFSv4: Add ACCESS operation to OPEN compound) Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-02 14:56:19 -07:00
Weston Andros Adamson	bbd3a8eee8	NFSv4: don't check MAY_WRITE access bit in OPEN Don't check MAY_WRITE as a newly created file may not have write mode bits, but POSIX allows the creating process to write regardless. This is ok because NFSv4 OPEN ops handle write permissions correctly - the ACCESS in the OPEN compound is to differentiate READ v EXEC permissions. Fixes a regression due to commit `6168f62c` (NFSv4: Add ACCESS operation to OPEN compound) Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-02 14:55:41 -07:00
Bryan Schumaker	ddfc4e1712	NFS: Set key construction data for the legacy upcall This prevents a null pointer dereference when nfs_idmap_complete_pipe_upcall_locked() calls complete_request_key(). Fixes a regression caused by commit `0cac12023` (NFSv4: Ensure that idmap_pipe_downcall sanity-checks the downcall data). Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-02 13:04:09 -07:00
Weston Andros Adamson	fd4835708f	NFSv4.1: don't do two EXCHANGE_IDs on mount Since the addition of NFSv4 server trunking detection the mount context calls nfs4_proc_exchange_id then schedules the state manager, which also calls nfs4_proc_exchange_id. Setting the NFS4CLNT_LEASE_CONFIRM bit makes the state manager skip the unneeded EXCHANGE_ID and continue on with session creation. Reported-by: Jorge Mora <mora@netapp.com> Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-02 11:35:47 -07:00
David Howells	f8aa23a55f	KEYS: Use keyring_alloc() to create special keyrings Use keyring_alloc() to create special keyrings now that it has a permissions parameter rather than using key_alloc() + key_instantiate_and_link(). Also document and export keyring_alloc() so that modules can use it too. Signed-off-by: David Howells <dhowells@redhat.com>	2012-10-02 19:24:56 +01:00
Linus Torvalds	437589a74b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "This is a mostly modest set of changes to enable basic user namespace support. This allows the code to code to compile with user namespaces enabled and removes the assumption there is only the initial user namespace. Everything is converted except for the most complex of the filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs, nfs, ocfs2 and xfs as those patches need a bit more review. The strategy is to push kuid_t and kgid_t values are far down into subsystems and filesystems as reasonable. Leaving the make_kuid and from_kuid operations to happen at the edge of userspace, as the values come off the disk, and as the values come in from the network. Letting compile type incompatible compile errors (present when user namespaces are enabled) guide me to find the issues. The most tricky areas have been the places where we had an implicit union of uid and gid values and were storing them in an unsigned int. Those places were converted into explicit unions. I made certain to handle those places with simple trivial patches. Out of that work I discovered we have generic interfaces for storing quota by projid. I had never heard of the project identifiers before. Adding full user namespace support for project identifiers accounts for most of the code size growth in my git tree. Ultimately there will be work to relax privlige checks from "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing root in a user names to do those things that today we only forbid to non-root users because it will confuse suid root applications. While I was pushing kuid_t and kgid_t changes deep into the audit code I made a few other cleanups. I capitalized on the fact we process netlink messages in the context of the message sender. I removed usage of NETLINK_CRED, and started directly using current->tty. Some of these patches have also made it into maintainer trees, with no problems from identical code from different trees showing up in linux-next. After reading through all of this code I feel like I might be able to win a game of kernel trivial pursuit." Fix up some fairly trivial conflicts in netfilter uid/git logging code. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits) userns: Convert the ufs filesystem to use kuid/kgid where appropriate userns: Convert the udf filesystem to use kuid/kgid where appropriate userns: Convert ubifs to use kuid/kgid userns: Convert squashfs to use kuid/kgid where appropriate userns: Convert reiserfs to use kuid and kgid where appropriate userns: Convert jfs to use kuid/kgid where appropriate userns: Convert jffs2 to use kuid and kgid where appropriate userns: Convert hpfs to use kuid and kgid where appropriate userns: Convert btrfs to use kuid/kgid where appropriate userns: Convert bfs to use kuid/kgid where appropriate userns: Convert affs to use kuid/kgid wherwe appropriate userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids userns: On ia64 deal with current_uid and current_gid being kuid and kgid userns: On ppc convert current_uid from a kuid before printing. userns: Convert s390 getting uid and gid system calls to use kuid and kgid userns: Convert s390 hypfs to use kuid and kgid where appropriate userns: Convert binder ipc to use kuids userns: Teach security_path_chown to take kuids and kgids userns: Add user namespace support to IMA userns: Convert EVM to deal with kuids and kgids in it's hmac computation ...	2012-10-02 11:11:09 -07:00
Linus Torvalds	033d9959ed	Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue changes from Tejun Heo: "This is workqueue updates for v3.7-rc1. A lot of activities this round including considerable API and behavior cleanups. * delayed_work combines a timer and a work item. The handling of the timer part has always been a bit clunky leading to confusing cancelation API with weird corner-case behaviors. delayed_work is updated to use new IRQ safe timer and cancelation now works as expected. * Another deficiency of delayed_work was lack of the counterpart of mod_timer() which led to cancel+queue combinations or open-coded timer+work usages. mod_delayed_work[_on]() are added. These two delayed_work changes make delayed_work provide interface and behave like timer which is executed with process context. * A work item could be executed concurrently on multiple CPUs, which is rather unintuitive and made flush_work() behavior confusing and half-broken under certain circumstances. This problem doesn't exist for non-reentrant workqueues. While non-reentrancy check isn't free, the overhead is incurred only when a work item bounces across different CPUs and even in simulated pathological scenario the overhead isn't too high. All workqueues are made non-reentrant. This removes the distinction between flush_[delayed_]work() and flush_[delayed_]_work_sync(). The former is now as strong as the latter and the specified work item is guaranteed to have finished execution of any previous queueing on return. * In addition to the various bug fixes, Lai redid and simplified CPU hotplug handling significantly. * Joonsoo introduced system_highpri_wq and used it during CPU hotplug. There are two merge commits - one to pull in IRQ safe timer from tip/timers/core and the other to pull in CPU hotplug fixes from wq/for-3.6-fixes as Lai's hotplug restructuring depended on them." Fixed a number of trivial conflicts, but the more interesting conflicts were silent ones where the deprecated interfaces had been used by new code in the merge window, and thus didn't cause any real data conflicts. Tejun pointed out a few of them, I fixed a couple more. * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits) workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending() workqueue: use cwq_set_max_active() helper for workqueue_set_max_active() workqueue: introduce cwq_set_max_active() helper for thaw_workqueues() workqueue: remove @delayed from cwq_dec_nr_in_flight() workqueue: fix possible stall on try_to_grab_pending() of a delayed work item workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback() workqueue: use __cpuinit instead of __devinit for cpu callbacks workqueue: rename manager_mutex to assoc_mutex workqueue: WORKER_REBIND is no longer necessary for idle rebinding workqueue: WORKER_REBIND is no longer necessary for busy rebinding workqueue: reimplement idle worker rebinding workqueue: deprecate __cancel_delayed_work() workqueue: reimplement cancel_delayed_work() using try_to_grab_pending() workqueue: use mod_delayed_work() instead of __cancel + queue workqueue: use irqsafe timer for delayed_work workqueue: clean up delayed_work initializers and add missing one workqueue: make deferrable delayed_work initializer names consistent workqueue: cosmetic whitespace updates for macro definitions workqueue: deprecate system_nrt[_freezable]_wq workqueue: deprecate flush[_delayed]_work_sync() ...	2012-10-02 09:54:49 -07:00
Chuck Lever	c2ccc084eb	NFS: nfs41_walk_client_list(): re-lock before iterating Sparse identified an execution path in nfs41_walk_client_list() where the nfs_client_lock is not re-acquired before taking the next loop iteration. fs/nfs/nfs4client.c:437:9: sparse: context imbalance in 'nfs41_walk_client_list' - different lock contexts for basic block Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-02 09:25:02 -07:00
Trond Myklebust	ee314c2a35	NFSv4.1: Handle BAD_STATEID and EXPIRED errors in layoutget If the layoutget call returns a stateid error, we want to invalidate the layout stateid, and/or recover the open stateid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-02 08:34:29 -07:00
Trond Myklebust	6f018efac1	NFSv4.1: bl_pg_init_write should be static Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-02 08:34:29 -07:00
Stanislav Kinsbursky	4e437e95ae	nfs: include internal.h in getroot.h Sparse warning: fs/nfs/nfs4getroot.c:11:5: warning: symbol 'nfs4_get_rootfh' was not declared. Should it be static? Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-02 08:17:04 -07:00
Stanislav Kinsbursky	22e2430961	nfs: include nfs4_fh.h in nfs4sysctl.c Sparse warnings: fs/nfs/nfs4sysctl.c:56:5: warning: symbol 'nfs4_register_sysctl' was not declared. Should it be static? fs/nfs/nfs4sysctl.c:64:6: warning: symbol 'nfs4_unregister_sysctl' was not declared. Should it be static? Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-02 08:17:03 -07:00
Stanislav Kinsbursky	3dd4f8ef7b	nfs: declare nfs_xdev_mount as static Sparse warning: fs/nfs/super.c:2517:15: warning: symbol 'nfs_xdev_mount' was not declared. Should it be static? Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-02 08:17:03 -07:00
Stanislav Kinsbursky	0b37d20ca2	nfs: declare nfs_callback_tcp_port in header Sparse warning: fs/nfs/super.c:2638:16: sparse: symbol 'nfs_callback_tcpport' was not declared. Should it be static? Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-02 08:17:02 -07:00
Stanislav Kinsbursky	ca57ccc48f	nfs: include NFSv4 header in netns.h Build error: fs/nfs/netns.h:27:15: error: 'NFS4_MAX_MINOR_VERSION' undeclared here (not in a function) Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-02 08:17:02 -07:00
Andy Adamson	47b803c8d2	NFSv4.0 reclaim reboot state when re-establishing clientid We should reclaim reboot state when the clientid is stale. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-01 18:12:41 -07:00
Trond Myklebust	f9d640f3a4	NFSv4: nfs4_match_clientids is only used by NFSv4.1 Fix another compiler warning. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-01 16:58:48 -07:00
Trond Myklebust	758201e2c9	NFSv4: Fix the minor version callback channel startup The current spaghetti code confuses some versions of gcc (and just looks ugly as hell)! Clean up... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-01 16:58:39 -07:00
Trond Myklebust	9f62387d6e	NFSv4: Fix up a merge conflict between migration and container changes nfs_callback_tcpport is now per-net_namespace. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-01 16:17:31 -07:00
Trond Myklebust	2afdfa5a84	Merge branch 'bugfixes' into nfs-for-next	2012-10-01 15:42:14 -07:00
Daniel Walter	7297cb682a	nfs: replace strict_strto* with kstrto* [nfs] replace strict_str* with kstr* variants * replace string conversions with newer kstr* functions Signed-off-by: Daniel Walter <sahne@0x90.at> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-01 15:40:05 -07:00
Yanchuan Nian	ee34e13620	NFS: Remove unnecessary semicolons (fs/nfs/client.c) There are some unnecessary semicolons in function find_nfs_version. Just remove them. Signed-off-by: Yanchuan Nian <ycnian@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-01 15:40:03 -07:00
Wei Yongjun	4e266229db	pnfsblock: use list_move_tail instead of list_del/list_add_tail Using list_move_tail() instead of list_del() + list_add_tail(). spatch with a semantic match is used to found this problem. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-01 15:39:05 -07:00
Peng Tao	96c9eae638	pnfsblock: fix non-aligned DIO write For DIO writes, if it is not blocksize aligned, we need to do internal serialization. It may slow down writers anyway. So we just bail them out and resend to MDS. Cc: stable <stable@vger.kernel.org> [since v3.4] Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-01 15:38:35 -07:00
Peng Tao	f742dc4a32	pnfsblock: fix non-aligned DIO read For DIO read, if it is not sector aligned, we should reject it and resend via MDS. Otherwise there might be data corruption. Also teach bl_read_pagelist to handle partial page reads for DIO. Cc: stable <stable@vger.kernel.org> [since v3.4] Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-01 15:38:29 -07:00
Peng Tao	fe6e1e8d9f	pnfsblock: fix partial page buffer wirte If applications use flock to protect its write range, generic NFS will not do read-modify-write cycle at page cache level. Therefore LD should know how to handle non-sector aligned writes. Otherwise there will be data corruption. Cc: stable <stable@vger.kernel.org> Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-01 15:38:24 -07:00
Peng Tao	5d0e3a004f	Revert "pnfsblock: bail out partial page IO" This reverts commit `159e0561e3`, in favor of a more complete fix to the alignment issue. Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-01 15:38:16 -07:00
Peng Tao	dc182549d4	NFS41: fix error of setting blocklayoutdriver After commit `e38eb650` (NFS: set_pnfs_layoutdriver() from nfs4_proc_fsinfo()), set_pnfs_layoutdriver() is called inside nfs4_proc_fsinfo(), but pnfs_blksize is not set. It causes setting blocklayoutdriver failure and pnfsblock mount failure. Cc: stable <stable@vger.kernel.org> [since v3.5] Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-01 15:37:39 -07:00
Peng Tao	7acdb02681	NFSv41: fix DIO write_io calculation pnfs_within_mdsthreshold() is called inside pg_init. We need to set read_io/write_io before that. Otherwise we fail pnfs_within_mdsthreshold() and IO goes to MDS. A simple test case: dd if=foo of=/mnt/pnfs/bar bs=10M count=1 oflag=direct Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-01 15:37:34 -07:00
Chuck Lever	6f2ea7f2a3	NFS: Add nfs4_unique_id boot parameter An optional boot parameter is introduced to allow client administrators to specify a string that the Linux NFS client can insert into its nfs_client_id4 id string, to make it both more globally unique, and to ensure that it doesn't change even if the client's nodename changes. If this boot parameter is not specified, the client's nodename is used, as before. Client installation procedures can create a unique string (typically, a UUID) which remains unchanged during the lifetime of that client instance. This works just like creating a UUID for the label of the system's root and boot volumes. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-01 15:33:33 -07:00
Chuck Lever	05f4c350ee	NFS: Discover NFSv4 server trunking when mounting "Server trunking" is a fancy named for a multi-homed NFS server. Trunking might occur if a client sends NFS requests for a single workload to multiple network interfaces on the same server. There are some implications for NFSv4 state management that make it useful for a client to know if a single NFSv4 server instance is multi-homed. (Note this is only a consideration for NFSv4, not for legacy versions of NFS, which are stateless). If a client cares about server trunking, no NFSv4 operations can proceed until that client determines who it is talking to. Thus server IP trunking discovery must be done when the client first encounters an unfamiliar server IP address. The nfs_get_client() function walks the nfs_client_list and matches on server IP address. The outcome of that walk tells us immediately if we have an unfamiliar server IP address. It invokes nfs_init_client() in this case. Thus, nfs4_init_client() is a good spot to perform trunking discovery. Discovery requires a client to establish a fresh client ID, so our client will now send SETCLIENTID or EXCHANGE_ID as the first NFS operation after a successful ping, rather than waiting for an application to perform an operation that requires NFSv4 state. The exact process for detecting trunking is different for NFSv4.0 and NFSv4.1, so a minorversion-specific init_client callout method is introduced. CLID_INUSE recovery is important for the trunking discovery process. CLID_INUSE is a sign the server recognizes the client's nfs_client_id4 id string, but the client is using the wrong principal this time for the SETCLIENTID operation. The SETCLIENTID must be retried with a series of different principals until one works, and then the rest of trunking discovery can proceed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-01 15:33:33 -07:00
Chuck Lever	e984a55a74	NFS: Use the same nfs_client_id4 for every server Currently, when identifying itself to NFS servers, the Linux NFS client uses a unique nfs_client_id4.id string for each server IP address it talks with. For example, when client A talks to server X, the client identifies itself using a string like "AX". The requirements for these strings are specified in detail by RFC 3530 (and bis). This form of client identification presents a problem for Transparent State Migration. When client A's state on server X is migrated to server Y, it continues to be associated with string "AX." But, according to the rules of client string construction above, client A will present string "AY" when communicating with server Y. Server Y thus has no way to know that client A should be associated with the state migrated from server X. "AX" is all but abandoned, interfering with establishing fresh state for client A on server Y. To support transparent state migration, then, NFSv4.0 clients must instead use the same nfs_client_id4.id string to identify themselves to every NFS server; something like "A". Now a client identifies itself as "A" to server X. When a file system on server X transitions to server Y, and client A identifies itself as "A" to server Y, Y will know immediately that the state associated with "A," whether it is native or migrated, is owned by the client, and can merge both into a single lease. As a pre-requisite to adding support for NFSv4 migration to the Linux NFS client, this patch changes the way Linux identifies itself to NFS servers via the SETCLIENTID (NFSv4 minor version 0) and EXCHANGE_ID (NFSv4 minor version 1) operations. In addition to removing the server's IP address from nfs_client_id4, the Linux NFS client will also no longer use its own source IP address as part of the nfs_client_id4 string. On multi-homed clients, the value of this address depends on the address family and network routing used to contact the server, thus it can be different for each server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-10-01 15:33:33 -07:00

... 3 4 5 6 7 ...

3248 Commits (3b03f72445ba1437cfa29f9719bb3cfdb60558d9)