alistair23-linux

redonkable

Author	SHA1	Message	Date
J. Bruce Fields	a3f432bfd0	nfs: use IS_ROOT not DCACHE_DISCONNECTED This check was added by Al Viro with `d9e80b7de9` "nfs d_revalidate() is too trigger-happy with d_drop()", with the explanation that we don't want to remove the root of a disconnected tree, which will still be included on the s_anon list. But DCACHE_DISCONNECTED does not actually identify dentries that are disconnected from the dentry tree or hashed on s_anon. IS_ROOT() is the way to do that. Also add a comment from Al's commit to remind us why this check is there. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 19:19:19 -04:00
Geyslan G. Bem	6706246b22	nfs: Use PTR_ERR_OR_ZERO in 'nfs/nfs4super.c' Use 'PTR_ERR_OR_ZERO()' rather than 'IS_ERR(...) ? PTR_ERR(...) : 0'. Signed-off-by: Geyslan G. Bem <geyslan@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 18:16:56 -04:00
Geyslan G. Bem	54bcfa6682	nfs: Use PTR_ERR_OR_ZERO in 'nfs41_callback_up' function Use 'PTR_ERR_OR_ZERO()' rather than 'IS_ERR(...) ? PTR_ERR(...) : 0'. Signed-off-by: Geyslan G. Bem <geyslan@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 18:16:55 -04:00
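Note on the two PTR_ERR_OR_ZERO commits above: the following is a minimal user-space sketch of the idiom being replaced, not the kernel's own header. The lower-case names (`err_ptr`, `ptr_err`, `is_err`, `ptr_err_or_zero`, `old_style`) are stand-ins invented here for illustration; they assume the kernel convention that small negative errno values are encoded in the top MAX_ERRNO addresses of the pointer range.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed to mirror the kernel convention: the last 4095 addresses
 * of the pointer range are reserved for encoded errno values. */
#define MAX_ERRNO 4095

/* Hypothetical stand-ins for ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *err_ptr(long error)
{
	return (void *)error;
}

static inline long ptr_err(const void *ptr)
{
	return (long)ptr;
}

static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* The open-coded pattern the commits remove. */
static inline int old_style(const void *ptr)
{
	return is_err(ptr) ? ptr_err(ptr) : 0;
}

/* The helper the commits switch to: same result, one call. */
static inline int ptr_err_or_zero(const void *ptr)
{
	if (is_err(ptr))
		return ptr_err(ptr);
	return 0;
}
```

Both forms return 0 for a valid pointer and the negative errno for an encoded error, so the change is purely a readability cleanup.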
Geyslan G. Bem	4f5829d726	nfs: Remove useless 'error' assignment The 'error' variable was being assigned twice in vain. Signed-off-by: Geyslan G. Bem <geyslan@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 18:16:55 -04:00
Weston Andros Adamson	4d4b69dd84	NFS: add support for multiple sec= mount options This patch adds support for multiple security options which can be specified using a colon-delimited list of security flavors (the same syntax as nfsd's exports file). This is useful, for instance, when NFSv4.x mounts cross SECINFO boundaries. With this patch a user can use "sec=krb5i:krb5p" to mount a remote filesystem using krb5i, but can still cross into krb5p-only exports. New mounts will try all security options before failing. NFSv4.x SECINFO results will be compared against the sec= flavors to find the first flavor in both lists or if no match is found will return -EPERM. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:38:02 -04:00
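Note on the flavor matching described above: a minimal sketch of "find the first flavor in both lists, else return -EPERM". This is not the kernel code. The enum values, the example arrays, and the function names `flavor_allowed` and `pick_flavor` are all hypothetical, and walking the server's SECINFO list in order (rather than the mount option list) is an assumption the commit message does not spell out.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical flavor identifiers, for illustration only. */
enum sec_flavor { SEC_SYS = 1, SEC_KRB5, SEC_KRB5I, SEC_KRB5P };

/* Illustrative data: flavors parsed from a mount's sec= option and
 * two possible SECINFO replies from a server. */
static const enum sec_flavor mount_flavors[] = { SEC_KRB5I, SEC_KRB5P };
static const enum sec_flavor secinfo_krb5p_only[] = { SEC_KRB5P };
static const enum sec_flavor secinfo_sys_only[] = { SEC_SYS };

/* Return 1 if flavor f appears in the allowed list, else 0. */
static int flavor_allowed(const enum sec_flavor *allowed, size_t n,
			  enum sec_flavor f)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (allowed[i] == f)
			return 1;
	return 0;
}

/* Walk the SECINFO list in order (assumed priority) and return the
 * first flavor that the mount options also permit, or -EPERM. */
static int pick_flavor(const enum sec_flavor *secinfo, size_t n_secinfo,
		       const enum sec_flavor *allowed, size_t n_allowed)
{
	size_t i;

	for (i = 0; i < n_secinfo; i++)
		if (flavor_allowed(allowed, n_allowed, secinfo[i]))
			return secinfo[i];
	return -EPERM;
}
```

With the example data, a krb5p-only export matches because krb5p is in the mount list, while a sys-only export yields -EPERM.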
Weston Andros Adamson	5837f6dfcb	NFS: stop using NFS_MOUNT_SECFLAVOUR server flag Since the parsed sec= flavor is now stored in nfs_server->auth_info, we no longer need an nfs_server flag to determine if a sec= option was used. This flag has not been completely removed because it is still needed for the (old but still supported) non-text parsed mount options ABI compatibility. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:37:56 -04:00
Weston Andros Adamson	0f5f49b8b3	NFS: cache parsed auth_info in nfs_server Cache the auth_info structure in nfs_server and pass these values to submounts. This lays the groundwork for supporting multiple sec= options. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:37:43 -04:00
Weston Andros Adamson	a3f73c27af	NFS: separate passed security flavs from selected When filling parsed_mount_data, store the parsed sec= mount option in the new struct nfs_auth_info and the chosen flavor in selected_flavor. This patch lays the groundwork for supporting multiple sec= options. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:36:58 -04:00
Weston Andros Adamson	47fd88e6b7	NFSv4: make nfs_find_best_sec static It's not used outside of nfs4namespace.c anymore. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:33:34 -04:00
Chuck Lever	0625c2dd6a	NFS: Fix possible endless state recovery wait In nfs4_wait_clnt_recover(), hold a reference to the clp being waited on. The state manager can reduce clp->cl_count to 1, in which case the nfs_put_client() in nfs4_run_state_manager() can free clp before wait_on_bit() returns and allows nfs4_wait_clnt_recover() to run again. The behavior at that point is non-deterministic. If the waited-on bit still happens to be zero, wait_on_bit() will wake the waiter as expected. If the bit is set again (say, if the memory was poisoned when freed) wait_on_bit() can leave the waiter asleep. This is a narrow fix which ensures the safety of accessing clp in nfs4_wait_clnt_recover(), but does not address the continued use of a possibly freed *clp after nfs4_wait_clnt_recover() returns (see nfs_end_delegation_return(), for example). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:31:55 -04:00
Chuck Lever	cd3fadece2	NFS: Set EXCHGID4_FLAG_SUPP_MOVED_MIGR Broadly speaking, v4.1 migration is untested. There are no servers in the wild that support NFSv4.1 migration. However, as server implementations become available, we do want to enable testing by developers, while leaving it disabled for environments for which broken migration support would be an unpleasant surprise. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:31:25 -04:00
Chuck Lever	d1c2331e75	NFS: Handle SEQ4_STATUS_LEASE_MOVED With the advent of NFSv4 sessions in NFSv4.1 and following, a "lease moved" condition is reported differently than it is in NFSv4.0. NFSv4 minor version 0 servers return an error status code, NFS4ERR_LEASE_MOVED, to signal that a lease has moved. This error causes the whole compound operation to fail. Normal compounds against this server continue to fail until the client performs migration recovery on the migrated share. Minor version 1 and later servers assert a bit flag in the reply to a compound's SEQUENCE operation to signal LEASE_MOVED. This is not a fatal condition: operations against this server continue normally. The server asserts this flag until the client performs migration recovery on the migrated share. Note that servers MUST NOT return NFS4ERR_LEASE_MOVED to NFSv4 clients not using NFSv4.0. After the server asserts any of the sr_status_flags in the SEQUENCE operation in a typical compound, our client initiates standard lease recovery. For NFSv4.1+, a stand-alone SEQUENCE operation is performed to discover what recovery is needed. If SEQ4_STATUS_LEASE_MOVED is asserted in this stand-alone SEQUENCE operation, our client attempts to discover which FSIDs have been migrated, and then performs migration recovery on each. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:31:07 -04:00
Chuck Lever	f8aba1e8d5	NFS: Handle NFS4ERR_LEASE_MOVED during async RENEW With NFSv4 minor version 0, the asynchronous lease RENEW heartbeat can return NFS4ERR_LEASE_MOVED. Error recovery logic for async RENEW is a separate code path from the generic NFS proc paths, so it must be updated to handle NFS4ERR_LEASE_MOVED as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:30:52 -04:00
Chuck Lever	60ea681299	NFS: Migration support for RELEASE_LOCKOWNER Currently the Linux NFS client ignores the operation status code for the RELEASE_LOCKOWNER operation. Like NFSv3's UMNT operation, RELEASE_LOCKOWNER is a courtesy to help servers manage their resources, and the outcome is not consequential for the client. During a migration, a server may report NFS4ERR_LEASE_MOVED, in which case the client really should retry, since typically LEASE_MOVED has nothing to do with the current operation, but does prevent it from going forward. Also, it's important for a client to respond as soon as possible to a moved lease condition, since the client's lease could expire on the destination without further action by the client. NFS4ERR_DELAY is not included in the list of valid status codes for RELEASE_LOCKOWNER in RFC 3530bis. However, rfc3530-migration-update does permit migration-capable servers to return DELAY to clients, but only in the context of an ongoing migration. In this case the server has frozen lock state in preparation for migration, and a client retry would help the destination server purge unneeded state once migration recovery is complete. Interestingly, NFS4ERR_MOVED is not valid for RELEASE_LOCKOWNER, even though lock owners can be migrated with Transparent State Migration. Note that RFC 3530bis section 9.5 includes RELEASE_LOCKOWNER in the list of operations that renew a client's lease on the server if they succeed. Now that our client pays attention to the operation's status code, we can note that renewal appropriately. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:30:46 -04:00
Chuck Lever	8ef2f8d46a	NFS: Implement support for NFS4ERR_LEASE_MOVED Trigger lease-moved recovery when a request returns NFS4ERR_LEASE_MOVED. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:30:27 -04:00
Chuck Lever	b7f7a66e42	NFS: Support NFS4ERR_LEASE_MOVED recovery in state manager A migration on the FSID in play for the current NFS operation is reported via the error status code NFS4ERR_MOVED. "Lease moved" means that a migration has occurred on some other FSID than the one for the current operation. It's a signal that the client should take action immediately to handle a migration that it may not have noticed otherwise. This is so that the client's lease does not expire unnoticed on the destination server. In NFSv4.0, a moved lease is reported with the NFS4ERR_LEASE_MOVED error status code. To recover from NFS4ERR_LEASE_MOVED, check each FSID for that server to see if it is still present. Invoke nfs4_try_migration() if the FSID is no longer present on the server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:30:21 -04:00
Chuck Lever	44c9993384	NFS: Add method to detect whether an FSID is still on the server Introduce a mechanism for probing a server to determine if an FSID is present or absent. The on-the-wire compound is different between minor version 0 and 1. Minor version 0 appends a RENEW operation to identify which client ID is probing. Minor version 1 has a SEQUENCE operation in the compound which effectively carries the same information. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:30:03 -04:00
Chuck Lever	352297b917	NFS: Handle NFS4ERR_MOVED during delegation recall When a server returns NFS4ERR_MOVED during a delegation recall, trigger the new migration recovery logic in the state manager. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:25:30 -04:00
Chuck Lever	519ae255d4	NFS: Add migration recovery callouts in nfs4proc.c When a server returns NFS4ERR_MOVED, trigger the new migration recovery logic in the state manager. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:25:23 -04:00
Chuck Lever	9f51a78e3a	NFS: Rename "stateid_invalid" label I'm going to use this exit label also for migration recovery failures. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:25:10 -04:00
Chuck Lever	f1478c13c0	NFS: Re-use exit code in nfs4_async_handle_error() Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:24:55 -04:00
Chuck Lever	c9fdeb280b	NFS: Add basic migration support to state manager thread Migration recovery and state recovery must be serialized, so handle both in the state manager thread. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:24:40 -04:00
Chuck Lever	ce6cda1845	NFS: Add a super_block backpointer to the nfs_server struct NFS_SB() returns the pointer to an nfs_server struct, given a pointer to a super_block. But we have no way to go back the other way. Add a super_block backpointer field so that, given an nfs_server struct, it is easy to get to the filesystem's root dentry. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:24:26 -04:00
Chuck Lever	b03d735b4c	NFS: Add method to retrieve fs_locations during migration recovery The nfs4_proc_fs_locations() function is invoked during referral processing to perform a GETATTR(fs_locations) on an object's parent directory in order to discover the target of the referral. It performs a LOOKUP in the compound, so the client needs to know the parent's file handle a priori. Unfortunately this function is not adequate for handling migration recovery. We need to probe fs_locations information on an FSID, but there's no parent directory available for many operations that can return NFS4ERR_MOVED. Another subtlety: recovering from NFS4ERR_LEASE_MOVED is a process of walking over a list of known FSIDs that reside on the server, and probing whether they have migrated. Once the server has detected that the client has probed all migrated file systems, it stops returning NFS4ERR_LEASE_MOVED. A minor version zero server needs to know what client ID is requesting fs_locations information so it can clear the flag that forces it to continue returning NFS4ERR_LEASE_MOVED. This flag is set per client ID and per FSID. However, the client ID is not an argument of either the PUTFH or GETATTR operations. Later minor versions have client ID information embedded in the compound's SEQUENCE operation. Therefore, by convention, minor version zero clients send a RENEW operation in the same compound as the GETATTR(fs_locations), since RENEW's one argument is a clientid4. This allows a minor version zero server to identify correctly the client that is probing for a migration. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:24:00 -04:00
Chuck Lever	9e6ee76dfb	NFS: Export _nfs_display_fhandle() Allow code in nfsv4.ko to use _nfs_display_fhandle(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:23:35 -04:00
Chuck Lever	ec011fe847	NFS: Introduce a vector of migration recovery ops The differences between minor version 0 and minor version 1 migration will be abstracted by the addition of a set of migration recovery ops. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:23:17 -04:00
Chuck Lever	800c06a5bf	NFS: Add functions to swap transports during migration recovery Introduce functions that can walk through an array of returned fs_locations information and connect a transport to one of the destination servers listed therein. Note that NFS minor version 1 introduces "fs_locations_info" which extends the locations array sorting criteria available to clients. This is not supported yet. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:23:07 -04:00
Chuck Lever	32e62b7c3e	NFS: Add nfs4_update_server New function nfs4_update_server() moves an nfs_server to a different nfs_client. This is done as part of migration recovery. Though it may be appealing to think of them as the same thing, migration recovery is not the same as following a referral. For a referral, the client has not descended into the file system yet: it has no nfs_server, no super block, no inodes or open state. It is enough to simply instantiate the nfs_server and super block, and perform a referral mount. For a migration, however, we have all of those things already, and they have to be moved to a different nfs_client. No local namespace changes are needed here. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:22:29 -04:00
Weston Andros Adamson	d2bfda2e7a	NFSv4: don't reprocess cached open CLAIM_PREVIOUS Cached opens have already been handled by _nfs4_opendata_reclaim_to_nfs4_state and can safely skip being reprocessed, but must still call update_open_stateid to make sure that all active fmodes are recovered. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Cc: stable@vger.kernel.org # 3.7.x: f494a6071d3: NFSv4: fix NULL dereference Cc: stable@vger.kernel.org # 3.7.x: a43ec98b72a: NFSv4: don't fail on missin Cc: stable@vger.kernel.org # 3.7.x Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 15:10:56 -04:00
Trond Myklebust	d49f042aee	NFSv4: Fix state reference counting in _nfs4_opendata_reclaim_to_nfs4_state Currently, if the call to nfs_refresh_inode fails, then we end up leaking a reference count, due to the call to nfs4_get_open_state. While we're at it, replace nfs4_get_open_state with a simple call to atomic_inc(); there is no need to do a full lookup of the struct nfs_state since it is passed as an argument in the struct nfs4_opendata, and is already assigned to the variable 'state'. Cc: stable@vger.kernel.org # 3.7.x: a43ec98b72a: NFSv4: don't fail on missing Cc: stable@vger.kernel.org # 3.7.x Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 14:57:12 -04:00
Weston Andros Adamson	a43ec98b72	NFSv4: don't fail on missing fattr in open recover This is an unneeded check that could cause the client to fail to recover opens. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 14:54:03 -04:00
Weston Andros Adamson	f494a6071d	NFSv4: fix NULL dereference in open recover _nfs4_opendata_reclaim_to_nfs4_state doesn't expect to see a cached open CLAIM_PREVIOUS, but this can happen. An example is when there are RDWR openers and RDONLY openers on a delegation stateid. The recovery path will first try an open CLAIM_PREVIOUS for the RDWR openers, this marks the delegation as not needing RECLAIM anymore, so the open CLAIM_PREVIOUS for the RDONLY openers will not actually send an rpc. The NULL dereference is due to _nfs4_opendata_reclaim_to_nfs4_state returning PTR_ERR(rpc_status) when !rpc_done. When the open is cached, rpc_done == 0 and rpc_status == 0, thus _nfs4_opendata_reclaim_to_nfs4_state returns NULL - this is unexpected by callers of nfs4_opendata_to_nfs4_state(). This can be reproduced easily by opening the same file two times on an NFSv4.0 mount with delegations enabled, once as RDWR and once as RDONLY then sleeping for a long time. While the files are held open, kick off state recovery and this NULL dereference will be hit every time. An example OOPS: [ 65.003602] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 [ 65.005312] IP: [<ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4] [ 65.006820] PGD 7b0ea067 PUD 791ff067 PMD 0 [ 65.008075] Oops: 0000 [#1] SMP [ 65.008802] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache snd_ens1371 gameport nfsd snd_rawmidi snd_ac97_codec ac97_bus btusb snd_seq snd_seq_device snd_pcm ppdev bluetooth auth_rpcgss coretemp snd_page_alloc crc32_pclmul crc32c_intel ghash_clmulni_intel microcode rfkill nfs_acl vmw_balloon serio_raw snd_timer lockd parport_pc e1000 snd soundcore parport i2c_piix4 shpchp vmw_vmci sunrpc ata_generic mperf pata_acpi mptspi vmwgfx ttm scsi_transport_spi drm mptscsih mptbase i2c_core [ 65.018684] CPU: 0 PID: 473 Comm: 192.168.10.85-m Not tainted 3.11.2-201.fc19.x86_64 #1 [ 65.020113] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 65.022012] task: ffff88003707e320 ti: ffff88007b906000 task.ti: ffff88007b906000 [ 65.023414] RIP: 0010:[<ffffffffa037d6ee>] [<ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4] [ 65.025079] RSP: 0018:ffff88007b907d10 EFLAGS: 00010246 [ 65.026042] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 65.027321] RDX: 0000000000000050 RSI: 0000000000000001 RDI: 0000000000000000 [ 65.028691] RBP: ffff88007b907d38 R08: 0000000000016f60 R09: 0000000000000000 [ 65.029990] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 [ 65.031295] R13: 0000000000000050 R14: 0000000000000000 R15: 0000000000000001 [ 65.032527] FS: 0000000000000000(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000 [ 65.033981] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 65.035177] CR2: 0000000000000030 CR3: 000000007b27f000 CR4: 00000000000407f0 [ 65.036568] Stack: [ 65.037011] 0000000000000000 0000000000000001 ffff88007b907d90 ffff88007a880220 [ 65.038472] ffff88007b768de8 ffff88007b907d48 ffffffffa037e4a5 ffff88007b907d80 [ 65.039935] ffffffffa036a6c8 ffff880037020e40 ffff88007a880000 ffff880037020e40 [ 65.041468] Call Trace: [ 65.042050] [<ffffffffa037e4a5>] nfs4_close_state+0x15/0x20 [nfsv4] [ 65.043209] [<ffffffffa036a6c8>] nfs4_open_recover_helper+0x148/0x1f0 [nfsv4] [ 65.044529] [<ffffffffa036a886>] nfs4_open_recover+0x116/0x150 [nfsv4] [ 65.045730] [<ffffffffa036d98d>] nfs4_open_reclaim+0xad/0x150 [nfsv4] [ 65.046905] [<ffffffffa037d979>] nfs4_do_reclaim+0x149/0x5f0 [nfsv4] [ 65.048071] [<ffffffffa037e1dc>] nfs4_run_state_manager+0x3bc/0x670 [nfsv4] [ 65.049436] [<ffffffffa037de20>] ? nfs4_do_reclaim+0x5f0/0x5f0 [nfsv4] [ 65.050686] [<ffffffffa037de20>] ? nfs4_do_reclaim+0x5f0/0x5f0 [nfsv4] [ 65.051943] [<ffffffff81088640>] kthread+0xc0/0xd0 [ 65.052831] [<ffffffff81088580>] ? insert_kthread_work+0x40/0x40 [ 65.054697] [<ffffffff8165686c>] ret_from_fork+0x7c/0xb0 [ 65.056396] [<ffffffff81088580>] ? insert_kthread_work+0x40/0x40 [ 65.058208] Code: 5c 41 5d 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 89 d5 41 54 53 48 89 fb <4c> 8b 67 30 f0 41 ff 44 24 44 49 8d 7c 24 40 e8 0e 0a 2d e1 44 [ 65.065225] RIP [<ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4] [ 65.067175] RSP <ffff88007b907d10> [ 65.068570] CR2: 0000000000000030 [ 65.070098] ---[ end trace 0d1fe4f5c7dd6f8b ]--- Cc: <stable@vger.kernel.org> #3.7+ Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 14:53:32 -04:00
Trond Myklebust	83c78eb042	NFSv4.1: Don't change the security label as part of open reclaim. The current caching model calls for the security label to be set on first lookup and/or on any subsequent label changes. There is no need to do it as part of an open reclaim. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 14:50:38 -04:00
Jeff Layton	1966903f8e	nfs: fix handling of invalid mount options in nfs_remount nfs_parse_mount_options returns 0 on error, not -errno. Reported-by: Karel Zak <kzak@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 14:35:07 -04:00
Jeff Layton	57acc40d73	nfs: reject version and minorversion changes on remount attempts Reported-by: Eric Doutreleau <edoutreleau@genoscope.cns.fr> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 14:30:23 -04:00
Andy Adamson	3660cd4322	NFSv4 Remove zeroing state kern warnings As of commit `5d422301f9` we no longer zero the state. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-28 14:28:53 -04:00
Tim Gardner	944d6f1a5b	cifs: Remove redundant multiplex identifier check from check_smb_hdr() The only call site for check_smb_header() assigns 'mid' from the SMB packet, which is then checked again in check_smb_header(). This seems like redundant redundancy. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Tim Gardner <timg@tpi.com> Signed-off-by: Steve French <smfrench@gmail.com>	2013-10-28 09:31:36 -05:00
Steve French	34f626406c	Query file system attributes from server on SMB2, not just cifs, mounts Currently SMB2 and SMB3 mounts do not query the file system attributes from the server at mount time as is done for cifs. These can be useful for debugging. Signed-off-by: Steve French <smfrench@gmail.com>	2013-10-28 09:22:55 -05:00
Steve French	64a5cfa6db	Allow setting per-file compression via SMB2/3 Allow cifs/smb2/smb3 to return whether or not a file is compressed via lsattr, and allow SMB2/SMB3 to set the per-file compression flag ("chattr +c filename" on an smb3 mount). Windows users often set the compressed flag (it can be done from the desktop and file manager). David Disseldorp has patches to Samba server to support this (at least on btrfs) which are complementary to this. Signed-off-by: Steve French <smfrench@gmail.com>	2013-10-28 09:22:31 -05:00
Steve French	7ff8d45c9d	Fix corrupt SMB2 ioctl requests We were off by one calculating the length of ioctls in some cases because the protocol specification for SMB2 ioctl includes a minimum one byte payload but not all SMB2 ioctl requests actually have a data buffer to send. We were also not zeroing out the return buffer (in case of error this is helpful). Signed-off-by: Steve French <smfrench@gmail.com>	2013-10-28 09:21:36 -05:00
Jaegeuk Kim	2ed2d5b33c	f2fs: fix a deadlock during init_acl procedure The deadlock is found through the following scenario. sys_mkdir() -> f2fs_add_link() -> __f2fs_add_link() -> init_inode_metadata() : lock_page(inode); -> f2fs_init_acl() -> f2fs_set_acl() -> f2fs_setxattr(..., NULL) : This NULL page incurs a deadlock at update_inode_page(). So, likewise f2fs_init_security(), this patch adds a parameter to transfer the locked inode page to f2fs_setxattr(). Found by Linux File System Verification project (linuxtesting.org). Reported-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-28 13:39:09 +09:00
Jaegeuk Kim	b8b60e1a65	f2fs: clean up acl flow for better readability This patch cleans up a couple of acl codes. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-28 13:38:21 +09:00
Changman Lee	4625d6aac2	f2fs: remove unnecessary segment bitmap updates Only one dirty type is set in __locate_dirty_segment and we can know dirty type of segment. So we don't need to check other dirty types. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-28 13:38:16 +09:00
Huang Shijie	e104f1e9da	jffs2: do not support the MLC nand We should not support the MLC nand for jffs2. So if the nand type is MLC, we quit immediately. Signed-off-by: Huang Shijie <b32955@freescale.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>	2013-10-27 16:27:07 -07:00
Mats Kärrman	58a4e23703	UBIFS: correct data corruption range With power-cut emulation, it is possible that sometimes no data at all is corrupted and that confusing messages are printed due to errors in the computation of data corruption range. [1] The start of the range should be [0..len-1], not [0..len]. [2] The end of the range should always be at least 1 greater than the start. Signed-off-by: Mats Karrman <mats.karrman@tritech.se> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2013-10-26 11:33:38 +01:00
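Note on the two range rules in the UBIFS commit above: a minimal sketch of picking a corruption range that satisfies both, not the UBIFS code itself. The names `struct range` and `corrupt_range` are hypothetical, the two random inputs are passed in by the caller so the result is reproducible, and treating the end as exclusive (at most `len`) is an assumption made here.

```c
/* Hypothetical half-open range [from, to) inside a buffer. */
struct range {
	unsigned int from;
	unsigned int to;
};

/* Pick a range inside a buffer of len bytes (len must be > 0).
 * r1 and r2 stand in for two random numbers. */
static struct range corrupt_range(unsigned int len, unsigned int r1,
				  unsigned int r2)
{
	struct range r;

	/* Rule [1]: the start lies in 0..len-1, never at len. */
	r.from = r1 % len;

	/* Rule [2]: the end is at least from + 1, and at most len. */
	r.to = r.from + 1 + r2 % (len - r.from);

	return r;
}
```

Because the end is always at least one past the start, the range is never empty.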
Wei Yongjun	7203db97b7	UBIFS: fix return code Fix to return -ENOMEM in the kmalloc() and d_make_root() error handling case instead of 0, as done elsewhere in those functions. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2013-10-26 11:11:59 +01:00
Linus Torvalds	f55ac56d5e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes (try two) from Al Viro: "nfsd performance regression fix + seq_file lseek(2) fix" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: seq_file: always update file->f_pos in seq_lseek() nfsd regression since delayed fput()	2013-10-25 18:16:47 +01:00
Gu Zheng	05e16745c0	seq_file: always update file->f_pos in seq_lseek() This issue was first pointed out by Jiaxing Wang several months ago, but no further comments: https://lkml.org/lkml/2013/6/29/41 As we know pread() does not change f_pos, so after pread(), file->f_pos and m->read_pos become different. And seq_lseek() does not update file->f_pos if offset equals m->read_pos, so after pread() and seq_lseek() (lseek to m->read_pos), a subsequent read may read from a wrong position. The following program reproduces the problem: char str1[32] = { 0 }; char str2[32] = { 0 }; int poffset = 10; int count = 20; /* open any seq file */ int fd = open("/proc/modules", O_RDONLY); pread(fd, str1, count, poffset); printf("pread:%s\n", str1); /* seek to where m->read_pos is */ lseek(fd, poffset+count, SEEK_SET); /* supposed to read from poffset+count, but this read from position 0 */ read(fd, str2, count); printf("read:%s\n", str2); output: pread: ck_netbios_ns 12665 read: nf_conntrack_netbios /proc/modules: nf_conntrack_netbios_ns 12665 0 - Live 0xffffffffa038b000 nf_conntrack_broadcast 12589 1 nf_conntrack_netbios_ns, Live 0xffffffffa0386000 So we always update file->f_pos to offset in seq_lseek() to fix this issue. Signed-off-by: Jiaxing Wang <hello.wjx@gmail.com> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-25 10:46:40 -04:00
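Note on the premise of the seq_file commit above ("pread() does not change f_pos"): a minimal user-space sketch of that POSIX behaviour. It assumes a regular temporary file rather than a seq_file such as /proc/modules, so it shows only why the two positions drift apart, not the kernel bug itself. The function name `offset_after_pread` is hypothetical.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* for pread() under strict -std= modes */
#endif
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write 16 bytes to a temporary file, rewind, pread() 8 bytes from
 * offset 4, and report the file offset seen afterwards.
 * Returns -1 if any step fails. */
static off_t offset_after_pread(void)
{
	char buf[8];
	off_t pos = -1;
	FILE *tmp = tmpfile();
	int fd;

	if (!tmp)
		return -1;
	fd = fileno(tmp);

	if (write(fd, "0123456789abcdef", 16) == 16 &&
	    lseek(fd, 0, SEEK_SET) == 0 &&
	    pread(fd, buf, sizeof(buf), 4) == (ssize_t)sizeof(buf))
		pos = lseek(fd, 0, SEEK_CUR);	/* query, do not move */

	fclose(tmp);
	return pos;
}
```

The offset is still 0 after the pread(), even though bytes 4..11 were read. In a seq_file the internal m->read_pos does advance, which is how it can disagree with file->f_pos.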
Jaegeuk Kim	e943a10d94	f2fs: add tracepoint for vm_page_mkwrite This patch adds a tracepoint for f2fs_vm_page_mkwrite. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-25 16:54:40 +09:00
Jaegeuk Kim	26c6b88799	f2fs: add tracepoint for set_page_dirty This patch adds a tracepoint for set_page_dirty. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-25 16:54:40 +09:00
Chao Yu	e8d61a7488	f2fs: remove redundant set_page_dirty from write_compacted_summaries Previously, set_page_dirty was called every time after writing one summary info into the compacted summary page. To avoid redundant set_page_dirty, we only call set_page_dirty before releasing the page. Signed-off-by: Yu Chao <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-25 16:54:39 +09:00
Jaegeuk Kim	ea91e9b043	f2fs: add reclaiming control by sysfs This patch adds a control method in sysfs to reclaim prefree segments. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-25 16:54:39 +09:00
Jaegeuk Kim	4660f9c0fe	f2fs: introduce f2fs_balance_fs_bg for some background jobs This patch merges some background jobs into this new function. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-25 16:54:38 +09:00
Jaegeuk Kim	81eb8d6e28	f2fs: reclaim prefree segments periodically Previously, f2fs postpones reclaiming prefree segments into free segments as much as possible. However, if user writes and deletes a bunch of data without any sync or fsync calls, some flash storages can suffer from garbage collections. So, this patch adds the reclaiming codes to f2fs_write_node_pages and background GC thread. If there are a lot of prefree segments, let's do checkpoint so that f2fs submits discard commands for the prefree regions to the flash storage. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-25 16:54:37 +09:00
Haicheng Li	aabe51364f	f2fs: use bool for booleans Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-25 16:54:37 +09:00
Jaegeuk Kim	dcdfff6527	f2fs: clean up several status-related operations This patch cleans up improper definitions that update some status information. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-25 16:54:08 +09:00
Linus Torvalds	88829dfe4b	Merge tag 'ecryptfs-3.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull ecryptfs fixes from Tyler Hicks: "Two important fixes - Fix long standing memory leak in the (rarely used) public key support - Fix large file corruption on 32 bit architectures" * tag 'ecryptfs-3.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: eCryptfs: fix 32 bit corruption issue ecryptfs: Fix memory leakage in keystore.c	2013-10-25 07:32:01 +01:00
Ming Lei	b9c0622516	sysfs: fix sysfs_write_file for bin file Before the patch "sysfs: prepare path write for unified regular / bin file handling", writing could still continue when the size of a bin file was zero, but that patch changed the behaviour. Worse, the firmware loader is broken by that patch, and user space applications can no longer write to the firmware bin file, because neither the firmware loader nor drivers can know in advance how large the firmware file is, so they have to set its initial size to zero. This patch fixes the problem and keeps the behaviour of writing to bin files as before. Reported-by: Lothar Waßmann <LW@karo-electronics.de> Tested-by: Lothar Waßmann <LW@karo-electronics.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-25 05:46:27 +01:00
Al Viro	dd3e2c55a4	fuse: rcu-delay freeing fuse_conn makes ->permission() and ->d_revalidate() safety in RCU mode independent from vfsmount_lock. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:45:13 -04:00
Al Viro	1dcddd4abd	ncpfs: rcu-delay unload_nls() and freeing ncp_server makes ->d_hash() and ->d_compare() safety in RCU mode independent from vfsmount_lock. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:43:28 -04:00
Al Viro	cac45b062c	fat: rcu-delay unloading nls and freeing sbi makes ->d_hash() and ->d_compare() safety in RCU mode independent from vfsmount_lock. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:43:28 -04:00
Al Viro	2e32cf5ef2	cifs: rcu-delay unload_nls() and freeing sbi makes ->d_hash(), ->d_compare() and ->permission() safety in RCU mode independent from vfsmount_lock. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:43:27 -04:00
Al Viro	baa40671d3	autofs4: make freeing sbi rcu-delayed makes ->d_managed() safety in RCU mode independent from vfsmount_lock Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:43:27 -04:00
Al Viro	2d1d9b5b5c	adfs: delayed freeing of sbi makes ->d_hash() and ->d_compare() safety in RCU mode independent from vfsmount_lock. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:43:27 -04:00
Al Viro	30687e0a47	hpfs: make freeing sbi and codetables rcu-delayed makes ->d_hash() and ->d_compare() safety in RCU mode independent from vfsmount_lock Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:43:26 -04:00
Al Viro	e2fec7c355	make freeing super_block rcu-delayed Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:43:26 -04:00
Miklos Szeredi	b70a80e7a1	vfs: introduce d_instantiate_no_diralias() ...which just returns -EBUSY if a directory alias would be created. This is to be used by fuse mkdir to make sure that a buggy or malicious userspace filesystem doesn't do anything nasty. Previously fuse used a private mutex for this purpose, which can now go away. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-10-24 23:41:37 -04:00
Al Viro	94e92a6e77	move taking vfsmount_lock down into prepend_path() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:35:01 -04:00
Al Viro	474279dc0f	split __lookup_mnt() in two functions Instead of passing the direction as argument (and checking it on every step through the hash chain), just have separate __lookup_mnt() and __lookup_mnt_last(). And use the standard iterators... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:35:00 -04:00
Al Viro	7eb5e88269	uninline destroy_super(), consolidate alloc_super() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:35:00 -04:00
Al Viro	966c1f75f8	isofs: don't pass dentry to isofs_hash{i,}_common() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:59 -04:00
Al Viro	719ea2fbb5	new helpers: lock_mount_hash/unlock_mount_hash aka br_write_{lock,unlock} of vfsmount_lock. Inlines in fs/mount.h, vfsmount_lock extern moved over there as well. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:59 -04:00
Al Viro	aab407fc5c	don't bother with vfsmount_lock in mounts_poll() wake_up_interruptible/poll_wait provide sufficient barriers; just use ACCESS_ONCE() to fetch ns->event and that's it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:59 -04:00
Al Viro	aba809cf09	namespace.c: get rid of mnt_ghosts Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:58 -04:00
Al Viro	9559f68915	fold dup_mnt_ns() into its only surviving caller should've been done 6 years ago... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:58 -04:00
Al Viro	f6b742d869	mnt_set_expiry() doesn't need vfsmount_lock ->mnt_expire is protected by namespace_sem Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:57 -04:00
Al Viro	22a7919299	finish_automount() doesn't need vfsmount_lock for removal from expiry list Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:57 -04:00
Al Viro	085e83ff0c	fs/namespace.c: bury long-dead define MNT_WRITER_UNDERFLOW_LIMIT has been missed 4 years ago when it became unused. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:57 -04:00
Al Viro	649a795aff	fold mntfree() into mntput_no_expire() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:56 -04:00
Al Viro	6339dab869	do_remount(): pull touch_mnt_namespace() up ... and don't bother with dropping and regaining vfsmount_lock Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:56 -04:00
Al Viro	aa7a574d0c	dup_mnt_ns(): get rid of pointless grabbing of vfsmount_lock mnt_list is protected by namespace_sem, not vfsmount_lock Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:55 -04:00
Al Viro	44bb4385ce	fs_is_visible only needs namespace_sem held shared Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:55 -04:00
Al Viro	59aa0da8e2	initialize namespace_sem statically Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:54 -04:00
Al Viro	72c2d53192	file->f_op is never NULL... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:54 -04:00
Al Viro	e84f9e57b9	consolidate the reassignments of ->f_op in ->open() instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:53 -04:00
Al Viro	7b00ed6fe6	put_mnt_ns(): use drop_collected_mounts() ... rather than open-coding it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:52 -04:00
Al Viro	84eb3532b5	ncpfs: switch to %p[dD] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:52 -04:00
Al Viro	4cb2a01d8c	ubifs: switch to %pd Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:51 -04:00
Al Viro	a6a9f18f0a	nfsd: switch to %p[dD] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:51 -04:00
Al Viro	6de1472f1a	nfs: use %p[dD] instead of open-coded (and often racy) equivalents Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:50 -04:00
Al Viro	48bc06e74b	befs: split symlink iops in two - for short and long symlinks resp. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:50 -04:00
Al Viro	87dc800be2	new helper: kfree_put_link() duplicated to hell and back... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:49 -04:00
Al Viro	12f3887222	libfs: get exports to definitions of objects being exported... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:49 -04:00
Al Viro	cbe9c08524	ecryptfs: ->lower_path.dentry is never NULL ... on anything found via ->d_fsdata Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:48 -04:00
Al Viro	92dd123033	ecryptfs: get rid of ecryptfs_set_dentry_lower{,_mnt} Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:48 -04:00
Al Viro	2edbfbf1c1	ecryptfs: don't leave RCU pathwalk immediately If the underlying dentry doesn't have ->d_revalidate(), there's no need to force dropping out of RCU mode. All we need for that is to make freeing ecryptfs_dentry_info RCU-delayed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:48 -04:00
Al Viro	3a93e17cf6	ecryptfs: check DCACHE_OP_REVALIDATE instead of ->d_op Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:47 -04:00
Al Viro	ceaec15d49	9p: make v9fs_cache_inode_{get,put,set}_cookie empty inlines for !9P_CACHEFS Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-24 23:34:47 -04:00
Colin Ian King	43b7c6c6a4	eCryptfs: fix 32 bit corruption issue Shifting page->index on 32 bit systems was overflowing, causing data corruption of > 4GB files. Fix this by casting it first. https://launchpad.net/bugs/1243636 Signed-off-by: Colin Ian King <colin.king@canonical.com> Reported-by: Lars Duesing <lars.duesing@camelotsweb.de> Cc: stable@vger.kernel.org # v3.11+ Signed-off-by: Tyler Hicks <tyhicks@canonical.com>	2013-10-24 12:36:30 -07:00
Dave Chinner	c963c6193a	xfs: split xfs_rtalloc.c for userspace sanity xfs_rtalloc.c is partially shared with userspace. Split the file up into two parts - one that is kernel private and the other which is wholly shared with userspace. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-23 17:16:32 -05:00
Dave Chinner	a4fbe6ab1e	xfs: decouple inode and bmap btree header files Currently the xfs_inode.h header has a dependency on the definition of the BMAP btree records as the inode fork includes an array of xfs_bmbt_rec_host_t objects in its definition. Move all the btree format definitions from xfs_btree.h, xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to xfs_format.h to continue the process of centralising the on-disk format definitions. With this done, the xfs inode definitions are no longer dependent on btree header files. This enables a massive culling of unnecessary includes, with close to 200 #include directives removed from the XFS kernel code base. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-23 16:28:49 -05:00
Dave Chinner	239880ef64	xfs: decouple log and transaction headers xfs_trans.h has a dependency on xfs_log.h for a couple of structures. Most code that does transactions doesn't need to know anything about the log, but this dependency means that they have to include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header files and clean up the includes to be in dependency order. In doing this, remove the direct include of xfs_trans_reserve.h from xfs_trans.h so that we remove the dependency between xfs_trans.h and xfs_mount.h. Hence the xfs_trans.h include can be moved to indicate the actual dependencies other header files have on it. Note that these are kernel only header files, so this does not translate to any userspace changes at all. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-23 16:17:44 -05:00
Dave Chinner	d420e5c810	xfs: remove unused transaction callback variables We don't do callbacks at transaction commit time, nor do we have any infrastructure to set up or run such callbacks, so remove the variables and typedefs for these operations. If we ever need to add callbacks, we can reintroduce the variables at that time. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-23 14:30:51 -05:00
Dave Chinner	9aede1d81b	xfs: split dquot buffer operations out Parts of userspace want to be able to read and modify dquot buffers (e.g. xfs_db) so we need to split out the reading and writing of these buffers so it is easy to share code with libxfs in userspace. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-23 14:28:35 -05:00
Dave Chinner	5706278758	xfs: unify directory/attribute format definitions The on-disk format definitions for the directory and attribute structures are spread across 3 header files right now, only one of which is dedicated to defining on-disk structures and their manipulation (xfs_dir2_format.h). Pull all the format definitions into a single header file - xfs_da_format.h - and switch all the code over to point at that. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-23 14:21:40 -05:00
Dave Chinner	70a9883c5f	xfs: create a shared header file for format-related information All of the buffer operations structures need to be exported for xfs_db, so move them all to a common location rather than spreading them all over the place. They are verifying the on-disk format, so while xfs_format.h might be a good place, it is not part of the on disk format. Hence we need to create a new header file in which we centralise these related definitions. Start by moving the buffer operations structures, and then also move all the other definitions that have crept into xfs_log_format.h and xfs_format.h as there was no other shared header file to put them in. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-23 14:11:30 -05:00
wang.bo116@zte.com.cn	e71d1a59e7	UBIFS: remove unnecessary code in ubifs_garbage_collect In ubifs_garbage_collect, the local variable "space_before" is calculated twice. In fact, there is no need to calculate this variable at the beginning of the loop; calculating it before calling "ubifs_garbage_collect_leb" is enough. This patch just removes the unnecessary calculation code. Signed-off-by: wang bo <wang.bo116@zte.com.cn> Acked-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2013-10-22 13:34:27 +01:00
Gu Zheng	7bd59381c8	f2fs: introduce f2fs_kmem_cache_alloc to hide the unfailed, kmem cache allocation Introduce the unfailed version of kmem_cache_alloc named f2fs_kmem_cache_alloc to hide the retry routine and make the code a bit cleaner. v2: Fix the wrong use of 'retry' tag pointed out by Gao feng. Use more neat code to remove redundant tag suggested by Haicheng Li. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-22 20:16:02 +09:00
Randy Dunlap	69c88dc7d9	vfs: fix new kernel-doc warnings Move kernel-doc notation to immediately before its function to eliminate kernel-doc warnings introduced by commit `db14fc3abc` ("vfs: add d_walk()") Warning(fs/dcache.c:1343): No description found for parameter 'data' Warning(fs/dcache.c:1343): No description found for parameter 'dentry' Warning(fs/dcache.c:1343): Excess function parameter 'parent' description in 'check_mount' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-10-22 12:02:40 +01:00
Randy Dunlap	606d6fe3ff	fs/namei.c: fix new kernel-doc warning Add @path parameter to fix kernel-doc warning. Also fix a spello/typo. Warning(fs/namei.c:2304): No description found for parameter 'path' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-10-22 12:02:40 +01:00
Haicheng Li	435f2a1b58	f2fs: no need to check other dirty_segmap when the seg has been found Because one dirty seg can only be mapped to one dirty_type. Otherwise, it's a bug. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> [Jaegeuk Kim: modify a comment related to this patch] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-22 19:57:31 +09:00
Haicheng Li	cffbfa6648	f2fs: use true and false for boolean value Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-22 19:49:39 +09:00
Linus Torvalds	d24fec3991	Merge tag 'jfs-3.12' of git://github.com/kleikamp/linux-shaggy Pull jfs bugfix from David Kleikamp: "Just a patch to fix an oops in an error path" * tag 'jfs-3.12' of git://github.com/kleikamp/linux-shaggy: jfs: fix error path in ialloc	2013-10-22 09:01:11 +01:00
Christoph Hellwig	865e9446b4	xfs: fold xfs_change_file_space into xfs_ioc_space Now that only one caller of xfs_change_file_space is left it can be merged into said caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-21 16:57:03 -05:00
Christoph Hellwig	83aee9e4c2	xfs: simplify the fallocate path Call xfs_alloc_file_space or xfs_free_file_space directly from xfs_file_fallocate instead of going through xfs_change_file_space. This simplifies the code by removing the unnecessary marshalling of the arguments into an xfs_flock64_t structure and allows removing checks that are already done in the VFS code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-21 16:56:21 -05:00
Christoph Hellwig	5f8aca8b43	xfs: always hold the iolock when calling xfs_change_file_space Currently fallocate always holds the iolock when calling into xfs_change_file_space, while the ioctl path lets some of the lower level functions take it, but leave it out in others. This patch makes sure the ioctl path also always holds the iolock and thus introduces consistent locking for the preallocation operations while simplifying the code and allowing to kill the now unused XFS_ATTR_NOLOCK flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-21 16:54:22 -05:00
Christoph Hellwig	001a3e7370	xfs: remove the unused XFS_ATTR_NONBLOCK flag Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-21 16:53:11 -05:00
Christoph Hellwig	76ca4c238c	xfs: always take the iolock around xfs_setattr_size There is no reason to conditionally take the iolock inside xfs_setattr_size when we can let the caller handle it unconditionally, which just increases the lock hold time for the case where it was previously taken internally by a few instructions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-21 16:51:33 -05:00
Al Viro	c7314d74fc	nfsd regression since delayed fput() Background: nfsd v[23] had throughput regression since delayed fput went in; every read or write ends up doing fput() and we get a pair of extra context switches out of that (plus quite a bit of work in queue_work itself, apparently). Use of schedule_delayed_work() gives it a chance to accumulate a bit before we do __fput() on all of them. I'm not too happy about that solution, but... on at least one real-world setup it reverts about 10% throughput loss we got from switch to delayed fput. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-20 08:44:39 -04:00
Greg Kroah-Hartman	a7204d72db	Merge 3.12-rc6 into driver-core-next We want these fixes here too. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-19 13:05:38 -07:00
Linus Torvalds	bdeeab62a6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fix from Chris Mason: "Sage hit a deadlock with ceph on btrfs, and Josef tracked it down to a regression in our initial rc1 pull. When doing nocow writes we were sometimes starting a transaction with locks held" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: release path before starting transaction in can_nocow_extent	2013-10-18 16:46:21 -07:00
Peter A. Felvegi	444996027e	udf: fix for pathetic mount times in case of invalid file system The UDF driver was not strict enough about checking the IDs in the VSDs when mounting, which resulted in reading through all the sectors of the block device in some unfortunate cases. Eg, trying to mount my uninitialized 200G SSD partition (all 0xFF bytes) took ~350 minutes to fail, because the code expected some of the valid IDs or a zero byte. During this, the mount couldn't be killed, sync from the cmdline blocked, and the machine froze into the shutdown. Valid filesystems (extX, btrfs, ntfs) were rejected by the mere accident of having a zero byte at just the right place in some of their sectors, close enough to the beginning not to generate excess I/O. The fix adds a hard limit on the VSD sector offset, adds the two missing VSD IDs, and stops scanning when encountering an invalid ID. Also replaced the magic number 32768 with a more meaningful #define, and suppressed the bogus message about failing to read the first sector if no UDF fs was detected. Signed-off-by: Peter A. Felvegi <petschy@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2013-10-18 22:39:07 +02:00
Josef Bacik	1bda19eb73	Btrfs: release path before starting transaction in can_nocow_extent We can't be holding tree locks while we try to start a transaction, we will deadlock. Thanks, Reported-by: Sage Weil <sage@inktank.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-10-18 12:43:40 -04:00
Linus Torvalds	04919afb85	Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6 Pull CIFS fixes from Steve French: "Five small cifs fixes (includes fixes for: unmount hang, 2 security related, symlink, large file writes)" * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6: cifs: ntstatus_to_dos_map[] is not terminated cifs: Allow LANMAN auth method for servers supporting unencapsulated authentication methods cifs: Fix inability to write files >2GB to SMB2/3 shares cifs: Avoid umount hangs with smb2 when server is unresponsive do not treat non-symlink reparse points as valid symlinks	2013-10-17 18:49:21 -07:00
Theodore Ts'o	efbed4dc58	ext4: add ratelimiting to ext4 messages In the case of a storage device that suddenly disappears, or in the case of significant file system corruption, this can result in a huge flood of messages being sent to the console. This can overflow the file system containing /var/log/messages, or if a serial console is configured, this can slow down the system so much that a hardware watchdog can end up triggering forcing a system reboot. Google-Bug-Id: `7258357` Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2013-10-17 21:11:01 -04:00
Jaegeuk Kim	87a9bd2656	f2fs: avoid to write during the recovery This patch enhances the recovery routine not to write any data/node/meta until its completion. If any writes are sent to the disk, it could contaminate the written history that will be used for further recovery. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-18 09:44:14 +09:00
Gu Zheng	e234088758	f2fs: avoid wait if IO end up when do_checkpoint for better performance Previously, do_checkpoint() called congestion_wait() to wait for the pages (previously submitted node/meta/data pages) to be written back. Because congestion_wait() waits for a fixed period (e.g. HZ / 50) and no additional wake-up mechanism exists if IO finishes before that period elapses, Yuan Zhong found a situation in which the pages have been written back but the checkpoint thread still waits for congestion_wait to exit. So here we store the checkpoint task in f2fs_sb when doing a checkpoint; it waits for IO to complete if there is IO in flight, and the end-IO path wakes up the checkpoint task when IO finishes. Thanks to Yuan Zhong's earlier work on this problem. Reported-by: Yuan Zhong <yuan.mark.zhong@samsung.com> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-18 09:44:14 +09:00
Gu Zheng	9076a75f8e	f2fs: introduce function read_raw_super_block() Introduce function read_raw_super_block() to hide reading raw super block and the retry routine if the first sb is invalid. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-18 09:44:13 +09:00
Jaegeuk Kim	b1838f8952	f2fs: fix the starvation problem on cp_rwsem This patch removes the logic previously introduced to address the starvation on cp_rwsem. One potential bug in that logic is that we should cover the wait.list with spin_lock, but the previous code broke this rule. Also, the current rwsem actually handles this starvation issue reasonably, so we did not need that logic in the first place. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-18 09:44:13 +09:00
Jaegeuk Kim	3d1e38073b	f2fs: fix to store and retrieve i_rdev correctly When storing i_rdev, we should check its file type. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-18 09:43:38 +09:00
Ming Lei	aeac589a74	ext4: fix performance regression in ext4_writepages Commit 4e7ea81db5 (ext4: restructure writeback path) introduces another performance regression on random write: - one more page may be added to ext4 extent in mpage_prepare_extent_to_map, and will be submitted for I/O so nr_to_write will become -1 before 'done' is set - the worse thing is that dirty pages may still be retrieved from page cache after nr_to_write becomes negative, so lots of small chunks can be submitted to block device when page writeback is catching up with write path, and performance is hurt. On one arm A15 board with sata 3.0 SSD (CPU: 1.5GHz dual core, RAM: 2GB, SATA controller: 3.0Gbps), this patch can improve below test's result from 157MB/sec to 174MB/sec (>10%): dd if=/dev/zero of=./z.img bs=8K count=512K The above test is actually prototype of block write in bonnie++ utility. This patch makes sure no more pages than nr_to_write can be added to extent for mapping, so that nr_to_write won't become negative. Cc: linux-ext4@vger.kernel.org Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2013-10-17 18:56:16 -04:00
Eric Sandeen	59e5a0e821	xfs: don't break from growfs ag update loop on error When xfs_growfs_data_private() is updating backup superblocks, it bails out on the first error encountered, whether reading or writing: * If we get an error writing out the alternate superblocks, * just issue a warning and continue. The real work is * already done and committed. This can cause a problem later during repair, because repair looks at all superblocks, and picks the most prevalent one as correct. If we bail out early in the backup superblock loop, we can end up with more "bad" matching superblocks than good, and a post-growfs repair may revert the filesystem to the old geometry. With the combination of superblock verifiers and old bugs, we're more likely to encounter read errors due to verification. And perhaps even worse, we don't even properly write any of the newly-added superblocks in the new AGs. Even with this change, growfs will still say: xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Structure needs cleaning data blocks changed from 319815680 to 335216640 which might be confusing to the user, but it at least communicates that something has gone wrong, and dmesg will probably highlight the need for an xfs_repair. And this is still best-effort; if verifiers fail on more than half the backup supers, they may still "win" - but that's probably best left to repair to more gracefully handle by doing its own strict verification as part of the backup super "voting." Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-17 13:31:42 -05:00
Eric Sandeen	31625f28ad	xfs: don't emit corruption noise on fs probes If we get EWRONGFS due to probing of non-xfs filesystems, there's no need to issue the scary corruption error and backtrace. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-17 13:31:25 -05:00
Eric Sandeen	08e96e1a3c	xfs: remove newlines from strings passed to __xfs_printk __xfs_printk adds its own "\n". Having it in the original string leads to unintentional blank lines from these messages. Most format strings have no newline, but a few do, leading to i.e.: [ 7347.119911] XFS (sdb2): Access to block zero in inode 132 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a05 [ 7347.119911] [ 7347.119919] XFS (sdb2): Access to block zero in inode 132 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a05 [ 7347.119919] Fix them all. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-17 13:30:29 -05:00
Dave Chinner	2c6e24ce1a	xfs: prevent deadlock trying to cover an active log Recent analysis of a deadlocked XFS filesystem from a kernel crash dump indicated that the filesystem was stuck waiting for log space. The short story of the hang on the RHEL6 kernel is this: - the tail of the log is pinned by an inode - the inode has been pushed by the xfsaild - the inode has been flushed to its backing buffer and is currently flush locked and hence waiting for backing buffer IO to complete and remove it from the AIL - the backing buffer is marked for write - it is on the delayed write queue - the inode buffer has been modified directly and logged recently due to unlinked inode list modification - the backing buffer is pinned in memory as it is in the active CIL context. - the xfsbufd won't start buffer writeback because it is pinned - xfssyncd won't force the log because it sees the log as needing to be covered and hence wants to issue a dummy transaction to move the log covering state machine along. Hence there is no trigger to force the CIL to the log and hence unpin the inode buffer and therefore complete the inode IO, remove it from the AIL and hence move the tail of the log along, allowing transactions to start again. Mainline kernels also have the same deadlock, though the signature is slightly different - the inode buffer never reaches the delayed write lists because xfs_buf_item_push() sees that it is pinned and hence never adds it to the delayed write list that the xfsaild flushes. There are two possible solutions here. The first is to simply force the log before trying to cover the log and so ensure that the CIL is emptied before we try to reserve space for the dummy transaction in the xfs_log_worker(). While this might work most of the time, it is still racy and is no guarantee that we don't get stuck in xfs_trans_reserve waiting for log space to come free. Hence it's not the best way to solve the problem. 
The second solution is to modify xfs_log_need_covered() to be aware of the CIL. We only should be attempting to cover the log if there is no current activity in the log - covering the log is the process of ensuring that the head and tail in the log on disk are identical (i.e. the log is clean and at idle). Hence, by definition, if there are items in the CIL then the log is not at idle and so we don't need to attempt to cover it. When we don't need to cover the log because it is active or idle, we issue a log force from xfs_log_worker() - if the log is idle, then this does nothing. However, if the log is active due to there being items in the CIL, it will force the items in the CIL to the log and unpin them. In the case of the above deadlock scenario, instead of xfs_log_worker() getting stuck in xfs_trans_reserve() attempting to cover the log, it will instead force the log, thereby unpinning the inode buffer, allowing IO to be issued and complete and hence removing the inode that was pinning the tail of the log from the AIL. At that point, everything will start moving along again. i.e. the xfs_log_worker turns back into a watchdog that can alleviate deadlocks based around pinned items that prevent the tail of the log from being moved... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-17 10:56:17 -05:00
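The decision described in the commit above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the XFS code: `struct log_state`, `need_covered()` and `worker_step()` are hypothetical names, and the two boolean fields are a deliberately simplified stand-in for the real log and CIL state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the log; not the real XFS structures. */
struct log_state {
    bool cil_has_items;  /* items are still sitting in the CIL */
    bool cover_pending;  /* covering state machine wants a dummy transaction */
};

enum worker_action { FORCE_LOG, COVER_LOG };

/* Covering only makes sense for an idle log. A non-empty CIL means the
 * log is active, so covering must not be attempted. */
static bool need_covered(const struct log_state *log)
{
    assert(log != NULL);
    if (log->cil_has_items)
        return false;
    return log->cover_pending;
}

/* When covering is not needed, force the log instead. For an idle log this
 * does nothing; for an active log it pushes CIL items out and unpins them. */
static enum worker_action worker_step(const struct log_state *log)
{
    return need_covered(log) ? COVER_LOG : FORCE_LOG;
}
```

In this sketch a log with items in the CIL always takes the `FORCE_LOG` path, even when the covering state machine has work pending, which corresponds to the watchdog behaviour the commit message describes.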
Linus Torvalds	056cdce0d3	Merge branch 'akpm' (fixes from Andrew Morton) Merge misc fixes from Andrew Morton. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits) mm: revert mremap pud_free anti-fix mm: fix BUG in __split_huge_page_pmd swap: fix set_blocksize race during swapon/swapoff procfs: call default get_unmapped_area on MMU-present architectures procfs: fix unintended truncation of returned mapped address writeback: fix negative bdi max pause percpu_refcount: export symbols fs: buffer: move allocation failure loop into the allocator mm: memcg: handle non-error OOM situations more gracefully tools/testing/selftests: fix uninitialized variable block/partitions/efi.c: treat size mismatch as a warning, not an error mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages mm/zswap: bugfix: memory leak when re-swapon mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pages mm: migration: do not lose soft dirty bit if page is in migration state gcov: MAINTAINERS: Add an entry for gcov mm/hugetlb.c: correct missing private flag clearing mm/vmscan.c: don't forget to free shrinker->nr_deferred ipc/sem.c: synchronize semop and semctl with IPC_RMID ipc: update locking scheme comments ...	2013-10-16 21:36:03 -07:00
HATAYAMA Daisuke	fad1a86e25	procfs: call default get_unmapped_area on MMU-present architectures Commit `c4fe244857` ("sparc: fix PCI device proc file mmap(2)") added proc_reg_get_unmapped_area in proc_reg_file_ops and proc_reg_file_ops_no_compat, by which now mmap always returns EIO if get_unmapped_area method is not defined for the target procfs file, which causes regression of mmap on /proc/vmcore. To address this issue, like get_unmapped_area(), call default current->mm->get_unmapped_area on MMU-present architectures if pde->proc_fops->get_unmapped_area, i.e. the one in actual file operation in the procfs file, is not defined. Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David S. Miller <davem@davemloft.net> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-10-16 21:35:53 -07:00
HATAYAMA Daisuke	2cbe3b0af8	procfs: fix unintended truncation of returned mapped address Currently, proc_reg_get_unmapped_area truncates the upper 32 bits of the mapped virtual address returned from the get_unmapped_area method in pde->proc_fops, because the variable rv is a signed integer. On x86_64 this is too small to hold a virtual address of type unsigned long, since a signed integer is 4 bytes there while unsigned long is 8 bytes. To fix this issue, use unsigned long instead. Fixes a regression added in commit `c4fe244857` ("sparc: fix PCI device proc file mmap(2)"). Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David S. Miller <davem@davemloft.net> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-10-16 21:35:53 -07:00
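The truncation described in the commit above can be reproduced in plain C. This is a minimal sketch, not the procfs code: the function names and the example address are hypothetical, and fixed-width types stand in for x86_64's 4-byte `int` and 8-byte `unsigned long` (an unsigned 32-bit type is used so the conversion is fully defined by the C standard).

```c
#include <assert.h>
#include <stdint.h>

/* Models storing a 64-bit address in a 4-byte variable:
 * only the low 32 bits survive. */
static uint64_t keep_in_32_bits(uint64_t addr)
{
    uint32_t rv = (uint32_t)addr;
    return rv;
}

/* Models the fix: the variable is as wide as the address. */
static uint64_t keep_in_64_bits(uint64_t addr)
{
    uint64_t rv = addr;
    return rv;
}

/* Example address chosen for illustration only. */
static void demo(void)
{
    const uint64_t addr = UINT64_C(0x00007f1234567000);

    assert(keep_in_32_bits(addr) == UINT64_C(0x34567000)); /* upper half lost */
    assert(keep_in_64_bits(addr) == addr);                 /* value preserved */
}
```

Any address whose upper 32 bits are nonzero is affected in the same way, which is why the narrow variable breaks mappings on 64-bit systems.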
Johannes Weiner	84235de394	fs: buffer: move allocation failure loop into the allocator Buffer allocation has a very crude indefinite loop around waking the flusher threads and performing global NOFS direct reclaim because it can not handle allocation failures. The most immediate problem with this is that the allocation may fail due to a memory cgroup limit, where flushers + direct reclaim might not make any progress towards resolving the situation at all. Because unlike the global case, a memory cgroup may not have any cache at all, only anonymous pages but no swap. This situation will lead to a reclaim livelock with insane IO from waking the flushers and thrashing unrelated filesystem cache in a tight loop. Use __GFP_NOFAIL allocations for buffers for now. This makes sure that any looping happens in the page allocator, which knows how to orchestrate kswapd, direct reclaim, and the flushers sensibly. It also allows memory cgroups to detect allocations that can't handle failure and will allow them to ultimately bypass the limit if reclaim can not make progress. Reported-by: azurIt <azurit@pobox.sk> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-10-16 21:35:53 -07:00
Cyrill Gorcunov	e9cdd6e771	mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pages If a page we are inspecting is in swap we may occasionally report it as having soft dirty bit (even if it is clean). The pte_soft_dirty helper should be called on present pte only. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-10-16 21:35:52 -07:00
Linus Torvalds	0056019da4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull tmpfile fix from Al Viro: "A fix for double iput() in ->tmpfile() on ext3 and ext4; I'd fucked it up, Miklos has caught it" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ext[34]: fix double put in tmpfile	2013-10-16 17:18:18 -07:00
Geyslan G. Bem	3edc8376c0	ecryptfs: Fix memory leakage in keystore.c In 'decrypt_pki_encrypted_session_key' function: Initializes 'payload' pointer and releases it on exit. Signed-off-by: Geyslan G. Bem <geyslan@gmail.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Cc: stable@vger.kernel.org # v2.6.28+	2013-10-16 15:18:01 -07:00
Bart Van Assche	a97f4a66d8	dlm: Avoid that dlm_release_lockspace() incorrectly returns -EBUSY When dlm_release_lockspace(ls, 1) is invoked on a busy system immediately after the last dlm_unlock() AST has finished it can occur that lkb_idr_is_local() is invoked for the unlocked LKB since removal from ls_lkbidr only occurs after the AST has returned. If that happens dlm_release_lockspace(ls, 1) will return -EBUSY instead of releasing the lockspace. Fix this race condition by changing lkb_idr_is_local() such that it only returns true for LKB's that have not yet been unlocked. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David Teigland <teigland@redhat.com>	2013-10-16 10:32:42 -05:00
Eric Sandeen	2046fd1873	ext3: Count journal as bsddf overhead in ext3_statfs ext4 counts journal space as bsddf overhead, but ext3 does not. For some reason when I patched ext4 I thought I should leave ext3 alone, but frankly it makes more sense to fix it, I think. Otherwise we get inconsistent behavior from ext3 under ext3.ko, and ext3 under ext4.ko, which is not at all desirable... This is testable by xfstests shared/289, though it will need modification because it currently special-cases ext3. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>	2013-10-16 14:29:17 +02:00
Jan Kara	7534e854b9	ext4: fixup kerndoc annotation of mpage_map_and_submit_extent() Document give_up_on_write argument of mpage_map_and_submit_extent(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2013-10-16 08:26:08 -04:00
Jan Kara	78371a45df	ext4: fix assertion in ext4_add_complete_io() It doesn't make sense to require io_end->handle when we are in nojournal mode. So update the assertion accordingly to avoid false warnings from ext4_add_complete_io(). Reported-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2013-10-16 08:25:11 -04:00
Miklos Szeredi	43ae9e3fc7	ext[34]: fix double put in tmpfile d_tmpfile() already swallowed the inode ref. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-15 12:14:06 -04:00
Steven Whitehouse	e66cf16109	GFS2: Use lockref for glocks Currently glocks have an atomic reference count and also a spinlock which covers various internal fields, such as the state. The intent of this patch is to replace the spinlock and the atomic reference count with a lockref structure. This contains a spinlock which we can continue to use as before, and a reference counter which is used in conjunction with the spinlock to replace the previous atomic counter. As a result of this there are some new rules for reference counting on glocks. We need to distinguish between reference count changes under gl_spin (which are now just increment or decrement of the new counter, provided the count cannot hit zero) and those which are outside of gl_spin, but which now take gl_spin internally. The conversion is relatively straightforward. There is probably some further clean up which can be done, but the priority at this stage is to make the change in as simple a manner as possible. A consequence of this change is that the reference count is being decoupled from the lru list processing. This should allow future adoption of the lru_list code with glocks in due course. The reason for using the "dead" state and not just relying on 0 being the "invalid state" is so that in due course 0 ref counts can be allowable. The intent is to eventually be able to remove the ref count changes which are currently hidden away in state_change(). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-10-15 15:18:08 +01:00
Tim Gardner	0c26606cbe	cifs: ntstatus_to_dos_map[] is not terminated Functions that walk the ntstatus_to_dos_map[] array could run off the end. For example, ntstatus_to_dos() loops while ntstatus_to_dos_map[].ntstatus is not 0. Granted, this is mostly theoretical, but could be used as a DOS attack if the error code in the SMB header is bogus. [Might consider adding to stable, as this patch is low risk - Steve] Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Steve French <smfrench@gmail.com>	2013-10-14 12:14:01 -05:00
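The hazard in the commit above comes from a loop that stops only at a zero entry. A minimal sketch of a correctly terminated table walk might look like this; `struct status_map`, `example_map`, `map_status()`, `FALLBACK_DOS_CODE` and the table contents are hypothetical illustrations, not the CIFS definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry; not the layout used by the CIFS code. */
struct status_map {
    uint32_t ntstatus;
    uint16_t dos_code;
};

#define FALLBACK_DOS_CODE 31u /* placeholder "general failure" code */

/* Illustrative entries. The final all-zero entry is the terminator that
 * the lookup loop relies on; without it the loop reads past the array. */
static const struct status_map example_map[] = {
    { 0xC0000022u, 5 },
    { 0xC0000034u, 2 },
    { 0, 0 },
};

/* Walk the table until the zero terminator, returning a fallback code
 * when the status is unknown. */
static uint16_t map_status(const struct status_map *map, uint32_t ntstatus)
{
    size_t i;

    assert(map != NULL);
    for (i = 0; map[i].ntstatus != 0; i++) {
        if (map[i].ntstatus == ntstatus)
            return map[i].dos_code;
    }
    return FALLBACK_DOS_CODE;
}
```

With the terminator in place, an unknown or bogus status simply falls through to the fallback value instead of scanning memory beyond the table.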
Benjamin Herrenschmidt	d723a92dd4	sysfs/bin: Fix size handling overflow for bin_attribute While looking at the code, I noticed that bin_attribute read() and write() ops copy the inode size into an int for further comparisons. Some bin_attributes can be fairly large. For example, pci creates some for BARs set to the BAR size and giant BARs are around the corner, so this is going to break something somewhere eventually. Let's use the right type. [adjust for seqfile conversions, only needed for bin_read() - gkh] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-14 10:07:19 -07:00
Tejun Heo	785a162d14	sysfs: make sysfs_file_ops() follow ignore_lockdep flag `375b611e60` ("sysfs: remove sysfs_buffer->ops") introduced sysfs_file_ops() which determines the associated file operation of a given sysfs_dirent. As file ops access should be protected by an active reference, the new function includes a lockdep assertion on the sysfs_dirent; unfortunately, I forgot to take attr->ignore_lockdep flag into account and the lockdep assertion trips spuriously for files which opt out from active reference lockdep checking. # cat /sys/devices/pci0000:00/0000:00:01.2/usb1/authorized ------------[ cut here ]------------ WARNING: CPU: 1 PID: 540 at /work/os/work/fs/sysfs/file.c:79 sysfs_file_ops+0x4e/0x60() Modules linked in: CPU: 1 PID: 540 Comm: cat Not tainted 3.11.0-work+ #3 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 0000000000000009 ffff880016205c08 ffffffff81ca0131 0000000000000000 ffff880016205c40 ffffffff81096d0d ffff8800166cb898 ffff8800166f6f60 ffffffff8125a220 ffff880011ab1ec0 ffff88000aff0c78 ffff880016205c50 Call Trace: [<ffffffff81ca0131>] dump_stack+0x4e/0x82 [<ffffffff81096d0d>] warn_slowpath_common+0x7d/0xa0 [<ffffffff81096dea>] warn_slowpath_null+0x1a/0x20 [<ffffffff8125994e>] sysfs_file_ops+0x4e/0x60 [<ffffffff8125a274>] sysfs_open_file+0x54/0x300 [<ffffffff811df612>] do_dentry_open.isra.17+0x182/0x280 [<ffffffff811df820>] finish_open+0x30/0x40 [<ffffffff811f0623>] do_last+0x503/0xd90 [<ffffffff811f0f6b>] path_openat+0xbb/0x6d0 [<ffffffff811f23ba>] do_filp_open+0x3a/0x90 [<ffffffff811e09a9>] do_sys_open+0x129/0x220 [<ffffffff811e0abe>] SyS_open+0x1e/0x20 [<ffffffff81caf3c2>] system_call_fastpath+0x16/0x1b ---[ end trace aa48096b111dafdb ]--- Rename fs/sysfs/dir.c::ignore_lockdep() to sysfs_ignore_lockdep() and move it to fs/sysfs/sysfs.h and make sysfs_file_ops() skip lockdep assertion if sysfs_ignore_lockdep() is true. 
Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-14 08:40:39 -07:00
Linus Torvalds	9d05746e7b	vfs: allow O_PATH file descriptors for fstatfs() Olga reported that file descriptors opened with O_PATH do not work with fstatfs(), found during further development of ksh93's thread support. There is no reason to not allow O_PATH file descriptors here (fstatfs is very much a path operation), so use "fdget_raw()". See commit `55815f7014` ("vfs: make O_PATH file descriptors usable for 'fstat()'") for a very similar issue reported for fstat() by the same team. Reported-and-tested-by: ольга крыжановская <olga.kryzhanovska@gmail.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org # O_PATH introduced in 3.0+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-10-12 13:12:31 -07:00
Linus Torvalds	be5090da4a	A bug fix and performance regression fix for ext4. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABCAAGBQJSWZeNAAoJENNvdpvBGATwx6kP/2mVlKlNBXVfGUmLVP3Xb68v 4JhBzlC3ra3TRqVkw6C6kx4fbdq0cW/mDkecmYg2s+aDnswG/94/+yRdU4kQkyne iqN22ZYA7CumZgJvR0Z2ptWksDRpv8H5twgdVbPtad6/2cKmjseUraPo7YZhjDCe O9eRCXyVII305soAddZzZUgWOWCSWpdTW5zBitKaGq5x/K//rY9UlPVSuAo+9KPZ vyBiKJ1R6fDbtyH7JhCdXydMPKzlAPmyqYBQGLyq2GsRsXDp/VljGci6QN0iuZ5k lZsxFg8q0P6/R4Pjr3DDtE0tUbPXEyMxuquh/m4b3pAXRoMMCynyLP2zy7Gc7ec0 ek2ty+sVG06JjseqigHSmS/a+PdZgDY5xEMKhaK4X38lxRPb7apNktolXxxEt6eU OPZsuvma1g+lbkkCdRO5FVwMllb7cuPhuZPGyxZvmP+ON59oT5QOVsDC+55WnHNs Ib11PCTN93Mwhrm1YPNWVV+gWG50eLZQYJam6H4mE4knaXnba6htEhYrdNczoFH4 lcHaJzCDJLnYVRRbKXKdLSSnyz1X9cYJBP9g5ks1iNy7/JreF7WoIAOWvZWCp432 7NC0IOmV4Q4itiCTcSh85rGlsXU8ZA7wK5HILhp9qZmNkw30OMvihNoWoTFiWTJR mVCkm+isBbqMP0nhV5km =ZwlW -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bugfixes from Ted Ts'o: "A bug fix and performance regression fix for ext4" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix memory leak in xattr ext4: fix performance regression in writeback of random writes	2013-10-12 12:55:15 -07:00
Linus Torvalds	d64dab903f	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "We've got more bug fixes in my for-linus branch: One of these fixes another corner of the compression oops from last time. Miao nailed down some problems with concurrent snapshot deletion and drive balancing. I kept out one of his patches for more testing, but these are all stable" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix oops caused by the space balance and dead roots Btrfs: insert orphan roots into fs radix tree Btrfs: limit delalloc pages outside of find_delalloc_range Btrfs: use right root when checking for hash collision	2013-10-12 12:54:24 -07:00
Dave Jones	6e4ea8e33b	ext4: fix memory leak in xattr If we take the 2nd retry path in ext4_expand_extra_isize_ea, we potentially return from the function without having freed these allocations. If we don't do the return, we over-write the previous allocation pointers, so we leak either way. Spotted with Coverity. [ Fixed by tytso to set is and bs to NULL after freeing these pointers, in case in the retry loop we later end up triggering an error causing a jump to cleanup, at which point we could have a double free bug. -- Ted ] Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Cc: stable@vger.kernel.org	2013-10-12 14:39:49 -04:00
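The free-then-NULL pattern mentioned in the commit above can be sketched in userspace C. This is an illustration only, not the ext4 function: `tracked_alloc()`, `tracked_free()`, `live_allocs` and `expand_with_retry()` are hypothetical names, and the retry condition is simulated by a counter.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs; /* allocations not yet freed, for leak checking */

static void *tracked_alloc(size_t n)
{
    void *p = malloc(n);

    if (p)
        live_allocs++;
    return p;
}

static void tracked_free(void *p)
{
    if (p) {
        free(p);
        live_allocs--;
    }
}

/* attempts_needed: number of passes before the simulated work succeeds. */
static int expand_with_retry(int attempts_needed)
{
    void *is = NULL, *bs = NULL;
    int tries = 0;
    int err = 0;

    assert(attempts_needed >= 1);
retry:
    is = tracked_alloc(16);
    bs = tracked_alloc(16);
    if (!is || !bs) {
        err = -1;
        goto cleanup;
    }
    if (++tries < attempts_needed) {
        /* Free before looping so the pointers are not overwritten (leak),
         * and reset them so a later jump to cleanup cannot double free. */
        tracked_free(is);
        is = NULL;
        tracked_free(bs);
        bs = NULL;
        goto retry;
    }
cleanup:
    tracked_free(is);
    tracked_free(bs);
    return err;
}
```

After any number of retries, `live_allocs` should return to zero, which is the property the fix restores.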
Miao Xie	c00869f1ae	Btrfs: fix oops caused by the space balance and dead roots When doing space balance and subvolume destroy at the same time, we met the following oops: kernel BUG at fs/btrfs/relocation.c:2247! RIP: 0010: [<ffffffffa04cec16>] prepare_to_merge+0x154/0x1f0 [btrfs] Call Trace: [<ffffffffa04b5ab7>] relocate_block_group+0x466/0x4e6 [btrfs] [<ffffffffa04b5c7a>] btrfs_relocate_block_group+0x143/0x275 [btrfs] [<ffffffffa0495c56>] btrfs_relocate_chunk.isra.27+0x5c/0x5a2 [btrfs] [<ffffffffa0459871>] ? btrfs_item_key_to_cpu+0x15/0x31 [btrfs] [<ffffffffa048b46a>] ? btrfs_get_token_64+0x7e/0xcd [btrfs] [<ffffffffa04a3467>] ? btrfs_tree_read_unlock_blocking+0xb2/0xb7 [btrfs] [<ffffffffa049907d>] btrfs_balance+0x9c7/0xb6f [btrfs] [<ffffffffa049ef84>] btrfs_ioctl_balance+0x234/0x2ac [btrfs] [<ffffffffa04a1e8e>] btrfs_ioctl+0xd87/0x1ef9 [btrfs] [<ffffffff81122f53>] ? path_openat+0x234/0x4db [<ffffffff813c3b78>] ? __do_page_fault+0x31d/0x391 [<ffffffff810f8ab6>] ? vma_link+0x74/0x94 [<ffffffff811250f5>] vfs_ioctl+0x1d/0x39 [<ffffffff811258c8>] do_vfs_ioctl+0x32d/0x3e2 [<ffffffff811259d4>] SyS_ioctl+0x57/0x83 [<ffffffff813c3bfa>] ? do_page_fault+0xe/0x10 [<ffffffff813c73c2>] system_call_fastpath+0x16/0x1b It is because we returned the error number if the reference of the root was 0 when doing space relocation. It was not right here, because though the root was dead(refs == 0), but the space it held still need be relocated, or we could not remove the block group. So in this case, we should return the root no matter it is dead or not. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-10-10 21:31:02 -04:00
Miao Xie	14927d9546	Btrfs: insert orphan roots into fs radix tree Now we don't drop all the deleted snapshots/subvolumes before the space balance. It means we have to relocate the space which is held by the dead snapshots/subvolumes. So we must insert them into the fs radix tree, or we would forget to commit their changes when doing a transaction commit, and that would corrupt the metadata. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-10-10 21:30:53 -04:00
Josef Bacik	7bf811a595	Btrfs: limit delalloc pages outside of find_delalloc_range Liu fixed part of this problem and unfortunately I steered him in slightly the wrong direction and so didn't completely fix the problem. The problem is we limit the size of the delalloc range we are looking for to max bytes and then we try to lock that range. If we fail to lock the pages in that range we will shrink the max bytes to a single page and re loop. However if our first page is inside of the delalloc range then we will end up limiting the end of the range to a period before our first page. This is illustrated below [0 -------- delalloc range --------- 256mb] [page] So find_delalloc_range will return with delalloc_start as 0 and end as 128mb, and then we will notice that delalloc_start < *start and adjust it up, but not adjust delalloc_end up, so things go sideways. To fix this we need to not limit the max bytes in find_delalloc_range, but in find_lock_delalloc_range and that way we don't end up with this confusion. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-10-10 21:27:56 -04:00
Josef Bacik	4871c1588f	Btrfs: use right root when checking for hash collision btrfs_rename was using the root of the old dir instead of the root of the new dir when checking for a hash collision, so if you tried to move a file into a subvol it would freak out because it would see the file you are trying to move in its current root. This fixes the bug where this would fail btrfs subvol create test1 btrfs subvol create test2 mv test1 test2. Thanks to Chris Murphy for catching this, Cc: stable@vger.kernel.org Reported-by: Chris Murphy <lists@colorremedies.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-10-10 21:27:45 -04:00
Rob Herring	32df8dca50	of: remove HAVE_ARCH_DEVTREE_FIXUPS HAVE_ARCH_DEVTREE_FIXUPS appears to always be needed except for sparc, but it is only used for /proc/device-tree and sparc does not enable /proc/device-tree. So this option is redundant. Remove the option and always enable it. This has the side effect of fixing /proc/device-tree on arches such as arm64 which failed to define this option. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Grant Likely <grant.likely@linaro.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: James Hogan <james.hogan@imgtec.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com>	2013-10-09 20:04:08 -05:00
Rik van Riel	82727018b0	sched/numa: Call task_numa_free() from do_execve() It is possible for a task in a numa group to call exec, and have the new (unrelated) executable inherit the numa group association from its former self. This has the potential to break numa grouping, and is trivial to fix. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-51-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-10-09 14:48:00 +02:00
Mel Gorman	e29cf08b05	sched/numa: Report a NUMA task group ID It is desirable to model from userspace how the scheduler groups tasks over time. This patch adds an ID to the numa_group and reports it via /proc/PID/status. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-45-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-10-09 14:47:49 +02:00
Brian Foster	74564fb48c	xfs: clean up xfs_inactive() error handling, kill VN_INACTIVE_[NO]CACHE The xfs_inactive() return value is meaningless. Turn xfs_inactive() into a void function and clean up the error handling appropriately. Kill the VN_INACTIVE_[NO]CACHE directives as they are not relevant to Linux. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-08 17:20:41 -05:00
Brian Foster	88877d2b97	xfs: push down inactive transaction mgmt for ifree Push the inode free work performed during xfs_inactive() down into a new xfs_inactive_ifree() helper. This clears xfs_inactive() from all inode locking and transaction management more directly associated with freeing the inode xattrs, extents and the inode itself. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-08 17:15:01 -05:00
Brian Foster	f7be2d7f59	xfs: push down inactive transaction mgmt for truncate Create the new xfs_inactive_truncate() function to handle the truncate portion of xfs_inactive(). Push the locking and transaction management into the new function. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-08 15:32:11 -05:00
Brian Foster	36b21dde6e	xfs: push down inactive transaction mgmt for remote symlinks Push down the transaction management for remote symlinks from xfs_inactive() down to xfs_inactive_symlink_rmt(). The latter is cleaned up to avoid transaction management intended for the calling context (i.e., trans duplication, reservation, item attachment). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-08 14:53:02 -05:00
Mark Tinguely	2900a579ab	xfs: add the inode directory type support to XFS_IOC_FSGEOM Add the inode type directory type support to XFS_IOC_FSGEOM so that xfs_repair/xfs_info knows if the superblock v4 filesystem enabled the feature. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-08 14:28:09 -05:00
Jaegeuk Kim	ccaaca2591	f2fs: fix writing incorrect orphan blocks Previously, there was an erroneous scenario like the one below. thread 1: thread 2: f2fs_unlink - acquire_orphan_inode : sbi->n_orphans++ write_checkpoint - block_operations : f2fs_lock_all - do_checkpoint : write orphan blocks with sbi->n_orphans - unblock_operations - f2fs_lock_op - release_orphan_inode - f2fs_unlock_op During the checkpoint by thread 2, f2fs stores a wrong orphan block according to the wrong sbi->n_orphans. To avoid this, we should simply make f2fs_lock_op cover acquire_orphan_inode too. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-08 10:19:28 +09:00
Jaegeuk Kim	5887d291d7	f2fs: avoid unnecessary checkpoints During the f2fs_put_super procedure, we don't need to conduct checkpoint all the time, since we don't need to do that if superblock is clean. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-08 09:32:43 +09:00
Sachin Prabhu	dde2356c84	cifs: Allow LANMAN auth method for servers supporting unencapsulated authentication methods This allows users to use LANMAN authentication on servers which support unencapsulated authentication. The patch fixes a regression where users using plaintext authentication were no longer able to do so because of changes brought in by patch `3f618223dc` https://bugzilla.redhat.com/show_bug.cgi?id=1011621 Reported-by: Panos Kavalagios <Panagiotis.Kavalagios@eurodyn.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2013-10-07 09:57:11 -05:00
Jan Klos	2f6c947963	cifs: Fix inability to write files >2GB to SMB2/3 shares When connecting to SMB2/3 shares, maximum file size is set to non-LFS maximum in superblock. This is due to cap_large_files bit being different for SMB1 and SMB2/3 (where it is just an internal flag that is not negotiated and the SMB1 one corresponds to multichannel capability, so maybe LFS works correctly if server sends 0x08 flag) while capabilities are checked always for the SMB1 bit in cifs_read_super(). The patch fixes this by checking for the correct bit according to the protocol version. CC: Stable <stable@kernel.org> Signed-off-by: Jan Klos <honza.klos@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2013-10-07 09:54:45 -05:00
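The idea in the commit above, checking the capability bit that belongs to the negotiated protocol rather than always the SMB1 bit, can be sketched as below. This is a hypothetical illustration: `struct dialect`, `supports_large_files()` and the SMB2/3 bit value are assumptions made for the example and are not taken from the CIFS sources; only the SMB1 value 0x08 is mentioned in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-dialect description. Each dialect records which bit
 * in the capability word means "large files supported". */
struct dialect {
    const char *name;
    uint32_t cap_large_files;
};

static const struct dialect smb1_dialect = { "SMB1", 0x00000008u };
/* Placeholder value for the internal SMB2/3 flag. */
static const struct dialect smb2_dialect = { "SMB2/3", 0x00000400u };

/* Test the bit that belongs to the dialect actually in use. */
static bool supports_large_files(const struct dialect *d, uint32_t server_caps)
{
    assert(d != NULL);
    return (server_caps & d->cap_large_files) != 0;
}
```

Testing an SMB2/3 capability word against the SMB1 bit gives the wrong answer in this sketch, which mirrors the non-LFS file size limit the commit describes.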
Kelly Anderson	4058c5117d	f2fs: handle remount options correctly The current f2fs code errors if the xattr or acl options are passed when remounting. This is important in a typical scenario where f2fs is mounted as a "ro" root file-system by the boot loader and then the init process wants to remount it "rw" with the "remount,rw" option. Signed-off-by: Kelly Anderson <kelly@xilka.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-07 11:38:13 +09:00
Gu Zheng	e479556bfd	f2fs: use rw_sem instead of fs_lock(locks mutex) The fs_locks are used to block other ops (e.g., recovery) when doing a checkpoint. Every other operation (besides checkpoint) needs to acquire an fs_lock, and there is a serious problem here: if too many concurrent threads are acquiring fs_lock, they will block each other, which may lead to performance problems; this is not the behavior we want to see. Though some optimization patches have been introduced to improve the usage of fs_lock, the thorough solution is to replace the fs_lock with a rw_sem. The checkpoint routine takes write_sem, and other ops take read_sem, so that we can block other ops (e.g., recovery) when doing a checkpoint, while other ops will not disturb each other; this avoids the problem described above completely. Because of the weakness of rw_sem, the above change may introduce a potential problem: the checkpoint thread might get starved if other threads are intensively locking the read semaphore for I/O. (Pointed out by Xu Jin) In order to avoid this, a wait_list is introduced: subsequent read semaphore ops will be put on the wait_list if the checkpoint thread is waiting for the write semaphore, and will be woken up when the checkpoint thread gives up the write semaphore. Thanks to Kim's previous review and test; we will be very glad to see other people's performance tests of this patch. V2: -fix the potential starvation problem. -use more suitable func name suggested by Xu Jin. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> [Jaegeuk Kim: adjust minor coding standard] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-07 11:33:05 +09:00
Shirish Pargaonkar	eb4c7df6c2	cifs: Avoid umount hangs with smb2 when server is unresponsive Do not send SMB2 Logoff command when reconnecting, the way smb1 code base works. Also, no need to wait for a credit for an echo command when one is already in flight. Without these changes, umount command hangs if the server is unresponsive e.g. hibernating. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@us.ibm.com>	2013-10-06 20:18:42 -05:00
Steve French	c31f330719	do not treat non-symlink reparse points as valid symlinks Windows 8 and later can create NFS symlinks (within reparse points) which we were assuming were normal NTFS symlinks and thus reporting corrupt paths for. Add check for reparse points to make sure that they really are normal symlinks before we try to parse the pathname. We also should not be parsing other types of reparse points (DFS junctions etc) as if they were a symlink so return EOPNOTSUPP on those. Also fix endian errors (we were not parsing symlink lengths as little endian). This fixes commit `d244bf2dfb` which implemented follow link for non-Unix CIFS mounts CC: Stable <stable@kernel.org> Reviewed-by: Andrew Bartlett <abartlet@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2013-10-05 21:54:18 -05:00
Tejun Heo	3124eb1679	sysfs: merge regular and bin file handling With the previous changes, sysfs regular file code is ready to handle bin files too. This patch makes bin files share the regular file path. * sysfs_create/remove_bin_file() are moved to fs/sysfs/file.c. * sysfs_init_inode() is updated to use the new sysfs_bin_operations instead of bin_fops for bin files. * fs/sysfs/bin.c and the related pieces are removed. This patch shouldn't introduce any behavior difference to bin file accesses. Overall, this unification reduces the amount of duplicate logic, makes behaviors more consistent and paves the road for building simpler and more versatile interface which will allow other subsystems to make use of sysfs for their pseudo filesystems. v2: Stale fs/sysfs/bin.c reference dropped from Documentation/DocBook/filesystems.tmpl. Reported by kbuild test robot. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kay Sievers <kay@vrfy.org> Cc: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-05 17:27:40 -07:00
Tejun Heo	49fe604781	sysfs: prepare open path for unified regular / bin file handling sysfs bin file handling will be merged into the regular file support. This patch prepares the open path. This patch updates sysfs_open_file() such that it can handle both regular and bin files. This is a preparation and the new bin file path isn't used yet. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-05 17:27:40 -07:00
Tejun Heo	73d9714627	sysfs: copy bin mmap support from fs/sysfs/bin.c to fs/sysfs/file.c sysfs bin file handling will be merged into the regular file support. This patch copies mmap support from bin so that fs/sysfs/file.c can handle mmapping bin files. The code is copied mostly verbatim with the following updates. * ->mmapped and ->vm_ops are added to sysfs_open_file and bin_buffer references are replaced with sysfs_open_file ones. * Symbols are prefixed with sysfs_. * sysfs_unmap_bin_file() grabs sysfs_open_dirent and traverses ->files. Invocation of this function is added to sysfs_addrm_finish(). * sysfs_bin_mmap() is added to sysfs_bin_operations. This is a preparation and the new mmap path isn't used yet. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-05 17:27:40 -07:00
Tejun Heo	2f0c6b7593	sysfs: add sysfs_bin_read() sysfs bin file handling will be merged into the regular file support. This patch prepares the read path. Copy fs/sysfs/bin.c::read() to fs/sysfs/file.c and make it use sysfs_open_file instead of bin_buffer. The function is an identical copy except for the use of sysfs_open_file. The new function is added to sysfs_bin_operations. This isn't used yet but will eventually replace fs/sysfs/bin.c. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-05 17:27:40 -07:00
Tejun Heo	f9b9a6217c	sysfs: prepare path write for unified regular / bin file handling sysfs bin file handling will be merged into the regular file support. This patch prepares the write path. bin file write is almost identical to regular file write except that the write length is capped by the inode size and @off is passed to the write method. This patch adds bin file handling to sysfs_write_file() so that it can handle both regular and bin files. A new file_operations struct sysfs_bin_operations is added, which currently only hosts sysfs_write_file() and generic_file_llseek(). This isn't used yet but will eventually replace fs/sysfs/bin.c. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-05 17:27:40 -07:00
Tejun Heo	3ff65d3cb0	sysfs: collapse fs/sysfs/bin.c::fill_read() into read() read() is simple enough and fill_read() being in a separate function doesn't add anything. Let's collapse it into read(). This will make merging bin file handling with regular file handling easier. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-05 17:27:40 -07:00
Tejun Heo	91270162bf	sysfs: skip bin_buffer->buffer while reading After `b31ca3f5df` ("sysfs: fix deadlock"), bin read() first writes data to bb->buffer and bounces it to a transient kernel buffer which is then copied out to userland. The double bouncing doesn't add anything. Let's just use the transient buffer directly. While at it, rename @temp to @buf for clarity. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-05 17:27:40 -07:00
Tejun Heo	13c589d5b0	sysfs: use seq_file when reading regular files sysfs read path implements its own buffering scheme between userland and kernel callbacks, which essentially is a degenerate duplicate of seq_file. This patch replaces the custom read buffering implementation in sysfs with seq_file. While the amount of code reduction is small, this reduces low level hairiness and enables future development of a new versatile API based on seq_file so that sysfs features can be shared with other subsystems. As write path was already converted to not use sysfs_open_file->page, this patch makes ->page and ->count unused and removes them. Userland behavior remains the same except for some extreme corner cases - e.g. sysfs will now regenerate the content each time a file is read after a non-contiguous seek whereas the original code would keep using the same content. While this is a userland visible behavior change, it is extremely unlikely to be noticeable and brings sysfs behavior closer to that of procfs. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-05 17:21:03 -07:00
Tejun Heo	8ef445f080	sysfs: use transient write buffer There isn't much to be gained by keeping a kernel buffer around while a file is open, especially as the read path is planned to be converted to use seq_file and won't use the buffer. This patch makes sysfs_write_file() use a per-write transient buffer instead of sysfs_open_file->page. This simplifies the write path, enables removing sysfs_open_file->page once the read path is updated and will help merging the bin file write path, which already requires the use of a transient buffer due to a locking order issue. As the function comments of flush_write_buffer() and sysfs_write_buffer() are being updated anyway, reformat them so that they're more conventional. v2: Use min_t() instead of min() in sysfs_write_file() to avoid build warning on arm. Reported by build test robot. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-05 17:21:03 -07:00
Tejun Heo	bcafe4eea3	sysfs: add sysfs_open_file->sd and ->file sysfs will be converted to use seq_file for read path, which will make it difficult to pass around multiple pointers directly. This patch adds sysfs_open_file->sd and ->file so that we can reach all the necessary data structures from sysfs_open_file. flush_write_buffer() is updated to drop @dentry which was used to discover the sysfs_dirent as it's now available through sysfs_open_file->sd. This patch doesn't cause any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-05 17:21:03 -07:00
Tejun Heo	58282d8dc2	sysfs: rename sysfs_buffer to sysfs_open_file sysfs read path will be converted to use seq_file, which will handle buffering, making sysfs_buffer a misnomer. Rename sysfs_buffer to sysfs_open_file, and sysfs_open_dirent->buffers to ->files. This patch is a pure rename. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-05 17:16:28 -07:00
Tejun Heo	c75ec764cf	sysfs: add sysfs_open_file_mutex Add a separate mutex to protect sysfs_open_dirent->buffers list. This will allow performing sleepable operations while traversing sysfs_buffers, which will be renamed to sysfs_open_file. Note that currently sysfs_open_dirent->buffers list isn't being used for anything and this patch doesn't make any functional difference. It will be used to merge regular and bin file supports. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-05 17:15:48 -07:00
Tejun Heo	375b611e60	sysfs: remove sysfs_buffer->ops Currently, sysfs_ops is fetched during sysfs_open_file() and cached in sysfs_buffer->ops to be used while the file is open. This patch removes the caching and makes each operation directly fetch sysfs_ops. This patch doesn't introduce any behavior difference and is to prepare for merging regular and bin file supports. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-05 17:04:34 -07:00
Linus Torvalds	e62063d699	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "This is a small collection of fixes, including a regression fix from Liu Bo that solves rare crashes with compression on. I've merged my for-linus up to 3.12-rc3 because the top commit is only meant for 3.12. The rest of the fixes are also available in my master branch on top of my last 3.11 based pull" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: Fix crash due to not allocating integrity data for a bioset Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing Btrfs: eliminate races in worker stopping code Btrfs: fix crash of compressed writes Btrfs: fix transid verify errors when recovering log tree	2013-10-05 12:17:24 -07:00
Tejun Heo	aea585ef8f	sysfs: remove sysfs_buffer->needs_read_fill ->needs_read_fill is used to implement the following behaviors. 1. Ensure buffer filling on the first read. 2. Force buffer filling after a write. 3. Force buffer filling after a successful poll. However, #2 and #3 don't really work as sysfs doesn't reset file position. While the read buffer would be refilled, the next read would continue from the position after the last read or write, requiring an explicit seek to the start for it to be useful, which makes ->needs_read_fill superfluous as read buffer is always refilled if f_pos == 0. Update sysfs_read_file() to test buffer->page for #1 instead and remove ->needs_read_fill. While this changes behavior in extreme corner cases - e.g. re-reading a sysfs file after seeking to non-zero position after a write or poll, it's highly unlikely to lead to actual breakage. This change is to prepare for using seq_file in the read path. While at it, reformat a comment in fill_write_buffer(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-05 11:02:04 -07:00
Tejun Heo	89e51dab7c	sysfs: remove unused sysfs_buffer->pos Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-05 10:54:47 -07:00
Darrick J. Wong	b208c2f7ce	btrfs: Fix crash due to not allocating integrity data for a bioset When btrfs creates a bioset, we must also allocate the integrity data pool. Otherwise btrfs will crash when it tries to submit a bio to a checksumming disk: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150 PGD 2305e4067 PUD 23063d067 PMD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: btrfs scsi_debug xfs ext4 jbd2 ext3 jbd mbcache sch_fq_codel eeprom lpc_ich mfd_core nfsd exportfs auth_rpcgss af_packet raid6_pq xor zlib_deflate libcrc32c [last unloaded: scsi_debug] CPU: 1 PID: 4486 Comm: mount Not tainted 3.12.0-rc1-mcsum #2 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff8802451c9720 ti: ffff880230698000 task.ti: ffff880230698000 RIP: 0010:[<ffffffff8111e28a>] [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150 RSP: 0018:ffff880230699688 EFLAGS: 00010286 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000005f8445 RDX: 0000000000000001 RSI: 0000000000000010 RDI: 0000000000000000 RBP: ffff8802306996f8 R08: 0000000000011200 R09: 0000000000000008 R10: 0000000000000020 R11: ffff88009d6e8000 R12: 0000000000011210 R13: 0000000000000030 R14: ffff8802306996b8 R15: ffff8802451c9720 FS: 00007f25b8a16800(0000) GS:ffff88024fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000018 CR3: 0000000230576000 CR4: 00000000000007e0 Stack: ffff8802451c9720 0000000000000002 ffffffff81a97100 0000000000281250 ffffffff81a96480 ffff88024fc99150 ffff880228d18200 0000000000000000 0000000000000000 0000000000000040 ffff880230e8c2e8 ffff8802459dc900 Call Trace: [<ffffffff811b2208>] bio_integrity_alloc+0x48/0x1b0 [<ffffffff811b26fc>] bio_integrity_prep+0xac/0x360 [<ffffffff8111e298>] ? mempool_alloc+0x58/0x150 [<ffffffffa03e8041>] ? 
alloc_extent_state+0x31/0x110 [btrfs] [<ffffffff81241579>] blk_queue_bio+0x1c9/0x460 [<ffffffff8123e58a>] generic_make_request+0xca/0x100 [<ffffffff8123e639>] submit_bio+0x79/0x160 [<ffffffffa03f865e>] btrfs_map_bio+0x48e/0x5b0 [btrfs] [<ffffffffa03c821a>] btree_submit_bio_hook+0xda/0x110 [btrfs] [<ffffffffa03e7eba>] submit_one_bio+0x6a/0xa0 [btrfs] [<ffffffffa03ef450>] read_extent_buffer_pages+0x250/0x310 [btrfs] [<ffffffff8125eef6>] ? __radix_tree_preload+0x66/0xf0 [<ffffffff8125f1c5>] ? radix_tree_insert+0x95/0x260 [<ffffffffa03c66f6>] btree_read_extent_buffer_pages.constprop.128+0xb6/0x120 [btrfs] [<ffffffffa03c8c1a>] read_tree_block+0x3a/0x60 [btrfs] [<ffffffffa03caefd>] open_ctree+0x139d/0x2030 [btrfs] [<ffffffffa03a282a>] btrfs_mount+0x53a/0x7d0 [btrfs] [<ffffffff8113ab0b>] ? pcpu_alloc+0x8eb/0x9f0 [<ffffffff81167305>] ? __kmalloc_track_caller+0x35/0x1e0 [<ffffffff81176ba0>] mount_fs+0x20/0xd0 [<ffffffff81191096>] vfs_kern_mount+0x76/0x120 [<ffffffff81193320>] do_mount+0x200/0xa40 [<ffffffff81135cdb>] ? strndup_user+0x5b/0x80 [<ffffffff81193bf0>] SyS_mount+0x90/0xe0 [<ffffffff8156d31d>] system_call_fastpath+0x1a/0x1f Code: 4c 8d 75 a8 4c 89 6d e8 45 89 e0 4c 8d 6f 30 48 89 5d d8 41 83 e0 af 48 89 fb 49 83 c6 18 4c 89 7d f8 65 4c 8b 3c 25 c0 b8 00 00 <48> 8b 73 18 44 89 c7 44 89 45 98 ff 53 20 48 85 c0 48 89 c2 74 RIP [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150 RSP <ffff880230699688> CR2: 0000000000000018 ---[ end trace 7a96042017ed21e2 ]--- Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-10-05 10:52:10 -04:00
Chris Mason	1329dfc8bb	Merge branch 'for-linus' into for-linus-3.12	2013-10-05 10:51:32 -04:00
Linus Torvalds	a5c984cc29	Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6 Pull CIFS fixes from Steve French: "Small set of cifs fixes. Most important is Jeff's fix that works around disconnection problems which can be caused by simultaneous use of user space tools (starting a long running smbclient backup then doing a cifs kernel mount) or multiple cifs mounts through a NAT, and Jim's fix to deal with reexport of cifs share. I expect to send two more cifs fixes next week (being tested now) - fixes to address an SMB2 unmount hang when server dies and a fix for cifs symlink handling of Windows "NFS" symlinks" * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6: [CIFS] update cifs.ko version [CIFS] Remove ext2 flags that have been moved to fs.h [CIFS] Provide sane values for nlink cifs: stop trying to use virtual circuits CIFS: FS-Cache: Uncache unread pages in cifs_readpages() before freeing them	2013-10-04 20:50:16 -07:00
Linus Torvalds	3dbecf0aa9	xfs: bugfixes for 3.12-rc4 - lockdep fix for project quotas - fix for dirent dtype support on v4 filesystems - fix for a memory leak in recovery - fix for build failure due to the recovery fix Merge tag 'xfs-for-linus-v3.12-rc4' of git://oss.sgi.com/xfs/xfs Pull xfs bugfixes from Ben Myers: "There are lockdep annotations for project quotas, a fix for dirent dtype support on v4 filesystems, a fix for a memory leak in recovery, and a fix for the build error that resulted from it. D'oh" * tag 'xfs-for-linus-v3.12-rc4' of git://oss.sgi.com/xfs/xfs: xfs: Use kmem_free() instead of free() xfs: fix memory leak in xlog_recover_add_to_trans xfs: dirent dtype presence is dependent on directory magic numbers xfs: lockdep needs to know about 3 dquot-deep nesting	2013-10-04 14:47:22 -07:00
Ilya Dryomov	1357272fc7	Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing free_device rcu callback, scheduled from btrfs_rm_dev_replace_srcdev, can be processed before btrfs_scratch_superblock is called, which would result in a use-after-free on btrfs_device contents. Fix this by zeroing the superblock before the rcu callback is registered. Cc: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-10-04 16:02:14 -04:00
Ilya Dryomov	964fb15acf	Btrfs: eliminate races in worker stopping code The current implementation of worker threads in Btrfs has races in worker stopping code, which cause all kinds of panics and lockups when running btrfs/011 xfstest in a loop. The problem is that btrfs_stop_workers is unsynchronized with respect to check_idle_worker, check_busy_worker and __btrfs_start_workers. E.g., check_idle_worker race flow: btrfs_stop_workers(): check_idle_worker(aworker): - grabs the lock - splices the idle list into the working list - removes the first worker from the working list - releases the lock to wait for its kthread's completion - grabs the lock - if aworker is on the working list, moves aworker from the working list to the idle list - releases the lock - grabs the lock - puts the worker - removes the second worker from the working list ...... btrfs_stop_workers returns, aworker is on the idle list FS is umounted, memory is freed ...... aworker is woken up, fireworks ensue With this applied, I wasn't able to trigger the problem in 48 hours, whereas previously I could reliably reproduce at least one of these races within an hour. Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-10-04 16:02:13 -04:00
Liu Bo	385fe0bede	Btrfs: fix crash of compressed writes The crash[1] was found by xfstests/generic/208 with "-o compress"; it is not reproduced every time, but it does panic. The bug is quite interesting: it was actually introduced by a recent commit (`573aecafca`, Btrfs: actually limit the size of delalloc range). Btrfs implements delayed allocation, so during writeback, we (1) get a page A and lock it (2) search the state tree for delalloc bytes and lock all pages within the range (3) process the delalloc range, including find disk space and create ordered extent and so on. (4) submit the page A. It runs well in normal cases, but if we're in a racy case, eg. buffered compressed writes and aio-dio writes, sometimes we may fail to lock all pages in the 'delalloc' range, in which case, we need to fall back to search the state tree again with a smaller range limit(max_bytes = PAGE_CACHE_SIZE - offset). The mentioned commit has a side effect, that is, in the fallback case, we can find delalloc bytes before the index of the page we already have locked, so we're in the case of (delalloc_end <= *start) and return with (found > 0). This ends with not locking delalloc pages but making ->writepage still process them, and the crash happens. This fixes it by treating that case as finding nothing and returning to the caller, as the caller knows how to deal with it properly. [1]: ------------[ cut here ]------------ kernel BUG at mm/page-writeback.c:2170! [...] CPU: 2 PID: 11755 Comm: btrfs-delalloc- Tainted: G O 3.11.0+ #8 [...] RIP: 0010:[<ffffffff810f5093>] [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83 [...]
[ 4934.248731] Stack: [ 4934.248731] ffff8801477e5dc8 ffffea00049b9f00 ffff8801869f9ce8 ffffffffa02b841a [ 4934.248731] 0000000000000000 0000000000000000 0000000000000fff 0000000000000620 [ 4934.248731] ffff88018db59c78 ffffea0005da8d40 ffffffffa02ff860 00000001810016c0 [ 4934.248731] Call Trace: [ 4934.248731] [<ffffffffa02b841a>] extent_range_clear_dirty_for_io+0xcf/0xf5 [btrfs] [ 4934.248731] [<ffffffffa02a8889>] compress_file_range+0x1dc/0x4cb [btrfs] [ 4934.248731] [<ffffffff8104f7af>] ? detach_if_pending+0x22/0x4b [ 4934.248731] [<ffffffffa02a8bad>] async_cow_start+0x35/0x53 [btrfs] [ 4934.248731] [<ffffffffa02c694b>] worker_loop+0x14b/0x48c [btrfs] [ 4934.248731] [<ffffffffa02c6800>] ? btrfs_queue_worker+0x25c/0x25c [btrfs] [ 4934.248731] [<ffffffff810608f5>] kthread+0x8d/0x95 [ 4934.248731] [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43 [ 4934.248731] [<ffffffff814fe09c>] ret_from_fork+0x7c/0xb0 [ 4934.248731] [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43 [ 4934.248731] Code: ff 85 c0 0f 94 c0 0f b6 c0 59 5b 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 2c de 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 52 49 8b 84 24 80 00 00 00 f6 40 20 01 75 44 [ 4934.248731] RIP [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83 [ 4934.248731] RSP <ffff8801869f9c48> [ 4934.280307] ---[ end trace 36f06d3f8750236a ]--- Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-10-04 16:02:11 -04:00
Josef Bacik	60e7cd3a4b	Btrfs: fix transid verify errors when recovering log tree If we crash with a log, remount and recover that log, and then crash before we can commit another transaction we will get transid verify errors on the next mount. This is because we were not zero'ing out the log when we committed the transaction after recovery. This is ok as long as we commit another transaction at some point in the future, but if you abort or something else goes wrong you can end up in this weird state because the recovery stuff says that the tree log should have a generation+1 of the super generation, which won't be the case of the transaction that was started for recovery. Fix this by removing the check and _always_ zero out the log portion of the super when we commit a transaction. This fixes the transid verify issues I was seeing with my force errors tests. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-10-04 16:02:09 -04:00
Thierry Reding	b2a42f78ab	xfs: Use kmem_free() instead of free() This fixes a build failure caused by calling the free() function which does not exist in the Linux kernel. Signed-off-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit `aaaae98022`)	2013-10-04 13:56:12 -05:00
tinguely@sgi.com	9b3b77fe66	xfs: fix memory leak in xlog_recover_add_to_trans Free the memory in error path of xlog_recover_add_to_trans(). Normally this memory is freed in recovery pass2, but is leaked in the error path. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit `519ccb81ac`)	2013-10-04 13:56:03 -05:00
Dave Chinner	6d313498f0	xfs: dirent dtype presence is dependent on directory magic numbers The determination of whether a directory entry contains a dtype field originally was dependent on the filesystem having CRCs enabled. This meant that the format for dtype being enabled could be determined by checking the directory block magic number rather than doing a feature bit check. This was useful in that it meant that we didn't need to pass a struct xfs_mount around to functions that were already supplied with a directory block header. Unfortunately, the introduction of dtype fields into the v4 structure via a feature bit meant this "use the directory block magic number" method of discriminating the dirent entry sizes is broken. Hence we need to convert the places that use magic number checks to use feature bit checks so that they work correctly and not by chance. The current code works on v4 filesystems only because the dirent size roundup covers the extra byte needed by the dtype field in the places where this problem occurs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit `367993e7c6`)	2013-10-04 13:55:48 -05:00
Dave Chinner	89c6c89af2	xfs: lockdep needs to know about 3 dquot-deep nesting Michael Semon reported that xfs/299 generated this lockdep warning: ============================================= [ INFO: possible recursive locking detected ] 3.12.0-rc2+ #2 Not tainted --------------------------------------------- touch/21072 is trying to acquire lock: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 but task is already holding lock: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&xfs_dquot_other_class); lock(&xfs_dquot_other_class); * DEADLOCK * May be due to missing lock nesting notation 7 locks held by touch/21072: #0: (sb_writers#10){++++.+}, at: [<c11185b6>] mnt_want_write+0x1e/0x3e #1: (&type->i_mutex_dir_key#4){+.+.+.}, at: [<c11078ee>] do_last+0x245/0xe40 #2: (sb_internal#2){++++.+}, at: [<c122c9e0>] xfs_trans_alloc+0x1f/0x35 #3: (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [<c126cd1b>] xfs_ilock+0x100/0x1f1 #4: (&(&ip->i_lock)->mr_lock){++++-.}, at: [<c126cf52>] xfs_ilock_nowait+0x105/0x22f #5: (&dqp->q_qlock){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 #6: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 The lockdep annotation for dquot lock nesting only understands locking for user and "other" dquots, not user, group and quota dquots. Fix the annotations to match the locking hierarchy we now have. Reported-by: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit `f112a04971`)	2013-10-04 13:55:33 -05:00
Linus Torvalds	15c83d26e1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse bugfixes from Miklos Szeredi: "This contains two more fixes by Maxim for writeback/truncate races and fixes for RCU walk in fuse_dentry_revalidate()" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: no RCU mode in fuse_access() fuse: readdirplus: fix RCU walk fuse: don't check_submounts_and_drop() in RCU walk fuse: fix fallocate vs. ftruncate race fuse: wait for writeback in fuse_file_fallocate()	2013-10-04 09:06:13 -07:00
Steven Whitehouse	e46c772dba	GFS2: Protect quota sync generation Now that gfs2_quota_sync can be potentially called from multiple threads, we should protect this bit of code, and the sync generation number in particular in order to ensure that there are no races when syncing quotas. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>	2013-10-04 12:29:34 +01:00
Steven Whitehouse	aabd7c72f5	GFS2: Inline qd_trylock into gfs2_quota_unlock The function qd_trylock was not a trylock despite its name and can be inlined into gfs2_quota_unlock in order to make the code a bit clearer. There should be no functional change as a result of this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>	2013-10-04 11:39:21 +01:00
Steven Whitehouse	1bf59bf6de	GFS2: Make two similar quota code fragments into a function There should be no functional change bar the removal of a test of the MS_READONLY flag which would never be reachable. This merges the common code from qd_fish and qd_trylock into a single function and calls it from both those places. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>	2013-10-04 11:14:46 +01:00
Steven Whitehouse	bef292a72d	GFS2: Remove obsolete quota tunable There is no need for a parameter which relates to the internals of quota to be exposed to users. The only possible use would be to turn it up so large that the memory allocation fails. So let's remove it and set it to a sensible value which ensures that we don't ask for multipage allocations. Currently the size of struct gfs2_holder means that the calculated value is identical to the previous default value, so there should be no functional change. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>	2013-10-04 09:49:29 +01:00
Tejun Heo	250f7c3fee	sysfs: introduce [__]sysfs_remove() Given a sysfs_dirent, there is no reason to have multiple versions of removal functions. A function which removes the specified sysfs_dirent and its descendants is enough. This patch introduces [__]sysfs_remove() which replaces all internal variations of removal functions. This will be the only removal function in the planned new sysfs_dirent based interface. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-03 16:38:52 -07:00
Tejun Heo	bcdde7e221	sysfs: make __sysfs_remove_dir() recursive Currently, sysfs directory removal is inconsistent in that it would remove any files directly under it but wouldn't recurse into directories. Thanks to group subdirectories, this doesn't even match with kobject boundaries. sysfs is in the process of being separated out so that it can be used by multiple subsystems and we want to have a consistent behavior - removal of a sysfs_dirent should remove either all descendant entries or none, instead of something in between. This patch implements proper recursive removal in __sysfs_remove_dir(). The function now walks its subtree in a post-order walk to remove all descendants. This is a behavior change but kobject / driver layer, which currently is the only consumer, has already been updated to handle duplicate removal attempts, so nothing should be broken after this change. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-03 16:38:52 -07:00
Tejun Heo	26ea12dec0	kobject: grab an extra reference on kobject->sd to allow duplicate deletes sysfs currently has a rather weird behavior regarding removals. A directory removal would delete all files directly under it but wouldn't recurse into subdirectories, which, while a bit inconsistent, seems to make sense at first glance as each directory is supposedly associated with a kobject and each kobject can take care of the directory deletion; however, this doesn't really hold as we have groups which can be directories without a kobject associated with it and require explicit deletions. We're in the process of separating out sysfs from kobject / driver core and want a consistent behavior. A removal should delete either only the specified node or everything under it. I think it is helpful to support recursive atomic removal and later patches will implement it. Such change means that a sysfs_dirent associated with kobject may be deleted before the kobject itself is removed if one of its ancestors gets removed before it. As sysfs_remove_dir() puts the base ref, we may end up with a dangling pointer on descendants. This can be solved by holding an extra reference on the sd from kobject. Acquire an extra reference on the associated sysfs_dirent on directory creation and put it after removal. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-03 16:38:52 -07:00
Tejun Heo	d69ac5a0bb	sysfs: remove sysfs_addrm_cxt->parent_sd sysfs_addrm_start/finish() enclose sysfs_dirent additions and deletions and sysfs_addrm_cxt is used to record information necessary to finish the operations. Currently, sysfs_addrm_start() takes @parent_sd, records it in sysfs_addrm_cxt, and assumes that all operations in the block are performed under that @parent_sd. This assumption has been fine until now but we want to make some operations behave recursively and, while having @parent_sd recorded in sysfs_addrm_cxt doesn't necessarily prevents that, it becomes confusing. This patch removes sysfs_addrm_cxt->parent_sd and makes sysfs_add_one() take an explicit @parent_sd parameter. Note that sysfs_remove_one() doesn't need the extra argument as its parent is always known from the target @sd. While at it, add __acquires/releases() notations to sysfs_addrm_start/finish() respectively. This patch doesn't make any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-10-03 16:16:43 -07:00
Linus Torvalds	981d901095	Merge git://git.kvack.org/~bcrl/aio-next Pull aio use-after-free fix from Ben LaHaise. * git://git.kvack.org/~bcrl/aio-next: aio: fix use-after-free in aio_migratepage	2013-10-02 09:38:17 -07:00
Steven Whitehouse	26e43a15d4	GFS2: Move gfs2_icbit_munge into quota.c This function is only called twice, and both callers are quota related, so let's move this function into quota.c and make it static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-10-02 14:47:02 +01:00
Steven Whitehouse	9e07f2cb3d	GFS2: Speed up starting point selection for block allocation When setting the starting point for block allocation, there were calls to both gfs2_rbm_to_block() and gfs2_rbm_from_block() in the common case of there being an active reservation. The gfs2_rbm_from_block() function can be quite slow, and since the two conversions were effectively a no-op, it makes sense to avoid them entirely in this case. There is no functional change here, but the code should be a bit more efficient after this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-10-02 14:42:45 +01:00
Steven Whitehouse	7b9cff4671	GFS2: Add allocation parameters structure This patch adds a structure to contain allocation parameters with the intention of future expansion of this structure. The idea is that we should be able to add more information about the allocation in the future in order to allow the allocator to make a better job of placing the requests on-disk. There is no functional difference from applying this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-10-02 11:13:25 +01:00
Ben Myers	d948709b8e	xfs: remove usage of is_bad_inode XFS never calls mark_inode_bad or iget_failed, so it will never see a bad inode. Remove all checks for is_bad_inode because they are unnecessary. Signed-off-by: Ben Myers <bpm@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2013-10-01 17:38:16 -05:00
Jie Liu	17ec81c15f	xfs: fix the wrong new_size/rnew_size at xfs_iext_realloc_direct() At xfs_iext_realloc_direct(), the new_size is changed by adding if_bytes if originally the extent records are stored at the inline extent buffer, and we have to switch from it to a direct extent list for those newly allocated extents; this is wrong. E.g., create a file with three extents as follows: xfs_io -f -c "truncate 100m" /xfs/testme for i in $(seq 0 5 10); do offset=$(($i * $((1 << 20)))) xfs_io -c "pwrite $offset 1m" /xfs/testme done Inline ------ irec: if_bytes bytes_diff new_size 1st 0 16 16 2nd 16 16 32 Switching --------- rnew_size 3rd 32 16 48 + 32 = 80 roundup=128 In this case, the desired value of new_size should be 48, and then it will be rounded up to 64 and assigned to rnew_size. However, this issue has been covered by resetting the if_bytes to the new_size which is calculated at the beginning of xfs_iext_add() before leaving this function, which in turn makes the rnew_size correct again. Hence, this cannot be detected via xfstests. This patch fixes the above problem and revises the new_size comments at xfs_iext_realloc_direct() to make them more readable. Also, fix the comments while switching from the inline extent buffer to a direct extent list to reflect this change. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-01 17:33:10 -05:00
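The arithmetic in 17ec81c15f can be checked with a short worked example. This is a sketch, not the XFS code: the function names below are hypothetical, and the only facts taken from the message are the values if_bytes = 32, bytes_diff = 16, and the round-up results 128 (buggy) and 64 (desired).

```python
def roundup_pow_of_two(n):
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def rnew_size_buggy(if_bytes, bytes_diff):
    """Hypothetical model of the old behavior: if_bytes is added twice."""
    new_size = if_bytes + bytes_diff   # 32 + 16 = 48
    new_size += if_bytes               # the erroneous extra addition: 80
    return roundup_pow_of_two(new_size)


def rnew_size_fixed(if_bytes, bytes_diff):
    """Hypothetical model of the intended behavior."""
    new_size = if_bytes + bytes_diff   # 48
    return roundup_pow_of_two(new_size)


if __name__ == "__main__":
    # Third extent record from the table in the commit message.
    print(rnew_size_buggy(32, 16))   # 128
    print(rnew_size_fixed(32, 16))   # 64
```

The two results differ by a factor of two, which matches the 128 versus 64 figures quoted in the message.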
Trond Myklebust	99875249bf	NFSv4: Ensure that we disable the resend timeout for NFSv4 The spec states that the client should not resend requests because the server will disconnect if it needs to drop an RPC request. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-10-01 18:22:11 -04:00
Trond Myklebust	a6f951ddbd	NFSv4: Fix a use-after-free situation in _nfs4_proc_getlk() In nfs4_proc_getlk(), when some error causes a retry of the call to _nfs4_proc_getlk(), we can end up with Oopses of the form BUG: unable to handle kernel NULL pointer dereference at 0000000000000134 IP: [<ffffffff8165270e>] _raw_spin_lock+0xe/0x30 <snip> Call Trace: [<ffffffff812f287d>] _atomic_dec_and_lock+0x4d/0x70 [<ffffffffa053c4f2>] nfs4_put_lock_state+0x32/0xb0 [nfsv4] [<ffffffffa053c585>] nfs4_fl_release_lock+0x15/0x20 [nfsv4] [<ffffffffa0522c06>] _nfs4_proc_getlk.isra.40+0x146/0x170 [nfsv4] [<ffffffffa052ad99>] nfs4_proc_lock+0x399/0x5a0 [nfsv4] The problem is that we don't clear the request->fl_ops after the first try and so when we retry, nfs4_set_lock_state() exits early without setting the lock stateid. Regression introduced by commit `70cc6487a4` (locks: make ->lock release private data before returning in GETLK case) Reported-by: Weston Andros Adamson <dros@netapp.com> Reported-by: Jorge Mora <mora@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: <stable@vger.kernel.org> #2.6.22+	2013-10-01 18:21:28 -04:00
Jie Liu	0799a3e808	xfs: get rid of count from xfs_iomap_write_allocate() Get rid of function variable count from xfs_iomap_write_allocate() as it is unused. Additionally, checkpatch warns me of the following for this change: WARNING: extern prototypes should be avoided in .h files +extern int xfs_iomap_write_allocate(struct xfs_inode *, xfs_off_t, So this patch also removes all extern function prototypes at xfs_iomap.h to suppress the warning and keep the code style consistent in this file. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-01 15:42:34 -05:00
Linus Torvalds	517bf8fc21	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs lru leak fix from Al Viro: "The fix in "super: fix for destroy lrus" didn't - they need to be destroyed, all right, but that's the wrong place..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs/super.c: fix lru_list leak for real	2013-10-01 10:28:11 -07:00
Al Viro	c2d22ecd3c	fs/super.c: fix lru_list leak for real Freeing ->s_{inode,dentry}_lru in deactivate_locked_super() is wrong; the right place is destroy_super(). As it is, we leak them if sget() decides that new superblock it has allocated (and never shown to anybody) isn't needed and should be freed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-10-01 13:11:21 -04:00
Thierry Reding	aaaae98022	xfs: Use kmem_free() instead of free() This fixes a build failure caused by calling the free() function which does not exist in the Linux kernel. Signed-off-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-10-01 10:26:24 -05:00
Tom Gundersen	cb2ffb26e6	cuse: add fix minor number to /dev/cuse This allows udev (or more recently systemd-tmpfiles) to create /dev/cuse on boot, in the same way as /dev/fuse is currently created, and the corresponding module to be loaded on first access. The corresponding functionality was introduced for fuse in commit `578454f`. Signed-off-by: Tom Gundersen <teg@jklm.no> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-10-01 16:44:54 +02:00
Miklos Szeredi	ff17be0864	fuse: writepage: skip already in flight If ->writepage() tries to write back a page whose copy is still in flight, then just skip by calling redirty_page_for_writepage(). This is OK, since now ->writepage() should never be called for data integrity sync. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-10-01 16:44:53 +02:00
Miklos Szeredi	8b284dc472	fuse: writepages: handle same page rewrites As Maxim Patlasov pointed out, it's possible to get a dirty page while its copy is still under writeback, despite fuse_page_mkwrite() doing its thing (direct IO). This could result in two concurrent write requests for the same offset, with data corruption if they get mixed up. To prevent this, fuse needs to check and delay such writes. This implementation does this by: 1. check if page is still under writeout, if so create a new, single page secondary request for it 2. chain this secondary request onto the in-flight request 2/a. if a secondary request for the same offset was already chained to the in-flight request, then just copy the contents of the page and discard the new secondary request. This makes sure that each page will have at most two requests associated with it 3. when the in-flight request finished, send off all secondary requests chained onto it Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-10-01 16:44:53 +02:00
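Steps 1 to 3 of 8b284dc472 can be modelled in a few lines. This is a simplified sketch, not the fuse implementation: `WriteRequest`, `queue_write` and `complete_write` are hypothetical names, every request here covers a single page (the real in-flight request may carry several), and `sent` is just a list recording what would go to the server.

```python
class WriteRequest:
    """Hypothetical single-page write request."""

    def __init__(self, offset, data):
        self.offset = offset
        self.data = data
        self.secondary = []  # requests chained behind this in-flight one


def queue_write(in_flight, sent, offset, data):
    """Queue a page write, delaying it if the same offset is in flight."""
    primary = in_flight.get(offset)
    if primary is None:
        # Nothing in flight for this page: send immediately.
        in_flight[offset] = WriteRequest(offset, data)
        sent.append((offset, data))
        return
    for sec in primary.secondary:
        if sec.offset == offset:
            # Step 2/a: reuse the chained request, just refresh its data.
            sec.data = data
            return
    # Steps 1 and 2: create a secondary request and chain it.
    primary.secondary.append(WriteRequest(offset, data))


def complete_write(in_flight, sent, offset):
    """Step 3: the in-flight request finished; send its secondaries."""
    primary = in_flight.pop(offset)
    for sec in primary.secondary:
        in_flight[sec.offset] = sec
        sent.append((sec.offset, sec.data))


if __name__ == "__main__":
    in_flight, sent = {}, []
    queue_write(in_flight, sent, 0, b"A")
    queue_write(in_flight, sent, 0, b"B")  # chained as secondary
    queue_write(in_flight, sent, 0, b"C")  # overwrites the secondary's data
    complete_write(in_flight, sent, 0)
    print(sent)
```

In this model a page never has more than two requests associated with it (one in flight, one chained), which is the invariant the message states.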
Miklos Szeredi	1e112a484e	fuse: writepages: fix aggregation Checking against tmp-page indexes is not very useful, and results in one (or rarely two) page requests. Which is not much of an improvement... Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-10-01 16:44:53 +02:00
Maxim Patlasov	2d033eaa00	fuse: fix race in fuse_writepages() The patch fixes a race between ftruncate(2), mmap-ed write and write(2): 1) A user makes a page dirty via mmap-ed write. 2) The user performs shrinking truncate(2) intended to purge the page. 3) Before fuse_do_setattr calls truncate_pagecache, the page goes to writeback. fuse_writepages_fill attaches a new page to FUSE_WRITE request, then releases the original page by end_page_writeback and unlocks it. 4) fuse_do_setattr completes and successfully returns. Since now, i_mutex is free. 5) Ordinary write(2) extends i_size back to cover the page. Note that fuse_send_write_pages does wait for fuse writeback, but for another page->index. 6) fuse_writepages_fill attaches more pages to the request (if any), then fuse_writepages_send is eventually called. It is supposed to crop inarg->size of the request, but it doesn't because i_size has already been extended back. Moving end_page_writeback behind fuse_writepages_send guarantees that __fuse_release_nowrite (called from fuse_do_setattr) will crop inarg->size of the request before write(2) gets the chance to extend i_size. Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-10-01 16:44:53 +02:00
Pavel Emelyanov	26d614df1d	fuse: Implement writepages callback The .writepages one is required to make each writeback request carry more than one page on it. The patch enables optimized behaviour unconditionally, i.e. mmap-ed writes will benefit from the patch even if fc->writeback_cache=0. [SzM: simplify, add comments] Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-10-01 16:44:52 +02:00
Miklos Szeredi	72523425fb	fuse: don't BUG on no write file Don't bug if there's no writable files found for page writeback. If ever this is triggered, a WARN_ON helps debugging it much better than a BUG_ON. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-10-01 16:44:52 +02:00
Miklos Szeredi	cca2437045	fuse: lock page in mkwrite Lock the page in fuse_page_mkwrite() to protect against a race with fuse_writepage() where the page is redirtied before the actual writeback begins. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-10-01 16:44:51 +02:00
Pavel Emelyanov	385b126815	fuse: Prepare to handle multiple pages in writeback The .writepages callback will issue writeback requests with more than one page aboard. Make existing end/check code be aware of this. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-10-01 16:44:51 +02:00
Pavel Emelyanov	adcadfa8f3	fuse: Getting file for writeback helper There will be a .writepageS callback implementation which will need to get a fuse_file out of a fuse_inode, thus make a helper for this. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-10-01 16:44:50 +02:00
Miklos Szeredi	698fa1d163	fuse: no RCU mode in fuse_access() fuse_access() is never called in RCU walk, only on the final component of access(2) and chdir(2)... Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-10-01 16:41:23 +02:00
Miklos Szeredi	6314efee3c	fuse: readdirplus: fix RCU walk Doing dput(parent) is not valid in RCU walk mode. In RCU mode it would probably be okay to update the parent flags, but it's actually not necessary most of the time... So only set the FUSE_I_ADVISE_RDPLUS flag on the parent when the entry was recently initialized by READDIRPLUS. This is achieved by setting FUSE_I_INIT_RDPLUS on entries added by READDIRPLUS and only dropping out of RCU mode if this flag is set. FUSE_I_INIT_RDPLUS is cleared once the FUSE_I_ADVISE_RDPLUS flag is set in the parent. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org	2013-10-01 16:41:22 +02:00
Miklos Szeredi	3c70b8eeda	fuse: don't check_submounts_and_drop() in RCU walk If revalidate finds an invalid dentry in RCU walk mode, let the VFS deal with it instead of calling check_submounts_and_drop() which is not prepared for being called from RCU walk. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org	2013-10-01 16:41:22 +02:00
Linus Torvalds	f927318840	NFS client bugfixes for 3.12 - Stable fix for Oopses in the pNFS files layout driver - Fix a regression when doing a non-exclusive file create on NFSv4.x - NFSv4.1 security negotiation fixes when looking up the root filesystem - Fix a memory ordering issue in the pNFS files layout driver -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABAgAGBQJSSfNNAAoJEGcL54qWCgDybGYQAJGm4/vd7/rWZ49KIjGFGkFo sCt0UOK6Y6ALhUOIlIreXsQ+Iwn9aAoIIRgx8UwnB+hO6PGnSyFuJZZx1KE8V2kj 6JlE5FbsWV+3uFQzNJQsNcoj7NZMzIRZT7x+7QansBOdSQjgQc3ig2sAMWREZjn8 GxMOl8FNRrnP8gRom30ZScgMp1YDM8J1ql80S/nbxh2NOLBsvgg9VapzJhhqkMyl b7WKX4Qbg4AeSaxIAIrIwcZ7L2YS09JGC40VSybQARs0/7J8fjOZPs7CmrUCoB5F DmT5vfEC4+dqDf8PMyoFVfxK5ua5Sb/FGQmagYYa8bSgY7Uq03akYI++co+4PZU1 f3SN6CSvVffzGMdXAhUupOZQbkKvKFxR2MTGy8s7dxdkQudd4RioYPDmLfCHlbmb VY5kFh/Duqso1FCrcfvZoC88ElrWUz5yoVzZyECOEwCs1wjI6bjmGdSqCSbU75Lm Z0XOAn1cStwFvGwCbGZPUzlvueji3coDdCFPBXAOFHzisLYoo/Lxenw7l5D1qM5b 02iZllcIo340vw8wxHZxVebecFo33P90X1gjv0HQQkV/6EeNgq4D47SWTPxRq3Ai Dl9MFjTPl51oseDLrH6I/hBvcqjksB1M1+WjifT0bCIi3Y0HAea2U0wgweHS3vAd QHqIpIJxNHDjPBMDWEZW =ScfI -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: - Stable fix for Oopses in the pNFS files layout driver - Fix a regression when doing a non-exclusive file create on NFSv4.x - NFSv4.1 security negotiation fixes when looking up the root filesystem - Fix a memory ordering issue in the pNFS files layout driver * tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: Give "flavor" an initial value to fix a compile warning NFSv4.1: try SECINFO_NO_NAME flavs until one works NFSv4.1: Ensure memory ordering between nfs4_ds_connect and nfs4_fl_prepare_ds NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem method	2013-09-30 17:10:26 -07:00
tinguely@sgi.com	519ccb81ac	xfs: fix memory leak in xlog_recover_add_to_trans Free the memory in error path of xlog_recover_add_to_trans(). Normally this memory is freed in recovery pass2, but is leaked in the error path. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-09-30 17:52:43 -05:00
Dave Chinner	367993e7c6	xfs: dirent dtype presence is dependent on directory magic numbers The determination of whether a directory entry contains a dtype field originally was dependent on the filesystem having CRCs enabled. This meant that the format for dtype being enabled could be determined by checking the directory block magic number rather than doing a feature bit check. This was useful in that it meant that we didn't need to pass a struct xfs_mount around to functions that were already supplied with a directory block header. Unfortunately, the introduction of dtype fields into the v4 structure via a feature bit meant this "use the directory block magic number" method of discriminating the dirent entry sizes is broken. Hence we need to convert the places that use magic number checks to use feature bit checks so that they work correctly and not by chance. The current code works on v4 filesystems only because the dirent size roundup covers the extra byte needed by the dtype field in the places where this problem occurs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-09-30 17:49:28 -05:00
Dave Chinner	f112a04971	xfs: lockdep needs to know about 3 dquot-deep nesting Michael Semon reported that xfs/299 generated this lockdep warning: ============================================= [ INFO: possible recursive locking detected ] 3.12.0-rc2+ #2 Not tainted --------------------------------------------- touch/21072 is trying to acquire lock: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 but task is already holding lock: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&xfs_dquot_other_class); lock(&xfs_dquot_other_class); * DEADLOCK * May be due to missing lock nesting notation 7 locks held by touch/21072: #0: (sb_writers#10){++++.+}, at: [<c11185b6>] mnt_want_write+0x1e/0x3e #1: (&type->i_mutex_dir_key#4){+.+.+.}, at: [<c11078ee>] do_last+0x245/0xe40 #2: (sb_internal#2){++++.+}, at: [<c122c9e0>] xfs_trans_alloc+0x1f/0x35 #3: (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [<c126cd1b>] xfs_ilock+0x100/0x1f1 #4: (&(&ip->i_lock)->mr_lock){++++-.}, at: [<c126cf52>] xfs_ilock_nowait+0x105/0x22f #5: (&dqp->q_qlock){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 #6: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 The lockdep annotation for dquot lock nesting only understands locking for user and "other" dquots, not user, group and quota dquots. Fix the annotations to match the locking hierarchy we now have. Reported-by: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2013-09-30 17:48:25 -05:00
Linus Torvalds	522d6d38f8	Merge branch 'akpm' (fixes from Andrew Morton) Merge misc fixes from Andrew Morton. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits) pidns: fix free_pid() to handle the first fork failure ipc,msg: prevent race with rmid in msgsnd,msgrcv ipc/sem.c: update sem_otime for all operations mm/hwpoison: fix the lack of one reference count against poisoned page mm/hwpoison: fix false report on 2nd attempt at page recovery mm/hwpoison: fix test for a transparent huge page mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood block: change config option name for cmdline partition parsing mm/mlock.c: prevent walking off the end of a pagetable in no-pmd configuration mm: avoid reinserting isolated balloon pages into LRU lists arch/parisc/mm/fault.c: fix uninitialized variable usage include/asm-generic/vtime.h: avoid zero-length file nilfs2: fix issue with race condition of competition between segments for dirty blocks Documentation/kernel-parameters.txt: replace kernelcore with Movable mm/bounce.c: fix a regression where MS_SNAP_STABLE (stable pages snapshotting) was ignored kernel/kmod.c: check for NULL in call_usermodehelper_exec() ipc/sem.c: synchronize the proc interface ipc/sem.c: optimize sem_lock() ipc/sem.c: fix race in sem_lock() mm/compaction.c: periodically schedule when freeing pages ...	2013-09-30 14:32:32 -07:00
Vyacheslav Dubeyko	7f42ec3941	nilfs2: fix issue with race condition of competition between segments for dirty blocks Many NILFS2 users were reported about strange file system corruption (for example): NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768 NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540) But such error messages are consequence of file system's issue that takes place more earlier. Fortunately, Jerome Poulin <jeromepoulin@gmail.com> and Anton Eliasson <devel@antoneliasson.se> were reported about another issue not so recently. These reports describe the issue with segctor thread's crash: BUG: unable to handle kernel paging request at 0000000000004c83 IP: nilfs_end_page_io+0x12/0xd0 [nilfs2] Call Trace: nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2] nilfs_segctor_construct+0x17b/0x290 [nilfs2] nilfs_segctor_thread+0x122/0x3b0 [nilfs2] kthread+0xc0/0xd0 ret_from_fork+0x7c/0xb0 These two issues have one reason. This reason can raise third issue too. Third issue results in hanging of segctor thread with eating of 100% CPU. REPRODUCING PATH: One of the possible way or the issue reproducing was described by Jerome Poulin <jeromepoulin@gmail.com>: 1. init S to get to single user mode. 2. sysrq+E to make sure only my shell is running 3. start network-manager to get my wifi connection up 4. login as root and launch "screen" 5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies. 6. lscp | xz -9e > lscp.txt.xz 7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs 8. start a screen to dump /proc/kmsg to text file since rsyslog is killed 9. start a screen and launch strace -f -o find-cat.log -t find /mnt/nilfs -type f -exec cat {} > /dev/null \; 10. start a screen and launch strace -f -o apt-get.log -t apt-get update 11. launch the last command again as it did not crash the first time 12. apt-get crashes 13. ps aux > ps-aux-crashed.log 13. sysrq+W 14. 
sysrq+E wait for everything to terminate 15. sysrq+SUSB Simplified way of the issue reproducing is starting kernel compilation task and "apt-get update" in parallel. REPRODUCIBILITY: The issue is reproduced not stable [60% - 80%]. It is very important to have proper environment for the issue reproducing. The critical conditions for successful reproducing: (1) It should have big modified file by mmap() way. (2) This file should have the count of dirty blocks are greater that several segments in size (for example, two or three) from time to time during processing. (3) It should be intensive background activity of files modification in another thread. INVESTIGATION: First of all, it is possible to see that the reason of crash is not valid page address: NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82 NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783 Moreover, value of b_page (0x1a82) is 6786. This value looks like segment number. And b_blocknr with b_size values look like block numbers. So, buffer_head's pointer points on not proper address value. 
Detailed investigation of the issue is discovered such picture: [-----------------------------SEGMENT 6783-------------------------------] NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111149024, segbuf->sb_segnum 6783 [-----------------------------SEGMENT 6784-------------------------------] NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824 NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff8802174a6798, bh->b_assoc_buffers.prev ffff880221cffee8 NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824 NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784 NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50 NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0 [----------] ditto NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111164416, segbuf->sb_segnum 6784, 
segbuf->sb_nbio 15 [-----------------------------SEGMENT 6785-------------------------------] NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824 NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff880219277e80, bh->b_assoc_buffers.prev ffff880221cffc88 NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824 NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785 NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8 NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0 [----------] ditto NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12 NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783 NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784 NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785 NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82 BUG: unable to handle kernel paging request at 0000000000001a82 IP: [<ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2] Usually, for every segment we collect dirty files in list. Then, dirty blocks are gathered for every dirty file, prepared for write and submitted by means of nilfs_segbuf_submit_bh() call. 
Finally, it takes place complete write phase after calling nilfs_end_bio_write() on the block layer. Buffers/pages are marked as not dirty on final phase and processed files removed from the list of dirty files. It is possible to see that we had three prepare_write and submit_bio phases before segbuf_wait and complete_write phase. Moreover, segments compete between each other for dirty blocks because on every iteration of segments processing dirty buffer_heads are added in several lists of payload_buffers: [SEGMENT 6784]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50 [SEGMENT 6785]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8 The next pointer is the same but prev pointer has changed. It means that buffer_head has next pointer from one list but prev pointer from another. Such modification can be made several times. And, finally, it can be resulted in various issues: (1) segctor hanging, (2) segctor crashing, (3) file system metadata corruption. FIX: This patch adds: (1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write() for every processed dirty block; (2) checking of BH_Async_Write flag in nilfs_lookup_dirty_data_buffers() and nilfs_lookup_dirty_node_buffers(); (3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(), nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page(). 
Reported-by: Jerome Poulin <jeromepoulin@gmail.com> Reported-by: Anton Eliasson <devel@antoneliasson.se> Cc: Paul Fertser <fercerpav@gmail.com> Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp> Cc: Piotr Szymaniak <szarpaj@grubelek.pl> Cc: Juan Barry Manuel Canham <Linux@riotingpacifist.net> Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com> Cc: Elmer Zhang <freeboy6716@gmail.com> Cc: Kenneth Langga <klangga@gmail.com> Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-30 14:31:02 -07:00
Dan Aloni	7202365696	fs/binfmt_elf.c: prevent a coredump with a large vm_map_count from Oopsing A high setting of max_map_count, and a process core-dumping with a large enough vm_map_count could result in an NT_FILE note not being written, and the kernel crashing immediately later because it has assumed otherwise. Reproduction of the oops-causing bug described here: https://lkml.org/lkml/2013/8/30/50 The issue originated in commit `2aa362c49c` ("coredump: extend core dump note section to contain file names of mapped file") from Oct 4, 2012. This patch makes that section optional in that case. fill_files_note() should signify the error, and also let the info struct in elf_core_dump() be zero-initialized so that we can check for the optionally written note. [akpm@linux-foundation.org: avoid abusing E2BIG, remove a couple of not-really-needed local variables] [akpm@linux-foundation.org: fix sparse warning] Signed-off-by: Dan Aloni <alonid@stratoscale.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Denys Vlasenko <vda.linux@googlemail.com> Reported-by: Martin MOKREJS <mmokrejs@gmail.com> Tested-by: Martin MOKREJS <mmokrejs@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-30 14:31:01 -07:00
Al Viro	13f3583892	afs: dget_parent() can't return a negative dentry Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-09-29 22:02:24 -04:00
Al Viro	7b9a2378b4	ocfs2: needs ->d_lock to poke in ->d_parent->d_inode from ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-09-29 22:02:20 -04:00
Lubomir Rintel	4947555584	sysv: Add forgotten superblock lock init for v7 fs Superblock lock was replaced with (un)lock_super() removal, but left uninitialized for Seventh Edition UNIX filesystem in the following commit (3.7): `c07cb01` sysv: drop lock/unlock super Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-09-29 22:02:02 -04:00
Greg Kroah-Hartman	88502b9c0a	Merge 3.12-rc3 into driver-core-next We want the driver core and sysfs fixes in here to make merges and development easier. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-09-29 18:29:23 -07:00
Anna Schumaker	367156d9a8	NFS: Give "flavor" an initial value to fix a compile warning The previous patch introduces a compile warning by not assigning an initial value to the "flavor" variable. This could only be a problem if the server returns a supported secflavor list of length zero, but it's better to fix this before it's ever hit. Signed-off-by: Anna Schumaker <bjschuma@netapp.com> Acked-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-29 16:03:34 -04:00
Weston Andros Adamson	58a8cf1212	NFSv4.1: try SECINFO_NO_NAME flavs until one works Call nfs4_lookup_root_sec for each flavor returned by SECINFO_NO_NAME until one works. One example of a situation this fixes: - server configured for krb5 - server principal somehow gets deleted from KDC - server still thinking krb is good, sends krb5 as first entry in SECINFO_NO_NAME response - client tries krb5, but this fails without even sending an RPC because gssd's requests to the KDC can't find the server's principal Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-09-29 16:03:34 -04:00


33974 Commits (9491846fca57e9326b6673716c386b76fc13ebca)