
415 Commits (192a3697600382c5606fc1b2c946e737c5450f88)

Author SHA1 Message Date
Trond Myklebust 2e5b29f044 pNFS/flexfiles: Don't prevent flexfiles client from retrying LAYOUTGET
Fix a bug in which flexfiles clients are falling back to I/O through the
MDS even when the FF_FLAGS_NO_IO_THRU_MDS flag is set.

The flexfiles client will always report errors through the LAYOUTRETURN
and/or LAYOUTERROR mechanisms, so it should normally be safe for it
to retry the LAYOUTGET until it fails or succeeds.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:32:40 -05:00
Peng Tao 0bcbf039f6 nfs: handle request add failure properly
When we fail to queue a read page to IO descriptor,
we need to clean it up otherwise it is hanging around
preventing nfs module from being removed.

When we fail to queue a write page to IO descriptor,
we need to clean it up and also save the failure status
to open context. Then at file close, we can try to write
pages back again and drop the page if it fails to writeback
in .launder_page, which will be done in the next patch.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:32:37 -05:00
Peng Tao 2bff228857 nfs: centralize pgio error cleanup
In case we fail during setting things up for read/write IO, set
pg_error in IO descriptor and do the cleanup in nfs_pageio_add_request,
where we clean up all pages that are still hanging around on the IO
descriptor.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:32:37 -05:00
Peng Tao d600ad1f2b NFS41: pop some layoutget errors to application
For ERESTARTSYS/EIO/EROFS/ENOSPC/E2BIG in layoutget, we
should just bail out instead of hiding the error and
retrying inband IO.

Change all the call sites to pop the error all the way up.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:32:36 -05:00
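
The error handling described in the commit above can be sketched in user-space C. This is only an illustration: the helper name layoutget_error_is_fatal is hypothetical, and ERESTARTSYS is defined locally because it is a kernel-internal code that user-space errno.h does not provide.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef ERESTARTSYS
#define ERESTARTSYS 512 /* kernel-internal errno, defined here for the sketch */
#endif

/*
 * Hypothetical helper: returns true when a LAYOUTGET failure should be
 * passed up to the caller instead of being hidden by a retry of in-band
 * I/O through the MDS. The list of codes comes from the commit message.
 */
bool layoutget_error_is_fatal(int err)
{
    switch (err) {
    case -ERESTARTSYS:
    case -EIO:
    case -EROFS:
    case -ENOSPC:
    case -E2BIG:
        return true;
    default:
        return false;
    }
}
```

A call site would check this first and return the error unchanged when it is true, falling back to MDS I/O only for the remaining cases.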
Trond Myklebust f4848303ce pNFS: Modify pnfs_update_layout tracepoints to use layout stateid
Instead of displaying a layout segment pointer in these tracepoints,
let's use the layout stateid, now that Olga gave us a set of tools for
displaying them.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 09:57:14 -05:00
Jeff Layton 9a4bf31d05 nfs: add new tracepoint for pnfs_update_layout
pnfs_update_layout is really the "nexus" of layout handling. If it
returns NULL then we end up going through the MDS. This patch adds
some tracepoints to that function that allow us to determine the
cause when we end up going through the MDS unexpectedly.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 09:57:14 -05:00
Peter Zijlstra dfd01f0260 sched/wait: Fix the signal handling fix
Jan Stancek reported that I wrecked things for him by fixing things for
Vladimir :/

His report was due to an UNINTERRUPTIBLE wait getting -EINTR, which
should not be possible, however my previous patch made this possible by
unconditionally checking signal_pending().

We cannot use current->state as was done previously, because that
variable can be changed by the very next instruction after the store to
it.  We must instead pass the initial state along and use that.

Fixes: 68985633bc ("sched/wait: Fix signal handling in bit wait helpers")
Reported-by: Jan Stancek <jstancek@redhat.com>
Reported-by: Chris Mason <clm@fb.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Chris Mason <clm@fb.com>
Reviewed-by: Paul Turner <pjt@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: tglx@linutronix.de
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: hpa@zytor.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-13 14:30:59 -08:00
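
The idea in the sched/wait commit above, deciding on -EINTR from the wait mode that was passed in rather than from mutable task state, can be sketched as follows. This is a simplified user-space model, not the kernel code: the enum and the function name bit_wait_result are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's task state values. */
enum wait_mode {
    WAIT_INTERRUPTIBLE = 1,
    WAIT_UNINTERRUPTIBLE = 2,
};

/*
 * Compute the wait helper's return value from the mode the caller
 * originally asked for. An uninterruptible wait never reports -EINTR,
 * even when a signal happens to be pending.
 */
int bit_wait_result(enum wait_mode mode, bool signal_is_pending)
{
    if (mode == WAIT_INTERRUPTIBLE && signal_is_pending)
        return -EINTR;
    return 0;
}
```

Because the mode is a plain argument, nothing that runs between going to sleep and waking up can change the outcome of the check.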
Jeff Layton 4f2e9dce0c nfs4: resend LAYOUTGET when there is a race that changes the seqid
pnfs_layout_process will check the returned layout stateid against what
the kernel has in-core. If it turns out that the stateid we received is
older, then we should resend the LAYOUTGET instead of falling back to
MDS I/O.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-11-25 15:32:13 -05:00
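
The stateid check described in the commit above can be sketched as a wraparound-tolerant sequence comparison. This is a minimal illustration; the names seqid_is_newer and on_layoutget_reply are hypothetical, and treating an equal seqid as usable is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if seqid a is newer than seqid b, tolerating 32-bit wraparound. */
bool seqid_is_newer(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

enum layoutget_action {
    USE_LAYOUT,
    RESEND_LAYOUTGET,
};

/*
 * Compare the seqid in the LAYOUTGET reply with the one held in-core.
 * A reply that is older than the in-core state lost a race, so the
 * request is resent instead of falling back to MDS I/O.
 */
enum layoutget_action on_layoutget_reply(uint32_t received, uint32_t in_core)
{
    if (seqid_is_newer(in_core, received))
        return RESEND_LAYOUTGET;
    return USE_LAYOUT;
}
```

The signed difference keeps the comparison correct when the counter wraps from 0xFFFFFFFF back to small values.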
Kinglong Mee f8417b481c NFSv4.1/pnfs: Retry through MDS when getting bad length of data
If a non-RPC-based layout driver returns a bad length of data, nfs retries
by calling rpc_restart_call_prepare(), which causes a NULL pointer
dereference panic.

This patch makes nfs retry through the MDS when a non-RPC-based layout
driver returns a bad length of data.

[13034.883329] BUG: unable to handle kernel NULL pointer dereference at           (null)
[13034.884902] IP: [<ffffffffa00db372>] rpc_restart_call_prepare+0x62/0x90 [sunrpc]
[13034.886558] PGD 0
[13034.888126] Oops: 0000 [#1] KASAN
[13034.889710] Modules linked in: blocklayoutdriver(OE) nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c coretemp btrfs crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev vmw_balloon auth_rpcgss shpchp nfs_acl lockd vmw_vmci parport_pc xor raid6_pq grace parport sunrpc i2c_piix4 vmwgfx drm_kms_helper ttm drm mptspi e1000 serio_raw scsi_transport_spi mptscsih mptbase ata_generic pata_acpi [last unloaded: fscache]
[13034.898260] CPU: 0 PID: 10112 Comm: kworker/0:1 Tainted: G           OE   4.3.0-rc5+ #279
[13034.899932] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[13034.903342] Workqueue: events bl_read_cleanup [blocklayoutdriver]
[13034.905059] task: ffff88006a9148c0 ti: ffff880035e90000 task.ti: ffff880035e90000
[13034.906827] RIP: 0010:[<ffffffffa00db372>]  [<ffffffffa00db372>] rpc_restart_call_prepare+0x62/0x90 [sunrpc]
[13034.910522] RSP: 0018:ffff880035e97b58  EFLAGS: 00010282
[13034.912378] RAX: fffffbfff04a5a94 RBX: ffff880068fe4858 RCX: 0000000000000003
[13034.914339] RDX: dffffc0000000000 RSI: 0000000000000003 RDI: 0000000000000282
[13034.916236] RBP: ffff880035e97b68 R08: 0000000000000001 R09: 0000000000000001
[13034.918229] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[13034.920007] R13: ffff880068fe4858 R14: ffff880068fe4a60 R15: 0000000000001000
[13034.921845] FS:  0000000000000000(0000) GS:ffffffff82247000(0000) knlGS:0000000000000000
[13034.923645] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13034.925525] CR2: 0000000000000000 CR3: 00000000063dd000 CR4: 00000000001406f0
[13034.932808] Stack:
[13034.934813]  ffff880068fe4780 0000000000001000 ffff880035e97ba8 ffffffffa08800d2
[13034.936675]  ffffffffa088029d ffff880068fe4780 ffff880068fe4858 ffffffffa089c0a0
[13034.938593]  ffff880068fe47e0 ffff88005d59faf0 ffff880035e97be0 ffffffffa087e08f
[13034.940454] Call Trace:
[13034.942388]  [<ffffffffa08800d2>] nfs_readpage_result+0x112/0x200 [nfs]
[13034.944317]  [<ffffffffa088029d>] ? nfs_readpage_done+0xdd/0x160 [nfs]
[13034.946267]  [<ffffffffa087e08f>] nfs_pgio_result+0x9f/0x120 [nfs]
[13034.948166]  [<ffffffffa09266cc>] pnfs_ld_read_done+0x7c/0x1e0 [nfsv4]
[13034.950247]  [<ffffffffa03b07ee>] bl_read_cleanup+0x2e/0x60 [blocklayoutdriver]
[13034.952156]  [<ffffffff810ebf62>] process_one_work+0x412/0x870
[13034.954102]  [<ffffffff810ebe84>] ? process_one_work+0x334/0x870
[13034.955949]  [<ffffffff810ebb50>] ? queue_delayed_work_on+0x40/0x40
[13034.957985]  [<ffffffff810ec441>] worker_thread+0x81/0x6a0
[13034.959817]  [<ffffffff810ec3c0>] ? process_one_work+0x870/0x870
[13034.961785]  [<ffffffff810f43bd>] kthread+0x17d/0x1a0
[13034.963544]  [<ffffffff810f4240>] ? kthread_create_on_node+0x330/0x330
[13034.965479]  [<ffffffff81100428>] ? finish_task_switch+0x88/0x220
[13034.967223]  [<ffffffff810f4240>] ? kthread_create_on_node+0x330/0x330
[13034.968929]  [<ffffffff81b6ae5f>] ret_from_fork+0x3f/0x70
[13034.970534]  [<ffffffff810f4240>] ? kthread_create_on_node+0x330/0x330
[13034.972176] Code: c7 43 50 40 84 0d a0 e8 3d fe 1c e1 48 8d 7b 58 c7 83 e4 00 00 00 00 00 00 00 e8 ca fe 1c e1 4c 8b 63 58 4c 89 e7 e8 be fe 1c e1 <49> 83 3c 24 00 74 12 48 c7 43 50 f0 a2 0e a0 b8 01 00 00 00 5b
[13034.977148] RIP  [<ffffffffa00db372>] rpc_restart_call_prepare+0x62/0x90 [sunrpc]
[13034.978780]  RSP <ffff880035e97b58>
[13034.980399] CR2: 0000000000000000

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-21 15:55:47 -05:00
Peng Tao 500d701f33 NFS41: make close wait for layoutreturn
If we send a layoutreturn asynchronously before close, the close
might reach server first and layoutreturn would fail with BADSTATEID
because there is nothing keeping the layout stateid alive.

Also do not pretend to be sending a layoutreturn when we are not.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-23 08:55:32 -04:00
Trond Myklebust 2d89a1d3c9 NFSv4.1/pNFS: Don't request a minimal read layout beyond the end of file
If we have a read layout, then sanity check the minimal layout length
so that it does not extend beyond the end of file.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-31 02:05:47 -07:00
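
The sanity check from the commit above can be sketched as a clamp on the requested minimal length. This is an illustration only: the function name clamp_read_minlength is hypothetical, and returning 0 for an offset at or beyond end of file is an assumption of the sketch.

```c
#include <stdint.h>

/*
 * Limit the minimal length of a read layout request so that
 * offset + length never extends past the end of the file.
 */
uint64_t clamp_read_minlength(uint64_t offset, uint64_t minlength,
                              uint64_t file_size)
{
    if (offset >= file_size)
        return 0; /* nothing to read at or beyond end of file */
    if (minlength > file_size - offset)
        return file_size - offset;
    return minlength;
}
```

Comparing against file_size - offset, rather than adding offset and minlength, avoids overflow for large 64-bit values.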
Trond Myklebust 4ae93560b1 NFSv4.1/pnfs: Don't ask for a read layout for an empty file.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-31 01:33:12 -07:00
Trond Myklebust 0bdb8fa6ec NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return must notify of layout return
It's not sufficient to just mark the layout segment for layout return. We
also need to set the NFS_LAYOUT_RETURN_BEFORE_CLOSE flag in the layout header.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-27 19:17:33 -04:00
Trond Myklebust 03772d2f00 NFSv4.1/pnfs: Allow pNFS device drivers to customise layout segment insertion
This is needed in order to allow merging of contiguous layout segments,
and also to correct the ordering of layouts for those device drivers that
don't necessarily want to place the read-write layouts first.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25 19:42:43 -04:00
Trond Myklebust 540d9864e1 NFSv4.1/pnfs: Add sanity check for the layout range returned by the server
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25 14:40:10 -04:00
Trond Myklebust bbf58bf348 NFSv4.2/pnfs: Make the layoutstats timer configurable
Allow advanced users to set the layoutstats timer in order to lengthen
or shorten the period between layoutstat transmissions to the server.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25 14:40:08 -04:00
Peng Tao 3976143b06 NFS41: remove NFS_LAYOUT_ROC flag
If we return the delegation before closing, we fail to do the roc check
during close because NFS_LAYOUT_ROC is cleared by delegreturn. This
leaves layouts hanging around after delegreturn + close, which is a
violation of the protocol.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25 14:40:06 -04:00
Trond Myklebust 6a463beb9a NFSv4.1/pnfs: Add a tracepoint for return-on-close events
Allow tracing of return-on-close.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25 14:40:05 -04:00
Trond Myklebust c740624989 pNFS: Fix an unused variable warning in pnfs_roc_get_barrier
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-19 23:01:53 -05:00
Peng Tao e755d638e9 NFS41: make sure sending LAYOUTRETURN before close if marked so
If layout is marked by NFS_LAYOUT_RETURN_BEFORE_CLOSE, we should always
send LAYOUTRETURN before close, and we don't need to do ROC drain if we
do send LAYOUTRETURN.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-19 10:29:25 -05:00
Trond Myklebust 4ff376feaf NFSv4.1/pnfs: Fix a close/delegreturn hang when return-on-close is set
The helper pnfs_roc() has already verified that we have no delegations,
and no further open files, hence no outstanding I/O and it has marked
all the return-on-close lsegs as being invalid.
Furthermore, it sets the NFS_LAYOUT_RETURN bit, thus serialising the
close/delegreturn with all future layoutget calls on this inode.

The checks in pnfs_roc_drain() for valid layout segments are therefore
redundant: those cannot exist until another layoutget completes.
The other check for whether or not NFS_LAYOUT_RETURN is set, actually
causes a hang, since we already know that we hold that flag.

To fix, we therefore strip out all the functionality in pnfs_roc_drain()
except the retrieval of the barrier state, and then rename the function
accordingly.

Reported-by: Christoph Hellwig <hch@infradead.org>
Fixes: 5c4a79fb2b ("Don't prevent layoutgets when doing return-on-close")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-18 23:23:21 -05:00
Trond Myklebust 58830550f0 NFSv4.1/pnfs: Remove redundant wakeup in pnfs_send_layoutreturn()
pnfs_clear_layoutreturn_waitbit() should already be calling
rpc_wake_up(&NFS_SERVER(ino)->roc_rpcwaitq) for us.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12 14:56:19 -04:00
Trond Myklebust e1c06f80dc NFSv4.1/pnfs: Remove redundant check in pnfs_layoutgets_blocked()
layoutget now should already be serialised w.r.t. layout returns

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12 14:56:19 -04:00
Trond Myklebust 2d8ae84fbc NFSv4.1/pnfs: Remove redundant lo->plh_block_lgets in layoutreturn
The NFS_LAYOUT_RETURN bit already suffices to ensure that layoutget
is blocked.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12 14:56:19 -04:00
Trond Myklebust 5c4a79fb2b NFSv4.1/pnfs: Don't prevent layoutgets when doing return-on-close
If there is an outstanding return-on-close, then we just want new
layoutget requests to wait rather than fail.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12 14:56:19 -04:00
Trond Myklebust 8f70f53a87 NFSv4.1/pnfs: Fix serialisation of layout return and layoutget
We should always test for outstanding layout returns, whether or not
pnfs_should_retry_layoutget() is true.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12 14:56:19 -04:00
Trond Myklebust a4497a58e4 NFSv4.1/pnfs: Remove redundant checks in pnfs_layoutgets_blocked()
If there are no valid layout segments, then we should already have
checked in pnfs_update_layout() whether or not this is the first
layoutget.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12 14:56:18 -04:00
Trond Myklebust c8ad8894e9 NFSv4.2/pnfs: Use GFP_NOIO for layoutstat reporting in the writeback path
Prevent a potential deadlock.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12 14:27:23 -04:00
Jeff Layton 3471648a75 nfs: plug memory leak when ->prepare_layoutcommit fails
"data" is currently leaked when the prepare_layoutcommit operation
returns an error. Put the cred before taking the spinlock in that
case, take the lock and then goto out_unlock which will drop the
lock and then free "data".

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-28 09:07:02 -04:00
Trond Myklebust faa4a54f0b pNFS: Don't throw out valid layout segments
It is OK for layout segments to remain hashed even if no-one holds any
references to them, provided that the segments are still valid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-11 16:16:17 +02:00
Trond Myklebust bdc59cf233 pNFS: pnfs_roc_drain() fix a race with open
If a process reopens the file before we can send off the CLOSE/DELEGRETURN,
then pnfs_roc_drain() may end up waiting for a new set of layout segments
that are marked as return-on-close, but haven't yet been returned.

Fix this by only waiting for those layout segments that were invalidated in
pnfs_roc().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-11 16:16:17 +02:00
Trond Myklebust 7f27392cd4 pNFS: Fix races between return-on-close and layoutreturn.
If one or more of the layout segments reports an error during I/O, then
we may have to send a layoutreturn to report the error back to the NFS
metadata server.
This patch ensures that the return-on-close code can detect the
outstanding layoutreturn, and not preempt it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-11 16:16:16 +02:00
Trond Myklebust df9cecc1a3 pNFS: pnfs_roc_drain should return 'true' when sleeping
Also clean up the case where we don't find a return-on-close layout segment.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-11 16:16:16 +02:00
Trond Myklebust 6c5a0d8915 NFSv4.2: LAYOUTSTATS is optional to implement
Make it so, by checking the return value for NFS4ERR_NOTSUPP and
caching the information as a server capability.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-27 11:48:58 -04:00
Peng Tao 865a7ecb21 nfs: provide pnfs_report_layoutstat when NFS42 is disabled
kbuild test robot reported:
   fs/built-in.o: In function `pnfs_report_layoutstat':
>> (.text+0x151a1c): undefined reference to `nfs42_proc_layoutstats_generic'

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-26 14:01:37 -04:00
Peng Tao 1bfe3b259f nfs42: serialize LAYOUTSTATS calls of the same file
There is no need to report concurrently.

Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-24 10:53:11 -04:00
Peng Tao 8733408d6e pnfs: add pnfs_report_layoutstat helper function
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-24 10:17:37 -04:00
Trond Myklebust c70701131f NFS: Ensure we set NFS_CONTEXT_RESEND_WRITES when requeuing writes
If a write attempt fails, and the write is queued up for resending to
the server, as opposed to being dropped, then we need to set the
appropriate flag so that nfs_file_fsync() does the right thing.

Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-17 20:00:42 -04:00
Trond Myklebust 1ca018d28d pNFS: Fix a memory leak when attempted pnfs fails
pnfs_do_write() expects the call to pnfs_write_through_mds() to free the
pgio header and to release the layout segment before exiting. The problem
is that nfs_pgio_data_destroy() doesn't actually do this; it only frees
the memory allocated by nfs_generic_pgio().

Ditto for pnfs_do_read()...

Fix in both cases is to add a call to hdr->release(hdr).

Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-17 20:00:26 -04:00
Trond Myklebust 21330b6670 Merge branch 'bugfixes'
* bugfixes:
  NFSv4: Return delegations synchronously in evict_inode
  SUNRPC: Fix a regression when reconnecting
  NFS: remount with security change should return EINVAL
  nfs: do not export discarded symbols
  NFSv4.1: don't export static symbol
2015-04-23 15:16:27 -04:00
Trond Myklebust 5bb89b4702 NFSv4.1/pnfs: Separate out metadata and data consistency for pNFS
The LAYOUTCOMMIT operation means different things to different layout types.
For blocks and objects, it is both a data and metadata consistency operation.
For files and flexfiles, it is only a metadata consistency operation.

This patch separates out the 2 cases, allowing the files/flexfiles layout
drivers to optimise away the data consistency calls to layoutcommit.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-03-27 12:39:38 -04:00
Trond Myklebust 7140171ea9 NFSv4.1/pnfs: Ensure we send layoutcommit before return-on-close
We must not send a close or delegreturn that would result in a
return-on-close of the layout without ensuring that we've also
sent the necessary layoutcommit.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-03-27 12:39:38 -04:00
Trond Myklebust 67af7611ec NFSv4.1/pnfs: Refactor pnfs_set_layoutcommit()
pnfs_set_layoutcommit() and pnfs_commit_set_layoutcommit() are 100% identical
except for the function arguments. Refactor to eliminate the difference.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-03-27 12:39:36 -04:00
Trond Myklebust 29559b11ae NFSv4.1/pnfs: Fix setting of layoutcommit last write byte
If the NFS_INO_LAYOUTCOMMIT flag was unset, then we _must_ ensure that
we also reset the last write byte (lwb) for that layout. The current
code depends on us clearing the lwb when we clear NFS_INO_LAYOUTCOMMIT,
which is not the case when we call pnfs_clear_layoutcommit().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-03-27 12:39:35 -04:00
Julia Lawall 5b833825fd NFSv4.1: don't export static symbol
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r@
type T;
identifier f;
@@

static T f (...) { ... }

@@
identifier r.f;
declarer name EXPORT_SYMBOL_GPL;
@@

-EXPORT_SYMBOL_GPL(f);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-03-12 11:53:11 -04:00
Dan Carpenter 4c21462acc pnfs: delete an unintended goto
There was an extra goto here where it shouldn't be, because of a merge
error.

Fixes: e2c63e091e ('Merge branch 'flexfiles'')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-10 08:41:23 -05:00
Trond Myklebust 4ef2e4f84c NFSv4.1: Fix pnfs_put_lseg races
pnfs_layoutreturn_free_lseg_async() can also race with inode put in
the general case. We can now fix this, and also simplify the code.

Cc: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-05 23:44:18 -05:00
Trond Myklebust e4af440aaf NFSv4.1: pnfs_send_layoutreturn should use GFP_NOFS
If we want to be able to call pnfs_send_layoutreturn() from within the
writeback path, we really want it to use GFP_NOFS in order to prevent
recursion.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-05 22:16:50 -05:00
Trond Myklebust e2c63e091e Merge branch 'flexfiles'
* flexfiles: (53 commits)
  pnfs: lookup new lseg at lseg boundary
  nfs41: .init_read and .init_write can be called with valid pg_lseg
  pnfs: Update documentation on the Layout Drivers
  pnfs/flexfiles: Add the FlexFile Layout Driver
  nfs: count DIO good bytes correctly with mirroring
  nfs41: wait for LAYOUTRETURN before retrying LAYOUTGET
  nfs: add a helper to set NFS_ODIRECT_RESCHED_WRITES to direct writes
  nfs41: add NFS_LAYOUT_RETRY_LAYOUTGET to layout header flags
  nfs/flexfiles: send layoutreturn before freeing lseg
  nfs41: introduce NFS_LAYOUT_RETURN_BEFORE_CLOSE
  nfs41: allow async version layoutreturn
  nfs41: add range to layoutreturn args
  pnfs: allow LD to ask to resend read through pnfs
  nfs: add nfs_pgio_current_mirror helper
  nfs: only reset desc->pg_mirror_idx when mirroring is supported
  nfs41: add a debug warning if we destroy an unempty layout
  pnfs: fail comparison when bucket verifier not set
  nfs: mirroring support for direct io
  nfs: add mirroring support to pgio layer
  pnfs: pass ds_commit_idx through the commit path
  ...

Conflicts:
	fs/nfs/pnfs.c
	fs/nfs/pnfs.h
2015-02-03 16:01:27 -05:00
Weston Andros Adamson 7c13789e3e pnfs: lookup new lseg at lseg boundary
Before mirroring support was added, the pageio descriptor's pg_lseg was
set to null when an RPC was sent. Because of this, pg_init was called
at lseg boundaries with pg_lseg = NULL, and it could be set to the new
lseg.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
2015-02-03 11:06:54 -08:00
Peng Tao cb5d04bc39 nfs41: .init_read and .init_write can be called with valid pg_lseg
With pgio refactoring in v3.15, .init_read and .init_write can be
called with valid pgio->pg_lseg. file layout was fixed at that time
by commit c6194271f (pnfs: filelayout: support non page aligned
layouts). But the generic helper still needs to be fixed.

Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
2015-02-03 11:06:53 -08:00
Tom Haynes d67ae825a5 pnfs/flexfiles: Add the FlexFile Layout Driver
The flexfile layout is a new layout that extends the
file layout. It is currently being drafted as a specification at
https://datatracker.ietf.org/doc/draft-ietf-nfsv4-layout-types/

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Tao Peng <bergwolf@primarydata.com>
2015-02-03 11:06:52 -08:00
Peng Tao aa8a45ee97 nfs41: wait for LAYOUTRETURN before retrying LAYOUTGET
Also take care to stop waiting if someone clears retry bit.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
2015-02-03 11:06:51 -08:00
Peng Tao c829013dca nfs41: add NFS_LAYOUT_RETRY_LAYOUTGET to layout header flags
Use it to indicate that LD wants to retry layoutget. LD can set
it whenever it wants the common pnfs code to return and retry
pnfs path through a new layout.

The bit gets cleared when client does a new layoutget, when client
closes the file (ROC case), or when kernel needs to evict the inode
(non-ROC case).

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
2015-02-03 11:06:50 -08:00
Peng Tao 27b6f53987 nfs/flexfiles: send layoutreturn before freeing lseg
Otherwise we'll lose error tracking information when
encoding layoutreturn.

pnfs_put_lseg may be called from rpc callbacks. So we should not
call pnfs_send_layoutreturn directly because it can deadlock in
the rpc layer.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03 11:06:50 -08:00
Peng Tao 193e3aa2cc nfs41: introduce NFS_LAYOUT_RETURN_BEFORE_CLOSE
When it is set, generic pnfs would try to send layoutreturn right
before last close/delegation_return regardless of whether NFS_LAYOUT_ROC is
set or not. LD can then make sure layoutreturn is always sent
rather than being omitted.

The difference against NFS_LAYOUT_RETURN is that
NFS_LAYOUT_RETURN_BEFORE_CLOSE does not block usage of the layout so
LD can set it and expect generic layer to try pnfs path at the
same time.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03 11:06:50 -08:00
Peng Tao 6c16605d6e nfs41: allow async version layoutreturn
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03 11:06:49 -08:00
Peng Tao 15eb67c153 nfs41: add range to layoutreturn args
So that callers can specify which range to return.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03 11:06:49 -08:00
Peng Tao ceb11e13df pnfs: allow LD to ask to resend read through pnfs
If current IO cannot be completed due to some transient errors,
LD may want to ask generic layer to resend the request through
pnfs again.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03 11:06:48 -08:00
Peng Tao 48d635f14a nfs: add nfs_pgio_current_mirror helper
Let it return current nfs_pgio_mirror in use depending on pg_mirror_count.
For read, we always use pg_mirrors[0], so this effectively gives us freedom
to use pg_mirror_idx to track the actual mirror to read from throughout the
IO stack.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03 11:06:48 -08:00
Peng Tao 566f873763 nfs41: add a debug warning if we destroy an unempty layout
So that we can detect the case if some layout segments are still
pinned which is surely a bug that we need to fix.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
2015-02-03 11:06:47 -08:00
Weston Andros Adamson a7d42ddb30 nfs: add mirroring support to pgio layer
This patch adds mirrored write support to the pgio layer. The default
is to use one mirror, but pgio callers may define callbacks to change
this to any value up to the (arbitrarily selected) limit of 16.

The basic idea is to break out members of nfs_pageio_descriptor that cannot
be shared between mirrored DSes and put them in a new structure.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
2015-02-03 11:06:45 -08:00
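
The structure split described in the commit above can be sketched as a descriptor that holds an array of per-mirror state. This is a simplified model, not the kernel's nfs_pageio_descriptor: all type, field, and function names below are hypothetical, and only the default of one mirror and the limit of 16 come from the commit message.

```c
#include <stddef.h>

#define PGIO_MAX_MIRRORS 16u /* limit named in the commit message */

/* State that cannot be shared between mirrored data servers. */
struct pgio_mirror {
    size_t bytes_queued;
};

/* Shared descriptor holding one pgio_mirror slot per mirror. */
struct pgio_descriptor {
    unsigned int mirror_count;
    struct pgio_mirror mirrors[PGIO_MAX_MIRRORS];
};

/* Start with the default of a single mirror and empty queues. */
void pgio_init(struct pgio_descriptor *desc)
{
    desc->mirror_count = 1;
    for (unsigned int i = 0; i < PGIO_MAX_MIRRORS; i++)
        desc->mirrors[i].bytes_queued = 0;
}

/* Returns 0 on success, -1 if the count is outside 1..PGIO_MAX_MIRRORS. */
int pgio_set_mirror_count(struct pgio_descriptor *desc, unsigned int count)
{
    if (count == 0 || count > PGIO_MAX_MIRRORS)
        return -1;
    desc->mirror_count = count;
    return 0;
}

/* Queue the same write length on every active mirror. */
void pgio_queue_write(struct pgio_descriptor *desc, size_t len)
{
    for (unsigned int i = 0; i < desc->mirror_count; i++)
        desc->mirrors[i].bytes_queued += len;
}
```

Keeping the per-mirror fields in their own structure lets the shared descriptor stay unchanged for callers that never use more than one mirror.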
Weston Andros Adamson 180bb5ec06 pnfs: release lseg in pnfs_generic_pg_cleanup
This is needed to support mirrored writes - the first write can't just
trash the lseg, we need to keep it around until all mirrors have
written.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
2015-02-03 11:06:44 -08:00
Peng Tao e736a5b98c nfs41: clear NFS_LAYOUT_RETURN if layoutreturn is sent or failed to send
So that pnfs path is not disabled for ever.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
2015-02-03 11:06:42 -08:00
Peng Tao aa1e0e3a8e nfs41: send layoutreturn in last put_lseg
If current lseg is the last lseg marked with NFS_LSEG_LAYOUTRETURN,
send layoutreturn.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
2015-02-03 11:06:42 -08:00
Peng Tao ce6ab4f238 nfs41: don't use a layout if it is marked for returning
And if we are to return the same type of layouts, don't bother
sending more layoutgets.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
2015-02-03 11:06:41 -08:00
Peng Tao 016256df3a nfs41: add a helper to mark layout for return
It marks all matching layout segments as NFS_LSEG_LAYOUTRETURN,
which is an indicator for pnfs_put_lseg() to send layoutreturn,
and also prevents pnfs_update_layout() from using the returning
segments. Once it is set, it never gets cleared.

It also sets proper io failure bit so that pnfs path can be retried
after PNFS_LAYOUTGET_RETRY_TIMEOUT seconds.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
2015-02-03 11:06:41 -08:00
Peng Tao f40eb5d044 nfs41: make a helper function to send layoutreturn
It allows callers to specify a different iomode to return.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
2015-02-03 11:06:41 -08:00
Peng Tao 4579d6b897 nfs41: pass iomode through layoutreturn args
So that it is possible to return layouts of a specific iomode.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
2015-02-03 11:06:40 -08:00
Peng Tao 9bf87482dd nfs41: serialize first layoutget of a file
Per RFC 5661 Errata 3208:
| A client MAY always forget its layout state and associated
| layout stateid at any time (See also section 12.5.5.1).
| In such case, the client MUST use a non-layout stateid for the next
| LAYOUTGET operation. This will signal the server that the client has
| no more layouts on the file and its respective layout state can be
| released before issuing a new layout in response to LAYOUTGET.

In order to make such a signal unique to server, client needs to serialize
all layoutgets using non-layout stateid. We implement this by serializing
layoutgets when client has no layout segments at hand.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
2015-02-03 11:06:39 -08:00
Peng Tao abb9a0079c nfs41: close a small race window when adding new layout to global list
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
2015-02-03 11:06:39 -08:00
Peng Tao 72cff4494e nfs/flexclient: export pnfs_layoutcommit_inode
flexfiles needs to start layoutcommit when necessary

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
2015-02-03 11:06:38 -08:00
Trond Myklebust 40dd4b7aee NFSv4.1: Optimise layout return-on-close
Optimise the layout return on close code by:

1) Adding a check for whether we hold a layout before taking any spinlocks
2) Only taking the spin lock once
3) Using nfs_state->state to speed up open file checks

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-24 18:46:48 -05:00
Trond Myklebust 6543f80367 NFSv4.1/pnfs: replace broken pnfs_put_lseg_async
You cannot call pnfs_put_lseg_async() more than once per lseg, so it
is really an inappropriate way to deal with a refcount issue.

Instead, replace it with a function that decrements the refcount, and
puts the final 'free' operation (which is incompatible with locks) on
the workqueue.
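
A minimal user-space sketch of that pattern, assuming a hypothetical
struct seg and a plain linked list standing in for the workqueue (the
list is not thread-safe and none of these names come from the kernel
sources). Every put only decrements the count; the final put queues
the free instead of performing it, so it is safe to call repeatedly:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a layout segment; not the kernel's struct. */
struct seg {
	atomic_int refcount;
	struct seg *next_deferred;
	bool freed;
};

/* Stand-in for the workqueue: a list of segments awaiting their free. */
static struct seg *deferred_head;

/* Drop one reference; only the put that drops the last one queues the free. */
static void seg_put_deferred(struct seg *s)
{
	/* atomic_fetch_sub returns the value held before the decrement */
	if (atomic_fetch_sub(&s->refcount, 1) == 1) {
		s->next_deferred = deferred_head;
		deferred_head = s;
	}
}

/* Run later, outside any lock: perform the queued frees, return the count. */
static size_t run_deferred_frees(void)
{
	size_t n = 0;

	while (deferred_head) {
		struct seg *s = deferred_head;

		deferred_head = s->next_deferred;
		s->freed = true;	/* a real implementation would free here */
		n++;
	}
	return n;
}
```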

Cc: Weston Andros Adamson <dros@primarydata.com>
Fixes: e6cf82d1830f: pnfs: add pnfs_put_lseg_async
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-10-08 16:45:43 -04:00
Christoph Hellwig c88953d87f pnfs: add return_range method
If a layout driver keeps per-inode state outside of the layout segments it
needs to be notified of any layout returns or recalls on an inode, and not
just about the freeing of layout segments.  Add a method to accomplish this,
which will allow the block layout driver to handle the case of truncated
and re-expanded files properly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10 12:47:03 -07:00
Christoph Hellwig 7c5d187581 pnfs: force a layout commit when encountering busy segments during recall
Expedite layout recall processing by forcing a layout commit when
we see busy segments.  Without it the layout recall might have to wait
until the VM decided to start writeback for the file, which can introduce
long delays.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10 12:47:02 -07:00
Christoph Hellwig 5f919c9f10 pnfs: allow splicing pre-encoded pages into the layoutcommit args
Currently there is no XDR buffer space allocated for the per-layout driver
layoutcommit payload, which leads to server buffer overflows in the
blocklayout driver even under simple workloads.  As we can't do per-layout
sizes for XDR operations we'll have to splice a previously encoded list
of pages into the XDR stream, similar to how we handle ACL buffers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10 12:47:01 -07:00
Christoph Hellwig 47abadefad pnfs: avoid using stale stateids after layoutreturn
After we issue a layoutreturn operation, the server may free the layout stateid,
which will cause a bad stateid error when the client uses it again.

We currently try to avoid this case by choosing the open stateid if no
lsegs are present for this inode.  But various places can hold references
on lsegs and thus cause the list not to be empty shortly after a layout
return.  Add an explicit flag to mark the current layout stateid invalid
and force usage of the openstateid after we did a full file layoutreturn.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10 12:47:01 -07:00
Christoph Hellwig 362f74745c pnfs: don't check sequence on new stateids in layoutget
When layoutget returns an entirely new layout stateid it should not
check the generation counter as the new stateid will start with a new
counter entirely unrelated to old one.
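
The rule can be sketched as follows, assuming a simplified stateid
with a 32-bit seqid and a 12-byte opaque "other" field. The struct and
the function name stateid_should_replace are hypothetical and are not
taken from the kernel sources:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define STATEID_OTHER_SIZE 12

/* Simplified stateid: generation counter plus opaque identity. */
struct stateid {
	uint32_t seqid;
	unsigned char other[STATEID_OTHER_SIZE];
};

/*
 * Decide whether an incoming layout stateid should replace the current one.
 * A different "other" field means an entirely new stateid, whose seqid is
 * unrelated to the old counter, so no sequence check is made.
 */
static bool stateid_should_replace(const struct stateid *cur,
				   const struct stateid *incoming)
{
	if (memcmp(cur->other, incoming->other, STATEID_OTHER_SIZE) != 0)
		return true;
	/* Same stateid: accept only a newer generation (wraparound-safe). */
	return (int32_t)(incoming->seqid - cur->seqid) > 0;
}
```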

The current behavior causes constant layoutget failures against a block
server which allocates a new stateid after a recall that removed all
outstanding layouts.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10 12:47:01 -07:00
Christoph Hellwig 1013df6115 pnfs: do not pass uninitialized lsegs to ->free_lseg
Ensure the lsegs are initialized early so that we don't pass an uninitialized
one back to ->free_lseg during error processing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10 12:47:01 -07:00
Peng Tao 378520b837 nfs41: add a helper function to set layoutcommit after commit
Track lwb in nfs_commit_data so that we can use it to setup
layoutcommit in commit_done callback.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10 12:47:00 -07:00
Linus Torvalds 06b8ab5528 NFS client updates for Linux 3.17
Highlights include:
 
 - Stable fix for a bug in nfs3_list_one_acl()
 - Speed up NFS path walks by supporting LOOKUP_RCU
 - More read/write code cleanups
 - pNFS fixes for layout return on close
 - Fixes for the RCU handling in the rpcsec_gss code
 - More NFS/RDMA fixes

Merge tag 'nfs-for-3.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

   - stable fix for a bug in nfs3_list_one_acl()
   - speed up NFS path walks by supporting LOOKUP_RCU
   - more read/write code cleanups
   - pNFS fixes for layout return on close
   - fixes for the RCU handling in the rpcsec_gss code
   - more NFS/RDMA fixes"

* tag 'nfs-for-3.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits)
  nfs: reject changes to resvport and sharecache during remount
  NFS: Avoid infinite loop when RELEASE_LOCKOWNER getting expired error
  SUNRPC: remove all refcounting of groupinfo from rpcauth_lookupcred
  NFS: fix two problems in lookup_revalidate in RCU-walk
  NFS: allow lockless access to access_cache
  NFS: teach nfs_lookup_verify_inode to handle LOOKUP_RCU
  NFS: teach nfs_neg_need_reval to understand LOOKUP_RCU
  NFS: support RCU_WALK in nfs_permission()
  sunrpc/auth: allow lockless (rcu) lookup of credential cache.
  NFS: prepare for RCU-walk support but pushing tests later in code.
  NFS: nfs4_lookup_revalidate: only evaluate parent if it will be used.
  NFS: add checks for returned value of try_module_get()
  nfs: clear_request_commit while holding i_lock
  pnfs: add pnfs_put_lseg_async
  pnfs: find swapped pages on pnfs commit lists too
  nfs: fix comment and add warn_on for PG_INODE_REF
  nfs: check wait_on_bit_lock err in page_group_lock
  sunrpc: remove "ec" argument from encrypt_v2 operation
  sunrpc: clean up sparse endianness warnings in gss_krb5_wrap.c
  sunrpc: clean up sparse endianness warnings in gss_krb5_seal.c
  ...
2014-08-13 18:13:19 -06:00
Weston Andros Adamson e6cf82d183 pnfs: add pnfs_put_lseg_async
This is useful when lsegs need to be released while holding locks.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-03 17:05:25 -04:00
NeilBrown 743162013d sched: Remove proliferation of wait_on_bit() action functions
The current "wait_on_bit" interface requires an 'action'
function to be provided which does the actual waiting.
There are over 20 such functions, many of them identical.
Most cases can be satisfied by one of just two functions, one
which uses io_schedule() and one which just uses schedule().

So:
 Rename wait_on_bit and        wait_on_bit_lock to
        wait_on_bit_action and wait_on_bit_lock_action
 to make it explicit that they need an action function.

 Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
 which are *not* given an action function but implicitly use
 a standard one.
 The decision to error-out if a signal is pending is now made
 based on the 'mode' argument rather than being encoded in the action
 function.

 All instances of the old wait_on_bit and wait_on_bit_lock which
 can use the new version have been changed accordingly and their
 action functions have been discarded.
 wait_on_bit{_lock} does not return any specific error code in the
 event of a signal so the caller must check for non-zero and
 interpolate their own error code as appropriate.

The wait_on_bit() call in __fscache_wait_on_invalidate() was
ambiguous as it specified TASK_UNINTERRUPTIBLE but used
fscache_wait_bit_interruptible as an action function.
David Howells confirms this should be uniformly
"uninterruptible"

The main remaining user of wait_on_bit{,_lock}_action is NFS
which needs to use a freezer-aware schedule() call.

A comment in fs/gfs2/glock.c notes that having multiple 'action'
functions is useful as they display differently in the 'wchan'
field of 'ps'. (and /proc/$PID/wchan).
As the new bit_wait{,_io} functions are tagged "__sched", they
will not show up at all, but something higher in the stack.  So
the distinction will still be visible, only with different
function names (gfs2_glock_wait versus gfs2_glock_dq_wait in the
gfs2/glock.c case).

Since first version of this patch (against 3.15) two new action
functions appeared, one in NFS and one in CIFS.  CIFS also now
uses an action function that makes the same freezer aware
schedule call as NFS.

Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steve French <sfrench@samba.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16 15:10:39 +02:00
Weston Andros Adamson 53113ad35e pnfs: clean up *_resend_to_mds
Clean up pnfs_read_done_resend_to_mds and pnfs_write_done_resend_to_mds:
 - instead of passing all arguments from a nfs_pgio_header, just pass the header
 - share the common code

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24 18:47:01 -04:00
Weston Andros Adamson 4714fb51fd nfs: remove pgio_header refcount, related cleanup
The refcounting on nfs_pgio_header was related to there being (possibly)
more than one nfs_pgio_data. Now that nfs_pgio_data has been merged into
nfs_pgio_header, there is no reason to do this ref counting.  Just call
the completion callback on nfs_pgio_release/nfs_pgio_error.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24 18:47:01 -04:00
Weston Andros Adamson d45f60c678 nfs: merge nfs_pgio_data into _header
struct nfs_pgio_data only exists as a member of nfs_pgio_header, but is
passed around everywhere, because there used to be multiple _data structs
per _header. Many of these functions then use the _data to find a pointer
to the _header.  This patch cleans this up by merging the nfs_pgio_data
structure into nfs_pgio_header and passing nfs_pgio_header around instead.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24 18:47:00 -04:00
Weston Andros Adamson 1e7f3a4859 nfs: move nfs_pgio_data and remove nfs_rw_header
nfs_rw_header was used to allocate an nfs_pgio_header along with an
nfs_pgio_data, because a _header would need at least one _data.

Now there is only ever one nfs_pgio_data for each nfs_pgio_header -- move
it to nfs_pgio_header and get rid of nfs_rw_header.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24 18:46:59 -04:00
Linus Torvalds d1e1cda862 NFS client updates for Linux 3.16
Highlights include:
 
 - Massive cleanup of the NFS read/write code by Anna and Dros
 - Support multiple NFS read/write requests per page in order to deal with
   non-page aligned pNFS striping. Also cleans up the r/wsize < page size
   code nicely.
 - stable fix for ensuring inode is declared uptodate only after all the
   attributes have been checked.
 - stable fix for a kernel Oops when remounting
 - NFS over RDMA client fixes
 - move the pNFS files layout driver into its own subdirectory

Merge tag 'nfs-for-3.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

   - massive cleanup of the NFS read/write code by Anna and Dros
   - support multiple NFS read/write requests per page in order to deal
     with non-page aligned pNFS striping.  Also cleans up the r/wsize <
     page size code nicely.
   - stable fix for ensuring inode is declared uptodate only after all
     the attributes have been checked.
   - stable fix for a kernel Oops when remounting
   - NFS over RDMA client fixes
   - move the pNFS files layout driver into its own subdirectory"

* tag 'nfs-for-3.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits)
  NFS: populate ->net in mount data when remounting
  pnfs: fix lockup caused by pnfs_generic_pg_test
  NFSv4.1: Fix typo in dprintk
  NFSv4.1: Comment is now wrong and redundant to code
  NFS: Use raw_write_seqcount_begin/end int nfs4_reclaim_open_state
  xprtrdma: Disconnect on registration failure
  xprtrdma: Remove BUG_ON() call sites
  xprtrdma: Avoid deadlock when credit window is reset
  SUNRPC: Move congestion window constants to header file
  xprtrdma: Reset connection timeout after successful reconnect
  xprtrdma: Use macros for reconnection timeout constants
  xprtrdma: Allocate missing pagelist
  xprtrdma: Remove Tavor MTU setting
  xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting
  xprtrdma: Reduce the number of hardway buffer allocations
  xprtrdma: Limit work done by completion handler
  xprtrmda: Reduce calls to ib_poll_cq() in completion handlers
  xprtrmda: Reduce lock contention in completion handlers
  xprtrdma: Split the completion queue
  xprtrdma: Make rpcrdma_ep_destroy() return void
  ...
2014-06-10 15:02:42 -07:00
Weston Andros Adamson c5e20cb700 pnfs: fix lockup caused by pnfs_generic_pg_test
end_offset and req_offset both return u64 - avoid casting to u32
until it's needed, when it's less than the (u32) size returned by
nfs_generic_pg_test.
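
To illustrate the truncation problem, here is a small sketch with
hypothetical helper names and simplified arguments; it is not the code
from pnfs_generic_pg_test. The first helper keeps the remaining length
in 64 bits and narrows only once the value is known to be smaller than
the request size. The second narrows too early, so a segment with
exactly 4GB left appears to have no room, and a caller that keeps
retrying on 0 never makes progress:

```c
#include <stddef.h>
#include <stdint.h>

/* Correct: compare in 64 bits, narrow only when the value is known to fit. */
static size_t clamp_to_segment(uint64_t seg_end, uint64_t req_start,
			       size_t size)
{
	uint64_t seg_left;

	if (seg_end <= req_start)
		return 0;		/* nothing left in the segment */
	seg_left = seg_end - req_start;
	if (seg_left < size)
		return (size_t)seg_left;	/* safe: smaller than size */
	return size;
}

/* Buggy variant for contrast: the early cast drops the upper 32 bits. */
static size_t clamp_truncating(uint64_t seg_end, uint64_t req_start,
			       size_t size)
{
	uint32_t seg_left = (uint32_t)(seg_end - req_start);

	return seg_left < size ? seg_left : size;
}
```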

Also, fix the comments in pnfs_generic_pg_test.

Running the cthon04 special tests caused this lockup in the
"write/read at 2GB, 4GB edges" test when running against a file layout server:

BUG: soft lockup - CPU#0 stuck for 22s! [bigfile2:823]
Modules linked in: nfs_layout_nfsv41_files rpcsec_gss_krb5 nfsv4 nfs fscache ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_filter ip6_tables iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ppdev crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd serio_raw e1000 shpchp i2c_piix4 i2c_core parport_pc parport nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd sunrpc btrfs xor zlib_deflate raid6_pq mptspi scsi_transport_spi mptscsih mptbase ata_generic floppy autofs4
irq event stamp: 205958
hardirqs last  enabled at (205957): [<ffffffff814a62dc>] restore_args+0x0/0x30
hardirqs last disabled at (205958): [<ffffffff814ad96a>] apic_timer_interrupt+0x6a/0x80
softirqs last  enabled at (205956): [<ffffffff8103ffb2>] __do_softirq+0x1ea/0x2ab
softirqs last disabled at (205951): [<ffffffff8104026d>] irq_exit+0x44/0x9a
CPU: 0 PID: 823 Comm: bigfile2 Not tainted 3.15.0-rc1-branch-pgio_plus+ #3
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
task: ffff8800792ec480 ti: ffff880078c4e000 task.ti: ffff880078c4e000
RIP: 0010:[<ffffffffa02ce51f>]  [<ffffffffa02ce51f>] nfs_page_group_unlock+0x3e/0x4b [nfs]
RSP: 0018:ffff880078c4fab0  EFLAGS: 00000202
RAX: 0000000000000fff RBX: ffff88006bf83300 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88006bf83300
RBP: ffff880078c4fab8 R08: 0000000000000001 R09: 0000000000000000
R10: ffffffff8249840c R11: 0000000000000000 R12: 0000000000000035
R13: ffff88007ffc72d8 R14: 0000000000000001 R15: 0000000000000000
FS:  00007f45f11b7740(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3a8cb632d0 CR3: 000000007931c000 CR4: 00000000001407f0
Stack:
 ffff88006bf832c0 ffff880078c4fb00 ffffffffa02cec22 ffff880078c4fad8
 00000fff810f9d99 ffff880078c4fca0 ffff88006bf832c0 ffff88006bf832c0
 ffff880078c4fca0 ffff880078c4fd60 ffff880078c4fb28 ffffffffa02cee34
Call Trace:
 [<ffffffffa02cec22>] __nfs_pageio_add_request+0x298/0x34f [nfs]
 [<ffffffffa02cee34>] nfs_pageio_add_request+0x1f/0x42 [nfs]
 [<ffffffffa02d1722>] nfs_do_writepage+0x1b5/0x1e4 [nfs]
 [<ffffffffa02d1764>] nfs_writepages_callback+0x13/0x25 [nfs]
 [<ffffffffa02d1751>] ? nfs_do_writepage+0x1e4/0x1e4 [nfs]
 [<ffffffff810eb32d>] write_cache_pages+0x254/0x37f
 [<ffffffffa02d1751>] ? nfs_do_writepage+0x1e4/0x1e4 [nfs]
 [<ffffffff8149cf9e>] ? printk+0x54/0x56
 [<ffffffff810eacca>] ? __set_page_dirty_nobuffers+0x22/0xe9
 [<ffffffffa016d864>] ? put_rpccred+0x38/0x101 [sunrpc]
 [<ffffffffa02d1ae1>] nfs_writepages+0xb4/0xf8 [nfs]
 [<ffffffff810ec59c>] do_writepages+0x21/0x2f
 [<ffffffff810e36e8>] __filemap_fdatawrite_range+0x55/0x57
 [<ffffffff810e374a>] filemap_write_and_wait_range+0x2d/0x5b
 [<ffffffffa030ba0a>] nfs4_file_fsync+0x3a/0x98 [nfsv4]
 [<ffffffff8114ee3c>] vfs_fsync_range+0x18/0x20
 [<ffffffff810e40c2>] generic_file_aio_write+0xa7/0xbd
 [<ffffffffa02c5c6b>] nfs_file_write+0xf0/0x170 [nfs]
 [<ffffffff81129215>] do_sync_write+0x59/0x78
 [<ffffffff8112956c>] vfs_write+0xab/0x107
 [<ffffffff81129c8b>] SyS_write+0x49/0x7f
 [<ffffffff814acd12>] system_call_fastpath+0x16/0x1b

Reported-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-10 11:07:56 -04:00
Weston Andros Adamson 19b54848fe pnfs: allow non page aligned pnfs layout segments
Remove alignment checks that would revert to MDS and change pg_test
to return the max amount left in the segment (or other pg_test call)
up to size of passed request, or 0 if no space is left.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29 11:11:49 -04:00
Weston Andros Adamson 7f714720fa nfs: remove data list from pgio header
Since the ability to split pages into subpage requests has been added,
nfs_pgio_header->rpc_list only ever has one pgio data.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29 11:11:48 -04:00
Weston Andros Adamson 0f9c429eca nfs: chain calls to pg_test
Now that pg_test can change the size of the request (by returning a non-zero
size smaller than the request), pg_test functions that call other
pg_test functions must return the minimum of the result - or 0 if any fail.
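
A rough sketch of the chaining rule, assuming a simplified callback
type that takes only the request size (the real pg_test callbacks take
a descriptor and request pointers; pg_test_fn, chain_pg_test and the
three sample tests are hypothetical names):

```c
#include <stddef.h>

/* Simplified pg_test: returns how many bytes may be coalesced, 0 for none. */
typedef size_t (*pg_test_fn)(size_t req_size);

/* Sample tests used for illustration only. */
static size_t test_accept_all(size_t req_size) { return req_size; }
static size_t test_limit_1k(size_t req_size) { return req_size < 1024 ? req_size : 1024; }
static size_t test_reject(size_t req_size) { (void)req_size; return 0; }

/* Chain several tests: 0 if any test fails, else the minimum result. */
static size_t chain_pg_test(const pg_test_fn *tests, size_t ntests,
			    size_t req_size)
{
	size_t allowed = req_size;
	size_t i;

	for (i = 0; i < ntests; i++) {
		size_t r = tests[i](req_size);

		if (r == 0)
			return 0;	/* one failure vetoes coalescing */
		if (r < allowed)
			allowed = r;	/* keep the most restrictive size */
	}
	return allowed;
}
```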

Also clean up the logic of some pg_test functions so that all checks are
for conditions where coalescing is not possible.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29 11:11:47 -04:00
Weston Andros Adamson b4fdac1a51 nfs: modify pg_test interface to return size_t
This is a step toward allowing pg_test to inform the
coalescing code to reduce the size of requests so they may fit in
whatever scheme the pg_test callback wants to define.

For now, just return the size of the request if there is space, or 0
if there is not.  This shouldn't change any behavior as it acts
the same as when the pg_test functions returned bool.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29 11:11:43 -04:00
Anna Schumaker ef2c488c07 NFS: Create a generic_pgio function
These functions are almost identical on both the read and write side.
FLUSH_COND_STABLE will never be set for the read path, so leaving it in
the generic code won't hurt anything.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-28 18:41:12 -04:00
Anna Schumaker 4a0de55c56 NFS: Create a common rw_header_alloc and rw_header_free function
I create a new struct nfs_rw_ops to decide the differences between reads
and writes.  This struct will be set when initializing a new
nfs_pgio_descriptor, and then passed on to the nfs_rw_header when a new
header is allocated.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-28 18:40:04 -04:00
Anna Schumaker 00bfa30abe NFS: Create a common pgio_alloc and pgio_release function
These functions are identical for the read and write paths so they can
be combined.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-28 18:39:55 -04:00
Anna Schumaker c0752cdfbb NFS: Create a common read and write header struct
The only difference is the write verifier field, but we can keep that
for a little bit longer.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-28 18:12:55 -04:00
Anna Schumaker 9c7e1b3d50 NFS: Create a common read and write data struct
At this point, the only difference between nfs_read_data and
nfs_write_data is the write verifier.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-28 18:12:47 -04:00
Christoph Hellwig fab5fc25d2 nfs: remove ->read_pageio_init from rpc ops
The read_pageio_init method is just a very convoluted way to grab the
right nfs_pageio_ops vector.  The vector to choose is not a choice of
protocol version, but just a pNFS vs MDS I/O choice that can simply be
done inside nfs_pageio_init_read based on the presence of a layout
driver, and a new force_mds flag to the special case of falling back
to MDS I/O on a pNFS-capable volume.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-28 17:50:08 -04:00
Christoph Hellwig a20c93e316 nfs: remove ->write_pageio_init from rpc ops
The write_pageio_init method is just a very convoluted way to grab the
right nfs_pageio_ops vector.  The vector to choose is not a choice of
protocol version, but just a pNFS vs MDS I/O choice that can simply be
done inside nfs_pageio_init_write based on the presence of a layout
driver, and a new force_mds flag to the special case of falling back
to MDS I/O on a pNFS-capable volume.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-28 17:48:38 -04:00
Peter Zijlstra 4e857c58ef arch: Mass conversion of smp_mb__*()
Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 14:20:48 +02:00
Trond Myklebust 78096ccac5 NFSv4.1: Ensure that we free existing layout segments if we get a new layout
If the server returns a completely new layout stateid in response to our
LAYOUTGET, then make sure to free any existing layout segments.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-19 21:21:06 -05:00
Trond Myklebust 2c64c57dfc NFSv4.1: Fix wraparound issues in pnfs_seqid_is_newer()
Subtraction of signed integers does not have well defined wraparound
semantics in the C99 standard. In order to be wraparound-safe, we
have to use unsigned subtraction, and then cast the result.
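
As a stand-alone sketch of the technique (seqid_is_newer is a
hypothetical user-space name, and fixed-width stdint types stand in
for the kernel's u32/s32): the subtraction is done on unsigned values,
where wraparound is well defined, and only the difference is
reinterpreted as signed. The final conversion assumes the usual
two's-complement behaviour of mainstream compilers:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if s1 is newer than s2, treating the 32-bit sequence
 * space as circular.
 */
static bool seqid_is_newer(uint32_t s1, uint32_t s2)
{
	/* Unsigned subtraction wraps modulo 2^32; cast the result only. */
	return (int32_t)(s1 - s2) > 0;
}
```

With this form, a seqid of 0 is still treated as newer than
0xffffffff, which is the case a plain signed comparison gets wrong.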

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-19 21:21:01 -05:00
Trond Myklebust 71244d9bdf NFSv4.1: Fix a race in nfs4_write_inode
nfs4_write_inode() must not be allowed to exit until the layoutcommit
is done. That means that both NFS_INO_LAYOUTCOMMIT and
NFS_INO_LAYOUTCOMMITTING have to be cleared.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-01-13 13:34:36 -05:00
Trond Myklebust cc668ab30b NFSv4: Add tracepoints for debugging reads and writes
Set up tracepoints to track read, write and commit, as well as
pNFS reads and writes and commits to the data server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-22 08:58:26 -04:00
Trond Myklebust 7dc0ac70f8 NFSv4.1: Clean up layout segment comparison helper names
Give them names that are a bit more consistent with the general
pNFS naming scheme.

 - lo_seg_contained -> pnfs_lseg_range_contained
 - lo_seg_intersecting -> pnfs_lseg_range_intersecting
 - cmp_layout -> pnfs_lseg_range_cmp
 - is_matching_lseg -> pnfs_lseg_range_match

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-18 13:47:18 -04:00
Trond Myklebust 3cb2df17ae NFSv4.1: layout segment comparison helpers should take 'const' parameters
Also strip off the unnecessary 'inline' declarations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-18 13:47:18 -04:00
Trond Myklebust 5cc2216db8 NFSv4.1: Simplify setting the layout header credential
ctx->cred == ctx->state->owner->so_cred, so let's just use the former.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-06 16:24:38 -04:00
Trond Myklebust 9556000d8c NFSv4.1: Ensure that layoutreturn uses the correct credential
We need to use the same credential as was used for the layoutget
and/or layoutcommit operations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-06 16:24:35 -04:00
Trond Myklebust 6ab59344d9 NFSv4.1: Ensure that layoutget is called using the layout credential
Ensure that we use the same credential for layoutget, layoutcommit and
layoutreturn.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-06 16:24:34 -04:00
Trond Myklebust 5d422301f9 NFSv4: Fail I/O if the state recovery fails irrevocably
If state recovery fails with an ESTALE or an ENOENT, then we shouldn't
keep retrying. Instead, mark the stateid as being invalid and
fail the I/O with an EIO error.
For other operations such as POSIX and BSD file locking, truncate
etc, fail with an EBADF to indicate that this file descriptor is no
longer valid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-25 12:04:10 -04:00
Trond Myklebust 240286725d NFSv4.1: Add a helper pnfs_commit_and_return_layout
In order to be able to safely return the layout in nfs4_proc_setattr,
we need to block new uses of the layout, wait for all outstanding
users of the layout to complete, commit the layout and then return it.

This patch adds a helper in order to do all this safely.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
2013-03-21 10:31:21 -04:00
Trond Myklebust 2495680434 NFSv4.1: Always clear the NFS_INO_LAYOUTCOMMIT in layoutreturn
Note that clearing NFS_INO_LAYOUTCOMMIT is tricky, since it requires
you to also clear the NFS_LSEG_LAYOUTCOMMIT bits from the layout
segments.
The only two sites that need to do this are the ones that call
pnfs_return_layout() without first doing a layout commit.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@vger.kernel.org
2013-03-21 10:31:21 -04:00
Trond Myklebust a073dbff35 NFSv4.1: Fix a race in pNFS layoutcommit
We need to clear the NFS_LSEG_LAYOUTCOMMIT bits atomically with the
NFS_INO_LAYOUTCOMMIT bit, otherwise we may end up with situations
where the two are out of sync.
The first half of the problem is to ensure that pnfs_layoutcommit_inode
clears the NFS_LSEG_LAYOUTCOMMIT bit through pnfs_list_write_lseg.
We still need to keep the reference to those segments until the RPC call
is finished, so in order to make it clear _where_ those references come
from, we add a helper pnfs_list_write_lseg_done() that cleans up after
pnfs_list_write_lseg.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@vger.kernel.org
2013-03-21 10:31:19 -04:00
Weston Andros Adamson 3000512137 NFSv4.1: LAYOUTGET EDELAY loops timeout to the MDS
The client will currently try LAYOUTGETs forever if a server is returning
NFS4ERR_LAYOUTTRYLATER or NFS4ERR_RECALLCONFLICT - even if the client no
longer needs the layout (i.e. process killed, unmounted).

This patch uses the DS timeout value (module parameter 'dataserver_timeo'
via rpc layer) to set an upper limit of how long the client tries LAYOUTGETs
in this situation.  Once the timeout is reached, IO is redirected to the MDS.

This also changes how the client checks if a layout is on the clp list
to avoid a double list_add.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-28 17:41:35 -08:00
Benny Halevy 78f33277f9 pnfs: fix resend_to_mds for directio
Pass the directio request on pageio_init to clean up the API.

Percolate pg_dreq from original nfs_pageio_descriptor to the
pnfs_{read,write}_done_resend_to_mds and use it on respective
call to nfs_pageio_init_{read,write} on the newly created
nfs_pageio_descriptor.

Reproduced by command:
 mount -o vers=4.1 server:/ /mnt
 dd bs=128k count=8 if=/dev/zero of=/mnt/dd.out oflag=direct

BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
IP: [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs]
PGD 34786067 PUD 34794067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: nfs_layout_nfsv41_files nfsv4 nfs nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc btrfs zlib_deflate libcrc32c ipv6 autofs4
CPU 1
Pid: 259, comm: kworker/1:2 Not tainted 3.8.0-rc6 #2 Bochs Bochs
RIP: 0010:[<ffffffffa021a3a8>]  [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs]
RSP: 0018:ffff880038f8fa68  EFLAGS: 00010206
RAX: ffffffffa021a6a9 RBX: ffff880038f8fb48 RCX: 00000000000a0000
RDX: ffffffffa021e616 RSI: ffff8800385e9a40 RDI: 0000000000000028
RBP: ffff880038f8fa68 R08: ffffffff81ad6720 R09: ffff8800385e9510
R10: ffffffffa0228450 R11: ffff880038e87418 R12: ffff8800385e9a40
R13: ffff8800385e9a70 R14: ffff880038f8fb38 R15: ffffffffa0148878
FS:  0000000000000000(0000) GS:ffff88003e400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000028 CR3: 0000000034789000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/1:2 (pid: 259, threadinfo ffff880038f8e000, task ffff880038302480)
Stack:
 ffff880038f8fa78 ffffffffa021a6bf ffff880038f8fa88 ffffffffa021bb82
 ffff880038f8fae8 ffffffffa021f454 ffff880038f8fae8 ffffffff8109689d
 ffff880038f8fab8 ffffffff00000006 0000000000000000 ffff880038f8fb48
Call Trace:
 [<ffffffffa021a6bf>] nfs_direct_pgio_init+0x16/0x18 [nfs]
 [<ffffffffa021bb82>] nfs_pgheader_init+0x6a/0x6c [nfs]
 [<ffffffffa021f454>] nfs_generic_pg_writepages+0x51/0xf8 [nfs]
 [<ffffffff8109689d>] ? mark_held_locks+0x71/0x99
 [<ffffffffa0148878>] ? rpc_release_resources_task+0x37/0x37 [sunrpc]
 [<ffffffffa021bc25>] nfs_pageio_doio+0x1a/0x43 [nfs]
 [<ffffffffa021be7c>] nfs_pageio_complete+0x16/0x2c [nfs]
 [<ffffffffa02608be>] pnfs_write_done_resend_to_mds+0x95/0xc5 [nfsv4]
 [<ffffffffa0148878>] ? rpc_release_resources_task+0x37/0x37 [sunrpc]
 [<ffffffffa028e27f>] filelayout_reset_write+0x8c/0x99 [nfs_layout_nfsv41_files]
 [<ffffffffa028e5f9>] filelayout_write_done_cb+0x4d/0xc1 [nfs_layout_nfsv41_files]
 [<ffffffffa024587a>] nfs4_write_done+0x36/0x49 [nfsv4]
 [<ffffffffa021f996>] nfs_writeback_done+0x53/0x1cc [nfs]
 [<ffffffffa021fb1d>] nfs_writeback_done_common+0xe/0x10 [nfs]
 [<ffffffffa028e03d>] filelayout_write_call_done+0x28/0x2a [nfs_layout_nfsv41_files]
 [<ffffffffa01488a1>] rpc_exit_task+0x29/0x87 [sunrpc]
 [<ffffffffa014a0c9>] __rpc_execute+0x11d/0x3cc [sunrpc]
 [<ffffffff810969dc>] ? trace_hardirqs_on_caller+0x117/0x173
 [<ffffffffa014a39f>] rpc_async_schedule+0x27/0x32 [sunrpc]
 [<ffffffffa014a378>] ? __rpc_execute+0x3cc/0x3cc [sunrpc]
 [<ffffffff8105f8c1>] process_one_work+0x226/0x422
 [<ffffffff8105f7f4>] ? process_one_work+0x159/0x422
 [<ffffffff81094757>] ? lock_acquired+0x210/0x249
 [<ffffffffa014a378>] ? __rpc_execute+0x3cc/0x3cc [sunrpc]
 [<ffffffff810600d8>] worker_thread+0x126/0x1c4
 [<ffffffff8105ffb2>] ? manage_workers+0x240/0x240
 [<ffffffff81064ef8>] kthread+0xb1/0xb9
 [<ffffffff81064e47>] ? __kthread_parkme+0x65/0x65
 [<ffffffff815206ec>] ret_from_fork+0x7c/0xb0
 [<ffffffff81064e47>] ? __kthread_parkme+0x65/0x65
Code: 00 83 38 02 74 12 48 81 4b 50 00 00 01 00 c7 83 60 07 00 00 01 00 00 00 48 89 df e8 55 fe ff ff 5b 41 5c 5d c3 66 90 55 48 89 e5 <f0> ff 07 5d c3 55 48 89 e5 f0 ff 0f 0f 94 c0 84 c0 0f 95 c0 0f
RIP  [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs]
 RSP <ffff880038f8fa68>
CR2: 0000000000000028

Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@kernel.org [>= 3.6]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-24 10:07:36 -05:00
Trond Myklebust fd9a8d7160 NFSv4.1: Fix bulk recall and destroy of layouts
The current code in pnfs_destroy_all_layouts() assumes that removing
the layout from the server->layouts list is sufficient to make it
invisible to other processes. This ignores the fact that most
users access the layout through the nfs_inode->layout...
There is further breakage due to lack of reference counting of the
layouts, meaning that the whole thing Oopses at the drop of a hat.

The code in initiate_bulk_draining() is almost correct, and can be
used as a model for pnfs_destroy_all_layouts(), so move that
code to pnfs.c, and refactor the code to allow us to choose between
a single filesystem bulk recall, and a recall of all layouts.
Also note that initiate_bulk_draining() currently calls iput() while
holding locks. Fix that too.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-02-14 13:22:50 -05:00
Yanchuan Nian 39e88fcfb1 pnfs: Increase the refcount when LAYOUTGET fails the first time
The layout will be set unusable if LAYOUTGET fails. Is it reasonable to
increase the refcount iff LAYOUTGET fails the first time?

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.7]
2013-01-04 10:50:42 -05:00
Trond Myklebust bc5a89b337 NFSv4.1: Remove assertion BUG_ON()s from the files and generic layout code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:39 -05:00
Trond Myklebust eba24e1fe5 NFSv4.1: Remove unused function last_byte_offset
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:38 -05:00
Yanchuan Nian 7175fe9015 nfs: Check whether a layout pointer is NULL before free it
The new layout pointer in pnfs_find_alloc_layout() may be NULL if the
allocation fails for lack of memory. We must check for that; otherwise
pnfs_free_layout_hdr() will go wrong, because it cannot deal with a NULL pointer.

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-31 16:26:25 -04:00
Peng Tao 1fd937bd75 NFS41: send real read size in layoutget
For buffered reads, use offset-to-isize.

For direct reads, use dreq->bytes_left.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-08 19:32:34 -04:00
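The length selection described in commit 1fd937bd75 above might look like the sketch below. The function name and parameters are hypothetical; in particular, returning 0 when the offset is at or past the file size is an assumption made here for safety, not something the commit message states.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pick the length to request in LAYOUTGET for a read.
 * - buffered read: from the request offset to the end of the file (isize)
 * - direct read:   whatever the direct I/O request still has left
 */
static uint64_t layoutget_read_len(uint64_t offset, uint64_t isize,
                                   bool direct, uint64_t dreq_bytes_left)
{
    if (direct)
        return dreq_bytes_left;
    /* Assumption: clamp to 0 rather than underflow if offset >= isize. */
    return isize > offset ? isize - offset : 0;
}
```

For example, a buffered read at offset 4096 in a 10000-byte file would ask for 5904 bytes rather than a fixed or maximal range.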
Peng Tao 6296556f0b NFS41: send real write size in layoutget
For buffered writes, the block layout client scans the inode mapping to
find the next hole and uses offset-to-hole as the layoutget length. The
object layout client uses offset-to-isize as the layoutget length.

For direct writes, both the block layout and the object layout use dreq->bytes_left.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-08 19:32:22 -04:00
Trond Myklebust 19c54abab7 NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked()
Split it into two functions, one which checks if layoutgets are blocked,
and one which checks if the layout stateid has expired.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-05 16:56:58 -07:00
Trond Myklebust 22aaf71495 NFSv4.1: Ensure that the layout sequence id stays 'close' to the current
Clamp the layout barrier sequence id to the current sequence id
minus the maximum number of outstanding layoutget requests.

Also ensure that we correctly initialise lo->plh_barrier if there are
no layout segments associated to this layout header.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-04 16:57:48 -07:00
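The wraparound-safe comparison and barrier clamping that commit 22aaf71495 and the neighbouring seqid commits deal with can be illustrated with serial-number arithmetic. This is a sketch under assumptions: `seqid_is_newer` and `clamp_barrier` are hypothetical helpers, and `max_outstanding` is passed in as a parameter because the log does not say how the kernel obtains it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if s1 is "newer" than s2 when the 32-bit sequence id is treated as
 * a serial number: the unsigned difference is reinterpreted as signed, so
 * the comparison keeps working across the 0xFFFFFFFF -> 0 wrap.
 */
static bool seqid_is_newer(uint32_t s1, uint32_t s2)
{
    return (int32_t)(uint32_t)(s1 - s2) > 0;
}

/*
 * Keep the barrier from falling further behind the current seqid than the
 * number of LAYOUTGETs that may still be outstanding.
 */
static uint32_t clamp_barrier(uint32_t barrier, uint32_t current,
                              uint32_t max_outstanding)
{
    uint32_t lowest;

    /* The window must stay below half the sequence space. */
    assert(max_outstanding <= INT32_MAX);

    lowest = current - max_outstanding; /* wraps modulo 2^32 by design */
    return seqid_is_newer(lowest, barrier) ? lowest : barrier;
}
```

A plain `s1 > s2` would wrongly treat seqid 2 as older than 0xFFFFFFFE; the signed-difference form orders them correctly as long as the two values are less than 2^31 apart.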
Trond Myklebust 0f35ad6f68 NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-04 16:28:17 -07:00
Trond Myklebust 25a1a6211d NFSv4.1: Deal with wraparound when updating the layout "barrier" seqid
...and fix a bug in pnfs_set_layout_stateid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 17:04:33 -07:00
Trond Myklebust 5a65503f3d NFSv4.1: Deal with wraparound issues when updating the layout stateid
...and add a helper function.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 16:47:14 -07:00
Trond Myklebust 038d649376 NFSv4.1: Always set the layout stateid if this is the first layoutget
If the list of layout segments is empty, we must unconditionally set
the layout stateid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 16:38:41 -07:00
Trond Myklebust 251ec410c4 NFSv4.1: Fix another refcount issue in pnfs_find_alloc_layout
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 15:41:05 -07:00
Trond Myklebust 65857d5768 NFSv4.1: _pnfs_return_layout() shouldn't invalidate the layout on failure
Failure of the layoutreturn allocation is not a good reason to
mark the pnfs_layout_hdr as having failed a layoutget or i/o. Just
exit cleanly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:18 -04:00
Trond Myklebust e5929f3cff NFSv4.1: Remove the NFS_LAYOUT_RETURNED state
It serves no purpose that the test for whether or not we have valid
layout segments doesn't already serve.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:17 -04:00
Trond Myklebust 173f77e9c5 NFSv4.1: Clear NFS_LAYOUT_BULK_RECALL when the layout segments are freed
Once all the affected layout segments have been freed up, clear the
NFS_LAYOUT_BULK_RECALL flag so that we can reuse the pnfs_layout_hdr.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:17 -04:00
Trond Myklebust 8006bfba36 NFSv4.1: Get rid of the NFS_LAYOUT_DESTROYED state
We already have a mechanism for blocking LAYOUTGET by means of the
plh_block_lgets counter. The only "service" that NFS_LAYOUT_DESTROYED
provides at this point is to block layoutget once the layout segment
list is empty, which basically means that you have to wait until
the pnfs_layout_hdr is destroyed before you can do pNFS on that file
again.

This patch enables the reuse of the pnfs_layout_hdr if the layout
segment list is empty.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:16 -04:00
Trond Myklebust 579342785f NFSv4.1: Remove unused 'default allocation' for pnfs_alloc_layout_hdr()
...and ditto for pnfs_free_layout_hdr()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:16 -04:00
Trond Myklebust a9136d4914 NFSv4.1: Get rid of pNFS spin lock debugging asserts...
These are all in static declared functions that are called only once.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:16 -04:00
Trond Myklebust 8f0d27dc5d NFSv4.1: Balance pnfs_layout_hdr refcount in pnfs_layout_(insert|remove)_lseg
Ensure that the reference count for pnfs_layout_hdr reverts to the
original value after a call to pnfs_layout_remove_lseg().

Note that the caller is expected to hold a reference to the struct
pnfs_layout_hdr.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:15 -04:00
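The balancing rule from commit 8f0d27dc5d above (each inserted segment takes one reference on the header, and removing it gives that reference back) can be sketched like this. The struct and function names are hypothetical stand-ins, and a plain `int` replaces the kernel's atomic reference count.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the layout header; not the kernel structure. */
struct layout_hdr_sketch {
    int refcount; /* references held on the header */
    int nr_segs;  /* segments currently attached */
};

/* Simplified stand-in for a layout segment. */
struct layout_seg_sketch {
    struct layout_hdr_sketch *hdr;
};

/* Attach a segment: the segment takes its own reference on the header. */
static void hdr_insert_seg(struct layout_hdr_sketch *lo,
                           struct layout_seg_sketch *seg)
{
    assert(lo->refcount > 0); /* caller is expected to hold a reference */
    seg->hdr = lo;
    lo->refcount++;
    lo->nr_segs++;
}

/* Detach a segment: drop exactly the reference taken at insert time. */
static void hdr_remove_seg(struct layout_seg_sketch *seg)
{
    struct layout_hdr_sketch *lo = seg->hdr;

    lo->nr_segs--;
    lo->refcount--;
    seg->hdr = NULL;
}
```

After an insert followed by a remove, the header's count is back at the value the caller started with, which is the invariant the commit message asks for.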
Trond Myklebust 905ca191cf NFSv4.1: Clean up pnfs_put_lseg()
There is no longer a need to use pnfs_free_lseg_list(). Just call
pnfs_free_lseg() directly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:15 -04:00
Trond Myklebust 9c6263819f NFSv4.1: Clean up the removal of pnfs_layout_hdr from the server list
Move the code into pnfs_free_layout_hdr(), and add checks to
get_layout_by_fh_locked to ensure that they don't reference a layout
that is being freed.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:14 -04:00
Trond Myklebust 6622c3ea05 NFSv4.1: Free the pnfs_layout_hdr outside the inode->i_lock
None of the existing pNFS layout drivers seem to require the inode
to be locked while they free the layout header.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:14 -04:00
Trond Myklebust 01d39ce82b NFSv4.1: Remove redundant reference to the pnfs_layout_hdr
Each layout segment already holds a reference to the pnfs_layout_hdr,
so there is no need to hold an extra reference that is released once
the last layout segment is freed.

Ensure that pnfs_find_alloc_layout() always returns a reference
to the pnfs_layout_hdr, which will be matched by the final call to
pnfs_put_layout_hdr() in pnfs_update_layout().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:13 -04:00
Trond Myklebust 57036a3776 NFSv4.1: Rename the pnfs_put_lseg_common to pnfs_layout_remove_lseg
The latter name is more descriptive of the actual function.
Also rename pnfs_insert_layout to pnfs_layout_insert_lseg.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:13 -04:00
Trond Myklebust bb346f6397 NFSv4.1: reset the inode MDS threshold counters on layout destruction
Instead of resetting the inode MDS threshold counters when we mark
the layout for destruction, do it as part of freeing the layout.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:12 -04:00
Trond Myklebust 7fdab069b7 NFSv4.1: Fix a race in the pNFS return-on-close code
If we sleep after dropping the inode->i_lock, then we are no longer
atomic with respect to the rpc_wake_up() call in pnfs_layout_remove_lseg().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:11 -04:00
Trond Myklebust 115ce575cb NFSv4.1: pnfs_layout_io_set_failed must clear invalid lsegs
If pnfs_layout_io_test_failed() authorises a retry of the failed layoutgets,
we should clear the existing layout segments so that we start afresh. Do
this in pnfs_layout_io_set_failed().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:11 -04:00
Trond Myklebust 3e62121493 NFSv4.1: Don't drop the pnfs_layout_hdr after a layoutget failure
We want to cache the pnfs_layout_hdr after a layoutget or i/o
failure so that pnfs_update_layout() can find it and know when
it is time to retry.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:10 -04:00
Trond Myklebust 830ffb5657 NFSv4.1: Fix a reference leak in pnfs_update_layout
If we exit after the call to pnfs_find_alloc_layout(), we have to ensure
that we put the struct pnfs_layout_hdr.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:10 -04:00
Trond Myklebust 25c7533357 NFSv4.1: Retry pNFS after a 2 minute timeout
If we had to fall back to read/write through MDS, then assume that we should
retry pNFS after a suitable timeout period.
The following patch sets a timeout of 2 minutes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:09 -04:00
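The retry-after-timeout idea in commit 25c7533357 above amounts to remembering when the failure happened and ignoring the failure flag once enough time has passed. A minimal sketch, assuming a caller-supplied clock in seconds; the struct and function names are hypothetical, and only the 2-minute figure comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Two minutes, as stated in the commit message. */
#define PNFS_RETRY_TIMEOUT_SECS 120L

/* Simplified per-layout failure state; not the kernel structure. */
struct layout_state {
    bool io_failed; /* set when we fell back to I/O through the MDS */
    long failed_at; /* time of the failure, in seconds */
};

/* Record a layoutget or I/O failure at time `now`. */
static void layout_io_set_failed(struct layout_state *lo, long now)
{
    lo->io_failed = true;
    lo->failed_at = now;
}

/*
 * Returns true while pNFS should still be avoided. Once the timeout has
 * expired the flag is cleared, so the next caller retries pNFS.
 */
static bool layout_io_test_failed(struct layout_state *lo, long now)
{
    if (!lo->io_failed)
        return false;
    assert(now >= lo->failed_at); /* the clock is assumed not to go backwards */
    if (now - lo->failed_at >= PNFS_RETRY_TIMEOUT_SECS) {
        lo->io_failed = false;
        return false;
    }
    return true;
}
```

A failure recorded at t=1000 keeps I/O on the MDS at t=1060, while a check at t=1120 clears the flag and lets pNFS be tried again.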
Trond Myklebust b9e028fd89 NFSv4.1: Add helpers for setting/reading the I/O fail bit
...and make them local to the pnfs.c file.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:09 -04:00
Trond Myklebust f86bbcf85d NFSv4.1: Replace dprintk() in pnfs_update_layout with something less buggy
Dereferencing nfsi->layout in order to read plh_flags without holding
a spin lock is bug prone. Furthermore, the dprintk() tells you nothing
about whether or not the call succeeded.
Replace it with something that tells you about whether or not a valid
layout segment was returned for the inode in question.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:08 -04:00
Trond Myklebust 9369a431bc NFSv4.1: Cleanup; add "pnfs_" prefix to put_lseg() and get_lseg()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:07 -04:00
Trond Myklebust 70c3bd2bdf NFSv4.1: Cleanup; add "pnfs_" prefix to get_layout_hdr() and put_layout_hdr()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:07 -04:00
Trond Myklebust 49a85061b0 NFSv4.1: Cleanup add a "pnfs_" prefix to mark_matching_lsegs_invalid
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:06 -04:00
Trond Myklebust a0b0a6e39b NFS: Clean up the pNFS layoutget interface
Ensure that we do return errors from nfs4_proc_layoutget() and that we
don't mark the layout as having failed if the error was due to a
signal or resource problem on the client side.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:06 -04:00
Idan Kedar 8554116e17 pnfs: defer release of pages in layoutget
We have encountered a bug whereby reading a lot of files (copying
Fedora's /bin) from a pNFS mount and hitting Ctrl+C in the middle caused
a general protection fault in xdr_shrink_bufhead. This function is
called when decoding the response from LAYOUTGET. The decoding is done
by a worker thread, and the caller of LAYOUTGET waits for the worker
thread to complete.

Hitting Ctrl+C caused the synchronous wait to end, and the next thing the
caller does is free the pages, so when the worker thread calls
xdr_shrink_bufhead, the pages are gone. Therefore, the cleanup of these
pages has been moved to nfs4_layoutget_release.

Signed-off-by: Idan Kedar <idank@tonian.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-08-02 17:38:54 -04:00
Bryan Schumaker 89d77c8fa8 NFS: Convert v4 into a module
This patch exports symbols needed by the v4 module.  In addition, I also
switch over to using IS_ENABLED() to check if CONFIG_NFS_V4 or
CONFIG_NFS_V4_MODULE are set.

The module (nfs4.ko) will be created in the same directory as nfs.ko and
will be automatically loaded the first time you try to mount over NFS v4.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 19:06:52 -04:00
Andy Adamson 293b3b065c NFSv4.1 do not send LAYOUTRETURN on emtpy plh_segs list
mark_matching_lsegs_invalid() resets the mds_threshold counters and can
dereference the layout hdr on an initial empty plh_segs list. It returns 0 both
in the case of an initial empty list and in a non-empty list that was cleared
by calls to mark_lseg_invalid.

Don't send a LAYOUTRETURN if the list was initially empty.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16 14:39:00 -04:00
Andy Adamson 366d50521c NFSv4.1 mark layout when already returned
When the file layout driver is fencing a DS, _pnfs_return_layout can be
called multiple times per inode due to in-flight i/o referencing lsegs on its
plh_segs list.

Remember that LAYOUTRETURN has been called, and do not call it again.
Allow LAYOUTRETURNs after a subsequent LAYOUTGET.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16 14:37:25 -04:00
Bryan Schumaker 57208fa7e5 NFS: Create an write_pageio_init() function
pNFS needs to select a write function based on the layout driver
currently in use, so I let each NFS version decide how to best handle
initializing writes.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:46 -04:00
Bryan Schumaker 1abb50886a NFS: Create an read_pageio_init() function
pNFS needs to select a read function based on the layout driver
currently in use, so I let each NFS version decide how to best handle
initializing reads.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:46 -04:00
Trond Myklebust 0a9c63fae7 NFSv4.1: Fix a race in set_pnfs_layoutdriver
The call to try_module_get() dereferences ld_type outside the
spin locks, which means that it may be pointing to garbage if
a module unload was in progress.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-19 13:32:45 -04:00
Trond Myklebust 2a4c8994ee NFSv4.1: Fix umount when filelayout DS is also the MDS
Currently there is a 'chicken and egg' issue when the DS is also the mounted
MDS. The nfs_match_client() reference from nfs4_set_ds_client bumps the
cl_count, the nfs_client is not freed at umount, and nfs4_deviceid_purge_client
is not called to dereference the MDS usage of a deviceid which holds a
reference to the DS nfs_client.  The result is the umount program returns,
but the nfs_client is not freed, and the cl_session heartbeat continues.

The MDS (and all other nfs mounts) lose their last nfs_client reference in
nfs_free_server when the last nfs_server (fsid) is umounted.
The file layout DSes lose their last nfs_client reference in destroy_ds
when the last deviceid referencing the data server is put and destroy_ds is
called. This is triggered by a call to nfs4_deviceid_purge_client which
removes references to a pNFS deviceid used by an MDS mount.

The fix is to track how many pnfs enabled filesystems are mounted from
this server, and then to purge the device id cache once that count reaches
zero.

Reported-by: Jorge Mora <Jorge.Mora@netapp.com>
Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-18 08:45:16 -04:00
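The fix described in commit 2a4c8994ee above (count the pNFS-enabled filesystems mounted from a server and purge the device id cache when the count reaches zero) can be sketched as below. All names are hypothetical, and the device id cache is reduced to a simple counter for illustration.

```c
#include <assert.h>

/* Simplified stand-in for the per-server client state; not the kernel structure. */
struct nfs_client_sketch {
    unsigned int pnfs_mounts;      /* pNFS-enabled filesystems mounted from this server */
    unsigned int cached_deviceids; /* entries in the device id cache */
};

/* A pNFS-enabled filesystem from this server was mounted. */
static void pnfs_fs_mounted(struct nfs_client_sketch *clp)
{
    clp->pnfs_mounts++;
}

/*
 * A pNFS-enabled filesystem was unmounted. When the last one goes away,
 * purge the device id cache so that it stops pinning data server clients.
 */
static void pnfs_fs_unmounted(struct nfs_client_sketch *clp)
{
    assert(clp->pnfs_mounts > 0); /* unbalanced unmount would be a bug */
    if (--clp->pnfs_mounts == 0)
        clp->cached_deviceids = 0; /* stand-in for purging the cache */
}
```

Tying the purge to the mount count rather than to the final nfs_client reference breaks the cycle in which the cache holds the very reference that would have to drop first.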
Andy Adamson d23d61c8d3 NFSv4.1 test the mdsthreshold hint parameters
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:15:49 -04:00
Andy Adamson 2701d086db NFSv4.1 add nfs_inode book keeping for mdsthreshold
Keep track of the number of bytes read or written via buffered, direct, and
mem-mapped i/o for use by mdsthreshold size_io hints.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:15:48 -04:00
Andy Adamson 82be417aa3 NFSv4.1 cache mdsthreshold values on OPEN
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:15:48 -04:00
Andy Adamson 041245c88a NFSv4.1 resend LAYOUTGET on data server invalid layout errors
The "invalid layout" class of errors is handled by destroying the layout and
getting a new layout from the server.  Currently, the layout must be
destroyed before a new layout can be obtained.

This means that all references (e.g. lsegs) to the "to be destroyed" layout
header must be dropped before it can be destroyed. This in turn means waiting
for all in-flight RPCs using the old layout as well as draining the data
server session slot table wait queue.

Set the NFS_LAYOUT_INVALID flag to redirect I/O to the MDS while waiting for
the old layout to be destroyed.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:55:33 -04:00
Andy Adamson 0a57cdac3f NFSv4.1 send layoutreturn to fence disconnected data server
Let the MDS know that you are redirecting I/O from pNFS to MDS.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:55:31 -04:00
Andy Adamson e7dd79af01 NFSv4.1: mark deviceid invalid on filelayout DS connection errors
This prevents the use of any layout for i/o that references the deviceid.
I/O is redirected through the MDS.

Redirect the unhandled failed I/O to the MDS without marking either the
layout or the deviceid invalid.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:54:20 -04:00
Trond Myklebust 25b11dcdbf NFS: Clean up nfs read and write error paths
Move the error handling for nfs_generic_pagein() into a single function.
Ditto for nfs_generic_flush().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-05-01 13:48:13 -04:00
Trond Myklebust 9b5415b536 NFS: Fix a use-before-initialised warning in fs/nfs/write.c and fs/nfs/pnfs.c
If the allocation of nfs_write_header fails, the list of nfs_pages that
needs to be cleaned up is still on desc->pg_list...

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Fred Isaman <iisaman@netapp.com>
2012-04-27 15:03:51 -04:00
Fred Isaman 1825a0d08f NFS: prepare coalesce testing for directio
The coalesce code made assumptions that will no longer be true once
non-page aligned io occurs.  This introduces no change in
current behavior, but allows for more general situations to come.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:38 -04:00
Fred Isaman 061ae2edb7 NFS: create completion structure to pass into page_init functions
Factors out the code that will need to change when directio
starts using these code paths.  This will allow directio to use
the generic pagein and flush routines.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:38 -04:00
Fred Isaman 6c75dc0d49 NFS: merge _full and _partial write rpc_ops
Decouple nfs_pgio_header and nfs_write_data, and have (possibly
multiple) nfs_write_datas each take a refcount on nfs_pgio_header.

For the moment keeps nfs_write_header as a way to preallocate a single
nfs_write_data with the nfs_pgio_header.  The code doesn't need this,
and would be prettier without, but given the amount of churn I am
already introducing I didn't want to play with tuning new mempools.

This also fixes a bug in pnfs_ld_handle_write_error.  In the case of
desc->pg_bsize < PAGE_CACHE_SIZE, the pages list was empty, causing
the replay attempt to do nothing.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:37 -04:00
Fred Isaman 4db6e0b74c NFS: merge _full and _partial read rpc_ops
Decouple nfs_pgio_header and nfs_read_data, and have (possibly
multiple) nfs_read_datas each take a refcount on nfs_pgio_header.

For the moment keeps nfs_read_header as a way to preallocate a single
nfs_read_data with the nfs_pgio_header.  The code doesn't need this,
and would be prettier without, but given the amount of churn I am
already introducing I didn't want to play with tuning new mempools.

This also fixes a bug in pnfs_ld_handle_read_error.  In the case of
desc->pg_bsize < PAGE_CACHE_SIZE, the pages list was empty, causing
the replay attempt to do nothing.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:37 -04:00
Fred Isaman cd841605f7 NFS: create common nfs_pgio_header for both read and write
In order to avoid duplicating all the data in nfs_read_data whenever we
split it up into multiple RPC calls (either due to a short read result
or due to rsize < PAGE_SIZE), we split out the bits that are the same
per RPC call into a separate "header" structure.

The goal this patch moves towards is to have a single header
refcounted by several rpc_data structures.  Thus, we want to always refer
from rpc_data to the header, and not the other way.  This patch comes
close to that ideal, but the directio code currently needs some
special casing, isolated in the nfs_direct_[read_write]hdr_release()
functions.  This will be dealt with in a future patch.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:37 -04:00
Fred Isaman 1acbbb4e16 NFS4.1: make pnfs_ld_[read|write]_done consistent
The two functions had diverged quite a bit, with the write function
being a bit more robust than the read.

However, these still break badly in the desc->pg_bsize < PAGE_CACHE_SIZE case,
as then there is nothing hanging on the data->pages list, and the resend
ends up doing nothing.  This will be fixed in a patch later in the series.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:36 -04:00
Andy Adamson e5265a0c58 NFSv4.1 fix page number calculation bug for filelayout decode buffers
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-26 12:23:23 -04:00
Trond Myklebust 8dd3775889 NFSv4.1: Clean ups and bugfixes for the pNFS read/writeback/commit code
Move more pnfs-isms out of the generic commit code.

Bugfixes:

- filelayout_scan_commit_lists doesn't need to get/put the lseg.
  In fact since it is run under the inode->i_lock, the lseg_put()
  can deadlock.

- Ensure that we distinguish between what needs to be done for
  commit-to-data server and what needs to be done for commit-to-MDS
  using the new flag PG_COMMIT_TO_DS. Otherwise we may end up calling
  put_lseg() on a bucket for a struct nfs_page that got written
  through the MDS.

- Fix a case where we were using list_del() on an nfs_page->wb_list
  instead of list_del_init().

- filelayout_initiate_commit needs to call filelayout_commit_release
  on error instead of the mds_ops->rpc_release(). Otherwise it won't
  clear the commit lock.

Cleanups:

- Let the files layout manage the commit lists for the pNFS case.
  Don't expose stuff like pnfs_choose_commit_list, and the fact
  that the commit buckets hold references to the layout segment
  in common code.

- Cast out the put_lseg() calls for the struct nfs_read/write_data->lseg
  into the pNFS layer from whence they came.

- Let the pNFS layer manage the NFS_INO_PNFS_COMMIT bit.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-03-17 11:09:33 -04:00
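One of the bugfixes in commit 8dd3775889 above replaces list_del() with list_del_init() on an nfs_page->wb_list. The difference can be shown with a small userspace re-creation of a circular doubly linked list. This is an illustrative sketch written for this note, not the kernel's list.h.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node, modelled on the kernel idiom. */
struct list_head {
    struct list_head *next, *prev;
};

/* Make a node an empty list that points at itself. */
static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Insert n right after head. */
static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* A list (or a detached, re-initialised entry) is empty if it points at itself. */
static int list_empty(const struct list_head *h)
{
    return h->next == h;
}

/*
 * Unlink e and re-initialise it. Unlike a bare unlink, the entry is left
 * in a valid "empty" state, so later list_empty() checks and even a
 * repeated removal remain safe.
 */
static void list_del_init(struct list_head *e)
{
    assert(e->next != NULL && e->prev != NULL); /* entry must be initialised */
    e->prev->next = e->next;
    e->next->prev = e->prev;
    list_init(e);
}
```

Code that later tests whether the entry is still queued, or that may remove it a second time, depends on the entry being re-initialised; a bare unlink leaves stale or poisoned pointers behind.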
Trond Myklebust 2d2f24add1 NFSv4: Simplify the struct nfs4_stateid
Replace the union with the common struct stateid4 as defined in both
RFC3530 and RFC5661. This makes it easier to access the sequence id,
which will again make implementing support for parallel OPEN calls
easier.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-06 10:32:47 -05:00
Trond Myklebust f597c53790 NFSv4: Add helpers for basic copying of stateids
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-06 10:32:46 -05:00
Trond Myklebust a59c30acfb NFSv4.1: Get rid of redundant NFS4CLNT_LAYOUTRECALL tests
The NFS4CLNT_LAYOUTRECALL tests in pnfs_layout_process and
pnfs_update_layout are redundant.

In the case of a bulk layout recall, we're always testing for
the NFS_LAYOUT_BULK_RECALL flag anyway.
In the case of a file or segment recall, the call to
pnfs_set_layout_stateid() updates the layout_header 'barrier'
sequence id, which triggers the test in pnfs_layoutgets_blocked()
and is less race-prone than NFS4CLNT_LAYOUTRECALL anyway.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-01 11:17:47 -05:00
Weston Andros Adamson a030889a01 NFS: start printks w/ NFS: even if __func__ shown
This patch addresses printks that have some context to show that they are
from fs/nfs/, but for the sake of consistency now start with NFS:

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-06 18:48:00 -05:00
Trond Myklebust 7d9dea915f NFS: Use kcalloc() when allocating arrays
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:22 -05:00
Trond Myklebust e2fecb215b NFS: Remove pNFS bloat from the generic write path
We have no business doing any of this in the standard write release path.
Get rid of it, and put it in the pNFS layer.

Also, while we're at it, get rid of the completely bogus unlock/relock
semantics that were present in nfs_writeback_release_full(). It is
not only unnecessary, but actually dangerous to release the write lock
just in order to take it again in nfs_page_async_flush(). Better just
to open code the pgio operations in a pnfs helper.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-06 08:57:46 -05:00
Boaz Harrosh fe0fe83585 pnfs-obj: Must return layout on IO error
As mandated by the standard, in case of an IO error a pNFS
objects layout driver must return its layout. This is because
all device errors are reported to the server as part of the
layout return buffer.

This is implemented the same way PNFS_LAYOUTRET_ON_SETATTR
is done, through a bit flag on the pnfs_layoutdriver_type->flags
member. The flag is set by the layout driver that wants a
layout_return performed at pnfs_ld_{write,read}_done in case
of an error.
(Though I have not defined a wrapper like pnfs_ld_layoutret_on_setattr
 because this code is never called outside of pnfs.c and pnfs IO
 paths)

Without this patch 3.[0-2] Kernels leak memory and have an annoying
WARN_ON after every IO error utilizing the pnfs-obj driver.

[This patch is for 3.2 Kernel. 3.1/0 Kernels need a different patch]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-06 08:55:33 -05:00
Linus Torvalds e25ba0ce03 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Revert pnfs ugliness from the generic NFS read code path
  SUNRPC: destroy freshly allocated transport in case of sockaddr init error
  NFS: Fix a regression in the referral code
  nfs: move nfs_file_operations declaration to bottom of file.c (try #2)
  nfs: when attempting to open a directory, fall back on normal lookup (try #5)
2011-11-22 08:54:15 -08:00
Trond Myklebust 62e4a76987 NFS: Revert pnfs ugliness from the generic NFS read code path
pNFS-specific code belongs in the pnfs layer. It should not be
hijacking generic NFS read or write code paths.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-10 14:50:26 -05:00
Linus Torvalds 32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Paul Gortmaker 143cb494cb fs: add module.h to files that were implicitly using it
Some files were using the complete module.h infrastructure without
actually including the header at all.  Fix them up in advance so
once the implicit presence is removed, we won't get failures like this:

  CC [M]  fs/nfsd/nfssvc.o
fs/nfsd/nfssvc.c: In function 'nfsd_create_serv':
fs/nfsd/nfssvc.c:335: error: 'THIS_MODULE' undeclared (first use in this function)
fs/nfsd/nfssvc.c:335: error: (Each undeclared identifier is reported only once
fs/nfsd/nfssvc.c:335: error: for each function it appears in.)
fs/nfsd/nfssvc.c: In function 'nfsd':
fs/nfsd/nfssvc.c:555: error: implicit declaration of function 'module_put_and_exit'
make[3]: *** [fs/nfsd/nfssvc.o] Error 1

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:31 -04:00
Peng Tao 92407e75ce nfs4: serialize layoutcommit
The current pnfs_layoutcommit_inode cannot handle parallel layoutcommit.
And as Trond suggested, there is no need for the client to optimize for
parallel layoutcommit. So add the NFS_INO_LAYOUTCOMMITTING flag to
mark an in-flight layoutcommit and serialize layoutcommit with it.
Also call mark_inode_dirty_sync if pnfs_layoutcommit_inode fails to issue
a layoutcommit.

Reported-by: Vitaliy Gusev <gusev.vitaliy@nexenta.com>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-31 11:51:28 -04:00
Peng Tao 9b7eecdcfe pnfs: recoalesce when ld read pagelist fails
For pnfs pagelist read failure, we need to pg_recoalesce and resend IO to
mds.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:14 -07:00
Peng Tao 8ce160c5ef pnfs: recoalesce when ld write pagelist fails
For pnfs pagelist write failure, we need to pg_recoalesce and resend IO to
mds.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:13 -07:00
Peng Tao 1b0ae06877 pnfs: make _set_lo_fail generic
File layout and block layout both use it to mark the layout io failure
bit, so make it generic.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:13 -07:00
Andy Adamson db29c08909 pnfs: cleanup_layoutcommit
This gives layout drivers a chance to clean up structures they put in at
encode_layoutcommit.

Signed-off-by: Andy Adamson <andros@netapp.com>
[fixup layout header pointer for layoutcommit]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()]
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:15 -04:00
Benny Halevy 738fd0f360 pnfs: add set-clear layoutdriver interface
To allow layout driver to issue getdevicelist at mount time, and clean up
at umount time.

[fixup non NFS_V4_1 set_pnfs_layoutdriver definition]
[pnfs: pass mntfh down the init_pnfs path]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:15 -04:00
Peng Tao a9bae5666d pnfs: let layoutcommit handle a list of lseg
There can be multiple lseg per file, so layoutcommit should be
able to handle it.

[Needed in v3.0]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:15 -04:00
Peng Tao 9fa4075878 pnfs: save layoutcommit cred at layout header init
No need to save it for every lseg.
No need to save it at every pnfs_set_layoutcommit.

[Needed in v3.0]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:14 -04:00
Peng Tao acff588053 pnfs: save layoutcommit lwb at layout header
No need to save it for every lseg.

[Needed in v3.0]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:14 -04:00
Trond Myklebust 1f9453578f NFS: Clean up - simplify the switch to read/write-through-MDS
Use nfs_pageio_reset_read_mds and nfs_pageio_reset_write_mds instead of
completely reinitialising the struct nfs_pageio_descriptor.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15 09:12:22 -04:00