1
0
Fork 0
Commit Graph

2324 Commits (e7258c1a269e0967856c81d182c286a78f5ecf15)

Author SHA1 Message Date
Martin K. Petersen e7258c1a26 block: Get rid of bdev_integrity_enabled()
bdev_integrity_enabled() is only used by bio_integrity_enabled().
Combine these two functions.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-27 09:14:44 -06:00
Ming Lei f70ced0917 blk-mq: support per-distpatch_queue flush machinery
This patch supports to run one single flush machinery for
each blk-mq dispatch queue, so that:

- current init_request and exit_request callbacks can
cover flush request too, then the buggy copying way of
initializing flush request's pdu can be fixed

- flushing performance gets improved in case of multi hw-queue

In fio sync write test over virtio-blk(4 hw queues, ioengine=sync,
iodepth=64, numjobs=4, bs=4K), it is observed that througput gets
increased a lot over my test environment:
	- throughput: +70% in case of virtio-blk over null_blk
	- throughput: +30% in case of virtio-blk over SSD image

The multi virtqueue feature isn't merged to QEMU yet, and patches for
the feature can be found in below tree:

	git://kernel.ubuntu.com/ming/qemu.git  	v2.1.0-mq.4

And simply passing 'num_queues=4 vectors=5' should be enough to
enable multi queue(quad queue) feature for QEMU virtio-blk.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-25 15:22:45 -06:00
Ming Lei e97c293cdf block: introduce 'blk_mq_ctx' parameter to blk_get_flush_queue
This patch adds 'blk_mq_ctx' parameter to blk_get_flush_queue(),
so that this function can find the corresponding blk_flush_queue
bound with current mq context since the flush queue will become
per hw-queue.

For legacy queue, the parameter can be simply 'NULL'.

For multiqueue case, the parameter should be set as the context
from which the related request is originated. With this context
info, the hw queue and related flush queue can be found easily.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-25 15:22:44 -06:00
Ming Lei 0bae352da5 block: flush: avoid to figure out flush queue unnecessarily
Just figuring out flush queue at the entry of kicking off flush
machinery and request's completion handler, then pass it through.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-25 15:22:42 -06:00
Ming Lei ba483388e3 block: remove blk_init_flush() and its pair
Now mission of the two helpers is over, and just call
blk_alloc_flush_queue() and blk_free_flush_queue() directly.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-25 15:22:41 -06:00
Ming Lei 7c94e1c157 block: introduce blk_flush_queue to drive flush machinery
This patch introduces 'struct blk_flush_queue' and puts all
flush machinery related fields into this structure, so that

	- flush implementation details aren't exposed to driver
	- it is easy to convert to per dispatch-queue flush machinery

This patch is basically a mechanical replacement.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-25 15:22:40 -06:00
Ming Lei 7ddab5de5b block: avoid to use q->flush_rq directly
This patch trys to use local variable to access flush request,
so that we can convert to per-queue flush machinery a bit easier.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-25 15:22:38 -06:00
Ming Lei 3c09676c12 block: move flush initialization to blk_flush_init
These fields are always used with the flush request, so
initialize them together.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-25 15:22:37 -06:00
Ming Lei f355265571 block: introduce blk_init_flush and its pair
These two temporary functions are introduced for holding flush
initialization and de-initialization, so that we can
introduce 'flush queue' easier in the following patch. And
once 'flush queue' and its allocation/free functions are ready,
they will be removed for sake of code readability.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-25 15:22:35 -06:00
Ming Lei 1bcb1eada4 blk-mq: allocate flush_rq in blk_mq_init_flush()
It is reasonable to allocate flush req in blk_mq_init_flush().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-25 15:22:34 -06:00
Ming Lei 08e98fc601 blk-mq: handle failure path for initializing hctx
Failure of initializing one hctx isn't handled, so this patch
introduces blk_mq_init_hctx() and its pair to handle it explicitly.
Also this patch makes code cleaner.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-25 15:22:32 -06:00
Christoph Hellwig 9041583765 block: fix blk_abort_request on blk-mq
Signed-off-by: Christoph Hellwig <hch@lst.de>

Moved blk_mq_rq_timed_out() definition to the private blk-mq.h header.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-22 12:00:08 -06:00
Ming Lei 5e940aaa59 blk-timeout: fix blk_add_timer
Commit 8cb34819cdd5d(blk-mq: unshared timeout handler) introduces
blk-mq's own timeout handler, and removes following line:

	blk_queue_rq_timed_out(q, blk_mq_rq_timed_out);

which then causes blk_add_timer() to bypass adding the timer,
since blk-mq no longer has q->rq_timed_out_fn defined.

This patch fixes the problem by bypassing the check for blk-mq,
so that both request deadlines are still set and the rolling
timer updated.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-22 12:00:08 -06:00
Jens Axboe aedcd72f6c blk-mq: limit memory consumption if a crash dump is active
It's not uncommon for crash dump kernels to be limited to 128MB or
something low in that area. This is normally not a problem for
devices as we don't use that much memory, but for some shared SCSI
setups with huge queue depths, it can potentially fill most of
memory with tons of request allocations. blk-mq does scale back
when it fails to allocate memory, but it scales back just enough
so that blk-mq succeeds. This could still leave the system with
not enough memory to make any real progress.

Check if we are in a kdump environment and limit the hardware
queues and tag depth.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-22 12:00:08 -06:00
Ming Lei 2edd2c740b blk-mq: remove unnecessary blk_clear_rq_complete()
This patch removes two unnecessary blk_clear_rq_complete(),
the REQ_ATOM_COMPLETE flag is cleared inside blk_mq_start_request(),
so:

	- The blk_clear_rq_complete() in blk_flush_restore_request()
	needn't because the request will be freed later, and clearing
	it here may open a small race window with timeout.

	- The blk_clear_rq_complete() in blk_mq_requeue_request() isn't
	necessary too, even though REQ_ATOM_STARTED is cleared in
	__blk_mq_requeue_request(), in theory it still may cause a small
	race window with timeout since the two clear_bit() may be
	reordered.

Signed-off-by: Ming Lei <ming.lei@canoical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-22 12:00:07 -06:00
Christoph Hellwig 0152fb6b57 blk-mq: pass a reserved argument to the timeout handler
Allow blk-mq to pass an argument to the timeout handler to indicate
if we're timing out a reserved or regular command.  For many drivers
those need to be handled different.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-22 12:00:07 -06:00
Christoph Hellwig 46f92d42ee blk-mq: unshared timeout handler
Duplicate the (small) timeout handler in blk-mq so that we can pass
arguments more easily to the driver timeout handler.  This enables
the next patch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-22 12:00:07 -06:00
Christoph Hellwig 81481eb423 blk-mq: fix and simplify tag iteration for the timeout handler
Don't do a kmalloc from timer to handle timeouts, chances are we could be
under heavy load or similar and thus just miss out on the timeouts.
Fortunately it is very easy to just iterate over all in use tags, and doing
this properly actually cleans up the blk_mq_busy_iter API as well, and
prepares us for the next patch by passing a reserved argument to the
iterator.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-22 12:00:07 -06:00
Christoph Hellwig c8a446ad69 blk-mq: rename blk_mq_end_io to blk_mq_end_request
Now that we've changed the driver API on the submission side use the
opportunity to fix up the name on the completion side to fit into the
general scheme.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-22 12:00:07 -06:00
Christoph Hellwig e2490073cd blk-mq: call blk_mq_start_request from ->queue_rq
When we call blk_mq_start_request from the core blk-mq code before calling into
->queue_rq there is a racy window where the timeout handler can hit before we've
fully set up the driver specific part of the command.

Move the call to blk_mq_start_request into the driver so the driver can start
the request only once it is fully set up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-22 12:00:07 -06:00
Christoph Hellwig bf57229745 blk-mq: remove REQ_END
Pass an explicit parameter for the last request in a batch to ->queue_rq
instead of using a request flag.  Besides being a cleaner and non-stateful
interface this is also required for the next patch, which fixes the blk-mq
I/O submission code to not start a time too early.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-22 12:00:07 -06:00
Jens Axboe 6d11fb454b Merge branch 'for-linus' into for-3.18/core
Moving patches from for-linus to 3.18 instead, pull in this changes
that will go to Linus today.
2014-09-22 11:57:32 -06:00
Jens Axboe 8b95741569 blk-mq: use blk_mq_start_hw_queues() when running requeue work
When requests are retried due to hw or sw resource shortages,
we often stop the associated hardware queue. So ensure that we
restart the queues when running the requeue work, otherwise the
queue run will be a no-op.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-22 11:55:56 -06:00
Jens Axboe 6b55e1f2d0 blk-mq: fix potential oops on out-of-memory in __blk_mq_alloc_rq_maps()
__blk_mq_alloc_rq_maps() can be invoked multiple times, if we scale
back the queue depth if we are low on memory. So don't clear
set->tags when we fail, this is handled directly in
the parent function, blk_mq_alloc_tag_set().

Reported-by: Robert Elliott  <Elliott@hp.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-22 11:55:23 -06:00
Christoph Hellwig a57a178a49 blk-mq: avoid infinite recursion with the FUA flag
We should not insert requests into the flush state machine from
blk_mq_insert_request.  All incoming flush requests come through
blk_{m,s}q_make_request and are handled there, while blk_execute_rq_nowait
should only be called for BLOCK_PC requests.  All other callers
deal with requests that already went through the flush statemchine
and shouldn't be reinserted into it.

Reported-by: Robert Elliott  <Elliott@hp.com>
Debugged-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-22 11:55:19 -06:00
David Hildenbrand 683d0e1262 blk-mq: Avoid race condition with uninitialized requests
This patch should fix the bug reported in
https://lkml.org/lkml/2014/9/11/249.

We have to initialize at least the atomic_flags and the cmd_flags when
allocating storage for the requests.

Otherwise blk_mq_timeout_check() might dereference uninitialized
pointers when racing with the creation of a request.

Also move the reset of cmd_flags for the initializing code to the point
where a request is freed. So we will never end up with pending flush
request indicators that might trigger dereferences of invalid pointers
in blk_mq_timeout_check().

Cc: stable@vger.kernel.org
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reported-by: Paulo De Rezende Pinatti <ppinatti@linux.vnet.ibm.com>
Tested-by: Paulo De Rezende Pinatti <ppinatti@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-22 11:55:14 -06:00
Jens Axboe 538b753418 blk-mq: request deadline must be visible before marking rq as started
When we start the request, we set the deadline and flip the bits
marking the request as started and non-complete. However, it's
important that the deadline store is ordered before flipping the
bits, otherwise we could have a small window where the request is
marked started but with an invalid deadline. This can confuse the
timeout handling.

Suggested-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-22 11:54:04 -06:00
Jens Axboe b207892b06 Merge branch 'for-linus' into for-3.18/core
A bit of churn on the for-linus side that would be nice to have
in the core bits for 3.18, so pull it in to catch us up and make
forward progress easier.

Signed-off-by: Jens Axboe <axboe@fb.com>

Conflicts:
	block/scsi_ioctl.c
2014-09-11 09:31:18 -06:00
Jens Axboe a516440542 blk-mq: scale depth and rq map appropriate if low on memory
If we are running in a kdump environment, resources are scarce.
For some SCSI setups with a huge set of shared tags, we run out
of memory allocating what the drivers is asking for. So implement
a scale back logic to reduce the tag depth for those cases, allowing
the driver to successfully load.

We should extend this to detect low memory situations, and implement
a sane fallback for those (1 queue, 64 tags, or something like that).

Tested-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-10 09:02:03 -06:00
Alan Stern df35c7c912 Block: fix unbalanced bypass-disable in blk_register_queue
When a queue is registered, the block layer turns off the bypass
setting (because bypass is enabled when the queue is created).  This
doesn't work well for queues that are unregistered and then registered
again; we get a WARNING because of the unbalanced calls to
blk_queue_bypass_end().

This patch fixes the problem by making blk_register_queue() call
blk_queue_bypass_end() only the first time the queue is registered.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Tejun Heo <tj@kernel.org>
CC: James Bottomley <James.Bottomley@HansenPartnership.com>
CC: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-09 10:44:24 -06:00
Tejun Heo ff9ea32381 block, bdi: an active gendisk always has a request_queue associated with it
bdev_get_queue() returns the request_queue associated with the
specified block_device.  blk_get_backing_dev_info() makes use of
bdev_get_queue() to determine the associated bdi given a block_device.

All the callers of bdev_get_queue() including
blk_get_backing_dev_info() assume that bdev_get_queue() may return
NULL and implement NULL handling; however, bdev_get_queue() requires
the passed in block_device is opened and attached to its gendisk.
Because an active gendisk always has a valid request_queue associated
with it, bdev_get_queue() can never return NULL and neither can
blk_get_backing_dev_info().

Make it clear that neither of the two functions can return NULL and
remove NULL handling from all the callers.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Chris Mason <clm@fb.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-08 10:00:35 -06:00
Tejun Heo f4da80727c blkcg: remove blkcg->id
blkcg->id is a unique id given to each blkcg; however, the
cgroup_subsys_state which each blkcg embeds already has ->serial_nr
which can be used for the same purpose.  Drop blkcg->id and replace
its uses with blkcg->css.serial_nr.  Rename cfq_cgroup->blkcg_id to
->blkcg_serial_nr and @id in check_blkcg_changed() to @serial_nr for
consistency.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-08 09:55:37 -06:00
Keith Busch 2da78092dd block: Fix dev_t minor allocation lifetime
Releases the dev_t minor when all references are closed to prevent
another device from acquiring the same major/minor.

Since the partition's release may be invoked from call_rcu's soft-irq
context, the ext_dev_idr's mutex had to be replaced with a spinlock so
as not so sleep.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-03 15:01:02 -06:00
Robert Elliott 5676e7b6db blk-mq: cleanup after blk_mq_init_rq_map failures
In blk-mq.c blk_mq_alloc_tag_set, if:
	set->tags = kmalloc_node()
succeeds, but one of the blk_mq_init_rq_map() calls fails,
	goto out_unwind;
needs to free set->tags so the caller is not obligated
to do so.  None of the current callers (null_blk,
virtio_blk, virtio_blk, or the forthcoming scsi-mq)
do so.

set->tags needs to be set to NULL after doing so,
so other tag cleanup logic doesn't try to free
a stale pointer later.  Also set it to NULL
in blk_mq_free_tag_set.

Tested with error injection on the forthcoming
scsi-mq + hpsa combination.

Signed-off-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-03 10:44:15 -06:00
Ming Lei 0738854939 blk-merge: fix blk_recount_segments
QUEUE_FLAG_NO_SG_MERGE is set at default for blk-mq devices,
so bio->bi_phys_segment computed may be bigger than
queue_max_segments(q) for blk-mq devices, then drivers will
fail to handle the case, for example, BUG_ON() in
virtio_queue_rq() can be triggerd for virtio-blk:

	https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1359146

This patch fixes the issue by ignoring the QUEUE_FLAG_NO_SG_MERGE
flag if the computed bio->bi_phys_segment is bigger than
queue_max_segments(q), and the regression is caused by commit
05f1dd53152173(block: add queue flag for disabling SG merging).

Reported-by: Kick In <pierre-andre.morey@canonical.com>
Tested-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-02 10:25:12 -06:00
Jens Axboe 55872c5a3c bsg: fix potential error pointer dereference
Dan writes:

block/bsg.c:327 bsg_map_hdr() error: 'next_rq' dereferencing possible
ERR_PTR().

Fix this by setting next_rq to NULL, for the case where it can be
!= NULL but an error pointer.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-08-29 08:34:14 -06:00
Joe Lawrence a492f07545 block,scsi: fixup blk_get_request dead queue scenarios
The blk_get_request function may fail in low-memory conditions or during
device removal (even if __GFP_WAIT is set). To distinguish between these
errors, modify the blk_get_request call stack to return the appropriate
ERR_PTR. Verify that all callers check the return status and consider
IS_ERR instead of a simple NULL pointer check.

For consistency, make a similar change to the blk_mq_alloc_request leg
of blk_get_request.  It may fail if the queue is dead, or the caller was
unwilling to wait.

Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
Acked-by: Jiri Kosina <jkosina@suse.cz> [for pktdvd]
Acked-by: Boaz Harrosh <bharrosh@panasas.com> [for osd]
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-08-28 10:03:46 -06:00
Toshiaki Makita 7b5af5cffc cfq-iosched: Add comments on update timing of weight
Explain that weight has to be updated on activation.
This complements previous fix e15693ef18 ("cfq-iosched: Fix wrong
children_weight calculation").

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-08-28 08:16:29 -06:00
Joe Lawrence eb571eeade block,scsi: verify return pointer from blk_get_request
The blk-core dead queue checks introduce an error scenario to
blk_get_request that returns NULL if the request queue has been
shutdown. This affects the behavior for __GFP_WAIT callers, who should
verify the return value before dereferencing.

Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
Acked-by: Jiri Kosina <jkosina@suse.cz> [for pktdvd]
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-08-26 15:20:23 -06:00
Toshiaki Makita e15693ef18 cfq-iosched: Fix wrong children_weight calculation
cfq_group_service_tree_add() is applying new_weight at the beginning of
the function via cfq_update_group_weight().
This actually allows weight to change between adding it to and subtracting
it from children_weight, and triggers WARN_ON_ONCE() in
cfq_group_service_tree_del(), or even causes oops by divide error during
vfr calculation in cfq_group_service_tree_add().

The detailed scenario is as follows:
1. Create blkio cgroups X and Y as a child of X.
   Set X's weight to 500 and perform some I/O to apply new_weight.
   This X's I/O completes before starting Y's I/O.
2. Y starts I/O and cfq_group_service_tree_add() is called with Y.
3. cfq_group_service_tree_add() walks up the tree during children_weight
   calculation and adds parent X's weight (500) to children_weight of root.
   children_weight becomes 500.
4. Set X's weight to 1000.
5. X starts I/O and cfq_group_service_tree_add() is called with X.
6. cfq_group_service_tree_add() applies its new_weight (1000).
7. I/O of Y completes and cfq_group_service_tree_del() is called with Y.
8. I/O of X completes and cfq_group_service_tree_del() is called with X.
9. cfq_group_service_tree_del() subtracts X's weight (1000) from
   children_weight of root. children_weight becomes -500.
   This triggers WARN_ON_ONCE().
10. Set X's weight to 500.
11. X starts I/O and cfq_group_service_tree_add() is called with X.
12. cfq_group_service_tree_add() applies its new_weight (500) and adds it
    to children_weight of root. children_weight becomes 0. Calcularion of
    vfr triggers oops by divide error.

weight should be updated right before adding it to children_weight.

Reported-by: Ruki Sekiya <sekiya.ruki@lab.ntt.co.jp>
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-08-26 10:17:30 -06:00
Sabrina Dubroca d19d744685 block: fix error handling in sg_io
Before commit 2cada584b2 ("block: cleanup error handling in sg_io"),
we had ret = 0 before entering the last big if block of sg_io.

Since 2cada584b2, ret = -EFAULT, which breaks hdparm:

/dev/sda:
 setting Advanced Power Management level to 0xc8 (200)
 HDIO_DRIVE_CMD failed: Bad address
 APM_level      = 128

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Fixes: 2cada584b2 ("block: cleanup error handling in sg_io")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-08-26 08:20:01 -06:00
Tony Battersby 2ba136daa3 fix regression in SCSI_IOCTL_SEND_COMMAND
blk_rq_set_block_pc() memsets rq->cmd to 0, so it should come
immediately after blk_get_request() to avoid overwriting the
user-supplied CDB.  Also check for failure to allocate rq.

Fixes: f27b087b81 ("block: add blk_rq_set_block_pc()")
Cc: <stable@vger.kernel.org> # 3.16.x
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-08-22 15:04:33 -05:00
Tony Battersby 6f4a16266f scsi-mq: fix requests that use a separate CDB buffer
This patch fixes code such as the following with scsi-mq enabled:

    rq = blk_get_request(...);
    blk_rq_set_block_pc(rq);

    rq->cmd = my_cmd_buffer; /* separate CDB buffer */

    blk_execute_rq_nowait(...);

Code like this appears in e.g. sg_start_req() in drivers/scsi/sg.c (for
large CDBs only).  Without this patch, scsi_mq_prep_fn() will set
rq->cmd back to rq->__cmd, causing the wrong CDB to be sent to the device.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-08-22 15:04:31 -05:00
Christoph Hellwig a57821cac6 block: support > 16 byte CDBs for SG_IO
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-08-21 20:42:01 -05:00
Christoph Hellwig 2cada584b2 block: cleanup error handling in sg_io
Make sure we always clean up through the out label and just have
a single place to put the request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-08-21 20:42:01 -05:00
Tejun Heo cddd5d1764 blk-mq: blk_mq_freeze_queue() should allow nesting
While converting to percpu_ref for freezing, add703fda9 ("blk-mq:
use percpu_ref for mq usage count") incorrectly made
blk_mq_freeze_queue() misbehave when freezing is nested due to
percpu_ref_kill() being invoked on an already killed ref.

Fix it by making blk_mq_freeze_queue() kill and kick the queue only
for the outermost freeze attempt.  All the nested ones can simply wait
for the ref to reach zero.

While at it, remove unnecessary @wake initialization from
blk_mq_unfreeze_queue().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-08-21 20:37:51 -05:00
Jens Axboe a68aafa5b2 blk-mq: correct a few wrong/bad comments
Just grammar or spelling errors, nothing major.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-08-21 20:37:49 -05:00
Sagi Grimberg 16f408dc6b block: Fix BUG_ON when pi errors occur
When getting a pi error we get to bio_integrity_end_io with
bi_remaining already decremented to 0 where we will eventually
need to call bio_endio with restored original bio completion handler.
Calling bio_endio invokes a BUG_ON(). We should call bio_endio_nodec
instead, like what is done in bio_integrity_verify_fn.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-08-21 20:37:47 -05:00
Jens Axboe 274a5843ff blk-mq: don't allow merges if turned off for the queue
blk-mq uses BLK_MQ_F_SHOULD_MERGE, as set by the driver at init time,
to determine whether it should merge IO or not. However, this could
also be disabled by the admin, if merging is switched off through
sysfs. So check the general queue state as well before attempting
to merge IO.

Reported-by: Rob Elliott <Elliott@hp.com>
Tested-by: Rob Elliott <Elliott@hp.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-08-21 20:37:45 -05:00
Ming Lei dd84008708 blk-mq: fix WARNING "percpu_ref_kill() called more than once!"
Before doing queue release, the queue has been freezed already
by blk_cleanup_queue(), so needn't to freeze queue for deleting
tag set.

This patch fixes the WARNING of "percpu_ref_kill() called more than once!"
which is triggered during unloading block driver.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-08-15 12:38:20 -06:00