1
0
Fork 0
alistair23-linux/include
Ming Lei bf0beec060 blk-mq: drain I/O when all CPUs in a hctx are offline
Most of blk-mq drivers depend on managed IRQ's auto-affinity to setup
up queue mapping. Thomas mentioned the following point[1]:

"That was the constraint of managed interrupts from the very beginning:

 The driver/subsystem has to quiesce the interrupt line and the associated
 queue _before_ it gets shutdown in CPU unplug and not fiddle with it
 until it's restarted by the core when the CPU is plugged in again."

However, current blk-mq implementation doesn't quiesce hw queue before
the last CPU in the hctx is shutdown.  Even worse, CPUHP_BLK_MQ_DEAD is a
cpuhp state handled after the CPU is down, so there isn't any chance to
quiesce the hctx before shutting down the CPU.

Add new CPUHP_AP_BLK_MQ_ONLINE state to stop allocating from blk-mq hctxs
where the last CPU goes away, and wait for completion of in-flight
requests.  This guarantees that there is no inflight I/O before shutting
down the managed IRQ.

Add a BLK_MQ_F_STACKING and set it for dm-rq and loop, so we don't need
to wait for completion of in-flight requests from these drivers to avoid
a potential dead-lock. It is safe to do this for stacking drivers as those
do not use interrupts at all and their I/O completions are triggered by
underlying devices I/O completion.

[1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/

[hch: different retry mechanism, merged two patches, minor cleanups]

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-29 10:23:25 -06:00
..
acpi Additional ACPI updates for 5.7-rc1 2020-04-06 10:35:06 -07:00
asm-generic hyperv-fixes for 5.7-rc1 2020-04-14 11:58:04 -07:00
clocksource pwm: omap-dmtimer: Drop unused header file 2020-03-30 18:03:06 +02:00
crypto crypto: curve25519 - do not pollute dispatcher based on assembler 2020-04-09 00:01:59 +09:00
drm drm/bridge: analogix_dp: Split bind() into probe() and real bind() 2020-04-09 10:29:35 +02:00
dt-bindings RISC-V Patches for the 5.7 Merge Window, Part 1 2020-04-09 10:51:30 -07:00
keys KEYS: Don't write out to userspace while holding key semaphore 2020-03-29 12:40:41 +01:00
kunit kunit: subtests should be indented 4 spaces according to TAP 2020-03-26 14:08:41 -06:00
kvm KVM: arm64: GICv4.1: Allow SGIs to switch between HW and SW interrupts 2020-03-24 12:15:51 +00:00
linux blk-mq: drain I/O when all CPUs in a hctx are offline 2020-05-29 10:23:25 -06:00
math-emu
media media: cec-notifier: make cec_notifier_get_conn() static 2020-03-20 09:02:45 +01:00
misc
net cfg80211: fix kernel-doc notation 2020-04-14 12:40:02 +02:00
pcmcia
ras
rdma IB/mlx5: Expose UAR object and its alloc/destroy commands 2020-03-27 12:59:04 -03:00
scsi block: move dma_pad handling from blk_rq_map_sg into the callers 2020-04-22 10:47:39 -06:00
soc net: mscc: ocelot: fix untagged packet drops when enslaving to vlan aware bridge 2020-04-15 12:27:35 -07:00
sound ALSA: hda: Skip controller resume if not needed 2020-04-13 18:03:16 +02:00
target scsi: target: fix hang when multiple threads try to destroy the same iscsi session 2020-03-26 21:47:47 -04:00
trace bdi: use bdi_dev_name() to get device name 2020-05-09 16:07:39 -06:00
uapi flexible-array member convertion patches for 5.7-rc2 2020-04-19 10:34:30 -07:00
vdso vdso: Fix clocksource.h macro detection 2020-03-23 18:51:08 +01:00
video
xen xen: Use evtchn_type_t as a type for event channels 2020-04-07 12:12:54 +02:00