remarkable-linux/drivers
Michael J. Ruhl 6edd85a787 IB/hfi1: Fix destroy_qp hang after a link down
commit b4a4957d3d upstream.

rvt_destroy_qp() cannot complete until all in-process packets have
been released from the underlying hardware.  If a link down event
occurs, an application can hang with a kernel stack similar to:

cat /proc/<app PID>/stack
 quiesce_qp+0x178/0x250 [hfi1]
 rvt_reset_qp+0x23d/0x400 [rdmavt]
 rvt_destroy_qp+0x69/0x210 [rdmavt]
 ib_destroy_qp+0xba/0x1c0 [ib_core]
 nvme_rdma_destroy_queue_ib+0x46/0x80 [nvme_rdma]
 nvme_rdma_free_queue+0x3c/0xd0 [nvme_rdma]
 nvme_rdma_destroy_io_queues+0x88/0xd0 [nvme_rdma]
 nvme_rdma_error_recovery_work+0x52/0xf0 [nvme_rdma]
 process_one_work+0x17a/0x440
 worker_thread+0x126/0x3c0
 kthread+0xcf/0xe0
 ret_from_fork+0x58/0x90
 0xffffffffffffffff

quiesce_qp() waits until all outstanding packets have been freed.
This wait should be momentary.  During a link down event, the cleanup
handling does not ensure that all packets caught by the link down are
flushed properly.
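
A minimal userspace sketch of why a missed flush turns a momentary wait
into a hang.  This is not the hfi1 code: qp_model, qp_packet_freed() and
qp_quiesce() are hypothetical names, and a pthread condition variable
stands in for whatever wait mechanism the driver's quiesce_qp() uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of a QP's outstanding-packet accounting. */
struct qp_model {
    pthread_mutex_t lock;
    pthread_cond_t drained;   /* signalled when outstanding drops to 0 */
    unsigned outstanding;     /* packets still held by the modelled hardware */
};

static void qp_model_init(struct qp_model *qp, unsigned outstanding)
{
    pthread_mutex_init(&qp->lock, NULL);
    pthread_cond_init(&qp->drained, NULL);
    qp->outstanding = outstanding;
}

/* Called once per packet when it is freed, whether it completed
 * normally or was flushed by a cleanup path. */
static void qp_packet_freed(struct qp_model *qp)
{
    pthread_mutex_lock(&qp->lock);
    assert(qp->outstanding > 0);
    if (--qp->outstanding == 0)
        pthread_cond_broadcast(&qp->drained);
    pthread_mutex_unlock(&qp->lock);
}

/* Block until every outstanding packet has been freed.  If a cleanup
 * path never frees some packet, this loop never exits. */
static void qp_quiesce(struct qp_model *qp)
{
    pthread_mutex_lock(&qp->lock);
    while (qp->outstanding != 0)
        pthread_cond_wait(&qp->drained, &qp->lock);
    pthread_mutex_unlock(&qp->lock);
}
```

In this model every packet must reach qp_packet_freed() exactly once; a
packet stranded by the link down never does, so qp_quiesce() waits
forever, matching the stack trace above.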

The cause is that the freeze path and the link down event are handled
the same way, which is not correct.  The freeze path waits until the
HFI is unfrozen and then restarts PIO.  A link down is not a freeze
event: the link down path cannot restart PIO until the link is
restored.  If the PIO path is restarted before the link comes up, the
application (QP) using the PIO path will hang until the link is
restored.

Fix this by separating the link down path from the freeze path and
using the link down path for link down events.
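
A hedged sketch of the separation as a small state machine.  sc_model
and its events are hypothetical, not the driver's structures; as a
modelling assumption, unfreeze here also respects the link state so the
final invariant (PIO never runs with the link down) always holds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical events for a modelled PIO send context. */
enum sc_model_event {
    SC_MODEL_FREEZE,
    SC_MODEL_UNFREEZE,
    SC_MODEL_LINK_DOWN,
    SC_MODEL_LINK_UP,
};

struct sc_model {
    bool frozen;
    bool link_up;
    bool pio_running;
    unsigned flushes;   /* times in-flight packets were flushed */
};

static void sc_model_handle(struct sc_model *sc, enum sc_model_event ev)
{
    switch (ev) {
    case SC_MODEL_FREEZE:
        sc->frozen = true;
        sc->pio_running = false;
        break;
    case SC_MODEL_UNFREEZE:
        /* Freeze path: restart PIO once unfrozen (model assumption:
         * only if the link is also up). */
        sc->frozen = false;
        sc->pio_running = sc->link_up;
        break;
    case SC_MODEL_LINK_DOWN:
        /* Link down path: stop PIO and flush the packets caught by the
         * link down, but do not restart PIO here. */
        sc->link_up = false;
        sc->pio_running = false;
        sc->flushes++;
        break;
    case SC_MODEL_LINK_UP:
        /* PIO restarts only once the link is back (and not frozen). */
        sc->link_up = true;
        sc->pio_running = !sc->frozen;
        break;
    }
    /* Invariant: PIO never runs while the link is down. */
    assert(sc->link_up || !sc->pio_running);
}
```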

Close a race condition in sc_disable() by acquiring both the progress
and release locks.
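
A userspace sketch of the "take both locks" pattern, assuming pthread
mutexes in place of the driver's kernel locks.  Only the two lock roles
come from the text above; ctx_model and its functions are hypothetical.
Holding both locks, always in one fixed order, lets the disable path
stop the context and sample a consistent outstanding count.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a send context with two locks. */
struct ctx_model {
    pthread_mutex_t progress_lock;  /* guards enabled and submitted */
    pthread_mutex_t release_lock;   /* guards released */
    bool enabled;
    unsigned long submitted;
    unsigned long released;
};

static void ctx_model_init(struct ctx_model *c)
{
    pthread_mutex_init(&c->progress_lock, NULL);
    pthread_mutex_init(&c->release_lock, NULL);
    c->enabled = true;
    c->submitted = 0;
    c->released = 0;
}

/* Submit side: runs under the progress lock only. */
static bool ctx_model_submit(struct ctx_model *c)
{
    bool ok;

    pthread_mutex_lock(&c->progress_lock);
    ok = c->enabled;
    if (ok)
        c->submitted++;
    pthread_mutex_unlock(&c->progress_lock);
    return ok;
}

/* Release side: runs under the release lock only.  Call once per
 * packet previously accepted by ctx_model_submit(). */
static void ctx_model_release(struct ctx_model *c)
{
    pthread_mutex_lock(&c->release_lock);
    c->released++;
    pthread_mutex_unlock(&c->release_lock);
}

/* Disable: hold both locks (progress, then release) so neither side
 * can change state while the context is stopped and sampled.
 * Returns the number of packets still outstanding. */
static unsigned long ctx_model_disable(struct ctx_model *c)
{
    unsigned long outstanding;

    pthread_mutex_lock(&c->progress_lock);
    pthread_mutex_lock(&c->release_lock);
    c->enabled = false;
    assert(c->submitted >= c->released);
    outstanding = c->submitted - c->released;
    pthread_mutex_unlock(&c->release_lock);
    pthread_mutex_unlock(&c->progress_lock);
    return outstanding;
}
```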

Close a race condition in sc_stop() by moving the setting of the flag
bits under the alloc lock.
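
A hedged sketch of why the flag update belongs under the alloc lock.
alloc_model and MODEL_FLAG_STOPPED are hypothetical; a pthread mutex
stands in for the driver's alloc lock.  Because the allocator checks
the flags and allocates under the same lock, stopping cannot interleave
between the check and the allocation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_FLAG_STOPPED 0x1u  /* hypothetical flag bit */

struct alloc_model {
    pthread_mutex_t alloc_lock;  /* guards flags and allocated */
    unsigned int flags;
    unsigned int allocated;
};

static void alloc_model_init(struct alloc_model *a)
{
    pthread_mutex_init(&a->alloc_lock, NULL);
    a->flags = 0;
    a->allocated = 0;
}

/* Check the flags and allocate as one step under alloc_lock. */
static bool alloc_model_alloc(struct alloc_model *a)
{
    bool ok;

    pthread_mutex_lock(&a->alloc_lock);
    ok = !(a->flags & MODEL_FLAG_STOPPED);
    if (ok)
        a->allocated++;
    pthread_mutex_unlock(&a->alloc_lock);
    return ok;
}

/* Set the flag bits under the same lock.  Doing this outside the lock
 * would race with the check above (and with other flag updates made
 * under the lock), so an allocation could still succeed after stop. */
static void alloc_model_stop(struct alloc_model *a, unsigned int flag)
{
    assert(flag != 0);
    pthread_mutex_lock(&a->alloc_lock);
    a->flags |= flag;
    pthread_mutex_unlock(&a->alloc_lock);
}
```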

Cc: <stable@vger.kernel.org> # 4.9.x+
Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-20 09:48:54 +02:00
accessibility License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
acpi ACPI / scan: Initialize status to ACPI_STA_DEFAULT 2018-09-15 09:45:30 +02:00
amba ARM: amba: Don't read past the end of sysfs "driver_override" buffer 2018-05-01 12:58:21 -07:00
android android: binder: fix the race mmap and alloc_new_buf_locked 2018-09-19 22:43:35 +02:00
ata ata: ftide010: Add a quirk for SQ201 2018-10-03 17:00:59 -07:00
atm atm: zatm: Fix potential Spectre v1 2018-07-22 14:28:43 +02:00
auxdisplay auxdisplay: fix broken menu 2018-07-03 11:24:56 +02:00
base PM / core: Clear the direct_complete flag on errors 2018-10-13 09:27:25 +02:00
bcma License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
block floppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl 2018-10-03 17:00:54 -07:00
bluetooth Bluetooth: hci_ldisc: Free rw_semaphore on close 2018-10-18 09:16:21 +02:00
bus drivers/perf: arm-ccn: don't log to dmesg in event_init 2018-08-03 07:50:31 +02:00
cdrom cdrom: Fix info leak/OOB read in cdrom_ioctl_drive_status 2018-09-05 09:26:42 +02:00
char ipmi: Fix I2C client removal in the SSIF driver 2018-09-26 08:38:06 +02:00
clk clk: x86: Stop marking clocks as CLK_IS_CRITICAL 2018-10-18 09:16:22 +02:00
clocksource clocksource/drivers/fttmr010: Fix set_next_event handler 2018-10-20 09:48:52 +02:00
connector connector: make cn_proc explicitly non-modular 2016-07-05 11:40:47 -07:00
cpufreq cpufreq: governor: Avoid accessing invalid governor_data 2018-09-09 19:55:58 +02:00
cpuidle cpuidle: powernv: Fix promotion from snooze if next state disabled 2018-07-03 11:24:51 +02:00
crypto crypto: chelsio - Fix memory corruption in DMA Mapped buffers. 2018-10-13 09:27:28 +02:00
dax dev-dax: check_vma: ratelimit dev_info-s 2018-08-24 13:09:08 +02:00
dca dmaengine: ioatdma: constify dca_ops structures 2015-11-16 09:27:32 +05:30
devfreq PM / devfreq: Fix potential NULL pointer dereference in governor_store 2018-04-12 12:32:13 +02:00
dio License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dma dmaengine: mv_xor_v2: kill the tasklets upon exit 2018-09-26 08:38:05 +02:00
dma-buf dma-buf: remove redundant initialization of sg_table 2018-06-05 11:41:57 +02:00
edac EDAC: Fix memleak in module init error path 2018-10-03 17:00:53 -07:00
eisa License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
extcon extcon: Release locking when sending the notification of connector state 2018-09-09 19:55:56 +02:00
firewire firewire-ohci: work around oversized DMA reads on JMicron controllers 2018-04-26 11:02:03 +02:00
firmware efi/esrt: Only call efi_mem_reserve() for boot services memory 2018-09-26 08:38:10 +02:00
fmc License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
fpga fpga-manager: altera-ps-spi: preserve nCONFIG state 2018-05-01 12:58:24 -07:00
fsi drivers/fsi/scom: Remove reset before every putscom 2017-08-28 17:15:16 +02:00
gpio gpiolib: Free the last requested descriptor 2018-10-10 08:54:28 +02:00
gpu drm/i915/glk: Add Quirk for GLK NUC HDMI port issues. 2018-10-20 09:48:53 +02:00
hid HID: quirks: fix support for Apple Magic Keyboards 2018-10-20 09:48:53 +02:00
hsi License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hv Drivers: hv: vmbus: Use get/put_cpu() in vmbus_connect() 2018-10-10 08:54:28 +02:00
hwmon hwmon: (adt7475) Make adt7475_read_word() return errors 2018-10-03 17:00:58 -07:00
hwspinlock License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hwtracing intel_th: pci: Add Ice Lake PCH support 2018-10-20 09:48:50 +02:00
i2c i2c: rcar: handle RXDMA HW behaviour on Gen3 2018-10-20 09:48:54 +02:00
ide cdrom: do not call check_disk_change() inside cdrom_open() 2018-05-30 07:52:34 +02:00
idle intel_idle: Graceful probe failure when MWAIT is disabled 2018-08-09 12:16:39 +02:00
iio Revert "iio: temperature: maxim_thermocouple: add MAX31856 part" 2018-10-10 08:54:24 +02:00
infiniband IB/hfi1: Fix destroy_qp hang after a link down 2018-10-20 09:48:54 +02:00
input Input: atakbd - fix Atari CapsLock behaviour 2018-10-20 09:48:50 +02:00
iommu iommu/amd: Return devid as alias for ACPI HID devices 2018-10-20 09:48:52 +02:00
ipack ipack: Improve a size determination in ipack_bus_register() 2017-05-18 16:59:06 +02:00
irqchip irqchip/bcm7038-l1: Hide cpu offline callback when building for !SMP 2018-09-15 09:45:29 +02:00
isdn isdn: Disable IIOCDBGVAR 2018-08-22 07:46:11 +02:00
leds leds: pm8058: Silence pointer to integer size warning 2018-03-19 08:42:50 +01:00
lightnvm lightnvm: pblk: free padded entries in write buffer 2018-09-15 09:45:35 +02:00
macintosh macintosh/via-pmu: Add missing mmio accessors 2018-09-19 22:43:41 +02:00
mailbox mailbox: xgene-slimpro: Fix potential NULL pointer dereference 2018-09-09 19:55:54 +02:00
mcb License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
md dm linear: fix linear_end_io conditional definition 2018-10-18 09:16:24 +02:00
media media: af9035: prevent buffer overflow on write 2018-10-20 09:48:47 +02:00
memory memory: tegra: Apply interrupts mask per SoC 2018-08-03 07:50:38 +02:00
memstick License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
message scsi: mptfusion: Add bounds check in mptctl_hp_targetinfo() 2018-05-25 16:17:47 +02:00
mfd mfd: omap-usb-host: Fix dts probe of children 2018-10-18 09:16:21 +02:00
misc misc: sram: enable clock before registering regions 2018-10-03 17:00:46 -07:00
mmc mmc: block: avoid multiblock reads for the last sector in SPI mode 2018-10-18 09:16:24 +02:00
mtd mtd: rawnand: atmel: add module param to avoid using dma 2018-10-03 17:00:50 -07:00
mux mux: core: fix double get_device() 2018-01-17 09:45:27 +01:00
net net/mlx4: Use cpumask_available for eq->affinity_mask 2018-10-20 09:48:52 +02:00
nfc NFC: pn533: Fix wrong GFP flag usage 2018-08-24 13:09:06 +02:00
ntb ntb_transport: Fix bug with max_mw_size parameter 2018-04-26 11:02:13 +02:00
nubus License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nvdimm libnvdimm: fix ars_status output length calculation 2018-09-09 19:56:01 +02:00
nvme nvme_fc: fix ctrl create failures racing with workq items 2018-10-13 09:27:28 +02:00
nvmem nvmem: Don't let a NULL cell_id for nvmem_cell_get() crash us 2018-08-24 13:09:14 +02:00
of of: unittest: Disable interrupt node tests for old world MAC systems 2018-10-13 09:27:27 +02:00
oprofile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
parisc parisc/pci: Switch LBA PCI bus from Hard Fail to Soft Fail mode 2018-05-30 07:52:28 +02:00
parport parport: sunbpp: fix error return code 2018-09-26 08:38:12 +02:00
pci PCI: dwc: Fix scheduling while atomic issues 2018-10-20 09:48:51 +02:00
pcmcia PCMCIA / PM: Avoid noirq suspend aborts during suspend-to-idle 2018-05-30 07:52:39 +02:00
perf arm64: perf: Reject stand-alone CHAIN events for PMUv3 2018-10-18 09:16:24 +02:00
phy phy: phy-mtk-tphy: use auto instead of force to bypass utmi signals 2018-08-15 18:12:48 +02:00
pinctrl pinctrl: mcp23s08: fix irq and irqchip setup order 2018-10-18 09:16:24 +02:00
platform platform/x86: alienware-wmi: Correct a memory leak 2018-09-29 03:06:03 -07:00
pnp License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
power power: remove possible deadlock when unregistering power_supply 2018-10-03 17:00:47 -07:00
powercap powercap/RAPL: prevent overridding bits outside of the mask 2017-06-28 00:38:34 +02:00
pps drivers/pps: use surrounding "if PPS" to remove numerous dependency checks 2017-09-08 18:26:51 -07:00
ps3 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> 2017-03-02 08:42:29 +01:00
ptp ptp: fix missing break in switch 2018-07-25 11:25:10 +02:00
pwm pwm: meson: Fix mux clock names 2018-09-15 09:45:27 +02:00
rapidio drivers/rapidio/devices/rio_mport_cdev.c: fix resource leak in error handling path in 'rio_dma_transfer()' 2017-12-14 09:53:08 +01:00
ras License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
regulator regulator: fix crash caused by null driver data 2018-10-03 17:00:55 -07:00
remoteproc remoteproc: qcom: Fix potential device node leaks 2018-06-21 04:02:48 +09:00
reset reset: imx7: Fix always writing bits as 0 2018-09-26 08:38:03 +02:00
rpmsg Revert "rpmsg: core: add support to power domains for devices" 2018-09-29 03:06:04 -07:00
rtc rtc: bq4802: add error handling for devm_ioremap 2018-09-26 08:38:13 +02:00
s390 s390/cio: Fix how vfio-ccw checks pinned pages 2018-10-18 09:16:23 +02:00
sbus License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
scsi scsi: sd: don't crash the host on invalid commands 2018-10-20 09:48:51 +02:00
sfi x86/boot: Fix memremap() related build failure 2017-07-20 11:37:58 +02:00
sh License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sn Drivers: misc: remove __dev* attributes. 2013-01-03 15:57:16 -08:00
soc soc: imx: gpc: restrict register range for regmap access 2018-08-24 13:09:19 +02:00
spi spi: rspi: Fix interrupted DMA transfers 2018-10-03 17:00:55 -07:00
spmi spmi: pmic-arb: Move the ownership check to irq_chip callback 2017-08-28 13:52:22 +02:00
ssb License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
staging staging: ccree: check DMA pool buf !NULL before free 2018-10-20 09:48:53 +02:00
target scsi: iscsi: target: Don't use stack buffer for scatterlist 2018-10-18 09:16:21 +02:00
tc TC: Error handling clean-ups 2014-11-24 07:45:25 +01:00
tee tee: check shm references are consistent in offset/size 2018-06-21 04:02:54 +09:00
thermal thermal: of-thermal: disable passive polling when thermal zone is disabled 2018-10-03 17:00:57 -07:00
thunderbolt thunderbolt: Prevent crash when ICM firmware is not running 2018-04-24 09:36:29 +02:00
tty tty: Drop tty->count on tty_reopen() failure 2018-10-13 09:27:26 +02:00
uio uio: potential double frees if __uio_register_device() fails 2018-09-19 22:43:40 +02:00
usb xhci: Don't print a warning when setting link state for disabled ports 2018-10-18 09:16:25 +02:00
uwb uwb: hwa-rc: fix memory leak at probe 2018-10-03 17:00:46 -07:00
vfio vfio/type1: Fix task tracking for QEMU vCPU hotplug 2018-08-03 07:50:23 +02:00
vhost vhost: correctly check the iova range when waking virtqueue 2018-09-15 09:45:25 +02:00
video mach64: detect the dot clock divider correctly on sparc 2018-10-18 09:16:23 +02:00
virt virt: Convert to using %pOF instead of full_name 2017-08-29 08:52:51 -05:00
virtio virtio_balloon: fix increment of vb->num_pfns in fill_balloon() 2018-10-13 09:27:30 +02:00
vlynq drivers/vlynq/vlynq.c: fix another resource size off by 1 error 2014-01-23 16:36:55 -08:00
vme License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
w1 1wire: family module autoload fails because of upper/lower case mismatch. 2018-07-03 11:24:47 +02:00
watchdog watchdog: da9063: Fix updating timeout value 2018-08-03 07:50:24 +02:00
xen xen: fix GCC warning and remove duplicate EVTCHN_ROW/EVTCHN_COL usage 2018-10-10 08:54:26 +02:00
zorro zorro: Set up z->dev.dma_mask for the DMA API 2018-05-30 07:52:30 +02:00
Kconfig License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Makefile usb: build drivers/usb/common/ when USB_SUPPORT is set 2018-02-25 11:07:53 +01:00