Fork of reMarkable kernel https://github.com/reMarkable/linux
Michael J. Ruhl 6edd85a787 IB/hfi1: Fix destroy_qp hang after a link down
commit b4a4957d3d upstream.

rvt_destroy_qp() cannot complete until all in process packets have
been released from the underlying hardware.  If a link down event
occurs, an application can hang with a kernel stack similar to:

cat /proc/<app PID>/stack
 quiesce_qp+0x178/0x250 [hfi1]
 rvt_reset_qp+0x23d/0x400 [rdmavt]
 rvt_destroy_qp+0x69/0x210 [rdmavt]
 ib_destroy_qp+0xba/0x1c0 [ib_core]
 nvme_rdma_destroy_queue_ib+0x46/0x80 [nvme_rdma]
 nvme_rdma_free_queue+0x3c/0xd0 [nvme_rdma]
 nvme_rdma_destroy_io_queues+0x88/0xd0 [nvme_rdma]
 nvme_rdma_error_recovery_work+0x52/0xf0 [nvme_rdma]
 process_one_work+0x17a/0x440
 worker_thread+0x126/0x3c0
 kthread+0xcf/0xe0
 ret_from_fork+0x58/0x90
 0xffffffffffffffff

quiesce_qp() waits until all outstanding packets have been freed.
This wait should be momentary.  During a link down event, the cleanup
handling does not ensure that all packets caught by the link down are
flushed properly.

This is caused by the fact that the freeze path and the link down
event are handled the same way.  This is not correct.  The freeze path
waits until the HFI is unfrozen and then restarts PIO.  A link down
is not a freeze event.  The link down path cannot restart the PIO
until link is restored.  If the PIO path is restarted before the link
comes up, the application (QP) using the PIO path will hang (until
link is restored).

Fix by separating the linkdown path from the freeze path and use the
link down path for link down events.

Close a race condition in sc_disable() by acquiring both the progress
and release locks.

Close a race condition in sc_stop() by moving the setting of the flag
bits under the alloc lock.

Cc: <stable@vger.kernel.org> # 4.9.x+
Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-20 09:48:54 +02:00
Documentation ARM: dts: at91: add new compatibility string for macb on sama5d3 2018-10-18 09:16:22 +02:00
arch ARC: build: Don't set CROSS_COMPILE in arch's Makefile 2018-10-20 09:48:53 +02:00
block blk-mq: I/O and timer unplugs are inverted in blktrace 2018-10-13 09:27:22 +02:00
certs Replace magic for trusting the secondary keyring with #define 2018-09-09 19:55:54 +02:00
crypto crypto: skcipher - Fix -Wstringop-truncation warnings 2018-10-03 17:00:45 -07:00
drivers IB/hfi1: Fix destroy_qp hang after a link down 2018-10-20 09:48:54 +02:00
firmware License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
fs Revert "vfs: fix freeze protection in mnt_want_write_file() for overlayfs" 2018-10-20 09:48:52 +02:00
include mremap: properly flush TLB before releasing the page 2018-10-20 09:48:53 +02:00
init init: rename and re-order boot_cpu_state_init() 2018-08-15 18:12:48 +02:00
ipc ipc/sem.c: prevent queue.status tearing in semop 2018-09-05 09:26:30 +02:00
kernel mm: disallow mappings that conflict for devm_memremap_pages() 2018-10-20 09:48:53 +02:00
lib scsi: klist: Make it safe to use klists in atomic context 2018-10-03 17:00:48 -07:00
mm mremap: properly flush TLB before releasing the page 2018-10-20 09:48:53 +02:00
net batman-adv: fix hardif_neigh refcount on queue_work() failure 2018-10-20 09:48:49 +02:00
samples samples/bpf: Check the error of write() and read() 2018-08-24 13:09:12 +02:00
scripts kbuild: add .DELETE_ON_ERROR special target 2018-09-26 08:37:59 +02:00
security Revert "uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name" 2018-09-29 03:06:04 -07:00
sound sound: don't call skl_init_chip() to reset intel skl soc 2018-10-18 09:16:22 +02:00
tools perf tools: Fix snprint warnings for gcc 8 2018-10-18 09:16:28 +02:00
usr initramfs: fix initramfs rebuilds w/ compression after disabling 2017-11-03 07:39:19 -07:00
virt KVM: arm/arm64: Fix vgic init race 2018-09-26 08:38:04 +02:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Add hch to .get_maintainer.ignore 2015-08-21 14:30:10 -07:00
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore kbuild: rpm-pkg: keep spec file until make mrproper 2018-02-13 10:19:46 +01:00
.mailmap .mailmap: Add Maciej W. Rozycki's Imagination e-mail address 2017-11-10 12:16:15 -08:00
COPYING [PATCH] update FSF address in COPYING 2005-09-10 10:06:29 -07:00
CREDITS MAINTAINERS: update TPM driver infrastructure changes 2017-11-09 17:58:40 -08:00
Kbuild License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Kconfig License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
MAINTAINERS dt-bindings: Document mti,mips-cpc binding 2018-03-15 10:54:35 +01:00
Makefile Linux 4.14.77 2018-10-18 09:16:28 +02:00
README README: add a new README file, pointing to the Documentation/ 2016-10-24 08:12:35 -02:00

README

Linux kernel
============

This file was moved to Documentation/admin-guide/README.rst

Please notice that there are several guides for kernel developers and users.
These guides can be rendered in a number of formats, like HTML and PDF.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.