1
0
Fork 0
alistair23-linux/net
Chuck Lever 2652314c83 xprtrdma: Fix oops in Receive handler after device removal
commit 671c450b6f upstream.

Since v5.4, a device removal occasionally triggered this oops:

Dec  2 17:13:53 manet kernel: BUG: unable to handle page fault for address: 0000000c00000219
Dec  2 17:13:53 manet kernel: #PF: supervisor read access in kernel mode
Dec  2 17:13:53 manet kernel: #PF: error_code(0x0000) - not-present page
Dec  2 17:13:53 manet kernel: PGD 0 P4D 0
Dec  2 17:13:53 manet kernel: Oops: 0000 [#1] SMP
Dec  2 17:13:53 manet kernel: CPU: 2 PID: 468 Comm: kworker/2:1H Tainted: G        W         5.4.0-00050-g53717e43af61 #883
Dec  2 17:13:53 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
Dec  2 17:13:53 manet kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
Dec  2 17:13:53 manet kernel: RIP: 0010:rpcrdma_wc_receive+0x7c/0xf6 [rpcrdma]
Dec  2 17:13:53 manet kernel: Code: 6d 8b 43 14 89 c1 89 45 78 48 89 4d 40 8b 43 2c 89 45 14 8b 43 20 89 45 18 48 8b 45 20 8b 53 14 48 8b 30 48 8b 40 10 48 8b 38 <48> 8b 87 18 02 00 00 48 85 c0 75 18 48 8b 05 1e 24 c4 e1 48 85 c0
Dec  2 17:13:53 manet kernel: RSP: 0018:ffffc900035dfe00 EFLAGS: 00010246
Dec  2 17:13:53 manet kernel: RAX: ffff888467290000 RBX: ffff88846c638400 RCX: 0000000000000048
Dec  2 17:13:53 manet kernel: RDX: 0000000000000048 RSI: 00000000f942e000 RDI: 0000000c00000001
Dec  2 17:13:53 manet kernel: RBP: ffff888467611b00 R08: ffff888464e4a3c4 R09: 0000000000000000
Dec  2 17:13:53 manet kernel: R10: ffffc900035dfc88 R11: fefefefefefefeff R12: ffff888865af4428
Dec  2 17:13:53 manet kernel: R13: ffff888466023000 R14: ffff88846c63f000 R15: 0000000000000010
Dec  2 17:13:53 manet kernel: FS:  0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000
Dec  2 17:13:53 manet kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  2 17:13:53 manet kernel: CR2: 0000000c00000219 CR3: 0000000002009002 CR4: 00000000001606e0
Dec  2 17:13:53 manet kernel: Call Trace:
Dec  2 17:13:53 manet kernel: __ib_process_cq+0x5c/0x14e [ib_core]
Dec  2 17:13:53 manet kernel: ib_cq_poll_work+0x26/0x70 [ib_core]
Dec  2 17:13:53 manet kernel: process_one_work+0x19d/0x2cd
Dec  2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
Dec  2 17:13:53 manet kernel: worker_thread+0x1a6/0x25a
Dec  2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
Dec  2 17:13:53 manet kernel: kthread+0xf4/0xf9
Dec  2 17:13:53 manet kernel: ? kthread_queue_delayed_work+0x74/0x74
Dec  2 17:13:53 manet kernel: ret_from_fork+0x24/0x30

The proximal cause is that this rpcrdma_rep has a rr_rdmabuf that
is still pointing to the old ib_device, which has been freed. The
only way that is possible is if this rpcrdma_rep was not destroyed
by rpcrdma_ia_remove.

Debugging showed that was indeed the case: this rpcrdma_rep was
still in use by a completing RPC at the time of the device removal,
and thus wasn't on the rep free list. So, it was not found by
rpcrdma_reps_destroy().

The fix is to introduce a list of all rpcrdma_reps so that they all
can be found when a device is removed. That list is used to perform
only regbuf DMA unmapping, replacing that call to
rpcrdma_reps_destroy().

Meanwhile, to prevent corruption of this list, I've moved the
destruction of temp rpcrdma_rep objects to rpcrdma_post_recvs().
rpcrdma_xprt_drain() ensures that post_recvs (and thus rep_destroy) is
not invoked while rpcrdma_reps_unmap is walking rb_all_reps, thus
protecting the rb_all_reps list.

Fixes: b0b227f071 ("xprtrdma: Use an llist to manage free rpcrdma_reps")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-17 19:48:44 +01:00
..
6lowpan 6lowpan: no need to check return value of debugfs_create functions 2019-07-06 12:50:01 +02:00
9p 9p pull request for inclusion in 5.4 2019-09-27 15:10:34 -07:00
802 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
8021q vlan: vlan_changelink() should propagate errors 2020-01-12 12:21:50 +01:00
appletalk appletalk: enforce CAP_NET_RAW for raw sockets 2019-09-24 16:37:18 +02:00
atm net: atm: Reduce the severity of logging in unlink_clip_vcc 2019-11-18 17:08:20 -08:00
ax25 ax25: enforce CAP_NET_RAW for raw sockets 2019-09-24 16:37:18 +02:00
batman-adv Here are two batman-adv bugfixes: 2019-10-28 16:39:07 -07:00
bluetooth Bluetooth: Fix memory leak in hci_connect_le_scan 2020-01-09 10:20:04 +01:00
bpf bpf/flow_dissector: support flags in BPF_PROG_TEST_RUN 2019-07-25 18:00:41 -07:00
bpfilter Kbuild updates for v5.3 2019-07-12 16:03:16 -07:00
bridge net: add bool confirm_neigh parameter for dst_ops.update_pmtu 2020-01-04 19:18:58 +01:00
caif net: use skb_queue_empty_lockless() in poll() handlers 2019-10-28 13:33:41 -07:00
can can: j1939: j1939_sk_bind(): take priv after lock is held 2019-12-31 16:45:56 +01:00
ceph libceph: use ceph_kvmalloc() for osdmap arrays 2019-09-16 12:06:25 +02:00
core bpf: skmsg, fix potential psock NULL pointer dereference 2020-01-17 19:48:40 +01:00
dcb treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 201 2019-05-30 11:29:52 -07:00
dccp net: ipv6: add net argument to ip6_dst_lookup_flow 2019-12-18 16:08:40 +01:00
decnet net: add bool confirm_neigh parameter for dst_ops.update_pmtu 2020-01-04 19:18:58 +01:00
dns_resolver Revert "Merge tag 'keys-acl-20190703' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs" 2019-07-10 18:43:43 -07:00
dsa net: dsa: tag_8021q: Fix dsa_8021q_restore_pvid for an absent pvid 2019-11-16 12:23:53 -08:00
ethernet net: add annotations on hh->hh_len lockless accesses 2020-01-09 10:20:06 +01:00
hsr hsr: fix slab-out-of-bounds Read in hsr_debugfs_rename() 2020-01-17 19:48:32 +01:00
ieee802154 net: core: add generic lockdep keys 2019-10-24 14:53:48 -07:00
ife net: Fix Kconfig indentation 2019-09-26 08:56:17 +02:00
ipv4 netfilter: arp_tables: init netns pointer in xt_tgchk_param struct 2020-01-14 20:08:39 +01:00
ipv6 ipv6/addrconf: only check invalid header values when NETLINK_F_STRICT_CHK is set 2020-01-04 19:19:17 +01:00
iucv net/af_iucv: mark expected switch fall-throughs 2019-07-29 10:26:14 -07:00
kcm kcm: disable preemption in kcm_parse_func_strparser() 2019-09-27 10:27:14 +02:00
key Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-07-08 19:48:57 -07:00
l2tp net: ipv6: add net argument to ip6_dst_lookup_flow 2019-12-18 16:08:40 +01:00
l3mdev ipv6: convert major tx path to use RT6_LOOKUP_F_DST_NOREF 2019-06-23 13:24:17 -07:00
lapb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-17 20:20:36 -07:00
llc llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c) 2020-01-12 12:21:45 +01:00
mac80211 mac80211: fix TID field in monitor mode transmit 2020-01-12 12:21:28 +01:00
mac802154 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 174 2019-05-30 11:26:41 -07:00
mpls net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup 2019-12-18 16:08:42 +01:00
ncsi net/ncsi: Disable global multicast filter 2019-09-19 18:04:40 -07:00
netfilter netfilter: nft_meta: use 64-bit time arithmetic 2020-01-17 19:48:33 +01:00
netlabel netlabel: remove redundant assignment to pointer iter 2019-09-01 11:45:02 -07:00
netlink net: remove empty netlink_tap_exit_net 2019-06-14 19:50:33 -07:00
netrom net: core: add generic lockdep keys 2019-10-24 14:53:48 -07:00
nfc net: nfc: nci: fix a possible sleep-in-atomic-context bug in nci_uart_tty_receive() 2019-12-31 16:41:23 +01:00
nsh treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
openvswitch net: Fixed updating of ethertype in skb_mpls_push() 2019-12-18 16:08:56 +01:00
packet af_packet: set defaule value for tmo 2019-12-31 16:41:12 +01:00
phonet net: use skb_queue_empty_lockless() in poll() handlers 2019-10-28 13:33:41 -07:00
psample net: psample: fix skb_over_panic 2019-12-04 22:30:54 +01:00
qrtr net: qrtr: Stop rx_worker before freeing node 2019-09-21 18:45:46 -07:00
rds rds: ib: update WR sizes when bringing up connection 2019-11-16 12:59:08 -08:00
rfkill rfkill: Fix incorrect check to avoid NULL pointer dereference 2020-01-12 12:21:33 +01:00
rose net: core: add generic lockdep keys 2019-10-24 14:53:48 -07:00
rxrpc rxrpc: Fix handling of last subpacket of jumbo packet 2019-10-31 12:23:09 -07:00
sched net: sch_prio: When ungrafting, replace with FIFO 2020-01-12 12:21:49 +01:00
sctp sctp: free cmd->obj.chunk for the unprocessed SCTP_CMD_REPLY 2020-01-12 12:21:48 +01:00
smc net/smc: add fallback check to connect() 2020-01-04 19:18:37 +01:00
strparser Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-22 08:59:24 -04:00
sunrpc xprtrdma: Fix oops in Receive handler after device removal 2020-01-17 19:48:44 +01:00
switchdev treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tipc net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup 2019-12-18 16:08:42 +01:00
tls net/tls: Fix return values to avoid ENOTSUPP 2019-12-18 16:08:31 +01:00
unix net: use skb_queue_empty_lockless() in poll() handlers 2019-10-28 13:33:41 -07:00
vmw_vsock vsock/virtio: fix sock refcnt holding during the shutdown 2019-11-08 12:17:50 -08:00
wimax wimax: no need to check return value of debugfs_create functions 2019-08-10 15:25:47 -07:00
wireless cfg80211: fix double-free after changing network namespace 2020-01-12 12:21:28 +01:00
x25 net: silence KCSAN warnings around sk_add_backlog() calls 2019-10-09 21:42:59 -07:00
xdp xsk: Add rcu_read_lock around the XSK wakeup 2020-01-12 12:21:41 +01:00
xfrm xfrm: release device reference for invalid state 2019-11-12 08:24:38 +01:00
Kconfig devlink: Add packet trap infrastructure 2019-08-17 12:40:08 -07:00
Makefile
compat.c uio: make import_iovec()/compat_import_iovec() return bytes on success 2019-05-31 15:30:03 -06:00
socket.c net: make socket read/write_iter() honor IOCB_NOWAIT 2020-01-09 10:19:47 +01:00
sysctl_net.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00