1
0
Fork 0
alistair23-linux/fs
Jens Axboe 08bdcc35f0 io-wq: clear node->next on list deletion
If someone removes a node from a list, and then later adds it back to
a list, we can have invalid data in ->next. This can cause all sorts
of issues. One such use case is the IORING_OP_POLL_ADD command, which
will do just that if we race and get woken twice without any pending
events. This is a pretty rare case, but can happen under extreme loads.
Dan reports that he saw the following crash:

BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD d283ce067 P4D d283ce067 PUD e5ca04067 PMD 0
Oops: 0002 [#1] SMP
CPU: 17 PID: 10726 Comm: tao:fast-fiber Kdump: loaded Not tainted 5.2.9-02851-gac7bc042d2d1 #116
Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019
RIP: 0010:io_wqe_enqueue+0x3e/0xd0
Code: 34 24 74 55 8b 47 58 48 8d 6f 50 85 c0 74 50 48 89 df e8 35 7c 75 00 48 83 7b 08 00 48 8b 14 24 0f 84 84 00 00 00 48 8b 4b 10 <48> 89 11 48 89 53 10 83 63 20 fe 48 89 c6 48 89 df e8 0c 7a 75 00
RSP: 0000:ffffc90006858a08 EFLAGS: 00010082
RAX: 0000000000000002 RBX: ffff889037492fc0 RCX: 0000000000000000
RDX: ffff888e40cc11a8 RSI: ffff888e40cc11a8 RDI: ffff889037492fc0
RBP: ffff889037493010 R08: 00000000000000c3 R09: ffffc90006858ab8
R10: 0000000000000000 R11: 0000000000000000 R12: ffff888e40cc11a8
R13: 0000000000000000 R14: 00000000000000c3 R15: ffff888e40cc1100
FS:  00007fcddc9db700(0000) GS:ffff88903fa40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000e479f5003 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <IRQ>
 io_poll_wake+0x12f/0x2a0
 __wake_up_common+0x86/0x120
 __wake_up_common_lock+0x7a/0xc0
 sock_def_readable+0x3c/0x70
 tcp_rcv_established+0x557/0x630
 tcp_v6_do_rcv+0x118/0x3c0
 tcp_v6_rcv+0x97e/0x9d0
 ip6_protocol_deliver_rcu+0xe3/0x440
 ip6_input+0x3d/0xc0
 ? ip6_protocol_deliver_rcu+0x440/0x440
 ipv6_rcv+0x56/0xd0
 ? ip6_rcv_finish_core.isra.18+0x80/0x80
 __netif_receive_skb_one_core+0x50/0x70
 netif_receive_skb_internal+0x2f/0xa0
 napi_gro_receive+0x125/0x150
 mlx5e_handle_rx_cqe+0x1d9/0x5a0
 ? mlx5e_poll_tx_cq+0x305/0x560
 mlx5e_poll_rx_cq+0x49f/0x9c5
 mlx5e_napi_poll+0xee/0x640
 ? smp_reschedule_interrupt+0x16/0xd0
 ? reschedule_interrupt+0xf/0x20
 net_rx_action+0x286/0x3d0
 __do_softirq+0xca/0x297
 irq_exit+0x96/0xa0
 do_IRQ+0x54/0xe0
 common_interrupt+0xf/0xf
 </IRQ>
RIP: 0033:0x7fdc627a2e3a
Code: 31 c0 85 d2 0f 88 f6 00 00 00 55 48 89 e5 41 57 41 56 4c 63 f2 41 55 41 54 53 48 83 ec 18 48 85 ff 0f 84 c7 00 00 00 48 8b 07 <41> 89 d4 49 89 f5 48 89 fb 48 85 c0 0f 84 64 01 00 00 48 83 78 10

when running a networked workload with about 5000 sockets being polled
for. Fix this by clearing node->next when the node is being removed from
the list.

Fixes: 6206f0e180 ("io-wq: shrink io_wq_work a bit")
Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-04 17:26:57 -07:00
..
9p 9p pull request for inclusion in 5.4 2019-09-27 15:10:34 -07:00
adfs Merge branch 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 11:33:22 -07:00
affs affs: fix a memory leak in affs_remount 2019-11-18 14:26:43 +01:00
afs AFS development 2019-11-30 10:57:22 -08:00
autofs autofs: fix a leak in autofs_expire_indirect() 2019-10-25 00:03:11 -04:00
befs fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
bfs fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
btrfs compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
cachefiles treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36 2019-05-24 17:27:11 +02:00
ceph compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
cifs CIFS: fix a white space issue in cifs_get_inode_info() 2019-11-27 11:31:49 -06:00
coda y2038: add inode timestamp clamping 2019-09-19 09:42:37 -07:00
configfs configfs: calculate the depth of parent item 2019-11-06 18:36:01 +01:00
cramfs cramfs: fix usage on non-MTD device 2019-11-23 21:44:49 -05:00
crypto fscrypt: add support for IV_INO_LBLK_64 policies 2019-11-06 12:34:36 -08:00
debugfs debugfs: remove return value of debugfs_create_atomic_t() 2019-11-03 14:03:01 +01:00
devpts devpts_pty_kill(): don't bother with d_delete() 2019-09-03 09:30:56 -04:00
dlm dlm for 5.3 2019-07-12 17:37:53 -07:00
ecryptfs compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
efivarfs Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
efs fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
erofs erofs: remove unnecessary output in erofs_show_options() 2019-11-24 11:02:41 +08:00
exportfs exportfs_decode_fh(): negative pinned may become positive without the parent locked 2019-11-10 11:56:05 -05:00
ext2 \n 2019-11-30 11:16:07 -08:00
ext4 compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
f2fs compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
fat compat_ioctl: move drivers to compat_ptr_ioctl 2019-10-23 17:23:43 +02:00
freevxfs fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
fscache Revert "Merge tag 'keys-acl-20190703' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs" 2019-07-10 18:43:43 -07:00
fuse compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
gfs2 compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
hfs treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
hfsplus fs/hfsplus/xattr.c: replace strncpy with memcpy 2019-07-16 19:23:23 -07:00
hostfs This pull request contains the following changes for UML: 2019-05-12 17:52:13 -04:00
hpfs fs: compat_ioctl: move FITRIM emulation into file systems 2019-10-23 17:23:46 +02:00
hugetlbfs Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
iomap iomap: Fix pipe page leakage during splicing 2019-11-22 08:36:02 -08:00
isofs y2038: add inode timestamp clamping 2019-09-19 09:42:37 -07:00
jbd2 This merge window saw the the following new featuers added to ext4: 2019-11-30 10:53:02 -08:00
jffs2 Merge branch 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-09-26 11:33:30 -07:00
jfs y2038: add inode timestamp clamping 2019-09-19 09:42:37 -07:00
kernfs Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-11-26 16:02:40 -08:00
lockd lockd: Make two symbols static 2019-07-03 17:52:09 -04:00
minix fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
nfs NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid() 2019-11-01 11:03:56 -04:00
nfs_common treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
nfsd Highlights: 2019-09-27 17:00:27 -07:00
nilfs2 fs: compat_ioctl: move FITRIM emulation into file systems 2019-10-23 17:23:46 +02:00
nls treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
notify compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
ntfs ntfs: remove (un)?likely() from IS_ERR() conditions 2019-09-26 10:10:44 -07:00
ocfs2 compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
omfs fs: omfs: Initialize filesystem timestamp ranges 2019-08-30 08:11:25 -07:00
openpromfs Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
orangefs Orangefs: a fix and a cleanup 2019-09-19 10:21:35 -07:00
overlayfs ovl: filter of trusted xattr results in audit 2019-09-11 16:11:45 +02:00
proc procfs: Use all-in-one vtime aware kcpustat accessor 2019-11-21 07:33:24 +01:00
pstore pstore: fs superblock limits 2019-08-30 08:11:25 -07:00
qnx4 fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
qnx6 fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
quota fs/quota: handle overflows of sysctl fs.quota.* and report as unsigned long 2019-11-11 11:06:27 +01:00
ramfs vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API 2019-09-12 21:05:34 -04:00
reiserfs reiserfs: replace open-coded atomic_dec_and_mutex_lock() 2019-11-05 12:25:22 +01:00
romfs Merge branch 'work.mount2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-09-19 10:06:57 -07:00
squashfs Merge branch 'work.mount2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-09-19 10:06:57 -07:00
sysfs Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
sysv fs: sysv: Initialize filesystem timestamp ranges 2019-08-30 07:27:18 -07:00
tracefs tracing: Do not create tracefs files if tracefs lockdown is in effect 2019-10-12 20:49:07 -04:00
ubifs This pull request contains the following changes for UBI, UBIFS and JFFS2: 2019-09-21 11:10:16 -07:00
udf fs-udf: Delete an unnecessary check before brelse() 2019-09-04 18:19:43 +02:00
ufs y2038: add inode timestamp clamping 2019-09-19 09:42:37 -07:00
unicode unicode: make array 'token' static const, makes object smaller 2019-09-17 11:48:24 -04:00
verity fs-verity: support builtin file signatures 2019-08-12 19:33:50 -07:00
xfs New code for 5.5: 2019-11-30 10:44:49 -08:00
Kconfig io-wq: small threadpool implementation for io_uring 2019-10-29 12:43:00 -06:00
Kconfig.binfmt binfmt_flat: make support for old format binaries optional 2019-06-24 09:16:47 +10:00
Makefile io-wq: small threadpool implementation for io_uring 2019-10-29 12:43:00 -06:00
aio.c y2038: syscall implementation cleanups 2019-12-01 14:00:59 -08:00
anon_inodes.c Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
attr.c timestamp_truncate: Replace users of timespec64_trunc 2019-08-30 07:27:17 -07:00
bad_inode.c
…
binfmt_aout.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
binfmt_elf.c y2038: elfcore: Use __kernel_old_timeval for process times 2019-11-15 14:38:29 +01:00
binfmt_elf_fdpic.c y2038: elfcore: Use __kernel_old_timeval for process times 2019-11-15 14:38:29 +01:00
binfmt_em86.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
binfmt_flat.c fs/binfmt_flat.c: remove set but not used variable 'inode' 2019-07-16 19:23:22 -07:00
binfmt_misc.c Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
binfmt_script.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
block_dev.c block: don't send uevent for empty disk when not invalidating 2019-12-02 18:49:30 -07:00
buffer.c fs/buffer.c: support fscrypt in block_read_full_page() 2019-11-14 16:40:45 -05:00
char_dev.c chardev: set variable ret to -EBUSY before checking minor range overlap 2019-05-24 20:50:36 +02:00
compat.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
compat_binfmt_elf.c y2038: elfcore: Use __kernel_old_timeval for process times 2019-11-15 14:38:29 +01:00
compat_ioctl.c compat_ioctl: move SG_GET_REQUEST_TABLE handling 2019-10-23 17:23:47 +02:00
coredump.c coredump: split pipe command whitespace before expanding template 2019-08-03 07:02:01 -07:00
d_path.c [PATCH] fix d_absolute_path() interplay with fsmount() 2019-08-30 19:31:09 -04:00
dax.c New code for 5.5: 2019-11-30 10:44:49 -08:00
dcache.c locking/lockdep: Remove unused @nested argument from lock_release() 2019-10-09 12:46:10 +02:00
dcookies.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
direct-io.c fs/direct-io.c: fix kernel-doc warning 2019-10-14 15:04:01 -07:00
drop_caches.c fs/drop_caches.c: avoid softlockups in drop_pagecache_sb() 2019-02-01 15:46:24 -08:00
eventfd.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
eventpoll.c PM / wakeup: Show wakeup sources stats in sysfs 2019-08-21 00:20:40 +02:00
exec.c Pipework for general notification queue 2019-11-30 14:12:13 -08:00
fcntl.c fcntl: fix typo in RWH_WRITE_LIFE_NOT_SET r/w hint name 2019-10-25 14:28:10 -06:00
fhandle.c fs/handle.c - fix up kerneldoc 2019-08-07 21:51:47 -04:00
file.c Revert "vfs: properly and reliably lock f_pos in fdget_pos()" 2019-11-26 11:34:06 -08:00
file_table.c vfs: Export flush_delayed_fput for use by knfsd. 2019-08-19 11:00:39 -04:00
filesystems.c vfs: Implement a filesystem superblock creation/configuration context 2019-02-28 03:29:26 -05:00
fs-writeback.c cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead 2019-11-08 13:37:24 -07:00
fs_context.c vfs: subtype handling moved to fuse 2019-09-06 21:28:49 +02:00
fs_parser.c vfs: Make fs_parse() handle fs_param_is_fd-type params better 2019-09-12 21:06:14 -04:00
fs_pin.c switch the remnants of releasing the mountpoint away from fs_pin 2019-07-16 22:52:37 -04:00
fs_struct.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
fs_types.c fs: common implementation of file type 2019-01-21 17:48:13 +01:00
fsopen.c Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
inode.c mm,thp: avoid writes to file with THP in pagecache 2019-09-24 15:54:11 -07:00
internal.h Merge branch 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-20 09:15:51 -07:00
io-wq.c io_uring: use current task creds instead of allocating a new one 2019-12-02 08:50:00 -07:00
io-wq.h io-wq: clear node->next on list deletion 2019-12-04 17:26:57 -07:00
io_uring.c io_uring: ensure deferred timeouts copy necessary data 2019-12-04 11:12:08 -07:00
ioctl.c compat: move FS_IOC_RESVSP_32 handling to fs/ioctl.c 2019-10-23 17:15:57 +02:00
libfs.c fs/libfs.c: fix kernel-doc warning 2019-10-14 15:04:01 -07:00
locks.c Highlights: 2019-09-27 17:00:27 -07:00
mbcache.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
mount.h switch the remnants of releasing the mountpoint away from fs_pin 2019-07-16 22:52:37 -04:00
mpage.c blkcg, writeback: Rename wbc_account_io() to wbc_account_cgroup_owner() 2019-07-10 09:00:57 -06:00
namei.c audit: Report suspicious O_CREAT usage 2019-10-03 13:59:29 -04:00
namespace.c fs/namespace.c: fix use-after-free of mount in mnt_warn_timestamp_expiry() 2019-10-16 23:15:09 -04:00
no-block.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
nsfs.c vfs: Convert nsfs to use the new mount API 2019-05-25 18:00:06 -04:00
open.c Revert "vfs: properly and reliably lock f_pos in fdget_pos()" 2019-11-26 11:34:06 -08:00
pipe.c Pipework for general notification queue 2019-11-30 14:12:13 -08:00
pnode.c fs/namespace: fix unprivileged mount propagation 2019-06-17 17:36:09 -04:00
pnode.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 209 2019-05-30 11:29:53 -07:00
posix_acl.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
proc_namespace.c vfs: subtype handling moved to fuse 2019-09-06 21:28:49 +02:00
read_write.c vfs: fix page locking deadlocks when deduping files 2019-08-16 18:43:24 -07:00
readdir.c filldir[64]: remove WARN_ON_ONCE() for bad directory entries 2019-10-18 18:41:16 -04:00
select.c y2038: syscalls: change remaining timeval to __kernel_old_timeval 2019-11-15 14:38:29 +01:00
seq_file.c seq_file: fix problem when seeking mid-record 2019-08-13 16:06:52 -07:00
signalfd.c fs: mark expected switch fall-throughs 2019-04-08 18:21:02 -05:00
splice.c Pipework for general notification queue 2019-11-30 14:12:13 -08:00
stack.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
stat.c fs: move generic stat response attr handling to vfs_getattr_nosec 2019-02-01 01:55:45 -05:00
statfs.c vfs: Fix EOVERFLOW testing in put_compat_statfs64 2019-10-03 14:21:35 -07:00
super.c Merge branch 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-10-10 08:16:44 -07:00
sync.c fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback 2019-05-14 09:47:50 -07:00
timerfd.c y2038: timerfd: Use timespec64 internally 2019-11-15 14:38:30 +01:00
userfaultfd.c compat_ioctl: move more drivers to compat_ptr_ioctl 2019-10-23 17:23:44 +02:00
utimes.c y2038: syscalls: change remaining timeval to __kernel_old_timeval 2019-11-15 14:38:29 +01:00
xattr.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00