remarkable-linux/block
Gabriel Krisman Bertazi d7045cbf4a blk-mq: Avoid memory reclaim when remapping queues
commit 36e1f3d107 upstream.

While stressing memory and IO at the same time as changing SMT settings,
we were able to consistently trigger deadlocks in the mm subsystem that
froze the entire machine.

I think that, under memory pressure, the large allocations performed by
blk_mq_init_rq_map may trigger a reclaim that stalls waiting for the
block layer remapping to complete, thus deadlocking the system.  The
trace below was collected after the machine stalled while waiting for
the hotplug event to complete.

The simplest fix is to keep allocations in this path from recursing into
IO or filesystem reclaim by using GFP_NOIO.  With this patch, we couldn't
hit the issue anymore.

This should apply cleanly on top of Jens's for-next branch.

Changes since v1:
  - Use GFP_NOIO instead of GFP_NOWAIT.

 Call Trace:
[c000000f0160aaf0] [c000000f0160ab50] 0xc000000f0160ab50 (unreliable)
[c000000f0160acc0] [c000000000016624] __switch_to+0x2e4/0x430
[c000000f0160ad20] [c000000000b1a880] __schedule+0x310/0x9b0
[c000000f0160ae00] [c000000000b1af68] schedule+0x48/0xc0
[c000000f0160ae30] [c000000000b1b4b0] schedule_preempt_disabled+0x20/0x30
[c000000f0160ae50] [c000000000b1d4fc] __mutex_lock_slowpath+0xec/0x1f0
[c000000f0160aed0] [c000000000b1d678] mutex_lock+0x78/0xa0
[c000000f0160af00] [d000000019413cac] xfs_reclaim_inodes_ag+0x33c/0x380 [xfs]
[c000000f0160b0b0] [d000000019415164] xfs_reclaim_inodes_nr+0x54/0x70 [xfs]
[c000000f0160b0f0] [d0000000194297f8] xfs_fs_free_cached_objects+0x38/0x60 [xfs]
[c000000f0160b120] [c0000000003172c8] super_cache_scan+0x1f8/0x210
[c000000f0160b190] [c00000000026301c] shrink_slab.part.13+0x21c/0x4c0
[c000000f0160b2d0] [c000000000268088] shrink_zone+0x2d8/0x3c0
[c000000f0160b380] [c00000000026834c] do_try_to_free_pages+0x1dc/0x520
[c000000f0160b450] [c00000000026876c] try_to_free_pages+0xdc/0x250
[c000000f0160b4e0] [c000000000251978] __alloc_pages_nodemask+0x868/0x10d0
[c000000f0160b6f0] [c000000000567030] blk_mq_init_rq_map+0x160/0x380
[c000000f0160b7a0] [c00000000056758c] blk_mq_map_swqueue+0x33c/0x360
[c000000f0160b820] [c000000000567904] blk_mq_queue_reinit+0x64/0xb0
[c000000f0160b850] [c00000000056a16c] blk_mq_queue_reinit_notify+0x19c/0x250
[c000000f0160b8a0] [c0000000000f5d38] notifier_call_chain+0x98/0x100
[c000000f0160b8f0] [c0000000000c5fb0] __cpu_notify+0x70/0xe0
[c000000f0160b930] [c0000000000c63c4] notify_prepare+0x44/0xb0
[c000000f0160b9b0] [c0000000000c52f4] cpuhp_invoke_callback+0x84/0x250
[c000000f0160ba10] [c0000000000c570c] cpuhp_up_callbacks+0x5c/0x120
[c000000f0160ba60] [c0000000000c7cb8] _cpu_up+0xf8/0x1d0
[c000000f0160bac0] [c0000000000c7eb0] do_cpu_up+0x120/0x150
[c000000f0160bb40] [c0000000006fe024] cpu_subsys_online+0x64/0xe0
[c000000f0160bb90] [c0000000006f5124] device_online+0xb4/0x120
[c000000f0160bbd0] [c0000000006f5244] online_store+0xb4/0xc0
[c000000f0160bc20] [c0000000006f0a68] dev_attr_store+0x68/0xa0
[c000000f0160bc60] [c0000000003ccc30] sysfs_kf_write+0x80/0xb0
[c000000f0160bca0] [c0000000003cbabc] kernfs_fop_write+0x17c/0x250
[c000000f0160bcf0] [c00000000030fe6c] __vfs_write+0x6c/0x1e0
[c000000f0160bd90] [c000000000311490] vfs_write+0xd0/0x270
[c000000f0160bde0] [c0000000003131fc] SyS_write+0x6c/0x110
[c000000f0160be30] [c000000000009204] system_call+0x38/0xec

Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Douglas Miller <dougmill@linux.vnet.ibm.com>
Cc: linux-block@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-18 07:11:49 +02:00
partitions block: atari: Return early for unsupported sector size 2016-07-13 09:31:44 -07:00
badblocks.c badblocks: badblocks_set/clear update unacked_exist 2016-10-21 15:45:47 -06:00
bio-integrity.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
bio.c blk: Ensure users for current->bio_list can see the full list. 2017-04-08 09:30:36 +02:00
blk-cgroup.c blkcg: Unlock blkcg_pol_mutex only once when cpd == NULL 2016-09-30 10:31:20 +02:00
blk-core.c blk: Ensure users for current->bio_list can see the full list. 2017-04-08 09:30:36 +02:00
blk-exec.c block: Fix spelling in a source code comment 2016-07-20 21:28:22 -06:00
blk-flush.c block: flush: fix IO hang in case of flood fua req 2016-10-26 07:49:27 -06:00
blk-integrity.c block, libnvdimm, nvme: provide a built-in blk_integrity nop profile 2015-10-21 14:43:45 -06:00
blk-ioc.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
blk-lib.c block: require write_same and discard requests align to logical block size 2016-10-11 15:06:30 -07:00
blk-map.c Don't feed anything but regular iovec's to blk_rq_map_user_iov 2016-12-07 08:23:35 -08:00
blk-merge.c block: make sure a big bio is split into at most 256 bvecs 2016-08-24 08:17:24 -06:00
blk-mq-cpumap.c blk-mq: allow the driver to pass in a queue mapping 2016-09-15 08:42:03 -06:00
blk-mq-pci.c blk_mq: linux/blk-mq.h does not include all the headers it depends on 2016-09-19 08:21:51 -06:00
blk-mq-sysfs.c blk-mq: register device instead of disk 2016-09-21 07:56:16 -06:00
blk-mq-tag.c Merge branch 'for-4.9/block-irq' of git://git.kernel.dk/linux-block 2016-10-09 17:29:33 -07:00
blk-mq-tag.h Merge branch 'for-4.9/block-irq' of git://git.kernel.dk/linux-block 2016-10-09 17:29:33 -07:00
blk-mq.c blk-mq: Avoid memory reclaim when remapping queues 2017-04-18 07:11:49 +02:00
blk-mq.h Merge branch 'for-4.9/block-smp' of git://git.kernel.dk/linux-block 2016-10-09 17:32:20 -07:00
blk-settings.c block: kill off q->flush_flags 2016-04-13 13:33:19 -06:00
blk-softirq.c This adds a new gcc plugin named "latent_entropy". It is designed to 2016-10-15 10:03:15 -07:00
blk-sysfs.c blk-mq: register device instead of disk 2016-09-21 07:56:16 -06:00
blk-tag.c block: support different tag allocation policy 2015-01-23 14:15:46 -07:00
blk-throttle.c blk-throttle: Extend slice if throttle group is not empty 2016-09-19 15:12:41 -06:00
blk-timeout.c block: remove REQ_NO_TIMEOUT flag 2015-12-22 09:38:34 -07:00
blk.h blk-mq: remove ->map_queue 2016-09-15 08:42:03 -06:00
bounce.c Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2015-09-19 18:57:09 -07:00
bsg-lib.c bsg: Remove unused function bsg_goose_queue() 2012-12-06 14:33:02 +01:00
bsg.c sg_write()/bsg_write() is not fit to be called under KERNEL_DS 2017-01-09 08:32:25 +01:00
cfq-iosched.c block: cfq_cpd_alloc() should use @gfp 2017-01-19 20:18:07 +01:00
cmdline-parser.c block: remove unrelated header files and export symbol 2014-01-21 20:18:26 -08:00
compat_ioctl.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
deadline-iosched.c block: do not merge requests without consulting with io scheduler 2016-07-20 21:35:12 -06:00
elevator.c block: Fix secure erase 2016-08-16 09:16:51 -06:00
genhd.c block: fix bdi vs gendisk lifetime mismatch 2016-08-04 14:19:16 -06:00
ioctl.c block: invalidate the page cache when issuing BLKZEROOUT 2016-10-11 15:06:30 -07:00
ioprio.c block: fix use-after-free in sys_ioprio_get() 2016-07-01 08:39:24 -06:00
Kconfig Merge branch 'for-4.9/block-irq' of git://git.kernel.dk/linux-block 2016-10-09 17:29:33 -07:00
Kconfig.iosched blkcg: make CONFIG_BLK_CGROUP bool 2012-03-06 21:27:21 +01:00
Makefile Merge branch 'for-4.9/block-smp' of git://git.kernel.dk/linux-block 2016-10-09 17:32:20 -07:00
noop-iosched.c elevator: use list_{first,prev,next}_entry 2015-11-16 15:21:48 -07:00
partition-generic.c block/partition-generic.c: Remove a set-but-not-used variable 2016-06-14 09:09:15 -06:00
scsi_ioctl.c block: allow WRITE_SAME commands with the SG_IO ioctl 2017-03-22 12:43:38 +01:00
t10-pi.c block: Consolidate static integrity profile properties 2015-10-21 14:42:38 -06:00