1
0
Fork 0
alistair23-linux/arch
Gabriel Krisman Bertazi 1dab82dd20 um: ubd: Submit all data segments atomically
[ Upstream commit fc6b6a872d ]

Internally, UBD treats each physical IO segment as a separate command to
be submitted in the execution pipe.  If the pipe returns a transient
error after a few segments have already been written, UBD will tell the
block layer to requeue the request, but there is no way to reclaim the
segments already submitted.  When a new attempt to dispatch the request
is done, those segments already submitted will get duplicated, causing
the WARN_ON below in the best case, and potentially data corruption.

In my system, running a UML instance with 2GB of RAM and a 50M UBD disk,
I can reproduce the WARN_ON by simply running mkfs.fvat against the
disk on a freshly booted system.

There are a few ways to around this, like reducing the pressure on
the pipe by reducing the queue depth, which almost eliminates the
occurrence of the problem, increasing the pipe buffer size on the host
system, or by limiting the request to one physical segment, which causes
the block layer to submit way more requests to resolve a single
operation.

Instead, this patch modifies the format of a UBD command, such that all
segments are sent through a single element in the communication pipe,
turning the command submission atomic from the point of view of the
block layer.  The new format has a variable size, depending on the
number of elements, and looks like this:

+------------+-----------+-----------+------------
| cmd_header | segment 0 | segment 1 | segment ...
+------------+-----------+-----------+------------

With this format, we push a pointer to cmd_header in the submission
pipe.

This has the advantage of reducing the memory footprint of executing a
single request, since it allow us to merge some fields in the header.
It is possible to reduce even further each segment memory footprint, by
merging bitmap_words and cow_offset, for instance, but this is not the
focus of this patch and is left as future work.  One issue with the
patch is that for a big number of segments, we now perform one big
memory allocation instead of multiple small ones, but I wasn't able to
trigger any real issues or -ENOMEM because of this change, that wouldn't
be reproduced otherwise.

This was tested using fio with the verify-crc32 option, and by running
an ext4 filesystem over this UBD device.

The original WARN_ON was:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0x13f/0x141
refcount_t: underflow; use-after-free.
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.5.0-rc6-00002-g2a5bb2cf75c8 #346
Stack:
 6084eed0 6063dc77 00000009 6084ef60
 00000000 604b8d9f 6084eee0 6063dcbc
 6084ef40 6006ab8d e013d780 1c00000000
Call Trace:
 [<600a0c1c>] ? printk+0x0/0x94
 [<6004a888>] show_stack+0x13b/0x155
 [<6063dc77>] ? dump_stack_print_info+0xdf/0xe8
 [<604b8d9f>] ? refcount_warn_saturate+0x13f/0x141
 [<6063dcbc>] dump_stack+0x2a/0x2c
 [<6006ab8d>] __warn+0x107/0x134
 [<6008da6c>] ? wake_up_process+0x17/0x19
 [<60487628>] ? blk_queue_max_discard_sectors+0x0/0xd
 [<6006b05f>] warn_slowpath_fmt+0xd1/0xdf
 [<6006af8e>] ? warn_slowpath_fmt+0x0/0xdf
 [<600acc14>] ? raw_read_seqcount_begin.constprop.0+0x0/0x15
 [<600619ae>] ? os_nsecs+0x1d/0x2b
 [<604b8d9f>] refcount_warn_saturate+0x13f/0x141
 [<6048bc8f>] refcount_sub_and_test.constprop.0+0x2f/0x37
 [<6048c8de>] blk_mq_free_request+0xf1/0x10d
 [<6048ca06>] __blk_mq_end_request+0x10c/0x114
 [<6005ac0f>] ubd_intr+0xb5/0x169
 [<600a1a37>] __handle_irq_event_percpu+0x6b/0x17e
 [<600a1b70>] handle_irq_event_percpu+0x26/0x69
 [<600a1bd9>] handle_irq_event+0x26/0x34
 [<600a1bb3>] ? handle_irq_event+0x0/0x34
 [<600a5186>] ? unmask_irq+0x0/0x37
 [<600a57e6>] handle_edge_irq+0xbc/0xd6
 [<600a131a>] generic_handle_irq+0x21/0x29
 [<60048f6e>] do_IRQ+0x39/0x54
 [...]
---[ end trace c6e7444e55386c0f ]---

Cc: Christopher Obbard <chris.obbard@collabora.com>
Reported-by: Martyn Welch <martyn@collabora.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Tested-by: Christopher Obbard <chris.obbard@collabora.com>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-06 14:48:40 +01:00
..
alpha alpha: fix annotation of io{read,write}{16,32}be() 2020-08-26 10:40:58 +02:00
arc ARC: stack unwinding: don't assume non-current task is sleeping 2020-12-16 10:56:55 +01:00
arm ARM: dts: at91: sama5d2: fix CAN message ram offset and size 2020-12-30 11:51:38 +01:00
arm64 KVM: arm64: Introduce handling of AArch32 TTBCR2 traps 2020-12-30 11:51:38 +01:00
c6x mm: consolidate pgtable_cache_init() and pgd_cache_init() 2019-09-24 15:54:09 -07:00
csky csky: Fixup abiv2 syscall_trace break a4 & a5 2020-06-17 16:40:21 +02:00
h8300 mm: consolidate pgtable_cache_init() and pgd_cache_init() 2019-09-24 15:54:09 -07:00
hexagon hexagon: define ioremap_uc 2020-05-10 10:31:31 +02:00
ia64 ia64: fix build error with !COREDUMP 2020-11-05 11:43:33 +01:00
m68k m68k: q40: Fix info-leak in rtc_ioctl 2020-10-01 13:17:12 +02:00
microblaze microblaze: Prevent the overflow of the start 2020-02-24 08:37:02 +01:00
mips MIPS: Don't round up kernel sections size for memblock_add() 2020-12-30 11:51:18 +01:00
nds32 asm-generic/nds32: don't redefine cacheflush primitives 2020-01-17 19:48:43 +01:00
nios2 nios2 update for v5.4-rc1 2019-09-27 13:02:19 -07:00
openrisc openrisc: Fix issue with get_user for 64-bit values 2020-11-01 12:01:06 +01:00
parisc kbuild: fix broken builds because of GZIP,BZIP2,LZOP variables 2020-09-03 11:27:10 +02:00
powerpc powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe() 2021-01-06 14:48:39 +01:00
riscv arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed 2020-12-02 08:49:50 +01:00
s390 s390/kexec_file: fix diag308 subcode when loading crash kernel 2020-12-30 11:51:34 +01:00
sh sh: landisk: Add missing initialization of sh_io_port_base 2020-08-21 13:05:38 +02:00
sparc sparc: fix handling of page table constructor failure 2020-12-30 11:51:26 +01:00
um um: ubd: Submit all data segments atomically 2021-01-06 14:48:40 +01:00
unicore32 mm: treewide: clarify pgtable_page_{ctor,dtor}() naming 2019-09-26 10:10:44 -07:00
x86 KVM: x86: reinstate vendor-agnostic check on SPEC_CTRL cpuid bits 2021-01-06 14:48:36 +01:00
xtensa xtensa: uaccess: Add missing __user to strncpy_from_user() prototype 2020-12-02 08:49:49 +01:00
.gitignore
Kconfig Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS" 2020-12-30 11:51:47 +01:00