1
0
Fork 0
Fork of alistair23 Linux kernel for reMarkable from https://github.com/alistair23/linux
 
 
 
 
 
 
Go to file
Douglas Anderson a48b609c8d bdev: Reduce time holding bd_mutex in sync in blkdev_close()
[ Upstream commit b849dd84b6 ]

While trying to "dd" to the block device for a USB stick, I
encountered a hung task warning (blocked for > 120 seconds).  I
managed to come up with an easy way to reproduce this on my system
(where /dev/sdb is the block device for my USB stick) with:

  while true; do dd if=/dev/zero of=/dev/sdb bs=4M; done

With my reproduction here are the relevant bits from the hung task
detector:

 INFO: task udevd:294 blocked for more than 122 seconds.
 ...
 udevd           D    0   294      1 0x00400008
 Call trace:
  ...
  mutex_lock_nested+0x40/0x50
  __blkdev_get+0x7c/0x3d4
  blkdev_get+0x118/0x138
  blkdev_open+0x94/0xa8
  do_dentry_open+0x268/0x3a0
  vfs_open+0x34/0x40
  path_openat+0x39c/0xdf4
  do_filp_open+0x90/0x10c
  do_sys_open+0x150/0x3c8
  ...

 ...
 Showing all locks held in the system:
 ...
 1 lock held by dd/2798:
  #0: ffffff814ac1a3b8 (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0x50/0x204
 ...
 dd              D    0  2798   2764 0x00400208
 Call trace:
  ...
  schedule+0x8c/0xbc
  io_schedule+0x1c/0x40
  wait_on_page_bit_common+0x238/0x338
  __lock_page+0x5c/0x68
  write_cache_pages+0x194/0x500
  generic_writepages+0x64/0xa4
  blkdev_writepages+0x24/0x30
  do_writepages+0x48/0xa8
  __filemap_fdatawrite_range+0xac/0xd8
  filemap_write_and_wait+0x30/0x84
  __blkdev_put+0x88/0x204
  blkdev_put+0xc4/0xe4
  blkdev_close+0x28/0x38
  __fput+0xe0/0x238
  ____fput+0x1c/0x28
  task_work_run+0xb0/0xe4
  do_notify_resume+0xfc0/0x14bc
  work_pending+0x8/0x14

The problem appears related to the fact that my USB disk is terribly
slow and that I have a lot of RAM in my system to cache things.
Specifically my writes seem to be happening at ~15 MB/s and I've got
~4 GB of RAM in my system that can be used for buffering.  To write 4
GB of buffer to disk thus takes ~4000 MB / ~15 MB/s = ~267 seconds.

The 267 second number is a problem because in __blkdev_put() we call
sync_blockdev() while holding the bd_mutex.  Any other callers who
want the bd_mutex will be blocked for the whole time.

The problem is made worse because I believe blkdev_put() specifically
tells other tasks (namely udev) to go try to access the device at right
around the same time we're going to hold the mutex for a long time.

Putting some traces around this (after disabling the hung task detector),
I could confirm:
 dd:    437.608600: __blkdev_put() right before sync_blockdev() for sdb
 udevd: 437.623901: blkdev_open() right before blkdev_get() for sdb
 dd:    661.468451: __blkdev_put() right after sync_blockdev() for sdb
 udevd: 663.820426: blkdev_open() right after blkdev_get() for sdb

A simple fix for this is to realize that sync_blockdev() works fine if
you're not holding the mutex.  Also, it's not the end of the world if
you sync a little early (though it can have performance impacts).
Thus we can make a guess that we're going to need to do the sync and
then do it without holding the mutex.  We still do one last sync with
the mutex but it should be much, much faster.

With this, my hung task warnings for my test case are gone.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:17:55 +02:00
Documentation affs: fix basic permission bits to actually work 2020-09-09 19:12:34 +02:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
arch KVM: Remove CREATE_IRQCHIP/SET_PIT2 race 2020-10-01 13:17:55 +02:00
block block: only call sched requeue_request() for scheduled requests 2020-09-23 12:40:37 +02:00
certs PKCS#7: Refactor verify_pkcs7_signature() 2019-08-05 18:40:18 -04:00
crypto crypto: af_alg - Work around empty control messages without MSG_MORE 2020-09-03 11:27:05 +02:00
drivers serial: uartps: Wait for tx_empty in console setup 2020-10-01 13:17:55 +02:00
fs bdev: Reduce time holding bd_mutex in sync in blkdev_close() 2020-10-01 13:17:55 +02:00
include ALSA: hda: Skip controller resume if not needed 2020-10-01 13:17:54 +02:00
init exec: Add exec_update_mutex to replace cred_guard_mutex 2020-10-01 13:17:47 +02:00
ipc ipc/util.c: sysvipc_find_ipc() incorrectly updates position index 2020-05-20 08:20:16 +02:00
kernel workqueue: Remove the warning in wq_worker_sleeping() 2020-10-01 13:17:54 +02:00
lib kernel/sysctl-test: Add null pointer test for sysctl.c:proc_dointvec() 2020-10-01 13:17:10 +02:00
mm mm/mmap.c: initialize align_offset explicitly for vm_unmapped_area 2020-10-01 13:17:54 +02:00
net SUNRPC: Don't start a timer on an already queued rpc task 2020-10-01 13:17:53 +02:00
samples bpf: Fix fds_example SIGSEGV error 2020-08-19 08:16:03 +02:00
scripts checkpatch: fix the usage of capture group ( ... ) 2020-09-09 19:12:37 +02:00
security selinux: sel_avc_get_stat_idx should increase position index 2020-10-01 13:17:32 +02:00
sound ALSA: hda: Skip controller resume if not needed 2020-10-01 13:17:54 +02:00
tools perf stat: Force error in fallback on :k events 2020-10-01 13:17:55 +02:00
usr initramfs: restore default compression behavior 2020-04-08 09:08:38 +02:00
virt KVM: fix overflow of zero page refcount with ksm running 2020-10-01 13:17:31 +02:00
.clang-format clang-format: Update with the latest for_each macro list 2019-08-31 10:00:51 +02:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore Modules updates for v5.4 2019-09-22 10:34:46 -07:00
.mailmap ARM: SoC fixes 2019-11-10 13:41:59 -08:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS MAINTAINERS: Remove Simon as Renesas SoC Co-Maintainer 2019-10-10 08:12:51 -07:00
Kbuild kbuild: do not descend to ./Kbuild when cleaning 2019-08-21 21:03:58 +09:00
Kconfig docs: kbuild: convert docs to ReST and rename to *.rst 2019-06-14 14:21:21 -06:00
MAINTAINERS Documentation/llvm: add documentation on building w/ Clang/LLVM 2020-08-26 10:40:46 +02:00
Makefile Linux 5.4.68 2020-09-26 18:03:16 +02:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.