From ad5f498f610fa3fd8bd265139098bc1405cd2783 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Tue, 27 Oct 2015 19:06:55 -0400 Subject: [PATCH 01/13] dm: initialize non-blk-mq queue data before queue is used Commit bfebd1cdb497a57757c83f5fbf1a29931591e2a4 ("dm: add full blk-mq support to request-based DM") moves the initialization of the fields backing_dev_info.congested_fn, backing_dev_info.congested_data and queuedata from the function dm_init_md_queue (that is called when the device is created) to dm_init_old_md_queue (that is called after the device type is determined). There is no locking when accessing these variables, thus it is possible for other parts of the kernel to briefly see this data in a transient state (e.g. queue->backing_dev_info.congested_fn initialized and md->queue->backing_dev_info.congested_data uninitialized, resulting in passing an incorrect parameter to the function dm_any_congested). This queue data is left initialized for blk-mq devices even though they that don't use it. Fixes: bfebd1cdb497 ("dm: add full blk-mq support to request-based DM") Signed-off-by: Mikulas Patocka Signed-off-by: Mike Snitzer Cc: stable@vger.kernel.org # v4.1+ --- drivers/md/dm.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 6264781dc69a..77ebb985154c 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -2197,6 +2197,13 @@ static void dm_init_md_queue(struct mapped_device *md) * This queue is new, so no concurrency on the queue_flags. */ queue_flag_clear_unlocked(QUEUE_FLAG_STACKABLE, md->queue); + + /* + * Initialize data that will only be used by a non-blk-mq DM queue + * - must do so here (in alloc_dev callchain) before queue is used + */ + md->queue->queuedata = md; + md->queue->backing_dev_info.congested_data = md; } static void dm_init_old_md_queue(struct mapped_device *md) @@ -2207,10 +2214,7 @@ static void dm_init_old_md_queue(struct mapped_device *md) /* * Initialize aspects of queue that aren't relevant for blk-mq */ - md->queue->queuedata = md; md->queue->backing_dev_info.congested_fn = dm_any_congested; - md->queue->backing_dev_info.congested_data = md; - blk_queue_bounce_limit(md->queue, BLK_BOUNCE_ANY); } From 47796938c46b943d157ac8a6f9ed4e3b98b83cf4 Mon Sep 17 00:00:00 2001 From: Mauricio Faria de Oliveira Date: Thu, 29 Oct 2015 10:24:23 -0200 Subject: [PATCH 02/13] Revert "dm mpath: fix stalls when handling invalid ioctls" This reverts commit a1989b330093578ea5470bea0a00f940c444c466. That commit introduced a regression at least for the case of the SG_IO ioctl() running without CAP_SYS_RAWIO capability (e.g., unprivileged users) when there are no active paths: the ioctl() fails with the ENOTTY errno immediately rather than blocking due to queue_if_no_path until a path becomes active, for example. That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices (qemu "-device scsi-block" [1], libvirt "" [2]) from multipath devices; which leads to SCSI/filesystem errors in such a guest. More general scenarios can hit that regression too. The following demonstration employs a SG_IO ioctl() with a standard SCSI INQUIRY command for this objective (some output & user changes omitted for brevity and comments added for clarity). Reverting that commit restores normal operation (queueing) in failing scenarios; tested on linux-next (next-20151022). 1) Test-case is based on sg_simple0 [3] (just SG_IO; remove SG_GET_VERSION_NUM) $ cat sg_simple0.c ... see [3] ... $ sed '/SG_GET_VERSION_NUM/,/}/d' sg_simple0.c > sgio_inquiry.c $ gcc sgio_inquiry.c -o sgio_inquiry 2) The ioctl() works fine with active paths present. # multipath -l 85ag56 85ag56 (...) dm-19 IBM ,2145 size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw |-+- policy='service-time 0' prio=0 status=active | |- 8:0:11:0 sdz 65:144 active undef running | `- 9:0:9:0 sdbf 67:144 active undef running `-+- policy='service-time 0' prio=0 status=enabled |- 8:0:12:0 sdae 65:224 active undef running `- 9:0:12:0 sdbo 68:32 active undef running $ ./sgio_inquiry /dev/mapper/85ag56 Some of the INQUIRY command's response: IBM 2145 0000 INQUIRY duration=0 millisecs, resid=0 3) The ioctl() fails with ENOTTY errno with _no_ active paths present, for unprivileged users (rather than blocking due to queue_if_no_path). # for path in $(multipath -l 85ag56 | grep -o 'sd[a-z]\+'); \ do multipathd -k"fail path $path"; done # multipath -l 85ag56 85ag56 (...) dm-19 IBM ,2145 size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw |-+- policy='service-time 0' prio=0 status=enabled | |- 8:0:11:0 sdz 65:144 failed undef running | `- 9:0:9:0 sdbf 67:144 failed undef running `-+- policy='service-time 0' prio=0 status=enabled |- 8:0:12:0 sdae 65:224 failed undef running `- 9:0:12:0 sdbo 68:32 failed undef running $ ./sgio_inquiry /dev/mapper/85ag56 sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device 4) dmesg shows that scsi_verify_blk_ioctl() failed for SG_IO (0x2285); it returns -ENOIOCTLCMD, later replaced with -ENOTTY in vfs_ioctl(). $ dmesg <...> [] device-mapper: multipath: Failing path 65:144. [] device-mapper: multipath: Failing path 67:144. [] device-mapper: multipath: Failing path 65:224. [] device-mapper: multipath: Failing path 68:32. [] sgio_inquiry: sending ioctl 2285 to a partition! 5) The ioctl() only works if the SYS_CAP_RAWIO capability is present (then queueing happens -- in this example, queue_if_no_path is set); this is due to a conditional check in scsi_verify_blk_ioctl(). # capsh --drop=cap_sys_rawio -- -c './sgio_inquiry /dev/mapper/85ag56' sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device # ./sgio_inquiry /dev/mapper/85ag56 & [1] 72830 # cat /proc/72830/stack [] 0xc00000171c0df700 [] __switch_to+0x204/0x350 [] msleep+0x5c/0x80 [] dm_blk_ioctl+0x70/0x170 [] blkdev_ioctl+0x2b0/0x9b0 [] block_ioctl+0x64/0xd0 [] do_vfs_ioctl+0x490/0x780 [] SyS_ioctl+0xd4/0xf0 [] system_call+0x38/0xd0 6) This is the function call chain exercised in this analysis: SYSCALL_DEFINE3(ioctl, <...>) @ fs/ioctl.c -> do_vfs_ioctl() -> vfs_ioctl() ... error = filp->f_op->unlocked_ioctl(filp, cmd, arg); ... -> dm_blk_ioctl() @ drivers/md/dm.c -> multipath_ioctl() @ drivers/md/dm-mpath.c ... (bdev = NULL, due to no active paths) ... if (!bdev || <...>) { int err = scsi_verify_blk_ioctl(NULL, cmd); if (err) r = err; } ... -> scsi_verify_blk_ioctl() @ block/scsi_ioctl.c ... if (bd && bd == bd->bd_contains) // not taken (bd = NULL) return 0; ... if (capable(CAP_SYS_RAWIO)) // not taken (unprivileged user) return 0; ... printk_ratelimited(KERN_WARNING "%s: sending ioctl %x to a partition!\n" <...>); return -ENOIOCTLCMD; <- ... return r ? : <...> <- ... if (error == -ENOIOCTLCMD) error = -ENOTTY; out: return error; ... Links: [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52 [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device') [3] http://tldp.org/HOWTO/SCSI-Generic-HOWTO/pexample.html (Revision 1.2, 2002-05-03) Signed-off-by: Mauricio Faria de Oliveira Signed-off-by: Mike Snitzer Cc: stable@vger.kernel.org --- drivers/md/dm-mpath.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index 5a67671a3973..bdc96cd838b8 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -1569,11 +1569,8 @@ static int multipath_ioctl(struct dm_target *ti, unsigned int cmd, /* * Only pass ioctls through if the device sizes match exactly. */ - if (!bdev || ti->len != i_size_read(bdev->bd_inode) >> SECTOR_SHIFT) { - int err = scsi_verify_blk_ioctl(NULL, cmd); - if (err) - r = err; - } + if (!r && ti->len != i_size_read(bdev->bd_inode) >> SECTOR_SHIFT) + r = scsi_verify_blk_ioctl(NULL, cmd); if (r == -ENOTCONN && !fatal_signal_pending(current)) { spin_lock_irqsave(&m->lock, flags); From e56f81e0b01ef4e45292d8c1e19edd4d09724e14 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Thu, 15 Oct 2015 14:10:50 +0200 Subject: [PATCH 03/13] dm: refactor ioctl handling This moves the call to blkdev_ioctl and the argument checking to DM core code, and only leaves a callout to find the block device to operate on in the targets. This simplifies the code and allows us to pass through ioctl-like command using other methods in the next patch. Also split out a helper around calling the prepare_ioctl method that will be reused for persistent reservation handling. Signed-off-by: Christoph Hellwig Signed-off-by: Mike Snitzer --- drivers/md/dm-flakey.c | 16 +++++----- drivers/md/dm-linear.c | 14 ++++----- drivers/md/dm-log-writes.c | 13 ++++----- drivers/md/dm-mpath.c | 29 ++++++++---------- drivers/md/dm-switch.c | 21 ++++++------- drivers/md/dm-verity.c | 15 +++++----- drivers/md/dm.c | 55 ++++++++++++++++++++++++++--------- include/linux/device-mapper.h | 6 ++-- 8 files changed, 94 insertions(+), 75 deletions(-) diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c index 645e8b4f808e..09e2afcafd2d 100644 --- a/drivers/md/dm-flakey.c +++ b/drivers/md/dm-flakey.c @@ -373,20 +373,20 @@ static void flakey_status(struct dm_target *ti, status_type_t type, } } -static int flakey_ioctl(struct dm_target *ti, unsigned int cmd, unsigned long arg) +static int flakey_prepare_ioctl(struct dm_target *ti, + struct block_device **bdev, fmode_t *mode) { struct flakey_c *fc = ti->private; - struct dm_dev *dev = fc->dev; - int r = 0; + + *bdev = fc->dev->bdev; /* * Only pass ioctls through if the device sizes match exactly. */ if (fc->start || - ti->len != i_size_read(dev->bdev->bd_inode) >> SECTOR_SHIFT) - r = scsi_verify_blk_ioctl(NULL, cmd); - - return r ? : __blkdev_driver_ioctl(dev->bdev, dev->mode, cmd, arg); + ti->len != i_size_read((*bdev)->bd_inode) >> SECTOR_SHIFT) + return 1; + return 0; } static int flakey_iterate_devices(struct dm_target *ti, iterate_devices_callout_fn fn, void *data) @@ -405,7 +405,7 @@ static struct target_type flakey_target = { .map = flakey_map, .end_io = flakey_end_io, .status = flakey_status, - .ioctl = flakey_ioctl, + .prepare_ioctl = flakey_prepare_ioctl, .iterate_devices = flakey_iterate_devices, }; diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c index 436f5c9b6aea..de52864d60fa 100644 --- a/drivers/md/dm-linear.c +++ b/drivers/md/dm-linear.c @@ -116,21 +116,21 @@ static void linear_status(struct dm_target *ti, status_type_t type, } } -static int linear_ioctl(struct dm_target *ti, unsigned int cmd, - unsigned long arg) +static int linear_prepare_ioctl(struct dm_target *ti, + struct block_device **bdev, fmode_t *mode) { struct linear_c *lc = (struct linear_c *) ti->private; struct dm_dev *dev = lc->dev; - int r = 0; + + *bdev = dev->bdev; /* * Only pass ioctls through if the device sizes match exactly. */ if (lc->start || ti->len != i_size_read(dev->bdev->bd_inode) >> SECTOR_SHIFT) - r = scsi_verify_blk_ioctl(NULL, cmd); - - return r ? : __blkdev_driver_ioctl(dev->bdev, dev->mode, cmd, arg); + return 1; + return 0; } static int linear_iterate_devices(struct dm_target *ti, @@ -149,7 +149,7 @@ static struct target_type linear_target = { .dtr = linear_dtr, .map = linear_map, .status = linear_status, - .ioctl = linear_ioctl, + .prepare_ioctl = linear_prepare_ioctl, .iterate_devices = linear_iterate_devices, }; diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c index b2912dbac8bc..624589d51c2c 100644 --- a/drivers/md/dm-log-writes.c +++ b/drivers/md/dm-log-writes.c @@ -714,20 +714,19 @@ static void log_writes_status(struct dm_target *ti, status_type_t type, } } -static int log_writes_ioctl(struct dm_target *ti, unsigned int cmd, - unsigned long arg) +static int log_writes_prepare_ioctl(struct dm_target *ti, + struct block_device **bdev, fmode_t *mode) { struct log_writes_c *lc = ti->private; struct dm_dev *dev = lc->dev; - int r = 0; + *bdev = dev->bdev; /* * Only pass ioctls through if the device sizes match exactly. */ if (ti->len != i_size_read(dev->bdev->bd_inode) >> SECTOR_SHIFT) - r = scsi_verify_blk_ioctl(NULL, cmd); - - return r ? : __blkdev_driver_ioctl(dev->bdev, dev->mode, cmd, arg); + return 1; + return 0; } static int log_writes_iterate_devices(struct dm_target *ti, @@ -782,7 +781,7 @@ static struct target_type log_writes_target = { .map = log_writes_map, .end_io = normal_end_io, .status = log_writes_status, - .ioctl = log_writes_ioctl, + .prepare_ioctl = log_writes_prepare_ioctl, .message = log_writes_message, .iterate_devices = log_writes_iterate_devices, .io_hints = log_writes_io_hints, diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index bdc96cd838b8..77066a199984 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -1533,18 +1533,14 @@ out: return r; } -static int multipath_ioctl(struct dm_target *ti, unsigned int cmd, - unsigned long arg) +static int multipath_prepare_ioctl(struct dm_target *ti, + struct block_device **bdev, fmode_t *mode) { struct multipath *m = ti->private; struct pgpath *pgpath; - struct block_device *bdev; - fmode_t mode; unsigned long flags; int r; - bdev = NULL; - mode = 0; r = 0; spin_lock_irqsave(&m->lock, flags); @@ -1555,23 +1551,17 @@ static int multipath_ioctl(struct dm_target *ti, unsigned int cmd, pgpath = m->current_pgpath; if (pgpath) { - bdev = pgpath->path.dev->bdev; - mode = pgpath->path.dev->mode; + *bdev = pgpath->path.dev->bdev; + *mode = pgpath->path.dev->mode; } if ((pgpath && m->queue_io) || (!pgpath && m->queue_if_no_path)) r = -ENOTCONN; - else if (!bdev) + else if (!*bdev) r = -EIO; spin_unlock_irqrestore(&m->lock, flags); - /* - * Only pass ioctls through if the device sizes match exactly. - */ - if (!r && ti->len != i_size_read(bdev->bd_inode) >> SECTOR_SHIFT) - r = scsi_verify_blk_ioctl(NULL, cmd); - if (r == -ENOTCONN && !fatal_signal_pending(current)) { spin_lock_irqsave(&m->lock, flags); if (!m->current_pg) { @@ -1584,7 +1574,12 @@ static int multipath_ioctl(struct dm_target *ti, unsigned int cmd, dm_table_run_md_queue_async(m->ti->table); } - return r ? : __blkdev_driver_ioctl(bdev, mode, cmd, arg); + /* + * Only pass ioctls through if the device sizes match exactly. + */ + if (!r && ti->len != i_size_read((*bdev)->bd_inode) >> SECTOR_SHIFT) + return 1; + return r; } static int multipath_iterate_devices(struct dm_target *ti, @@ -1700,7 +1695,7 @@ static struct target_type multipath_target = { .resume = multipath_resume, .status = multipath_status, .message = multipath_message, - .ioctl = multipath_ioctl, + .prepare_ioctl = multipath_prepare_ioctl, .iterate_devices = multipath_iterate_devices, .busy = multipath_busy, }; diff --git a/drivers/md/dm-switch.c b/drivers/md/dm-switch.c index 50fca469cafd..b1285753a5d4 100644 --- a/drivers/md/dm-switch.c +++ b/drivers/md/dm-switch.c @@ -511,27 +511,24 @@ static void switch_status(struct dm_target *ti, status_type_t type, * * Passthrough all ioctls to the path for sector 0 */ -static int switch_ioctl(struct dm_target *ti, unsigned cmd, - unsigned long arg) +static int switch_prepare_ioctl(struct dm_target *ti, + struct block_device **bdev, fmode_t *mode) { struct switch_ctx *sctx = ti->private; - struct block_device *bdev; - fmode_t mode; unsigned path_nr; - int r = 0; path_nr = switch_get_path_nr(sctx, 0); - bdev = sctx->path_list[path_nr].dmdev->bdev; - mode = sctx->path_list[path_nr].dmdev->mode; + *bdev = sctx->path_list[path_nr].dmdev->bdev; + *mode = sctx->path_list[path_nr].dmdev->mode; /* * Only pass ioctls through if the device sizes match exactly. */ - if (ti->len + sctx->path_list[path_nr].start != i_size_read(bdev->bd_inode) >> SECTOR_SHIFT) - r = scsi_verify_blk_ioctl(NULL, cmd); - - return r ? : __blkdev_driver_ioctl(bdev, mode, cmd, arg); + if (ti->len + sctx->path_list[path_nr].start != + i_size_read((*bdev)->bd_inode) >> SECTOR_SHIFT) + return 1; + return 0; } static int switch_iterate_devices(struct dm_target *ti, @@ -560,7 +557,7 @@ static struct target_type switch_target = { .map = switch_map, .message = switch_message, .status = switch_status, - .ioctl = switch_ioctl, + .prepare_ioctl = switch_prepare_ioctl, .iterate_devices = switch_iterate_devices, }; diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c index edc624bccf9a..ccf41886ebcf 100644 --- a/drivers/md/dm-verity.c +++ b/drivers/md/dm-verity.c @@ -631,18 +631,17 @@ static void verity_status(struct dm_target *ti, status_type_t type, } } -static int verity_ioctl(struct dm_target *ti, unsigned cmd, - unsigned long arg) +static int verity_prepare_ioctl(struct dm_target *ti, + struct block_device **bdev, fmode_t *mode) { struct dm_verity *v = ti->private; - int r = 0; + + *bdev = v->data_dev->bdev; if (v->data_start || ti->len != i_size_read(v->data_dev->bdev->bd_inode) >> SECTOR_SHIFT) - r = scsi_verify_blk_ioctl(NULL, cmd); - - return r ? : __blkdev_driver_ioctl(v->data_dev->bdev, v->data_dev->mode, - cmd, arg); + return 1; + return 0; } static int verity_iterate_devices(struct dm_target *ti, @@ -965,7 +964,7 @@ static struct target_type verity_target = { .dtr = verity_dtr, .map = verity_map, .status = verity_status, - .ioctl = verity_ioctl, + .prepare_ioctl = verity_prepare_ioctl, .iterate_devices = verity_iterate_devices, .io_hints = verity_io_hints, }; diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 77ebb985154c..9b3fe5be0cee 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -555,18 +555,16 @@ static int dm_blk_getgeo(struct block_device *bdev, struct hd_geometry *geo) return dm_get_geometry(md, geo); } -static int dm_blk_ioctl(struct block_device *bdev, fmode_t mode, - unsigned int cmd, unsigned long arg) +static int dm_get_live_table_for_ioctl(struct mapped_device *md, + struct dm_target **tgt, struct block_device **bdev, + fmode_t *mode, int *srcu_idx) { - struct mapped_device *md = bdev->bd_disk->private_data; - int srcu_idx; struct dm_table *map; - struct dm_target *tgt; - int r = -ENOTTY; + int r; retry: - map = dm_get_live_table(md, &srcu_idx); - + r = -ENOTTY; + map = dm_get_live_table(md, srcu_idx); if (!map || !dm_table_get_size(map)) goto out; @@ -574,8 +572,9 @@ retry: if (dm_table_get_num_targets(map) != 1) goto out; - tgt = dm_table_get_target(map, 0); - if (!tgt->type->ioctl) + *tgt = dm_table_get_target(map, 0); + + if (!(*tgt)->type->prepare_ioctl) goto out; if (dm_suspended_md(md)) { @@ -583,16 +582,46 @@ retry: goto out; } - r = tgt->type->ioctl(tgt, cmd, arg); + r = (*tgt)->type->prepare_ioctl(*tgt, bdev, mode); + if (r < 0) + goto out; + + return r; out: - dm_put_live_table(md, srcu_idx); - + dm_put_live_table(md, *srcu_idx); if (r == -ENOTCONN) { msleep(10); goto retry; } + return r; +} +static int dm_blk_ioctl(struct block_device *bdev, fmode_t mode, + unsigned int cmd, unsigned long arg) +{ + struct mapped_device *md = bdev->bd_disk->private_data; + struct dm_target *tgt; + int srcu_idx, r; + + r = dm_get_live_table_for_ioctl(md, &tgt, &bdev, &mode, &srcu_idx); + if (r < 0) + return r; + + if (r > 0) { + /* + * Target determined this ioctl is being issued against + * a logical partition of the parent bdev; so extra + * validation is needed. + */ + r = scsi_verify_blk_ioctl(NULL, cmd); + if (r) + goto out; + } + + r = __blkdev_driver_ioctl(bdev, mode, cmd, arg); +out: + dm_put_live_table(md, srcu_idx); return r; } diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index 76d23fa8c7d3..ec1c61c87d89 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -79,8 +79,8 @@ typedef void (*dm_status_fn) (struct dm_target *ti, status_type_t status_type, typedef int (*dm_message_fn) (struct dm_target *ti, unsigned argc, char **argv); -typedef int (*dm_ioctl_fn) (struct dm_target *ti, unsigned int cmd, - unsigned long arg); +typedef int (*dm_prepare_ioctl_fn) (struct dm_target *ti, + struct block_device **bdev, fmode_t *mode); /* * These iteration functions are typically used to check (and combine) @@ -156,7 +156,7 @@ struct target_type { dm_resume_fn resume; dm_status_fn status; dm_message_fn message; - dm_ioctl_fn ioctl; + dm_prepare_ioctl_fn prepare_ioctl; dm_busy_fn busy; dm_iterate_devices_fn iterate_devices; dm_io_hints_fn io_hints; From 71cdb6978a80f9f6c51bef0622388c1414c2fe32 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Thu, 15 Oct 2015 14:10:51 +0200 Subject: [PATCH 04/13] dm: add support for passing through persistent reservations This adds support to pass through persistent reservation requests similar to the existing ioctl handling, and with the same limitations, e.g. devices may only have a single target attached. This is mostly intended for multipathing. Signed-off-by: Christoph Hellwig Signed-off-by: Mike Snitzer --- drivers/md/dm-mpath.c | 2 +- drivers/md/dm.c | 123 ++++++++++++++++++++++++++++++++++ include/uapi/linux/dm-ioctl.h | 4 +- 3 files changed, 126 insertions(+), 3 deletions(-) diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index 77066a199984..aaa6caa46a9f 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -1682,7 +1682,7 @@ out: *---------------------------------------------------------------*/ static struct target_type multipath_target = { .name = "multipath", - .version = {1, 9, 0}, + .version = {1, 10, 0}, .module = THIS_MODULE, .ctr = multipath_ctr, .dtr = multipath_dtr, diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 9b3fe5be0cee..c31ca8bc6af5 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -24,6 +24,7 @@ #include #include /* for rq_end_sector() */ #include +#include #include @@ -3553,11 +3554,133 @@ void dm_free_md_mempools(struct dm_md_mempools *pools) kfree(pools); } +static int dm_pr_register(struct block_device *bdev, u64 old_key, u64 new_key, + u32 flags) +{ + struct mapped_device *md = bdev->bd_disk->private_data; + const struct pr_ops *ops; + struct dm_target *tgt; + fmode_t mode; + int srcu_idx, r; + + r = dm_get_live_table_for_ioctl(md, &tgt, &bdev, &mode, &srcu_idx); + if (r < 0) + return r; + + ops = bdev->bd_disk->fops->pr_ops; + if (ops && ops->pr_register) + r = ops->pr_register(bdev, old_key, new_key, flags); + else + r = -EOPNOTSUPP; + + dm_put_live_table(md, srcu_idx); + return r; +} + +static int dm_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type, + u32 flags) +{ + struct mapped_device *md = bdev->bd_disk->private_data; + const struct pr_ops *ops; + struct dm_target *tgt; + fmode_t mode; + int srcu_idx, r; + + r = dm_get_live_table_for_ioctl(md, &tgt, &bdev, &mode, &srcu_idx); + if (r < 0) + return r; + + ops = bdev->bd_disk->fops->pr_ops; + if (ops && ops->pr_reserve) + r = ops->pr_reserve(bdev, key, type, flags); + else + r = -EOPNOTSUPP; + + dm_put_live_table(md, srcu_idx); + return r; +} + +static int dm_pr_release(struct block_device *bdev, u64 key, enum pr_type type) +{ + struct mapped_device *md = bdev->bd_disk->private_data; + const struct pr_ops *ops; + struct dm_target *tgt; + fmode_t mode; + int srcu_idx, r; + + r = dm_get_live_table_for_ioctl(md, &tgt, &bdev, &mode, &srcu_idx); + if (r < 0) + return r; + + ops = bdev->bd_disk->fops->pr_ops; + if (ops && ops->pr_release) + r = ops->pr_release(bdev, key, type); + else + r = -EOPNOTSUPP; + + dm_put_live_table(md, srcu_idx); + return r; +} + +static int dm_pr_preempt(struct block_device *bdev, u64 old_key, u64 new_key, + enum pr_type type, bool abort) +{ + struct mapped_device *md = bdev->bd_disk->private_data; + const struct pr_ops *ops; + struct dm_target *tgt; + fmode_t mode; + int srcu_idx, r; + + r = dm_get_live_table_for_ioctl(md, &tgt, &bdev, &mode, &srcu_idx); + if (r < 0) + return r; + + ops = bdev->bd_disk->fops->pr_ops; + if (ops && ops->pr_preempt) + r = ops->pr_preempt(bdev, old_key, new_key, type, abort); + else + r = -EOPNOTSUPP; + + dm_put_live_table(md, srcu_idx); + return r; +} + +static int dm_pr_clear(struct block_device *bdev, u64 key) +{ + struct mapped_device *md = bdev->bd_disk->private_data; + const struct pr_ops *ops; + struct dm_target *tgt; + fmode_t mode; + int srcu_idx, r; + + r = dm_get_live_table_for_ioctl(md, &tgt, &bdev, &mode, &srcu_idx); + if (r < 0) + return r; + + ops = bdev->bd_disk->fops->pr_ops; + if (ops && ops->pr_clear) + r = ops->pr_clear(bdev, key); + else + r = -EOPNOTSUPP; + + dm_put_live_table(md, srcu_idx); + return r; +} + +static const struct pr_ops dm_pr_ops = { + .pr_register = dm_pr_register, + .pr_reserve = dm_pr_reserve, + .pr_release = dm_pr_release, + .pr_preempt = dm_pr_preempt, + .pr_clear = dm_pr_clear, +}; + static const struct block_device_operations dm_blk_dops = { .open = dm_blk_open, .release = dm_blk_close, .ioctl = dm_blk_ioctl, .getgeo = dm_blk_getgeo, + .pr_ops = &dm_pr_ops, .owner = THIS_MODULE }; diff --git a/include/uapi/linux/dm-ioctl.h b/include/uapi/linux/dm-ioctl.h index d34611e35a30..30afd0a23c4b 100644 --- a/include/uapi/linux/dm-ioctl.h +++ b/include/uapi/linux/dm-ioctl.h @@ -267,9 +267,9 @@ enum { #define DM_DEV_SET_GEOMETRY _IOWR(DM_IOCTL, DM_DEV_SET_GEOMETRY_CMD, struct dm_ioctl) #define DM_VERSION_MAJOR 4 -#define DM_VERSION_MINOR 33 +#define DM_VERSION_MINOR 34 #define DM_VERSION_PATCHLEVEL 0 -#define DM_VERSION_EXTRA "-ioctl (2015-8-18)" +#define DM_VERSION_EXTRA "-ioctl (2015-10-28)" /* Status bits */ #define DM_READONLY_FLAG (1 << 0) /* In/Out */ From 6f65985e2636c0b170eade6a72d216632f065e26 Mon Sep 17 00:00:00 2001 From: Julia Lawall Date: Sun, 13 Sep 2015 14:15:05 +0200 Subject: [PATCH 05/13] dm: drop NULL test before kmem_cache_destroy() and mempool_destroy() Remove DM's unneeded NULL tests before calling these destroy functions, now that they check for NULL, thanks to these v4.3 commits: 3942d2991 ("mm/slab_common: allow NULL cache pointer in kmem_cache_destroy()") 4e3ca3e03 ("mm/mempool: allow NULL `pool' pointer in mempool_destroy()") The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // @@ expression x; @@ -if (x != NULL) \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x); // Signed-off-by: Julia Lawall Signed-off-by: Mike Snitzer --- drivers/md/dm-bufio.c | 8 ++------ drivers/md/dm-cache-target.c | 3 +-- drivers/md/dm-crypt.c | 6 ++---- drivers/md/dm-io.c | 3 +-- drivers/md/dm-log-userspace-base.c | 3 +-- drivers/md/dm-region-hash.c | 4 +--- drivers/md/dm.c | 13 ++++--------- 7 files changed, 12 insertions(+), 28 deletions(-) diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c index 83cc52eaf56d..e1547d0dadfd 100644 --- a/drivers/md/dm-bufio.c +++ b/drivers/md/dm-bufio.c @@ -1861,12 +1861,8 @@ static void __exit dm_bufio_exit(void) cancel_delayed_work_sync(&dm_bufio_work); destroy_workqueue(dm_bufio_wq); - for (i = 0; i < ARRAY_SIZE(dm_bufio_caches); i++) { - struct kmem_cache *kc = dm_bufio_caches[i]; - - if (kc) - kmem_cache_destroy(kc); - } + for (i = 0; i < ARRAY_SIZE(dm_bufio_caches); i++) + kmem_cache_destroy(dm_bufio_caches[i]); for (i = 0; i < ARRAY_SIZE(dm_bufio_cache_names); i++) kfree(dm_bufio_cache_names[i]); diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c index dd90d1236f4a..2fd4c8296144 100644 --- a/drivers/md/dm-cache-target.c +++ b/drivers/md/dm-cache-target.c @@ -2309,8 +2309,7 @@ static void destroy(struct cache *cache) { unsigned i; - if (cache->migration_pool) - mempool_destroy(cache->migration_pool); + mempool_destroy(cache->migration_pool); if (cache->all_io_ds) dm_deferred_set_destroy(cache->all_io_ds); diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c index 4b3b6f8aff0c..3729b394432c 100644 --- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -1544,10 +1544,8 @@ static void crypt_dtr(struct dm_target *ti) if (cc->bs) bioset_free(cc->bs); - if (cc->page_pool) - mempool_destroy(cc->page_pool); - if (cc->req_pool) - mempool_destroy(cc->req_pool); + mempool_destroy(cc->page_pool); + mempool_destroy(cc->req_pool); if (cc->iv_gen_ops && cc->iv_gen_ops->dtr) cc->iv_gen_ops->dtr(cc); diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c index 6f8e83b2a6f8..81c5e1a1f363 100644 --- a/drivers/md/dm-io.c +++ b/drivers/md/dm-io.c @@ -65,8 +65,7 @@ struct dm_io_client *dm_io_client_create(void) return client; bad: - if (client->pool) - mempool_destroy(client->pool); + mempool_destroy(client->pool); kfree(client); return ERR_PTR(-ENOMEM); } diff --git a/drivers/md/dm-log-userspace-base.c b/drivers/md/dm-log-userspace-base.c index 058256d2eeea..53b7b06d0aa8 100644 --- a/drivers/md/dm-log-userspace-base.c +++ b/drivers/md/dm-log-userspace-base.c @@ -313,8 +313,7 @@ static int userspace_ctr(struct dm_dirty_log *log, struct dm_target *ti, out: kfree(devices_rdata); if (r) { - if (lc->flush_entry_pool) - mempool_destroy(lc->flush_entry_pool); + mempool_destroy(lc->flush_entry_pool); kfree(lc); kfree(ctr_str); } else { diff --git a/drivers/md/dm-region-hash.c b/drivers/md/dm-region-hash.c index b929fd5f4984..f3d608bedffe 100644 --- a/drivers/md/dm-region-hash.c +++ b/drivers/md/dm-region-hash.c @@ -249,9 +249,7 @@ void dm_region_hash_destroy(struct dm_region_hash *rh) if (rh->log) dm_dirty_log_destroy(rh->log); - if (rh->region_pool) - mempool_destroy(rh->region_pool); - + mempool_destroy(rh->region_pool); vfree(rh->buckets); kfree(rh); } diff --git a/drivers/md/dm.c b/drivers/md/dm.c index c31ca8bc6af5..95558432c080 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -2254,10 +2254,8 @@ static void cleanup_mapped_device(struct mapped_device *md) destroy_workqueue(md->wq); if (md->kworker_task) kthread_stop(md->kworker_task); - if (md->io_pool) - mempool_destroy(md->io_pool); - if (md->rq_pool) - mempool_destroy(md->rq_pool); + mempool_destroy(md->io_pool); + mempool_destroy(md->rq_pool); if (md->bs) bioset_free(md->bs); @@ -3542,11 +3540,8 @@ void dm_free_md_mempools(struct dm_md_mempools *pools) if (!pools) return; - if (pools->io_pool) - mempool_destroy(pools->io_pool); - - if (pools->rq_pool) - mempool_destroy(pools->rq_pool); + mempool_destroy(pools->io_pool); + mempool_destroy(pools->rq_pool); if (pools->bs) bioset_free(pools->bs); From a3d939ae7b5f82688a6d3450f95286eaea338328 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Fri, 2 Oct 2015 11:21:24 -0400 Subject: [PATCH 06/13] dm: convert ffs to __ffs ffs counts bit starting with 1 (for the least significant bit), __ffs counts bits starting with 0. This patch changes various occurrences of ffs to __ffs and removes subtraction of 1 from the result. Note that __ffs (unlike ffs) is not defined when called with zero argument, but it is not called with zero argument in any of these cases. Signed-off-by: Mikulas Patocka Signed-off-by: Mike Snitzer --- drivers/md/dm-bufio.c | 10 +++++----- drivers/md/dm-cache-policy-cleaner.c | 2 +- drivers/md/dm-cache-policy-mq.c | 2 +- drivers/md/dm-cache-policy-smq.c | 2 +- drivers/md/dm-exception-store.c | 2 +- drivers/md/dm-region-hash.c | 2 +- drivers/md/dm-snap-persistent.c | 2 +- 7 files changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c index e1547d0dadfd..2dd33085b331 100644 --- a/drivers/md/dm-bufio.c +++ b/drivers/md/dm-bufio.c @@ -1598,11 +1598,11 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign c->bdev = bdev; c->block_size = block_size; - c->sectors_per_block_bits = ffs(block_size) - 1 - SECTOR_SHIFT; - c->pages_per_block_bits = (ffs(block_size) - 1 >= PAGE_SHIFT) ? - ffs(block_size) - 1 - PAGE_SHIFT : 0; - c->blocks_per_page_bits = (ffs(block_size) - 1 < PAGE_SHIFT ? - PAGE_SHIFT - (ffs(block_size) - 1) : 0); + c->sectors_per_block_bits = __ffs(block_size) - SECTOR_SHIFT; + c->pages_per_block_bits = (__ffs(block_size) >= PAGE_SHIFT) ? + __ffs(block_size) - PAGE_SHIFT : 0; + c->blocks_per_page_bits = (__ffs(block_size) < PAGE_SHIFT ? + PAGE_SHIFT - __ffs(block_size) : 0); c->aux_size = aux_size; c->alloc_callback = alloc_callback; diff --git a/drivers/md/dm-cache-policy-cleaner.c b/drivers/md/dm-cache-policy-cleaner.c index 240c9f0e85e7..c1612d43e601 100644 --- a/drivers/md/dm-cache-policy-cleaner.c +++ b/drivers/md/dm-cache-policy-cleaner.c @@ -83,7 +83,7 @@ static struct list_head *list_pop(struct list_head *q) static int alloc_hash(struct hash *hash, unsigned elts) { hash->nr_buckets = next_power(elts >> 4, 16); - hash->hash_bits = ffs(hash->nr_buckets) - 1; + hash->hash_bits = __ffs(hash->nr_buckets); hash->table = vzalloc(sizeof(*hash->table) * hash->nr_buckets); return hash->table ? 0 : -ENOMEM; diff --git a/drivers/md/dm-cache-policy-mq.c b/drivers/md/dm-cache-policy-mq.c index aa1b41ca40f7..ddb26980cd66 100644 --- a/drivers/md/dm-cache-policy-mq.c +++ b/drivers/md/dm-cache-policy-mq.c @@ -1410,7 +1410,7 @@ static struct dm_cache_policy *mq_create(dm_cblock_t cache_size, mq->generation_period = max((unsigned) from_cblock(cache_size), 1024U); mq->nr_buckets = next_power(from_cblock(cache_size) / 2, 16); - mq->hash_bits = ffs(mq->nr_buckets) - 1; + mq->hash_bits = __ffs(mq->nr_buckets); mq->table = vzalloc(sizeof(*mq->table) * mq->nr_buckets); if (!mq->table) goto bad_alloc_table; diff --git a/drivers/md/dm-cache-policy-smq.c b/drivers/md/dm-cache-policy-smq.c index 1ffbeb1b3ea6..28d4586748d0 100644 --- a/drivers/md/dm-cache-policy-smq.c +++ b/drivers/md/dm-cache-policy-smq.c @@ -566,7 +566,7 @@ static int h_init(struct hash_table *ht, struct entry_space *es, unsigned nr_ent ht->es = es; nr_buckets = roundup_pow_of_two(max(nr_entries / 4u, 16u)); - ht->hash_bits = ffs(nr_buckets) - 1; + ht->hash_bits = __ffs(nr_buckets); ht->buckets = vmalloc(sizeof(*ht->buckets) * nr_buckets); if (!ht->buckets) diff --git a/drivers/md/dm-exception-store.c b/drivers/md/dm-exception-store.c index ebaa4f803eec..3eef6d6f15cd 100644 --- a/drivers/md/dm-exception-store.c +++ b/drivers/md/dm-exception-store.c @@ -183,7 +183,7 @@ int dm_exception_store_set_chunk_size(struct dm_exception_store *store, store->chunk_size = chunk_size; store->chunk_mask = chunk_size - 1; - store->chunk_shift = ffs(chunk_size) - 1; + store->chunk_shift = __ffs(chunk_size); return 0; } diff --git a/drivers/md/dm-region-hash.c b/drivers/md/dm-region-hash.c index f3d608bedffe..74cb7b991d41 100644 --- a/drivers/md/dm-region-hash.c +++ b/drivers/md/dm-region-hash.c @@ -193,7 +193,7 @@ struct dm_region_hash *dm_region_hash_create( rh->max_recovery = max_recovery; rh->log = log; rh->region_size = region_size; - rh->region_shift = ffs(region_size) - 1; + rh->region_shift = __ffs(region_size); rwlock_init(&rh->hash_lock); rh->mask = nr_buckets - 1; rh->nr_buckets = nr_buckets; diff --git a/drivers/md/dm-snap-persistent.c b/drivers/md/dm-snap-persistent.c index bf71583296f7..046f6b81e6e6 100644 --- a/drivers/md/dm-snap-persistent.c +++ b/drivers/md/dm-snap-persistent.c @@ -321,7 +321,7 @@ static int read_header(struct pstore *ps, int *new_snapshot) bdev_logical_block_size(dm_snap_cow(ps->store->snap)-> bdev) >> 9); ps->store->chunk_mask = ps->store->chunk_size - 1; - ps->store->chunk_shift = ffs(ps->store->chunk_size) - 1; + ps->store->chunk_shift = __ffs(ps->store->chunk_size); chunk_size_supplied = 0; } From dbba42d8a9ebddcc1c1412e8457f79f3cb6ef6e7 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Wed, 21 Oct 2015 16:34:20 -0400 Subject: [PATCH 07/13] dm: eliminate unused "bioset" process for each bio-based DM device Commit 54efd50bfd873e2dbf784e0b21a8027ba4299a3e ("block: make generic_make_request handle arbitrarily sized bios") makes it possible for block devices to process large bios. In doing so that commit allocates a new queue->bio_split bioset for each block device, this bioset is used for allocating bios when the driver needs to split large bios. Each bioset allocates a workqueue process, thus the above commit increases the number of processes allocated per block device. DM doesn't need the queue->bio_split bioset, thus we can deallocate it. This reduces the number of allocated processes per bio-based DM device from 3 to 2. Also remove the call to blk_queue_split(), it is not needed because DM does its own splitting. Signed-off-by: Mikulas Patocka Signed-off-by: Mike Snitzer --- drivers/md/dm.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 95558432c080..64b50b71400f 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1763,8 +1763,6 @@ static void dm_make_request(struct request_queue *q, struct bio *bio) map = dm_get_live_table(md, &srcu_idx); - blk_queue_split(q, &bio, q->bio_split); - generic_start_io_acct(rw, bio_sectors(bio), &dm_disk(md)->part0); /* if we're suspended, we have to queue this io for later */ @@ -2792,6 +2790,12 @@ int dm_setup_md_queue(struct mapped_device *md) case DM_TYPE_BIO_BASED: dm_init_old_md_queue(md); blk_queue_make_request(md->queue, dm_make_request); + /* + * DM handles splitting bios as needed. Free the bio_split bioset + * since it won't be used (saves 1 process per bio-based DM device). + */ + bioset_free(md->queue->bio_split); + md->queue->bio_split = NULL; break; } From 4c7da06f5a780bbf44ebd7547789e48536d0a823 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Thu, 22 Oct 2015 16:46:59 -0400 Subject: [PATCH 08/13] dm persistent data: eliminate unnecessary return values dm_bm_unlock and dm_tm_unlock return an integer value but the returned value is always 0. The calling code sometimes checks the return value and sometimes doesn't. Eliminate these unnecessary return values and also the checks for them. Signed-off-by: Mikulas Patocka Signed-off-by: Mike Snitzer --- drivers/md/dm-cache-metadata.c | 8 +++-- drivers/md/dm-era-target.c | 15 ++++---- drivers/md/dm-thin-metadata.c | 16 ++++++--- drivers/md/persistent-data/dm-array.c | 4 +-- drivers/md/persistent-data/dm-block-manager.c | 4 +-- drivers/md/persistent-data/dm-block-manager.h | 2 +- .../md/persistent-data/dm-btree-internal.h | 2 +- drivers/md/persistent-data/dm-btree-remove.c | 36 +++++-------------- drivers/md/persistent-data/dm-btree-spine.c | 20 ++++------- drivers/md/persistent-data/dm-btree.c | 4 ++- .../md/persistent-data/dm-space-map-common.c | 32 ++++++++--------- .../persistent-data/dm-transaction-manager.c | 4 +-- .../persistent-data/dm-transaction-manager.h | 2 +- 13 files changed, 67 insertions(+), 82 deletions(-) diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c index 20cc36b01b77..4a6c83e5bc41 100644 --- a/drivers/md/dm-cache-metadata.c +++ b/drivers/md/dm-cache-metadata.c @@ -260,7 +260,9 @@ static int __superblock_all_zeroes(struct dm_block_manager *bm, bool *result) } } - return dm_bm_unlock(b); + dm_bm_unlock(b); + + return 0; } static void __setup_mapping_info(struct dm_cache_metadata *cmd) @@ -465,7 +467,9 @@ static int __open_metadata(struct dm_cache_metadata *cmd) dm_disk_bitset_init(cmd->tm, &cmd->discard_info); sb_flags = le32_to_cpu(disk_super->flags); cmd->clean_when_opened = test_bit(CLEAN_SHUTDOWN, &sb_flags); - return dm_bm_unlock(sblock); + dm_bm_unlock(sblock); + + return 0; bad: dm_bm_unlock(sblock); diff --git a/drivers/md/dm-era-target.c b/drivers/md/dm-era-target.c index 0119ebfb3d49..665bf3285618 100644 --- a/drivers/md/dm-era-target.c +++ b/drivers/md/dm-era-target.c @@ -343,7 +343,9 @@ static int superblock_all_zeroes(struct dm_block_manager *bm, bool *result) } } - return dm_bm_unlock(b); + dm_bm_unlock(b); + + return 0; } /*----------------------------------------------------------------*/ @@ -582,7 +584,9 @@ static int open_metadata(struct era_metadata *md) md->metadata_snap = le64_to_cpu(disk->metadata_snap); md->archived_writesets = true; - return dm_bm_unlock(sblock); + dm_bm_unlock(sblock); + + return 0; bad: dm_bm_unlock(sblock); @@ -1046,12 +1050,7 @@ static int metadata_take_snap(struct era_metadata *md) md->metadata_snap = dm_block_location(clone); - r = dm_tm_unlock(md->tm, clone); - if (r) { - DMERR("%s: couldn't unlock clone", __func__); - md->metadata_snap = SUPERBLOCK_LOCATION; - return r; - } + dm_tm_unlock(md->tm, clone); return 0; } diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c index 6ba47cfb1443..1fa45695b68a 100644 --- a/drivers/md/dm-thin-metadata.c +++ b/drivers/md/dm-thin-metadata.c @@ -396,7 +396,9 @@ static int __superblock_all_zeroes(struct dm_block_manager *bm, int *result) } } - return dm_bm_unlock(b); + dm_bm_unlock(b); + + return 0; } static void __setup_btree_details(struct dm_pool_metadata *pmd) @@ -650,7 +652,9 @@ static int __open_metadata(struct dm_pool_metadata *pmd) } __setup_btree_details(pmd); - return dm_bm_unlock(sblock); + dm_bm_unlock(sblock); + + return 0; bad_cleanup_data_sm: dm_sm_destroy(pmd->data_sm); @@ -1297,7 +1301,9 @@ static int __release_metadata_snap(struct dm_pool_metadata *pmd) dm_btree_del(&pmd->details_info, le64_to_cpu(disk_super->device_details_root)); dm_sm_dec_block(pmd->metadata_sm, held_root); - return dm_tm_unlock(pmd->tm, copy); + dm_tm_unlock(pmd->tm, copy); + + return 0; } int dm_pool_release_metadata_snap(struct dm_pool_metadata *pmd) @@ -1327,7 +1333,9 @@ static int __get_metadata_snap(struct dm_pool_metadata *pmd, disk_super = dm_block_data(sblock); *result = le64_to_cpu(disk_super->held_root); - return dm_bm_unlock(sblock); + dm_bm_unlock(sblock); + + return 0; } int dm_pool_get_metadata_snap(struct dm_pool_metadata *pmd, diff --git a/drivers/md/persistent-data/dm-array.c b/drivers/md/persistent-data/dm-array.c index e64b61ad0ef3..431a03067d64 100644 --- a/drivers/md/persistent-data/dm-array.c +++ b/drivers/md/persistent-data/dm-array.c @@ -233,9 +233,9 @@ static int get_ablock(struct dm_array_info *info, dm_block_t b, /* * Unlocks an array block. */ -static int unlock_ablock(struct dm_array_info *info, struct dm_block *block) +static void unlock_ablock(struct dm_array_info *info, struct dm_block *block) { - return dm_tm_unlock(info->btree_info.tm, block); + dm_tm_unlock(info->btree_info.tm, block); } /*----------------------------------------------------------------*/ diff --git a/drivers/md/persistent-data/dm-block-manager.c b/drivers/md/persistent-data/dm-block-manager.c index 88dbe7b97c2c..f2393ba838eb 100644 --- a/drivers/md/persistent-data/dm-block-manager.c +++ b/drivers/md/persistent-data/dm-block-manager.c @@ -578,7 +578,7 @@ int dm_bm_write_lock_zero(struct dm_block_manager *bm, } EXPORT_SYMBOL_GPL(dm_bm_write_lock_zero); -int dm_bm_unlock(struct dm_block *b) +void dm_bm_unlock(struct dm_block *b) { struct buffer_aux *aux; aux = dm_bufio_get_aux_data(to_buffer(b)); @@ -590,8 +590,6 @@ int dm_bm_unlock(struct dm_block *b) bl_up_read(&aux->lock); dm_bufio_release(to_buffer(b)); - - return 0; } EXPORT_SYMBOL_GPL(dm_bm_unlock); diff --git a/drivers/md/persistent-data/dm-block-manager.h b/drivers/md/persistent-data/dm-block-manager.h index 84330f59886d..3627d1b7667a 100644 --- a/drivers/md/persistent-data/dm-block-manager.h +++ b/drivers/md/persistent-data/dm-block-manager.h @@ -94,7 +94,7 @@ int dm_bm_write_lock_zero(struct dm_block_manager *bm, dm_block_t b, struct dm_block_validator *v, struct dm_block **result); -int dm_bm_unlock(struct dm_block *b); +void dm_bm_unlock(struct dm_block *b); /* * It's a common idiom to have a superblock that should be committed last. diff --git a/drivers/md/persistent-data/dm-btree-internal.h b/drivers/md/persistent-data/dm-btree-internal.h index 8731b6ea026b..a240990a7f33 100644 --- a/drivers/md/persistent-data/dm-btree-internal.h +++ b/drivers/md/persistent-data/dm-btree-internal.h @@ -52,7 +52,7 @@ void inc_children(struct dm_transaction_manager *tm, struct btree_node *n, struct dm_btree_value_type *vt); int new_block(struct dm_btree_info *info, struct dm_block **result); -int unlock_block(struct dm_btree_info *info, struct dm_block *b); +void unlock_block(struct dm_btree_info *info, struct dm_block *b); /* * Spines keep track of the rolling locks. There are 2 variants, read-only diff --git a/drivers/md/persistent-data/dm-btree-remove.c b/drivers/md/persistent-data/dm-btree-remove.c index 421a36c593e3..4d7617fc6dbb 100644 --- a/drivers/md/persistent-data/dm-btree-remove.c +++ b/drivers/md/persistent-data/dm-btree-remove.c @@ -165,9 +165,9 @@ static int init_child(struct dm_btree_info *info, struct dm_btree_value_type *vt return 0; } -static int exit_child(struct dm_btree_info *info, struct child *c) +static void exit_child(struct dm_btree_info *info, struct child *c) { - return dm_tm_unlock(info->tm, c->block); + dm_tm_unlock(info->tm, c->block); } static void shift(struct btree_node *left, struct btree_node *right, int count) @@ -249,13 +249,10 @@ static int rebalance2(struct shadow_spine *s, struct dm_btree_info *info, __rebalance2(info, parent, &left, &right); - r = exit_child(info, &left); - if (r) { - exit_child(info, &right); - return r; - } + exit_child(info, &left); + exit_child(info, &right); - return exit_child(info, &right); + return 0; } /* @@ -389,22 +386,9 @@ static int rebalance3(struct shadow_spine *s, struct dm_btree_info *info, __rebalance3(info, parent, &left, ¢er, &right); - r = exit_child(info, &left); - if (r) { - exit_child(info, ¢er); - exit_child(info, &right); - return r; - } - - r = exit_child(info, ¢er); - if (r) { - exit_child(info, &right); - return r; - } - - r = exit_child(info, &right); - if (r) - return r; + exit_child(info, &left); + exit_child(info, ¢er); + exit_child(info, &right); return 0; } @@ -428,9 +412,7 @@ static int rebalance_children(struct shadow_spine *s, memcpy(n, dm_block_data(child), dm_bm_block_size(dm_tm_get_bm(info->tm))); - r = dm_tm_unlock(info->tm, child); - if (r) - return r; + dm_tm_unlock(info->tm, child); dm_tm_dec(info->tm, dm_block_location(child)); return 0; diff --git a/drivers/md/persistent-data/dm-btree-spine.c b/drivers/md/persistent-data/dm-btree-spine.c index 0dee514ba4c5..b27b8091a1ca 100644 --- a/drivers/md/persistent-data/dm-btree-spine.c +++ b/drivers/md/persistent-data/dm-btree-spine.c @@ -117,9 +117,9 @@ int new_block(struct dm_btree_info *info, struct dm_block **result) return dm_tm_new_block(info->tm, &btree_node_validator, result); } -int unlock_block(struct dm_btree_info *info, struct dm_block *b) +void unlock_block(struct dm_btree_info *info, struct dm_block *b) { - return dm_tm_unlock(info->tm, b); + dm_tm_unlock(info->tm, b); } /*----------------------------------------------------------------*/ @@ -137,9 +137,7 @@ int exit_ro_spine(struct ro_spine *s) int r = 0, i; for (i = 0; i < s->count; i++) { - int r2 = unlock_block(s->info, s->nodes[i]); - if (r2 < 0) - r = r2; + unlock_block(s->info, s->nodes[i]); } return r; @@ -150,9 +148,7 @@ int ro_step(struct ro_spine *s, dm_block_t new_child) int r; if (s->count == 2) { - r = unlock_block(s->info, s->nodes[0]); - if (r < 0) - return r; + unlock_block(s->info, s->nodes[0]); s->nodes[0] = s->nodes[1]; s->count--; } @@ -194,9 +190,7 @@ int exit_shadow_spine(struct shadow_spine *s) int r = 0, i; for (i = 0; i < s->count; i++) { - int r2 = unlock_block(s->info, s->nodes[i]); - if (r2 < 0) - r = r2; + unlock_block(s->info, s->nodes[i]); } return r; @@ -208,9 +202,7 @@ int shadow_step(struct shadow_spine *s, dm_block_t b, int r; if (s->count == 2) { - r = unlock_block(s->info, s->nodes[0]); - if (r < 0) - return r; + unlock_block(s->info, s->nodes[0]); s->nodes[0] = s->nodes[1]; s->count--; } diff --git a/drivers/md/persistent-data/dm-btree.c b/drivers/md/persistent-data/dm-btree.c index b6cec258cc21..25a62b73cfc0 100644 --- a/drivers/md/persistent-data/dm-btree.c +++ b/drivers/md/persistent-data/dm-btree.c @@ -141,7 +141,9 @@ int dm_btree_empty(struct dm_btree_info *info, dm_block_t *root) n->header.value_size = cpu_to_le32(info->value_type.size); *root = dm_block_location(b); - return unlock_block(info, b); + unlock_block(info, b); + + return 0; } EXPORT_SYMBOL_GPL(dm_btree_empty); diff --git a/drivers/md/persistent-data/dm-space-map-common.c b/drivers/md/persistent-data/dm-space-map-common.c index aacbe70c2c2e..306d2e4502c4 100644 --- a/drivers/md/persistent-data/dm-space-map-common.c +++ b/drivers/md/persistent-data/dm-space-map-common.c @@ -259,9 +259,7 @@ int sm_ll_extend(struct ll_disk *ll, dm_block_t extra_blocks) idx.blocknr = cpu_to_le64(dm_block_location(b)); - r = dm_tm_unlock(ll->tm, b); - if (r < 0) - return r; + dm_tm_unlock(ll->tm, b); idx.nr_free = cpu_to_le32(ll->entries_per_block); idx.none_free_before = 0; @@ -293,7 +291,9 @@ int sm_ll_lookup_bitmap(struct ll_disk *ll, dm_block_t b, uint32_t *result) *result = sm_lookup_bitmap(dm_bitmap_data(blk), b); - return dm_tm_unlock(ll->tm, blk); + dm_tm_unlock(ll->tm, blk); + + return 0; } static int sm_ll_lookup_big_ref_count(struct ll_disk *ll, dm_block_t b, @@ -373,9 +373,7 @@ int sm_ll_find_free_block(struct ll_disk *ll, dm_block_t begin, return r; } - r = dm_tm_unlock(ll->tm, blk); - if (r < 0) - return r; + dm_tm_unlock(ll->tm, blk); *result = i * ll->entries_per_block + (dm_block_t) position; return 0; @@ -429,9 +427,7 @@ static int sm_ll_mutate(struct ll_disk *ll, dm_block_t b, if (ref_count <= 2) { sm_set_bitmap(bm_le, bit, ref_count); - r = dm_tm_unlock(ll->tm, nb); - if (r < 0) - return r; + dm_tm_unlock(ll->tm, nb); if (old > 2) { r = dm_btree_remove(&ll->ref_count_info, @@ -445,9 +441,7 @@ static int sm_ll_mutate(struct ll_disk *ll, dm_block_t b, __le32 le_rc = cpu_to_le32(ref_count); sm_set_bitmap(bm_le, bit, 3); - r = dm_tm_unlock(ll->tm, nb); - if (r < 0) - return r; + dm_tm_unlock(ll->tm, nb); __dm_bless_for_disk(&le_rc); r = dm_btree_insert(&ll->ref_count_info, ll->ref_count_root, @@ -556,7 +550,9 @@ static int metadata_ll_init_index(struct ll_disk *ll) memcpy(dm_block_data(b), &ll->mi_le, sizeof(ll->mi_le)); ll->bitmap_root = dm_block_location(b); - return dm_tm_unlock(ll->tm, b); + dm_tm_unlock(ll->tm, b); + + return 0; } static int metadata_ll_open(struct ll_disk *ll) @@ -570,7 +566,9 @@ static int metadata_ll_open(struct ll_disk *ll) return r; memcpy(&ll->mi_le, dm_block_data(block), sizeof(ll->mi_le)); - return dm_tm_unlock(ll->tm, block); + dm_tm_unlock(ll->tm, block); + + return 0; } static dm_block_t metadata_ll_max_entries(struct ll_disk *ll) @@ -590,7 +588,9 @@ static int metadata_ll_commit(struct ll_disk *ll) memcpy(dm_block_data(b), &ll->mi_le, sizeof(ll->mi_le)); ll->bitmap_root = dm_block_location(b); - return dm_tm_unlock(ll->tm, b); + dm_tm_unlock(ll->tm, b); + + return 0; } int sm_ll_new_metadata(struct ll_disk *ll, struct dm_transaction_manager *tm) diff --git a/drivers/md/persistent-data/dm-transaction-manager.c b/drivers/md/persistent-data/dm-transaction-manager.c index 9cb797d800cf..abe2c5dd0993 100644 --- a/drivers/md/persistent-data/dm-transaction-manager.c +++ b/drivers/md/persistent-data/dm-transaction-manager.c @@ -342,9 +342,9 @@ int dm_tm_read_lock(struct dm_transaction_manager *tm, dm_block_t b, } EXPORT_SYMBOL_GPL(dm_tm_read_lock); -int dm_tm_unlock(struct dm_transaction_manager *tm, struct dm_block *b) +void dm_tm_unlock(struct dm_transaction_manager *tm, struct dm_block *b) { - return dm_bm_unlock(b); + dm_bm_unlock(b); } EXPORT_SYMBOL_GPL(dm_tm_unlock); diff --git a/drivers/md/persistent-data/dm-transaction-manager.h b/drivers/md/persistent-data/dm-transaction-manager.h index 2e0d4d66fb1b..f3a18be68f30 100644 --- a/drivers/md/persistent-data/dm-transaction-manager.h +++ b/drivers/md/persistent-data/dm-transaction-manager.h @@ -94,7 +94,7 @@ int dm_tm_read_lock(struct dm_transaction_manager *tm, dm_block_t b, struct dm_block_validator *v, struct dm_block **result); -int dm_tm_unlock(struct dm_transaction_manager *tm, struct dm_block *b); +void dm_tm_unlock(struct dm_transaction_manager *tm, struct dm_block *b); /* * Functions for altering the reference count of a block directly. From 00272c854ee17b804ce81ef706f611dac17f4f89 Mon Sep 17 00:00:00 2001 From: Tomohiro Kusumi Date: Wed, 28 Oct 2015 04:38:58 +0900 Subject: [PATCH 09/13] dm linear: remove redundant target name from error messages Commit 72d94861 back in 2006 should have consistently removed "dm-linear: " from all error messages. Signed-off-by: Tomohiro Kusumi Signed-off-by: Mike Snitzer --- drivers/md/dm-linear.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c index de52864d60fa..05c35aacb3aa 100644 --- a/drivers/md/dm-linear.c +++ b/drivers/md/dm-linear.c @@ -39,20 +39,20 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv) lc = kmalloc(sizeof(*lc), GFP_KERNEL); if (lc == NULL) { - ti->error = "dm-linear: Cannot allocate linear context"; + ti->error = "Cannot allocate linear context"; return -ENOMEM; } ret = -EINVAL; if (sscanf(argv[1], "%llu%c", &tmp, &dummy) != 1) { - ti->error = "dm-linear: Invalid device sector"; + ti->error = "Invalid device sector"; goto bad; } lc->start = tmp; ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), &lc->dev); if (ret) { - ti->error = "dm-linear: Device lookup failed"; + ti->error = "Device lookup failed"; goto bad; } From 340c9ec09b21c29e1e53284acc8993ddf6fc5b2a Mon Sep 17 00:00:00 2001 From: Tomohiro Kusumi Date: Wed, 28 Oct 2015 04:38:55 +0900 Subject: [PATCH 10/13] dm delay: Use DM_MAPIO macros instead of open-coded equivalents .map function of dm-delay returns return value of delay_bio(), hence it's supposed to return using a defined DM_MAPIO macro. Signed-off-by: Tomohiro Kusumi Acked-By: Mikulas Patocka Signed-off-by: Mike Snitzer --- drivers/md/dm-delay.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c index b34f6e27293d..e10f69a22c46 100644 --- a/drivers/md/dm-delay.c +++ b/drivers/md/dm-delay.c @@ -237,7 +237,7 @@ static int delay_bio(struct delay_c *dc, int delay, struct bio *bio) unsigned long expires = 0; if (!delay || !atomic_read(&dc->may_delay)) - return 1; + return DM_MAPIO_REMAPPED; delayed = dm_per_bio_data(bio, sizeof(struct dm_delay_info)); @@ -257,7 +257,7 @@ static int delay_bio(struct delay_c *dc, int delay, struct bio *bio) queue_timeout(dc, expires); - return 0; + return DM_MAPIO_SUBMITTED; } static void delay_presuspend(struct dm_target *ti) From e213f33e4d3a00e9916a58e0fff367a7c60e3c9c Mon Sep 17 00:00:00 2001 From: Tomohiro Kusumi Date: Wed, 28 Oct 2015 04:38:57 +0900 Subject: [PATCH 11/13] dm delay: capitalize the start of an delay_ctr() error message All other error messages start capitalized. Signed-off-by: Tomohiro Kusumi Signed-off-by: Mike Snitzer --- drivers/md/dm-delay.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c index e10f69a22c46..9a4362d53508 100644 --- a/drivers/md/dm-delay.c +++ b/drivers/md/dm-delay.c @@ -132,7 +132,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv) int ret; if (argc != 3 && argc != 6) { - ti->error = "requires exactly 3 or 6 arguments"; + ti->error = "Requires exactly 3 or 6 arguments"; return -EINVAL; } From f49e869a61829b8ac6eb069b3824f738cd0146e6 Mon Sep 17 00:00:00 2001 From: Tomohiro Kusumi Date: Wed, 28 Oct 2015 04:38:56 +0900 Subject: [PATCH 12/13] dm delay: document that offsets are specified in sectors Only delay params are mentioned in delay.txt. Mention offsets just like documents for linear and flakey do. Signed-off-by: Tomohiro Kusumi Signed-off-by: Mike Snitzer --- Documentation/device-mapper/delay.txt | 1 + drivers/md/dm-delay.c | 1 + 2 files changed, 2 insertions(+) diff --git a/Documentation/device-mapper/delay.txt b/Documentation/device-mapper/delay.txt index 15adc55359e5..a07b5927f4a8 100644 --- a/Documentation/device-mapper/delay.txt +++ b/Documentation/device-mapper/delay.txt @@ -8,6 +8,7 @@ Parameters: [ ] With separate write parameters, the first set is only used for reads. +Offsets are specified in sectors. Delays are specified in milliseconds. Example scripts diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c index 9a4362d53508..b4c356a21123 100644 --- a/drivers/md/dm-delay.c +++ b/drivers/md/dm-delay.c @@ -122,6 +122,7 @@ static void flush_expired_bios(struct work_struct *work) * [ ] * * With separate write parameters, the first set is only used for reads. + * Offsets are specified in sectors. * Delays are specified in milliseconds. */ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv) From aad9ae4550755edc020b5c511a8b54f0104b2f47 Mon Sep 17 00:00:00 2001 From: Tomohiro Kusumi Date: Thu, 29 Oct 2015 03:54:21 +0900 Subject: [PATCH 13/13] dm switch: simplify conditional in alloc_region_table() The variable sctx->nr_regions has type unsigned long and the variable nr_regions has type sector_t. Thus the variables may be different when overflow happens. Changed the conditional to "if (nr_regions >= ULONG_MAX)". Also move the assignment of nr_regions after sector_div() and the sanity check which looks more sane. Signed-off-by: Tomohiro Kusumi Reviewed-by: Mikulas Patocka Signed-off-by: Mike Snitzer --- drivers/md/dm-switch.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/md/dm-switch.c b/drivers/md/dm-switch.c index b1285753a5d4..871c18fe000d 100644 --- a/drivers/md/dm-switch.c +++ b/drivers/md/dm-switch.c @@ -99,11 +99,11 @@ static int alloc_region_table(struct dm_target *ti, unsigned nr_paths) if (sector_div(nr_regions, sctx->region_size)) nr_regions++; - sctx->nr_regions = nr_regions; - if (sctx->nr_regions != nr_regions || sctx->nr_regions >= ULONG_MAX) { + if (nr_regions >= ULONG_MAX) { ti->error = "Region table too large"; return -EINVAL; } + sctx->nr_regions = nr_regions; nr_slots = nr_regions; if (sector_div(nr_slots, sctx->region_entries_per_slot))