Commit graph

2100 commits

Author SHA1 Message Date
NeilBrown 29d3247ea2 md/bitmap remove fault injection options.
These are too hard to use to be much more than noise.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11 16:49:56 +11:00
NeilBrown d1688a6d55 md/raid5: typedef removal: raid5_conf_t -> struct r5conf
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11 16:49:52 +11:00
NeilBrown e809636047 md/raid1: typedef removal: conf_t -> struct r1conf
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11 16:49:05 +11:00
NeilBrown e879a8793f md/raid10: typedef removal: conf_t -> struct r10conf
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11 16:49:02 +11:00
NeilBrown e373ab1091 md/raid0: typedef removal: raid0_conf_t -> struct r0conf
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11 16:48:59 +11:00
NeilBrown 69724e28ca md/multipath: typedef removal: multipath_conf_t -> struct mpconf
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11 16:48:57 +11:00
NeilBrown e849b9381f md/linear: typedef removal: linear_conf_t -> struct linear_conf
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11 16:48:54 +11:00
NeilBrown 8f1ae43dd2 md/faulty: remove typedef: conf_t -> struct faulty_conf
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11 16:48:52 +11:00
NeilBrown a71207713a md/linear: remove typedefs: dev_info_t -> struct dev_info
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11 16:48:49 +11:00
NeilBrown 0f6d02d580 md: remove typedefs: mirror_info_t -> struct mirror_info
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11 16:48:46 +11:00
NeilBrown 9f2c9d12bc md: remove typedefs: r10bio_t -> struct r10bio and r1bio_t -> struct r1bio
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11 16:48:43 +11:00
NeilBrown 2b8bf3451d md: remove typedefs: mdk_thread_t -> struct md_thread
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11 16:48:23 +11:00
NeilBrown fd01b88c75 md: remove typedefs: mddev_t -> struct mddev
Having mddev_t and 'struct mddev_s' is ugly and not preferred

Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11 16:47:53 +11:00
NeilBrown 3cb0300200 md: removing typedefs: mdk_rdev_t -> struct md_rdev
The typedefs are just annoying. 'mdk' probably refers to 'md_k.h'
which used to be an include file that defined this thing.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11 16:45:26 +11:00
NeilBrown 50de8df4ab md/raid0: convert some printks to pr_debug.
When md assembles a RAID0 array it prints out lots of info which
is really just for debugging, so convert that to pr_debug.
It also prints out the resulting configuration which could be
interesting, so keep that as 'printk' but tidy it up a bit.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07 14:23:22 +11:00
NeilBrown 36a4e1fe0f md: remove PRINTK and dprintk debugging and use pr_debug
Being able to dynamically enable these make them much more useful.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07 14:23:17 +11:00
NeilBrown bdc04e6b15 md: remove some old DEBUGging code.
This code is not really helpful and is hard to maintain, so just
discard it.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07 14:23:04 +11:00
NeilBrown db298e1946 md/raid5: convert to macros into inline functions.
More type-safety.  Easier to read.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07 14:23:00 +11:00
NeilBrown 0fc280f606 md/raid1/ avoid bio search in end_sync_read()
We know which device we just read from so we don't need to
search the bios to find out.  Just use ->read_disk.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07 14:22:55 +11:00
Namhyung Kim ba3ae3bee3 md/raid1: factor out common bio handling code
When normal-write and sync-read/write bio completes, we should
find out the disk number the bio belongs to. Factor those common
code out to a separate function.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07 14:22:53 +11:00
NeilBrown e4f869d9de md/raid5: remove pointless NULL test.
In the 'abort' branch of run(), 'conf' cannot possibly be NULL,
so remove the test.

Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07 14:22:49 +11:00
NeilBrown ce550c2059 md/raid1: add documentation to r1_private_data_s data structure.
There wasn't much and it is inconsistent.
Also rearrange fields to keep related fields together.

Reported-by: Aapo Laine <aapo.laine@shiftmail.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07 14:22:33 +11:00
Daniel P. Berrange 2dba6a911c md: don't delay reboot by 1 second if no MD devices exist
The md_notify_reboot() method includes a call to mdelay(1000),
to deal with "exotic SCSI devices" which are too volatile on
reboot. The delay is unconditional. Even if the machine does
not have any block devices, let alone MD devices, the kernel
shutdown sequence is slowed down.

1 second does not matter much with physical hardware, but with
certain virtualization use cases any wasted time in the bootup
& shutdown sequence counts for alot.

* drivers/md/md.c: md_notify_reboot() - only impose a delay if
  there was at least one MD device to be stopped during reboot

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-23 19:54:04 +10:00
Wang Sheng-Hui 7e84152626 trival: md_k.h should be md.h in the beginning comment of file md.h
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-21 15:37:46 +10:00
NeilBrown 2585f3ef8c md/bitmap: improve handling of 'allclean'.
The 'allclean' flag is used to cache the fact that there is nothing to
do, so we can avoid waking up and scanning the bitmap regularly.

The two sorts of pages that might need the attention of the bitmap
daemon are BITMAP_PAGE_PENDING and BITMAP_PAGE_NEEDWRITE pages.

So make sure allclean reflects exactly when there are none of those.
So:
  set it before scanning all pages with either bit set.
  clear it whenever these bits are set
  clear it when we desire not to clear one of these bits.
  don't clear it any other time.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-21 15:37:46 +10:00
NeilBrown 5a537df44d md/bitmap: rename and tidy up BITMAP_PAGE_CLEAN
The flag 'BITMAP_PAGE_CLEAN' has a confusing name as it doesn't mean
that the page is clean, but rather that there are counters in the page
which allow bits in the bitmap to be cleared - i.e. maybe cleaning can
happen.

So change it to BITMAP_PAGE_PENDING and fix some irregularities:
 - Don't set it in bitmap_init_from_disk as bitmap_set_memory_bits
   sets it when needed
 - in bitmap_daemon_work, if we find a counter that is '1', but
   need_sync is set, then set BITMAP_PAGE_PENDING again (it was
   recently cleared) to ensure we don't forget about this bit.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-21 15:37:46 +10:00
NeilBrown 01f96c0a99 md: Avoid waking up a thread after it has been freed.
Two related problems:

1/ some error paths call "md_unregister_thread(mddev->thread)"
   without subsequently clearing ->thread.  A subsequent call
   to mddev_unlock will try to wake the thread, and crash.

2/ Most calls to md_wakeup_thread are protected against the thread
   disappeared either by:
      - holding the ->mutex
      - having an active request, so something else must be keeping
        the array active.
   However mddev_unlock calls md_wakeup_thread after dropping the
   mutex and without any certainty of an active request, so the
   ->thread could theoretically disappear.
   So we need a spinlock to provide some protections.

So change md_unregister_thread to take a pointer to the thread
pointer, and ensure that it always does the required locking, and
clears the pointer properly.

Reported-by: "Moshe Melnikov" <moshe@zadarastorage.com>
Signed-off-by: NeilBrown <neilb@suse.de>
cc: stable@kernel.org
2011-09-21 15:30:20 +10:00
NeilBrown 27a7b260f7 md: Fix handling for devices from 2TB to 4TB in 0.90 metadata.
0.90 metadata uses an unsigned 32bit number to count the number of
kilobytes used from each device.
This should allow up to 4TB per device.
However we multiply this by 2 (to get sectors) before casting to a
larger type, so sizes above 2TB get truncated.

Also we allow rdev->sectors to be larger than 4TB, so it is possible
for the array to be resized larger than the metadata can handle.
So make sure rdev->sectors never exceeds 4TB when 0.90 metadata is in
used.

Also the sanity check at the end of super_90_load should include level
1 as it used ->size too. (RAID0 and Linear don't use ->size at all).

Reported-by: Pim Zandbergen <P.Zandbergen@macroscoop.nl>
Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-10 17:21:28 +10:00
NeilBrown 079fa166a2 md/raid1,10: Remove use-after-free bug in make_request.
A single request to RAID1 or RAID10 might result in multiple
requests if there are known bad blocks that need to be avoided.

To detect if we need to submit another write request we test:
 	if (sectors_handled < (bio->bi_size >> 9)) {

However this is after we call **_write_done() so the 'bio' no longer
belongs to us - the writes could have completed and the bio freed.

So move the **_write_done call until after the test against
bio->bi_size.

This addresses https://bugzilla.kernel.org/show_bug.cgi?id=41862

Reported-by: Bruno Wolff III <bruno@wolff.to>
Tested-by: Bruno Wolff III <bruno@wolff.to>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-10 17:21:23 +10:00
NeilBrown 19d5f834d6 md/raid10: unify handling of write completion.
A write can complete at two different places:
1/ when the last member-device write completes, through
   raid10_end_write_request
2/ in make_request() when we remove the initial bias from ->remaining.

These two should do exactly the same thing and the comment says they
do, but they don't.

So factor the correct code out into a function and call it in both
places.  This makes the code much more similar to RAID1.

The difference is only significant if there is an error, and they
usually take a while, so it is unlikely that there will be an error
already when make_request is completing, so this is unlikely to cause
real problems.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-10 17:21:17 +10:00
NeilBrown 43220aa0f2 md/raid5: fix a hang on device failure.
Waiting for a 'blocked' rdev to become unblocked in the raid5d thread
cannot work with internal metadata as it is the raid5d thread which
will clear the blocked flag.
This wasn't a problem in 3.0 and earlier as we only set the blocked
flag when external metadata was used then.
However we now set it always, so we need to be more careful.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-08-31 12:49:14 +10:00
NeilBrown 7da64a0abc md: fix clearing of 'blocked' flag in the presence of bad blocks.
When the 'blocked' flag on a device is cleared while there are
unacknowledged bad blocks we must fail the device.  This is needed for
backwards compatability of the interface.

The code currently uses the wrong test for "unacknowledged bad blocks
exist".  Change it to the right test.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-08-30 16:20:17 +10:00
NeilBrown 1b6afa1758 md/linear: avoid corrupting structure while waiting for rcu_free to complete.
I don't know what I was thinking putting 'rcu' after a dynamically
sized array!  The array could still be in use when we call rcu_free()
(That is the point) so we mustn't corrupt it.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
2011-08-25 14:43:53 +10:00
Namhyung Kim a5bf4df0c8 md: use REQ_NOIDLE flag in md_super_write()
Queue idling is used for the anticipation of immediate
sequencial I/O's but md_super_write() is a kind of one-
shot operation, coupled with md_super_wait(), so the
idling in this case will be just a waste of time.

Specifying REQ_NOIDLE prevents it. Instead of adding
the flag to submit_bio() directly, use pre-defined
macro WRITE_FLUSH_FUA.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-08-25 14:43:34 +10:00
NeilBrown aeb9b21184 md: ensure changes to 'write-mostly' are reflected in metadata.
The 'write-mostly' flag can be changed through sysfs.
With 0.90 metadata, those changes are reflected in the metadata.
For 1.x metadata, they aren't.

So fix super_1_sync to record 'write-mostly' status.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-08-25 14:43:08 +10:00
NeilBrown 5ef56c8fec md: report failure if a 'set faulty' request doesn't.
Sometimes a device will refuse to be set faulty.  e.g. RAID1 will
never let the last working device become faulty.

So check if "md_error()" did manage to set the faulty flag and fail
with EBUSY if it didn't.

Resolves-Debian-Bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=601198
Reported-by: Mike Hommey <mh+reportbug@glandium.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-08-25 14:42:51 +10:00
Mike Snitzer ed8b752bcc dm table: set flush capability based on underlying devices
DM has always advertised both REQ_FLUSH and REQ_FUA flush capabilities
regardless of whether or not a given DM device's underlying devices
also advertised a need for them.

Block's flush-merge changes from 2.6.39 have proven to be more costly
for DM devices.  Performance regressions have been reported even when
DM's underlying devices do not advertise that they have a write cache.

Fix the performance regressions by configuring a DM device's flushing
capabilities based on those of the underlying devices' capabilities.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02 12:32:08 +01:00
Milan Broz 772ae5f54d dm crypt: optionally support discard requests
Add optional parameter field to dmcrypt table and support
"allow_discards" option.

Discard requests bypass crypt queue processing. Bio is simple remapped
to underlying device.

Note that discard will be never enabled by default because of security
consequences.  It is up to the administrator to enable it for encrypted
devices.

(Note that userspace cryptsetup does not understand new optional
parameters yet.  Support for this will come later.  Until then, you
should use 'dmsetup' to enable and disable this.)

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02 12:32:08 +01:00
Jonathan Brassow 327372797c dm raid: add md raid1 support
Support the MD RAID1 personality through dm-raid.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02 12:32:07 +01:00
Jonathan Brassow b12d437b73 dm raid: support metadata devices
Add the ability to parse and use metadata devices to dm-raid.  Although
not strictly required, without the metadata devices, many features of
RAID are unavailable.  They are used to store a superblock and bitmap.

The role, or position in the array, of each device must be recorded in
its superblock.  This is to help with fault handling, array reshaping,
and sanity checks.  RAID 4/5/6 devices must be loaded in a specific order:
in this way, the 'array_position' field helps validate the correctness
of the mapping when it is loaded.  It can be used during reshaping to
identify which devices are added/removed.  Fault handling is impossible
without this field.  For example, when a device fails it is recorded in
the superblock.  If this is a RAID1 device and the offending device is
removed from the array, there must be a way during subsequent array
assembly to determine that the failed device was the one removed.  This
is done by correlating the 'array_position' field and the bit-field
variable 'failed_devices'.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02 12:32:07 +01:00
Jonathan Brassow 46bed2b5c1 dm raid: add write_mostly parameter
Add the write_mostly parameter to RAID1 dm-raid tables.

This allows the user to set the WriteMostly flag on a RAID1 device that
should normally be avoided for read I/O.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02 12:32:07 +01:00
Jonathan Brassow c1084561bb dm raid: add region_size parameter
Allow the user to specify the region_size.

Ensures that the supplied value meets md's constraints, viz. the number of
regions does not exceed 2^21.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02 12:32:07 +01:00
Mikulas Patocka 759dea204c dm ioctl: forbid multiple device specifiers
Exactly one of name, uuid or device must be specified when referencing
an existing device.  This removes the ambiguity (risking the wrong
device being updated) if two conflicting parameters were specified.
Previously one parameter got used and any others were ignored silently.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02 12:32:06 +01:00
Mikulas Patocka ba2e19b0f4 dm ioctl: introduce __get_dev_cell
Move logic to find device based on major/minor number to a separate
function __get_dev_cell (similar to __get_uuid_cell and __get_name_cell).
This makes the function __find_device_hash_cell more straightforward.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02 12:32:06 +01:00
Mikulas Patocka 0ddf9644cc dm ioctl: fill in device parameters in more ioctls
Move parameter filling from find_device to __find_device_hash_cell.

This patch causes ioctls using __find_device_hash_cell
(DM_DEV_REMOVE_CMD, DM_DEV_SUSPEND_CMD - resume, DM_TABLE_CLEAR_CMD)
to return device parameters, bringing them into line with the other
ioctls.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02 12:32:06 +01:00
Mike Snitzer a3998799fb dm flakey: add corrupt_bio_byte feature
Add corrupt_bio_byte feature to simulate corruption by overwriting a byte at a
specified position with a specified value during intervals when the device is
"down".

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02 12:32:06 +01:00
Mike Snitzer b26f5e3d71 dm flakey: add drop_writes
Add 'drop_writes' option to drop writes silently while the
device is 'down'.  Reads are not touched.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02 12:32:05 +01:00
Mike Snitzer dfd068b01f dm flakey: support feature args
Add the ability to specify arbitrary feature flags when creating a
flakey target.  This code uses the same target argument helpers that
the multipath target does.

Also remove the superfluous 'dm-flakey' prefixes from the error messages,
as they already contain the prefix 'flakey'.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02 12:32:05 +01:00
Mike Snitzer 30e4171bfe dm flakey: use dm_target_offset and support discards
Use dm_target_offset() and support discards.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02 12:32:05 +01:00
Mike Snitzer 498f0103ea dm table: share target argument parsing functions
Move multipath target argument parsing code into dm-table so other
targets can share it.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02 12:32:04 +01:00