1
0
Fork 0
Commit Graph

175 Commits (1294b9c973251a5e68b62c9b40dd914517bda675)

Author SHA1 Message Date
NeilBrown 1294b9c973 md/raid10: simplify/reindent some loops.
When a loop ends with a large if, it can be neater to change the
if to invert the condition and just 'continue'.
Then the body of the if can be indented to a lower level.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-28 11:39:23 +10:00
NeilBrown de393cdea6 md: make it easier to wait for bad blocks to be acknowledged.
It is only safe to choose not to write to a bad block if that bad
block is safely recorded in metadata - i.e. if it has been
'acknowledged'.

If it hasn't we need to wait for the acknowledgement.

We support that using rdev->blocked wait and
md_wait_for_blocked_rdev by introducing a new device flag
'BlockedBadBlock'.

This flag is only advisory.
It is cleared whenever we acknowledge a bad block, so that a waiter
can re-check the particular bad blocks that it is interested it.

It should be set by a caller when they find they need to wait.
This (set after test) is inherently racy, but as
md_wait_for_blocked_rdev already has a timeout, losing the race will
have minimal impact.

When we clear "Blocked" was also clear "BlockedBadBlocks" incase it
was set incorrectly (see above race).

We also modify the way we manage 'Blocked' to fit better with the new
handling of 'BlockedBadBlocks' and to make it consistent between
externally managed and internally managed metadata.   This requires
that each raidXd loop checks if the metadata needs to be written and
triggers a write (md_check_recovery) if needed.  Otherwise a queued
write request might cause raidXd to wait for the metadata to write,
and only that thread can write it.

Before writing metadata, we set FaultRecorded for all devices that
are Faulty, then after writing the metadata we clear Blocked for any
device for which the Fault was certainly Recorded.

The 'faulty' device flag now appears in sysfs if the device is faulty
*or* it has unacknowledged bad blocks.  So user-space which does not
understand bad blocks can continue to function correctly.
User space which does, should not assume a device is faulty until it
sees the 'faulty' flag, and then sees the list of unacknowledged bad
blocks is empty.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-28 11:31:48 +10:00
NeilBrown 34b343cff4 md: don't allow arrays to contain devices with bad blocks.
As no personality understand bad block lists yet, we must
reject any device that is known to contain bad blocks.
As the personalities get taught, these tests can be removed.

This only applies to raid1/raid5/raid10.
For linear/raid0/multipath/faulty the whole concept of bad blocks
doesn't mean anything so there is no point adding the checks.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-28 11:31:47 +10:00
Namhyung Kim cbea21703b md/raid10: move rdev->corrected_errors counting
Read errors are considered to corrected if write-back and re-read
cycle is finished without further problems. Thus moving the rdev->
corrected_errors counting after the re-reading looks more reasonable
IMHO.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-27 11:00:36 +10:00
NeilBrown 700c721389 md/raid10: Improve decision on whether to fail a device with a read error.
Normally we would fail a device with a READ error.  However if doing
so causes the array to fail, it is better to leave the device
in place and just return the read error to the caller.

The current test for decide if the array will fail is overly
simplistic.
We have a function 'enough' which can tell if the array is failed or
not, so use it to guide the decision.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-27 11:00:36 +10:00
NeilBrown 2bb77736ae md/raid10: Make use of new recovery_disabled handling
When we get a read error during recovery, RAID10 previously
arranged for the recovering device to appear to fail so that
the recovery stops and doesn't restart.  This is misleading and wrong.

Instead, make use of the new recovery_disabled handling and mark
the target device and having recovery disabled.

Add appropriate checks in add_disk and remove_disk so that devices
are removed and not re-added when recovery is disabled.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-27 11:00:36 +10:00
Christian Dietrich 8bda470e8e md/raid: use printk_ratelimited instead of printk_ratelimit
As per printk_ratelimit comment, it should not be used.

Signed-off-by: Christian Dietrich <christian.dietrich@informatik.uni-erlangen.de>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-27 11:00:36 +10:00
Namhyung Kim c65060ad42 md/raid10: share pages between read and write bio's during recovery
When performing a recovery, only first 2 slots in r10_bio are in use,
for read and write respectively. However all of pages in the write bio
are never used and just replaced to read bio's when the read completes.

Get rid of those unused pages and share read pages properly.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-18 17:38:49 +10:00
Namhyung Kim 778ca01852 md/raid10: factor out common bio handling code
When normal-write and sync-read/write bio completes, we should
find out the disk number the bio belongs to. Factor those common
code out to a separate function.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-18 17:38:47 +10:00
Namhyung Kim 2c4193df37 md/raid10: get rid of duplicated conditional expression
Variable 'first' is initialized to zero and updated to @rdev->raid_disk
only if it is greater than 0. Thus condition '>= first' always implies
'>= 0' so the latter is not needed.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-18 17:38:43 +10:00
NeilBrown ab9d47e990 md/raid10: reformat some loops with less indenting.
When a loop ends with an 'if' with a large body, it is neater
to make the if 'continue' on the inverse condition, and then
the body is indented less.

Apply this pattern 3 times, and wrap some other long lines.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-05-11 14:54:41 +10:00
NeilBrown f17ed07c85 md/raid10: remove unused variable.
This variable 'disk' is never used - how odd.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-05-11 14:54:32 +10:00
NeilBrown a8830bcaf3 md/raid10: make more use of 'slot' in raid10d.
Now that we have a 'slot' variable, make better use of it to simplify
some code a little.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-05-11 14:54:19 +10:00
NeilBrown 7c4e06ff2b md/raid10: some tidying up in fix_read_error
Currently the rdev on which a read error happened could be removed
before we perform the fix_error handling.  This requires extra tests
for NULL.

So delay the rdev_dec_pending call until after the call to
fix_read_error so that we can be sure that the rdev still exists.

This allows an 'if' clause to be removed so the body gets re-indented
back one level.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-05-11 14:53:17 +10:00
NeilBrown 56d9912106 md: simplify raid10 read_balance
raid10 read balance has two different loop for looking through
possible devices to chose the best.
Collapse those into one loop and generally make the code more
readable.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-05-11 14:27:03 +10:00
NeilBrown c3b328ac84 md: fix up raid1/raid10 unplugging.
We just need to make sure that an unplug event wakes up the md
thread, which is exactly what mddev_check_plugged does.

Also remove some plug-related code that is no longer needed.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-18 18:25:43 +10:00
NeilBrown e1dfa0a297 md: use new plugging interface for RAID IO.
md/raid submits a lot of IO from the various raid threads.
So adding start/finish plug calls to those so that some
plugging happens.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-18 18:25:41 +10:00
Lucas De Marchi 25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Martin K. Petersen a91a2785b2 block: Require subsystems to explicitly allocate bio_set integrity mempool
MD and DM create a new bio_set for every metadevice. Each bio_set has an
integrity mempool attached regardless of whether the metadevice is
capable of passing integrity metadata. This is a waste of memory.

Instead we defer the allocation decision to MD and DM since we know at
metadevice creation time whether integrity passthrough is needed or not.

Automatic integrity mempool allocation can then be removed from
bioset_create() and we make an explicit integrity allocation for the
fs_bio_set.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Acked-by: Mike Snitzer <snizer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-17 11:11:05 +01:00
Jens Axboe 4c63f5646e Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/core
Conflicts:
	block/blk-core.c
	block/blk-flush.c
	drivers/md/raid1.c
	drivers/md/raid10.c
	drivers/md/raid5.c
	fs/nilfs2/btnode.c
	fs/nilfs2/mdt.c

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:58:35 +01:00
Jens Axboe 7eaceaccab block: remove per-queue plugging
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:52:07 +01:00
NeilBrown da9cf5050a md: avoid spinlock problem in blk_throtl_exit
blk_throtl_exit assumes that ->queue_lock still exists,
so make sure that it does.
To do this, we stop redirecting ->queue_lock to conf->device_lock
and leave it pointing where it is initialised - __queue_lock.

As the blk_plug functions check the ->queue_lock is held, we now
take that spin_lock explicitly around the plug functions.  We don't
need the locking, just the warning removal.

This is needed for any kernel with the blk_throtl code, which is
which is 2.6.37 and later.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-21 18:25:57 +11:00
Krzysztof Wojcik 02214dc546 FIX: md: process hangs at wait_barrier after 0->10 takeover
Following symptoms were observed:
1. After raid0->raid10 takeover operation we have array with 2
missing disks.
When we add disk for rebuild, recovery process starts as expected
but it does not finish- it stops at about 90%, md126_resync process
hangs in "D" state.
2. Similar behavior is when we have mounted raid0 array and we
execute takeover to raid10. After this when we try to unmount array-
it causes process umount hangs in "D"

In scenarios above processes hang at the same function- wait_barrier
in raid10.c.
Process waits in macro "wait_event_lock_irq" until the
"!conf->barrier" condition will be true.
In scenarios above it never happens.

Reason was that at the end of level_store, after calling pers->run,
we call mddev_resume. This calls pers->quiesce(mddev, 0) with
RAID10, that calls lower_barrier.
However raise_barrier hadn't been called on that 'conf' yet,
so conf->barrier becomes negative, which is bad.

This patch introduces setting conf->barrier=1 after takeover
operation. It prevents to become barrier negative after call
lower_barrier().

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-08 11:49:02 +11:00
Jonathan Brassow ccebd4c415 md-new-param-to_sync_page_io
Add new parameter to 'sync_page_io'.

The new parameter allows us to distinguish between metadata and data
operations.  This becomes important later when we add the ability to
use separate devices for data and metadata.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
2011-01-14 09:14:33 +11:00
Joe Perches 067032bc62 md: Fix single printks with multiple KERN_<level>s
Noticed-by: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-14 09:14:33 +11:00
NeilBrown 589a594be1 md: protect against NULL reference when waiting to start a raid10.
When we fail to start a raid10 for some reason, we call
md_unregister_thread to kill the thread that was created.

Unfortunately md_thread() will then make one call into the handler
(raid10d) even though md_wakeup_thread has not been called.  This is
not safe and as md_unregister_thread is called after mddev->private
has been set to NULL, it will definitely cause a NULL dereference.

So fix this at both ends:
 - md_thread should only call the handler if THREAD_WAKEUP has been
   set.
 - raid10 should call md_unregister_thread before setting things
   to NULL just like all the other raid modules do.

This is applicable to 2.6.35 and later.

Cc: stable@kernel.org
Reported-by: "Citizen" <citizen_lee@thecus.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-09 17:02:14 +11:00
NeilBrown a167f66324 md: use separate bio pool for each md device.
bio_clone and bio_alloc allocate from a common bio pool.
If an md device is stacked with other devices that use this pool, or under
something like swap which uses the pool, then the multiple calls on
the pool can cause deadlocks.

So allocate a local bio pool for each md array and use that rather
than the common pool.

This pool is used both for regular IO and metadata updates.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-10-28 17:36:15 +11:00
NeilBrown 2b193363ef md: change type of first arg to sync_page_io.
Currently sync_page_io takes a 'bdev'.
Every caller passes 'rdev->bdev'.
We will soon want another field out of the rdev in sync_page_io,
So just pass the rdev instead of the bdev out of it.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-10-28 17:36:11 +11:00
NeilBrown 6746557f03 md: use bio_kmalloc rather than bio_alloc when failure is acceptable.
bio_alloc can never fail (as it uses a mempool) but an block
indefinitely, especially if the caller is holding a reference to a
previously allocated bio.

So these to places which both handle failure and hold multiple bios
should not use bio_alloc, they should use bio_kmalloc.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-10-28 17:36:06 +11:00
NeilBrown 4e78064f42 md: Fix possible deadlock with multiple mempool allocations.
It is not safe to allocate from a mempool while holding an item
previously allocated from that mempool as that can deadlock when the
mempool is close to exhaustion.

So don't use a bio list to collect the bios to write to multiple
devices in raid1 and raid10.
Instead queue each bio as it becomes available so an unplug will
activate all previously allocated bios and so a new bio has a chance
of being allocated.

This means we must set the 'remaining' count to '1' before submitting
any requests, then when all are submitted, decrement 'remaining' and
possible handle the write completion at that point.

Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Tested-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-10-28 17:34:07 +11:00
NeilBrown 57dab0bdf6 md: use sector_t in bitmap_get_counter
bitmap_get_counter returns the number of sectors covered
by the counter in a pass-by-reference variable.
In some cases this can be very large, so make it a sector_t
for safety.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-10-28 17:32:26 +11:00
Tejun Heo e9c7469bb4 md: implment REQ_FLUSH/FUA support
This patch converts md to support REQ_FLUSH/FUA instead of now
deprecated REQ_HARDBARRIER.  In the core part (md.c), the following
changes are notable.

* Unlike REQ_HARDBARRIER, REQ_FLUSH/FUA don't interfere with
  processing of other requests and thus there is no reason to mark the
  queue congested while FLUSH/FUA is in progress.

* REQ_FLUSH/FUA failures are final and its users don't need retry
  logic.  Retry logic is removed.

* Preflush needs to be issued to all member devices but FUA writes can
  be handled the same way as other writes - their processing can be
  deferred to request_queue of member devices.  md_barrier_request()
  is renamed to md_flush_request() and simplified accordingly.

For linear, raid0 and multipath, the core changes are enough.  raid1,
5 and 10 need the following conversions.

* raid1: Handling of FLUSH/FUA bio's can simply be deferred to
  request_queues of member devices.  Barrier related logic removed.

* raid5: Queue draining logic dropped.  FUA bit is propagated through
  biodrain and stripe resconstruction such that all the updated parts
  of the stripe are written out with FUA writes if any of the dirtying
  writes was FUA.  preread_active_stripes handling in make_request()
  is updated as suggested by Neil Brown.

* raid10: FUA bit needs to be propagated to write clones.

linear, raid0, 1, 5 and 10 tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:35:38 +02:00
NeilBrown 2c7d46ec19 md raid-1/10 Fix bio_rw bit manipulations again
commit 7b6d91daee changed the behaviour
of a few variables in raid1 and raid10 from flags to bit-sets, but
left them as type 'bool' so they did not work.

Change them (back) to unsigned long.
(historical note: see 1ef04fefe2)

Signed-off-by: NeilBrown <neilb@suse.de>
Reported-by: Jiri Slaby <jslaby@suse.cz> and many others
2010-08-18 16:16:05 +10:00
NeilBrown 6b96562054 md: provide appropriate return value for spare_active functions.
md_check_recovery expects ->spare_active to return 'true' if any
spares were activated, but none of them do, so the consequent change
in 'degraded' is not notified through sysfs.

So count the number of spares activated, subtract it from 'degraded'
just once, and return it.

Reported-by: Adrian Drzewiecki <adriand@vmware.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-08-18 12:04:32 +10:00
Adrian Drzewiecki e6ffbcb6cd md: Notify sysfs when RAID1/5/10 disk is In_sync.
When RAID1 is done syncing disks, it'll update the state
of synced rdevs to In_sync. But it neglected to notify
sysfs that the attribute changed. So any programs that
are waiting for an rdev's state to change will not be
woken.

(raid5/raid10 added by neilb)

Signed-off-by: Adrian Drzewiecki <adriand@vmware.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-08-18 11:49:02 +10:00
Linus Torvalds 3d30701b58 Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md: (24 commits)
  md: clean up do_md_stop
  md: fix another deadlock with removing sysfs attributes.
  md: move revalidate_disk() back outside open_mutex
  md/raid10: fix deadlock with unaligned read during resync
  md/bitmap:  separate out loading a bitmap from initialising the structures.
  md/bitmap: prepare for storing write-intent-bitmap via dm-dirty-log.
  md/bitmap: optimise scanning of empty bitmaps.
  md/bitmap: clean up plugging calls.
  md/bitmap: reduce dependence on sysfs.
  md/bitmap: white space clean up and similar.
  md/raid5: export raid5 unplugging interface.
  md/plug: optionally use plugger to unplug an array during resync/recovery.
  md/raid5: add simple plugging infrastructure.
  md/raid5: export is_congested test
  raid5: Don't set read-ahead when there is no queue
  md: add support for raising dm events.
  md: export various start/stop interfaces
  md: split out md_rdev_init
  md: be more careful setting MD_CHANGE_CLEAN
  md/raid5: ensure we create a unique name for kmem_cache when mddev has no gendisk
  ...
2010-08-10 15:38:19 -07:00
Christoph Hellwig 7b6d91daee block: unify flags for struct bio and struct request
Remove the current bio flags and reuse the request flags for the bio, too.
This allows to more easily trace the type of I/O from the filesystem
down to the block driver.  There were two flags in the bio that were
missing in the requests:  BIO_RW_UNPLUG and BIO_RW_AHEAD.  Also I've
renamed two request flags that had a superflous RW in them.

Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:20:39 +02:00
NeilBrown 51e9ac7703 md/raid10: fix deadlock with unaligned read during resync
If the 'bio_split' path in raid10-read is used while
resync/recovery is happening it is possible to deadlock.
Fix this be elevating ->nr_waiting for the duration of both
parts of the split request.

This fixes a bug that has been present since 2.6.22
but has only started manifesting recently for unknown reasons.
It is suitable for and -stable since then.

Reported-by:  Justin Bronder <jsbronder@gentoo.org>
Tested-by:  Justin Bronder <jsbronder@gentoo.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
2010-08-07 21:17:00 +10:00
Maciej Trela f73ea87375 md: fix raid10 takeover: use new_layout for setup_conf
Use mddev->new_layout in setup_conf.
Also use new_chunk, and don't set ->degraded in takeover().  That
gets set in run()

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-06-24 13:33:51 +10:00
NeilBrown e93f68a1fc md: fix handling of array level takeover that re-arranges devices.
Most array level changes leave the list of devices largely unchanged,
possibly causing one at the end to become redundant.
However conversions between RAID0 and RAID10 need to renumber
all devices (except 0).

This renumbering is currently being done in the ->run method when the
new personality takes over.  However this is too late as the common
code in md.c might already have invalidated some of the devices if
they had a ->raid_disk number that appeared to high.

Moving it into the ->takeover method is too early as the array is
still active at that time and wrong ->raid_disk numbers could cause
confusion.

So add a ->new_raid_disk field to mdk_rdev_s and use it to communicate
the new raid_disk number.
Now the common code knows exactly which devices need to be renumbered,
and which can be invalidated, and can do it all at a convenient time
when the array is suspend.
It can also update some symlinks in sysfs which previously were not be
updated correctly.

Reported-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-06-24 13:33:24 +10:00
Prasanna S. Panchamukhi 0544a21db0 md: raid10: Fix null pointer dereference in fix_read_error()
Such NULL pointer dereference can occur when the driver was fixing the
read errors/bad blocks and the disk was physically removed
causing a system crash. This patch check if the
rcu_dereference() returns valid rdev before accessing it in fix_read_error().

Cc: stable@kernel.org
Signed-off-by: Prasanna S. Panchamukhi <prasanna.panchamukhi@riverbed.com>
Signed-off-by: Rob Becker <rbecker@riverbed.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-06-24 13:31:03 +10:00
NeilBrown 19fdb9eefb Merge commit '3ff195b011d7decf501a4d55aeed312731094796' into for-linus
Conflicts:
	drivers/md/md.c

- Resolved conflict in md_update_sb
- Added extra 'NULL' arg to new instance of sysfs_get_dirent.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-05-22 08:31:36 +10:00
NeilBrown af3a2cd6b8 md: Fix read balancing in RAID1 and RAID10 on drives > 2TB
read_balance uses a "unsigned long" for a sector number which
will get truncated beyond 2TB.
This will cause read-balancing to be non-optimal, and can cause
data to be read from the 'wrong' branch during a resync.  This has a
very small chance of returning wrong data.

Reported-by: Jordan Russell <jr-list-2010@quo.to>
Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
2010-05-18 15:28:00 +10:00
NeilBrown 128595ed6f md/raid10: tidy up printk messages.
All raid10 printk messages now start
   md/raid10:md-device-name:

Signed-off-by: NeilBrown <neilb@suse.de>
2010-05-18 15:27:59 +10:00
NeilBrown 21a52c6d05 md: pass mddev to make_request functions rather than request_queue
We used to pass the personality make_request function direct
to the block layer so the first argument had to be a queue.
But now we have the intermediary md_make_request so it makes
at lot more sense to pass a struct mddev_s.
It makes it possible to have an mddev without its own queue too.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-05-18 15:27:55 +10:00
NeilBrown 490773268c md: move io accounting out of personalities into md_make_request
While I generally prefer letting personalities do as much as possible,
given that we have a central md_make_request anyway we may as well use
it to simplify code.
Also this centralises knowledge of ->gendisk which will help later.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-05-18 15:27:52 +10:00
Trela, Maciej dab8b29248 md: Add support for Raid0->Raid10 takeover
Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-05-18 15:27:48 +10:00
NeilBrown 84707f38e7 md: don't use mddev->raid_disks in raid0 or raid10 while array is active.
In a subsequent patch we will make it possible to change
mddev->raid_disks while a RAID0 or RAID10 array is active.  This is
part of the process of reshaping such an array.

This means that we cannot use this value while processes requests
(it is OK to use it during initialisation as we are locked against
changes then).
Both RAID0 and RAID10 have the same value stored in the private data
structure, so use that value instead.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-05-18 15:27:47 +10:00
H Hartley Sweeten 7b92813c3c drivers/md: Remove unnecessary casts of void *
void pointers do not need to be cast to other pointer types.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-05-18 15:27:46 +10:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00