Commit graph

1740 commits

Author SHA1 Message Date
Philipp Reisner b18b37befb drbd: Do not log an ASSERT for P_OV_REQUEST packets while C_CONNECTED
This might happen if on the VERIFY_S node the disk gets dropped.
Although this is an cluster wide state transition, the VERIFY_T node,
updates it connection state first. Then the ack packet for the
cluster wide state transition travels back, and the VERIFY_S node
stops to produce the P_OV_REQUEST packets.

There is absolutely nothing wrong with that.

Further, do not log "Can not satisfy peer's..." on the VERIFY_S
node in this case, but pretend that they had equal checksum.

[Bugz 327]

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:51 +02:00
Lars Ellenberg e9e6f3ec53 drbd: fix for possible deadlock on IO error during resync
Scenario:

Something (say, flush-147:0) is in drbd_al_begin_io,
holding a local_cnt, waiting for the resync to make progress.

Disk fails, worker in after_state_ch does drbd_rs_cancel_all,
then waits for local_cnt to drop to zero.

flush-147:0 is woken by drbd_rs_cancel_all, needs to write an AL
transaction, and queues that on the worker.

Deadlock.

Fix: do not wait in the worker, have put_ldev() trigger the
state change D_FAILED -> D_DISKLESS when necessary.
put_ldev() cannot do the state change directly, as it may or may not
already hold various spinlocks. We queue a short work instead.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:50 +02:00
Lars Ellenberg 22cc37a943 drbd: fix unlikely access after free and list corruption
Various cleanup paths have been incomplete, for the very unlikely case
that we cannot allocate enough bios from process context when submitting
on behalf of the peer or resync process.

Never observed.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:49 +02:00
Lars Ellenberg af85e8e83d drbd: fix for spurious fullsync (uuids rotated too fast)
If it was an "empty" resync, the SyncSource may have already "finished"
the resync and rotated the UUIDs, before noticing the connection loss
(and generating a new uuid, if Primary, rotating again), while the
SyncTarget did not change its uuids at all, or only got to the previous
sync-uuid.
This would then again lead to a full sync on next handshake
(see also Bug #251).

Fix:
Use explicit resync finished notification even for empty resyncs,
do not finish an empty resync implicitly on the SyncSource.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:48 +02:00
Lars Ellenberg e9ef7bb6f9 drbd: allow for explicit resync-finished notifications
Preparation patch so more drbd_send_state() usage on the peer
will not confuse drbd in receive_state().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:47 +02:00
Lars Ellenberg 4ac4aadacb drbd: preparation commit, using full state in receive_state()
no functional change, just using full state instead of just the .conn
part of it for comparisons.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:46 +02:00
Lars Ellenberg 2b2bf2148f drbd: drbd_send_ack_dp must not rely on header information
drbd commit 17c854fea474a5eb3cfa12e4fb019e46debbc4ec
drbd: receiving of big packets, for payloads between 64kByte and 4GByte
introduced a new on-the-wire packet header format.  We must no longer
assume either format, but use the result of whatever drbd_recv_header
has decoded.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:45 +02:00
Lars Ellenberg 004352fa60 drbd: Fix regression in recv_bm_rle_bits (compressed bitmap)
We used to be16_to_cpu the length field in our received packet header.
drbd commit 17c854fea474a5eb3cfa12e4fb019e46debbc4ec
    drbd: receiving of big packets, for payloads between 64kByte and 4GByte
changed this, but forgot to adjust a few places where we relied on
h->length being in native byte order.

This broke the receiving side of the RLE compressed bitmap exchange.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:44 +02:00
Philipp Reisner f10f262349 drbd: Fixed a stupid copy and paste error
This caused rs_planed to be not in sync with the content of the fifo.
That in turn could cause that the resync comes to a complete halt.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:43 +02:00
Philipp Reisner 00b425377d drbd: Allow larger values for c-fill-target.
Connections through a compressing proxy might have more bits
on the fly. 500MByte instead of 50MByte

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:42 +02:00
Lars Ellenberg f65363cfa0 drbd: fix possible access after free
If we release the page pointed to by md_io_tmpp, we need to zero out the
pointer, too, as that may be used later to decide whether we need to
allocate a new page again.

Impact: a previously freed page may be used and clobbered.  Depending on
what that particular page is being used for meanwhile, this may result
in silent data corruption of completely unrelated things.

Only of concern on devices with logical_block_size != 512 byte,
if you re-attach after becoming diskless once.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:41 +02:00
Lars Ellenberg 8979d9c9e0 drbd: protocol compatibility for maximum packet sizes
Two missing corner cases to the "maximum packet size" handshake.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:41 +02:00
Philipp Reisner fb22c402ff drbd: Track the reasons to suspend IO in dedicated state bits
There are three ways to get IO suspended:

 * Loss of any access to data
 * Fence-peer-handler running
 * User requested to suspend IO

Track those in different bits, so that one condition clearing its
state bit does not interfere with the other two conditions.

Only when the user resumes IO he overrules all three bits.

The fact is hidden from the user, he sees only a single suspend
bit.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:40 +02:00
Lars Ellenberg 78db89287c drbd: DIV_ROUND_UP not needed here
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:39 +02:00
Philipp Reisner 5a75cc7cfb drbd: Fixed compatibility with protocol versions smaller than 95
Forgot to consider the max size for the resync requests.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:38 +02:00
Lars Ellenberg f2906e183f drbd: fix for spurious full sync (becoming sync target looked like invalidate)
If a synctarget lost connection while being WFSyncUUID,
due to "state sanitizing", the attempted state change to SyncTarget
looked like an "invalidate" to after_state_ch() later,
thus caused a full sync on next handshake (Bug #318).

drbd0: PingAck did not arrive in time.
drbd0: peer( Primary -> Unknown ) conn( WFSyncUUID -> NetworkFailure ) pdsk( UpToDate -> DUnknown )

        from  : { cs:NetworkFailure ro:Secondary/Unknown ds:UpToDate/DUnknown r--- }
        to    : { cs:SyncTarget ro:Secondary/Unknown ds:Inconsistent/DUnknown r--- }
        after sanizising, resulted in
        state: { cs:NetworkFailure ro:Secondary/Unknown ds:Inconsistent/DUnknown r--- }
        drbd0: disk( UpToDate -> Inconsistent )

Fix:
don't mask state transition errors in "sanitizing",
so the requested state change to SyncTarget fails,
instead of being implicitly "remaped" to invalidate.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:37 +02:00
Lars Ellenberg 02bc7174ae drbd: cosmetic, don't report resync for online-verify
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:36 +02:00
Lars Ellenberg a821cc4a9a drbd: fix spurious protocol error
If we cannot satisfy a request (because our disk just broke),
we still need to drain the payload.  Or we'll get a protocol error
when interpreting the payload as DRBD packet header.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:35 +02:00
Lars Ellenberg 1d53f09e17 drbd: fix potential kernel BUG (NULL deref)
BUG trace would look like:
 lc_find
 drbd_rs_complete_io
 got_OVResult
 drbd_asender

Could be triggered by explicit, or IO-error policy based,
detach during online-verify.

We may only dereference mdev->resync, if we first get_ldev(), as the
disk may break any time, causing mdev->resync to disappear once all
ldev references have been returned.
Already in flight online-verify requests or replies may still come in,
which we then need to ignore.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:34 +02:00
Lars Ellenberg 435f07402b drbd: don't count sendpage()d pages only referenced by tcp as in use
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:33 +02:00
Philipp Reisner 76d2e7eca8 drbd: Adding support for BIO/Request flags: REQ_FUA, REQ_FLUSH and REQ_DISCARD
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:32 +02:00
Lars Ellenberg 1090c056c5 drbd: drbd_md_sync before calling user space helpers
Just in case we have some pending meta data changes to sync, do it
before we call our userland helper, as that may take some time,
or even cause a hard reboot.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:31 +02:00
Lars Ellenberg ee15b03816 drbd: fix race on meta-data update, addendum
addendum to baa33ae4eaa4477b60af7c434c0ddd1d182c1ae7

The race:
    drbd_md_sync()
	if (!test_and_clear_bit(MD_DIRTY, &mdev->flags))
		return;
    ==> RACE with drbd_md_mark_dirty() rearming the timer.
	del_timer(&mdev->md_sync_timer);

    Fixed by moving the del_timer before the test_and_clear_bit.

Additionally only rearm the timer in drbd_md_mark_dirty, if MD_DIRTY was
not already set, reduce the grace period from five to one second, and
add an ifdef'ed debuging aid to find code paths missing an explicit
drbd_md_sync, if any, as those are the only relevant ones for this race.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:30 +02:00
Philipp Reisner 63106d3c6c drbd: Removed a race that could cause unexpected execution of w_make_resync_request()
The actual race happened int the drbd_start_resync() function. Where
drbd_resync_finished() -> __drbd_set_state() set STOP_SYNC_TIMER and
armed the timer.

If the timer fired before execution reaches the mod_timer statement
at the end of drbd_start_resync() the latter would cause an
unexpected call to w_make_resync_request().

Removed the STOP_SYNC_TIMER bit, and base it on the connection state.

The STOP_SYNC_TIMER bit probably originates probably the time before
the state engine.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:29 +02:00
Lars Ellenberg ef50a3e34f drbd: implicitly create unconfigured devices on sync-after dependencies
If pacemaker (for example) decided to initialize minor devices not in
the exact sync-after dependency order, the configuration partially
failed with an error "The sync-after minor number is invalid". (Bugz. #322)

We can avoid that by implicitly creating unconfigured minor devices,
if others depend on them.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:28 +02:00
Lars Ellenberg 3f3a9b849d drbd: fix race on meta-data update
The race:
	drbd_md_mark_dirty()
	drbd_md_sync()
		if (!test_and_clear_bit(MD_DIRTY, &mdev->flags))
			return;
		drbd_md_sync_page_io(mdev, mdev->ldev, sector, WRITE)
  ==> RACE
		clear_bit(MD_DIRTY, &mdev->flags); <== spurious

Fixed by removing the spurious clear_bit.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:28 +02:00
Lars Ellenberg c518d04fde drbd: fix race between deconfiguring and reconfiguring network
If a drbd_nl_net_conf hits the small window between the state change
to C_STANDALONE and the corresponding cleanup in after_state_ch,
that cleanup would throw away stuff we now need again,
and later trigger BUG_ON()s.

Fixed by properly serializing the new config request with
any pending cleanup.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:27 +02:00
Philipp Reisner 0778286a13 drbd: Disable activity log updates when the whole device is out of sync
When the complete device is marked as out of sync, we can disable
updates of the on disk AL. Currently AL updates are only disabled
if one uses the "invalidate-remote" command on an unconnected,
primary device, or when at attach time all bits in the bitmap are
set.

As of now, AL updated do not get disabled when a all bits becomes
set due to application writes to an unconnected DRBD device.
While this is a missing feature, it is not considered important,
and might get added later.

BTW, after initializing a "one legged" DRBD device
drbdadm create-md resX
drbdadm -- --force primary resX
AL updates also get disabled, until the first connect.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:26 +02:00
Philipp Reisner d53733893d drbd: Actually allow BIOs up to 128k (was 32k).
Now we have multiple BIOs per ee, packets with a 32 bit length field,
it gets time to use these goodies.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:25 +02:00
Philipp Reisner 02918be227 drbd: receiving of big packets, for payloads between 64kByte and 4GByte
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:24 +02:00
Philipp Reisner 0b70a13dac drbd: Sending of big packets, for payloads from 64KByte to 4GByte
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:23 +02:00
Philipp Reisner 204bba9965 drbd: Bugfix for regression introduced with f9bc8913c06022e
If we intent to use the block_id member of an epoch entry,
we may not use the digest member.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:22 +02:00
Philipp Reisner 48acf86898 drbd: Microfix: Assigning sector once is sufficient
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:21 +02:00
Lars Ellenberg 0f0601f4ea drbd: new configuration parameter c-min-rate
We now track the data rate of locally submitted resync related requests,
and can thus detect non-resync activity on the lower level device.

If the current sync rate is above c-min-rate, and the lower level device
appears to be busy, we throttle the resyncer.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:20 +02:00
Lars Ellenberg 80a40e439e drbd: reduce code duplication when receiving data requests
also canonicalize the return values of read_for_csum
and drbd_rs_begin_io to return -ESOMETHING, or 0 for success.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:19 +02:00
Lars Ellenberg 1d7734a0df drbd: use rolling marks for resync speed calculation
The current resync speed as displayed in /proc/drbd fluctuates a lot.
Using an array of rolling marks makes this calculation much more stable.
We used to have this (a long time ago with 0.7), but it got lost somehow.

If "stalled", do not discard the rest of the information, just add a
" (stalled)" tag to the progress line.

This patch also shortens a spinlock critical section somewhat, and
reduces the number of atomic operations in put_ldev.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:18 +02:00
Lars Ellenberg 0bb70bf601 drbd: remove outdated comment and dead code
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:17 +02:00
Lars Ellenberg c36c3ced69 drbd: let drbd_free_ee implicitly free any digest
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:16 +02:00
Philipp Reisner 85719573dd drbd: Replaced some casts by an union. Improved comments
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:15 +02:00
Philipp Reisner d207450cf2 drbd: Bugfix: rs_in_flight could become wrong if read_for_csum() requested reschedule later
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:14 +02:00
Philipp Reisner 778f271dfe drbd: The new, smarter resync speed controller
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:14 +02:00
Philipp Reisner 8e26f9ccb9 drbd: New sync_param packet, that includes the parameters of the new controller
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:13 +02:00
Philipp Reisner 9a31d7164d drbd: New sync parameters for the smart resync rate controller
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:12 +02:00
Lars Ellenberg d28fd092a5 drbd: fix list corruption (recent regression)
The commit 288f422ec1
 drbd: Track all IO requests on the TL, not writes only
moved a list_add_tail(req, ) into a region where req
may have just been freed due to conflict detection.

Fix this by adding a proper cleanup section for that code path.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:31:43 +02:00
Philipp Reisner e756414f7d drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 15:12:07 +02:00
Philipp Reisner 6709893059 drbd: Make sure tl_restart(, resend) can not get called multiple times for a new connection
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 15:09:09 +02:00
Philipp Reisner f70b351159 drbd: Do not try to free tl_hash in drbd_disconnect() when IO is suspended
We may not free tl_hash when IO is suspended, since we can not wait
until ap_bio_cnt reaches zero.

We can do this after susp reched 0, since then tl_clear was called

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 15:08:27 +02:00
Philipp Reisner 8f488156c0 drbd: Allow attach while IO is suspended
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 15:05:32 +02:00
Philipp Reisner cfa03415a1 drbd: Allow tl_restart() to do IO completion while IO is suspended
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 15:05:08 +02:00
Philipp Reisner 84dfb9f564 drbd: Fixed a deadlock, probably only affected UP machines
After disconnect (most likely mdev->net_cnt == 0) and we are
still in an unstable state (!drbd_state_is_stable()). When we
get an IO request in drbd_get_max_buffers() (called from
__inc_ap_bio_cond(), called from inc_ap_bio()) we wake up
misc_wait. Misc_wait is also used in inc_ap_bio() to sleep
until the outcome of __inc_ap_bio_cond() changes. => Busy loop!

Solution: Have a dedicated wait queue for get_net_conf() and
put_net_conf().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 15:04:46 +02:00
Philipp Reisner 65d922c33e drbd: Do not do a hard state change when establishing a connection [bugz 304]
Make sure the state engine can deny two primaries to connect

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 15:04:10 +02:00
Philipp Reisner 481c6f5032 drbd: Ensure that the peer was not rebootet in the meantime before resending TL
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 15:01:37 +02:00
Philipp Reisner 43a5182ccc drbd: Delayed creation of current-UUID
When a fencing policy of "resource-and-stonith" is configured,
and DRBD looses connection to it's peer, we can delay the
creation of a new current-UUID until IO gets thawed.

That allows one to deploy fence-peer handlers that actually
commit suicide on the machine they get started.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 14:59:21 +02:00
Philipp Reisner 87f7be4cf8 drbd: Run the fence-peer helper asynchronously
Since we can not thaw the transfer log, the next logical step is
to allow reconnects while the fence-peer handler runs.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 14:58:36 +02:00
Philipp Reisner 1616a25493 drbd: Reduce the verbosity of some state transitions
State transitions in the space of non-allowed states used
to be very noisy. Reduce that, since that has little value
for the majority of the user base.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 14:57:22 +02:00
Philipp Reisner 999122bc18 drbd: Removing a by now obsolete clause in the state sanitizing
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 14:56:50 +02:00
Philipp Reisner 18a50fa213 drbd: Now we need to handle the ed_uuid of an diskless, unconnected primary correctly
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 14:56:00 +02:00
Philipp Reisner 894c6a9461 drbd: Disabled the crashed_primary detection for re-attach of last data while IO is frozen
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 14:55:11 +02:00
Philipp Reisner 47ff2d0a8e drbd: Do not allow a fencing-policy of resource-and-stonith with protocol A
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 14:53:42 +02:00
Philipp Reisner 265be2d098 drbd: Finished the "on-no-data-accessible suspend-io;" functionality
When no data is accessible (no connection to the peer, nor a local disk)
allow the user to select to freeze all IO operations instead of getting
IO errors.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 14:52:53 +02:00
Philipp Reisner 905cd7d8ac drbd: Removed redundant error checks in the request code path
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 14:39:38 +02:00
Philipp Reisner 5ba82308ea drbd: factored drbd_req_make_private_bio() out of drbd_req_new()
Preparing tl_thaw_dio()

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 14:37:33 +02:00
Philipp Reisner b9b98716f8 drbd: Do not send two barriers without any writes between them
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 14:36:51 +02:00
Philipp Reisner 11b58e73a3 drbd: factored tl_restart() out of tl_clear().
If IO was frozen for a temporal network outage, resend the
content of the transfer-log into the newly established connection.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 14:35:58 +02:00
Philipp Reisner 2a80699f80 drbd: mod_req has now a return value
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 14:26:45 +02:00
Philipp Reisner 288f422ec1 drbd: Track all IO requests on the TL, not writes only
With that the drbd_fail_pending_reads() function becomes obsolete.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 14:25:20 +02:00
Philipp Reisner 7e602c0aaf drbd: renamed drbd_tl_epoch.n_req to drbd_tl_epoch.n_writes
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 14:23:45 +02:00
Dan Carpenter 93055c3104 ps3disk: passing wrong variable to bvec_kunmap_irq()
This should pass "buf" to bvec_kunmap_irq() instead of "bv".  The api is
like kmap_atomic() instead of kmap().

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-10-12 18:56:33 +02:00
Mike Snitzer e4c4776dea virtio-blk: fix request leak.
Must drop reference taken by blk_make_request().

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org # .35.x
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-09 11:42:37 -07:00
Arnd Bergmann 2a48fc0ab2 block: autoconvert trivial BKL users to private mutex
The block device drivers have all gained new lock_kernel
calls from a recent pushdown, and some of the drivers
were already using the BKL before.

This turns the BKL into a set of per-driver mutexes.
Still need to check whether this is safe to do.

file=$1
name=$2
if grep -q lock_kernel ${file} ; then
    if grep -q 'include.*linux.mutex.h' ${file} ; then
            sed -i '/include.*<linux\/smp_lock.h>/d' ${file}
    else
            sed -i 's/include.*<linux\/smp_lock.h>.*$/include <linux\/mutex.h>/g' ${file}
    fi
    sed -i ${file} \
        -e "/^#include.*linux.mutex.h/,$ {
                1,/^\(static\|int\|long\)/ {
                     /^\(static\|int\|long\)/istatic DEFINE_MUTEX(${name}_mutex);

} }"  \
    -e "s/\(un\)*lock_kernel\>[ ]*()/mutex_\1lock(\&${name}_mutex)/g" \
    -e '/[      ]*cycle_kernel_lock();/d'
else
    sed -i -e '/include.*\<smp_lock.h\>/d' ${file}  \
                -e '/cycle_kernel_lock()/d'
fi

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2010-10-05 15:01:10 +02:00
Arnd Bergmann 613655fa39 drivers: autoconvert trivial BKL users to private mutex
All these files use the big kernel lock in a trivial
way to serialize their private file operations,
typically resulting from an earlier semi-automatic
pushdown from VFS.

None of these drivers appears to want to lock against
other code, and they all use the BKL as the top-level
lock in their file operations, meaning that there
is no lock-order inversion problem.

Consequently, we can remove the BKL completely,
replacing it with a per-file mutex in every case.
Using a scripted approach means we can avoid
typos.

These drivers do not seem to be under active
maintainance from my brief investigation. Apologies
to those maintainers that I have missed.

file=$1
name=$2
if grep -q lock_kernel ${file} ; then
    if grep -q 'include.*linux.mutex.h' ${file} ; then
            sed -i '/include.*<linux\/smp_lock.h>/d' ${file}
    else
            sed -i 's/include.*<linux\/smp_lock.h>.*$/include <linux\/mutex.h>/g' ${file}
    fi
    sed -i ${file} \
        -e "/^#include.*linux.mutex.h/,$ {
                1,/^\(static\|int\|long\)/ {
                     /^\(static\|int\|long\)/istatic DEFINE_MUTEX(${name}_mutex);

} }"  \
    -e "s/\(un\)*lock_kernel\>[ ]*()/mutex_\1lock(\&${name}_mutex)/g" \
    -e '/[      ]*cycle_kernel_lock();/d'
else
    sed -i -e '/include.*\<smp_lock.h\>/d' ${file}  \
                -e '/cycle_kernel_lock()/d'
fi

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2010-10-05 15:01:04 +02:00
Dan Rosenberg 252a52aa4f Fix pktcdvd ioctl dev_minor range check
The PKT_CTRL_CMD_STATUS device ioctl retrieves a pointer to a
pktcdvd_device from the global pkt_devs array.  The index into this
array is provided directly by the user and is a signed integer, so the
comparison to ensure that it falls within the bounds of this array will
fail when provided with a negative index.

This can be used to read arbitrary kernel memory or cause a crash due to
an invalid pointer dereference.  This can be exploited by users with
permission to open /dev/pktcdvd/control (on many distributions, this is
readable by group "cdrom").

Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
[ Rather than add a cast, just make the function take the right type -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-27 16:29:06 -07:00
Vivek Goyal 504c6d1b44 amiga floppy: Compile failure fixes
o Compile fixes for amiga floppy driver.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-26 12:23:25 +09:00
Vivek Goyal 639e2f2aa7 atari floppy: Stop sharing request queue across multiple gendisks
o Use one request queue per gendisk instead of sharing the queue.

o Don't have hardware. No compile testing or run time testing done. Completely
  untested.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-24 20:35:45 +02:00
Vivek Goyal 786029ff81 amiga floppy: Stop sharing request queue across multiple gendisks
o Use one request queue per gendisk instead of sharing request queue

o Don't have hardware. No compile testing or run time testing done. Completely
  untested.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-24 20:35:44 +02:00
Jens Axboe 488211844e floppy: switch to one queue per drive instead of sharing a queue
Pretty straight forward conversion. Note that we do round-robin
between the drives that have available requests, before we simply
used the drive that the IO scheduler told us to. Since the IO
scheduler doesn't care about multiple devices per queue, the resulting
sort would not have made sense.

Fixed by Vivek to get rid of a double lock problem in set_next_request()

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2010-09-22 09:32:36 +02:00
Dan Carpenter b0722cb1ac cciss: freeing uninitialized data on error path
The "h->scatter_list" is allocated inside a for loop.  If any of those
allocations fail, then the rest of the list is uninitialized data.  When
we free it we should start from the top and free backwards so that we
don't call kfree() on uninitialized pointers.

Also if the allocation for "h->scatter_list" fails then we would get an
Oops here.  I should have noticed this when I send: 4ee69851c "cciss:
handle allocation failure."  but I didn't.  Sorry about that.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-21 11:49:17 +02:00
Christoph Hellwig dd3932eddf block: remove BLKDEV_IFL_WAIT
All the blkdev_issue_* helpers can only sanely be used for synchronous
caller.  To issue cache flushes or barriers asynchronously the caller needs
to set up a bio by itself with a completion callback to move the asynchronous
state machine ahead.  So drop the BLKDEV_IFL_WAIT flag that is always
specified when calling blkdev_issue_* and also remove the now unused flags
argument to blkdev_issue_flush and blkdev_issue_zeroout.  For
blkdev_issue_discard we need to keep it for the secure discard flag, which
gains a more descriptive name and loses the bitops vs flag confusion.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-16 20:52:58 +02:00
Martin K. Petersen c8bf133682 Consolidate min_not_zero
We have several users of min_not_zero, each of them using their own
definition.  Move the define to kernel.h.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
2010-09-10 20:07:38 +02:00
Linus Torvalds ff3cb3fec3 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: Range check cpu in blk_cpu_to_group
  scatterlist: prevent invalid free when alloc fails
  writeback: Fix lost wake-up shutting down writeback thread
  writeback: do not lose wakeup events when forking bdi threads
  cciss: fix reporting of max queue depth since init
  block: switch s390 tape_block and mg_disk to elevator_change()
  block: add function call to switch the IO scheduler from a driver
  fs/bio-integrity.c: return -ENOMEM on kmalloc failure
  bio-integrity.c: remove dependency on __GFP_NOFAIL
  BLOCK: fix bio.bi_rw handling
  block: put dev->kobj in blk_register_queue fail path
  cciss: handle allocation failure
  cfq-iosched: Documentation help for new tunables
  cfq-iosched: blktrace print per slice sector stats
  cfq-iosched: Implement tunable group_idle
  cfq-iosched: Do group share accounting in IOPS when slice_idle=0
  cfq-iosched: Do not idle if slice_idle=0
  cciss: disable doorbell reset on reset_devices
  blkio: Fix return code for mkdir calls
2010-09-10 07:26:27 -07:00
Tejun Heo 02c42b7a68 virtio_blk: drop REQ_HARDBARRIER support
Remove now unused REQ_HARDBARRIER support.  virtio_blk already
supports REQ_FLUSH and the usefulness of REQ_FUA for virtio_blk is
questionable at this point, so there's nothing else to do to support
new REQ_FLUSH/FUA interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:35:37 +02:00
Tejun Heo 6259f28459 block/loop: implement REQ_FLUSH/FUA support
Deprecate REQ_HARDBARRIER and implement REQ_FLUSH/FUA instead.  Also,
instead of checking file->f_op->fsync() directly, look at the value of
vfs_fsync() and ignore -EINVAL return.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:35:37 +02:00
Tejun Heo 9cbbdca44a block: remove spurious uses of REQ_HARDBARRIER
REQ_HARDBARRIER is deprecated.  Remove spurious uses in the following
users.  Please note that other than osdblk, all other uses were
already spurious before deprecation.

* osdblk: osdblk_rq_fn() won't receive any request with
  REQ_HARDBARRIER set.  Remove the test for it.

* pktcdvd: use of REQ_HARDBARRIER in pkt_generic_packet() doesn't mean
  anything.  Removed.

* aic7xxx_old: Setting MSG_ORDERED_Q_TAG on REQ_HARDBARRIER is
  spurious.  Removed.

* sas_scsi_host: Setting TASK_ATTR_ORDERED on REQ_HARDBARRIER is
  spurious.  Removed.

* scsi_tcq: The ordered tag path wasn't being used anyway.  Removed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: James Bottomley <James.Bottomley@suse.de>
Cc: Peter Osterlund <petero2@telia.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:35:36 +02:00
Tejun Heo 4913efe456 block: deprecate barrier and replace blk_queue_ordered() with blk_queue_flush()
Barrier is deemed too heavy and will soon be replaced by FLUSH/FUA
requests.  Deprecate barrier.  All REQ_HARDBARRIERs are failed with
-EOPNOTSUPP and blk_queue_ordered() is replaced with simpler
blk_queue_flush().

blk_queue_flush() takes combinations of REQ_FLUSH and FUA.  If a
device has write cache and can flush it, it should set REQ_FLUSH.  If
the device can handle FUA writes, it should also set REQ_FUA.

All blk_queue_ordered() users are converted.

* ORDERED_DRAIN is mapped to 0 which is the default value.
* ORDERED_DRAIN_FLUSH is mapped to REQ_FLUSH.
* ORDERED_DRAIN_FLUSH_FUA is mapped to REQ_FLUSH | REQ_FUA.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Pierre Ossman <drzeus@drzeus.cx>
Cc: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:35:36 +02:00
Tejun Heo 6958f14545 block: kill QUEUE_ORDERED_BY_TAG
Nobody is making meaningful use of ORDERED_BY_TAG now and queue
draining for barrier requests will be removed soon which will render
the advantage of tag ordering moot.  Kill ORDERED_BY_TAG.  The
following users are affected.

* brd: converted to ORDERED_DRAIN.
* virtio_blk: ORDERED_TAG path was already marked deprecated.  Removed.
* xen-blkfront: ORDERED_TAG case dropped.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:35:36 +02:00
Tejun Heo 589d7ed02a block/loop: queue ordered mode should be DRAIN_FLUSH
loop implements FLUSH using fsync but was incorrectly setting its
ordered mode to DRAIN.  Change it to DRAIN_FLUSH.  In practice, this
doesn't change anything as loop doesn't make use of the block layer
ordered implementation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:35:36 +02:00
Stephen M. Cameron fcfb5c0ce1 cciss: remove some superfluous tests from cciss_bigpassthru()
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:12:40 +02:00
Stephen M. Cameron 0c9f5ba7cb cciss: factor out cciss_big_passthru
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:12:39 +02:00
Stephen M. Cameron f32f125b1c cciss: factor out cciss_passthru
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:12:37 +02:00
Stephen M. Cameron 0894b32c5c cciss: factor out cciss_getluninfo
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:12:36 +02:00
Stephen M. Cameron c525919ddf cciss: factor out cciss_getdrivver
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:12:35 +02:00
Stephen M. Cameron 8a4f7fbfdd cciss: factor out cciss_getfirmver
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:12:34 +02:00
Stephen M. Cameron d18dfad4e2 cciss: factor out cciss_getbustypes
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:12:33 +02:00
Stephen M. Cameron 93c7493113 cciss: factor out cciss_getheartbeat
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:12:32 +02:00
Stephen M. Cameron 4f43f32cd3 cciss: factor out cciss_setnodename
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:12:32 +02:00
Stephen M. Cameron 2521610942 cciss: factor out cciss_getnodename
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:12:31 +02:00
Stephen M. Cameron 4c800eed9a cciss: factor out cciss_setintinfo
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:12:30 +02:00
Stephen M. Cameron 576e661c65 cciss: factor out cciss_getintinfo
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:12:29 +02:00
Stephen M. Cameron 0a25a5aee7 cciss: factor out cciss_getpciinfo
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:12:28 +02:00
Stephen M. Cameron 2a643ec67f cciss: fix reporting of max queue depth since init
The ioctl path and the scsi tape path were not accounting
for their additions to the queue depth.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-25 19:58:53 +02:00
Linus Torvalds c05e1e23b8 Merge branch 'for-upstream/pvhvm' of git://xenbits.xensource.com/people/ianc/linux-2.6
* 'for-upstream/pvhvm' of git://xenbits.xensource.com/people/ianc/linux-2.6:
  xen: pvhvm: make it clearer that XEN_UNPLUG_* define bits in a bitfield
  xen: pvhvm: rename xen_emul_unplug=ignore to =unnnecessary
  xen: pvhvm: allow user to request no emulated device unplug
2010-08-23 18:29:18 -07:00
Milan Broz ee86273062 loop: add some basic read-only sysfs attributes
Create /sys/block/loopX/loop directory and provide these attributes:
 - backing_file
 - autoclear
 - offset
 - sizelimit

This loop directory is present only if loop device is configured.

To be used in util-linux-ng (and possibly elsewhere like udev rules)
where code need to get loop attributes from kernel (and not store
duplicate info in userspace).

Moreover loop ioctls are not even able to provide full backing
file info because of buffer limits.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23 15:18:10 +02:00
Jens Axboe 52cc2eef31 block: switch s390 tape_block and mg_disk to elevator_change()
Now that we have this API, switch the two in-kernel users to it.
Resolves an oops introduced by commit
1abec4fdbb.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23 14:02:44 +02:00
Ian Campbell 1dc7ce99b0 xen: pvhvm: rename xen_emul_unplug=ignore to =unnnecessary
It is not immediately clear what this option causes to become
ignored. The actual meaning is that it is not necessary to unplug the
emulated devices to safely use the PV ones, even if the platform does
not support the unplug protocol. (pressumably the user will only add
this option if they have ensured that their domain configuration is
safe).

I think xen_emul_unplug=unnecessary better captures this.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
2010-08-23 11:59:29 +01:00
Jiri Slaby 5e00d1b5b4 BLOCK: fix bio.bi_rw handling
Return of the bi_rw tests is no longer bool after commit 74450be1. But
results of such tests are stored in bools. This doesn't fit in there
for some compilers (gcc 4.5 here), so either use !! magic to get real
bools or use ulong where the result is assigned somewhere.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23 12:33:10 +02:00
Dan Carpenter 4ee69851cd cciss: handle allocation failure
If kmalloc() fails then cleanup and return failure (-1).

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23 12:28:15 +02:00
Stephen M. Cameron 75230ff275 cciss: disable doorbell reset on reset_devices
The doorbell reset initially appears to work correctly,
the controller resets, comes up, some i/o can even be
done, but on at least some Smart Arrays in some servers,
it eventually causes a subsequent controller lockup due
to some kind of PCIe error, and kdump can end up leaving
the root filesystem in an unbootable state.  For this
reason, until the problem is fixed, or at least isolated
to certain hardware enough to be avoided, the doorbell
reset should not be used at all.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23 11:02:17 +02:00
Graeme Smecher 7a50d06e24 of: fix missing headers for of_address_to_resource() in MTD and SysACE drivers
The drivers for Xilinx' SystemACE and physically mapped MTDs were missing
prototypes for of_address_to_resource(). This patch adds the necessary
headers.

Signed-off-by: Graeme Smecher <graeme.smecher@mail.mcgill.ca>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-08-17 13:16:47 -06:00
Linus Torvalds 58d4ea65b9 Merge branch 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6
* 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6:
  mmc_spi: Fix unterminated of_match_table
  of/sparc: fix build regression from of_device changes
  of/device: Replace struct of_device with struct platform_device
2010-08-12 09:11:31 -07:00
Linus Torvalds 2f9e825d3e Merge branch 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
  block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
  xen-blkfront: fix missing out label
  blkdev: fix blkdev_issue_zeroout return value
  block: update request stacking methods to support discards
  block: fix missing export of blk_types.h
  writeback: fix bad _bh spinlock nesting
  drbd: revert "delay probes", feature is being re-implemented differently
  drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
  drbd: Disable delay probes for the upcomming release
  writeback: cleanup bdi_register
  writeback: add new tracepoints
  writeback: remove unnecessary init_timer call
  writeback: optimize periodic bdi thread wakeups
  writeback: prevent unnecessary bdi threads wakeups
  writeback: move bdi threads exiting logic to the forker thread
  writeback: restructure bdi forker loop a little
  writeback: move last_active to bdi
  writeback: do not remove bdi from bdi_list
  writeback: simplify bdi code a little
  writeback: do not lose wake-ups in bdi threads
  ...

Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
drivers/scsi/scsi_error.c as per Jens.
2010-08-10 15:22:42 -07:00
Jens Axboe a4cc14ec9f xen-blkfront: fix missing out label
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-08 21:50:05 -04:00
Lars Ellenberg e7f52dfb4f drbd: revert "delay probes", feature is being re-implemented differently
It was a now abandoned attempt to throttle resync bandwidth
based on the delay it causes on the bulk data socket.
It has no userbase yet, and has been disabled by
9173465ccb51c09cc3102a10af93e9f469a0af6f already.
This removes the now unused code.

The basic feature, namely using up "idle" bandwith
of network and disk IO subsystem, with minimal impact
to application IO, is being reimplemented differently.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:53:57 +02:00
Philipp Reisner 85f4cc17a6 drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:53:57 +02:00
Philipp Reisner 6710a57603 drbd: Disable delay probes for the upcomming release
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:53:57 +02:00
Kulikov Vasiliy f6c4c8e19a cpqarray: check put_user() result
put_user() may fail, if so return -EFAULT.

Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:53:03 +02:00
Jeremy Fitzhardinge 7901d14144 xen/blkfront: Use QUEUE_ORDERED_DRAIN for old backends
If there's no feature-barrier key in xenstore, then it means its a fairly
old backend which does uncached in-order writes, which means ORDERED_DRAIN
is appropriate.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-08-07 18:52:53 +02:00
Jeremy Fitzhardinge 4dab46ff26 xen/blkfront: use tagged queuing for barriers
When barriers are supported, then use QUEUE_ORDERED_TAG to tell the block
subsystem that it doesn't need to do anything else with the barriers.
Previously we used ORDERED_DRAIN which caused the block subsystem to
drain all pending IO before submitting the barrier, which would be
very expensive.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-08-07 18:52:53 +02:00
Stephen Hemminger 3b06c21e84 floppy: make controller const
The struct cont_t is just a set of virtual function pointers.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:31 +02:00
Julia Lawall ad96a7a7ea drivers/block: use memdup_user
Use memdup_user when user data is immediately copied into the
allocated region.  Some checkpatch cleanups in nearby code.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@

-  to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+  to = memdup_user(from,size);
   if (
-      to==NULL
+      IS_ERR(to)
                 || ...) {
   <+... when != goto l1;
-  -ENOMEM
+  PTR_ERR(to)
   ...+>
   }
-  if (copy_from_user(to, from, size) != 0) {
-    <+... when != goto l2;
-    -EFAULT
-    ...+>
-  }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Chirag Kantharia <chirag.kantharia@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:31 +02:00
Stephen M. Cameron 8112586063 cciss: cleanup interrupt_not_for_us
cciss: cleanup interrupt_not_for_us
In the case of MSI/MSIX interrutps, we don't need to check
if the interrupt is for us, and in the case of the intx interrupt
handler, when checking if the interrupt is for us, we don't need
to check if we're using MSI/MSIX, we know we're not.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:30 +02:00
Stephen M. Cameron b2a4a43dba cciss: change printks to dev_warn, etc.
cciss: change printks to dev_warn, etc.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:30 +02:00
Stephen M. Cameron 6b4d96b878 cciss: separate cmd_alloc() and cmd_special_alloc()
cciss: separate cmd_alloc() and cmd_special_alloc()
cmd_alloc() took a parameter which caused it to either allocate
from a pre-allocated pool, or allocate using pci_alloc_consistent.
This parameter is always known at compile time, so this would
be better handled by breaking the function into two functions
and differentiating the cases by function names.  Same goes
for cmd_free().

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:30 +02:00
Stephen M. Cameron f70dba8366 cciss: use consistent variable names
cciss: use consistent variable names
"h", for the hba structure and "c" for the command structures.
and get rid of trivial CCISS_LOCK macro.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:30 +02:00
Stephen M. Cameron 058a0f9f31 cciss: forbid hard reset of 640x boards
cciss: forbid hard reset of 640x boards
The 6402/6404 are two PCI devices -- two Smart Array controllers
-- that fit into one slot.  It is possible to reset them independently,
however, they share a battery backed cache module.  One of the pair
controls the cache and the 2nd one access the cache through the first
one.  If you reset the one controlling the cache, the other one will
not be a happy camper.  So we just forbid resetting this conjoined
mess.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:30 +02:00
Stephen M. Cameron adfbc1ff34 cciss: sanitize max commands
cciss: sanitize max commands
Some controllers might try to tell us they support 0 commands
in performant mode.  This is a lie told by buggy firmware.
We have to be wary of this lest we try to allocate a negative
number of command blocks, which will be treated as unsigned,
and get an out of memory condition.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:30 +02:00
Stephen M. Cameron a6528d0172 cciss: fix hard reset code.
cciss: Fix hard reset code.
Smart Array controllers newer than the P600 do not honor the
PCI power state method of resetting the controllers.  Instead,
in these cases we can get them to reset via the "doorbell" register.

This escaped notice until we began using "performant" mode because
the fact that the controllers did not reset did not normally
impede subsequent operation, and so things generally appeared to
"work".  Once the performant mode code was added, if the controller
does not reset, it remains in performant mode.  The code immediately
after the reset presumes the controller is in "simple" mode
(which previously, it had remained in simple mode the whole time).
If the controller remains in performant mode any code which presumes
it is in simple mode will not work.  So the reset needs to be fixed.

Unfortunately there are some controllers which cannot be reset by
either method. (eg. p800).  We detect these cases by noticing that
the controller seems to remain in performant mode even after a
reset has been attempted.  In those cases we ignore the controller,
as any commands outstanding on it will result in stale completions.
To sum up, we try to do a better job of resetting the controller if
"reset_devices" is set, and if it doesn't work, we ignore that
controller.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:30 +02:00
Stephen M. Cameron 83123cb11b cciss: factor out cciss_reset_devices()
cciss: factor out cciss_reset_devices()

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:12 +02:00
Stephen M. Cameron 8e93bf6d6c cciss: factor out cciss_find_cfg_addrs.
Rationale for this is that I will also need to use this code
in fixing kdump host reset code prior to having the hba structure.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:12 +02:00
Stephen M. Cameron b993313540 cciss: factor out cciss_enter_performant_mode
cciss: factor out cciss_enter_performant_mode

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:12 +02:00
Stephen M. Cameron 0f8a6a1e7b cciss: factor out cciss_wait_for_mode_change_ack()
cciss: factor out cciss_wait_for_mode_change_ack()

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:11 +02:00
Stephen M. Cameron fe3b7527db cciss: make cciss_put_controller_into_performant_mode as __devinit
cciss: make cciss_put_controller_into_performant_mode as __devinit

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:11 +02:00
Stephen M. Cameron ff5f58f06d cciss: cleanup some debug ifdefs
cciss: cleanup some debug ifdefs

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:11 +02:00
Stephen M. Cameron bfd63ee571 cciss: factor out cciss_p600_dma_prefetch_quirk()
cciss: factor out cciss_p600_dma_prefetch_quirk()

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:11 +02:00
Stephen M. Cameron 322e304c4d cciss: factor out cciss_enable_scsi_prefetch()
cciss: factor out cciss_enable_scsi_prefetch()

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:11 +02:00
Stephen M. Cameron 501b92cd6b cciss: factor out CISS_signature_present()
cciss: factor out CISS_signature_present()

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:11 +02:00
Stephen M. Cameron afadbf4b95 cciss: factor out cciss_find_board_params
cciss: factor out cciss_find_board_params

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:11 +02:00
Stephen M. Cameron da5503217d cciss: fix leak of ioremapped memory
cciss: fix leak of ioremapped memory
in cciss_pci_init error path.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:11 +02:00
Stephen M. Cameron 4809d0988f cciss: factor out cciss_find_cfgtables
cciss: factor out cciss_find_cfgtables

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:11 +02:00
Stephen M. Cameron e99ba13627 cciss: factor out cciss_wait_for_board_ready()
cciss: factor out cciss_wait_for_board_ready()

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:11 +02:00
Stephen M. Cameron d474830da6 cciss: factor out cciss_find_memory_BAR()
cciss: factor out cciss_find_memory_BAR()

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:10 +02:00
Stephen M. Cameron dac5488a9e cciss: remove board_id parameter from cciss_interrupt_mode()
cciss: remove board_id parameter from cciss_interrupt_mode()

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:10 +02:00
Stephen M. Cameron dd9c426e92 cciss: factor out cciss_board_disabled
cciss: factor out cciss_board_disabled

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:10 +02:00
Stephen M. Cameron 6539fa9b2e cciss: factor out cciss_lookup_board_id
cciss: factor out cciss_lookup_board_id

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:10 +02:00
Stephen M. Cameron 292e50dd39 cciss: save pdev pointer in per hba structure early to avoid passing it around so much.
cciss: save pdev pointer in per hba structure early to avoid passing it around so much.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:10 +02:00
Stephen M. Cameron 373b45f7b6 cciss: Set the performant mode bit in the scsi half of the driver
cciss: Set the performant mode bit in the scsi half of the driver
In a couple of places, the performant mode bit wasn't being set in
the scsi half of the driver, causing commands to seem to hang.  Use
enqueue_cmd_and_start_io() where appropriate.  This fixes a bug that

	echo engage scsi > /proc/driver/cciss/cciss0

would hang.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:10 +02:00
Daniel Stodden d54142c71f blkfront: Klog the unclean release path
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:51:21 +02:00
Daniel Stodden 7b32d1044a blkfront: Remove obsolete info->users
This is just bd_openers, protected by the bd_mutex.

Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:49:20 +02:00
Daniel Stodden acfca3c622 blkfront: Remove obsolete info->users
This is just bd_openers, protected by the bd_mutex.

Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:47:26 +02:00
Daniel Stodden fa1bd3591a blkfront: Lock blockfront_info during xbdev removal
Same approach as blkfront_closing:
 * Grab the bdev safely, holding the info mutex.
 * Zap xbdev safely, holding the info mutex.
 * Try bdev removal safely, holding bd_mutex.

Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-08-07 18:45:27 +02:00
Daniel Stodden 7fd152f4b6 blkfront: Fix blkfront backend switch race (bdev release)
We cannot read backend state within bdev operations, because it risks
grabbing the state change before xenbus gets to do it.

Fixed by tracking deferral with a frontend switch to Closing. State
exposure isn't strictly necessary, but the backends won't mind.

For a 'clean' deferral this seems actually a more decent protocol than
raising errors.

Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:45:12 +02:00