Commit graph

27257 commits

Author SHA1 Message Date
Ramachandra K 0c0450db31 IB/srp: Support SRP rev. 10 targets
There has been a change in the format of port identifiers between
revision 10 of the SRP specification and the current revision 16A.

Revision 10 specifies port identifier format as

  lower 8 bytes :  GUID
  upper 8 bytes :  Extension

Whereas revision 16A specifies it as 

  lower 8 bytes :  Extension
  upper 8 bytes :  GUID

There are older targets (e.g. SilverStorm Virtual Fibre Channel
Bridge) which conform to revision 10 of the SRP specification.

The I/O class of revision 10 is 0xFF00 and the I/O class of revision
16A is 0x0100.

For supporting older targets, this patch:

1) Adds a new optional target creation parameter "io_class". Default
   value of io_class is 0x0100 (i.e. revision 16A)
2) Uses the correct port identifier format for targets with IO class
   of 0xFF00 (i.e. conforming to revision 10)

Signed-off-by: Ramachandra K <rkuchimanchi@silverstorm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:38 -07:00
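The two layouts described in commit 0c0450db31 above can be sketched in plain C. This is an illustration only, not the driver's code: srp_fill_port_id(), its parameters, and the enum names are hypothetical. Only the I/O class values (0xFF00, 0x0100) and the ordering of GUID and extension come from the commit message, and "lower 8 bytes" is assumed here to mean bytes 0..7 of the 16-byte identifier.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* I/O class values quoted in the commit message. */
enum {
	IO_CLASS_REV10  = 0xFF00,
	IO_CLASS_REV16A = 0x0100
};

/*
 * Hypothetical helper: build a 16-byte port identifier from an 8-byte
 * GUID and an 8-byte extension.  Assumes "lower 8 bytes" is id[0..7].
 */
void srp_fill_port_id(uint8_t id[16], uint16_t io_class,
		      const uint8_t guid[8], const uint8_t ext[8])
{
	assert(id != NULL && guid != NULL && ext != NULL);

	if (io_class == IO_CLASS_REV10) {
		/* rev. 10: GUID in the lower half, extension in the upper */
		memcpy(id, guid, 8);
		memcpy(id + 8, ext, 8);
	} else {
		/* rev. 16A: extension in the lower half, GUID in the upper */
		memcpy(id, ext, 8);
		memcpy(id + 8, guid, 8);
	}
}
```

Any io_class other than 0xFF00 falls through to the rev. 16A layout in this sketch, mirroring the commit's statement that 0x0100 is the default.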
Ramachandra K 73c0996b1c [SCSI] srp.h: Add I/O Class values
Add enum values for I/O Class values from rev. 10 and rev. 16a SRP
drafts.  The values are used to detect targets that implement obsolete
revisions of SRP, so that the initiator can use the old format for
port identifier when connecting to them.

Signed-off-by: Ramachandra K <rkuchimanchi@silverstorm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:38 -07:00
Or Gerlitz 6c8c1aa25d IB/fmr: Use device's max_map_per_fmr attribute in FMR pool.
When creating an FMR pool, query the IB device and use the returned
max_map_per_fmr attribute as the maximum number of FMR remaps. If
the device does not support querying this attribute, use the original
IB_FMR_MAX_REMAPS (32) default.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:37 -07:00
Or Gerlitz d4cb0784fd IB/mthca: Fill in max_map_per_fmr device attribute
Report the true max_map_per_fmr value from mthca_query_device(),
taking into account the change in FMR remapping introduced by the
Sinai performance optimization.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:37 -07:00
Roland Dreier 6eddb5cb90 IB/ipath: Add client reregister event generation
Generate a client reregister event instead of a LID change event when
client reregister bit is set.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:37 -07:00
Leonid Arsh 12bbb2b7be IB/mthca: Add client reregister event generation
Change the mthca snoop of MADs that set PortInfo to check if the SM
has set the client reregister bit, and if it has, generate a client
reregister event.  If the bit is not set, just generate a LID change
event as usual.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:36 -07:00
Leonid Arsh da2ab62ab5 IB: Move struct port_info from ipath to <rdma/ib_smi.h>
Move ipath's struct port_info into <rdma/ib_smi.h>, so that it can be
used by mthca to implement client reregister support.

Remove the __attribute__((packed)) because all the members of the struct
are naturally aligned anyway.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:36 -07:00
Leonid Arsh 508e434123 IPoIB: Handle client reregister events
Handle client reregister events by treating them just like LID or
SM changes -- flush all cached paths and rejoin multicast groups.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:36 -07:00
Leonid Arsh 63942c9a98 IB: Add client reregister event type
Add IB_EVENT_CLIENT_REREGISTER to enum so low-level drivers can
generate "client reregister" events.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:35 -07:00
Jack Morgenstein 37c22a7721 IPoIB: Fix kernel unaligned access on ia64
Fix misaligned access faults on ia64: never cast a misaligned
neighbour->ha + 4 pointer to union ib_gid type; pass a void * pointer
instead.  The memcpy was being optimized to use full word accesses
because the compiler thought that union ib_gid is always aligned.

The cast in IPOIB_GID_ARG is safe, since it is fixed to access each
byte separately.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:35 -07:00
Roland Dreier 31c02e2157 IPoIB: Avoid using stale last_send counter when reaping AHs
The comparisons of priv->tx_tail to ah->last_send in ipoib_free_ah()
and ipoib_post_receive() are slightly unsafe, because priv->tx_lock is
not held and hence a stale value of ah->last_send might be used, which
would lead to freeing an AH before the driver was really done with it.
The simple way to fix this is to remove the optimization of early free
from ipoib_free_ah() and unconditionally queue AHs for reaping, and then
take priv->tx_lock in __ipoib_reap_ah().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:34 -07:00
Jack Morgenstein 9874e74655 IB/mad: Check GID/LID when matching requests
Check GID/LID for requester side when searching for request which
matches received response.  This is in order to guarantee uniqueness
if the same TID is used when requesting via multiple source LIDs (when
LMC is not zero).  Use ports' cached LMC to perform the check.

Further, do not perform LID check for direct-routed packets, since
the permissive LID makes a proper check impossible.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:34 -07:00
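The LMC part of the check in commit 9874e74655 above can be illustrated with a small sketch. With an LMC of n, the low n bits of a LID are path bits, so one port answers to 2^n LIDs. The function below is a hypothetical illustration of comparing only those path bits between a request and a response; it is not the code added by the commit, and the name and parameters are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical check: does the response arrive on the same LID path
 * bits that the request was sent from?  Only the low `lmc` bits are
 * compared; with lmc == 0 there is a single LID and every pair matches.
 */
int path_bits_match(uint8_t req_src_path_bits, uint8_t resp_dlid_path_bits,
		    uint8_t lmc)
{
	uint8_t mask;

	assert(lmc <= 7);	/* LMC is a 3-bit field */
	mask = (uint8_t)((1u << lmc) - 1);

	return ((req_src_path_bits ^ resp_dlid_path_bits) & mask) == 0;
}
```

For example, with lmc = 2 a request sent with path bits 1 would not match a response addressed to path bits 3, even if both carry the same TID.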
Jack Morgenstein 6fb9cdbf2c IB: Add caching of ports' LMC
Add an LMC cache to struct ib_device, and add a function
ib_get_cached_lmc() to query the cache.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:34 -07:00
Michael S. Tsirkin 856c256f88 IB/cm: remove unneeded flush_workqueue
destroy_workqueue() already does flush_workqueue().

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2006-06-17 20:37:33 -07:00
Sean Hefty 4be10c1e6d IB/ucm: convert semaphore to mutex
Convert semaphore in ib_ucm_file to a real mutex.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:33 -07:00
Roland Dreier 6bfa24fa3e IB/srp: Get rid of "Target has req_lim 0" messages
It's perfectly valid for a connection to an SRP target to have a
request limit of 0, so get rid of the message about it, which can spam
kernel logs even with printk_ratelimit().  Keep a count of such events
in a "zero_req_lim" SCSI host attribute instead, so someone who cares
can look at the statistics.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:33 -07:00
Ishai Rabinovitz b7ac4ab497 IB/srp: Handle DREQ events from CM
Handle IB_CM_DREQ_ERROR and IB_CM_DREQ_RECEIVED events from the CM,
instead of just printing "Unhandled CM event".  In the case of
DREQ_ERROR, just ignore the event -- a TIMEWAIT_EXIT will be generated
also.  For DREQ_RECEIVED, send a DREP in response to shut the
connection down cleanly.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:32 -07:00
Roland Dreier ac83cbaa9a IPoIB: Mention RFC numbers in documentation
Now that the IETF has released RFCs covering IPoIB, give the numbers in
the documentation for IPoIB.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:32 -07:00
Vu Pham 74b0a15b5e IB/srp: Allow sg_tablesize to be adjusted
Make the sg_tablesize used by SRP adjustable at module load time via a
module parameter.  Calculate the corresponding IU length required to
support this.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:32 -07:00
Vu Pham 52fb2b50c4 IB/srp: Allow cmd_per_lun to be set per target port
Allow userspace to throttle traffic on a given connection to a target
port by adding "max_cmd_per_lun=xyz" to lower the cmd_per_lun value
set for that scsi_host.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:31 -07:00
Ishai Rabinovitz 0c5b395239 IB/srp: Clean up loop in srp_remove_one()
Interrupts will always be enabled in srp_remove_one(), so
spin_lock_irq() can be used instead of spin_lock_irqsave().
Also, the loop takes target->scsi_host->host_lock, so target->state
can just be set to SRP_TARGET_REMOVED without testing the old value.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:31 -07:00
Roland Dreier 403a496fd4 IB: Make needlessly global ib_mad_cache static
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:31 -07:00
Matthew Wilcox b3589fd490 IB/srp: Change target_mutex to a spinlock
The SRP driver never sleeps while holding target_mutex, and it's just
used to protect some simple list operations, so hold times will be
short.  So just convert it to a spinlock, which is smaller and faster.

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:30 -07:00
Matthew Wilcox 549c5fc2c8 IB/srp: Get rid of unneeded use of list_for_each_entry_safe()
list_for_each_entry_safe() is used in one place where the list isn't
modified.  So just change it to list_for_each_entry().

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:30 -07:00
Matthew Wilcox 1962a4a1e4 IB/srp: Use SCAN_WILD_CARD from SCSI headers
SCAN_WILD_CARD is indeed available from <scsi/scsi.h>, which is
already included.  So get rid of private hack.

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:30 -07:00
Roland Dreier e9cd59418f IB/mthca: Convert FW commands to use wait_for_completion_timeout()
The kernel has had wait_for_completion_timeout() for a long time now.
mthca should use it to handle FW commands timing out, instead of
implementing the same thing in a much more complicated way by using
wait_for_completion() along with a timer that does complete().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:30 -07:00
Roland Dreier f5358a172f IB/srp: Use FMRs to map gather/scatter lists
Create an SRP FMR pool on HCAs that support FMRs, and use FMRs to map
gather/scatter lists that have more than one entry into a single
memory region that appears virtually contiguous to the SRP target
(which is the RDMA initiator).

This patch bails out on FMR mapping for SCSI commands where the
gather/scatter list cannot be mapped into a single FMR because there
are sub-page-sized entries in middle of the list.  An unaligned
start or end of the list is OK.

Based on a patch by Vu Pham <vuhuong@mellanox.com>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:29 -07:00
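The mapping restriction described in commit f5358a172f above can be sketched as a simple alignment test. This is an assumption-laden illustration, not the driver's implementation: struct sg_entry and sg_fits_single_fmr() are hypothetical names, and the rule encoded here (every entry except the first must start on a page boundary, every entry except the last must end on one) is one reading of "sub-page-sized entries in middle of the list".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified gather/scatter entry. */
struct sg_entry {
	uint64_t addr;	/* DMA address of the entry */
	uint32_t len;	/* length in bytes */
};

/*
 * Returns 1 if the list could appear virtually contiguous when mapped
 * page by page, 0 otherwise.  An unaligned start of the first entry and
 * an unaligned end of the last entry are both allowed.
 */
int sg_fits_single_fmr(const struct sg_entry *sg, size_t n,
		       uint64_t page_size)
{
	uint64_t mask = page_size - 1;
	size_t i;

	/* page_size must be a non-zero power of two */
	assert(page_size != 0 && (page_size & mask) == 0);

	for (i = 0; i < n; i++) {
		if (i > 0 && (sg[i].addr & mask))
			return 0;	/* gap before this entry */
		if (i + 1 < n && ((sg[i].addr + sg[i].len) & mask))
			return 0;	/* gap after this entry */
	}
	return 1;
}
```

A caller would fall back to another mapping strategy when the function returns 0, which corresponds to the "bails out" case in the commit message.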
Michael S. Tsirkin a26026c122 IB/mthca: Remove dead code
Kill some dead code in mthca_eq.c

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:29 -07:00
Sean Hefty e51060f08a IB: IP address based RDMA connection manager
Kernel connection management agent over InfiniBand that connects based
on IP addresses.  The agent defines a generic RDMA connection
abstraction to support clients wanting to connect over different RDMA
devices.

The agent also handles RDMA device hotplug events on behalf of clients.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:29 -07:00
Sean Hefty 7025fcd36b IB: address translation to map IP to IB addresses (GIDs)
Add an address translation service that maps IP addresses to
InfiniBand GID addresses using IPoIB.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:28 -07:00
Sean Hefty a1e8733e55 [NET]: Export ip_dev_find()
Export ip_dev_find() to allow locating a net_device given an IP address.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:28 -07:00
Sean Hefty 6e61d04f2d IB/cm: Match connection requests based on private data
Extend matching connection requests to listens in the InfiniBand CM to
include private data checks.

This allows applications to listen on the same service identifier,
with private data directing the request to the appropriate application.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:28 -07:00
Sean Hefty 6a9af2e18a IB: common handling for marshalling parameters to/from userspace
Provide common handling for marshalling data between userspace clients
and kernel InfiniBand drivers.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:27 -07:00
Michael S. Tsirkin 4e56ea794e IB/mthca: memfree completion with error FW bug workaround
Memfree firmware is in rare cases reporting WQE index == base - 1 in
receive completion with error, instead of (rq size - 1); base is 0 in
mthca.  Here is a patch to avoid kernel crash and report a correct WR
id in this case.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:20 -07:00
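The workaround in commit 4e56ea794e above amounts to correcting an out-of-range index. A minimal sketch, assuming the index has already been computed relative to a base of 0 so that the firmware's "base - 1" shows up as -1; fixup_rq_wqe_index() and its parameters are hypothetical names, not mthca's.

```c
#include <assert.h>

/*
 * Hypothetical fixup: if the reported receive WQE index is -1
 * (base - 1 with base == 0), use the last slot of the receive queue,
 * which is what the firmware should have reported.
 */
int fixup_rq_wqe_index(int wqe_index, int rq_size)
{
	assert(rq_size > 0);

	if (wqe_index < 0)
		wqe_index = rq_size - 1;

	return wqe_index;
}
```

Valid indexes pass through unchanged, so the check only costs one comparison on the error-completion path.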
Michael S. Tsirkin 13aa6ecb47 IB/mthca: restore missing PCI registers after reset
mthca does not restore the following PCI-X/PCI Express registers after reset:
  PCI-X device: PCI-X command register
  PCI-X bridge: upstream and downstream split transaction registers
  PCI Express : PCI Express device control and link control registers

This causes instability and/or bad performance on systems where one of
these registers is set to a non-default value by BIOS.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:00 -07:00
Linus Torvalds 427abfa28a Linux v2.6.17
Being named "Crazed Snow-Weasel" instills a lot of confidence in this
release, so I'm sure this will be one of the better ones.
2006-06-17 18:49:35 -07:00
Arnd Bergmann ce221982e0 [PATCH] powerpc: enable CPU_FTR_CI_LARGE_PAGE for cell
Reflect the fact that the Cell Broadband Engine supports 64k
pages by adding the bit to the CPU features.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-17 10:56:24 -07:00
Arnd Bergmann 19242b2407 [PATCH] powerpc: Fix 64k pages on non-partitioned machines
The page size encoding passed to tlbie is incorrect for new-style
large pages.  This fixes it.  This doesn't affect anything on older
machines because mmu_psize_defs[psize].penc (the page size encoding)
is 0 for 4k and 16M pages (the two are distinguished by a separate "is
a large page" bit).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-17 10:56:24 -07:00
Oleg Nesterov f53ae1dc34 [PATCH] arm_timer: remove a racy and obsolete PF_EXITING check
arm_timer() checks PF_EXITING to prevent BUG_ON(->exit_state)
in run_posix_cpu_timers().

However, for some reason it does so only for CPUCLOCK_PERTHREAD
case (which is imho wrong).

Also, this check is not reliable: PF_EXITING could be set on
another cpu without any locks/barriers just after the check,
so it can't prevent the timer from being attached to the exiting
task.

The previous patch makes this check unneeded.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-17 10:52:13 -07:00
Oleg Nesterov 30f1e3dd8c [PATCH] run_posix_cpu_timers: remove a bogus BUG_ON()
do_exit() clears ->it_##clock##_expires, but nothing prevents
another cpu from attaching the timer to the exiting process after that.
arm_timer() tries to protect against this race, but the check
is racy.

After exit_notify() does 'write_unlock_irq(&tasklist_lock)' and
before do_exit() calls 'schedule()', a local timer interrupt can find
tsk->exit_state != 0. If that state was EXIT_DEAD (or another cpu
does sys_wait4), the interrupted task has ->signal == NULL.

At this moment the exiting task has no pending cpu timers; they were
cleaned up in __exit_signal()->posix_cpu_timers_exit{,_group}(),
so we can just return from irq.

John Stultz recently confirmed this bug, see

	http://marc.theaimsgroup.com/?l=linux-kernel&m=115015841413687

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-17 10:52:13 -07:00
Oleg Nesterov 8f17fc20bf [PATCH] check_process_timers: fix possible lockup
If the local timer interrupt happens just after do_exit() sets PF_EXITING
(and before it clears ->it_xxx_expires) run_posix_cpu_timers() will call
check_process_timers() with tasklist_lock + ->siglock held and

	check_process_timers:

		t = tsk;
		do {
			....

			do {
				t = next_thread(t);
			} while (unlikely(t->flags & PF_EXITING));
		} while (t != tsk);

the outer loop will never stop.

Actually, the window is bigger.  Another process can attach the timer
after ->it_xxx_expires was cleared (see the next commit) and the 'if
(PF_EXITING)' check in arm_timer() is racy (see the one after that).

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-17 10:52:13 -07:00
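The lockup described in commit 8f17fc20bf above can be reproduced in a userspace model of the loop. This is a sketch, not kernel code: struct task, walk_threads(), and the iteration limit are hypothetical, and the PF_EXITING value is only a stand-in flag bit. The limit exists solely so the demonstration terminates; the loop in the commit message has no such guard.

```c
#include <assert.h>
#include <stddef.h>

#define PF_EXITING 0x4	/* stand-in flag bit for this sketch */

/* Hypothetical minimal task: a flag word and a circular thread list. */
struct task {
	unsigned int flags;
	struct task *next;
};

/*
 * Same shape as the loop quoted in the commit message.  Returns the
 * number of outer iterations, or -1 if `limit` iterations pass without
 * the walk ever arriving back at tsk.  At least one thread in the list
 * must not be exiting, otherwise the inner loop itself never ends.
 */
int walk_threads(struct task *tsk, int limit)
{
	struct task *t = tsk;
	int iterations = 0;

	assert(tsk != NULL && limit > 0);

	do {
		if (++iterations > limit)
			return -1;	/* would spin forever in the kernel */
		do {
			t = t->next;	/* models next_thread(t) */
		} while (t->flags & PF_EXITING);
	} while (t != tsk);

	return iterations;
}
```

With three threads and none exiting, the walk returns after three iterations. Once tsk itself has PF_EXITING set, the inner loop always skips over it, so the outer condition t != tsk can never become false.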
Stephen Hemminger 88d113601c [PATCH] sky2: netconsole suspend/resume interaction
A couple of fixes that should prevent crashes when using netconsole and
suspend/resume. First, netconsole poll routine shouldn't run unless the
device is up; second, the NAPI poll should be disabled during suspend.

This is only an issue on sky2, because it has to have one NAPI poll
routine for both ports on dual port boards. Normal drivers use
netif_rx_schedule_prep and that checks for netif_running.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-17 10:52:12 -07:00
Jens Axboe 991721572e [PATCH] Fix missing ret assignment in __bio_map_user() error path
If get_user_pages() returns fewer pages than what we asked for, we jump
to out_unmap which will return ERR_PTR(ret).  But ret can contain a
positive number just smaller than local_nr_pages, so be sure to set it
to -EFAULT always.

Problem found and diagnosed by Damien Le Moal <damien@sdl.hitachi.co.jp>

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-17 10:52:12 -07:00
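Why a positive ret is a problem in commit 991721572e above can be shown with userspace stand-ins for the kernel's error-pointer helpers. Everything here is a simplified sketch: err_ptr() and is_err() only approximate ERR_PTR() and IS_ERR(), and map_user_sketch() with its with_fix switch is a hypothetical model of the error path, not __bio_map_user() itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR() helpers. */
void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

int is_err(const void *ptr)
{
	/* only the top MAX_ERRNO addresses encode an error */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int dummy_bio;	/* placeholder for a successfully built bio */

/*
 * Hypothetical model of the error path: pages_pinned plays the role of
 * the get_user_pages() return value.  Without the fix, a short count is
 * passed to err_ptr() as a small positive number.
 */
void *map_user_sketch(int pages_pinned, int pages_wanted, int with_fix)
{
	int ret = pages_pinned;

	assert(pages_wanted > 0);

	if (ret < pages_wanted) {
		if (with_fix)
			ret = -EFAULT;	/* always report a real error */
		return err_ptr(ret);
	}
	return &dummy_bio;
}
```

In the unfixed case the caller receives a small non-NULL value that is_err() does not recognise as an error, so it would go on to treat it as a valid pointer.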
Jens Axboe 16070428d3 [PATCH] fix cdrom open
Some time ago the cdrom open routine was changed so that we call the
driver's open routine before checking to see if it is read only.  However,
if we discovered that a read write open was not possible and the open
flags required a writable open, we just returned -EROFS without calling
the driver's release routine.   This seems to work for most cdrom drivers,
but breaks the Powerpc iSeries virtual cdrom rather badly.

This just inserts the release call in the error path to balance the call
to "->open()" done by "open_for_data()".

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-17 10:44:26 -07:00
Jens Axboe 553698f944 [PATCH] cfq-iosched: fix crash in do_div()
We don't clear the seek stat values in cfq_alloc_io_context(), and if
->seek_mean is unlucky enough to be set to -36 by chance, the first
invocation of cfq_update_io_seektime() will oops with a divide by zero
in do_div().

Just memset the entire cic instead of filling individual values
independently.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-14 10:22:16 -07:00
Kirill Korotaev 9cedc194a7 [PATCH] Return error in case flock_lock_file failure
If flock_lock_file() fails to allocate a flock with locks_alloc_lock(),
then "error = 0" is returned. A non-zero error must be returned instead.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-14 08:59:44 -07:00
Stephen Hemminger eb35cf60e4 [PATCH] sky2: stop/start hardware idle timer on suspend/resume
The resume bug was caused not by an early interrupt but because the idle
timeout was not being stopped on suspend.  Also disable hardware IRQs
on suspend.  Will need to revisit this with hotplug?

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-13 13:16:41 -07:00
Stephen Hemminger 8ab8fca207 [PATCH] sky2: save/restore base hardware irq during suspend/resume
The hardware should be fully shut off during suspend, and the base
irq mask restored during resume.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-13 13:16:41 -07:00
Stephen Hemminger 26ec43f132 [PATCH] sky2: fix hotplug detect during poll
If the poll routine detects no hardware available, it needs to dequeue
itself from the network poll list. Linus didn't understand NAPI.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-13 13:16:41 -07:00
Stephen Hemminger f05267e7de [PATCH] sky2: don't hard code number of ports
It is cleaner not to loop over both ports if only one exists.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-13 13:16:41 -07:00