Commit graph

3337 commits

Author SHA1 Message Date
Eric W. Biederman 7f78e03513 fs: Limit sys_mount to only request filesystem modules.
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.

A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.

Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.

Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives.  Allowing simple, safe,
well understood work-arounds to known problematic software.

This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work.  While writing this patch I saw a handful of such
cases.  The most significant being autofs that lives in the module
autofs4.

This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.

After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module.  The common pattern in the kernel is to call request_module()
without regards to the users permissions.  In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted.  In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-03 19:36:31 -08:00
Sasha Levin b67bfe0d42 hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:24 -08:00
Tejun Heo e8c8d1bc06 idr: remove MAX_IDR_MASK and move left MAX_IDR_* into idr.c
MAX_IDR_MASK is another weirdness in the idr interface.  As idr covers
whole positive integer range, it's defined as 0x7fffffff or INT_MAX.

Its usage in idr_find(), idr_replace() and idr_remove() is bizarre.
They basically mask off the sign bit and operate on the rest, so if
the caller, by accident, passes in a negative number, the sign bit
will be masked off and the remaining part will be used as if that was
the input, which is worse than crashing.

The constant is visible in idr.h and there are several users in the
kernel.

* drivers/i2c/i2c-core.c:i2c_add_numbered_adapter()

  Basically used to test if adap->nr is a negative number which isn't
  -1 and returns -EINVAL if so.  idr_alloc() already has negative
  @start checking (w/ WARN_ON_ONCE), so this can go away.

* drivers/infiniband/core/cm.c:cm_alloc_id()
  drivers/infiniband/hw/mlx4/cm.c:id_map_alloc()

  Used to wrap cyclic @start.  Can be replaced with max(next, 0).
  Note that this type of cyclic allocation using idr is buggy.  These
  are prone to spurious -ENOSPC failure after the first wraparound.

* fs/super.c:get_anon_bdev()

  The ID allocated from ida is masked off before being tested whether
  it's inside valid range.  ida allocated ID can never be a negative
  number and the masking is unnecessary.

Update idr_*() functions to fail with -EINVAL when negative @id is
specified and update other MAX_IDR_MASK users as described above.

This leaves MAX_IDR_MASK without any user, remove it and relocate
other MAX_IDR_* constants to lib/idr.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: "Marciniszyn, Mike" <mike.marciniszyn@intel.com>
Cc: Jack Morgenstein <jackm@dev.mellanox.co.il>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Wolfram Sang <wolfram@the-dreams.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:20 -08:00
Tejun Heo 80f22b4430 IB/qib: convert to idr_alloc()
Convert to the much saner new idr interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mike Marciniszyn <infinipath@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:17 -08:00
Tejun Heo cffcd59f15 IB/ocrdma: convert to idr_alloc()
Convert to the much saner new idr interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:17 -08:00
Tejun Heo 6a9200603d IB/mlx4: convert to idr_alloc()
Convert to the much saner new idr interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jack Morgenstein <jackm@dev.mellanox.co.il>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:17 -08:00
Tejun Heo 5c213f8641 IB/ipath: convert to idr_alloc()
Convert to the much saner new idr interface.

[yongjun_wei@trendmicro.com.cn: use GFP_NOWAIT under spin lock]
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:17 -08:00
Tejun Heo cbbbce1de2 IB/ehca: convert to idr_alloc()
Convert to the much saner new idr interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Cc: Christoph Raisch <raisch@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:16 -08:00
Tejun Heo e8d4dd606b IB/cxgb4: convert to idr_alloc()
Convert to the much saner new idr interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:16 -08:00
Tejun Heo 6fa780095f IB/cxgb3: convert to idr_alloc()
Convert to the much saner new idr interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:16 -08:00
Tejun Heo ac1d68296b IB/amso1100: convert to idr_alloc()
Convert to the much saner new idr interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Cc: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:16 -08:00
Tejun Heo 3b069c5d85 IB/core: convert to idr_alloc()
Convert to the much saner new idr interface.

v2: Mike triggered WARN_ON() in idr_preload() because send_mad(),
    which may be used from non-process context, was calling
    idr_preload() unconditionally.  Preload iff @gfp_mask has
    __GFP_WAIT.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reported-by: "Marciniszyn, Mike" <mike.marciniszyn@intel.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:16 -08:00
Linus Torvalds d895cb1af1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has ->d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
2013-02-26 20:16:07 -08:00
Linus Torvalds 70a3a06d01 Main batch of InfiniBand/RDMA changes for 3.9:
- SRP error handling fixes from Bart Van Assche
  - Implementation of memory windows for mlx4 from Shani Michaeli
  - Lots of cxgb4 HW driver fixes from Vipul Pandya
  - Make iSER work for virtual functions, other fixes from Or Gerlitz
  - Fix for bug in qib HW driver from Mike Marciniszyn
  - IPoIB fixes from me, Itai Garbi, Shlomo Pongratz, Yan Burman
  - Various cleanups and warning fixes from Julia Lawall, Paul Bolle, Wei Yongjun
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJRLPNoAAoJEENa44ZhAt0h0bMP/1xqlgDR3DGdoUoV4yTd6O9G
 Uccdb7og5o5tedVo+8xZz01y4at99P8FWW4hJ4os6k8n6yKoBVwo7qjN+BOR+JG5
 q8+O+ynUSIg4tGrb5sXcMnKXAXbw/vkftMWYNA41cbrM24DTYzB/2SLpvhwbFoTT
 tdQc2tgz5QaDqzWbagyCR4+k/IgO+Llrz/RvIdtz4dsTnTDogN7QCoSffX8n/Lpb
 DxtyXK4sdl3DAtd3CsIdsB/TSMb3RkHLCoSvmrWlLnqMdsbRxVnCVfBm4BOghW3J
 Y2K3joRoCjjIZSRNs/i0FMFkT/jbCXg1oXg9ek/a6YFNcgyk7z8iGyXrRY7fOnno
 8U2SfxJ69YpVYeJr+DSjaeHcmjsaYU7NN7JPxzvPKcJOIsxQJ/euJDXAXau3lEQY
 o9/p4JsGty0WHi1NanyygvghvBAoP1C5/59Sl4bHH5gckPyJT1kinPSCTT76YXGS
 WkSHg2mDhiJHy7Pnuy85iZldPoy2/5z09/I4aGMeL+8kUZbD4iFqzXIJU0HTsAim
 EONoRXDhIcN5DNVSVH1ig6nJ2a7Vhov4Z0r/vB8P4KhslBcqFwf2leC0eCoe5mNt
 SzcKhqosZDXoL8AwzpntzGIOid8pWmHbUx/PgIcoVXPjtl0h2ULNIFoYYyMZ3cyU
 AyN2tSiUZVddTV1/aKGL
 =RAQw
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband update from Roland Dreier:
 "Main batch of InfiniBand/RDMA changes for 3.9:

   - SRP error handling fixes from Bart Van Assche

   - Implementation of memory windows for mlx4 from Shani Michaeli

   - Lots of cxgb4 HW driver fixes from Vipul Pandya

   - Make iSER work for virtual functions, other fixes from Or Gerlitz

   - Fix for bug in qib HW driver from Mike Marciniszyn

   - IPoIB fixes from me, Itai Garbi, Shlomo Pongratz, Yan Burman

   - Various cleanups and warning fixes from Julia Lawall, Paul Bolle,
     Wei Yongjun"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (41 commits)
  IB/mlx4: Advertise MW support
  IB/mlx4: Support memory window binding
  mlx4: Implement memory windows allocation and deallocation
  mlx4_core: Enable memory windows in {INIT, QUERY}_HCA
  mlx4_core: Disable memory windows for virtual functions
  IPoIB: Free ipoib neigh on path record failure so path rec queries are retried
  IB/srp: Fail I/O requests if the transport is offline
  IB/srp: Avoid endless SCSI error handling loop
  IB/srp: Avoid sending a task management function needlessly
  IB/srp: Track connection state properly
  IB/mlx4: Remove redundant NULL check before kfree
  IB/mlx4: Fix compiler warning about uninitialized 'vlan' variable
  IB/mlx4: Convert is_xxx variables in build_mlx_header() to bool
  IB/iser: Enable iser when FMRs are not supported
  IB/iser: Avoid error prints on EAGAIN registration failures
  IB/iser: Use proper define for the commands per LUN value advertised to SCSI ML
  IB/uverbs: Implement memory windows support in uverbs
  IB/core: Add "type 2" memory windows support
  mlx4_core: Propagate MR deregistration failures to caller
  mlx4_core: Rename MPT-related functions to have mpt_ prefix
  ...
2013-02-26 11:41:08 -08:00
Roland Dreier ef4e359d9b Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'misc', 'mlx4', 'qib' and 'srp' into for-next 2013-02-26 09:17:56 -08:00
Shani Michaeli b425388dc1 IB/mlx4: Advertise MW support
Indicate memory windows support through device capabilities, kernel
verb entries and the relevant uverbs command mask entries.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 10:44:32 -08:00
Shani Michaeli 6ff63e1940 IB/mlx4: Support memory window binding
* Implement memory windows binding in mlx4_ib_post_send.

* Implement mlx4_ib_bind_mw by deferring to mlx4_ib_post_send.

* Rename MLX4_WQE_FMR_PERM_* flags to MLX4_WQE_FMR_AND_BIND_PERM_*,
  indicating that they are used both for fast registration work
  requests, and for memory window bind work requests.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 10:44:32 -08:00
Shani Michaeli 804d6a89a5 mlx4: Implement memory windows allocation and deallocation
Implement MW allocation and deallocation in mlx4_core and mlx4_ib.
Pass down the enable bind flag when registering memory regions.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 10:44:32 -08:00
Roland Dreier f72dd56690 IPoIB: Free ipoib neigh on path record failure so path rec queries are retried
If IPoIB fails to look up a path record (eg if it tries during an SM
failover when one SM is dead but the new one hasn't taken over yet), the
driver ends up with a neighbour structure but no address handle (AH).
There's no mechanism to recover from this: any further packets sent to
this destination will be silently dumped in ipoib_start_xmit().

Fix this by freeing the neighbour structures when a path rec query
fails, so that the next packet queued to be sent will trigger a new path
record query.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 09:42:15 -08:00
Bart Van Assche 2ce19e72f4 IB/srp: Fail I/O requests if the transport is offline
If an SRP target is no longer reachable and srp_reset_host() fails to
reconnect then ib_srp will invoke scsi_remove_host().  That function
will invoke __scsi_remove_device() for each LUN.  And that last
function will change the device state from SDEV_TRANSPORT_OFFLINE into
SDEV_CANCEL.  Certain user space software, e.g. older versions of
multipathd, continue queueing I/O to SCSI devices that are in the
SDEV_CANCEL state.

If these I/O requests are submitted as SG_IO that means that the
REQ_PREEMPT flag will be set and hence that these requests will be
passed to srp_queuecommand().  These requests will time out.  If new
requests are queued fast enough from user space these active requests
will prevent __scsi_remove_device() to finish.

Avoid this by failing I/O requests in the SDEV_CANCEL state if the
transport is offline.  Introduce a new variable to keep track of the
transport state instead of failing requests if (!target->connected ||
target->qp_in_error), so that the SCSI error handler has a chance to
retry commands after a transport layer failure occurred.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 09:31:14 -08:00
Bart Van Assche c7c4e7ff80 IB/srp: Avoid endless SCSI error handling loop
If a SCSI command times out it is passed to the SCSI error
handler. The SCSI error handler will try to abort the commands that
timed out.  If aborting fails, a device reset will be attempted.  If
the device reset also fails a host reset will be attempted.  If the
host reset also fails the whole procedure will be repeated.

srp_abort() and srp_reset_device() fail for a QP in the error state.
srp_reset_host() fails after host removal has started.  Hence if the
SCSI error handler gets invoked after host removal has started and
with the QP in the error state an endless loop will be triggered.

Modify the SCSI error handling functions in ib_srp as follows:
- Abort SCSI commands properly even if the QP is in the error state.
- Make srp_reset_host() reset SCSI requests even after host removal
  has already started or if reconnecting fails.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dave@thedillows.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 09:31:14 -08:00
Bart Van Assche 3780d1f088 IB/srp: Avoid sending a task management function needlessly
Do not send a task management function if sending will fail anyway
because either there is no RDMA/RC connection or the QP is in the
error state.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dave@thedillows.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 09:31:14 -08:00
Bart Van Assche e1b2f13aba IB/srp: Track connection state properly
Remove an assignment that incorrectly overwrites the connection state
update by srp_connect_target().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dave@thedillows.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 09:31:14 -08:00
Syam Sidhardhan c89d127128 IB/mlx4: Remove redundant NULL check before kfree
kfree on NULL pointer is a no-op.

Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 09:24:32 -08:00
Paul Bolle 57d88cffc8 IB/mlx4: Fix compiler warning about uninitialized 'vlan' variable
Building qp.o triggers this gcc warning:

    drivers/infiniband/hw/mlx4/qp.c: In function ‘mlx4_ib_post_send’:
    drivers/infiniband/hw/mlx4/qp.c:1862:62: warning: ‘vlan’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    drivers/infiniband/hw/mlx4/qp.c:1752:6: note: ‘vlan’ was declared here

Looking at the code it is clear 'vlan' is only set and used if 'is_eth'
is non-zero. But by initializing 'vlan' to 0xffff, on

    gcc (Ubuntu 4.7.2-22ubuntu1) 4.7.2

on x86-64 at least, we fix the warning, and the compiler was already
setting 'vlan' to 0 in the generated code, so there's no real downside.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>

[ Get rid of unnecessary move of 'is_vlan' initialization.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 09:17:13 -08:00
Roland Dreier a29bec1241 IB/mlx4: Convert is_xxx variables in build_mlx_header() to bool
Matches the way they're used, and actually lets at least x86-64 generate
better code:

    add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-38 (-38)
    function                                     old     new   delta
    mlx4_ib_post_send                           4416    4378     -38

Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 09:02:03 -08:00
Al Viro 496ad9aa8e new helper: file_inode(file)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:31 -05:00
Or Gerlitz 5525d210fd IB/iser: Enable iser when FMRs are not supported
Reuse the "SG unaligned for FMR" driver flow to make the initiator
functional when running over driver instance which doesn't support
FMRs, such as a mlx4 virtual function.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-22 00:22:30 -08:00
Or Gerlitz 819a087316 IB/iser: Avoid error prints on EAGAIN registration failures
Under IO/CPU stress its possible that the FMR pool might not have a
free FMR mapping element for iSER to use because of incomplete
background unmapping processing.  In that case we get -EAGAIN and the
IO is pushed back to the SCSI layer which soon retries it.  No need to
be so verbose about that.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-22 00:22:29 -08:00
Or Gerlitz b96e4abaae IB/iser: Use proper define for the commands per LUN value advertised to SCSI ML
ISER_DEF_CMD_PER_LUN was meant to be ISCSI_DEF_XMIT_CMDS_MAX, not plain 128

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-22 00:22:29 -08:00
Linus Torvalds 9afa3195b9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "Assorted tiny fixes queued in trivial tree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (22 commits)
  DocBook: update EXPORT_SYMBOL entry to point at export.h
  Documentation: update top level 00-INDEX file with new additions
  ARM: at91/ide: remove unsused at91-ide Kconfig entry
  percpu_counter.h: comment code for better readability
  x86, efi: fix comment typo in head_32.S
  IB: cxgb3: delay freeing mem untill entirely done with it
  net: mvneta: remove unneeded version.h include
  time: x86: report_lost_ticks doesn't exist any more
  pcmcia: avoid static analysis complaint about use-after-free
  fs/jfs: Fix typo in comment : 'how may' -> 'how many'
  of: add missing documentation for of_platform_populate()
  btrfs: remove unnecessary cur_trans set before goto loop in join_transaction
  sound: soc: Fix typo in sound/codecs
  treewide: Fix typo in various drivers
  btrfs: fix comment typos
  Update ibmvscsi module name in Kconfig.
  powerpc: fix typo (utilties -> utilities)
  of: fix spelling mistake in comment
  h8300: Fix home page URL in h8300/README
  xtensa: Fix home page URL in Kconfig
  ...
2013-02-21 17:40:58 -08:00
Shani Michaeli 6b52a12bc3 IB/uverbs: Implement memory windows support in uverbs
The existing user/kernel uverbs API has IB_USER_VERBS_CMD_ALLOC/DEALLOC_MW.
Implement these calls, along with destroying user memory windows during
process cleanup.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-21 11:59:09 -08:00
Shani Michaeli 7083e42ee2 IB/core: Add "type 2" memory windows support
This patch enhances the IB core support for Memory Windows (MWs).

MWs allow an application to have better/flexible control over remote
access to memory.

Two types of MWs are supported, with the second type having two flavors:

    Type 1  - associated with PD only
    Type 2A - associated with QPN only
    Type 2B - associated with PD and QPN

Applications can allocate a MW once, and then repeatedly bind the MW
to different ranges in MRs that are associated to the same PD. Type 1
windows are bound through a verb, while type 2 windows are bound by
posting a work request.

The 32-bit memory key is composed of a 24-bit index and an 8-bit
key. The key is changed with each bind, thus allowing more control
over the peer's use of the memory key.

The changes introduced are the following:

* add memory window type enum and a corresponding parameter to ib_alloc_mw.
* type 2 memory window bind work request support.
* create a struct that contains the common part of the bind verb struct
  ibv_mw_bind and the bind work request into a single struct.
* add the ib_inc_rkey helper function to advance the tag part of an rkey.

Consumer interface details:

* new device capability flags IB_DEVICE_MEM_WINDOW_TYPE_2A and
  IB_DEVICE_MEM_WINDOW_TYPE_2B are added to indicate device support
  for these features.

  Devices can set either IB_DEVICE_MEM_WINDOW_TYPE_2A or
  IB_DEVICE_MEM_WINDOW_TYPE_2B if it supports type 2A or type 2B
  memory windows. It can set neither to indicate it doesn't support
  type 2 windows at all.

* modify existing provides and consumers code to the new param of
  ib_alloc_mw and the ib_mw_bind_info structure

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-21 11:51:45 -08:00
Shani Michaeli 6108372070 mlx4_core: Propagate MR deregistration failures to caller
MR deregistration fails when memory windows are bound to the MR.
Handle such failures by propagating them to the caller ULP.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-21 11:38:43 -08:00
Shani Michaeli aee38fadd2 IB/mlx4_ib: Remove local invalidate segment unused fields
Remove unused fields from the local invalidate WQE segment structure.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-21 11:36:29 -08:00
Itai Garbi 5a2815f03c IPoIB: Don't attempt to release resources on error flow
If the ipoib client info isn't found on the _remove_one callback, we
must not attempt to scan the returned null list.  Found by Coverity.

Signed-off-by: Itai Garbi <igarbi@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-19 08:21:36 -08:00
Yan Burman 4b48680b55 IPoIB: Add version and firmware info to ethtool reporting
Implement version info as well as report firmware version and bus info
of the underlying IB HW device.

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-19 08:21:36 -08:00
Shlomo Pongratz 9d1ad66e3e IPoIB: Fix ipoib_neigh hashing to use the correct daddr octets
The hash function introduced in commit b63b70d877 ("IPoIB: Use a
private hash table for path lookup in xmit path") was designd to use
the 3 octets of the IPoIB HW address that holds the remote QPN.
However, this currently isn't the case on little-endian machines,
because the the code there uses the flags part (octet[0]) and not the
last octet of the QPN (octet[3]).  Fix this.

The fix caused a checkpatch warning on line over 80 characters, to
solve that changed the name of the temp variable that holds the daddr.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-19 08:21:35 -08:00
Julia Lawall 6950a235b8 IB/mlx4: Adjust duplicate test
Delete successive tests to the same location.  The code tested the result
of a previous allocation, that itself was already tested.  It is changed to
test the result of the most recent allocation.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@s exists@
local idexpression y;
expression x,e;
@@

*if ( \(x == NULL\|IS_ERR(x)\|y != 0\) )
 { ... when forall
   return ...; }
... when != \(y = e\|y += e\|y -= e\|y |= e\|y &= e\|y++\|y--\|&y\)
    when != \(XT_GETPAGE(...,y)\|WMI_CMD_BUF(...)\)
*if ( \(x == NULL\|IS_ERR(x)\|y != 0\) )
 { ... when forall
   return ...; }
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-15 15:23:26 -08:00
Dan Carpenter cab66d1273 IB/mlx4: Fix bug unwinding on error in mlx4_ib_init_sriov()
We have to decrement "i" before calling mlx4_ib_free_demux_ctx() or we
free something that wasn't allocated.  That's fine for free_pv_object()
but it would lead to a NULL dereference calling mlx4_ib_free_demux_ctx().
The null dereference is because ->tun is NULL when we check:

	if (!ctx->tun[i])

Also we didn't free ->sriov.demux[0] so it was a small leak.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-15 15:22:26 -08:00
Wei Yongjun 8ab10f7537 RDMA/amso1100: Use module_pci_driver() to simplify the code
Use the module_pci_driver() macro to make the code simpler by
eliminating module_init and module_exit calls.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Steve WIse <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-15 15:16:10 -08:00
Stefan Hasko b23523d858 RDMA/cxgb4: Fix cast warning
Fix compile warning about cast to pointer from integer of different size.

Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-15 15:13:43 -08:00
Mike Marciniszyn bcc9b67a5b IB/qib: Fix QP locate/remove race
remove_qp() can execute concurrently with a qib_lookup_qpn() on
another CPU, which in of itself, is ok, given the RCU locking.

The issue is that remove_qp() NULLs out the qp->next field so that a
qib_lookup_qpn() might fail to find a qp if it occurs after the one
that is being deleted.  This is a momentary issue and subsequent
qib_lookup_qpn() calls would find the qp's since the search restarts
from the bucket head.  At scale, the issue might causes dropped
packets and unnecessary retransmissions.

The fix just deletes the qp->next NULL assignment to prevent the
remove_qp() from hiding qp's from qib_lookup_qpn().

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-14 17:04:18 -08:00
Paul Bolle 710a31102b RDMA/cxgb4: "cookie" can stay in host endianness
Work requests are passed between the host and the firmware with a
"cookie".  This cookie is swapped to big-endian when passed to the
firmware and back to host endianness on return.  This swapping seems
to be implemented incorrectly.  Moreover, the byte swapping triggers
GCC warnings on 32 bit:

    drivers/infiniband/hw/cxgb4/cm.c: In function ‘passive_ofld_conn_reply’:
    drivers/infiniband/hw/cxgb4/cm.c:2803:12: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    drivers/infiniband/hw/cxgb4/cm.c: In function ‘send_fw_pass_open_req’:
    drivers/infiniband/hw/cxgb4/cm.c:2941:16: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
    [...]

But byte swapping isn't needed as the firmware doesn't actually touch
the cookie.  Dropping byte swapping makes the warnings go away too.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-14 15:55:05 -08:00
Vipul Pandya ef5d6355ed RDMA/cxgb4: Address sparse warnings
Fixe the following types of sparse warnings
- cast to pointer from integer of different size
- cast from pointer to integer of different size
- incorrect type in assignment (different base types)
- incorrect type in argument 1 (different base types)
- cast from restricted __be64
- cast from restricted __be32

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-14 15:51:58 -08:00
Vipul Pandya b3de6cfebc RDMA/cxgb4: Insert hwtid in pass_accept_req instead in pass_establish
CPL_ABORT_REQ_RSS can come before TCP connection is established.  In
such case peer_abort was trying to remove the hwtid, which was not
inserted.  To avoid this we insert the hwtid when we are sure that we
are surely going to send passive accept request.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-14 15:51:58 -08:00
Vipul Pandya 7c0a33d611 RDMA/cxgb4: Don't wakeup threads for MPAv2
Don't wakeup threads blocked in rdma_init/rdma_fini if we are on
MPAv2, and want to retry connection with MPAv1.

Stop ep-timer on getting MPA version mismatch, before doing the
abort_connection - in process_mpa_request.

Take care to stop ep-timer in error paths for process_mpa_request.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-14 15:51:57 -08:00
Vipul Pandya fe7e0a4dd0 RDMA/cxgb4: Don't reconnect on abort for mpa_rev 1
Only reconnect if the endpoint wasn't freed.

peer_abort() should only attempt to reconnect if the endpoint wasn't
freed.  Also remove hwtid from the debugfs idr.

Add missing check for peer2peer in MPAv2 code

Use correct mpa version on reject.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-14 15:51:57 -08:00
Vipul Pandya 1ec779cc29 RDMA/cxgb4: Fix endpoint timeout race condition
The endpoint timeout logic had a race that could cause an endpoint
object to be freed while it was still on the timedout list.  This
can happen if the timer is stopped after it had fired, but before
the timedout thread processed the endpoint timeout.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-14 15:51:57 -08:00
Vipul Pandya e8e5b9278b RDMA/cxgb4: Only log rx_data warnings if cpl status is non-zero
With newer firmware, we can get streaming data due to connection
errors before the driver moves the QP out of RTS.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-14 15:51:56 -08:00
Vipul Pandya 04236df2a5 RDMA/cxgb4: Always log async errors
Log AEs even if the QP isn't in RTS.  It is useful information.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-14 15:51:56 -08:00
Vipul Pandya 325abead6c RDMA/cxgb4: Keep QP referenced until TID released
The driver is currently releasing the last ref on the QP too early.
This can cause bus errors due to HW still fetching WRs from the HW
queue.  The fix is to keep a qp ref until we release the HW TID.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-14 15:51:56 -08:00
Vipul Pandya 1557967bf9 RDMA/cxgb4: Display streaming mode error only if detected in RTS
With later firmware, the chances of getting streaming mode data after
we exit RTS is likely, so we don't need to warn for it.  The only real
case where we don't expect it is when the QP is in RTS.

Move QP to ERROR when streaming mode data received.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-14 15:51:55 -08:00
Vipul Pandya 91e9c07195 RDMA/cxgb4: Abort connections when moving to ERROR state
If a FINI operation fails, then we need to ABORT instead of CLOSE.
Also, if we ABORT due to unexpected STREAMING data, then wake up
anybody blocked in FINI...

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-14 15:51:55 -08:00
Vipul Pandya 55abf8df0a RDMA/cxgb4: Abort connections that receive unexpected streaming mode data
This error means the RDMA connection was knocked out of RDMA mode,
probably due to an error on the connection.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-14 15:51:55 -08:00
David S. Miller fd5023111c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Synchronize with 'net' in order to sort out some l2tp, wireless, and
ipv6 GRE fixes that will be built on top of in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-08 18:02:14 -05:00
Roland Dreier cbdba97a0f Merge branches 'ipoib', 'mlx4' and 'qib' into for-next 2013-02-05 09:45:25 -08:00
Mike Marciniszyn d359f35430 IB/qib: Fix for broken sparse warning fix
Commit 1fb9fed6d4 ("IB/qib: Fix QP RCU sparse warning") broke QP
hash list deletion in qp_remove() badly.

This patch restores the former for loop behavior, while still fixing
the sparse warnings.

Cc: <stable@vger.kernel.org>
Reviewed-by: Gary Leshner <gary.s.leshner@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-05 09:43:09 -08:00
Shlomo Pongratz 7e5a90c25f IPoIB: Fix crash due to skb double destruct
After commit b13912bbb4 ("IPoIB: Call skb_dst_drop() once skb is
enqueued for sending"), using connected mode and running multithreaded
iperf for long time, ie

    iperf -c <IP> -P 16 -t 3600

results in a crash.

After the above-mentioned patch, the driver is calling skb_orphan() and
skb_dst_drop() after calling post_send() in ipoib_cm.c::ipoib_cm_send()
(also in ipoib_ib.c::ipoib_send())

The problem with this is, as is written in a comment in both routines,
"it's entirely possible that the completion handler will run before we
execute anything after the post_send()."  This leads to running the
skb cleanup routines simultaneously in two different contexts.

The solution is to always perform the skb_orphan() and skb_dst_drop()
before queueing the send work request.  If an error occurs, then it
will be no different than the regular case where dev_free_skb_any() in
the completion path, which is assumed to be after these two routines.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-05 09:35:06 -08:00
Jesper Juhl fe194f19da IB: cxgb3: delay freeing mem untill entirely done with it
Sure, it's just the pointer value we use, but the coverity checker
complains about a use-after-free bug and it really does seem cleaner
to delay freeing until we are entirely done with the memory. So,
rearrange the code to move the kfree() later untill we are completely
done.
Trivial and harmless, but nice IMHO.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-01-29 10:52:20 +01:00
David S. Miller 4b87f92259 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	Documentation/networking/ip-sysctl.txt
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c

Both conflicts were simply overlapping context.

A build fix for qlcnic is in here too, simply removing the added
devinit annotations which no longer exist.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-15 15:05:59 -05:00
Jiri Pirko aaeb6cdfa5 remove init of dev->perm_addr in drivers
perm_addr is initialized correctly in register_netdevice() so to init it in
drivers is no longer needed.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-08 18:00:48 -08:00
Jiri Pirko 7826d43f2d ethtool: fix drvinfo strings set in drivers
Use strlcpy where possible to ensure the string is \0 terminated.
Use always sizeof(string) instead of 32, ETHTOOL_BUSINFO_LEN
and custom defines.
Use snprintf instead of sprint.
Remove unnecessary inits of ->fw_version
Remove unnecessary inits of drvinfo struct.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-06 21:06:31 -08:00
Jiri Pirko 7f6e7101df nes: remove usage of dev->master
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-04 13:31:50 -08:00
Greg Kroah-Hartman 1e6d9abea7 Drivers: infinband: remove __dev* attributes.
CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
markings need to be removed.

This change removes the use of __devinit, __devexit_p, __devinitdata,
and __devexit from these drivers.

Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.

Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Tom Tucker <tom@opengridcomputing.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Cc: Christoph Raisch <raisch@de.ibm.com>
Cc: Mike Marciniszyn <infinipath@intel.com>
Cc: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-03 15:57:15 -08:00
Linus Torvalds 184e251661 Second batch of InfiniBand/RDMA changes for 3.8:
- cxgb4 changes to fix lookup engine hash collisions
  - mlx4 changes to make flow steering usable
  - fix to IPoIB to avoid pinning dst reference for too long
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJQ1NdhAAoJEENa44ZhAt0hjb0P/i0tL4Ux+PvqG/Phh2gaZaQR
 evoi3bw6tCYFzWEJPfur2AJ63svKjyrSrfpvgEZjhthDdYORjIv2Dw2Je1qkOSrf
 5tJtSp3r9D0y35SE/DxNnzlgua9heBqphPlOGpjcKdN83KP7XIyVG2SGyJiEeLza
 owefPx/48jZr8hsw7LB2DlZmNUbVWK00o+pXa/VsUQX/dlIU5hyihAzBjtkwJyT2
 xtiyu9oqXGuv1JW/a16ooPGDaETDLJ1G50NndadUZYWFWj36VrAwW6hOAK3oOikf
 Qa4z3gJVzpSdaC1kiuxERj7GxlRpVUJY0IgHEoMTVrexOz1IsFEP8KEfGLkAYwzB
 jjuXh+Z2+QU5OOO3un0nINRGxKZUSD8Scoa222GBwGnWuCCq68APx2UGTkVkhWon
 FyjhF13WJRbElg2oXzI1cg9lJNv2pf10hXhiy2qdO6tDElVXVQk+KRiDdcNtxS0H
 FYYh3og/DjFwp18j+FVLA9r5AiPuVV5DjNnlwBNjTTMc9RwlOrX/6oCK2kZN24rZ
 l+Nr0gv+h6MAjTPBPYLUP2bsY6wYt5n566mfLea/lir9YeI+Q3PL2sDdGZ7C6xHH
 S4pRW95leP4pEFpHqYg8z3QKPswYqzokgbUSHxg2TVYrV01RC75axXCh0Q90q3RE
 oLP8GDYod2okxhAU2AyN
 =lDXt
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull more infiniband changes from Roland Dreier:
 "Second batch of InfiniBand/RDMA changes for 3.8:
   - cxgb4 changes to fix lookup engine hash collisions
   - mlx4 changes to make flow steering usable
   - fix to IPoIB to avoid pinning dst reference for too long"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA/cxgb4: Fix bug for active and passive LE hash collision path
  RDMA/cxgb4: Fix LE hash collision bug for passive open connection
  RDMA/cxgb4: Fix LE hash collision bug for active open connection
  mlx4_core: Allow choosing flow steering mode
  mlx4_core: Adjustments to Flow Steering activation logic for SR-IOV
  mlx4_core: Fix error flow in the flow steering wrapper
  mlx4_core: Add QPN enforcement for flow steering rules set by VFs
  cxgb4: Add LE hash collision bug fix path in LLD driver
  cxgb4: Add T4 filter support
  IPoIB: Call skb_dst_drop() once skb is enqueued for sending
2012-12-21 16:40:26 -08:00
Roland Dreier d72623b665 Merge branches 'cxgb4', 'ipoib' and 'mlx4' into for-next 2012-12-19 23:03:43 -08:00
Vipul Pandya 793dad94e7 RDMA/cxgb4: Fix bug for active and passive LE hash collision path
Retries active opens for INUSE errors.

Logs any active ofld_connect_wr error replies.

Sends ofld_connect_wr on same ctrlq. It needs to go  on the same control txq as
regular CPL active/passive messages.

Retries on active open replies with EADDRINUSE.

Uses active open fw wr only if active filter region is set.

Adds stat for ofld_connect_wr failures.

This patch also adds debugfs file to show endpoints.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-19 23:03:12 -08:00
Vipul Pandya 1cab775c3e RDMA/cxgb4: Fix LE hash collision bug for passive open connection
It establishes passive open connection through firmware work request. Passive
open connection will go through this path as now instead of listening server we
create a server filter which will redirect the incoming SYN packet to the
offload queue. After this driver tries to establish the connection using
firmware work request.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-19 23:03:11 -08:00
Vipul Pandya 5be78ee924 RDMA/cxgb4: Fix LE hash collision bug for active open connection
It enables establishing active open connection using fw_ofld_connection work
request when cpl_act_open_rpl says TCAM full error which may be because
of LE hash collision. Current support is only for IPv4 active open connections.

Sets ntuple bits in active open requests. For T4 firmware greater than 1.4.10.0
ntuple bits are required to be set.

Adds nocong and enable_ecn module parameter options.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>

[ Move all FW return values to t4fw_api.h.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-19 23:02:43 -08:00
Roland Dreier b13912bbb4 IPoIB: Call skb_dst_drop() once skb is enqueued for sending
Currently, IPoIB delays collecting send completions for TX packets in
order to batch work more efficiently.  It does skb_orphan() right after
queuing the packets so that destructors run early, to avoid problems
like holding socket send buffers for too long (since we might not
collect a send completion until a long time after the packet is
actually sent).

However, IPoIB clears IFF_XMIT_DST_RELEASE because it actually looks
at skb_dst() to update the PMTU when it gets a too-long packet.  This
means that the packets sitting in the TX ring with uncollected send
completions are holding a reference on the dst.  We've seen this lead
to pathological behavior with respect to route and neighbour GC.  The
easy fix for this is to call skb_dst_drop() when we call skb_orphan().

Also, give packets sent via connected mode (CM) the same skb_orphan()
/ skb_dst_drop() treatment that packets sent via datagram mode get.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-19 09:16:43 -08:00
Linus Torvalds 16e024f30c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc update from Benjamin Herrenschmidt:
 "The main highlight is probably some base POWER8 support.  There's more
  to come such as transactional memory support but that will wait for
  the next one.

  Overall it's pretty quiet, or rather I've been pretty poor at picking
  things up from patchwork and reviewing them this time around and Kumar
  no better on the FSL side it seems..."

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (73 commits)
  powerpc+of: Rename and fix OF reconfig notifier error inject module
  powerpc: mpc5200: Add a3m071 board support
  powerpc/512x: don't compile any platform DIU code if the DIU is not enabled
  powerpc/mpc52xx: use module_platform_driver macro
  powerpc+of: Export of_reconfig_notifier_[register,unregister]
  powerpc/dma/raidengine: add raidengine device
  powerpc/iommu/fsl: Add PAMU bypass enable register to ccsr_guts struct
  powerpc/mpc85xx: Change spin table to cached memory
  powerpc/fsl-pci: Add PCI controller ATMU PM support
  powerpc/86xx: fsl_pcibios_fixup_bus requires CONFIG_PCI
  drivers/virt: the Freescale hypervisor driver doesn't need to check MSR[GS]
  powerpc/85xx: p1022ds: Use NULL instead of 0 for pointers
  powerpc: Disable relocation on exceptions when kexecing
  powerpc: Enable relocation on during exceptions at boot
  powerpc: Move get_longbusy_msecs into hvcall.h and remove duplicate function
  powerpc: Add wrappers to enable/disable relocation on exceptions
  powerpc: Add set_mode hcall
  powerpc: Setup relocation on exceptions for bare metal systems
  powerpc: Move initial mfspr LPCR out of __init_LPCR
  powerpc: Add relocation on exception vector handlers
  ...
2012-12-18 09:58:09 -08:00
Linus Torvalds 5bd665f28d Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull target updates from Nicholas Bellinger:
 "It has been a very busy development cycle this time around in target
  land, with the highlights including:

   - Kill struct se_subsystem_dev, in favor of direct se_device usage
     (hch)
   - Simplify reservations code by combining SPC-3 + SCSI-2 support for
     virtual backends only (hch)
   - Simplify ALUA code for virtual only backends, and remove left over
     abstractions (hch)
   - Pass sense_reason_t as return value for I/O submission path (hch)
   - Refactor MODE_SENSE emulation to allow for easier addition of new
     mode pages.  (roland)
   - Add emulation of MODE_SELECT (roland)
   - Fix bug in handling of ExpStatSN wrap-around (steve)
   - Fix bug in TMR ABORT_TASK lookup in qla2xxx target (steve)
   - Add WRITE_SAME w/ UNMAP=0 support for IBLOCK backends (nab)
   - Convert ib_srpt to use modern target_submit_cmd caller + drop
     legacy ioctx->kref usage (nab)
   - Convert ib_srpt to use modern target_submit_tmr caller (nab)
   - Add link_magic for fabric allow_link destination target_items for
     symlinks within target_core_fabric_configfs.c code (nab)
   - Allocate pointers in instead of full structs for
     config_group->default_groups (sebastian)
   - Fix 32-bit highmem breakage for FILEIO (sebastian)

  All told, hch was able to shave off another ~1K LOC by killing the
  se_subsystem_dev abstraction, along with a number of PR + ALUA
  simplifications.  Also, a nice patch by Roland is the refactoring of
  MODE_SENSE handling, along with the addition of initial MODE_SELECT
  emulation support for virtual backends.

  Sebastian found a long-standing issue wrt to allocation of full
  config_group instead of pointers for config_group->default_group[]
  setup in a number of areas, which ends up saving memory with big
  configurations.  He also managed to fix another long-standing BUG wrt
  to broken 32-bit highmem support within the FILEIO backend driver.

  Thank you again to everyone who contributed this round!"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits)
  target/iscsi_target: Add NodeACL tags for initiator group support
  target/tcm_fc: fix the lockdep warning due to inconsistent lock state
  sbp-target: fix error path in sbp_make_tpg()
  sbp-target: use simple assignment in tgt_agent_rw_agent_state()
  iscsi-target: use kstrdup() for iscsi_param
  target/file: merge fd_do_readv() and fd_do_writev()
  target/file: Fix 32-bit highmem breakage for SGL -> iovec mapping
  target: Add link_magic for fabric allow_link destination target_items
  ib_srpt: Convert TMR path to target_submit_tmr
  ib_srpt: Convert I/O path to target_submit_cmd + drop legacy ioctx->kref
  target: Make spc_get_write_same_sectors return sector_t
  target/configfs: use kmalloc() instead of kzalloc() for default groups
  target/configfs: allocate only 6 slots for dev_cg->default_groups
  target/configfs: allocate pointers instead of full struct for default_groups
  target: update error handling for sbc_setup_write_same()
  iscsit: use GFP_ATOMIC under spin lock
  iscsi_target: Remove redundant null check before kfree
  target/iblock: Forward declare bio helpers
  target: Clean up flow in transport_check_aborted_status()
  target: Clean up logic in transport_put_cmd()
  ...
2012-12-15 14:25:10 -08:00
Roland Dreier 01e0336598 Merge branch 'nes' into for-next 2012-12-08 20:45:48 -08:00
Tatyana Nikolova 7d9c199a55 RDMA/nes: Fix for crash when registering zero length MR for CQ
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-08 00:31:03 -08:00
Tatyana Nikolova 7bfcfa51c3 RDMA/nes: Fix for terminate timer crash
The terminate timer needs to be initialized just once.

Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-08 00:31:02 -08:00
Tatyana Nikolova 00ad255d17 RDMA/nes: Fix for BUG_ON due to adding already-pending timer
To avoid nes tcp_timer crash for SMP architectures, add_timer is
replaced with mod_timer.

Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-08 00:31:02 -08:00
Roland Dreier fb57e1dbbd Merge branch 'srp' into for-next 2012-11-30 17:41:04 -08:00
Bart Van Assche dc1bdbd9b8 IB/srp: Allow SRP disconnect through sysfs
Make it possible to disconnect the IB RC connection used by the SRP
protocol to communicate with a target.

Have the SRP transport layer create a sysfs "delete" attribute for
initiator drivers that support this functionality.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:33 -08:00
Vu Pham 55d93898a1 IB/srp: send disconnect request without waiting for CM timewait exit
Now that SRP recreates the CM ID, QP, and CQ for each connection,
there is no need to wait for the timewait state to complete.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:32 -08:00
Ishai Rabinovitz 73aa89ed9e IB/srp: destroy and recreate QP and CQs when reconnecting
HW QP FATAL errors persist over a reset operation, but we can recover
from that by recreating the QP and associated CQs for each connection.
Creating a new QP/CQ also completely forecloses any possibility of
getting stale completions or packets on the new connection.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>

[ updated to current code from OFED, cleaned up commit message ]

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:32 -08:00
Bart Van Assche ef6c49d87c IB/srp: Eliminate state SRP_TARGET_DEAD
Only queue removal work after having changed the target state
into SRP_TARGET_REMOVED and not if that state was already equal
to SRP_TARGET_REMOVED.  That allows us to remove the state
SRP_TARGET_DEAD.  Add a call to srp_disconnect_target() in
srp_remove_target() -- due to previous changes it is now safe to
invoke that function even if the IB connection has already
been disconnected.  This change allows us to replace the target
removal code in srp_remove_one() by an (indirect) call to
srp_remove_target().  Rename srp_target_port.work into
srp_target_port.remove_work to reflect its usage.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:31 -08:00
Bart Van Assche ee12d6a80c IB/srp: Introduce the helper function srp_remove_target()
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:31 -08:00
Bart Van Assche 294c875a65 IB/srp: Suppress superfluous error messages
Keep track of the connection state.  Only report QP errors while
connected.  Only invoke ib_send_cm_dreq() when connected so that
invoking srp_disconnect_target() after having received a DREQ does not
cause an error message to be printed.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:31 -08:00
Bart Van Assche 4f0af69799 IB/srp: Process all error completions
If the RDMA RC connection is closed, tell the SCSI mid-layer to
terminate all pending commands instead of only the first.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:31 -08:00
Bart Van Assche 948d1e889e IB/srp: Introduce srp_handle_qp_err()
Introduce the function srp_handle_qp_err(), change the type of
qp_in_error from int into bool and move the initialization of that
variable from srp_reconnect_target() to srp_connect_target().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:30 -08:00
Bart Van Assche 224db15743 IB/srp: Simplify SCSI error handling
Since scsi_remove_host() has been modified so that SCSI error handling
functions will no longer be invoked after scsi_remove_host() returns,
the test at the start of srp_send_tsk_mgmt() is now superfluous.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:30 -08:00
Bart Van Assche f371823120 IB/srp: Keep processing commands during host removal
Some SCSI upper layer drivers, e.g. sd, issue SCSI commands from
inside scsi_remove_host() (see the sd_shutdown() call in sd_remove()).
Make sure that these commands have a chance to reach the SCSI device.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:30 -08:00
Bart Van Assche 09be70a238 IB/srp: Eliminate state SRP_TARGET_CONNECTING
Block the SCSI host while reconnecting instead of representing the
reconnection activity as a distinct SRP target state.  This allows us
to eliminate the target state SRP_TARGET_CONNECTING.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:29 -08:00
Bart Van Assche c9b03c1ae5 IB/srp: Increase block layer timeout
Increase the block layer timeout for disks so that it is above the
InfiniBand transport layer timeout.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:29 -08:00
Roland Dreier 78c90247af Merge branches 'cma' and 'mlx4' into for-next 2012-11-29 12:16:43 -08:00
shefty 63f05be2c0 RDMA/cm: Change return value from find_gid_port()
Problem reported by Dan Carpenter <dan.carpenter@oracle.com>:

The patch 3c86aa70bf: "RDMA/cm: Add RDMA CM support for IBoE
devices" from Oct 13, 2010, leads to the following warning:
net/sunrpc/xprtrdma/svc_rdma_transport.c:722 svc_rdma_create()
	 error: passing non neg 1 to ERR_PTR

This bug would result in a NULL dereference.  svc_rdma_create() is
supposed to return ERR_PTRs or valid pointers, but instead it returns
ERR_PTRs, valid pointers and 1.

The call tree is:

svc_rdma_create()
   => rdma_bind_addr()
      => cma_acquire_dev()
         => find_gid_port()

rdma_bind_addr() should return a valid errno.  Fix this by having
find_gid_port() also return a valid errno.  If we can't find the
specified GID on a given port, return -EADDRNOTAVAIL, rather than
-EAGAIN, to better indicate the error.  We also drop using the
special return value of '1' and instead pass through the error
returned by the underlying verbs call.  On such errors, rather
than aborting the search,  we simply continue to check the next
device/port.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-29 12:16:29 -08:00
Jack Morgenstein ceb7decb36 IB/mlx4: Fix spinlock order to avoid lockdep warnings
lockdep warns about taking a hard-irq-unsafe lock (sriov->id_map_lock)
inside a hard-irq-safe lock (sriov->going_down_lock).

Since id_map_lock is never taken in the interrupt context, we can
simply reverse the order of taking the two spinlocks, thus avoiding
the warning and the depencency.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-29 12:14:45 -08:00
Nicholas Bellinger 3e4f574857 ib_srpt: Convert TMR path to target_submit_tmr
This patch converts the TMR path in srpt_handle_tsk_mgmt() to use
target_submit_tmr() with TARGET_SCF_ACK_KREF flag usage.

v2: Drop ununused res in target_submit_tmr (Fengguang.Wu)

Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-11-28 11:20:28 -08:00
Nicholas Bellinger 9474b04313 ib_srpt: Convert I/O path to target_submit_cmd + drop legacy ioctx->kref
This patch converts the main srpt_handle_cmd() I/O path to use modern
target_submit_cmd() with TARGET_SCF_ACK_KREF flag usage.  This includes
dropping the original internal ioctx->kref + srpt_put_send_ioctx() usage
in favor of target_put_sess_cmd() w/ se_cmd_t->cmd_kref within ib_srpt
response callbacks.

It also updates srpt_abort_cmd() to call target_put_sess_cmd() for
completion of aborted commands, and adds target_wait_for_sess_cmds() into
srpt_release_channel_work() to allow outstanding I/O to complete during
session shutdown.

Also, go ahead and update srpt_handle_tsk_mgmt() to make the remaining
transport_init_se_cmd() to setup the ioctx->cmd with se_tmr_req.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-11-28 01:10:59 -08:00
Roland Dreier d0951d2133 Merge branches 'cxgb4', 'misc', 'mlx4', 'nes' and 'uapi' into for-next 2012-11-26 11:08:41 -08:00
Julia Lawall 5107c2a3d1 RDMA/cxgb3: use WARN
Use WARN rather than printk followed by WARN_ON(1), for conciseness.

A simplified version of the semantic patch that makes this transformation
is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression list es;
@@

-printk(
+WARN(1,
  es);
-WARN_ON(1);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-26 11:08:16 -08:00
Julia Lawall 76f267b7ad RDMA/cxgb4: use WARN
Use WARN rather than printk followed by WARN_ON(1), for conciseness.

A simplified version of the semantic patch that makes this transformation
is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression list es;
@@

-printk(
+WARN(1,
  es);
-WARN_ON(1);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-26 11:07:18 -08:00
Or Gerlitz 08ff32352d mlx4: 64-byte CQE/EQE support
ConnectX-3 devices can use either 64- or 32-byte completion queue
entries (CQEs) and event queue entries (EQEs).  Using 64-byte
EQEs/CQEs performs better because each entry is aligned to a complete
cacheline.  This patch queries the HCA's capabilities, and if it
supports 64-byte CQEs and EQES the driver will configure the HW to
work in 64-byte mode.

The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ.

Since this mode is global, userspace (libmlx4) must be updated to work
with the configured CQE size, and guests using SR-IOV virtual
functions need to know both EQE and CQE size.

In case one of the 64-byte CQE/EQE capabilities is activated, the
patch makes sure that older guest drivers that use the QUERY_DEV_FUNC
command (e.g as done in mlx4_core of Linux 3.3..3.6) will notice that
they need an update to be able to work with the PPF. This is done by
changing the returned pf_context_behaviour not to be zero any more. In
case none of these capabilities is activated that value remains zero
and older guest drivers can run OK.

The SRIOV related flow is as follows

1. the PPF does the detection of the new capabilities using
   QUERY_DEV_CAP command.

2. the PPF activates the new capabilities using INIT_HCA.

3. the VF detects if the PPF activated the capabilities using
   QUERY_HCA, and if this is the case activates them for itself too.

Note that the VF detects that it must be aware to the new PF behaviour
using QUERY_FUNC_CAP.  Steps 1 and 2 apply also for native mode.

User space notification is done through a new field introduced in
struct mlx4_ib_ucontext which holds device capabilities for which user
space must take action. This changes the binary interface so the ABI
towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only
when **needed** i.e. only when the driver does use 64-byte CQEs or
future device capabilities which must be in sync by user space. This
practice allows to work with unmodified libmlx4 on older devices (e.g
A0, B0) which don't support 64-byte CQEs.

In order to keep existing systems functional when they update to a
newer kernel that contains these changes in VF and userspace ABI, a
module parameter enable_64b_cqe_eqe must be set to enable 64-byte
mode; the default is currently false.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-26 10:19:17 -08:00
Benjamin Herrenschmidt 2a859ab07b Merge branch 'merge' into next
Merge my own merge branch to get various fixes from there
and upstream, especially the hvc console tty refcouting fixes
which which testing is quite a bit harder...
2012-11-26 09:23:57 +11:00