Commit graph

1775 commits

Author SHA1 Message Date
Parav Pandit 4ad6a0245e IB/{core, cm, cma, ipoib}: Rename ib_init_ah_from_path to ib_init_ah_attr_from_path
Since ib_init_ah_from_path initializes the address handle attribute, it is
renamed to reflect so.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:11 -07:00
Parav Pandit dbb12562f7 IB/{core, ipoib}: Simplify ib_find_gid to search only for IB link layer
Currently there are no users of ib_find_gid for RoCE transport. It is
only used by IPoIB.
Therefore its simplified to ignore RoCE ports and GID type check which
was previously done for every port.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:10 -07:00
Erez Shitrit 98aebc550b IB/ipoib: Update pathrec field if not valid record
In case that the PathRecord is not valid (SM changed its network prefix)
ipoib will continue issue PathQuery requests with the same parameters
that are in its database, which are no longer valid anymore.

Now the driver in that case will re-initialize the record from a valid
place (the priv structure keeps the updated values), and a valid request
will be issued.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:05 -07:00
Erez Shitrit 439000892e IB/ipoib: Avoid memory leak if the SA returns a different DGID
The ipoib path database is organized around DGIDs from the LLADDR, but the
SA is free to return a different GID when asked for path. This causes a
bug because the SA's modified DGID is copied into the database key, even
though it is no longer the correct lookup key, causing a memory leak and
other malfunctions.

Ensure the database key does not change after the SA query completes.

Demonstration of the bug is as  follows
ipoib wants to send to GID fe80:0000:0000:0000:0002:c903:00ef:5ee2, it
creates new record in the DB with that gid as a key, and issues a new
request to the SM.
Now, the SM from some reason returns path-record with other SGID (for
example, 2001:0000:0000:0000:0002:c903:00ef:5ee2 that contains the local
subnet prefix) now ipoib will overwrite the current entry with the new
one, and if new request to the original GID arrives ipoib  will not find
it in the DB (was overwritten) and will create new record that in its
turn will also be overwritten by the response from the SM, and so on
till the driver eats all the device memory.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:05 -07:00
Yuval Shaia ac6dbf7fa4 IB/ipoib: Warn when one port fails to initialize
If one port fails to initialize an error message should indicate the
reason and driver should continue serving the working port(s) and other
HCA(s).

Fixes: e4b2d06892 ("IB/ipoib: Remove device when one port fails to init").
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-13 10:55:49 -07:00
Yuval Shaia 9d98e19ba0 IB/ipoib: Restore MM behavior in case of tx_ring allocation failure
memalloc_noio_save modifies the behavior of MM, we must restore it after
we are done.

Fixes: d83187dda9 ("IB/IPoIB: Convert IPoIB to memalloc_noio_* calls")
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-13 10:31:57 -07:00
Andy Shevchenko e711f968c4 IB/srp: replace custom implementation of hex2bin()
There is no need to have a duplication of the generic library, i.e.
hex2bin().

Replace the open coded variant.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-11 16:19:45 -07:00
Yuval Shaia c55359a23c IB/ipoib: Replace printk with pr_warn
pr_* is the preferred way to print messages, replace all
printk(KERN_WARN, ...) with pr_warn.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-11 16:19:43 -07:00
Linus Torvalds ad0835a930 Updates for 4.15 kernel merge window
- Add iWARP support to qedr driver
 - Lots of misc fixes across subsystem
 - Multiple update series to hns roce driver
 - Multiple update series to hfi1 driver
 - Updates to vnic driver
 - Add kref to wait struct in cxgb4 driver
 - Updates to i40iw driver
 - Mellanox shared pull request
 - timer_setup changes
 - massive cleanup series from Bart Van Assche
 - Two series of SRP/SRPT changes from Bart Van Assche
 - Core updates from Mellanox
 - i40iw updates
 - IPoIB updates
 - mlx5 updates
 - mlx4 updates
 - hns updates
 - bnxt_re fixes
 - PCI write padding support
 - Sparse/Smatch/warning cleanups/fixes
 - CQ moderation support
 - SRQ support in vmw_pvrdma
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaDF9JAAoJELgmozMOVy/dDXUP/i92g+G4OJ+4hHMh4KCjQMHT
 eMr/w9l1C033HrtsU1afPhqHOsKSxwCuJSiTgN4uXIm67/2kPK5Vlx+ir7mbOLwB
 3ukVK6Q/aFdigWCUhIaJSlDpjbd2sEj7JwKtM3rucvMWJlBJ4mAbcVQVfU96CCsv
 V9mO7dpR3QtYWDId9DukfnAfPUPFa3SMZnD7tdl6mKNRg/MjWGYLAL4nJoBfex5f
 b4o+MTrbuFWXYsfDru1m9BpHgyul20ldfcnbe8C/sVOQmOgkX7ngD5Sdi1FLeRJP
 GF/DnAqInC9N7cAxZHx4kH9x6mLMmEdfnwQ9VTVqGUHBsj3H4hQTVIAFfHUhWUbG
 TP5ZHgZG2CewZ0rf092cWlDZwp6n0BalnbQJr+QN4MzPmYbofs3AccSKUwrle+e+
 E6yYf4XxJdt7wRr4F1QKygtUEXSnNkNYUDQ4ZFbpJS/D4Sq80R1ZV/WZ7PJxm1D/
 EIKoi7NU9cbPMIlbCzn8kzgfjS7Pe4p0WW/Xxc/IYmACzpwNPkZuFGSND79ksIpF
 jhHqwZsOWFuXISjvcR4loc8wW6a5w5vjOiX0lLVz0NSdXSzVqav/2at7ZLDx/PT+
 Lh9YVL51akA3hiD+3X6iOhfOUu6kskjT9HijE5T8rJnf0V+C6AtIRpwrQ7ONmjJm
 3JMrjjLxtCIvpUyzCvDW
 =A1oL
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "This is a fairly plain pull request. Lots of driver updates across the
  stack, a huge number of static analysis cleanups including a close to
  50 patch series from Bart Van Assche, and a number of new features
  inside the stack such as general CQ moderation support.

  Nothing really stands out, but there might be a few conflicts as you
  take things in. In particular, the cleanups touched some of the same
  lines as the new timer_setup changes.

  Everything in this pull request has been through 0day and at least two
  days of linux-next (since Stephen doesn't necessarily flag new
  errors/warnings until day2). A few more items (about 30 patches) from
  Intel and Mellanox showed up on the list on Tuesday. I've excluded
  those from this pull request, and I'm sure some of them qualify as
  fixes suitable to send any time, but I still have to review them
  fully. If they contain mostly fixes and little or no new development,
  then I will probably send them through by the end of the week just to
  get them out of the way.

  There was a break in my acceptance of patches which coincides with the
  computer problems I had, and then when I got things mostly back under
  control I had a backlog of patches to process, which I did mostly last
  Friday and Monday. So there is a larger number of patches processed in
  that timeframe than I was striving for.

  Summary:
   - Add iWARP support to qedr driver
   - Lots of misc fixes across subsystem
   - Multiple update series to hns roce driver
   - Multiple update series to hfi1 driver
   - Updates to vnic driver
   - Add kref to wait struct in cxgb4 driver
   - Updates to i40iw driver
   - Mellanox shared pull request
   - timer_setup changes
   - massive cleanup series from Bart Van Assche
   - Two series of SRP/SRPT changes from Bart Van Assche
   - Core updates from Mellanox
   - i40iw updates
   - IPoIB updates
   - mlx5 updates
   - mlx4 updates
   - hns updates
   - bnxt_re fixes
   - PCI write padding support
   - Sparse/Smatch/warning cleanups/fixes
   - CQ moderation support
   - SRQ support in vmw_pvrdma"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (296 commits)
  RDMA/core: Rename kernel modify_cq to better describe its usage
  IB/mlx5: Add CQ moderation capability to query_device
  IB/mlx4: Add CQ moderation capability to query_device
  IB/uverbs: Add CQ moderation capability to query_device
  IB/mlx5: Exposing modify CQ callback to uverbs layer
  IB/mlx4: Exposing modify CQ callback to uverbs layer
  IB/uverbs: Allow CQ moderation with modify CQ
  iw_cxgb4: atomically flush the qp
  iw_cxgb4: only call the cq comp_handler when the cq is armed
  iw_cxgb4: Fix possible circular dependency locking warning
  RDMA/bnxt_re: report vlan_id and sl in qp1 recv completion
  IB/core: Only maintain real QPs in the security lists
  IB/ocrdma_hw: remove unnecessary code in ocrdma_mbx_dealloc_lkey
  RDMA/core: Make function rdma_copy_addr return void
  RDMA/vmw_pvrdma: Add shared receive queue support
  RDMA/core: avoid uninitialized variable warning in create_udata
  RDMA/bnxt_re: synchronize poll_cq and req_notify_cq verbs
  RDMA/bnxt_re: Flush CQ notification Work Queue before destroying QP
  RDMA/bnxt_re: Set QP state in case of response completion errors
  RDMA/bnxt_re: Add memory barriers when processing CQ/EQ entries
  ...
2017-11-15 14:54:53 -08:00
Linus Torvalds 1be2172e96 Modules updates for v4.15
Summary of modules changes for the 4.15 merge window:
 
 - Treewide module_param_call() cleanup, fix up set/get function
   prototype mismatches, from Kees Cook
 
 - Minor code cleanups
 
 Signed-off-by: Jessica Yu <jeyu@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJaDCyzAAoJEMBFfjjOO8FyaYQP/AwHBy6XmwwVlWDP4BqIF6hL
 Vhy3ccVLYEORvePv68tWSRPUz5n6+1Ebqanmwtkw6i8l+KwxY2SfkZql09cARc33
 2iBE4bHF98iWQmnJbF6me80fedY9n5bZJNMQKEF9VozJWwTMOTQFTCfmyJRDBmk9
 iidQj6M3idbSUOYIJjvc40VGx5NyQWSr+FFfqsz1rU5iLGRGEvA3I2/CDT0oTuV6
 D4MmFxzE2Tv/vIMa2GzKJ1LGScuUfSjf93Lq9Kk0cG36qWao8l930CaXyVdE9WJv
 bkUzpf3QYv/rDX6QbAGA0cada13zd+dfBr8YhchclEAfJ+GDLjMEDu04NEmI6KUT
 5lP0Xw0xYNZQI7bkdxDMhsj5jaz/HJpXCjPCtZBnSEKiL4OPXVMe+pBHoCJ2/yFN
 6M716XpWYgUviUOdiE+chczB5p3z4FA6u2ykaM4Tlk0btZuHGxjcSWwvcIdlPmjm
 kY4AfDV6K0bfEBVguWPJicvrkx44atqT5nWbbPhDwTSavtsuRJLb3GCsHedx7K8h
 ZO47lCQFAWCtrycK1HYw+oupNC3hYWQ0SR42XRdGhL1bq26C+1sei1QhfqSgA9PQ
 7CwWH4UTOL9fhtrzSqZngYOh9sjQNFNefqQHcecNzcEjK2vjrgQZvRNWZKHSwaFs
 fbGX8juZWP4ypbK+irTB
 =c8vb
 -----END PGP SIGNATURE-----

Merge tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux

Pull module updates from Jessica Yu:
 "Summary of modules changes for the 4.15 merge window:

   - treewide module_param_call() cleanup, fix up set/get function
     prototype mismatches, from Kees Cook

   - minor code cleanups"

* tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  module: Do not paper over type mismatches in module_param_call()
  treewide: Fix function prototypes for module_param_call()
  module: Prepare to convert all module_param_call() prototypes
  kernel/module: Delete an error message for a failed memory allocation in add_module_usage()
2017-11-15 13:46:33 -08:00
Leon Romanovsky 4190b4e969 RDMA/core: Rename kernel modify_cq to better describe its usage
Current ib_modify_cq() is used to set CQ moderation parameters.

This patch renames ib_modify_cq() to be rdma_set_cq_moderation(),
because the kernel version of RDMA API doesn't need to follow already
exposed to user's API pattern (create_XXX/modify_XXX/query_XXX/destroy_XXX)
and better to have more accurate name which describes the actual usage.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 16:59:22 -05:00
Bart Van Assche 57b0c46054 IB/srpt: Ensure that modifying the use_srq configfs attribute works
The use_srq configfs attribute is created after it is read. Hence
modify srpt_tpg_attrib_use_srq_store() such that this function
switches dynamically between non-SRQ and SRQ mode.

Reported-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:25:16 -05:00
Bart Van Assche 8b6dc529ea IB/srpt: Wait until channel release has finished during module unload
Introduce the helper function srpt_set_enabled(). Protect
sport->enabled changes with sdev->mutex. Makes configfs writes
into 'enabled' wait until all channel resources have been freed.
Wait until channel release has finished during kernel module
unload.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:25:16 -05:00
Bart Van Assche 01b3ee13c2 IB/srpt: Introduce srpt_disconnect_ch_sync()
Except for changing a BUG_ON() call into a WARN_ON_ONCE() call, this
patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:25:16 -05:00
Bart Van Assche c76d7d64f8 IB/srpt: Introduce helper functions for SRQ allocation and freeing
The only functional change in this patch is in the srpt_add_one()
error path: if allocating the ring buffer for the SRQ fails, fall
back to non-SRQ mode instead of disabling SRP target functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:25:16 -05:00
Mike Marciniszyn 321e329b9c IB/srpt: Post receive work requests after qp transition to INIT state
IB and iWARP specs both spell out that posting a receive work request
to a queue pair in the RESET state is an invalid operation and required
to fail.  Postpone posting receive work requests until after the
transition to the INIT state.

Fixes: commit dea262094c ("IB/srpt: Change default behavior from using SRQ to using RC")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:25:16 -05:00
David S. Miller 2a171788ba Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Files removed in 'net-next' had their license header updated
in 'net'.  We take the remove from 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:26:51 +09:00
Greg Kroah-Hartman b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Kees Cook e4dca7b7aa treewide: Fix function prototypes for module_param_call()
Several function prototypes for the set/get functions defined by
module_param_call() have a slightly wrong argument types. This fixes
those in an effort to clean up the calls when running under type-enforced
compiler instrumentation for CFI. This is the result of running the
following semantic patch:

@match_module_param_call_function@
declarer name module_param_call;
identifier _name, _set_func, _get_func;
expression _arg, _mode;
@@

 module_param_call(_name, _set_func, _get_func, _arg, _mode);

@fix_set_prototype
 depends on match_module_param_call_function@
identifier match_module_param_call_function._set_func;
identifier _val, _param;
type _val_type, _param_type;
@@

 int _set_func(
-_val_type _val
+const char * _val
 ,
-_param_type _param
+const struct kernel_param * _param
 ) { ... }

@fix_get_prototype
 depends on match_module_param_call_function@
identifier match_module_param_call_function._get_func;
identifier _val, _param;
type _val_type, _param_type;
@@

 int _get_func(
-_val_type _val
+char * _val
 ,
-_param_type _param
+const struct kernel_param * _param
 ) { ... }

Two additional by-hand changes are included for places where the above
Coccinelle script didn't notice them:

	drivers/platform/x86/thinkpad_acpi.c
	fs/lockd/svc.c

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
2017-10-31 15:30:37 +01:00
Erez Shitrit b9595c5bac IB/ipoib: Change number of TX wqe to 64
NAPI budget is 64 packets, while maximum polling size for
the send CQ is 16. Let's bring them in sync, so the NAPI
budget will be reused completely.

Cc: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-25 13:36:50 -04:00
Erez Shitrit 8966e28d2e IB/ipoib: Use NAPI in UD/TX flows
Instead of explicit call to poll_cq of the tx ring, use the NAPI mechanism
to handle the completions of each packet that has been sent to the HW.

The next major changes were taken:
 * The driver init completion function in the creation of the send CQ,
   that function triggers the napi scheduling.
 * The driver uses CQ for RX for both modes UD and CM, and CQ for TX
   for CM and UD.

Cc: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-25 13:36:50 -04:00
Erez Shitrit 2c104ea683 IB/ipoib: Get rid of the tx_outstanding variable in all modes
The first step toward using NAPI in the UD/TX flow is to separate
between two flows, the NAPI and the xmit, meaning no use of shared
variables between both flows.

This patch takes out the tx_outstanding variable that was used in both
flows and instead the driver uses the 2 cyclic ring variables: tx_head
and tx_tail, tx_head used in the xmit flow and tx_tail in the NAPI flow.

Cc: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-25 13:36:50 -04:00
Doug Ledford 894b82c427 Merge branch 'timer_setup' into for-next
Conflicts:
	drivers/infiniband/hw/cxgb4/cm.c
	drivers/infiniband/hw/qib/qib_driver.c
	drivers/infiniband/hw/qib/qib_mad.c

There were minor fixups needed in these files.  Just minor context diffs
due to patches from independent sources touching the same basic area.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 13:12:09 -04:00
Doug Ledford 754137a769 Merge branch 'for-next-early' into for-next
The early for-next branch was based on v4.14-rc2, while the shared pull
request I got from Mellanox used a v4.14-rc4 base.  I'm making the
branch that was the shared Mellanox pull request the new for-next branch
and merging the early for-next branch into it.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 13:07:13 -04:00
Bart Van Assche 4c532d6ce1 IB/srp: Make CM timeout dependent on subnet timeout
For small networks it is safe to reduce the subnet timeout from
its default value (18 for opensm) to 16. Make the SRP CM timeout
dependent on the subnet timeout such that decreasing the subnet
timeout also causes SRP failover and failback to occur faster.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 10:49:54 -04:00
Bart Van Assche cee687b68d IB/srp: Cache global rkey
This is a micro-optimization for the hot path.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 10:49:54 -04:00
Bart Van Assche 9566b05492 IB/srp: Remove second argument of srp_destroy_qp()
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 10:49:54 -04:00
Bart Van Assche 8a0d18c621 IB/srp: Avoid that a cable pull can trigger a kernel crash
This patch fixes the following kernel crash:

general protection fault: 0000 [#1] PREEMPT SMP
Workqueue: ib_mad2 timeout_sends [ib_core]
Call Trace:
 ib_sa_path_rec_callback+0x1c4/0x1d0 [ib_core]
 send_handler+0xb2/0xd0 [ib_core]
 timeout_sends+0x14d/0x220 [ib_core]
 process_one_work+0x200/0x630
 worker_thread+0x4e/0x3b0
 kthread+0x113/0x150

Fixes: commit aef9ec39c4 ("IB: Add SCSI RDMA Protocol (SRP) initiator")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 10:49:54 -04:00
Bart Van Assche dea262094c IB/srpt: Change default behavior from using SRQ to using RC
Although in the RC mode more resources are needed that mode has three
advantages over SRQ:
- It works with all RDMA adapters, even those that do not support
  SRQ.
- Posting WRs and polling WCs does not trigger lock contention
  because only one thread at a time accesses a WR or WC queue in
  non-SRQ mode.
- The end-to-end flow control mechanism is used.

>From the IB spec:

    C9-150.2.1: For QPs that are not associated with an SRQ, each HCA
    receive queue shall generate end-to-end flow control credits. If
    a QP is associated with an SRQ, the HCA receive queue shall not
    generate end-to-end flow control credits.

Add new configfs attributes that allow to configure which mode to use
(/sys/kernel/config/target/srpt/$GUID/$GUID/attrib/use_srq). Note:
only the attribute for port 1 is relevant on multi-port adapters.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 10:49:54 -04:00
Bart Van Assche 74333f1223 IB/srpt: Cache global L_Key
This patch is a micro-optimization for the hot path.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 10:49:54 -04:00
Bart Van Assche 7a01d05cd5 IB/srpt: Limit the send and receive queue sizes to what the HCA supports
Additionally, correct the comment about ch->rq_size.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 10:49:54 -04:00
Bart Van Assche c70ca38960 IB/srpt: Do not accept invalid initiator port names
Make srpt_parse_i_port_id() return a negative value if hex2bin()
fails.

Fixes: commit a42d985bd5 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 10:49:54 -04:00
Bart Van Assche d2271bd083 RDMA/isert: Suppress gcc 7 fall-through complaints
Avoid that gcc 7 reports the following warning when building with W=1:

warning: this statement may fall through [-Wimplicit-fallthrough=]

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-14 20:47:05 -04:00
Alex Vesker 980f91c3a6 IB/ipoib: Add ability to set PKEY index to lower device driver
To support passing child interfaces to the lower device a new
rdma_netdev function was used, set_id. This will allow us to
attach the PKEY index lower device resources such as TIS/QP.
For devices that do not support offloads in IPoIB same logic
will be used, setting the PKEY index to priv struct.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
2017-10-14 11:22:08 -07:00
Alex Vesker b4b678b06f IB/ipoib: Grab rtnl lock on heavy flush when calling ndo_open/stop
When ndo_open and ndo_stop are called RTNL lock should be held.
In this specific case ipoib_ib_dev_open calls the offloaded ndo_open
which re-sets the number of TX queue assuming RTNL lock is held.
Since RTNL lock is not held, RTNL assert will fail.

Signed-off-by: Alex Vesker <valex@mellanox.com>
2017-10-14 11:22:08 -07:00
Bart Van Assche 9d18717790 IB/core: Simplify sa_path_set_[sd]lid() calls
Instead of making every caller convert the second argument of
sa_path_set_slid() and sa_path_set_dlid() to big endian format,
make these two functions accept LIDs in CPU endian format.
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Cc: Don Hiatt <don.hiatt@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-10 10:49:44 -04:00
Kees Cook 6d290d69ac IB/ipoib: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Alex Vesker <valex@mellanox.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: Zhu Yanjun <yanjun.zhu@oracle.com>
Cc: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Yuval Shaia <yuval.shaia@oracle.com>
Cc: linux-rdma@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-09 12:19:41 -04:00
Ajaykumar Hotchandani b04dc19990 IB/{ipoib, iser}: Consistent print format of vendor error
Vendor error print should be consistent across protocols to avoid any
confusion.
This patch corrects that.

Suggested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29 11:18:56 -04:00
Doug Ledford 0ff4b7e6db Merge branch 'vnic' into k.o/for-next
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29 11:13:50 -04:00
Niranjana Vishwanathapura b209a368eb IB/opa_vnic: Add routing control information
Add protocol specific routing control information in the encapsulation
header as per the configuration.

Reviewed-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Scott Franco <safranco@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 11:21:57 -04:00
Niranjana Vishwanathapura e82b7c388a IB/opa_vnic: Properly set vesw port status
Update eth_link_status and operating status information to
represent the overall status of the virtual Ethernet switch port.

Reviewed-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 11:21:57 -04:00
Scott Franco 4bbdfe2560 IB/opa_vnic: Properly clear Mac Table Digest
Clear the MAC table digest when the MAC table is freed.

Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Scott Franco <safranco@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 11:21:57 -04:00
Niranjana Vishwanathapura b77eb45e0d IB/opa_vnic: Properly return the total MACs in UC MAC list
Do not include EM specified MAC address in total MACs of the
UC MAC list.

Reviewed-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 11:21:57 -04:00
Niranjana Vishwanathapura 5a5a85da40 IB/opa_vnic: Allow reset of MAC address
Ensure MAC address is reset while deleting the vesw port.

Reviewed-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 11:21:57 -04:00
Niranjana Vishwanathapura 7f291d2869 IB/opa_vnic: Set POD value for Ethernet MTU
Set power on default value of 1500 for eth_mtu.

Reviewed-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 11:21:57 -04:00
Niranjana Vishwanathapura 62f1e84e46 IB/opa_vnic: Mark unused Ethernet MTU fields as reserved
Per pcp mtu fields are not used, mark them as reserved.

Reviewed-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 11:21:57 -04:00
Alex Estrin 612601d001 Revert "IB/ipoib: Update broadcast object if PKey value was changed in index 0"
commit 9a9b811269 will cause core to fail UD QP from being destroyed
on ipoib unload, therefore cause resources leakage.
On pkey change event above patch modifies mgid before calling underlying
driver to detach it from QP. Drivers' detach_mcast() will fail to find
modified mgid it was never given to attach in a first place.
Core qp->usecnt will never go down, so ib_destroy_qp() will fail.

IPoIB driver actually does take care of new broadcast mgid based on new
pkey by destroying an old mcast object in ipoib_mcast_dev_flush())
....
	if (priv->broadcast) {
		rb_erase(&priv->broadcast->rb_node, &priv->multicast_tree);
		list_add_tail(&priv->broadcast->list, &remove_list);
		priv->broadcast = NULL;
	}
...

then in restarted ipoib_macst_join_task() creating a new broadcast mcast
object, sending join request and on completion tells the driver to attach
to reinitialized QP:
...
if (!priv->broadcast) {
...
	broadcast = ipoib_mcast_alloc(dev, 0);
...
	memcpy(broadcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
	       sizeof (union ib_gid));
	priv->broadcast = broadcast;
...

Fixes: 9a9b811269 ("IB/ipoib: Update broadcast object if PKey value was changed in index 0")
Cc: stable@vger.kernel.org
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 11:10:36 -04:00
Yuval Shaia e4b2d06892 IB/ipoib: Remove device when one port fails to init
Call ipoib_remove_one when one of the IPoIB ports fails to initialize in
order not to leave the module in unstable state.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 09:17:51 -04:00
Yuval Shaia 931bc0d916 IB: Move PCI dependency from root KConfig to HW's KConfigs
No reason to have dependency on PCI for the entire infiniband stack so
move it to KConfig of only the drivers that actually using PCI.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 08:54:19 -04:00
Alex Vesker 7c9d966210 IB/ipoib: Fix inconsistency with free_netdev and free_rdma_netdev
Call free_rdma_netdev instead of free_netdev each time we want to
release a netdevice. This call is also relevant for future freeing
of offloaded child interfaces.

This patch also adds a missing call for free netdevice when releasing
a parent interface that has child interfaces using ipoib_remove_one.

Fixes: cd565b4b51 ('IB/IPoIB: Support acceleration options callbacks')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25 11:47:24 -04:00
Shalom Lagziel 9c6f42e925 IB/ipoib: Fix sysfs Pkey create<->remove possible deadlock
A possible ABBA lock can happen with RTNL and vlan_rwsem.
For example:

Flow A: Device Flush
	__ipoib_ib_dev_flush
	down_read(vlan_rwsem) 			// Lock A
	ipoib_flush_ah
	flush_workqueue(priv->wq) 		// Wait for completion
	A work on shared WQ (Mcast carrier)
		ipoib_mcast_carrier_on_task
		while (!rtnl_trylock())         // Wait for lock B

Flow B: Sysfs PKEY delete
	ipoib_vlan_delete
	lock(RTNL) 				// Lock B
	down_write(vlan_rwsem) 			// Wait for lock A

This can happen with PKEY creates as well. The solution is to release
the RTNL lock in sysfs functions in case it is not possible to lock
VLAN RW semaphore and reset the SYS call.

Fixes: 69956d8326 ("IB/ipoib: Sync between remove_one to sysfs calls that use rtnl_lock")
Signed-off-by: Shalom Lagziel <shaloml@mellanox.com>
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25 11:47:23 -04:00
Parav Pandit edd3155114 IB: Correct MR length field to be 64-bit
The ib_mr->length represents the length of the MR in bytes as per
the IBTA spec 1.3 section 11.2.10.3 (REGISTER PHYSICAL MEMORY REGION).

Currently ib_mr->length field is defined as only 32-bits field.
This might result into truncation and failed WRs of consumers who
registers more than 4GB bytes memory regions and whose WRs accessing
such MRs.

This patch makes the length 64-bit to avoid such truncation.

Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Faisal Latif <faisal.latif@intel.com>
Fixes: 4c67e2bfc8 ("IB/core: Introduce new fast registration API")
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25 11:47:23 -04:00
Santosh Shilimkar af3c79beff IB/ipoib: Suppress the retry related completion errors
IPoIB doesn't support transport/rnr retry schemes as per
RFC so those errors are expected. No need to flood the
log files with them.

Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Reported-by: Rajiv Raja <rajiv.raja@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-22 13:12:36 -04:00
Doug Ledford a1139697ad Merge branch 'mellanox' into k.o/for-next
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 20:25:15 -04:00
Feras Daoud 31a8236276 IB/ipoib: Enable ioctl for to IPoIB rdma netdevs
Adds support for ioctl callback in the RDMA netdevs to allow
supporting functions not handled by the generic interface code.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Eitan Rabin <rabin@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 17:34:57 -04:00
Erez Shitrit 69956d8326 IB/ipoib: Sync between remove_one to sysfs calls that use rtnl_lock
In order to avoid deadlock between sysfs functions (like create/delete
child) and remove_one (both of them are using the sysfs lock and
rtnl_lock) the driver will use a state mutex for sync.

That will fix traces as the following:
schedule+0x3e/0x90
kernfs_drain+0x75/0xf0
? wait_woken+0x90/0x90
__kernfs_remove+0x12e/0x1c0
kernfs_remove+0x25/0x40
sysfs_remove_dir+0x57/0x90
kobject_del+0x22/0x60
device_del+0x195/0x230
 pm_runtime_set_memalloc_noio+0xac/0xf0
netdev_unregister_kobject+0x71/0x80
rollback_registered_many+0x205/0x2f0
rollback_registered+0x31/0x40
unregister_netdevice_queue+0x58/0xb0
unregister_netdev+0x20/0x30
ipoib_remove_one+0xb7/0x240 [ib_ipoib]
ib_unregister_device+0xbc/0x1b0 [ib_core]
ib_unregister_mad_agent+0x29/0x30 [ib_core]
mlx4_ib_remove+0x67/0x280 [mlx4_ib]
INFO: task echo:24082 blocked for more than 120 seconds.
Tainted: G           OE   4.1.12-37.5.1.el6uek.x86_64 #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
Call Trace:
schedule+0x3e/0x90
schedule_preempt_disabled+0xe/0x10
__mutex_lock_slowpath+0x95/0x110
? _rcu_barrier+0x177/0x220
mutex_lock+0x23/0x40
rtnl_lock+0x15/0x20
netdev_run_todo+0x81/0x1f0
rtnl_unlock+0xe/0x10
ipoib_vlan_delete+0x12f/0x1c0 [ib_ipoib]
delete_child+0x69/0x80 [ib_ipoib]
dev_attr_store+0x20/0x30
sysfs_kf_write+0x41/0x50

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 16:31:08 -04:00
Leon Romanovsky dcc9881e67 RDMA/(core, ulp): Convert register/unregister event handler to be void
The functions ib_register_event_handler() and
ib_unregister_event_handler() always returned success and they can't fail.

Let's convert those functions to be void, remove redundant checks and
cleanup tons of goto statements.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 16:27:10 -04:00
Doug Ledford b0e32e20e3 Merge branch 'k.o/for-4.13-rc' into k.o/for-next
Merging our (hopefully) final -rc pull branch into our for-next branch
because some of our pending patches won't apply cleanly without having
the -rc patches in our tree.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 14:12:04 -04:00
Doug Ledford d0d62c34fb Merge branch 'rdma-netlink' into k.o/merge-test
Conflicts:
	include/rdma/ib_verbs.h - Modified a function signature adjacent
	to a newly added function signature from a previous merge

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-10 14:34:18 -04:00
Doug Ledford 320438301b Merge branches '32bit_lid' and 'irq_affinity' into k.o/merge-test
Conflicts:
	drivers/infiniband/hw/mlx5/main.c - Both add new code
	include/rdma/ib_verbs.h - Both add new code

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-10 14:31:29 -04:00
Leon Romanovsky 9abb0d1bbd RDMA: Simplify get firmware interface
There is a need to forward FW version to user space
application through RDMA netlink. In order to make it safe, there
is need to declare nla_policy and limit the size of FW string.

The new define IB_FW_VERSION_NAME_MAX will limit the size of
FW version string. That define was chosen to be equal to
ETHTOOL_FWVERS_LEN, because many drivers anyway are limited
by that value indirectly.

The introduction of this define allows us to remove the string size
from get_fw_str function signature.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10 13:28:10 +03:00
Dasaratharaman Chandramouli 7e93e2cb83 IB/IPoIB: Increase local_lid to 32 bits
IPoIB contains local_lid field which is 16 bits in
length, increase it to 32 bits.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:47:18 -04:00
Dasaratharaman Chandramouli 4c4736905e IB/srpt: Increase lid and sm_lid to 32 bits
srpt contains lid and sm_lid fields which are 16 bits in
length, increase them to 32 bits.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:47:18 -04:00
Doug Ledford a5f66725c7 Merge branch 'misc' into k.o/for-next 2017-07-27 09:00:38 -04:00
Leon Romanovsky e1267b0124 RDMA: Remove useless MODULE_VERSION
All modules in drivers/infiniband defined and used MODULE_VERSION, which
was pointless because the kernel version describes their state more accurate
then those arbitrary numbers.

Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Sagi Grimbrg <sagi@grimberg.me>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:45:11 -04:00
Doug Ledford 03da084ed8 Merge branch 'hfi1' into k.o/for-4.14 2017-07-24 08:33:43 -04:00
Erez Shitrit 5dc78ad190 IB/ipoib: Notify on modify QP failure only when relevant
Modify QP can fail and it can be acceptable, like when moving from RST to
ERR state, all the rest are not acceptable and a message to the log
should be printed.

The current code prints on all failures and many messages like:
"Failed to modify QP to ERROR state" appear, even when supported by the
state machine of the QP object.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2017-07-23 10:52:00 +03:00
Leon Romanovsky 1b355094b3 IB/ipoib: Remove double pointer assigning
There is no need to assign "p" pointer twice.

This patch fixes the following smatch warning:
drivers/infiniband/ulp/ipoib/ipoib_cm.c:517 ipoib_cm_rx_handler() warn:
	missing break? reassigning 'p->id'

Fixes: 839fcaba35 ("IPoIB: Connected mode experimental support")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-07-23 10:00:15 +03:00
Leon Romanovsky dc892e17bb IB/ipoib: Clean error paths in add port
Refactor error paths in ipoib_add_port() function. The code flow
ensures that the function terminates on every error flow and it makes
redundant all "else" cases.

The functions are called during the flow are returning "result < 0", in
case of error, so there is no need to check it explicitly.

Fixes: 58e9cc90cd ("IB/IPoIB: Fix bad error flow in ipoib_add_port()")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-07-23 09:45:11 +03:00
Feras Daoud eb54714ddc IB/ipoib: Add get statistics support to SRIOV VF
Add SRIOV VF support to get traffic statistics.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2017-07-23 09:45:11 +03:00
Alex Vesker 4829d964df IB/ipoib: Add multicast packets statistics
Update the multicast counter when multicast packets are received and
provide this information through ethtool support.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2017-07-23 09:45:11 +03:00
Feras Daoud d2e46fccc3 IB/ipoib: Set IPOIB_NEIGH_TBL_FLUSH after flushed completion initialization
Set IPOIB_NEIGH_TBL_FLUSH bit after initializing the neighbor
flushed completion, otherwise the garbage collector may signal
a completion while it is not initialized yet.

Fixes: b63b70d877 ("IPoIB: Use a private hash table for path lookup in xmit path")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2017-07-23 09:45:11 +03:00
Alex Vesker 11f74b4035 IB/ipoib: Prevent setting negative values to max_nonsrq_conn_qp
Don't allow negative values to max_nonsrq_conn_qp. There is no functional
impact on a negative value but it is logicically incorrect.

Fixes: 68e995a295 ("IPoIB/cm: Add connected mode support for devices without SRQs")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2017-07-23 09:45:11 +03:00
Erez Shitrit a08e112062 IB/ipoib: Make sure no in-flight joins while leaving that mcast
While cleaning neighs and there is a send-only mcast neigh, the driver
should wait to finish its join process before trying to remove it.

Without this patch, we will see messages like: "ipoib_mcast_leave on an
in-flight join" and unexpected results in the join_complete.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2017-07-23 09:45:11 +03:00
Erez Shitrit 6bdc8de2e8 IB/ipoib: Use cancel_delayed_work_sync when needed
The work mcast_task can re-queue itself, so instead of doing
cancel && flush_workqueue, that still can leave a queued task
on the air, use cancel_delayed_work_sync.

Also, no need to use lock over the cancel, the original lock was
due to bit assignment setting (IPOIB_MCAST_RUN) that is not in use
anymore.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2017-07-23 09:45:11 +03:00
Feras Daoud edf3f301db IB/ipoib: Fix race between light events and interface restart
A potential race between light_event and interface restart
may attach multicast group to an already attached QP.

Scenario:
light_event flow goes through ipoib_mcast_dev_flush function,
if a context switch occurs before calling ipoib_mcast_remove_list,
then we may face a situation where the broadcast of the priv is null
and the corresponding QP is not detached yet.
If an "interface restart" runs during the previous context switch,
the following scenario occurs:
When the device goes up, ipoib_ib_dev_up function will be called,
it will send a new registration request to the broadcast group and then
attach the group to the QP that was not detached before.

     IPOIB_FLUSH_LIGHT                                          INTERFACE RESTART

    __ipoib_ib_dev_flush                                                |
        |                                                               |
        |                                                               |
        |                                                               |
    ipoib_mcast_dev_flush                                               |
    Move mcast list and broadcast to remove_list                        |
        |                                                               |
        |                                                               |
    Context Switch-->                                                   |
        |                                                       ipoib_ib_dev_down
        |                                                               |
        |                                                               |
        |                                                       ipoib_ib_dev_up
        |                                                               |
        |                                                               |
        |                                                       ipoib_mcast_join_task
        |                                                       allocate new broadcast
        |                                                               |
        |                                                               |
        |                                                       Attach QP to multicast group
        |                                                               |
        |                                                               |
        |                                                       <--Context Switch
    ipoib_mcast_leave
    Detach QP from multicast group

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2017-07-23 09:45:11 +03:00
Sagi Grimberg e6e52aec49 RDMA/iser: don't send an rkey if all data is written as immadiate-data
We might get some bogus error completions in case the target will
remotely invalidate the rkey and the HCA will need to retransmit
from this buffer.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:50 -04:00
Dan Carpenter 5c8857b653 IB/IPoIB: Fix error code in ipoib_add_port()
We accidentally don't see the error code on some of these error paths.
It means we return ERR_PTR(0) which is NULL and it results in a NULL
dereference in the caller.

This bug dates to pre-git days.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:49 -04:00
Erez Shitrit b6c871e587 IB/ipoib: Let lower driver handle get_stats64 call
The driver checks if the lower level driver supports get_stats, and if
so calls it to get the updated statistics, otherwise takes from the
current netdevice stats object.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-17 21:21:26 -04:00
Leon Romanovsky d83187dda9 IB/IPoIB: Convert IPoIB to memalloc_noio_* calls
Commit 21caf2fc19 ("mm: teach mm by current context info to not do I/O
during memory allocation") added the memalloc_noio_(save|restore) functions
to enable people to modify the MM behavior by disabling I/O during memory
allocation. This was further extended in Fixes: 934f3072c1 ("mm: clear
__GFP_FS when PF_MEMALLOC_NOIO is set"). memalloc_noio_* functions prevent
allocation paths recursing back into the filesystem without explicitly
changing the flags for every allocation site.

However the IPoIB hasn't been keeping up with the changes and missed
completely these memalloc_noio_* calls. This led to update of
allocation site with special QP creation flag, see commit 09b93088d7
("IB: Add a QP creation flag to use GFP_NOIO allocations"), while this
flag is supported by small number of drivers in IB stack.

Let's change it by updating to memalloc_noio_* calls and allow
for every driver underneath enjoy NOIO allocations.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-17 21:21:23 -04:00
Erez Shitrit ed7b521d8a IB/IPoIB: Forward MTU change to driver below
This patch checks if there is a driver below that
needs to be updated on the new MTU and calls it
accordingly.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-17 21:21:22 -04:00
Leon Romanovsky 98e77d9fd7 IB: Convert msleep below 20ms to usleep_range
The msleep(1) may do not sleep 1 ms as expected
and will sleep longer. The simple conversion from
msleep to usleep_range between 1ms and 2ms can solve an
issue.

The full and comprehensive explanation can be found at [1] and [2].

[1] https://lkml.org/lkml/2007/8/3/250
[2] Documentation/timers/timers-howto.txt

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-17 21:21:22 -04:00
Vladimir Neyelov c8c16d3bae IB/iser: Fix connection teardown race condition
Under heavy iser target(scst) start/stop stress during login/logout
on iser intitiator side happened trace call provided below.

The function iscsi_iser_slave_alloc iser_conn pointer could be NULL,
due to the fact that function iscsi_iser_conn_stop can be called before
and free iser connection. Let's protect that flow by introducing global mutex.

BUG: unable to handle kernel paging request at 0000000000001018
IP: [<ffffffffc0426f7e>] iscsi_iser_slave_alloc+0x1e/0x50 [ib_iser]
Call Trace:
? scsi_alloc_sdev+0x242/0x300
scsi_probe_and_add_lun+0x9e1/0xea0
? kfree_const+0x21/0x30
? kobject_set_name_vargs+0x76/0x90
? __pm_runtime_resume+0x5b/0x70
__scsi_scan_target+0xf6/0x250
scsi_scan_target+0xea/0x100
iscsi_user_scan_session.part.13+0x101/0x130 [scsi_transport_iscsi]
? iscsi_user_scan_session.part.13+0x130/0x130 [scsi_transport_iscsi]
iscsi_user_scan_session+0x1e/0x30 [scsi_transport_iscsi]
device_for_each_child+0x50/0x90
iscsi_user_scan+0x44/0x60 [scsi_transport_iscsi]
store_scan+0xa8/0x100
? common_file_perm+0x5d/0x1c0
dev_attr_store+0x18/0x30
sysfs_kf_write+0x37/0x40
kernfs_fop_write+0x12c/0x1c0
__vfs_write+0x18/0x40
vfs_write+0xb5/0x1a0
SyS_write+0x55/0xc0

Fixes: 318d311e8f ("iser: Accept arbitrary sg lists mapping if the device supports it")
Cc: <stable@vger.kernel.org> # v4.5+
Signed-off-by: Vladimir Neyelov <vladimirn@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-17 11:45:25 -04:00
Doug Ledford 3d886aa3be Linux v4.13-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZapWhAAoJEHm+PkMAQRiGKb0IAJM6b7SbWaw69Og7+qiFB+zZ
 xp29iXqbE9fPISC6a5BRQV1ONjeDM6opGixGHqGC8Hla6k2IYz25VDNoF8wd0MXN
 cz/Ih20vd3C5afxXGe5cTT8lsPAlV0mWXxForlu6j8jPeL62FPfq6RhEkw7AcrYL
 yfYy3k3qSdOrrvBdII0WAAUi46UfIs+we9BQgbsMbkHOiqV2K0MOrzKE84Xbgepq
 RAy2xg6P4b4+hTx8xTrYc1MXwpnqjRc0oJ08gdmiwW3AOOU7LxYFn7zDkLPWi9Rr
 g4x6r4YhBTGxT4wNvovLIiqd9QFs//dMCuPWYwEtTICG48umIqqq24beQ0mvCdg=
 =08Ic
 -----END PGP SIGNATURE-----

Merge tag 'v4.13-rc1' into k.o/for-4.13-rc

Linux v4.13-rc1
2017-07-17 11:26:58 -04:00
Linus Torvalds 48ea2cedde Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "It's been usually busy for summer, with most of the efforts centered
  around TCMU developments and various target-core + fabric driver bug
  fixing activities. Not particularly large in terms of LoC, but lots of
  smaller patches from many different folks.

  The highlights include:

   - ibmvscsis logical partition manager support (Michael Cyr + Bryant
     Ly)

   - Convert target/iblock WRITE_SAME to blkdev_issue_zeroout (hch +
     nab)

   - Add support for TMR percpu LUN reference counting (nab)

   - Fix a potential deadlock between EXTENDED_COPY and iscsi shutdown
     (Bart)

   - Fix COMPARE_AND_WRITE caw_sem leak during se_cmd quiesce (Jiang Yi)

   - Fix TMCU module removal (Xiubo Li)

   - Fix iser-target OOPs during login failure (Andrea Righi + Sagi)

   - Breakup target-core free_device backend driver callback (mnc)

   - Perform TCMU add/delete/reconfig synchronously (mnc)

   - Fix TCMU multiple UIO open/close sequences (mnc)

   - Fix TCMU CHECK_CONDITION sense handling (mnc)

   - Fix target-core SAM_STAT_BUSY + TASK_SET_FULL handling (mnc + nab)

   - Introduce TYPE_ZBC support in PSCSI (Damien Le Moal)

   - Fix possible TCMU memory leak + OOPs when recalculating cmd base
     size (Xiubo Li + Bryant Ly + Damien Le Moal + mnc)

   - Add login_keys_workaround attribute for non RFC initiators (Robert
     LeBlanc + Arun Easi + nab)"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (68 commits)
  iscsi-target: Add login_keys_workaround attribute for non RFC initiators
  Revert "qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR ABORT"
  tcmu: clean up the code and with one small fix
  tcmu: Fix possbile memory leak / OOPs when recalculating cmd base size
  target: export lio pgr/alua support as device attr
  target: Fix return sense reason in target_scsi3_emulate_pr_out
  target: Fix cmd size for PR-OUT in passthrough_parse_cdb
  tcmu: Fix dev_config_store
  target: pscsi: Introduce TYPE_ZBC support
  target: Use macro for WRITE_VERIFY_32 operation codes
  target: fix SAM_STAT_BUSY/TASK_SET_FULL handling
  target: remove transport_complete
  pscsi: finish cmd processing from pscsi_req_done
  tcmu: fix sense handling during completion
  target: add helper to copy sense to se_cmd buffer
  target: do not require a transport_complete for SCF_TRANSPORT_TASK_SENSE
  target: make device_mutex and device_list static
  tcmu: Fix flushing cmd entry dcache page
  tcmu: fix multiple uio open/close sequences
  tcmu: drop configured check in destroy
  ...
2017-07-13 14:27:32 -07:00
Mike Marciniszyn e7d80c8304 IB/iser: Handle lack of memory management extentions correctly
max_fast_reg_page_list_len is only valid when the
memory management extentions are signaled by the underlying
driver.

Fix by adjusting iser_calc_scsi_params() to use
ISCSI_ISER_MAX_SG_TABLESIZE when the extentions are not indicated.

Reported-by: Thomas Rosenstein <thomas.rosenstein@creamfinance.com>
Fixes: Commit df749cdc45 ("IB/iser: Support up to 8MB data transfer in a single command")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Thomas Rosenstein <thomas.rosenstein@creamfinance.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-11 10:47:16 -04:00
Nicholas Bellinger fce50a2fa4 iser-target: Avoid isert_conn->cm_id dereference in isert_login_recv_done
This patch fixes a NULL pointer dereference in isert_login_recv_done()
of isert_conn->cm_id due to isert_cma_handler() -> isert_connect_error()
resetting isert_conn->cm_id = NULL during a failed login attempt.

As per Sagi, we will always see the completion of all recv wrs posted
on the qp (given that we assigned a ->done handler), this is a FLUSH
error completion, we just don't get to verify that because we deref
NULL before.

The issue here, was the assumption that dereferencing the connection
cm_id is always safe, which is not true since:

    commit 4a579da258
    Author: Sagi Grimberg <sagig@mellanox.com>
    Date:   Sun Mar 29 15:52:04 2015 +0300

         iser-target: Fix possible deadlock in RDMA_CM connection error

As I see it, we have a direct reference to the isert_device from
isert_conn which is the one-liner fix that we actually need like
we do in isert_rdma_read_done() and isert_rdma_write_done().

Reported-by: Andrea Righi <righi.andrea@gmail.com>
Tested-by: Andrea Righi <righi.andrea@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:35 -07:00
Bart Van Assche 13fdd4458e IB/srpt: Make a debug statement in srpt_abort_cmd() more informative
Do not only report the state of the I/O context before srpt_abort_cmd()
was called but also the new state assigned by srpt_abort_cmd()

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 22:58:01 -07:00
Linus Torvalds 9871ab22f2 Fixes #3 for 4.12-rc
- 2 Fixes for OPA found by debug kernel
 - 1 Fix for user supplied input causing kernel problems
 - 1 Fix for the IPoIB fixes submitted around -rc4
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZXVfPAAoJELgmozMOVy/dRRcP/AiN4wyEQ897se1fKXAktL1g
 a17tiSkK2MukAVHbM++9Ea/YXK66e2s7Ls8Pd230E85N3V48rSUhWZUIUQLOm+gS
 b98z53uNs6KkdBCezXABsHIi4PB6u1CfzaFaUfN5WI3ymAgsYqpQWMtNyO6GNe/R
 Dur3vDieXPNJ2x+F1jiNxHFBXLKofCG0y1FX88zqsQI5vVVq7ASKgaaSX3T1emQY
 18l4Dd7pesrWj4QD9jaqQiYkruF5VC1NE8/he8Zzy6XjSgnUZZfjbjuMptbW4y3y
 Tvvd5bjMAkJhCbK1mhe1dZHPlYJhAguUBZfThjVSKtiMGwRhGA4SYkRtek3nZOga
 /OLhERgj0VomHx7o+Pwp74DWnsSv08EMoc4hXKHZPPyxok83r9czejqm7mC2VbGd
 Sa8LmVeLQp79e9MbGAj+PbNRHf9CE9dnLeFUmbj+qptXUVGvT8j9U1a9iTjTz0+2
 NX/O4iWjtnt/CIkH9dhN9aWolswbmO2jSmmzb/x2EuCLv94GNtTyZLSifvxSYMnN
 IWO86aGQmuUkWJ3RI/5tzq+gVzI6bdKB9hG5DOPWN/uJVF9nWkq3c69Bv9djvUoM
 xi/rI0grxTqYHelRx3ja4ZqaI43R6YwL928XdtZJKQ/uNanq65Lyd6KKz3W7hT0l
 emCoqb2MjuzsNWIPkSgg
 =JEor
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma update from Doug Ledford:
 "This includes two bugs against the newly added opa vnic that were
  found by turning on the debug kernel options:

   - sleeping while holding a lock, so a one line fix where they
     switched it from GFP_KERNEL allocation to a GFP_ATOMIC allocation

   - a case where they had an isolated caller of their code that could
     call them in an atomic context so they had to switch their use of a
     mutex to a spinlock to be safe, so this was considerably more lines
     of diff because all uses of that lock had to be switched

  In addition, the bug that was discussed with you already about an out
  of bounds array access in ib_uverbs_modify_qp and ib_uverbs_create_ah
  and is only seven lines of diff.

  And finally, one fix to an earlier fix in the -rc cycle that broke
  hfi1 and qib in regards to IPoIB (this one is, unfortunately, larger
  than I would like for a -rc7 submission, but fixing the problem
  required that we not treat all devices as though they had allocated a
  netdev universally because it isn't true, and it took 70 lines of diff
  to resolve the issue, but the final patch has been vetted by Intel and
  Mellanox and they've both given their approval to the fix).

  Summary:

   - Two fixes for OPA found by debug kernel
   - Fix for user supplied input causing kernel problems
   - Fix for the IPoIB fixes submitted around -rc4"

[ Doug sent this having not noticed the 4.12 release, so I guess I'll be
  getting another rdma pull request with the actuakl merge window
  updates and not just fixes.

  Oh well - it would have been nice if this small update had been the
  merge window one.     - Linus ]

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/core, opa_vnic, hfi1, mlx5: Properly free rdma_netdev
  RDMA/uverbs: Check port number supplied by user verbs cmds
  IB/opa_vnic: Use spinlock instead of mutex for stats_lock
  IB/opa_vnic: Use GFP_ATOMIC while sending trap
2017-07-06 11:45:08 -07:00
Niranjana Vishwanathapura 8e95960199 IB/core, opa_vnic, hfi1, mlx5: Properly free rdma_netdev
IPOIB is calling free_rdma_netdev even though alloc_rdma_netdev has
returned -EOPNOTSUPP.
Move free_rdma_netdev from ib_device structure to rdma_netdev structure
thus ensuring proper cleanup function is called for the rdma net device.

Fix the following trace:

ib0: Failed to modify QP to ERROR state
BUG: unable to handle kernel paging request at 0000000000001d20
IP: hfi1_vnic_free_rn+0x26/0xb0 [hfi1]
Call Trace:
 ipoib_remove_one+0xbe/0x160 [ib_ipoib]
 ib_unregister_device+0xd0/0x170 [ib_core]
 rvt_unregister_device+0x29/0x90 [rdmavt]
 hfi1_unregister_ib_device+0x1a/0x100 [hfi1]
 remove_one+0x4b/0x220 [hfi1]
 pci_device_remove+0x39/0xc0
 device_release_driver_internal+0x141/0x200
 driver_detach+0x3f/0x80
 bus_remove_driver+0x55/0xd0
 driver_unregister+0x2c/0x50
 pci_unregister_driver+0x2a/0xa0
 hfi1_mod_cleanup+0x10/0xf65 [hfi1]
 SyS_delete_module+0x171/0x250
 do_syscall_64+0x67/0x150
 entry_SYSCALL64_slow_path+0x25/0x25

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-05 17:11:00 -04:00
Vishwanathapura, Niranjana a379d69f00 IB/opa_vnic: Use spinlock instead of mutex for stats_lock
Stats can be read from atomic context, hence make stats_lock as
a spinlock.

Fix the following trace with debug kernel.

BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:238
in_atomic(): 1, irqs_disabled(): 0, pid: 6487, name: sadc
Call Trace:
 dump_stack+0x63/0x90
 ___might_sleep+0xda/0x130
 __might_sleep+0x4a/0x90
 mutex_lock+0x20/0x50
 opa_vnic_get_stats64+0x56/0x140 [opa_vnic]
 dev_get_stats+0x74/0x130
 dev_seq_printf_stats+0x37/0x120
 dev_seq_show+0x14/0x30
 seq_read+0x26d/0x3d0

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-29 12:43:52 -04:00
Vishwanathapura, Niranjana 0568c4640e IB/opa_vnic: Use GFP_ATOMIC while sending trap
Pass GFP_ATOMIC flag to ib_create_send_mad() while sending trap as it
can be triggered from the atomic context.

Fix the following trace with debug kernel.

BUG: sleeping function called from invalid context at mm/slab.h:432
in_atomic(): 1, irqs_disabled(): 0, pid: 1771, name: NetworkManager
Call Trace:
 dump_stack+0x63/0x90
 ___might_sleep+0xda/0x130
 __might_sleep+0x4a/0x90
 __kmalloc+0x19e/0x220
 ? ib_create_send_mad+0xea/0x390 [ib_core]
 ib_create_send_mad+0xea/0x390 [ib_core]
 opa_vnic_vema_send_trap+0x17b/0x460 [opa_vnic]
 opa_vnic_vema_report_event+0x57/0x80 [opa_vnic]
 opa_vnic_mac_send_event+0xaa/0xf0 [opa_vnic]
 opa_vnic_set_rx_mode+0x17/0x30 [opa_vnic]
 __dev_set_rx_mode+0x52/0x90
 dev_set_rx_mode+0x26/0x40
 __dev_open+0xe8/0x140

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-29 12:43:52 -04:00
Vishwanathapura, Niranjana cb49366f36 IB/core,rdmavt,hfi1,opa-vnic: Send OPA cap_mask3 in trap
Provide the ability for IB clients to modify the OPA specific
capability mask and include this mask in the subsequent trap data.

Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Michael N. Henry <michael.n.henry@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27 16:58:12 -04:00
Matthias Schiffer ad744b223c net: add netlink_ext_ack argument to rtnl_link_ops.changelink
Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26 23:13:22 -04:00
Matthias Schiffer 7a3f4a1851 net: add netlink_ext_ack argument to rtnl_link_ops.newlink
Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26 23:13:21 -04:00
David S. Miller 3d09198243 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two entries being added at the same time to the IFLA
policy table, whilst parallel bug fixes to decnet
routing dst handling overlapping with the dst gc removal
in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21 17:35:22 -04:00
Johannes Berg d58ff35122 networking: make skb_push & __skb_push return void pointers
It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions return void * and remove all the casts across
the tree, adding a (u8 *) cast only where the unsigned char pointer
was used directly, all done with the following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    @@
    expression SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - fn(SKB, LEN)[0]
    + *(u8 *)fn(SKB, LEN)

Note that the last part there converts from push(...)[0] to the
more idiomatic *(u8 *)push(...).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16 11:48:40 -04:00
Feras Daoud 4542d66bb2 IB/ipoib: Fix memory leak in create child syscall
The flow of creating a new child goes through ipoib_vlan_add
which allocates a new interface and checks the rtnl_lock.

If the lock is taken, restart_syscall will be called to restart
the system call again. In this case we are not releasing the
already allocated interface, causing a leak.

Fixes: 9baa0b0364 ("IB/ipoib: Add rtnl_link_ops support")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14 15:16:23 -04:00
Alex Vesker 560b7c3ffe IB/ipoib: Fix access to un-initialized napi struct
There is no need to re-enable napi since we set the initialized
flag before calling ipoib_ib_dev_stop which will disable napi,
disabling napi twice is harmless in case it was already disabled.

One more reason for this fix is that when using IPoIB new device
driver napi is not added to priv, this can lead to kernel panic
when rn_ops ndo_open fails.

[ 289.755840] invalid opcode: 0000 [#1] SMP
[ 289.757111] task: ffff880036964440 ti: ffff880178ee8000 task.ti: ffff880178ee8000
[ 289.757111] RIP: 0010:[<ffffffffa05368d6>] [<ffffffffa05368d6>] napi_enable.part.24+0x4/0x6 [ib_ipoib]
[ 289.757111] RSP: 0018:ffff880178eeb6d8 EFLAGS: 00010246
[ 289.757111] RAX: 0000000000000000 RBX: ffff880177a80010 RCX: 000000007fffffff
[ 289.757111] RDX: ffffffff81d5f118 RSI: 0000000000000000 RDI: ffff880177a80010
[ 289.757111] RBP: ffff880178eeb6d8 R08: 0000000000000082 R09: 0000000000000283
[ 289.757111] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880175a00000
[ 289.757111] R13: ffff880177a80080 R14: 0000000000000000 R15: 0000000000000001
[ 289.757111] FS: 00007fe2ee346880(0000) GS:ffff88017fc00000(0000) knlGS:0000000000000000
[ 289.757111] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 289.757111] CR2: 00007fffca979020 CR3: 00000001792e4000 CR4: 00000000000006f0
[ 289.757111] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 289.757111] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 289.757111] Stack:
[ 289.796027] ffff880178eeb6f0 ffffffffa05251f5 ffff880177a80000 ffff880178eeb718
[ 289.796027] ffffffffa0528505 ffff880175a00000 ffff880177a80000 0000000000000000
[ 289.796027] ffff880178eeb748 ffffffffa051f0ab ffff880175a00000 ffffffffa0537d60
[ 289.796027] Call Trace:
[ 289.796027] [<ffffffffa05251f5>] napi_enable+0x25/0x30 [ib_ipoib]
[ 289.796027] [<ffffffffa0528505>] ipoib_ib_dev_open+0x175/0x190 [ib_ipoib]
[ 289.796027] [<ffffffffa051f0ab>] ipoib_open+0x4b/0x160 [ib_ipoib]
[ 289.796027] [<ffffffff814fe33f>] _dev_open+0xbf/0x130
[ 289.796027] [<ffffffff814fe62d>] __dev_change_flags+0x9d/0x170
[ 289.796027] [<ffffffff814fe729>] dev_change_flags+0x29/0x60
[ 289.796027] [<ffffffff8150caf7>] do_setlink+0x397/0xa40

Fixes: cd565b4b51 ('IB/IPoIB: Support acceleration options callbacks')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14 15:16:23 -04:00
Alex Vesker b53d4566cc IB/ipoib: Delete napi in device uninit default
This patch mekas init_default and uninit_default symmetric
with a call to delete napi. Additionally, the uninit_default
gained delete napi call in case of init_default fails.

Fixes: 515ed4f3aa ('IB/IPoIB: Separate control and data related initializations')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14 15:16:23 -04:00
Alex Vesker 022d038a16 IB/ipoib: Limit call to free rdma_netdev for capable devices
Limit calls to free_rdma_netdev() for capable devices only.

Fixes: cd565b4b51 ('IB/IPoIB: Support acceleration options callbacks')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14 15:16:23 -04:00
Alex Vesker ab156afd3e IB/ipoib: Fix memory leaks for child interfaces priv
There is a need to free priv explicitly and not just to release
the device, child priv is freed explicitly on remove flow and this
patch also includes priv free on error flow in P_key creation
and also in add_port.

Fixes: cd565b4b51 ('IB/IPoIB: Support acceleration options callbacks')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14 15:16:23 -04:00
Majd Dibbiny d3957b86a4 RDMA/SA: Fix kernel panic in CMA request handler flow
Commit 9fdca4da4d (IB/SA: Split struct sa_path_rec based on IB and
ROCE specific fields) moved the service_id to be specific attribute
for IB and OPA SA Path Record, and thus wasn't assigned for RoCE.

This caused to the following kernel panic in the CMA request handler flow:

[   27.074594] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[   27.074731] IP: __radix_tree_lookup+0x1d/0xe0
...
[   27.075356] Workqueue: ib_cm cm_work_handler [ib_cm]
[   27.075401] task: ffff88022e3b8000 task.stack: ffffc90001298000
[   27.075449] RIP: 0010:__radix_tree_lookup+0x1d/0xe0
...
[   27.075979] Call Trace:
[   27.076015]  radix_tree_lookup+0xd/0x10
[   27.076055]  cma_ps_find+0x59/0x70 [rdma_cm]
[   27.076097]  cma_id_from_event+0xd2/0x470 [rdma_cm]
[   27.076144]  ? ib_init_ah_from_path+0x39a/0x590 [ib_core]
[   27.076193]  cma_req_handler+0x25/0x480 [rdma_cm]
[   27.076237]  cm_process_work+0x25/0x120 [ib_cm]
[   27.076280]  ? cm_get_bth_pkey.isra.62+0x3c/0xa0 [ib_cm]
[   27.076350]  cm_req_handler+0xb03/0xd40 [ib_cm]
[   27.076430]  ? sched_clock_cpu+0x11/0xb0
[   27.076478]  cm_work_handler+0x194/0x1588 [ib_cm]
[   27.076525]  process_one_work+0x160/0x410
[   27.076565]  worker_thread+0x137/0x4a0
[   27.076614]  kthread+0x112/0x150
[   27.076684]  ? max_active_store+0x60/0x60
[   27.077642]  ? kthread_park+0x90/0x90
[   27.078530]  ret_from_fork+0x2c/0x40

This patch moves it back to the common SA Path Record structure
and removes the redundant setter and getter.

Tested on Connect-IB and Connect-X4 in Infiniband and RoCE respectively.

Fixes: 9fdca4da4d (IB/SA: Split struct sa_path_rec based on IB ands
	ROCE specific fields)
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-01 17:20:14 -04:00
Israel Rukshin 95c2ef50c7 RDMA/srp: Fix NULL deref at srp_destroy_qp()
If srp_init_qp() fails at srp_create_ch_ib() then ch->send_cq
may be NULL.
Calling directly to ib_destroy_qp() is sufficient because
no work requests were posted on the created qp.

Fixes: 9294000d6d ("IB/srp: Drain the send queue before destroying a QP")
Cc: <stable@vger.kernel.org>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Bart van Assche <bart.vanassche@sandisk.com>--
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-01 17:20:10 -04:00
Leon Romanovsky 0a1a972630 RDMA/IPoIB: Limit the ipoib_dev_uninit_default scope
ipoib_dev_uninit_default() call is used in ipoib_main.c file only
and it generates the following warning from smatch tool:
	drivers/infiniband/ulp/ipoib/ipoib_main.c:1593:6: warning:
	symbol 'ipoib_dev_uninit_default' was not declared. Should it
	be static?

so let's declare that function as static.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-01 17:20:10 -04:00
Honggang Li 8c490669de RDMA/IPoIB: Replace netdev_priv with ipoib_priv for ipoib_get_link_ksettings
ipoib_dev_init accesses the wrong private data for the IPoIB device.
Commit cd565b4b51 (IB/IPoIB: Support acceleration options callbacks)
changed ipoib_priv from being identical to netdev_priv to being an
area inside of, but not the same pointer as, the netdev_priv pointer.
As such, the struct we want is the ipoib_priv area, not the netdev_priv
area, so use the right accessor, otherwise we kernel panic.

[   27.271938] IPv6: ADDRCONF(NETDEV_CHANGE): mlx5_ib0.8006: link becomes ready
[   28.156790] BUG: unable to handle kernel NULL pointer dereference at 000000000000067c
[   28.166309] IP: ib_query_port+0x30/0x180 [ib_core]
...
[   28.306282] RIP: 0010:ib_query_port+0x30/0x180 [ib_core]
...
[   28.393337] Call Trace:
[   28.397594]  ipoib_get_link_ksettings+0x66/0xe0 [ib_ipoib]
[   28.405274]  __ethtool_get_link_ksettings+0xa0/0x1c0
[   28.412353]  speed_show+0x74/0xa0
[   28.417503]  dev_attr_show+0x20/0x50
[   28.422922]  ? mutex_lock+0x12/0x40
[   28.428179]  sysfs_kf_seq_show+0xbf/0x1a0
[   28.434002]  kernfs_seq_show+0x21/0x30
[   28.439470]  seq_read+0x116/0x3b0
[   28.444445]  ? do_filp_open+0xa5/0x100
[   28.449774]  kernfs_fop_read+0xff/0x180
[   28.455220]  __vfs_read+0x37/0x150
[   28.460167]  ? security_file_permission+0x9d/0xc0
[   28.466560]  vfs_read+0x8c/0x130
[   28.471318]  SyS_read+0x55/0xc0
[   28.475950]  do_syscall_64+0x67/0x150
[   28.481163]  entry_SYSCALL64_slow_path+0x25/0x25
...
[   28.584493] ---[ end trace 3549968a4bf0aa5d ]---

Fixes: cd565b4b51 (IB/IPoIB: Support acceleration options callbacks)
Fixes: 0d7e2d2166 (IB/ipoib: add get_link_ksettings in ethtool)
Signed-off-by: Honggang Li <honli@redhat.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-01 17:20:09 -04:00
Linus Torvalds bd1286f964 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Things were a lot more calm than previously expected. It's primarily
  fixes in various areas, with most of the new functionality centering
  around TCMU backend driver work that Xiubo Li has been driving.

  Here's the summary on the feature side:

   - Make T10-PI verify configurable for emulated (FILEIO + RD) backends
    (Dmitry Monakhov)
   - Allow target-core/TCMU pass-through to use in-kernel SPC-PR logic
    (Bryant Ly + MNC)
   - Add TCMU support for growing ring buffer size (Xiubo Li + MNC)
   - Add TCMU support for global block data pool (Xiubo Li + MNC)

  and on the bug-fix side:

   - Fix COMPARE_AND_WRITE non GOOD status handling for READ phase
    failures (Gary Guo + nab)
   - Fix iscsi-target hang with explicitly changing per NodeACL
    CmdSN number depth with concurrent login driven session
    reinstatement.  (Gary Guo + nab)
   - Fix ibmvscsis fabric driver ABORT task handling (Bryant Ly)
   - Fix target-core/FILEIO zero length handling (Bart Van Assche)

  Also, there was an OOPs introduced with the WRITE_VERIFY changes that
  I ended up reverting at the last minute, because as not unusual Bart
  and I could not agree on the fix in time for -rc1. Since it's specific
  to a conformance test, it's been reverted for now.

  There is a separate patch in the queue to address the underlying
  control CDB write overflow regression in >= v4.3 separate from the
  WRITE_VERIFY revert here, that will be pushed post -rc1"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (30 commits)
  Revert "target: Fix VERIFY and WRITE VERIFY command parsing"
  IB/srpt: Avoid that aborting a command triggers a kernel warning
  IB/srpt: Fix abort handling
  target/fileio: Fix zero-length READ and WRITE handling
  ibmvscsis: Do not send aborted task response
  tcmu: fix module removal due to stuck thread
  target: Don't force session reset if queue_depth does not change
  iscsi-target: Set session_fall_back_to_erl0 when forcing reinstatement
  target: Fix compare_and_write_callback handling for non GOOD status
  tcmu: Recalculate the tcmu_cmd size to save cmd area memories
  tcmu: Add global data block pool support
  tcmu: Add dynamic growing data area feature support
  target: fixup error message in target_tg_pt_gp_tg_pt_gp_id_store()
  target: fixup error message in target_tg_pt_gp_alua_access_type_store()
  target/user: PGR Support
  target: Add WRITE_VERIFY_16
  Documentation/target: add an example script to configure an iSCSI target
  target: Use kmalloc_array() in transport_kmap_data_sg()
  target: Use kmalloc_array() in compare_and_write_callback()
  target: Improve size determinations in two functions
  ...
2017-05-12 11:44:13 -07:00
Bart Van Assche bd2c52d733 IB/srpt: Avoid that aborting a command triggers a kernel warning
Avoid that the following warning is triggered:

WARNING: CPU: 10 PID: 166 at ../drivers/infiniband/ulp/srpt/ib_srpt.c:2674 srpt_release_cmd+0x139/0x140 [ib_srpt]
CPU: 10 PID: 166 Comm: kworker/u24:8 Not tainted 4.9.4-1-default #1
Workqueue: tmr-fileio target_tmr_work [target_core_mod]
Call Trace:
 [<ffffffffaa3c4f70>] dump_stack+0x63/0x83
 [<ffffffffaa0844eb>] __warn+0xcb/0xf0
 [<ffffffffaa0845dd>] warn_slowpath_null+0x1d/0x20
 [<ffffffffc06ba429>] srpt_release_cmd+0x139/0x140 [ib_srpt]
 [<ffffffffc06e4377>] target_release_cmd_kref+0xb7/0x120 [target_core_mod]
 [<ffffffffc06e4d7f>] target_put_sess_cmd+0x2f/0x60 [target_core_mod]
 [<ffffffffc06e15e0>] core_tmr_lun_reset+0x340/0x790 [target_core_mod]
 [<ffffffffc06e4816>] target_tmr_work+0xe6/0x140 [target_core_mod]
 [<ffffffffaa09e4d3>] process_one_work+0x1f3/0x4d0
 [<ffffffffaa09e7f8>] worker_thread+0x48/0x4e0
 [<ffffffffaa09e7b0>] ? process_one_work+0x4d0/0x4d0
 [<ffffffffaa0a46da>] kthread+0xca/0xe0
 [<ffffffffaa0a4610>] ? kthread_park+0x60/0x60
 [<ffffffffaa71b775>] ret_from_fork+0x25/0x30

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-07 16:11:26 -07:00
Bart Van Assche 55d694275f IB/srpt: Fix abort handling
Let the target core check the CMD_T_ABORTED flag instead of the SRP
target driver. Hence remove the transport_check_aborted_status()
call. Since state == SRPT_STATE_CMD_RSP_SENT is something that really
should not happen, do not try to recover if srpt_queue_response() is
called for an I/O context that is in that state. This patch is a bug
fix because the srpt_abort_cmd() call is misplaced - if that function
is called from srpt_queue_response() it should either be called
before the command state is changed or after the response has been
sent.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-07 16:11:09 -07:00
Zhu Yanjun 0d7e2d2166 IB/ipoib: add get_link_ksettings in ethtool
In order to let the bonding driver report the correct speed
of the underlaying interfaces, when they are IPoIB, the ethtool
function get_link_ksettings() in the IPoIB driver is implemented.

Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Suggested-by: HÃ¥kon Bugge <Haakon.Bugge@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04 19:31:46 -04:00
Dasaratharaman Chandramouli 4c33bd1926 IB/SA: Add support to query OPA path records
When the bit 26 of capmask2 field in OPA classport info
query is set, SA will query for OPA path records instead
of querying for IB path records. Note that OPA
path records can only be queried by kernel ULPs.
Userspace clients continue to query IB path records.

Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:39:02 -04:00
Dasaratharaman Chandramouli 5752075144 IB/SA: Add OPA path record type
Add opa_sa_path_rec to sa_path_rec data structure.
The 'type' field in sa_path_rec identifies the
type of the path record.

Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:39:02 -04:00
Dasaratharaman Chandramouli 9fdca4da4d IB/SA: Split struct sa_path_rec based on IB and ROCE specific fields
sa_path_rec now contains a union of sa_path_rec_ib and sa_path_rec_roce
based on the type of the path record. Note that fields applicable to
path record type ROCE v1 and ROCE v2 fall under sa_path_rec_roce.
Accessor functions are added to these fields so the caller doesn't have
to know the type.

Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:38:19 -04:00
Dasaratharaman Chandramouli c2f8fc4ec4 IB/SA: Rename ib_sa_path_rec to sa_path_rec
Rename ib_sa_path_rec to a more generic sa_path_rec.
This is part of extending ib_sa to also support OPA
path records in addition to the IB defined path records.

Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:37:28 -04:00
Dasaratharaman Chandramouli 44c58487d5 IB/core: Define 'ib' and 'roce' rdma_ah_attr types
rdma_ah_attr can now be either ib or roce allowing
core components to use one type or the other and also
to define attributes unique to a specific type. struct
ib_ah is also initialized with the type when its first
created. This ensures that calls such as modify_ah
dont modify the type of the address handle attribute.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli d8966fcd4c IB/core: Use rdma_ah_attr accessor functions
Modify core and driver components to use accessor functions
introduced to access individual fields of rdma_ah_attr

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli 3652315934 IB/core: Rename ib_destroy_ah to rdma_destroy_ah
Rename ib_destroy_ah to rdma_destroy_ah so its in sync with the
rename of the ib address handle attribute

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli 0a18cfe4f6 IB/core: Rename ib_create_ah to rdma_create_ah
Rename ib_create_ah to rdma_create_ah so its in sync with the
rename of the ib address handle attribute

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli 90898850ec IB/core: Rename struct ib_ah_attr to rdma_ah_attr
This patch simply renames struct ib_ah_attr to
rdma_ah_attr as these fields specify attributes that are
not necessarily specific to IB.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli cfd519358f IB/IPoIB: Remove 'else' when the 'if' has a return.
This patch fixes a checkpatch issue related to not having
to use an 'else' if the 'if' path returns from the function.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli aa4656d9a4 IB/core: Move opa_class_port_info definition to header file
Both opa_vnic and the hfi driver use the same opa_classport_info
definition. We will also have ib_sa capable of querying opa class
port info and would need this definition. Move it to ib_mad.h
for everyone to use.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 18:10:05 -04:00
Dasaratharaman Chandramouli ee1c60b1bf IB/SA: Modify SA to implicitly cache Class Port info
SA will query and cache class port info as part of
its initialization. SA will also invalidate and
refresh the cache based on specific events. Callers such
as IPoIB and CM can query the SA to get the classportinfo
information. Apart from making the caller code much simpler,
this change puts the onus on the SA to query and maintain
classportinfo much like how it maitains the address handle to the SM.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 14:00:17 -04:00
Colin Ian King 05a24b9b7f IB/iser: fix spelling mistake: "unexepected" -> "unexpected"
trivial fix to spelling mistake in iser_err error message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 14:06:49 -04:00
Feras Daoud 3e31a490e0 IB/ipoib: Fix deadlock between ipoib_stop and mcast join flow
Before calling ipoib_stop, rtnl_lock should be taken, then
the flow clears the IPOIB_FLAG_ADMIN_UP and IPOIB_FLAG_OPER_UP
flags, and waits for mcast completion if IPOIB_MCAST_FLAG_BUSY
is set.

On the other hand, the flow of multicast join task initializes
a mcast completion, sets the IPOIB_MCAST_FLAG_BUSY and calls
ipoib_mcast_join. If IPOIB_FLAG_OPER_UP flag is not set, this
call returns EINVAL without setting the mcast completion and
leads to a deadlock.

    ipoib_stop                          |
        |                               |
    clear_bit(IPOIB_FLAG_ADMIN_UP)      |
        |                               |
    Context Switch                      |
        |                       ipoib_mcast_join_task
        |                               |
        |                       spin_lock_irq(lock)
        |                               |
        |                       init_completion(mcast)
        |                               |
        |                       set_bit(IPOIB_MCAST_FLAG_BUSY)
        |                               |
        |                       Context Switch
        |                               |
    clear_bit(IPOIB_FLAG_OPER_UP)       |
        |                               |
    spin_lock_irqsave(lock)             |
        |                               |
    Context Switch                      |
        |                       ipoib_mcast_join
        |                       return (-EINVAL)
        |                               |
        |                       spin_unlock_irq(lock)
        |                               |
        |                       Context Switch
        |                               |
    ipoib_mcast_dev_flush               |
    wait_for_completion(mcast)          |

ipoib_stop will wait for mcast completion for ever, and will
not release the rtnl_lock. As a result panic occurs with the
following trace:

    [13441.639268] Call Trace:
    [13441.640150]  [<ffffffff8168b579>] schedule+0x29/0x70
    [13441.641038]  [<ffffffff81688fc9>] schedule_timeout+0x239/0x2d0
    [13441.641914]  [<ffffffff810bc017>] ? complete+0x47/0x50
    [13441.642765]  [<ffffffff810a690d>] ? flush_workqueue_prep_pwqs+0x16d/0x200
    [13441.643580]  [<ffffffff8168b956>] wait_for_completion+0x116/0x170
    [13441.644434]  [<ffffffff810c4ec0>] ? wake_up_state+0x20/0x20
    [13441.645293]  [<ffffffffa05af170>] ipoib_mcast_dev_flush+0x150/0x190 [ib_ipoib]
    [13441.646159]  [<ffffffffa05ac967>] ipoib_ib_dev_down+0x37/0x60 [ib_ipoib]
    [13441.647013]  [<ffffffffa05a4805>] ipoib_stop+0x75/0x150 [ib_ipoib]

Fixes: 08bc327629 ("IB/ipoib: fix for rare multicast join race condition")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 11:45:55 -04:00
Feras Daoud 9a9b811269 IB/ipoib: Update broadcast object if PKey value was changed in index 0
Update the broadcast address in the priv->broadcast object when the
Pkey value changes in index 0, otherwise the multicast GID value will
keep the previous value of the PKey, and will not be updated.
This leads to interface state down because the interface will keep the
old PKey value.

For example, in SR-IOV environment, if the PF changes the value of PKey
index 0 for one of the VFs, then the VF receives PKey change event that
triggers heavy flush. This flush calls update_parent_pkey that update the
broadcast object and its relevant members. If in this case the multicast
GID will not be updated, the interface state will be down.

Fixes: c290414169 ("IPoIB: Fix pkey change flow for virtualization environments")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 11:45:55 -04:00
Erez Shitrit cd565b4b51 IB/IPoIB: Support acceleration options callbacks
IPoIB driver now uses the new set of callback functions.

If the hardware provider supports the new ipoib_options implementation,
the driver uses the callbacks in its data path flows, otherwise it uses the
driver default implementation for all data flows in its code.

The default implementation wasn't change and it is exactly as it was before
introduction of acceleration support.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 15:19:44 -04:00
Erez Shitrit c1048aff7e IB/IPoIB: Use defined function for netdev_priv function
Make ipoib_priv point to netdev_priv where the code calls netdev_priv.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 15:19:44 -04:00
Erez Shitrit 10adcbd2d5 IB/IPoIB: Rename qpn to be dqpn in ipoib_send and post_send functions
Change of function parameter name from qpn to be dqpn.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 15:19:43 -04:00
Erez Shitrit 7ce1a3ee02 IB/IPoIB: Separate control from HW operation on ipoib_open/stop ndo
This patch is preparing the netdev part at the IPoIB driver to be able
to use the ipoib_options.

It deals with the two flows from the .ndo: ipoib_open and ipoib_stop.

The code is rearranged as follows:
 * All operations which deal with the hardware resources, (for example
   change QP state, post-receive etc.) are performed in one place.
 * All operations that are control oriented (like restart multicast task,
   start the reap_ah etc.) are performed in separate place.

The functions that deal with the hardware resources now located at
__ipoib_ib_dev_open for the ipoib_open flow and __ipoib_ib_dev_stop
for ipoib_stop.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 15:19:43 -04:00
Erez Shitrit 515ed4f3aa IB/IPoIB: Separate control and data related initializations
This patch prepares init and teardown flows so we can call them
through ipoib_options function pointers.

It arranges that area of code as the following:
 * All operations which deal with the resource allocation/deletion
   are performed in one place.
 * All operations that are control oriented, meaning that they are not
   connected to a specific hardware, are performed in a separate place.

The operations for allocation of hardware resources are now in the
function ipoib_dev_init_default, and the deletion of all the resources
are in ipoib_dev_uninit_default

The only exception is the creation of the PD object,
which is used both for resource allocation (create QP etc.)
and for control flows like creating AH.

It also does:
 * Move creation of rx_ring and tx_ring to be in the resources
   allocation area.
 * Move the function ipoib_ib_dev_open that does the open device
   to the control area instead of the dev_init which creates resources.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 15:19:42 -04:00
Vishwanathapura, Niranjana 1bd671ab3f IB/opa-vnic: VNIC Ethernet Management Agent (VEMA) function
OPA VEMA function interfaces with the Infiniband MAD stack to exchange the
management information packets with the Ethernet Manager (EM).
It interfaces with the OPA VNIC netdev function to SET/GET the management
information. The information exchanged with the EM includes class port
details, encapsulation configuration, various counters, unicast and
multicast MAC list and the MAC table. It also supports sending traps
to the EM.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:38 -04:00
Vishwanathapura, Niranjana cfd34f8eb0 IB/opa-vnic: VNIC Ethernet Management Agent (VEMA) interface
OPA VNIC EMA interface functions are the management interfaces to the OPA
VNIC netdev. Add support to add and remove VNIC ports. Implement the
required GET/SET management interface functions and processing of new
management information. Add support to send trap notifications upon various
events like interface status change, unicast/multicast mac list update and
mac address change.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:38 -04:00
Vishwanathapura, Niranjana 174e03d7e6 IB/opa-vnic: VNIC MAC table support
OPA VNIC MAC table contains the MAC address to DLID mappings provided by
the Ethernet manager. During transmission, the MAC table provides the MAC
address to DLID translation. Implement MAC table using simple hash list.
Also provide support to update/query the MAC table by Ethernet manager.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:38 -04:00
Vishwanathapura, Niranjana 009b7dd40c IB/opa-vnic: VNIC statistics support
OPA VNIC driver statistics support maintains various counters including
standard netdev counters and the Ethernet manager defined counters.
Add the Ethtool hook to read the counters.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:38 -04:00
Vishwanathapura, Niranjana 72dc761440 IB/opa-vnic: VNIC Ethernet Management (EM) structure definitions
Define VNIC EM MAD structures and the associated macros. These structures
are used for information exchange between VNIC EM agent (EMA) on the host
and the Ethernet manager. These include the virtual ethernet switch (vesw)
port information, vesw port mac table, summay and error counters,
vesw port interface mac lists and the EMA trap.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:38 -04:00
Vishwanathapura, Niranjana 7d6f728c67 IB/opa-vnic: Virtual Network Interface Controller (VNIC) netdev
OPA VNIC netdev function supports Ethernet functionality over Omni-Path
fabric by encapsulating Ethernet packets inside Omni-Path packet header.
It allocates a rdma netdev device and interfaces with the network stack to
provide standard Ethernet network interfaces. It overrides HFI1 device's
netdev operations where it is required.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:38 -04:00
Doug Ledford 23790ba2d7 Merge branch 'k.o/for-4.12' into k.o/for-4.12-rdma-netdevice 2017-04-20 12:00:41 -04:00
Linus Torvalds 025def92dd Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:

 "There has been work in a number of different areas over the last
  weeks, including:

   - Fix target-core-user (TCMU) back-end bi-directional handling (Xiubo
     Li + Mike Christie + Ilias Tsitsimpis)

   - Fix iscsi-target TMR reference leak during session shutdown (Rob
     Millner + Chu Yuan Lin)

   - Fix target_core_fabric_configfs.c race between LUN shutdown +
     mapped LUN creation (James Shen)

   - Fix target-core unknown fabric callback queue-full errors (Potnuri
     Bharat Teja)

   - Fix iscsi-target + iser-target queue-full handling in order to
     support iw_cxgb4 RNICs. (Potnuri Bharat Teja + Sagi Grimberg)

   - Fix ALUA transition state race between multiple initiator (Mike
     Christie)

   - Drop work-around for legacy GlobalSAN initiator, to allow QLogic
     57840S + 579xx offload HBAs to work out-of-the-box in MSFT
     environments. (Martin Svec + Arun Easi)

  Note that a number are CC'ed for stable, and although the queue-full
  bug-fixes required for iser-target to work with iw_cxgb4 aren't CC'ed
  here, they'll be posted to Greg-KH separately"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  tcmu: Skip Data-Out blocks before gathering Data-In buffer for BIDI case
  iscsi-target: Drop work-around for legacy GlobalSAN initiator
  target: Fix ALUA transition state race between multiple initiators
  iser-target: avoid posting a recv buffer twice
  iser-target: Fix queue-full response handling
  iscsi-target: Propigate queue_data_in + queue_status errors
  target: Fix unknown fabric callback queue-full errors
  tcmu: Fix wrongly calculating of the base_command_size
  tcmu: Fix possible overwrite of t_data_sg's last iov[]
  target: Avoid mappedlun symlink creation during lun shutdown
  iscsi-target: Fix TMR reference leak during session shutdown
  usb: gadget: Correct usb EP argument for BOT status request
  tcmu: Allow cmd_time_out to be set to zero (disabled)
2017-04-11 23:51:58 -07:00
Shamir Rabinovitch 771a525840 IB/IPoIB: ibX: failed to create mcg debug file
When udev renames the netdev devices, ipoib debugfs entries does not
get renamed. As a result, if subsequent probe of ipoib device reuse the
name then creating a debugfs entry for the new device would fail.

Also, moved ipoib_create_debug_files and ipoib_delete_debug_files as part
of ipoib event handling in order to avoid any race condition between these.

Fixes: 1732b0ef3b ([IPoIB] add path record information in debugfs)
Cc: stable@vger.kernel.org # 2.6.15+
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 13:47:24 -04:00
Sagi Grimberg 7a56dc8888 iser-target: avoid posting a recv buffer twice
We pre-allocate our send-queues and might overflow them
in case we have multi work-request operations which tend
to occur for large RDMA transfers over devices with limited
allowed sg elements. When we get to a queue-full condition
we might retry again later, so track our receive buffers
so we don't repost them for a retry case.

Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-03-30 20:35:50 -07:00
Nicholas Bellinger 555a65f66c iser-target: Fix queue-full response handling
This patch addresses two queue-full handling bugs in iser-target.

The first is propagating isert_rdma_rw_ctx_post() return back
to target-core via isert_put_datain() + isert_get_dataout()
callbacks, in order to trigger queue-full logic in target-core.
Note target-core expects -EAGAIN or -ENOMEM error to signal
RDMA WRITE/READ data-transfer callbacks should be retried,
after queue-full logic been invoked.

Other types of errors propagated up from RDMA RW API will result
in target-core generating internal CHECK_CONDITION status,
avoiding subsequent isert_put_datain() and isert_get_dataout()
iscsit_transport callback retry attempts.

The second is to use transport_generic_request_failure()
during T10-PI hw-offload errors in isert_rdma_write_done()
and isert_rdma_read_done(), so CHECK_CONDITION queue-full
is handled internally by target-core.

Also add isert_put_response() T10-PI failure case fixme in
isert_rdma_write_done(), which is currently not internally
retried or released until session reinstatement.

Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Potnuri Bharat Teja <bharat@chelsio.com>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Cc: Potnuri Bharat Teja <bharat@chelsio.com>
Reported-by: Steve Wise <swise@opengridcomputing.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-03-30 20:35:07 -07:00
Sagi Grimberg ea174c9573 RDMA/iser: Fix possible mr leak on device removal event
When the rdma device is removed, we must cleanup all
the rdma resources within the DEVICE_REMOVAL event
handler to let the device teardown gracefully. When
this happens with live I/O, some memory regions are
occupied. Thus, track them too and dereg all the mr's.

We are safe with mr access by iscsi_iser_cleanup_task.

Reported-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 22:31:19 -04:00
Ingo Molnar 174cd4b1e5 sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>
Fix up affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:32 +01:00
Linus Torvalds ac1820fb28 This is a tree wide change and has been kept separate for that reason.
Bart Van Assche noted that the ib DMA mapping code was significantly
 similar enough to the core DMA mapping code that with a few changes
 it was possible to remove the IB DMA mapping code entirely and
 switch the RDMA stack to use the core DMA mapping code.  This resulted
 in a nice set of cleanups, but touched the entire tree.  This branch
 will be submitted separately to Linus at the end of the merge window
 as per normal practice for tree wide changes like this.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYo06oAAoJELgmozMOVy/d9Z8QALedWHdu98St1L0u2c8sxnR9
 2zo/4sF5Vb9u7FpmdIX32L4SQ9s9KhPE8Qp8NtZLf9v10zlDebIRJDpXknXtKooV
 CAXxX4sxBXV27/UrhbZEfXiPrmm6ccJFyIfRnMU6NlMqh2AtAsRa5AC2/RMp8oUD
 Med97PFiF0o6TD22/UH1VFbRpX1zjaKyqm7a3as5sJfzNA+UGIZAQ7Euz8000DKZ
 xCgVLTEwS0FmOujtBkCst7xa9TjuqR1HLOB4DdGvAhP6BHdz2yamM7Qmh9NN+NEX
 0BtjsuXomtn6j6AszGC+bpipCZh3NUigcwoFAARXCYFHibBvo4DPdFeGsraFgXdy
 1+KyR8CCeQG3Aly5Vwr264RFPGkGpwMj8PsBlXgQVtrlg4rriaCzOJNmIIbfdADw
 ftqhxBOzReZw77aH2s+9p2ILRfcAmPqhynLvFGFo9LBvsik8LVso7YgZN0xGxwcI
 IjI/XGC8UskPVsIZBIYA6sl2bYzgOjtBIHiXjRrPlW3uhduIXLrvKFfLPP/5XLAG
 ehLXK+J0bfsyY9ClmlNS8oH/WdLhXAyy/KNmnj5bRRm9qg6BRJR3bsOBhZJODuoC
 XgEXFfF6/7roNESWxowff7pK0rTkRg/m/Pa4VQpeO+6NWHE7kgZhL6kyIp5nKcwS
 3e7mgpcwC+3XfA/6vU3F
 =e0Si
 -----END PGP SIGNATURE-----

Merge tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma DMA mapping updates from Doug Ledford:
 "Drop IB DMA mapping code and use core DMA code instead.

  Bart Van Assche noted that the ib DMA mapping code was significantly
  similar enough to the core DMA mapping code that with a few changes it
  was possible to remove the IB DMA mapping code entirely and switch the
  RDMA stack to use the core DMA mapping code.

  This resulted in a nice set of cleanups, but touched the entire tree
  and has been kept separate for that reason."

* tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
  IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it
  IB/core: Remove ib_device.dma_device
  nvme-rdma: Switch from dma_device to dev.parent
  RDS: net: Switch from dma_device to dev.parent
  IB/srpt: Modify a debug statement
  IB/srp: Switch from dma_device to dev.parent
  IB/iser: Switch from dma_device to dev.parent
  IB/IPoIB: Switch from dma_device to dev.parent
  IB/rxe: Switch from dma_device to dev.parent
  IB/vmw_pvrdma: Switch from dma_device to dev.parent
  IB/usnic: Switch from dma_device to dev.parent
  IB/qib: Switch from dma_device to dev.parent
  IB/qedr: Switch from dma_device to dev.parent
  IB/ocrdma: Switch from dma_device to dev.parent
  IB/nes: Remove a superfluous assignment statement
  IB/mthca: Switch from dma_device to dev.parent
  IB/mlx5: Switch from dma_device to dev.parent
  IB/mlx4: Switch from dma_device to dev.parent
  IB/i40iw: Remove a superfluous assignment statement
  IB/hns: Switch from dma_device to dev.parent
  ...
2017-02-25 13:45:43 -08:00
Linus Torvalds 4cc4b9323f First set of updates for 4.11 kernel merge window
- Add new Broadcom bnxt_re RoCE driver
 - rxe driver updates
 - ioctl cleanups
 - ETH_P_IBOE declaration cleanup
 - IPoIB changes
 - Add port state cache
 - Allow srpt driver to accept guids as port names in config
 - Update to hfi1 driver
 - Update to srp driver
 - Lots of misc. minor changes all over
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYrfewAAoJELgmozMOVy/dFnEP/2Qe7NqXRqxLS0ZqsQseFHgQ
 jd236E7R/XtQQTE3PTcrWL0mq0DRF6tMEjfhUASKTbZVfCBTniJAoXYrvWhN/STq
 LxAdigdV/0SPbxO3r9B1Xvk2v5BySaIBkaUDvcEXzT4e7UVQwZgxDkhhsYeY0Z/r
 9bNB5760PzW8uO5cctXccNcWztZnW0IUZuAHVfQCPjZ7svoGwLnNDW6YQx+FsEkW
 tbPdzMXX8VKHlC5RcKbfOOBjdNyrUpWl+uvWEc/7mazKscp4yKVFZL7PcxqPJSfd
 aKdfqXYawhjZZpyws8Kn0rhkfT7xWKD/y9G5STykRJPj9/n1BDScFkmyDQhtP5bJ
 GANzdgH0z7Dt9LkcAs86A8EVBbIdbdT2cpPVu7t0uWEIsJw/O5ThKpgjnrrTm6m+
 89tgqLZooifTEsdj4UkZoyktrD3J9LSNZkgVmWtRn01W3oYFOPbdM4TmBZtg+/Yl
 VGmOJEHMEsNuJBcJcOuRJ1MVz2LebXmPUcB0RXzgmHHgulZ/DqoOtlpg5JNmJcr5
 wpw/yppkBop4V4+etJBlzDsZNmZZlX+AY0ZLqQJsDHNszDjwXgAy5Rn5FYIdMyk4
 ff0FKb5dzASSxHRDxAsu2uoGaREM0NkpA0UYiIZbepGLSO8PuFG2ScQ6qzU47vqu
 9SEzOaaQY2S2uqFFFnYp
 =ugNm
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "First set of updates for 4.11 kernel merge window

   - Add new Broadcom bnxt_re RoCE driver
   - rxe driver updates
   - ioctl cleanups
   - ETH_P_IBOE declaration cleanup
   - IPoIB changes
   - Add port state cache
   - Allow srpt driver to accept guids as port names in config
   - Update to hfi1 driver
   - Update to srp driver
   - Lots of misc minor changes all over"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (114 commits)
  RDMA/bnxt_re: fix for "bnxt_en: Update to firmware interface spec 1.7.0."
  rdma_cm: fail iwarp accepts w/o connection params
  IB/srp: Drain the send queue before destroying a QP
  IB/core: Add support for draining IB_POLL_DIRECT completion queues
  IB/srp: Improve an error path
  IB/srp: Make a diagnostic message more informative
  IB/srp: Document locking conventions
  IB/srp: Fix race conditions related to task management
  IB/srp: Avoid that duplicate responses trigger a kernel bug
  IB/SRP: Avoid using IB_MR_TYPE_SG_GAPS
  RDMA/qedr: Fix some error handling
  RDMA/bnxt_re: add DCB dependency
  IB/hns: include linux/module.h
  IB/vmw_pvrdma: Expose vendor error to ULPs
  vmw_pvrdma: switch to pci_alloc_irq_vectors
  IB/hfi1: use size_t for passing array length
  IB/ipoib: Remove redudant label
  IB/ipoib: remove the unnecessary memory free
  IB/mthca: switch to pci_alloc_irq_vectors
  IB/hfi1: Code reuse with memdup_copy
  ...
2017-02-23 08:27:57 -08:00
Linus Torvalds cdc194705d SCSI misc on 20170220
This update includes the usual round of major driver updates (ncr5380,
 ufs, lpfc, be2iscsi, hisi_sas, storvsc, cxlflash, aacraid,
 megaraid_sas, ).  There's also an assortment of minor fixes and the
 major update of switching a bunch of drivers to pci_alloc_irq_vectors
 from Christoph.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJYq5adAAoJEAVr7HOZEZN4bjUP/Atk7CSZVnC75pcYmncbEGCx
 ysOlEHK4uW2HhiAYk3PlYMk+pKrMHet2zsbbM9PHJfopdOHZ7Sq1+UZZVeqE1Zun
 8pe0NhON+fZx7XAnevdEvnSSULQZ+AGfjZO72iUwkJiN3ozYaFtCITOyn49l4GpR
 ra9emskBh7CQOFW2voGn1AKeDijPYGx3+TO4AUrWjVMiByR06gb1bmImx+ljiUrs
 jzRJPfrt90ORcTdpMateyN2EXxudcASMhX03SJ6fRI84hPAhMCROMbTv8RnzOTE4
 DPbnvbYUowlHt43iUhJHSwGdkRRaRBnkzQENBp1fNrNzZgF6vB7+kShxbonrYB2p
 gC4ewaJr0BNj+HsUnvTpe3WseiPOcfsnBsKilPLKBlm2dCKEXqFox/dj/T1uexxg
 HoyFrl3u8fyEqVHrzRS4M9t/njWh0NFmXxb0wBdj+lkVFTRErGSKQ8SfOqshuSGs
 P8NN88jy8vC7uqgzKBJ+UH3ehzn3qfBxasFHIC/e2awY9FqKjHGTxKMmSVpjXVxy
 wCvE2FQ3k/qEj2XSM6f7/NGytlSOlju5q1rFtHPW2M+TFSh0LJWCnmVjR/Zle9em
 pBWmtIgCv8W5b41zL2H94nLWAZbfdrrNU/XnX88l47LKnmorte/PGhpxu36NEsMS
 VCgreQmFMdMRY+WzDWl1
 =cBQx
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This update includes the usual round of major driver updates (ncr5380,
  ufs, lpfc, be2iscsi, hisi_sas, storvsc, cxlflash, aacraid,
  megaraid_sas, ...).

  There's also an assortment of minor fixes and the major update of
  switching a bunch of drivers to pci_alloc_irq_vectors from Christoph"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (188 commits)
  scsi: megaraid_sas: handle dma_addr_t right on 32-bit
  scsi: megaraid_sas: array overflow in megasas_dump_frame()
  scsi: snic: switch to pci_irq_alloc_vectors
  scsi: megaraid_sas: driver version upgrade
  scsi: megaraid_sas: Change RAID_1_10_RMW_CMDS to RAID_1_PEER_CMDS and set value to 2
  scsi: megaraid_sas: Indentation and smatch warning fixes
  scsi: megaraid_sas: Cleanup VD_EXT_DEBUG and SPAN_DEBUG related debug prints
  scsi: megaraid_sas: Increase internal command pool
  scsi: megaraid_sas: Use synchronize_irq to wait for IRQs to complete
  scsi: megaraid_sas: Bail out the driver load if ld_list_query fails
  scsi: megaraid_sas: Change build_mpt_mfi_pass_thru to return void
  scsi: megaraid_sas: During OCR, if get_ctrl_info fails do not continue with OCR
  scsi: megaraid_sas: Do not set fp_possible if TM capable for non-RW syspdIO, change fp_possible to bool
  scsi: megaraid_sas: Remove unused pd_index from megasas_build_ld_nonrw_fusion
  scsi: megaraid_sas: megasas_return_cmd does not memset IO frame to zero
  scsi: megaraid_sas: max_fw_cmds are decremented twice, remove duplicate
  scsi: megaraid_sas: update can_queue only if the new value is less
  scsi: megaraid_sas: Change max_cmd from u32 to u16 in all functions
  scsi: megaraid_sas: set pd_after_lb from MR_BuildRaidContext and initialize pDevHandle to MR_DEVHANDLE_INVALID
  scsi: megaraid_sas: latest controller OCR capability from FW before sending shutdown DCMD
  ...
2017-02-21 11:51:42 -08:00
Bart Van Assche 9294000d6d IB/srp: Drain the send queue before destroying a QP
A quote from the IB spec:

However, if the Consumer does not wait for the Affiliated Asynchronous
Last WQE Reached Event, then WQE and Data Segment leakage may occur.
Therefore, it is good programming practice to tear down a QP that is
associated with an SRQ by using the following process:
* Put the QP in the Error State;
* wait for the Affiliated Asynchronous Last WQE Reached Event;
* either:
  * drain the CQ by invoking the Poll CQ verb and either wait for CQ
    to be empty or the number of Poll CQ operations has exceeded CQ
    capacity size; or
  * post another WR that completes on the same CQ and wait for this WR to return as a WC;
* and then invoke a Destroy QP or Reset QP.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:55 -05:00
Bart Van Assche b02c15360b IB/srp: Improve an error path
Avoid that the following message is printed if login fails:

scsi host0: ib_srp: Sending CM DREQ failed

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:54 -05:00
Bart Van Assche a7139ca80f IB/srp: Make a diagnostic message more informative
Report the destination port GID if connecting fails.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:53 -05:00
Bart Van Assche 93c76dbb74 IB/srp: Document locking conventions
Use lockdep_assert_held() statements to verify at run-time
whether the proper locks are held.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:53 -05:00
Bart Van Assche 0a6fdbdeb1 IB/srp: Fix race conditions related to task management
Avoid that srp_process_rsp() overwrites the status information
in ch if the SRP target response timed out and processing of
another task management function has already started. Avoid that
issuing multiple task management functions concurrently triggers
list corruption. This patch prevents that the following stack
trace appears in the system log:

WARNING: CPU: 8 PID: 9269 at lib/list_debug.c:52 __list_del_entry_valid+0xbc/0xc0
list_del corruption. prev->next should be ffffc90004bb7b00, but was ffff8804052ecc68
CPU: 8 PID: 9269 Comm: sg_reset Tainted: G        W       4.10.0-rc7-dbg+ #3
Call Trace:
 dump_stack+0x68/0x93
 __warn+0xc6/0xe0
 warn_slowpath_fmt+0x4a/0x50
 __list_del_entry_valid+0xbc/0xc0
 wait_for_completion_timeout+0x12e/0x170
 srp_send_tsk_mgmt+0x1ef/0x2d0 [ib_srp]
 srp_reset_device+0x5b/0x110 [ib_srp]
 scsi_ioctl_reset+0x1c7/0x290
 scsi_ioctl+0x12a/0x420
 sd_ioctl+0x9d/0x100
 blkdev_ioctl+0x51e/0x9f0
 block_ioctl+0x38/0x40
 do_vfs_ioctl+0x8f/0x700
 SyS_ioctl+0x3c/0x70
 entry_SYSCALL_64_fastpath+0x18/0xad

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Steve Feeley <Steve.Feeley@sandisk.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:52 -05:00
Bart Van Assche 6cb72bc1b4 IB/srp: Avoid that duplicate responses trigger a kernel bug
After srp_process_rsp() returns there is a short time during which
the scsi_host_find_tag() call will return a pointer to the SCSI
command that is being completed. If during that time a duplicate
response is received, avoid that the following call stack appears:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: srp_recv_done+0x450/0x6b0 [ib_srp]
Oops: 0000 [#1] SMP
CPU: 10 PID: 0 Comm: swapper/10 Not tainted 4.10.0-rc7-dbg+ #1
Call Trace:
 <IRQ>
 __ib_process_cq+0x4b/0xd0 [ib_core]
 ib_poll_handler+0x1d/0x70 [ib_core]
 irq_poll_softirq+0xba/0x120
 __do_softirq+0xba/0x4c0
 irq_exit+0xbe/0xd0
 smp_apic_timer_interrupt+0x38/0x50
 apic_timer_interrupt+0x90/0xa0
 </IRQ>
RIP: srp_recv_done+0x450/0x6b0 [ib_srp] RSP: ffff88046f483e20

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Steve Feeley <Steve.Feeley@sandisk.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:52 -05:00
Bart Van Assche d6c58dc40f IB/SRP: Avoid using IB_MR_TYPE_SG_GAPS
Tests have shown that the following error message is reported when
using SG-GAPS registration with an mlx5 adapter:

scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff880bd4270eb0
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 0f007806 2500002a ad9fafd1
scsi host1: ib_srp: reconnect succeeded
mlx5_0:dump_cqe:262:(pid 7369): dump error cqe
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 0f007806 25000032 00105dd0
scsi host1: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff880b92860138

Hence avoid using SG-GAPS memory registrations. Additionally,
always configure the blk_queue_virt_boundary() to avoid to trigger
a mapping failure when using adapters that support SG-GAPS (e.g.
mlx5).

Fixes: commit ad8e66b4a8 ("IB/srp: fix mr allocation when the device supports sg gaps")
Fixes: commit 509c5f33f4 ("IB/srp: Prevent mapping failures")
Reported-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mark Bloch <markb@mellanox.com>
Cc: Yuval Shaia <yuval.shaia@oracle.com>
Cc: <stable@vger.kernel.org> # 4.7+
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:41 -05:00
Zhu Yanjun 23536dfaa3 IB/ipoib: Remove redudant label
There are 2 labels to mark the same statements. Replace the 2 labels
with one versatile labe.

Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:46 -05:00
Zhu Yanjun c5e8f57b0b IB/ipoib: remove the unnecessary memory free
In the function ipoib_cm_nonsrq_init_rx, the memory is not
allocated successfully. It is not necessary to free it.

CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:46 -05:00
Max Gurtovoy 32f8e839ed IB/iser: Protect completion context active_qps update
As iser connections can share completion contexts, we need
to protect the active_qps update each time we set it.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:31 -05:00
Doug Ledford 6dd7abae71 Merge branch 'k.o/for-4.10-rc' into HEAD 2017-02-19 09:18:21 -05:00
Erez Shitrit 2b0841766a IB/IPoIB: Add destination address when re-queue packet
When sending packet to destination that was not resolved yet
via path query, the driver keeps the skb and tries to re-send it
again when the path is resolved.

But when re-sending via dev_queue_xmit the kernel doesn't call
to dev_hard_header, so IPoIB needs to keep 20 bytes in the skb
and to put the destination address inside them.

In that way the dev_start_xmit will have the correct destination,
and the driver won't take the destination from the skb->data, while
nothing exists there, which causes to packet be be dropped.

The test flow is:
1. Run the SM on remote node,
2. Restart the driver.
4. Ping some destination,
3. Observe that first ICMP request will be dropped.

Fixes: fc791b6335 ("IB/ipoib: move back IB LL address into the hard header")
Cc: <stable@vger.kernel.org> # v4.8+
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Tested-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-15 09:51:28 -05:00
Christoph Hellwig b6a05c823f scsi: remove eh_timed_out methods in the transport template
Instead define the timeout behavior purely based on the host_template
eh_timed_out method and wire up the existing transport implementations
in the host templates.  This also clears up the confusion that the
transport template method overrides the host template one, so some
drivers have to re-override the transport template one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-02-06 19:10:03 -05:00
Bart Van Assche 2bce1a6d22 IB/srpt: Accept GUIDs as port names
Port and ACL information must be configured before an initiator
logs in.  Make it possible to configure this information before
a subnet prefix has been assigned to a port by not only accepting
GIDs as target port and initiator port names but by also accepting
port GUIDs.

Add a 'priv' member to struct se_wwn to allow target drivers to
associate their own data with struct se_wwn.

Reported-by: Doug Ledford <dledford@redhat.com>
References: http://www.spinics.net/lists/linux-rdma/msg39505.html
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-27 14:31:50 -05:00
Zhu Yanjun 5c37077fd0 IB/ipoib: Remove the unnecessary error check
The function ipoib_mcast_start_thread/ipoib_ib_dev_up always return zero.
As such, in the function ipoib_open, err_stop will never be reached.
So remove this err_stop and change the return type of the function
ipoib_mcast_start_thread/ipoib_ib_dev_up to void.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 16:22:24 -05:00
Zhu Yanjun dfc0e55506 IB/ipoib: function interface change
The ipoib_ib_dev_down/ipoib_ib_dev_stop return zero unconditionally
and the callers never check the returned values,
change the return type to void and remove the redundant return values.

Reviewed-by: Shan Hai <shan.hai@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 16:20:37 -05:00
Zhu Yanjun f7534f45dc IB/ipoib: Remove unnecessary returned value check
In the function ipoib_set_dev_features, the returned value is always 0.
As such, it is not necessary to check the returned value.
This is not a bug. It is a trivial problem.

Reviewed-by: Guanglei Li <guanglei.li@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 16:20:37 -05:00
Colin Ian King 506f71d181 IB/isert: fix spelling mistake: "teminating" -> "terminating"
Trivial fix to spelling mistake in isert_warn message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 16:20:37 -05:00
Bart Van Assche e3dfa60c0a IB/srpt: Modify a debug statement
Since a later patch will remove ib_device.dma_device and since knowing
the value of that pointer is not too important, remove dma_device from
the debug output.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:26:17 -05:00
Bart Van Assche dee2b82a5f IB/srp: Switch from dma_device to dev.parent
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:26:17 -05:00
Bart Van Assche 61118cecf2 IB/iser: Switch from dma_device to dev.parent
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:26:17 -05:00
Bart Van Assche db97ed0a2e IB/IPoIB: Switch from dma_device to dev.parent
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:26:17 -05:00
Bart Van Assche 5657933dbb treewide: Move dma_ops from struct dev_archdata into struct device
Some but not all architectures provide set_dma_ops(). Move dma_ops
from struct dev_archdata into struct device such that it becomes
possible on all architectures to configure dma_ops per device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Russell King <linux@armlinux.org.uk>
Cc: x86@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Max Gurtovoy 83236f0157 IB/iser: remove unused variable from iser_conn struct
max_sectors calculation was fixed in commit:
9c674815d3 ("IB/iser: Fix max_sectors calculation").
Thus, iser_conn variable scsi_max_sectors is not needed anymore.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 11:37:45 -05:00
Max Gurtovoy 1e5db6c31a IB/iser: Fix sg_tablesize calculation
For devices that can register page list that is bigger than
USHRT_MAX, we actually take the wrong value for sg_tablesize.
E.g: for CX4 max_fast_reg_page_list_len is 65536 (bigger than USHRT_MAX)
so we set sg_tablesize to 0 by mistake. Therefore, each IO that is
bigger than 4k splitted to "< 4k" chunks that cause performance degredation.
Remove wrong sg_tablesize assignment, and use the value that was set during
address resolution handler with the needed casting.

Cc: <stable@vger.kernel.org> # v4.5+
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 11:37:45 -05:00
Israel Rukshin 0a475ef422 IB/srp: fix invalid indirect_sg_entries parameter value
After setting indirect_sg_entries module_param to huge value (e.g 500,000),
srp_alloc_req_data() fails to allocate indirect descriptors for the request
ring (kmalloc fails). This commit enforces the maximum value of
indirect_sg_entries to be SG_MAX_SEGMENTS as signified in module param
description.

Fixes: 65e8617fba (scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS)
Fixes: c07d424d61 (IB/srp: add support for indirect tables that don't fit in SRP_CMD)
Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>--
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 11:30:14 -05:00
Israel Rukshin ad8e66b4a8 IB/srp: fix mr allocation when the device supports sg gaps
If the device support arbitrary sg list mapping (device cap
IB_DEVICE_SG_GAPS_REG set) we allocate the memory regions with
IB_MR_TYPE_SG_GAPS.

Fixes: 509c5f33f4 ("IB/srp: Prevent mapping failures")
Cc: <stable@vger.kernel.org> # 4.7+
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 11:03:17 -05:00
Feras Daoud 27d41d29c7 IB/ipoib: Change list_del to list_del_init in the tx object
Since ipoib_cm_tx_start function and ipoib_cm_tx_reap function
belong to different work queues, they can run in parallel.
In this case if ipoib_cm_tx_reap calls list_del and release the
lock, ipoib_cm_tx_start may acquire it and call list_del_init
on the already deleted object.
Changing list_del to list_del_init in ipoib_cm_tx_reap fixes the problem.

Fixes: 839fcaba35 ("IPoIB: Connected mode experimental support")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 14:01:06 -05:00
Feras Daoud c586071d1d IB/ipoib: Replace list_del of the neigh->list with list_del_init
In order to resolve a situation where a few process delete
the same list element in sequence and cause panic, list_del
is replaced with list_del_init. In this case if the first
process that calls list_del releases the lock before acquiring
it again, other processes who can acquire the lock will call
list_del_init.

Fixes: b63b70d877 ("IPoIB: Use a private hash table for path lookup")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 14:01:05 -05:00
Feras Daoud 13ee429a02 IB/ipoib: Use debug prints instead of warnings in RNR WC status
If a receive request has not been posted to the work queue, the incoming
message is rejected and the peer will receive a receiver-not-ready (RNR)
error. In IPoIB, IB_WC_RNR_RETRY_EXC_ERR error is part of the life cycle
therefore ipoib_cm_handle_tx_wc function will print to debug instead
of warnings.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 14:01:05 -05:00
Feras Daoud d32b9a81d7 IB/ipoib: Add detailed error message to dev_queue_xmit call
Add a detailed return code to dev_queue_xmit function when
calling to requeue packet via __skb_dequeue.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 14:01:04 -05:00
Feras Daoud 89a3987ab7 IB/ipoib: rtnl_unlock can not come after free_netdev
The ipoib_vlan_add function calls rtnl_unlock after free_netdev,
rtnl_unlock not only releases the lock, but also calls netdev_run_todo.
The latter function browses the net_todo_list array and completes the
unregistration of all its net_device instances. If we call free_netdev
before rtnl_unlock, then netdev_run_todo call over the freed device causes
panic.
To fix, move rtnl_unlock call before free_netdev call.

Fixes: 9baa0b0364 ("IB/ipoib: Add rtnl_link_ops support")
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 14:01:04 -05:00
Feras Daoud 0a0007f283 IB/ipoib: Fix deadlock between rmmod and set_mode
When calling set_mode from sys/fs, the call flow locks the sys/fs lock
first and then tries to lock rtnl_lock (when calling ipoib_set_mod).
On the other hand, the rmmod call flow takes the rtnl_lock first
(when calling unregister_netdev) and then tries to take the sys/fs
lock. Deadlock a->b, b->a.

The problem starts when ipoib_set_mod frees it's rtnl_lck and tries
to get it after that.

    set_mod:
    [<ffffffff8104f2bd>] ? check_preempt_curr+0x6d/0x90
    [<ffffffff814fee8e>] __mutex_lock_slowpath+0x13e/0x180
    [<ffffffff81448655>] ? __rtnl_unlock+0x15/0x20
    [<ffffffff814fed2b>] mutex_lock+0x2b/0x50
    [<ffffffff81448675>] rtnl_lock+0x15/0x20
    [<ffffffffa02ad807>] ipoib_set_mode+0x97/0x160 [ib_ipoib]
    [<ffffffffa02b5f5b>] set_mode+0x3b/0x80 [ib_ipoib]
    [<ffffffff8134b840>] dev_attr_store+0x20/0x30
    [<ffffffff811f0fe5>] sysfs_write_file+0xe5/0x170
    [<ffffffff8117b068>] vfs_write+0xb8/0x1a0
    [<ffffffff8117ba81>] sys_write+0x51/0x90
    [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

    rmmod:
    [<ffffffff81279ffc>] ? put_dec+0x10c/0x110
    [<ffffffff8127a2ee>] ? number+0x2ee/0x320
    [<ffffffff814fe6a5>] schedule_timeout+0x215/0x2e0
    [<ffffffff8127cc04>] ? vsnprintf+0x484/0x5f0
    [<ffffffff8127b550>] ? string+0x40/0x100
    [<ffffffff814fe323>] wait_for_common+0x123/0x180
    [<ffffffff81060250>] ? default_wake_function+0x0/0x20
    [<ffffffff8119661e>] ? ifind_fast+0x5e/0xb0
    [<ffffffff814fe43d>] wait_for_completion+0x1d/0x20
    [<ffffffff811f2e68>] sysfs_addrm_finish+0x228/0x270
    [<ffffffff811f2fb3>] sysfs_remove_dir+0xa3/0xf0
    [<ffffffff81273f66>] kobject_del+0x16/0x40
    [<ffffffff8134cd14>] device_del+0x184/0x1e0
    [<ffffffff8144e59b>] netdev_unregister_kobject+0xab/0xc0
    [<ffffffff8143c05e>] rollback_registered+0xae/0x130
    [<ffffffff8143c102>] unregister_netdevice+0x22/0x70
    [<ffffffff8143c16e>] unregister_netdev+0x1e/0x30
    [<ffffffffa02a91b0>] ipoib_remove_one+0xe0/0x120 [ib_ipoib]
    [<ffffffffa01ed95f>] ib_unregister_device+0x4f/0x100 [ib_core]
    [<ffffffffa021f5e1>] mlx4_ib_remove+0x41/0x180 [mlx4_ib]
    [<ffffffffa01ab771>] mlx4_remove_device+0x71/0x90 [mlx4_core]

Fixes: 862096a8bb ("IB/ipoib: Add more rtnl_link_ops callbacks")
Cc: <stable@vger.kernel.org> # v3.6+
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 14:01:03 -05:00
Feras Daoud 1c3098cdb0 IB/ipoib: Fix deadlock over vlan_mutex
This patch fixes Deadlock while executing ipoib_vlan_delete.

The function takes the vlan_rwsem semaphore and calls
unregister_netdevice. The later function calls
ipoib_mcast_stop_thread that cause workqueue flush.

When the queue has one of the ipoib_ib_dev_flush_xxx events,
a deadlock occur because these events also tries to catch the
same vlan_rwsem semaphore.

To fix, unregister_netdevice should be called after releasing
the semaphore.

Fixes: cbbe1efa49 ("IPoIB: Fix deadlock between ipoib_open() and child interface create")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 14:01:02 -05:00
Feras Daoud 80b5b35aba IB/ipoib: Set device connection mode only when needed
When changing the connection mode, the ipoib_set_mode function
did not check if the previous connection mode equals to the
new one. This commit adds the required check and return 0 if the new
mode equals to the previous one.

Fixes: 839fcaba35 ("IPoIB: Connected mode experimental support")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 14:01:02 -05:00
Feras Daoud 29da686dff IB/ipoib: When given an invalid UD MTU, give debug msg
In datagram mode, the IB UD (Unreliable Datagram) transport is used
so the MTU of the interface is equal to the IB L2 MTU minus the
IPoIB encapsulation header. Any request to change the MTU value
above the maximum range will change the MTU to the max allowed, but
will not show any warning message. An ipoib_warn is issued in such
cases, letting the user know that even though the value is legal,
it can't be currently applied.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 13:59:56 -05:00
Linus Torvalds 7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Linus Torvalds 4d5b57e05a Updates for 4.10 kernel merge window
- Shared mlx5 updates with net stack (will drop out on merge if Dave's
   tree has already been merged)
 - Driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe
 - Debug cleanups
 - New connection rejection helpers
 - SRP updates
 - Various misc fixes
 - New paravirt driver from vmware
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYUbAPAAoJELgmozMOVy/dMXcP/iuG5MNzfN8Ny1JftyBQGWg3
 cqoQ2OLj9CsXjwVB+5EqbcZHRZY852lKONaLoDKkIOx4YAXO2YuIKOp944vN7EQx
 96wfqzT1F5jzAcy5mYZXgLaStGFDAwejKMqeHd0LfJj3OEtemGnVPWYzyqSQmSKo
 dzJraS1Z9GIRppzU5WaRpB9PtRBkqIqGJ5vZ0EKLGhed5hYY5r0iMJB0GfriMRDO
 lJ4UUVfpsAoLPnqDBFH6IMn2V2UeAw9IR5zNa1mrM1RBfvt/uYTxrw1w3p9WoaNs
 GRodhk4DCeAfeyqzVPNBLyXZ4Zq4FzGe3UWM4qysJ1RR4oFNw9Cuw0Fqk8mrfznr
 7hv5TpGIckRZiKf8l6e+qLirF0qGtXJg29j2vPVQI9i5nSj95g1agA81PnLQlLLb
 flWyxeMj81my7lfMHN1xcV6pqPEKMCOysZmfcvVfJd2XxpjuVD7ekl/YXWp8o8kU
 YPdQMqPD626XsD8VpPdMszb9FPmx0JD0HEv+Y1rIFX8JegEI+c3H2X0dqC27T/Ou
 FEPWOy025EgHm0Fh/7eIzkG6tjZ4JHoCugJAcxNZGj2XW4eB6r5vY8UwJ8iQRv+n
 PVYHiy0UoIRePh0mrdOSSphGZMi/GO/DsqKwCtAMEK43WqZQju6wR7QSIGkh66mp
 4uSHJqpf3YEYylxGMhk3
 =QeGy
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "This is the complete update for the rdma stack for this release cycle.

  Most of it is typical driver and core updates, but there is the
  entirely new VMWare pvrdma driver. You may have noticed that there
  were changes in DaveM's pull request to the bnxt Ethernet driver to
  support a RoCE RDMA driver. The bnxt_re driver was tentatively set to
  be pulled in this release cycle, but it simply wasn't ready in time
  and was dropped (a few review comments still to address, and some
  multi-arch build issues like prefetch() not working across all
  arches).

  Summary:

   - shared mlx5 updates with net stack (will drop out on merge if
     Dave's tree has already been merged)

   - driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe

   - debug cleanups

   - new connection rejection helpers

   - SRP updates

   - various misc fixes

   - new paravirt driver from vmware"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (210 commits)
  IB: Add vmw_pvrdma driver
  IB/mlx4: fix improper return value
  IB/ocrdma: fix bad initialization
  infiniband: nes: return value of skb_linearize should be handled
  MAINTAINERS: Update Intel RDMA RNIC driver maintainers
  MAINTAINERS: Remove Mitesh Ahuja from emulex maintainers
  IB/core: fix unmap_sg argument
  qede: fix general protection fault may occur on probe
  IB/mthca: Replace pci_pool_alloc by pci_pool_zalloc
  mlx5, calc_sq_size(): Make a debug message more informative
  mlx5: Remove a set-but-not-used variable
  mlx5: Use { } instead of { 0 } to init struct
  IB/srp: Make writing the add_target sysfs attr interruptible
  IB/srp: Make mapping failures easier to debug
  IB/srp: Make login failures easier to debug
  IB/srp: Introduce a local variable in srp_add_one()
  IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build
  IB/multicast: Check ib_find_pkey() return value
  IPoIB: Avoid reading an uninitialized member variable
  IB/mad: Fix an array index check
  ...
2016-12-15 12:03:32 -08:00
Doug Ledford 9032ad78bb Merge branches 'misc', 'qedr', 'reject-helpers', 'rxe' and 'srp' into merge-test 2016-12-14 14:44:47 -05:00
Doug Ledford 86ef0beaa0 Merge branch 'mlx' into merge-test 2016-12-14 14:44:25 -05:00
Bart Van Assche 4fa354c9db IB/srp: Make writing the add_target sysfs attr interruptible
Avoid that shutdown of srp_daemon is delayed if add_target_mutex is
held by another process.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 13:31:47 -05:00
Bart Van Assche 290081b453 IB/srp: Make mapping failures easier to debug
Make it easier to figure out what is going on if memory mapping
fails because more memory regions than mr_per_cmd are needed.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 13:31:37 -05:00
Bart Van Assche 3787d9908c IB/srp: Make login failures easier to debug
If login fails because memory region allocation failed it can be
hard to figure out what happened. Make it easier to figure out
why login failed by logging a message if ib_alloc_mr() fails.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 13:31:37 -05:00
Bart Van Assche 042dd765bd IB/srp: Introduce a local variable in srp_add_one()
This patch makes the srp_add_one() code more compact and does not
change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 13:31:37 -05:00
Bart Van Assche 1a1faf7a8a IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build
Avoid that the kernel build fails as follows if dynamic debug support
is disabled:

drivers/infiniband/ulp/srp/ib_srp.c:2272:3: error: implicit declaration of function 'DEFINE_DYNAMIC_DEBUG_METADATA'
drivers/infiniband/ulp/srp/ib_srp.c:2272:33: error: 'ddm' undeclared (first use in this function)
drivers/infiniband/ulp/srp/ib_srp.c:2275:39: error: '_DPRINTK_FLAGS_PRINT' undeclared (first use in this function)

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 13:31:37 -05:00
Bart Van Assche 11b642b84e IPoIB: Avoid reading an uninitialized member variable
This patch avoids that Coverity reports the following:

    Using uninitialized value port_attr.state when calling printk

Fixes: commit 94232d9ce8 ("IPoIB: Start multicast join process only on active ports")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 13:27:34 -05:00
Bart Van Assche 0d38c240f9 IB/srpt: Report login failures only once
Report the following message only once if no ACL has been configured
yet for an initiator port:

"Rejected login because no ACL has been configured yet for initiator %s.\n"

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagig@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 12:58:30 -05:00
Alexey Khoroshilov def4a6ffc9 IB/isert: do not ignore errors in dma_map_single()
There are several places, where errors in dma_map_single() are
ignored. The patch fixes them.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 12:51:31 -05:00
Steve Wise 1e38a366ee ib_isert: log the connection reject message
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 11:38:28 -05:00
Steve Wise 97540bb90a ib_iser: log the connection reject message
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 11:38:28 -05:00
Leon Romanovsky 74226649f4 IB/ipoib: Remove and fix debug prints after allocation failure
The prints after [k|v][m|z|c]alloc() functions are not needed,
because in case of failure, allocator will print their internal
error prints anyway.

Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-03 13:12:52 -05:00
Leon Romanovsky 93b80f2904 IB/isert: Remove and fix debug prints after allocation failure
The prints after [k|v][m|z|c]alloc() functions are not needed,
because in case of failure, allocator will print their internal
error prints anyway.

Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-03 13:12:52 -05:00
Kamal Heib 0b59970e7d IB/IPoIB: Remove can't use GFP_NOIO warning
Remove the warning print of "can't use of GFP_NOIO" to avoid prints in
each QP creation when devices aren't supporting IB_QP_CREATE_USE_GFP_NOIO.

This print become more annoying when the IPoIB interface is configured
to work in connected mode.

Fixes: 09b93088d7 ('IB: Add a QP creation flag to use GFP_NOIO allocations')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-16 20:04:48 -05:00
David S. Miller 27058af401 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Mostly simple overlapping changes.

For example, David Ahern's adjacency list revamp in 'net-next'
conflicted with an adjacency list traversal bug fix in 'net'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-30 12:42:58 -04:00
Jarod Wilson b3e3893e12 net: use core MTU range checking in misc drivers
firewire-net:
- set min/max_mtu
- remove fwnet_change_mtu

nes:
- set max_mtu
- clean up nes_netdev_change_mtu

xpnet:
- set min/max_mtu
- remove xpnet_dev_change_mtu

hippi:
- set min/max_mtu
- remove hippi_change_mtu

batman-adv:
- set max_mtu
- remove batadv_interface_change_mtu
- initialization is a little async, not 100% certain that max_mtu is set
  in the optimal place, don't have hardware to test with

rionet:
- set min/max_mtu
- remove rionet_change_mtu

slip:
- set min/max_mtu
- streamline sl_change_mtu

um/net_kern:
- remove pointless ndo_change_mtu

hsi/clients/ssi_protocol:
- use core MTU range checking
- remove now redundant ssip_pn_set_mtu

ipoib:
- set a default max MTU value
- Note: ipoib's actual max MTU can vary, depending on if the device is in
  connected mode or not, so we'll just set the max_mtu value to the max
  possible, and let the ndo_change_mtu function continue to validate any new
  MTU change requests with checks for CM or not. Note that ipoib has no
  min_mtu set, and thus, the network core's mtu > 0 check is the only lower
  bounds here.

mptlan:
- use net core MTU range checking
- remove now redundant mpt_lan_change_mtu

fddi:
- min_mtu = 21, max_mtu = 4470
- remove now redundant fddi_change_mtu (including export)

fjes:
- min_mtu = 8192, max_mtu = 65536
- The max_mtu value is actually one over IP_MAX_MTU here, but the idea is to
  get past the core net MTU range checks so fjes_change_mtu can validate a
  new MTU against what it supports (see fjes_support_mtu in fjes_hw.c)

hsr:
- min_mtu = 0 (calls ether_setup, max_mtu is 1500)

f_phonet:
- min_mtu = 6, max_mtu = 65541

u_ether:
- min_mtu = 14, max_mtu = 15412

phonet/pep-gprs:
- min_mtu = 576, max_mtu = 65530
- remove redundant gprs_set_mtu

CC: netdev@vger.kernel.org
CC: linux-rdma@vger.kernel.org
CC: Stefan Richter <stefanr@s5r6.in-berlin.de>
CC: Faisal Latif <faisal.latif@intel.com>
CC: linux-rdma@vger.kernel.org
CC: Cliff Whickman <cpw@sgi.com>
CC: Robin Holt <robinmholt@gmail.com>
CC: Jes Sorensen <jes@trained-monkey.org>
CC: Marek Lindner <mareklindner@neomailbox.ch>
CC: Simon Wunderlich <sw@simonwunderlich.de>
CC: Antonio Quartulli <a@unstable.cc>
CC: Sathya Prakash <sathya.prakash@broadcom.com>
CC: Chaitra P B <chaitra.basappa@broadcom.com>
CC: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>
CC: MPT-FusionLinux.pdl@broadcom.com
CC: Sebastian Reichel <sre@kernel.org>
CC: Felipe Balbi <balbi@kernel.org>
CC: Arvid Brodin <arvid.brodin@alten.se>
CC: Remi Denis-Courmont <courmisch@gmail.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:51:10 -04:00
David Ahern e0e79c8e74 IB/ipoib: Flip to new dev walk API
Convert ipoib_get_net_dev_match_addr to the new upper device walk API.
This is just a code conversion; no functional change is intended.

v2
- removed typecast of data

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18 11:44:59 -04:00
Paolo Abeni fc791b6335 IB/ipoib: move back IB LL address into the hard header
After the commit 9207f9d45b ("net: preserve IP control block
during GSO segmentation"), the GSO CB and the IPoIB CB conflict.
That destroy the IPoIB address information cached there,
causing a severe performance regression, as better described here:

http://marc.info/?l=linux-kernel&m=146787279825501&w=2

This change moves the data cached by the IPoIB driver from the
skb control lock into the IPoIB hard header, as done before
the commit 936d7de3d7 ("IPoIB: Stop lying about hard_header_len
and use skb->cb to stash LL addresses").
In order to avoid GRO issue, on packet reception, the IPoIB driver
stash into the skb a dummy pseudo header, so that the received
packets have actually a hard header matching the declared length.
To avoid changing the connected mode maximum mtu, the allocated
head buffer size is increased by the pseudo header length.

After this commit, IPoIB performances are back to pre-regression
value.

v2 -> v3: rebased
v1 -> v2: avoid changing the max mtu, increasing the head buf size

Fixes: 9207f9d45b ("net: preserve IP control block during GSO segmentation")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-14 10:54:53 -04:00
Linus Torvalds b9044ac829 Merge of primary rdma-core code for 4.9
- Updates to mlx5
 - Updates to mlx4 (two conflicts, both minor and easily resolved)
 - Updates to iw_cxgb4 (one conflict, not so obvious to resolve, proper
   resolution is to keep the code in cxgb4_main.c as it is in Linus'
   tree as attach_uld was refactored and moved into cxgb4_uld.c)
 - Improvements to uAPI (moved vendor specific API elements to uAPI area)
 - Add hns-roce driver and hns and hns-roce ACPI reset support
 - Conversion of all rdma code away from deprecated
   create_singlethread_workqueue
 - Security improvement: remove unsafe ib_get_dma_mr (breaks lustre in
   staging)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJX+AwSAAoJELgmozMOVy/d0WkQAKxPzVccMWwHv28iZI4ey13u
 JwE+VoCNpCAZAVuEgzK5zzFdNHPvAk2jU93H4apA7dfXJBXPatVuj9Lnk+ieEEnW
 tbFwJjBpbQ3Zol3+SPfAHnsVMbtax+xmd6WDKExPXXEDl1L6rutwL3KKfmgWEitg
 ysX7XOJCiSdyM0hcg4T6UPB9a3jGPff9NLu0oGamV+yoUk5Y0WGoVFxHZ4MKcw8t
 OkFBYIxGz4SGwq2tulStuH03HteURX594KngtrA8dyq6l1R2GlGRv+bkJAUEIWUv
 aA0ow3VWusOM6fT+jLXPCv8iUwIXM8tR/U6F7X+cmORUUtWvCl+uCUVid113j/aN
 BK+Af2nJnfoJ5cDBPsD+bC76l5gQycNZO/Qh8op2kmgJtD+6OpGM3cBXsHx53+kk
 0wloJ2lKCGShWxNj+ig8n8rR/rhhs/x3vV3ouCVWNMbOUgOSN3eYHxmK3wGFW4nd
 Qx+WYCjj9Yi/J6nmUDcfEQ4NWPR22Q2+0ENAabfhLhV6mDloAO5ILHd4GDqC3IA9
 UtxlVjf4ZonaiLnTQQzCnDMGVVk6tT8FJ9D42s0ScwjbdYwjyCW9/rs/g2EhcprR
 Cc+AmjqLviCWGtzBSFO0SijqQon8lcQOwdLw61CdFFvPa/mlLdf1rbx9ArIyNVKn
 JSrbr3CGyoqyYj6qaEO5
 =LC+S
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull main rdma updates from Doug Ledford:
 "This is the main pull request for the rdma stack this release.  The
  code has been through 0day and I had it tagged for linux-next testing
  for a couple days.

  Summary:

   - updates to mlx5

   - updates to mlx4 (two conflicts, both minor and easily resolved)

   - updates to iw_cxgb4 (one conflict, not so obvious to resolve,
     proper resolution is to keep the code in cxgb4_main.c as it is in
     Linus' tree as attach_uld was refactored and moved into
     cxgb4_uld.c)

   - improvements to uAPI (moved vendor specific API elements to uAPI
     area)

   - add hns-roce driver and hns and hns-roce ACPI reset support

   - conversion of all rdma code away from deprecated
     create_singlethread_workqueue

   - security improvement: remove unsafe ib_get_dma_mr (breaks lustre in
     staging)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (75 commits)
  staging/lustre: Disable InfiniBand support
  iw_cxgb4: add fast-path for small REG_MR operations
  cxgb4: advertise support for FR_NSMR_TPTE_WR
  IB/core: correctly handle rdma_rw_init_mrs() failure
  IB/srp: Fix infinite loop when FMR sg[0].offset != 0
  IB/srp: Remove an unused argument
  IB/core: Improve ib_map_mr_sg() documentation
  IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets
  IB/mthca: Move user vendor structures
  IB/nes: Move user vendor structures
  IB/ocrdma: Move user vendor structures
  IB/mlx4: Move user vendor structures
  IB/cxgb4: Move user vendor structures
  IB/cxgb3: Move user vendor structures
  IB/mlx5: Move and decouple user vendor structures
  IB/{core,hw}: Add constant for node_desc
  ipoib: Make ipoib_warn ratelimited
  IB/mlx4/alias_GUID: Remove deprecated create_singlethread_workqueue
  IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue
  IB/ipoib: Remove deprecated create_singlethread_workqueue
  ...
2016-10-09 17:04:33 -07:00
Bart Van Assche 681cc36083 IB/srp: Fix infinite loop when FMR sg[0].offset != 0
Avoid that mapping an sg-list in which the first element has a
non-zero offset triggers an infinite loop when using FMR. This
patch makes the FMR mapping code similar to that of ib_sg_to_pages().

Note: older Mellanox HCAs do not support non-zero offsets for FMR.
See also commit 8c4037b501 ("IB/srp: always avoid non-zero offsets
into an FMR").

Reported-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07 16:54:39 -04:00
Bart Van Assche 52bb8c626e IB/srp: Remove an unused argument
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07 16:54:39 -04:00
kernel@kyup.com 32f7451d1c ipoib: Make ipoib_warn ratelimited
In certain cases it's possible to be flooded by warning messages. To
cope with such situations make the ipoib_warn macro be ratelimited.
To prevent accidental limiting of legitimate, bursty messages make
the limit fairly liberal by allowing up to 100 messages in 10 seconds.

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07 16:54:33 -04:00
Bhaktipriya Shridhar b4541f6f88 IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces
deprecated create_singlethread_workqueue(). This is the identity
conversion.

The workqueue "wq" queues mulitple work items viz &priv->restart_task,
&priv->cm.rx_reap_task, &priv->cm.skb_task, &priv->neigh_reap_task,
&priv->ah_reap_task, &priv->mcast_task and &priv->carrier_on_task.
The work items require strict execution ordering.
Hence, an ordered dedicated workqueue has been used.

WQ_MEM_RECLAIM has been set to ensure forward progress under
memory pressure.

Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07 16:54:32 -04:00
Bhaktipriya Shridhar 855cda6828 IB/ipoib: Remove deprecated create_singlethread_workqueue
alloc_ordered_workqueue() replaces deprecated
create_singlethread_workqueue().

The workqueue "ipoib_workqueue" that is used for all flush operations
for the device.

WQ_MEM_RECLAIM has been set since the flush operations may need to
complete in order for other network functions to continue, and
the memory reclaim operation might need the network functioning in
order to make progress.

Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07 16:54:32 -04:00
Christoph Hellwig 5f071777f9 IB/srp: use IB_PD_UNSAFE_GLOBAL_RKEY
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23 13:47:44 -04:00
Christoph Hellwig 8e61212d05 IB/iser: use IB_PD_UNSAFE_GLOBAL_RKEY
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23 13:47:44 -04:00
Christoph Hellwig ed082d36a7 IB/core: add support to create a unsafe global rkey to ib_create_pd
Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or
less unchecked, this moves the capability of creating a global rkey into
the RDMA core, where it can be easily audited.  It also prints a warning
everytime this feature is used as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23 13:47:44 -04:00
Alex Vesker 344bacca8c IB/ipoib: Don't allow MC joins during light MC flush
This fix solves a race between light flush and on the fly joins.
Light flush doesn't set the device to down and unset IPOIB_OPER_UP
flag, this means that if while flushing we have a MC join in progress
and the QP was attached to BC MGID we can have a mismatches when
re-attaching a QP to the BC MGID.

The light flush would set the broadcast group to NULL causing an on
the fly join to rejoin and reattach to the BC MCG as well as adding
the BC MGID to the multicast list. The flush process would later on
remove the BC MGID and detach it from the QP. On the next flush
the BC MGID is present in the multicast list but not found when trying
to detach it because of the previous double attach and single detach.

[18332.714265] ------------[ cut here ]------------
[18332.717775] WARNING: CPU: 6 PID: 3767 at drivers/infiniband/core/verbs.c:280 ib_dealloc_pd+0xff/0x120 [ib_core]
...
[18332.775198] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[18332.779411]  0000000000000000 ffff8800b50dfbb0 ffffffff813fed47 0000000000000000
[18332.784960]  0000000000000000 ffff8800b50dfbf0 ffffffff8109add1 0000011832f58300
[18332.790547]  ffff880226a596c0 ffff880032482000 ffff880032482830 ffff880226a59280
[18332.796199] Call Trace:
[18332.798015]  [<ffffffff813fed47>] dump_stack+0x63/0x8c
[18332.801831]  [<ffffffff8109add1>] __warn+0xd1/0xf0
[18332.805403]  [<ffffffff8109aebd>] warn_slowpath_null+0x1d/0x20
[18332.809706]  [<ffffffffa025d90f>] ib_dealloc_pd+0xff/0x120 [ib_core]
[18332.814384]  [<ffffffffa04f3d7c>] ipoib_transport_dev_cleanup+0xfc/0x1d0 [ib_ipoib]
[18332.820031]  [<ffffffffa04ed648>] ipoib_ib_dev_cleanup+0x98/0x110 [ib_ipoib]
[18332.825220]  [<ffffffffa04e62c8>] ipoib_dev_cleanup+0x2d8/0x550 [ib_ipoib]
[18332.830290]  [<ffffffffa04e656f>] ipoib_uninit+0x2f/0x40 [ib_ipoib]
[18332.834911]  [<ffffffff81772a8a>] rollback_registered_many+0x1aa/0x2c0
[18332.839741]  [<ffffffff81772bd1>] rollback_registered+0x31/0x40
[18332.844091]  [<ffffffff81773b18>] unregister_netdevice_queue+0x48/0x80
[18332.848880]  [<ffffffffa04f489b>] ipoib_vlan_delete+0x1fb/0x290 [ib_ipoib]
[18332.853848]  [<ffffffffa04df1cd>] delete_child+0x7d/0xf0 [ib_ipoib]
[18332.858474]  [<ffffffff81520c08>] dev_attr_store+0x18/0x30
[18332.862510]  [<ffffffff8127fe4a>] sysfs_kf_write+0x3a/0x50
[18332.866349]  [<ffffffff8127f4e0>] kernfs_fop_write+0x120/0x170
[18332.870471]  [<ffffffff81207198>] __vfs_write+0x28/0xe0
[18332.874152]  [<ffffffff810e09bf>] ? percpu_down_read+0x1f/0x50
[18332.878274]  [<ffffffff81208062>] vfs_write+0xa2/0x1a0
[18332.881896]  [<ffffffff812093a6>] SyS_write+0x46/0xa0
[18332.885632]  [<ffffffff810039b7>] do_syscall_64+0x57/0xb0
[18332.889709]  [<ffffffff81883321>] entry_SYSCALL64_slow_path+0x25/0x25
[18332.894727] ---[ end trace 09ebbe31f831ef17 ]---

Fixes: ee1e2c82c2 ("IPoIB: Refresh paths instead of flushing them on SM change events")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16 14:14:08 -04:00
Erez Shitrit 546481c281 IB/ipoib: Fix memory corruption in ipoib cm mode connect flow
When a new CM connection is being requested, ipoib driver copies data
from the path pointer in the CM/tx object, the path object might be
invalid at the point and memory corruption will happened later when now
the CM driver will try using that data.

The next scenario demonstrates it:
	neigh_add_path --> ipoib_cm_create_tx -->
	queue_work (pointer to path is in the cm/tx struct)
	#while the work is still in the queue,
	#the port goes down and causes the ipoib_flush_paths:
	ipoib_flush_paths --> path_free --> kfree(path)
	#at this point the work scheduled starts.
	ipoib_cm_tx_start --> copy from the (invalid)path pointer:
	(memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);)
	 -> memory corruption.

To fix that the driver now starts the CM/tx connection only if that
specific path exists in the general paths database.
This check is protected with the relevant locks, and uses the gid from
the neigh member in the CM/tx object which is valid according to the ref
count that was taken by the CM/tx.

Fixes: 839fcaba35 ('IPoIB: Connected mode experimental support')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02 14:07:38 -04:00
Raju Rangoju 63b268d232 IB/isert: Properly release resources on DEVICE_REMOVAL
When the low level driver exercises the hot unplug they would call
rdma_cm cma_remove_one which would fire DEVICE_REMOVAL event to all cma
consumers. Now, if consumer doesn't make sure they destroy all IB
objects created on that IB device instance prior to finalizing all
processing of DEVICE_REMOVAL callback, rdma_cm will let the lld to
de-register with IB core and destroy the IB device instance. And if the
consumer calls (say) ib_dereg_mr(), it will crash since that dev object
is NULL.

In the current implementation, iser-target just initiates the cleanup
and returns from DEVICE_REMOVAL callback. This deferred work creates a
race between iser-target cleaning IB objects(say MR) and lld destroying
IB device instance.

This patch includes the following fixes
  -> make sure that consumer frees all IB objects associated with device
     instance
  -> return non-zero from the callback to destroy the rdma_cm id

Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02 13:46:32 -04:00
Doug Ledford 716b076ba4 IB/srpt: Update sport->port_guid with each port refresh
If port_guid is set with the default subnet_prefix, then we get a change
event and run a port refresh, we don't update the port_guid.  As a
result, attempts to create a target device that uses the new
subnet_prefix in the wwn will fail to find a match and be rejected by
the ib_srpt driver.  This makes it impossible to configure a port if it
was initialized with a default subnet_prefix and later changed to any
non-default subnet-prefix.  Updating the port refresh task to always
update the wwn based upon the current subnext_prefix solves this
problem.

Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: nab@linux-iscsi.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-24 16:51:16 -04:00
Wei Yongjun 1d5840c971 IB/isert: fix error return code in isert_alloc_login_buf()
Fix to return error code -ENOMEM from the alloc error handling
case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-22 14:26:55 -04:00
Linus Torvalds 84e39eeb08 Second round of merge items for 4.8
- hfi1 driver updates
 - Fix for max SGEs allowed via RDMA R/W API
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXoqUzAAoJELgmozMOVy/dNKAP/1/Rzn/k97eda1qFqzWpqsPl
 lMaxDiZZnRIAFJEqEF9Iwo1JLiFIzjpDJnqHB++CKuXZQT0NY6sHW0yrcyUwzsx7
 5gui92ldkVg4vY7PTco171vyzG+79KKRZ1dFS14z7oC8XAg48zQ7yJmfb1op3dEw
 mgxyoLaaMwMF5aLwPoWG4+aPkBMtKUGB/ARb4ehq6M2p71c43lb18GaarJuWLdAz
 1HxakXL/uzttyvGDyJGKDrT6ktXXSyvdCTRO60OrrPFJ67P2xRYXce85TLRr8srp
 Q5RNjyR5fP8uN0qtrQz+hl09mtBeBQHKomyFIOVwkB2r53OKqsR5g5roz3BlpA1X
 7PF/MO0pKy4t8XQnLfohEwtNWgszupvxkyAAISI8MwzLOPra/V8smQ9CpTltx1UB
 hTu3tpAMy1auAjh8TWzzzII1ZoRZz6YCTziWnTaC3bqAljufjt1mnvjrtNmQ1sNi
 MCLeA3yr8HjlKWdwYr+gVfhSR1wEoOxwHZdLsvBsxmC32hFLlh6rbg2x8wceqTlR
 4T8l0AERV1YPjsoSe3/pWVImKUA97qppIfeFcCZiBCBHBPlhpw3ebVt6B1mLVUCV
 hTMuZeFVcV75D+qr0kR5ZuVn4jgEn9zB1VH3tCV9LJnhBfySZFcP4yhATqiELaHG
 RVoVAiTBxq5RgNVOH4Zo
 =cQcp
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull second round of rdma updates from Doug Ledford:
 "This can be split out into just two categories:

   - fixes to the RDMA R/W API in regards to SG list length limits
     (about 5 patches)

   - fixes/features for the Intel hfi1 driver (everything else)

  The hfi1 driver is still being brought to full feature support by
  Intel, and they have a lot of people working on it, so that amounts to
  almost the entirety of this pull request"

* tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (84 commits)
  IB/hfi1: Add cache evict LRU list
  IB/hfi1: Fix memory leak during unexpected shutdown
  IB/hfi1: Remove unneeded mm argument in remove function
  IB/hfi1: Consistently call ops->remove outside spinlock
  IB/hfi1: Use evict mmu rb operation
  IB/hfi1: Add evict operation to the mmu rb handler
  IB/hfi1: Fix TID caching actions
  IB/hfi1: Make the cache handler own its rb tree root
  IB/hfi1: Make use of mm consistent
  IB/hfi1: Fix user SDMA racy user request claim
  IB/hfi1: Fix error condition that needs to clean up
  IB/hfi1: Release node on insert failure
  IB/hfi1: Validate SDMA user iovector count
  IB/hfi1: Validate SDMA user request index
  IB/hfi1: Use the same capability state for all shared contexts
  IB/hfi1: Prevent null pointer dereference
  IB/hfi1: Rename TID mmu_rb_* functions
  IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister()
  IB/hfi1: Restructure hfi1_file_open
  IB/hfi1: Make iovec loop index easy to understand
  ...
2016-08-04 20:26:31 -04:00
Linus Torvalds 0cda611386 Round one of 4.8 code
- Updates/fixes for iw_cxgb4 driver
 - Updates/fixes for mlx5 driver
 - Add flow steering and RSS API
 - Add hardware stats to mlx4 and mlx5 drivers
 - Add firmware version API for RDMA driver use
 - Add the rxe driver (this is a software RoCE driver that makes any
   Ethernet device a RoCE device)
 - Fixes for i40iw driver
 - Support for send only multicast joins in the cma layer
 - Other minor fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXo1vCAAoJELgmozMOVy/d0HcQAJqMi7siD9cSaMViYbu812pq
 3kNkHZbLNB/947uShDPhhFAWFXU0nRxEnTNSvYxRo+nxnDE/9hEEXpx8OzzKLNU+
 GXyDeHsEEriSFcaSne5Tak/QuiFm3PJv73ttXQROCtHG7KxLG9ieVbfusz42Xwiu
 5R21qfp6PZEOC+j7L/fTZh/kEN3cfaDYrGnCgmU3z0ka9xG5Qe2/+uWGNkuioRA5
 phFUR4MS+1n/VrnxPHrLXTrqv3sw8YfCfRImaXSBrxFVMqhno+cDDtEJQCRnmNrq
 7KcJO2KqDMl/QqsjxdwqojNpUTh2t7SeOeQuzUsfXl15yyyetq2Zu7ZurkCGjNtQ
 NtTt6hv5eXq3mNuBmOPKYDDgakSYyYjS0zueoi8wFFqIeSYxRJv4wx4xoeJ/Bsz8
 2LplpaPMQaTM65FhzYXGhYNBKaRkqjL9ihbIl1OcLNvfXAqLElfONM17/Yc/hgVw
 xfDtvNFrZcl7/exIpBBNOnxwbs4h78vvXsXoBiVoN7V/hBnMzDhkiBHNxNCfZXA0
 REGs/cnyy6cpiJOnVCWs77NqL75oK/qb1mEwe1M+A2kaxe/tLixUdYXo/zclDPm8
 3DLTL9lCgJIBIEiZT4q/alxLK+yUKD+SHtQT3lmF2Bfsmv/I38Uy55SXAiFO4yOq
 kwy96TvYtT43SkyNmmBf
 =oZOO
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull base rdma updates from Doug Ledford:
 "Round one of 4.8 code: while this is mostly normal, there is a new
  driver in here (the driver was hosted outside the kernel for several
  years and is actually a fairly mature and well coded driver).  It
  amounts to 13,000 of the 16,000 lines of added code in here.

  Summary:

   - Updates/fixes for iw_cxgb4 driver
   - Updates/fixes for mlx5 driver
   - Add flow steering and RSS API
   - Add hardware stats to mlx4 and mlx5 drivers
   - Add firmware version API for RDMA driver use
   - Add the rxe driver (this is a software RoCE driver that makes any
     Ethernet device a RoCE device)
   - Fixes for i40iw driver
   - Support for send only multicast joins in the cma layer
   - Other minor fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits)
  Soft RoCE driver
  IB/core: Support for CMA multicast join flags
  IB/sa: Add cached attribute containing SM information to SA port
  IB/uverbs: Fix race between uverbs_close and remove_one
  IB/mthca: Clean up error unwind flow in mthca_reset()
  IB/mthca: NULL arg to pci_dev_put is OK
  IB/hfi1: NULL arg to sc_return_credits is OK
  IB/mlx4: Add diagnostic hardware counters
  net/mlx4: Query performance and diagnostics counters
  net/mlx4: Add diagnostic counters capability bit
  Use smaller 512 byte messages for portmapper messages
  IB/ipoib: Report SG feature regardless of HW UD CSUM capability
  IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct
  IB/hfi1: Disable by default
  IB/rdmavt: Disable by default
  IB/mlx5: Fix port counter ID association to QP offset
  IB/mlx5: Fix iteration overrun in GSI qps
  i40iw: Add NULL check for puda buffer
  i40iw: Change dup_ack_thresh to u8
  i40iw: Remove unnecessary check for moving CQ head
  ...
2016-08-04 20:10:31 -04:00
Doug Ledford 7f1d25b47d Merge branches 'misc' and 'rxe' into k.o/for-4.8-1 2016-08-04 11:13:47 -04:00
Yuval Shaia 5faba54695 IB/ipoib: Report SG feature regardless of HW UD CSUM capability
Decouple SG support from HW ability to do UD checksum.
This coupling is for historical reasons and removed with 'commit
ec5f061564 ("net: Kill link between CSUM and SG features.")'

During driver load it is assumed that device does not supports SG. The
final decision is taken after creating UD QP based on device capability.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:32 -04:00
Bart Van Assche e6d66e3eb6 IB/isert: Remove an unused member variable
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Cc: Parav Pandit <pandit.parav@gmail.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 12:02:41 -04:00
Bart Van Assche 10fce586b2 IB/srpt: Simplify srpt_queue_response()
Initialize first_wr to &send_wr. This allows to remove a ternary
operator and an else branch. This patch does not change the behavior
of srpt_queue_response().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Cc: Parav Pandit <pandit.parav@gmail.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 12:02:41 -04:00
Bart Van Assche 30c6d8773d IB/srpt: Limit the number of SG elements per work request
Limit the number of SG elements per work request to what the HCA
and the queue pair support.

Fixes: 34693573fde0 ("IB/srpt: Reduce QP buffer size")
Reported-by: Parav Pandit <pandit.parav@gmail.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Parav Pandit <pandit.parav@gmail.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: <stable@vger.kernel.org> #v4.7+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 12:02:41 -04:00
Ira Weiny 1a8632121a IB/ipoib: Use new device FW version string
Using this allows for devices to specify the format of their
firmware version rather than forcing a format.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 12:08:34 -04:00
Bart Van Assche c0cf4512a3 IB/srpt: Reduce QP buffer size
The memory needed for the send and receive queues associated with
a QP is proportional to the max_sge parameter. The current value
of that parameter is such that with an mlx4 HCA the QP buffer size
is 8 MB. Since DMA is used for communication between HCA and CPU
that buffer either has to be allocated coherently or map_single()
must succeed for that buffer. Since large contiguous allocations
are fragile and since the maximum segment size for e.g. swiotlb
is 256 KB, reduce the max_sge parameter. This patch avoids that
the following text appears on the console after SRP logout and
relogin on a system equipped with multiple IB HCAs:

mlx4_core 0000:05:00.0: swiotlb buffer is full (sz: 8388608 bytes)
swiotlb: coherent allocation failed for device 0000:05:00.0 size=8388608
CPU: 11 PID: 148 Comm: kworker/11:1 Not tainted 4.7.0-rc4-dbg+ #1
Call Trace:
 [<ffffffff812c6d35>] dump_stack+0x67/0x92
 [<ffffffff812efe71>] swiotlb_alloc_coherent+0x141/0x150
 [<ffffffff810458be>] x86_swiotlb_alloc_coherent+0x3e/0x50
 [<ffffffffa03861fa>] mlx4_buf_direct_alloc.isra.5+0x9a/0x120 [mlx4_core]
 [<ffffffffa0386545>] mlx4_buf_alloc+0x165/0x1a0 [mlx4_core]
 [<ffffffffa035053d>] create_qp_common.isra.29+0x57d/0xff0 [mlx4_ib]
 [<ffffffffa03510da>] mlx4_ib_create_qp+0x12a/0x3f0 [mlx4_ib]
 [<ffffffffa031154a>] ib_create_qp+0x3a/0x250 [ib_core]
 [<ffffffffa055dd4b>] srpt_cm_handler+0x4bb/0xcad [ib_srpt]
 [<ffffffffa02c1ab0>] cm_process_work+0x20/0xf0 [ib_cm]
 [<ffffffffa02c3640>] cm_work_handler+0x1ac0/0x2059 [ib_cm]
 [<ffffffff810737ed>] process_one_work+0x19d/0x490
 [<ffffffff81073b29>] worker_thread+0x49/0x490
 [<ffffffff8107a0ea>] kthread+0xea/0x100
 [<ffffffff815b25af>] ret_from_fork+0x1f/0x40

Fixes: b99f8e4d7b ("IB/srpt: convert to the generic RDMA READ/WRITE API")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 12:04:09 -04:00
Erez Shitrit 61c78eea95 IB/IPoIB: Don't update neigh validity for unresolved entries
ipoib_neigh_get unconditionally updates the "alive" variable member on
any packet send.  This prevents the neighbor garbage collection from
cleaning out a dead neighbor entry if we are still queueing packets
for it.  If the queue for this neighbor is full, then don't update the
alive timestamp.  That way the neighbor can time out even if packets
are still being queued as long as none of them are being sent.

Fixes: b63b70d877 ("IPoIB: Use a private hash table for path lookup in xmit path")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07 10:49:48 -04:00
Mark Bloch 9b29953bf8 IB/IPoIB: Disable bottom half when dealing with device address
Align locking usage when touching device address with rest
of the kernel. Lock the bottom half when doing so using
netif_addr_lock_bh.

This also solves the following case as reported by lockdep:
	CPU0                    CPU1
	----                    ----
lock(_xmit_INFINIBAND);
				local_irq_disable();
				lock(&(&mc->mca_lock)->rlock);
				lock(_xmit_INFINIBAND);
<Interrupt>
lock(&(&mc->mca_lock)->rlock);

*** DEADLOCK ***

Fixes: 492a7e67ff ("IB/IPoIB: Allow setting the device address")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07 09:50:54 -04:00
Erez Shitrit 198b12f770 IB/IPoIB: Fix race between ipoib_remove_one to sysfs functions
In ipoib_remove_one the driver holds the rtnl_lock and tries to do some
operation like dev_change_flags or unregister_netdev, while sysfs
callback like ipoib_vlan_delete holds sysfs mutex and tries to hold the
rtnl_lock via rtnl_trylock() and restart_syscall() if the lock is not
free, meanwhile ipoib_remove_one tries to get the sysfs lock in order to
free its sysfs directory, and we will get  a->b, b->a deadlock.

    Trace like the following:

        schedule+0x37/0x80
        schedule_preempt_disabled+0xe/0x10
        __mutex_lock_slowpath+0xb5/0x120
        mutex_lock+0x23/0x40
        rtnl_lock+0x15/0x20
        netdev_run_todo+0x17c/0x320
        rtnl_unlock+0xe/0x10
        ipoib_vlan_delete+0x11b/0x1b0 [ib_ipoib]
        delete_child+0x54/0x80 [ib_ipoib]
        dev_attr_store+0x18/0x30
        sysfs_kf_write+0x37/0x40
        mutex_lock+0x16/0x40
        SyS_write+0x55/0xc0
        entry_SYSCALL_64_fastpath+0x16/0x75
    And
        schedule+0x37/0x80
        __kernfs_remove+0x1a8/0x260
        ? wake_atomic_t_function+0x60/0x60
        kernfs_remove+0x25/0x40
        sysfs_remove_dir+0x50/0x80
        kobject_del+0x18/0x50
        device_del+0x19f/0x260
        netdev_unregister_kobject+0x6a/0x80
        rollback_registered_many+0x1fd/0x340
        rollback_registered+0x3c/0x70
        unregister_netdevice_queue+0x55/0xc0
        unregister_netdev+0x20/0x30
        ipoib_remove_one+0x114/0x1b0 [ib_ipoib]
        ib_unregister_client+0x4a/0x170 [ib_core]
        ? find_module_all+0x71/0xa0
        ipoib_cleanup_module+0x10/0x94 [ib_ipoib]
        SyS_delete_module+0x1b5/0x210
        entry_SYSCALL_64_fastpath+0x16/0x75

The fix is by checking the flag IPOIB_FLAG_INTF_ON_DESTROY in order to
get out from the sysfs function.

Fixes: 862096a8bb ("IB/ipoib: Add more rtnl_link_ops callbacks")
Fixes: 9baa0b0364 ("IB/ipoib: Add rtnl_link_ops support")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07 09:50:53 -04:00
Bart Van Assche 9edba790fc IB/srp: Fix srp_map_sg_dma()
Because patch "IB/srp: Move common code into the caller" was applied
partially srp_map_sg_dma() doesn't work properly. Fix this by
applying the remainder of that patch. See also
http://thread.gmane.org/gmane.linux.drivers.rdma/35803/focus=35811.

Fixes: 3849e44d1c ("IB/srp: Move common code into the caller")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Sagi Grimberg <sai@grimberg.me>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-06 19:32:59 -04:00
Bart Van Assche 249f06561f IB/srp: Always initialize use_fast_reg and use_fmr
Avoid that mapping fails due to use_fast_reg != 0 or use_fmr != 0
if both member variables should be zero (if never_register == 1 or
if neither FMR nor FR is supported). Remove an initialization that
became superfluous due to changing a kmalloc() into a kzalloc()
call.

Fixes: 509c5f33f4 ("IB/srp: Prevent mapping failures")
Cc: Sagi Grimberg <sai@grimberg.m>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-06 19:32:58 -04:00
Linus Torvalds 9ba55cf7cf Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Here are the outstanding target pending updates for v4.7-rc1.

  The highlights this round include:

   - Allow external PR/ALUA metadata path be defined at runtime via top
     level configfs attribute (Lee)
   - Fix target session shutdown bug for ib_srpt multi-channel (hch)
   - Make TFO close_session() and shutdown_session() optional (hch)
   - Drop se_sess->sess_kref + convert tcm_qla2xxx to internal kref
     (hch)
   - Add tcm_qla2xxx endpoint attribute for basic FC jammer (Laurence)
   - Refactor iscsi-target RX/TX PDU encode/decode into common code
     (Varun)
   - Extend iscsit_transport with xmit_pdu, release_cmd, get_rx_pdu,
     validate_parameters, and get_r2t_ttt for generic ISO offload
     (Varun)
   - Initial merge of cxgb iscsi-segment offload target driver (Varun)

  The bulk of the changes are Chelsio's new driver, along with a number
  of iscsi-target common code improvements made by Varun + Co along the
  way"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (29 commits)
  iscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race
  cxgbit: Use type ISCSI_CXGBIT + cxgbit tpg_np attribute
  iscsi-target: Convert transport drivers to signal rdma_shutdown
  iscsi-target: Make iscsi_tpg_np driver show/store use generic code
  tcm_qla2xxx Add SCSI command jammer/discard capability
  iscsi-target: graceful disconnect on invalid mapping to iovec
  target: need_to_release is always false, remove redundant check and kfree
  target: remove sess_kref and ->shutdown_session
  iscsi-target: remove usage of ->shutdown_session
  tcm_qla2xxx: introduce a private sess_kref
  target: make close_session optional
  target: make ->shutdown_session optional
  target: remove acl_stop
  target: consolidate and fix session shutdown
  cxgbit: add files for cxgbit.ko
  iscsi-target: export symbols
  iscsi-target: call complete on conn_logout_comp
  iscsi-target: clear tx_thread_active
  iscsi-target: add new offload transport type
  iscsi-target: use conn_transport->transport_type in text rsp
  ...
2016-05-28 12:04:17 -07:00
Linus Torvalds 1cbe06c3cf Round two of 4.7 merge window patches
- Dynamic counter infrastructure in the IB drivers
   This is a sysfs based code to allow free form access to the hardware
   counters RDMA devices might support so drivers don't need to code
   this up repeatedly themselves
 - SendOnlyFullMember multicast support
 - IB router support
 - A couple misc fixes
 - The big item on the list: hfi1 driver updates, plus moving the hfi1
   driver out of staging
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXRyupAAoJELgmozMOVy/deHAQAJp87yn8hntHXh81Kjrvezch
 McJ0v7FTsfsLVe/Y+YAbaAREjanhnUDzZ36WUqS9rHadOl+Ia0fD+M53aTk9hDKS
 R9/oWQDsUb8k7imdhNrSS72Q2kbwr++8RqBoEEFcl4fuDCVsrRhUaf+perJTEnoK
 I9APDQCNU3FCTPFRllK5al6gi5UGEW9vO9JQZb+VnTVW9+LjlYobzcZfttAw8oRN
 rtQLP3xGb7ZwV0ngH0dt2TthN2ih97TavnISojTdJ5I+WHFxLV0YbHSK/E2JLdoi
 hVknhrflPIKr+mRVv97yCCz92kGAdTKG3sT9vNUVUQ0Y8kEdrW/LrnrmxsBEYzkl
 GcQeqg7//ffOlQ0dGZZjlkPg/8yMPo9CBsfPDcOvXeenK0lkao9qg2yrIz+CBC/9
 YlpUKfL0JLRal9O6y3Mh++ZQ9R1kvQN8/CgHLUlRQdWLfxf7VPA4rIcaWT7iGfvg
 z6jIJKjsK+hqKhY9oKh3YfNmXh65IQ5OF3+tBzHKwNiGXNSsMYAVYP3vc/HYTJvz
 4EpsenndKb5l0AOvDch8rMmw7YZPf2NOdi90TW6Bq1VgTIDRxh0WzR2QTnWBVRSf
 XAcC3DLXTHqC/UUpHrdXgfvU/0nQXaVUvPWIAM/t3hy91GLscByDw4973+tISs/B
 7AHSptbVQaBUpTjWHfKf
 =5+cf
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull more rdma updates from Doug Ledford:
 "This is the second group of code for the 4.7 merge window.  It looks
  large, but only in one sense.  I'll get to that in a minute.  The list
  of changes here breaks down as follows:

   - Dynamic counter infrastructure in the IB drivers

     This is a sysfs based code to allow free form access to the
     hardware counters RDMA devices might support so drivers don't need
     to code this up repeatedly themselves

   - SendOnlyFullMember multicast support

   - IB router support

   - A couple misc fixes

   - The big item on the list: hfi1 driver updates, plus moving the hfi1
     driver out of staging

  There was a group of 15 patches in the hfi1 list that I thought I had
  in the first pull request but they weren't.  So that added to the
  length of the hfi1 section here.

  As far as these go, everything but the hfi1 is pretty straight
  forward.

  The hfi1 is, if you recall, the driver that Al had complaints about
  how it used the write/writev interfaces in an overloaded fashion.  The
  write portion of their interface behaved like the write handler in the
  IB stack proper and did bi-directional communications.  The writev
  interface, on the other hand, only accepts SDMA request structures.
  The completions for those structures are sent back via an entirely
  different event mechanism.

  With the security patch, we put security checks on the write
  interface, however, we also knew they would be going away soon.  Now,
  we've converted the write handler in the hfi1 driver to use ioctls
  from the IB reserved magic area for its bidirectional communications.
  With that change, Intel has addressed all of the items originally on
  their TODO when they went into staging (as well as many items added to
  the list later).

  As such, I moved them out, and since they were the last item in the
  staging/rdma directory, and I don't have immediate plans to use the
  staging area again, I removed the staging/rdma area.

  Because of the move out of staging, as well as a series of 5 patches
  in the hfi1 driver that removed code people thought should be done in
  a different way and was optional to begin with (a snoop debug
  interface, an eeprom driver for an eeprom connected directory to their
  hfi1 chip and not via an i2c bus, and a few other things like that),
  the line count, especially the removal count, is high"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (56 commits)
  staging/rdma: Remove the entire rdma subdirectory of staging
  IB/core: Make device counter infrastructure dynamic
  IB/hfi1: Fix pio map initialization
  IB/hfi1: Correct 8051 link parameter settings
  IB/hfi1: Update pkey table properly after link down or FM start
  IB/rdamvt: Fix rdmavt s_ack_queue sizing
  IB/rdmavt: Max atomic value should be a u8
  IB/hfi1: Fix hard lockup due to not using save/restore spin lock
  IB/hfi1: Add tracing support for send with invalidate opcode
  IB/hfi1, qib: Add ieth to the packet header definitions
  IB/hfi1: Move driver out of staging
  IB/hfi1: Do not free hfi1 cdev parent structure early
  IB/hfi1: Add trace message in user IOCTL handling
  IB/hfi1: Remove write(), use ioctl() for user cmds
  IB/hfi1: Add ioctl() interface for user commands
  IB/hfi1: Remove unused user command
  IB/hfi1: Remove snoop/diag interface
  IB/hfi1: Remove EPROM functionality from data device
  IB/hfi1: Remove UI char device
  IB/hfi1: Remove multiple device cdev
  ...
2016-05-28 11:04:16 -07:00
Mark Bloch 492a7e67ff IB/IPoIB: Allow setting the device address
In IB networks, and specifically in IPoIB/rdmacm traffic, the device
address of an IPoIB interface is used as a means to exchange information
between nodes needed for communication.

Currently an IPoIB interface will always be created with a device
address based on its node GUID without a way to change that.

This change adds the ability to set the device address of an IPoIB
interface by value. We use the set mac address ndo to do that.

The flow should be broken down to two:
1) The GID value is already in the GID table,
   in this case the interface will be able to set carrier up.

2) The GID value is not yet in the GID table,
   in this case the interface won't try to join the multicast group
   and will wait (listen on GID_CHANGE event) until the GID is inserted.

In order to track those changes, we add a new flag:
* IPOIB_FLAG_DEV_ADDR_SET.

When set, it means the dev_addr is a based on a value in the gid
table. this bit will be cleared upon a dev_addr change triggered
by the user and set after validation.

Per IB spec the port GUID can't change if the module is loaded.
port GUID is the basis for GID at index 0 which is the basis for
the default device address of a ipoib interface.

The issue is that there are devices that don't follow the spec,
they change the port GUID while HCA is powered on, so in order
not to break userspace applications. We need to check if the
user wanted to control the device address and we assume that
if he sets the device address back to be based on GID index 0,
he no longer wishs to control it.

In order to track this, we add an additional flag:
* IPOIB_FLAG_DEV_ADDR_CTRL

When setting the device address, there is no validation of the upper
twelve bytes of the device address (flags, qpn, subnet prefix) as those
bytes are not under the control of the user.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-25 15:39:03 -04:00
Erez Shitrit 3b56113016 IB/ipoib: Support SendOnlyFullMember MCG for SendOnly join
Check (via an SA query) if the SM supports the new option for SendOnly
multicast joins.
If the SM supports that option it will use the new join state to create
such multicast group.
If SendOnlyFullMember is supported, we wouldn't use faked FullMember state
join for SendOnly MCG, use the correct state if supported.

This check is performed at every invocation of mcast_restart task, to be
sure that the driver stays in sync with the current state of the SM.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-25 15:39:03 -04:00
Erez Shitrit 507f6afa3a IB/core: Introduce capabilitymask2 field in ClassPortInfo mad
Change struct ib_class_port_info to conform to IB Spec 1.3
That in order to get specific capability mask from ClassPortInfo mad.

>From the IB Spec, ClassPortInfo section:
        "CapabilityMask2 Bits 0-26: Additional class-specific capabilities...
         RespTimeValue the rest 5 bits"

The new struct now has one field for capabilitymask2 (previously was the
reserved field) and the resp_time field.

And it fixes up qib and srpt, use of the field repurposed to be used as
capabilitymask2:
IB/qib: Change pma_get_classportinfo
IB/srpt: Adjust the use of ib_class_port_info

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-25 15:39:02 -04:00
Linus Torvalds 76b584d312 Primary 4.7 merge window changes
- Updates to the new Intel X722 iWARP driver
 - Updates to the hfi1 driver
 - Fixes for the iw_cxgb4 driver
 - Misc core fixes
 - Generic RDMA READ/WRITE API addition
 - SRP updates
 - Misc ipoib updates
 - Minor mlx5 updates
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXPIAnAAoJELgmozMOVy/dEYUP/0A6NH5ptzUrwPuLrWoz8h+e
 KBfJE7H09mfKBx0Rq8YmnU+pz4lk8vrMLLaqpbGN57mwO0a1lK9bgc3E6KUhQPhc
 dpGEX/NG1+aILomD7M4l1yAKkG17kxFLD75cLCeaxhO76jBRWsunukqk5mT/u0EG
 fUYZs1fRb9t2LDtTNhPfSFR1+dgP5S17xLhpl9ttn87hTmIuiGWR6ig2nTC7azZD
 G0d7RVjohfY2sDD28YgQiUEJ+q+1ymp3XTaCZhPCVl9VCRPweEdtLKcbNWZIvClx
 ewuTCgADXg8tAL/6zu6bEKqZlC17UrmVJee3csKQLT09PJkSEICeFR/ld6ONUKAF
 nDhi3ySa75Xxg9VDcCnuRkXKK+/zi7oDelZuh9mvMG0JJqPK9rTZDD29j2kBf7C0
 bdx4R5cI4KJWQ/GlCyi/nLiuYkmAiCugzcGnRho4ub+EJ0yX1w6n8KVYr37kFsFu
 q6MCnEfArEgDpbq1wo0+9MWtqBYrnOI/XtG81Zd+6X2MW975qU85wUdUSjg6OOb1
 v1osyAmFDy9A0Y80yY+l1HHrSVIvI0IAWZDfxsbCLQY8O03ZNcvxE2RsrzWd5CKL
 iZsX24tjV0WR9+lORHLfAKB3DL9CcfHv/tHo7q+5iAHmIuWZGrEN22ELkwS/4X7x
 d/V0XDjzs6lgQeTJ7R4B
 =e+Zv
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "Primary 4.7 merge window changes

   - Updates to the new Intel X722 iWARP driver
   - Updates to the hfi1 driver
   - Fixes for the iw_cxgb4 driver
   - Misc core fixes
   - Generic RDMA READ/WRITE API addition
   - SRP updates
   - Misc ipoib updates
   - Minor mlx5 updates"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (148 commits)
  IB/mlx5: Fire the CQ completion handler from tasklet
  net/mlx5_core: Use tasklet for user-space CQ completion events
  IB/core: Do not require CAP_NET_ADMIN for packet sniffing
  IB/mlx4: Fix unaligned access in send_reply_to_slave
  IB/mlx5: Report Scatter FCS device capability when supported
  IB/mlx5: Add Scatter FCS support for Raw Packet QP
  IB/core: Add Scatter FCS create flag
  IB/core: Add Raw Scatter FCS device capability
  IB/core: Add extended device capability flags
  i40iw: pass hw_stats by reference rather than by value
  i40iw: Remove unnecessary synchronize_irq() before free_irq()
  i40iw: constify i40iw_vf_cqp_ops structure
  IB/mlx5: Add UARs write-combining and non-cached mapping
  IB/mlx5: Allow mapping the free running counter on PROT_EXEC
  IB/mlx4: Use list_for_each_entry_safe
  IB/SA: Use correct free function
  IB/core: Fix a potential array overrun in CMA and SA agent
  IB/core: Remove unnecessary check in ibnl_rcv_msg
  IB/IWPM: Fix a potential skb leak
  RDMA/nes: replace custom print_hex_dump()
  ...
2016-05-20 14:35:07 -07:00
Linus Torvalds 675e0655c1 SCSI misc on 20160517
This patch includes the usual quota of driver updates (bnx2fc, mp3sas,
 hpsa, ncr5380, lpfc, hisi_sas, snic, aacraid, megaraid_sas) there's
 also a multiqueue update for scsi_debug, assorted bug fixes and a few
 other minor updates (refactor of scsi_sg_pools into generic code, alua
 and VPD updates, and struct timeval conversions).
 
 Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJXO8W0AAoJEDeqqVYsXL0MW24H/jGWwfjsDUiSsLwbLca6DWu8
 ZCWZ7rSZ27CApwGPgZGpLvUg+vpW8Ykm2zdeBnlZ6ScXS+dT3uo/PHsnemsTextj
 6glQNIOFY0Ja2GwkkN00M6IZQhTJ628cqJKIEJxC68lIw16wiOwjZaK68GMrusDO
 Sl062rkuLR6Jb2T+YoT/sD8jQfWlSj2V9e9rqJoS/rIbS6B+hUipuybz2yQ2yK2u
 XFc30yal9oVz1fHEoh2O8aqckW3/iskukVXVuZ0MQzT/lV/bm9I6AnWVHw7d0Yhp
 ZELjXpjx5M2Z/d8k0Wvx1e25oL/ERwa96yLnTvRcqyF5Yt1EgAhT+jKvo4pnGr8=
 =L6y/
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "First round of SCSI updates for the 4.6+ merge window.

  This batch includes the usual quota of driver updates (bnx2fc, mp3sas,
  hpsa, ncr5380, lpfc, hisi_sas, snic, aacraid, megaraid_sas).  There's
  also a multiqueue update for scsi_debug, assorted bug fixes and a few
  other minor updates (refactor of scsi_sg_pools into generic code, alua
  and VPD updates, and struct timeval conversions)"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (138 commits)
  mpt3sas: Used "synchronize_irq()"API to synchronize timed-out IO & TMs
  mpt3sas: Set maximum transfer length per IO to 4MB for VDs
  mpt3sas: Updating mpt3sas driver version to 13.100.00.00
  mpt3sas: Fix initial Reference tag field for 4K PI drives.
  mpt3sas: Handle active cable exception event
  mpt3sas: Update MPI header to 2.00.42
  Revert "lpfc: Delete unnecessary checks before the function call mempool_destroy"
  eata_pio: missing break statement
  hpsa: Fix type ZBC conditional checks
  scsi_lib: Decode T10 vendor IDs
  scsi_dh_alua: do not fail for unknown VPD identification
  scsi_debug: use locally assigned naa
  scsi_debug: uuid for lu name
  scsi_debug: vpd and mode page work
  scsi_debug: add multiple queue support
  bfa: fix bfa_fcb_itnim_alloc() error handling
  megaraid_sas: Downgrade two success messages to info
  cxlflash: Fix to resolve dead-lock during EEH recovery
  scsi_debug: rework resp_report_luns
  scsi_debug: use pdt constants
  ...
2016-05-18 16:38:59 -07:00
Nicholas Bellinger bd027d856d iscsi-target: Convert transport drivers to signal rdma_shutdown
Instead of special casing the handful of callers that check for
iser-target rdma verbs specific shutdown, use a simple flag at
iscsit_transport->rdma_shutdown so each driver can signal this.

Also, update iscsi-target/tcp + cxgbit to rdma_shutdown = false.

Cc: Varun Prakash <varun@chelsio.com>
Cc: Hariprasad Shenai <hariprasad@chelsio.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2016-05-16 22:23:33 -07:00
Doug Ledford 0651ec932a Merge branches 'cxgb4-2', 'i40iw-2', 'ipoib', 'misc-4.7' and 'mlx5-fcs' into k.o/for-4.7 2016-05-13 19:40:38 -04:00
Bart Van Assche 54f5c9c52d IB/srp: Fix a debug kernel crash
Avoid that the following BUG() is triggered against a debug
kernel:

kernel BUG at include/linux/scatterlist.h:92!
RIP: 0010:[<ffffffffa0467199>]  [<ffffffffa0467199>] srp_map_idb+0x199/0x1a0 [ib_srp]
Call Trace:
 [<ffffffffa04685fa>] srp_map_data+0x84a/0x890 [ib_srp]
 [<ffffffffa0469674>] srp_queuecommand+0x1e4/0x610 [ib_srp]
 [<ffffffff813f5a5e>] scsi_dispatch_cmd+0x9e/0x180
 [<ffffffff813f8b07>] scsi_request_fn+0x477/0x610
 [<ffffffff81298ffe>] __blk_run_queue+0x2e/0x40
 [<ffffffff81299070>] blk_delay_work+0x20/0x30
 [<ffffffff81071f07>] process_one_work+0x197/0x480
 [<ffffffff81072239>] worker_thread+0x49/0x490
 [<ffffffff810787ea>] kthread+0xea/0x100
 [<ffffffff8159b632>] ret_from_fork+0x22/0x40

Fixes: f7f7aab1a5 ("IB/srp: Convert to new registration API")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org> # v4.4+
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:39:59 -04:00
Hans Westgaard Ry e3614bc9dc IB/ipoib: Add readout of statistics using ethtool
IPoIB collects statistics of traffic including number of packets
sent/received, number of bytes transferred, and certain errors. This
patch makes these statistics available to be queried by ethtool.

Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Tested-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:39:43 -04:00
Sebastian Andrzej Siewior 01690e9c70 infiniband/ulp/ipoib: remove pkey_mutex
The last user of pkey_mutex was removed in db84f88037 ("IB/ipoib: Use
P_Key change event instead of P_Key polling mechanism") but the lock
remained.
This patch removes it.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:39:43 -04:00
Bart Van Assche c222a39f0d IB/srp: Do not register memory if never_register has been set
This makes it easier to test the code path that does not use
memory registration (srp_map_sg_dma()).

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 13:47:07 -04:00
Bart Van Assche 509c5f33f4 IB/srp: Prevent mapping failures
If both max_sectors and the queue_depth are high enough it can
happen that the MR pool is depleted temporarily. This causes
the SRP initiator to report mapping failures. Although the SRP
initiator recovers from such mapping failures, prevent that
this can happen by allocating more memory regions.

Additionally, only enable memory registration if at least two
pages can be registered per memory region.

Reported-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 13:46:51 -04:00
Bart Van Assche 835ee624c9 IB/srp: Swap two code blocks in srp_add_one()
This patch does not change any functionality but makes the next
patch in this series easier to read.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 13:38:00 -04:00
Bart Van Assche 9aa8b3217e IB/core: Enhance ib_map_mr_sg()
The SRP initiator allows to set max_sectors to a value that exceeds
the largest amount of data that can be mapped at once with an mlx4
HCA using fast registration and a page size of 4 KB. Hence modify
ib_map_mr_sg() such that it can map partial sg-elements. If an
sg-element has been mapped partially, let the caller know
which fraction has been mapped by adjusting *sg_offset.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 13:37:57 -04:00
Bart Van Assche f83b2561a6 IB/srp: Fix srp_create_target() error handling
Avoid that the following kernel oops occurs if memory pool
allocation fails:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa048d0a0>] ib_drain_rq+0x0/0x20 [ib_core]
Call Trace:
 [<ffffffffa04af386>] srp_create_target+0xca6/0x13a9 [ib_srp]
 [<ffffffff813cc863>] dev_attr_store+0x13/0x20
 [<ffffffff81214b50>] sysfs_kf_write+0x40/0x50
 [<ffffffff81213f1c>] kernfs_fop_write+0x13c/0x180
 [<ffffffff81197683>] __vfs_write+0x23/0xf0
 [<ffffffff81198744>] vfs_write+0xa4/0x1a0
 [<ffffffff81199a44>] SyS_write+0x44/0xa0
 [<ffffffff8159e3e9>] entry_SYSCALL_64_fastpath+0x1c/0xac

Fixes: 1dc7b1f10d ("IB/srp: use the new CQ API")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: <stable@vger.kernel.org> # v4.5+
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 13:37:21 -04:00
Bart Van Assche 9d8e7d0dac IB/srp: Fix a memory descriptor leak in an error path
If an error occurs after srp_fr_pool_get() succeeded and before the
descriptor is stored in srp_map_state (*state->fr.next++ = desc)
then srp_unmap_data() won't free the newly allocated memory
descriptor. Hence free the descriptor explicitly.

Fixes: f7f7aab1a5 ("IB/srp: Convert to new registration API")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Cc: Sagi Grimberg <sai@grimberg.me>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 13:37:21 -04:00
Bart Van Assche cf1acab7d7 IB/srp: Print "ib_srp: " prefix once
pr_debug() already prints prefix PFX. Avoid that PFX is printed
twice if the debug statement in srp_add_target() is enabled.

Fixes: 34aa654ecb ("IB/srp: Avoid that I/O hangs due to a cable pull during LUN scanning")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 13:37:21 -04:00