1
0
Fork 0
Commit Graph

9686 Commits (32c2d40252454e539b91d37776869ed69c23b2e8)

Author SHA1 Message Date
Moni Shoua 08100fad5c IB/mlx5: Add ODP SRQ support
Add changes to the WQE page-fault handler to

1. Identify that the event is for a SRQ WQE
2. Pass SRQ object instead of a QP to the function that reads the WQE
3. Parse the SRQ WQE with respect to its structure

The rest is handled as for regular RQ WQE.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04 14:34:07 -07:00
Moni Shoua fbeb4075c6 IB/mlx5: Let read user wqe also from SRQ buffer
Reading a WQE from SRQ is almost identical to reading from regular RQ.
The differences are the size of the queue, the size of a WQE and buffer
location.

Make necessary changes to mlx5_ib_read_user_wqe() to let it read a WQE
from a SRQ or RQ by caller choice.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04 14:34:07 -07:00
Moni Shoua 29917f4750 IB/mlx5: Add XRC initiator ODP support
Skip XRC segment in the beginning of a send WQE and fetch ODP XRC
capabilities when QP type is IB_QPT_XRC_INI. The rest of the handling is
the same as in RC QP.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04 14:34:07 -07:00
Moni Shoua 6ff7414a17 IB/mlx5: Clean mlx5_ib_mr_responder_pfault_handler() signature
In the function mlx5_ib_mr_responder_pfault_handler()

1. The parameter wqe is used as read-only so there is no need to pass it
   by reference.
2. Remove the unused argument pfault from list of arguments.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04 14:34:06 -07:00
Moni Shoua 586f4e95c7 IB/mlx5: Remove useless check in ODP handler
When handling an ODP event for a receive WQE in SRQ the target QP is
unknown. Therefore, it is wrong to ask if QP has a SRQ in the page-fault
handler.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04 14:34:06 -07:00
Moni Shoua 52a72e2a39 IB/uverbs: Expose XRC ODP device capabilities
Expose XRC ODP capabilities as part of the extended device capabilities.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04 14:34:06 -07:00
Moni Shoua 10f56242e3 IB/mlx5: Fix the locking of SRQ objects in ODP events
QP and SRQ objects are stored in different containers so the action to get
and lock a common resource during ODP event needs to address that.

While here get rid of 'refcount' and 'free' fields in mlx5_core_srq struct
and use the fields with same semantics in common structure.

Fixes: 032080ab43 ("IB/mlx5: Lock QP during page fault handling")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04 14:34:06 -07:00
YueHaibing da91ddfdc7 RDMA/hns: Remove set but not used variable 'rst'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/infiniband/hw/hns/hns_roce_hw_v2.c: In function 'hns_roce_v2_qp_flow_control_init':
drivers/infiniband/hw/hns/hns_roce_hw_v2.c:4384:33: warning:
 variable 'rst' set but not used [-Wunused-but-set-variable]

It never used since introduction.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-31 15:41:07 -07:00
Kaike Wan a131d16460 IB/hfi1: Add static trace for OPFN
This patch adds the static trace to the OPFN code and moves tid related
static trace code into a new header file.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-01-31 11:37:40 -05:00
Kaike Wan 48a615dc00 IB/hfi1: Integrate OPFN into RC transactions
OPFN parameter negotiation allows a pair of connected RC QPs to exchange
a set of parameters in succession. This negotiation does not commence
till the first ULP request. Because OPFN operations are operations
private to the driver, they do not generate user completions or put the
QP into error when they run out of retries. This patch integrates the
OPFN protocol into the transactions of an RC QP.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-01-31 11:37:34 -05:00
Kaike Wan ddf922c31f IB/hfi1, IB/rdmavt: Allow for extending of QP's s_ack_queue
The OPFN protocol uses the COMPARE_SWAP request to exchange data
between the requester and the responder and therefore needs to
be stored in the QP's s_ack_queue when the request is received
on the responder side. However, because the user does not know
anything about the OPFN protocol, this extra entry in the
queue cannot be advertised to the user. This patch adds an extra
entry in a QP's s_ack_queue.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-01-31 11:36:05 -05:00
Kaike Wan f01b4d5a43 IB/hfi1: OPFN interface
OPFN allows a pair of connected RC QPs to exchange a set of parameters
in succession. The parameter exchange itself is done using the IB compare
and swap request with a special virtual address. The request is triggered
using a reserved IB work request opcode. This patch implements the OPFN
interface to initialize, start, process, and reset the OPFN request.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-01-31 11:36:05 -05:00
Kaike Wan d22a207d74 IB/hfi1: Add OPFN helper functions for TID RDMA feature
This patch adds the OPFN helper functions to initialize, encode, decode,
and reset OPFN parameters for the TID RDMA feature.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-01-31 11:36:05 -05:00
Mitko Haralanov 44e43d91ad IB/hfi1: OPFN support discovery
OPFN (Omni Path Feature Negotiation) support discovery allows a RC QP to
announce that it supports OPFN and also discover if OPFN is supported by
the peer QP. OPFN parameter negotiation is skipped unless OPFN support is
first discovered. OPFN support is announced by claiming what was
the reserved bit in dword 1 of OmniPath modified base transport header
in requests and responses.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-01-31 11:36:04 -05:00
Leon Romanovsky 02da375097 RDMA/core: Use the ops infrastructure to keep all callbacks in one place
As preparation to hide rdma_restrack_root, refactor the code to use the
ops structure instead of a special callback which is hidden in
rdma_restrack_root.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 21:34:21 -07:00
Leon Romanovsky 5e458d3f89 RDMA/restrack: Refactor user/kernel restrack additions
Since we already know if we are user/kernel before calling restrack_add,
move type dependent code into the callers to make the flow more readable.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 21:20:23 -07:00
Leon Romanovsky 0ad699c0ed RDMA/core: Simplify restrack interface
In the current implementation, we have one restrack root per-device and
all users are simply providing it directly. Let's simplify the interface
and have callers provide the ib_device and internally access the
restrack_root.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 21:15:47 -07:00
Leon Romanovsky 659067b0b5 RDMA/nldev: Prepare CAP_NET_ADMIN checks for .doit callbacks
The .doit callbacks don't have a netlink_callback to check capabilities so
in order to use the same fill_res_func for both .dump and .doit, we need
to do the capability check outside of those functions.

For .doit callbacks, it is possible to check CAP_NET_ADMIN directly on the
received sk_buff.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 21:12:33 -07:00
Leon Romanovsky 8be565e65f RDMA/nldev: Factor out the PID namespace check
The PID namespace is going to be used in the .doit callback, so generalize
its implementation.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 21:11:45 -07:00
Leon Romanovsky f732e7135b RDMA/nldev: Dynamically generate restrack dumpit callbacks
There is no need to manually write same callbacks, automatically generate
them using C-macro language.

This macro is going to be extended to generate doit callbacks too, so use
general name for this macro.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 21:10:21 -07:00
Parav Pandit cf34e1fe52 IB/mlx5: Consider vlan of lower netdev for macvlan GID entries
When a given netdev of the GID entry is macvlan netdevice, and if the
lower netdevice is vlan device, GID entry for macvlan based IP address
needs to inherit the vlan of the lower netdevice.

Therefore, attempt to find out if the lower device exist and if so, if
it is vlan device and setup the vlan tag correctly.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yuval Avnery <yuvalav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 20:43:46 -07:00
Gal Pressman cfc30ad3d0 IB/usnic: Remove stub functions
Lack of mandatory verbs no longer fail device registration, the device
will be marked as a non-kverbs provider.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Tested-by: Parvi Kaustubhi <pkaustub@cisco.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 20:32:25 -07:00
Gal Pressman 6780c4fa9d RDMA: Add indication for in kernel API support to IB device
Drivers that do not provide kernel verbs support should not be used by ib
kernel clients at all.

In case a device does not implement all mandatory verbs for kverbs usage
mark it as a non kverbs provider and prevent its usage for all clients
except for uverbs.

The device is marked as a non kverbs provider using the 'kverbs_provider'
flag which should only be set by the core code.  The clients can choose
whether kverbs are requested for its usage using the 'no_kverbs_req' flag
which is currently set for uverbs only.

This patch allows drivers to remove mandatory verbs stubs and simply set
the callbacks to NULL. The IB device will be registered as a non-kverbs
provider. Note that verbs that are required for the device registration
process must be implemented.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 20:32:25 -07:00
Leon Romanovsky 459cc69fa4 RDMA: Provide safe ib_alloc_device() function
All callers to ib_alloc_device() provide a larger size than struct
ib_device and rely on the fact that struct ib_device is embedded in their
driver specific structure as the first member.

Provide a safer variant of ib_alloc_device() that checks and enforces this
approach to make sure the drivers are using it right.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 15:52:30 -07:00
Kamal Heib e5c1bb47cc IB/mlx5: Remove set but not used variable
Remove 'del_mkey' variable that is set but not used.

Fixes: 534fd7aac5 ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 15:17:17 -07:00
Kamal Heib f3ffed0ce4 IB/mlx5: Make mlx5_ib_stage_odp_cleanup() static
The function mlx5_ib_stage_odp_cleanup() is only used in main.c

Fixes: d5d284b829 ("{net,IB}/mlx5: Move Page fault EQ and ODP logic to RDMA")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 15:17:17 -07:00
Bart Van Assche 0b5cb3300a RDMA/srp: Increase max_segment_size
The default behavior of the SCSI core is to set the block layer request
queue parameter max_segment_size to 64 KB. That means that elements of
scatterlists are limited to 64 KB. Since RDMA adapters support larger
sizes, increase max_segment_size for the SRP initiator.

Notes:
- The SCSI max_segment_size parameter was introduced in kernel v5.0. See
  also commit 50c2e9107f ("scsi: introduce a max_segment_size
  host_template parameters").
- Some other block drivers already set max_segment_size to UINT_MAX,
  e.g. nbd and rbd.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 15:04:16 -07:00
Michael J. Ruhl db421a5499 IB/{hfi1, qib, rvt} Cleanup open coded sge usage
Several locations for manipulating sges use an open coded sequence
that is covered by helper functions.

Use the appropriate helper functions.

Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-01-30 14:22:32 -05:00
Michael J. Ruhl 87fc34b575 IB/{hfi1,qib}: Cleanup open coded sge sizing
Sge sizing is done in several places using an open coded method.

This can cause maintenance issues.  The open coded method is
encapsulated in a helper routine.  The helper was introduced with
commit:

1198fcea8a ("IB/hfi1, rdmavt: Move SGE state helper routines into
rdmavt")

Update all call sites that have the open coded path with the helper
routine.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-01-30 14:22:32 -05:00
Kamal Heib ed0bc2658e IB/ipoib: Make ipoib_intercept_dev_id_attr() static
The function ipoib_intercept_dev_id_attr() is only used in ipoib_main.c

Fixes: f6350da41d ("IB/ipoib: Log sysfs 'dev_id' accesses from userspace")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 21:50:36 -07:00
Adit Ranadive 8aa04ad3b3 RDMA/vmw_pvrdma: Support upto 64-bit PFNs
Update the driver to use the new device capability to report 64-bit UAR
PFNs.

Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 21:40:28 -07:00
Yishai Hadas 7b21b69ab2 IB/uverbs: Fix OOPs in uverbs_user_mmap_disassociate
The vma->vm_mm can become impossible to get before rdma_umap_close() is
called, in this case we must not try to get an mm that is already
undergoing process exit. In this case there is no need to wait for
anything as the VMA will be destroyed by another thread soon and is
already effectively 'unreachable' by userspace.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 PGD 800000012bc50067 P4D 800000012bc50067 PUD 129db5067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 1 PID: 2050 Comm: bash Tainted: G        W  OE 4.20.0-rc6+ #3
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:__rb_erase_color+0xb9/0x280
 Code: 84 17 01 00 00 48 3b 68 10 0f 84 15 01 00 00 48 89
               58 08 48 89 de 48 89 ef 4c 89 e3 e8 90 84 22 00 e9 60 ff ff ff 48 8b 5d
               10 <f6> 03 01 0f 84 9c 00 00 00 48 8b 43 10 48 85 c0 74 09 f6 00 01 0f
 RSP: 0018:ffffbecfc090bab8 EFLAGS: 00010246
 RAX: ffff97616346cf30 RBX: 0000000000000000 RCX: 0000000000000101
 RDX: 0000000000000000 RSI: ffff97623b6ca828 RDI: ffff97621ef10828
 RBP: ffff97621ef10828 R08: ffff97621ef10828 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff97623b6ca838
 R13: ffffffffbb3fef50 R14: ffff97623b6ca828 R15: 0000000000000000
 FS:  00007f7a5c31d740(0000) GS:ffff97623bb00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000011255a000 CR4: 00000000000006e0
 Call Trace:
  unlink_file_vma+0x3b/0x50
  free_pgtables+0xa1/0x110
  exit_mmap+0xca/0x1a0
  ? mlx5_ib_dealloc_pd+0x28/0x30 [mlx5_ib]
  mmput+0x54/0x140
  uverbs_user_mmap_disassociate+0xcc/0x160 [ib_uverbs]
  uverbs_destroy_ufile_hw+0xf7/0x120 [ib_uverbs]
  ib_uverbs_remove_one+0xea/0x240 [ib_uverbs]
  ib_unregister_device+0xfb/0x200 [ib_core]
  mlx5_ib_remove+0x51/0xe0 [mlx5_ib]
  mlx5_remove_device+0xc1/0xd0 [mlx5_core]
  mlx5_unregister_device+0x3d/0xb0 [mlx5_core]
  remove_one+0x2a/0x90 [mlx5_core]
  pci_device_remove+0x3b/0xc0
  device_release_driver_internal+0x16d/0x240
  unbind_store+0xb2/0x100
  kernfs_fop_write+0x102/0x180
  __vfs_write+0x36/0x1a0
  ? __alloc_fd+0xa9/0x170
  ? set_close_on_exec+0x49/0x70
  vfs_write+0xad/0x1a0
  ksys_write+0x52/0xc0
  do_syscall_64+0x5b/0x180
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Cc: <stable@vger.kernel.org> # 4.19
Fixes: 5f9794dc94 ("RDMA/ucontext: Add a core API for mmaping driver IO memory")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 13:57:22 -07:00
Jason Gunthorpe 55c293c38e Merge branch 'devx-async' into k.o/for-next
Yishai Hadas says:

Enable DEVX asynchronous query commands

This series enables querying a DEVX object in an asynchronous mode.

The userspace application won't block when calling the firmware and it will be
able to get the response back once that it will be ready.

To enable the above functionality:

- DEVX asynchronous command completion FD object was introduced.
- The applicable file operations were implemented to enable using it by
  the user application.
- Query asynchronous method was added to the DEVX object, it will call the
  firmware asynchronously and manages the response on the given input FD.
- Hot unplug support was added for the FD to work properly upon
  unbind/disassociate.
- mlx5 core fence for asynchronous commands was implemented and used to
  prevent racing upon unbind/disassociate.

This branch is based on mlx5-next & v5.0-rc2 due to dependencies, from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

* branch 'devx-async':
  IB/mlx5: Implement DEVX hot unplug for async command FD
  IB/mlx5: Implement the file ops of DEVX async command FD
  IB/mlx5: Introduce async DEVX obj query API
  IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 13:49:31 -07:00
Yishai Hadas eaebaf77e7 IB/mlx5: Implement DEVX hot unplug for async command FD
Implement DEVX hot unplug for the async command FD.

This is done by managing a list of the inflight commands and wait until
all launched work is completed as part of
devx_hot_unplug_async_cmd_event_file.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 13:33:00 -07:00
Yishai Hadas 4accbb3fd2 IB/mlx5: Implement the file ops of DEVX async command FD
Implement the file ops of the DEVX async command FD, this enables using
the FD for reading the events and manage other options on the FD.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 13:33:00 -07:00
Yishai Hadas a124edba26 IB/mlx5: Introduce async DEVX obj query API
Introduce async DEVX obj query API to get the command response back to
user space once it's ready without blocking when calling the firmware.

The event's data includes a header with some meta data then the firmware
output command data.

The header includes:
- The input 'wr_id' to let application recognizing the response.

The input FD attribute is used to have the event data ready on.
Downstream patches from this series will implement the file ops to let
application read it.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 13:33:00 -07:00
Yishai Hadas 6bf8f22aea IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD
Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD and its initial implementation.

This object is from type class FD and will be used to read DEVX async
commands completion.

The core layer should allow the driver to set object from type FD in a
safe mode, this option was added with a matching comment in place.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 13:32:43 -07:00
Masahiro Yamada b360ce3b2b infiniband: prefix header search paths with $(srctree)/
Currently, the Kbuild core manipulates header search paths in a crazy
way [1].

To fix this mess, I want all Makefiles to add explicit $(srctree)/ to
the search paths in the srctree. Some Makefiles are already written in
that way, but not all. The goal of this work is to make the notation
consistent, and finally get rid of the gross hacks.

Having whitespaces after -I does not matter since commit 48f6e3cf5b
("kbuild: do not drop -I without parameter").

[1]: https://patchwork.kernel.org/patch/9632347/

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Parvi Kaustubhi <pkaustub@cisco.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25 15:28:50 -07:00
Masahiro Yamada ed4cdf4a21 infiniband: remove unneeded header search paths
The included headers are located in include/target/. I was able to
build these drivers without the extra header search paths.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25 15:28:50 -07:00
Feras Daoud 6ab4aba00f IB/ipoib: Fix for use-after-free in ipoib_cm_tx_start
The following BUG was reported by kasan:

 BUG: KASAN: use-after-free in ipoib_cm_tx_start+0x430/0x1390 [ib_ipoib]
 Read of size 80 at addr ffff88034c30bcd0 by task kworker/u16:1/24020

 Workqueue: ipoib_wq ipoib_cm_tx_start [ib_ipoib]
 Call Trace:
  dump_stack+0x9a/0xeb
  print_address_description+0xe3/0x2e0
  kasan_report+0x18a/0x2e0
  ? ipoib_cm_tx_start+0x430/0x1390 [ib_ipoib]
  memcpy+0x1f/0x50
  ipoib_cm_tx_start+0x430/0x1390 [ib_ipoib]
  ? kvm_clock_read+0x1f/0x30
  ? ipoib_cm_skb_reap+0x610/0x610 [ib_ipoib]
  ? __lock_is_held+0xc2/0x170
  ? process_one_work+0x880/0x1960
  ? process_one_work+0x912/0x1960
  process_one_work+0x912/0x1960
  ? wq_pool_ids_show+0x310/0x310
  ? lock_acquire+0x145/0x440
  worker_thread+0x87/0xbb0
  ? process_one_work+0x1960/0x1960
  kthread+0x314/0x3d0
  ? kthread_create_worker_on_cpu+0xc0/0xc0
  ret_from_fork+0x3a/0x50

 Allocated by task 0:
  kasan_kmalloc+0xa0/0xd0
  kmem_cache_alloc_trace+0x168/0x3e0
  path_rec_create+0xa2/0x1f0 [ib_ipoib]
  ipoib_start_xmit+0xa98/0x19e0 [ib_ipoib]
  dev_hard_start_xmit+0x159/0x8d0
  sch_direct_xmit+0x226/0xb40
  __dev_queue_xmit+0x1d63/0x2950
  neigh_update+0x889/0x1770
  arp_process+0xc47/0x21f0
  arp_rcv+0x462/0x760
  __netif_receive_skb_core+0x1546/0x2da0
  netif_receive_skb_internal+0xf2/0x590
  napi_gro_receive+0x28e/0x390
  ipoib_ib_handle_rx_wc_rss+0x873/0x1b60 [ib_ipoib]
  ipoib_rx_poll_rss+0x17d/0x320 [ib_ipoib]
  net_rx_action+0x427/0xe30
  __do_softirq+0x28e/0xc42

 Freed by task 26680:
  __kasan_slab_free+0x11d/0x160
  kfree+0xf5/0x360
  ipoib_flush_paths+0x532/0x9d0 [ib_ipoib]
  ipoib_set_mode_rss+0x1ad/0x560 [ib_ipoib]
  set_mode+0xc8/0x150 [ib_ipoib]
  kernfs_fop_write+0x279/0x440
  __vfs_write+0xd8/0x5c0
  vfs_write+0x15e/0x470
  ksys_write+0xb8/0x180
  do_syscall_64+0x9b/0x420
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 The buggy address belongs to the object at ffff88034c30bcc8
                which belongs to the cache kmalloc-512 of size 512
 The buggy address is located 8 bytes inside of
                512-byte region [ffff88034c30bcc8, ffff88034c30bec8)
 The buggy address belongs to the page:

The following race between change mode and xmit flow is the reason for
this use-after-free:

Change mode     Send packet 1 to GID XX      Send packet 2 to GID XX
     |                    |                             |
   start                  |                             |
     |                    |                             |
     |                    |                             |
     |         Create new path for GID XX               |
     |           and update neigh path                  |
     |                    |                             |
     |                    |                             |
     |                    |                             |
 flush_paths              |                             |
                          |                             |
               queue_work(cm.start_task)                |
                          |                 Path for GID XX not found
                          |                      create new path
                          |
                          |
               start_task runs with old
                    released path

There is no locking to protect the lifetime of the path through the
ipoib_cm_tx struct, so delete it entirely and always use the newly looked
up path under the priv->lock.

Fixes: 546481c281 ("IB/ipoib: Fix memory corruption in ipoib cm mode connect flow")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25 12:17:54 -07:00
Yishai Hadas f8ade8e242 IB/uverbs: Fix ioctl query port to consider device disassociation
Methods cannot peak into the ufile, the only way to get a ucontext and
hence a device is via the ib_uverbs_get_ucontext() call or inspecing a
locked uobject.

Otherwise during/after disassociation the pointers may be null or free'd.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
 PGD 800000005ece6067 P4D 800000005ece6067 PUD 5ece7067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 0 PID: 10631 Comm: ibv_ud_pingpong Tainted: GW  OE     4.20.0-rc6+ #3
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:ib_uverbs_handler_UVERBS_METHOD_QUERY_PORT+0x53/0x191 [ib_uverbs]
 Code: 80 00 00 00 31 c0 48 8b 47 40 48 8d 5c 24 38 48 8d 6c 24
               08 48 89 df 48 8b 40 08 4c 8b a0 18 03 00 00 31 c0 f3 48 ab 48 89
               ef <49> 83 7c 24 78 00 b1 06 f3 48 ab 0f 84 89 00 00 00 45 31  c9 31 d2
 RSP: 0018:ffffb54802ccfb10 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffffb54802ccfb48 RCX:0000000000000000
 RDX: fffffffffffffffa RSI: ffffb54802ccfcf8 RDI:ffffb54802ccfb18
 RBP: ffffb54802ccfb18 R08: ffffb54802ccfd18 R09:0000000000000000
 R10: 0000000000000000 R11: 00000000000000d0 R12:0000000000000000
 R13: ffffb54802ccfcb0 R14: ffffb54802ccfc48 R15:ffff9f736e0059a0
 FS:  00007f55a6bd7740(0000) GS:ffff9f737ba00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000078 CR3: 0000000064214000 CR4:00000000000006f0
 Call Trace:
  ib_uverbs_cmd_verbs.isra.5+0x94d/0xa60 [ib_uverbs]
  ? copy_port_attr_to_resp+0x120/0x120 [ib_uverbs]
  ? arch_tlb_finish_mmu+0x16/0xc0
  ? tlb_finish_mmu+0x1f/0x30
  ? unmap_region+0xd9/0x120
  ib_uverbs_ioctl+0xbc/0x120 [ib_uverbs]
  do_vfs_ioctl+0xa9/0x620
  ? __do_munmap+0x29f/0x3a0
  ksys_ioctl+0x60/0x90
  __x64_sys_ioctl+0x16/0x20
  do_syscall_64+0x5b/0x180
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7f55a62cb567

Fixes: 641d1207d2 ("IB/core: Move query port to ioctl")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25 11:58:06 -07:00
Mark Bloch c1b03c25f5 RDMA/mlx5: Fix flow creation on representors
The intention of the flow_is_supported was to disable the entire tree and
methods that allow raw flow creation, but the grammar syntax has this
disable the entire UVERBS_FLOW object. Since the method requires a
MLX5_IB_OBJECT_FLOW_MATCHER there is no need to do anything, as it is
automatically disabled when matchers are disabled.

This restores the ability to create flow steering rules on representors
via regular verbs.

Fixes: a1462351b5 ("RDMA/mlx5: Fail early if user tries to create flows on IB representors")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25 11:58:06 -07:00
Yishai Hadas 425784aa5b IB/uverbs: Fix OOPs upon device disassociation
The async_file might be freed before the disassociation has been ended,
causing qp shutdown to use after free on it.

Since uverbs_destroy_ufile_hw is not a fence, it returns if a
disassociation is ongoing in another thread. It has to be written this way
to avoid deadlock. However this means that the ufile FD close cannot
destroy anything that may still be used by an active kref, such as the the
async_file.

To fix that move the kref_put() to be in ib_uverbs_release_file().

 BUG: unable to handle kernel paging request at ffffffffba682787
 PGD bc80e067 P4D bc80e067 PUD bc80f063 PMD 1313df163 PTE 80000000bc682061
 Oops: 0003 [#1] SMP PTI
 CPU: 1 PID: 32410 Comm: bash Tainted: G           OE 4.20.0-rc6+ #3
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:__pv_queued_spin_lock_slowpath+0x1b3/0x2a0
 Code: 98 83 e2 60 49 89 df 48 8b 04 c5 80 18 72 ba 48 8d
		ba 80 32 02 00 ba 00 80 00 00 4c 8d 65 14 41 bd 01 00 00 00 48 01 c7 85
		d2 <48> 89 2f 48 89 fb 74 14 8b 45 08 85 c0 75 42 84 d2 74 6b f3 90 83
 RSP: 0018:ffffc1bbc064fb58 EFLAGS: 00010006
 RAX: ffffffffba65f4e7 RBX: ffff9f209c656c00 RCX: 0000000000000001
 RDX: 0000000000008000 RSI: 0000000000000000 RDI: ffffffffba682787
 RBP: ffff9f217bb23280 R08: 0000000000000001 R09: 0000000000000000
 R10: ffff9f209d2c7800 R11: ffffffffffffffe8 R12: ffff9f217bb23294
 R13: 0000000000000001 R14: 0000000000000000 R15: ffff9f209c656c00
 FS:  00007fac55aad740(0000) GS:ffff9f217bb00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffba682787 CR3: 000000012f8e0000 CR4: 00000000000006e0
 Call Trace:
  _raw_spin_lock_irq+0x27/0x30
  ib_uverbs_release_uevent+0x1e/0xa0 [ib_uverbs]
  uverbs_free_qp+0x7e/0x90 [ib_uverbs]
  destroy_hw_idr_uobject+0x1c/0x50 [ib_uverbs]
  uverbs_destroy_uobject+0x2e/0x180 [ib_uverbs]
  __uverbs_cleanup_ufile+0x73/0x90 [ib_uverbs]
  uverbs_destroy_ufile_hw+0x5d/0x120 [ib_uverbs]
  ib_uverbs_remove_one+0xea/0x240 [ib_uverbs]
  ib_unregister_device+0xfb/0x200 [ib_core]
  mlx5_ib_remove+0x51/0xe0 [mlx5_ib]
  mlx5_remove_device+0xc1/0xd0 [mlx5_core]
  mlx5_unregister_device+0x3d/0xb0 [mlx5_core]
  remove_one+0x2a/0x90 [mlx5_core]
  pci_device_remove+0x3b/0xc0
  device_release_driver_internal+0x16d/0x240
  unbind_store+0xb2/0x100
  kernfs_fop_write+0x102/0x180
  __vfs_write+0x36/0x1a0
  ? __alloc_fd+0xa9/0x170
  ? set_close_on_exec+0x49/0x70
  vfs_write+0xad/0x1a0
  ksys_write+0x52/0xc0
  do_syscall_64+0x5b/0x180
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7fac551aac60

Cc: <stable@vger.kernel.org> # 4.2
Fixes: 036b106357 ("IB/uverbs: Enable device removal when there are active user space applications")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25 11:58:06 -07:00
Artemy Kovalyov a2093dd35f RDMA/umem: Add missing initialization of owning_mm
When allocating a umem leaf for implicit ODP MR during page fault the
field owning_mm was not set.

Initialize and take a reference on this field to avoid kernel panic when
trying to access this field.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
 PGD 800000022dfed067 P4D 800000022dfed067 PUD 22dfcf067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 0 PID: 634 Comm: kworker/u33:0 Not tainted 4.20.0-rc6+ #89
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Workqueue: mlx5_ib_page_fault mlx5_ib_eqe_pf_action [mlx5_ib]
 RIP: 0010:ib_umem_odp_map_dma_pages+0xf3/0x710 [ib_core]
 Code: 45 c0 48 21 f3 48 89 75 b0 31 f6 4a 8d 04 33 48 89 45 a8 49 8b 44 24 60 48 8b 78 10 e8 66 16 a8 c5 49 8b 54 24 08 48 89 45 98 <8b> 42 58 85 c0 0f 84 8e 05 00 00 8d 48 01 48 8d 72 58 f0 0f b1 4a
 RSP: 0000:ffffb610813a7c20 EFLAGS: 00010202
 RAX: ffff95ace6e8ac80 RBX: 0000000000000000 RCX: 000000000000000c
 RDX: 0000000000000000 RSI: 0000000000000850 RDI: ffff95aceaadae80
 RBP: ffffb610813a7ce0 R08: 0000000000000000 R09: 0000000000080c77
 R10: ffff95acfffdbd00 R11: 0000000000000000 R12: ffff95aceaa20a00
 R13: 0000000000001000 R14: 0000000000001000 R15: 000000000000000c
 FS:  0000000000000000(0000) GS:ffff95acf7800000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000058 CR3: 000000022c834001 CR4: 00000000001606f0
 Call Trace:
  pagefault_single_data_segment+0x1df/0xc60 [mlx5_ib]
  mlx5_ib_eqe_pf_action+0x7bc/0xa70 [mlx5_ib]
  ? __switch_to+0xe1/0x470
  process_one_work+0x174/0x390
  worker_thread+0x4f/0x3e0
  kthread+0x102/0x140
  ? drain_workqueue+0x130/0x130
  ? kthread_stop+0x110/0x110
  ret_from_fork+0x1f/0x30

Fixes: f27a0d50a4 ("RDMA/umem: Use umem->owning_mm inside ODP")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25 09:55:48 -07:00
Lijun Ou 9d9d4ff788 RDMA/hns: Update the kernel header file of hns
The hns_roce_ib_create_srq_resp is used to interact with the user for
data, this was open coded to use a u32 directly, instead use a properly
sized structure.

Fixes: c7bcb13442 ("RDMA/hns: Add SRQ support for hip08 kernel mode")
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25 09:55:48 -07:00
Yuval Avnery 535005ca8e IB/core: Destroy QP if XRC QP fails
The open-coded variant missed destroy of SELinux created QP, reuse already
existing ib_detroy_qp() call and use this opportunity to clean
ib_create_qp() from double prints and unclear exit paths.

Reported-by: Parav Pandit <parav@mellanox.com>
Fixes: d291f1a652 ("IB/core: Enforce PKey security on QPs")
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Ira Weiny ff0244bb59 RDMA/qib: Use GUP longterm for PSM page pining
Similar to the core change commit 5f1d43de54 ("IB/core: disable memory
registration of filesystem-dax vmas")

PSM should be prevented from using filesystem DAX pages.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Yangyang Li 0e40dc2f70 RDMA/hns: Add timer allocation support for hip08
This patch adds qpc timer and cqc timer allocation support for hardware
timeout retransmission in kernel space driver.

Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Yangyang Li aa84fa1874 RDMA/hns: Add SCC context clr support for hip08
This patch adds SCC context clear support for DCQCN in kernel space
driver.

Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Yangyang Li 6a157f7d1b RDMA/hns: Add SCC context allocation support for hip08
This patch adds SCC context allocation and initialization support for
DCQCN in kernel space driver.

Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Moni Shoua da6a496a34 IB/mlx5: Ranges in implicit ODP MR inherit its write access
A sub-range in ODP implicit MR should take its write permission from the
MR and not be set always to allow.

Fixes: d07d1d70ce ("IB/umem: Update on demand page (ODP) support")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Jason Gunthorpe 8ba0ddd094 RDMA/iw_cxgb4: Drop __GFP_NOFAIL
There is no reason for this __GFP_NOFAIL, none of the other routines in
this file use it, and there is an error unwind here. NOFAIL should be
reserved for special cases, not used by network drivers.

Fixes: 6a0b6174d3 ("rdma/cxgb4: Add support for kernel mode SRQ's")
Reported-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Bart Van Assche 0a353c2e94 IB/mlx5: Declare local functions 'static'
This patch avoids that sparse complains about missing function
declarations.

Fixes: c9990ab39b ("RDMA/umem: Move all the ODP related stuff out of ucontext and into per_mm")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Bart Van Assche f373859190 IB/core: Declare local functions 'static'
This patch avoids that sparse complains about missing function
declarations.

Fixes: f27a0d50a4 ("RDMA/umem: Use umem->owning_mm inside ODP")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Greg Kroah-Hartman 2e061c691c infiniband: ipoib: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Greg Kroah-Hartman 316bcda81d infiniband: usnic: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Parvi Kaustubhi <pkaustub@cisco.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Greg Kroah-Hartman 2537672966 infiniband: ocrdma: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Greg Kroah-Hartman 73eb8f03f0 infiniband: mlx5: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:29 -07:00
Greg Kroah-Hartman 0d0336cf54 infiniband: qib: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:29 -07:00
Greg Kroah-Hartman e775118025 infiniband: hfi1: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:29 -07:00
Greg Kroah-Hartman 5c43276499 infiniband: hfi1: drop crazy DEBUGFS_SEQ_FILE_CREATE() macro
The macro was just making things harder to follow, and audit, so remove
it and call debugfs_create_file() directly.  Also, the macro did not
need to warn about the call failing as no one should ever care about any
debugfs functions failing.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:29 -07:00
Greg Kroah-Hartman 8283d78725 infiniband: cxgb4: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:29 -07:00
Parav Pandit 039d713a59 IB/umad: Do not check status of nonseekable_open()
As the comment block of nonseekable_open() describes, nonseekable_open()
can never fail. Several places in kernel depend on this behavior.
Therefore, simplify the umad module to depend on this basic kernel
functionality.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:29 -07:00
Jason Gunthorpe e355477ed9 net/mlx5: Make mlx5_cmd_exec_cb() a safe API
APIs that have deferred callbacks should have some kind of cleanup
function that callers can use to fence the callbacks. Otherwise things
like module unloading can lead to dangling function pointers, or worse.

The IB MR code is the only place that calls this function and had a
really poor attempt at creating this fence. Provide a good version in
the core code as future patches will add more places that need this
fence.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-01-24 14:25:26 +02:00
Parav Pandit ee848721f6 IB/umad: Avoid additional device reference during open()/close()
ib_umad_init_port_dev() holds the reference of a ib_umad_device instance.
ib_umad_device contains standard core device and cdev.  cdev holds the
reference of its parent core device.  file ops holds the reference to cdev
using core kernel.

Therefore, there is no need to hold additional reference while opening
umad related char devices.

While at it, add comments to bring clarity on releasing references to
ib_umd_device.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-23 11:37:05 -07:00
Maor Gottlieb 6113cc4401 IB/mlx5: Don't override existing ip_protocol
Two flow specifications can set the ip protocol field in
the flow table entry:

1) IB_FLOW_SPEC_TCP/UDP/GRE - set the ip protocol accordingly.
2) IB_FLOW_SPEC_IPV4/6 - has ip_protocol field for users
who want to receive specific L4 packets.

We need to avoid overriding of the ip_protocol with zeros,
in case that the user first put the L4 specification and
only then the L3.

Fixes: ca0d475385 ('IB/mlx5: Add support in TOS and protocol to flow steering')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21 20:10:06 -07:00
Yishai Hadas 414556af5f IB/mlx5: Add support for ODP for DEVX indirection mkey
Add support for ODP for DEVX indirection mkey, it includes:
- Recognizing its type as part of the radix tree lookup.
- Use similar flow as done for the MW MKEY type.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21 20:06:49 -07:00
Yishai Hadas 534fd7aac5 IB/mlx5: Manage indirection mkey upon DEVX flow for ODP
Manage indirection mkey upon DEVX flow to support ODP.

To support a page fault event on the indirection mkey it needs to be part
of the device mkey radix tree.

Both the creation and the deletion flows for a DEVX object which is
indirection mkey were adapted to handle that.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21 20:06:49 -07:00
Yishai Hadas fa31f14380 IB/mlx5: DEVX handling for indirection MKEY
Once an indirection MKEY is created umem valid bit shouldn't be set as
this MKEY doesn't really hold a umem.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21 20:06:49 -07:00
Xiaofei Tan 2b9acb9a97 RDMA/hns: Add the process of AEQ overflow for hip08
AEQ overflow will be reported by hardware when too many asynchronous
events occurred but not be handled in time.  Normally, AEQ overflow error
is not easy to occur. Once happened, we have to do physical function reset
to recover.  PF reset is implemented in two steps. Firstly, set reset
level with ae_dev->ops->set_default_reset_request.  Secondly, run reset
with ae_dev->ops->reset_event.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21 16:47:54 -07:00
Zhu Yanjun 9802c335e7 IB/rxe: Remove unnecessary rxe variable
The variable rxe in the function is not used. So it is removed.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21 16:46:08 -07:00
Jason Gunthorpe 951d01b96f IB/mlx5: Fix how advise_mr() launches async work
Work must hold a kref on the ib_device otherwise the dev pointer can
become free before the work runs. This can happen because the work is
being pushed onto the system work queue which is not flushed during driver
unregister.

Remove the bogus use of 'reg_state':
 - While in uverbs the reg_state is guaranteed to always be
   REGISTERED
 - Testing reg_state with no locking is bogus. Use ib_device_try_get()
   to get back into a region that prevents unregistration.

For now continue with a flow that is similar to the existing code.

Fixes: 813e90b1ae ("IB/mlx5: Add advise_mr() support")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
2019-01-21 14:39:29 -07:00
Jason Gunthorpe d79af7242b RDMA/device: Expose ib_device_try_get(()
It turns out future patches need this capability quite widely now, not
just for netlink, so provide two global functions to manage the
registration lock refcount.

This also moves the point the lock becomes 1 to within
ib_register_device() so that the semantics of the public API are very sane
and clear. Calling ib_device_try_get() will fail on devices that are only
allocated but not yet registered.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
2019-01-21 14:33:08 -07:00
Mike Marciniszyn 09ce351dff IB/hfi1: Add limit test for RC/UC send via loopback
Fix potential memory corruption and panic in loopback for IB_WR_SEND
variants.

The code blindly assumes the posted length will fit in the fetched rwqe,
which is not a valid assumption.

Fix by adding a limit test, and triggering the appropriate send completion
and putting the QP in an error state.  This mimics the handling for
non-loopback QPs.

Fixes: 1570346153 ("IB/{hfi1, qib, rdmavt}: Move ruc_loopback to rdmavt")
Cc: <stable@vger.kernel.org> #v4.20+
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21 14:20:08 -07:00
Michael J. Ruhl 7709b0dc26 IB/hfi1: Remove overly conservative VM_EXEC flag check
Applications that use the stack for execution purposes cause userspace PSM
jobs to fail during mmap().

Both Fortran (non-standard format parsing) and C (callback functions
located in the stack) applications can be written such that stack
execution is required. The linker notes this via the gnu_stack ELF flag.

This causes READ_IMPLIES_EXEC to be set which forces all PROT_READ mmaps
to have PROT_EXEC for the process.

Checking for VM_EXEC bit and failing the request with EPERM is overly
conservative and will break any PSM application using executable stacks.

Cc: <stable@vger.kernel.org> #v4.14+
Fixes: 1222026764 ("IB/hfi: Protect against writable mmap")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21 14:20:08 -07:00
Brian Welty 904bba211a IB/{hfi1, qib}: Fix WC.byte_len calculation for UD_SEND_WITH_IMM
The work completion length for a receiving a UD send with immediate is
short by 4 bytes causing application using this opcode to fail.

The UD receive logic incorrectly subtracts 4 bytes for immediate
value. These bytes are already included in header length and are used to
calculate header/payload split, so the result is these 4 bytes are
subtracted twice, once when the header length subtracted from the overall
length and once again in the UD opcode specific path.

Remove the extra subtraction when handling the opcode.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21 14:20:08 -07:00
Jack Morgenstein f45f8edbe1 IB/mlx4: Fix using wrong function to destroy sqp AHs under SRIOV
The commit cited below replaced rdma_create_ah with
mlx4_ib_create_slave_ah when creating AHs for the paravirtualized special
QPs.

However, this change also required replacing rdma_destroy_ah with
mlx4_ib_destroy_ah in the affected flows.

The commit missed 3 places where rdma_destroy_ah should have been replaced
with mlx4_ib_destroy_ah.

As a result, the pd usecount was decremented when the ah was destroyed --
although the usecount was NOT incremented when the ah was created.

This caused the pd usecount to become negative, and resulted in the
WARN_ON stack trace below when the mlx4_ib.ko module was unloaded:

WARNING: CPU: 3 PID: 25303 at drivers/infiniband/core/verbs.c:329 ib_dealloc_pd+0x6d/0x80 [ib_core]
Modules linked in: rdma_ucm rdma_cm iw_cm ib_cm ib_umad mlx4_ib(-) ib_uverbs ib_core mlx4_en mlx4_core nfsv3 nfs fscache configfs xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc dm_mirror dm_region_hash dm_log dm_mod dax rndis_wlan rndis_host coretemp kvm_intel cdc_ether kvm usbnet iTCO_wdt iTCO_vendor_support cfg80211 irqbypass lpc_ich ipmi_si i2c_i801 mii pcspkr i2c_core mfd_core ipmi_devintf i7core_edac ipmi_msghandler ioatdma pcc_cpufreq dca acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi mptsas scsi_transport_sas mptscsih crc32c_intel ata_piix bnx2 mptbase ipv6 crc_ccitt autofs4 [last unloaded: mlx4_core]
CPU: 3 PID: 25303 Comm: modprobe Tainted: G        W I       5.0.0-rc1-net-mlx4+ #1
Hardware name: IBM  -[7148ZV6]-/Node 1, System Card, BIOS -[MLE170CUS-1.70]- 09/23/2011
RIP: 0010:ib_dealloc_pd+0x6d/0x80 [ib_core]
Code: 00 00 85 c0 75 02 5b c3 80 3d aa 87 03 00 00 75 f5 48 c7 c7 88 d7 8f a0 31 c0 c6 05 98 87 03 00 01 e8 07 4c 79 e0 0f 0b 5b c3 <0f> 0b eb be 0f 0b eb ab 90 66 2e 0f 1f 84 00 00 00 00 00 66 66 66
RSP: 0018:ffffc90005347e30 EFLAGS: 00010282
RAX: 00000000ffffffea RBX: ffff8888589e9540 RCX: 0000000000000006
RDX: 0000000000000006 RSI: ffff88885d57ad40 RDI: 0000000000000000
RBP: ffff88885b029c00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000004 R12: ffff8887f06c0000
R13: ffff8887f06c13e8 R14: 0000000000000000 R15: 0000000000000000
FS:  00007fd6743c6740(0000) GS:ffff88887fcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000ed1038 CR3: 00000007e3156000 CR4: 00000000000006e0
Call Trace:
 mlx4_ib_close_sriov+0x125/0x180 [mlx4_ib]
 mlx4_ib_remove+0x57/0x1f0 [mlx4_ib]
 mlx4_remove_device+0x92/0xa0 [mlx4_core]
 mlx4_unregister_interface+0x39/0x90 [mlx4_core]
 mlx4_ib_cleanup+0xc/0xd7 [mlx4_ib]
 __x64_sys_delete_module+0x17d/0x290
 ? trace_hardirqs_off_thunk+0x1a/0x1c
 ? do_syscall_64+0x12/0x180
 do_syscall_64+0x4a/0x180
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 5e62d5ff1b ("IB/mlx4: Create slave AH's directly")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21 14:20:08 -07:00
Mark Bloch 8af526e035 RDMA/mlx5: Fix check for supported user flags when creating a QP
When the flags verification was added two flags were missed from the
check:
 * MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC
 * MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC

This causes user applications that were using these flags to break.

Fixes: 2e43bb31b8 ("IB/mlx5: Verify that driver supports user flags")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21 14:20:08 -07:00
Israel Rukshin 57b26497fa IB/iser: Pass the correct number of entries for dma mapped SGL
ib_dma_map_sg() augments the SGL into a 'dma mapped SGL'. This process may
change the number of entries and the lengths of each entry.

Code that touches dma_address is iterating over the 'dma mapped SGL' and
must use dma_nents which returned from ib_dma_map_sg().

ib_sg_to_pages() and ib_map_mr_sg() are using dma_address so they must use
dma_nents.

Fixes: 3940588500 ("IB/iser: Port to new fast registration API")
Fixes: bfe066e256 ("IB/iser: Reuse ib_sg_to_pages")
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-18 14:12:24 -07:00
YueHaibing 790b57f686 IB/hw: Remove unneeded semicolons
Remove unneeded semicolons.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-18 14:04:46 -07:00
Mike Marciniszyn 14e517e4b4 IB/rdmavt: Add wc_flags and wc_immdata to cq entry trace
These fields were missing from the trace.  Add them.

Fixes: c6ad9482fc ("IB/rdmavt: Add tracing for cq entry and poll")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-18 13:48:19 -07:00
Parav Pandit 7527a7b157 IB/core: Simplify rdma cgroup registration
RDMA cgroup registration routine always returns success, so simplify
function to be void and run clang formatter over whole CONFIG_CGROUP_RDMA
art of core_priv.h.

This reduces unwinding error path for regular registration and future net
namespace change functionality for rdma device.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-18 13:43:10 -07:00
YueHaibing 8ea175f005 RDMA/qedr: remove set but not used variable 'ib_ctx'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/infiniband/hw/qedr/verbs.c: In function 'qedr_create_srq':
drivers/infiniband/hw/qedr/verbs.c:1436:22: warning:
 variable 'ib_ctx' set but not used [-Wunused-but-set-variable]

drivers/infiniband/hw/qedr/verbs.c: In function 'qedr_create_user_qp':
drivers/infiniband/hw/qedr/verbs.c:1701:22: warning:
 variable 'ib_ctx' set but not used [-Wunused-but-set-variable]

Fixes: b0ea0fa543 ("IB/{core,hw}: Have ib_umem_get extract the ib_ucontext from ib_udata")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-18 13:41:23 -07:00
Jason Gunthorpe 344684e6d0 RDMA/device: Use __ib_device_get_by_name() in ib_device_rename()
No reason to open code this loop.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2019-01-18 13:05:58 -07:00
Lijun Ou de77503a59 RDMA/hns: RDMA/hns: Assign rq head pointer when enable rq record db
When flush cqe, it needs to get the pointer of rq and sq from db address
space of user and update it into qp context by modified qp. if rq does not
exist, it will not get the value from db address space of user.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-18 13:04:27 -07:00
Linus Torvalds d7393226d1 First 5.0 rc pull request
Not much so far, but I'm feeling like the 2nd PR -rc will be larger than
 this. We have the usual batch of bugs and two fixes to code merged this cycle.
 
 - Restore valgrind support for the ioctl verbs interface merged this window,
   and fix a missed error code on an error path from that conversion
 
 - A user reported crash on obsolete mthca hardware
 
 - pvrdma was using the wrong command opcode toward the hypervisor
 
 - NULL pointer crash regression when dumping rdma-cm over netlink
 
 - Be conservative about exposing the global rkey
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlxBTeMACgkQOG33FX4g
 mxrOIQ//YdZdU9J825DM4ppH/MWRoPgayI+cca5sW2EG/nkgsvFJoiVDDK5/ka1g
 ge5Q21ZLMSPCBR0Iu/e/JOq6fJI4fsbcJGZURbyKgRZqyCBCf6qJbhiZKifpQMVb
 w7RP8kRFRdaiQzkAYfZSv9TP93JLvTDLg6zZ74r4vc8YphIzkI410v568hs6FiVu
 MIcb53pBWUswpCAnBVB+54sw+phJyjd02kmY4xTlWmiEzwHBb0JQ+Kps72/G0IWy
 0vOlDI1UjwqoDfThzyT7mcXqnSbXxg/e8EecMpyFzlorQyxgZ5TsJgQ8ubSYxuiQ
 7+dZ4rsdoZD++3MGtpmqDMQzKSPb989WzJT8WLp5oSw4ryAXeJJ+tys/APLtvPkf
 EgKgVyEqfxMDXn02/ENwDPpZyKLZkhcHFLgvfYmxtlDvtai/rvTLmzV1mptEaxlF
 +2pwSQM4/E/8qrLglN9kdFSfjBMb7Bvd2NYQqZ9vah2omb7gPsaTEEpVw6l/E0NX
 oOxFKPEzb0nP9KmJmwO8KLCvcrruuRL8kpmhc6sQMQJ6z0h4hmZrHF5EZZH92g0p
 maHyrx66vqw/Yl+TLvAb/T6FV1ax5c1TauiNErAjnag2wgVWW42Q7lQzSFLFI8su
 GU8oRlbIclDQ/1bszsf0IShq0r9G17+2n6yyTX39rj62YioiDlI=
 =ymZq
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes frfom Jason Gunthorpe:
 "Not much so far. We have the usual batch of bugs and two fixes to code
  merged this cycle:

   - Restore valgrind support for the ioctl verbs interface merged this
     window, and fix a missed error code on an error path from that
     conversion

   - A user reported crash on obsolete mthca hardware

   - pvrdma was using the wrong command opcode toward the hypervisor

   - NULL pointer crash regression when dumping rdma-cm over netlink

   - Be conservative about exposing the global rkey"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/uverbs: Mark ioctl responses with UVERBS_ATTR_F_VALID_OUTPUT
  RDMA/mthca: Clear QP objects during their allocation
  RDMA/vmw_pvrdma: Return the correct opcode when creating WR
  RDMA/cma: Add cm_id restrack resource based on kernel or user cm_id type
  RDMA/nldev: Don't expose unsafe global rkey to regular user
  RDMA/uverbs: Fix post send success return value in case of error
2019-01-18 17:17:20 +12:00
Gustavo A. R. Silva 8e8aa14542 RDMA/mlx5: Replace kzalloc with kcalloc
Replace kzalloc() function with its 2-factor argument form, kcalloc().

This patch replaces cases of:

	kzalloc(a * b, gfp)

with:
	kcalloc(a, b, gfp)

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-15 15:53:08 -07:00
Raju Rangoju 3352976c89 RDMA/iw_cxgb4: Fix the unchecked ep dereference
The patch 944661dd97f4: "RDMA/iw_cxgb4: atomically lookup ep and get a
reference" from May 6, 2016, leads to the following Smatch complaint:

    drivers/infiniband/hw/cxgb4/cm.c:2953 terminate()
    error: we previously assumed 'ep' could be null (see line 2945)

Fixes: 944661dd97 ("RDMA/iw_cxgb4: atomically lookup ep and get a reference")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-15 15:51:46 -07:00
Leon Romanovsky 73f5a82bb3 RDMA/mad: Reduce MAD scope to mlx5_ib only
Management Datagram Interface (MAD) is applicable
only when physical port is Infiniband. It makes MAD
command logic to be completely unrelated to eth/core
parts of mlx5.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-15 10:02:29 +02:00
Myungho Jung 5fc01fb846 RDMA/cma: Rollback source IP address if failing to acquire device
If cma_acquire_dev_by_src_ip() returns error in addr_handler(), the
device state changes back to RDMA_CM_ADDR_BOUND but the resolved source
IP address is still left. After that, if rdma_destroy_id() is called
after rdma_listen(), the device is freed without removed from
listen_any_list in cma_cancel_operation(). Revert to the previous IP
address if acquiring device fails.

Reported-by: syzbot+f3ce716af730c8f96637@syzkaller.appspotmail.com
Signed-off-by: Myungho Jung <mhjungk@gmail.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-14 15:28:26 -07:00
Dan Carpenter 97099cc652 RDMA/bnxt_re: fix a size calculation
This is from static analysis not from testing.  Depending on the value
of rcfw->cmdq_depth, then this might not cause an issue at runtime.

The BITS_TO_LONGS() macro tells us how many longs it take to hold a
bitmap.  In other words, it divides by the number if bits per long and
rounds up.  Then we want to take that number and multiple by
sizeof(long) to get the number of bytes to allocate.

The code here does the multiplication first so the rounding up is done
in the wrong place.  So imagine we want to allocate 1 bit, then
"(1 * 8) / 64 = 1" when we round up.  But it should be
"(1 / 64) * 8 = 8".  In other words, because of the rounding difference
we might allocate up to "sizeof(long) - 1" bytes fewer than intended.

Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-By: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-14 14:05:54 -07:00
Jason Gunthorpe d6f4a21f30 RDMA/uverbs: Mark ioctl responses with UVERBS_ATTR_F_VALID_OUTPUT
When the ioctl interface for the write commands was introduced it did
not mark the core response with UVERBS_ATTR_F_VALID_OUTPUT. This causes
rdma-core in userspace to not mark the buffers as written for valgrind.

Along the same lines it turns out we have always missed marking the driver
data. Fixing both of these makes valgrind work properly with rdma-core and
ioctl.

Fixes: 4785860e04 ("RDMA/uverbs: Implement an ioctl that can call write and write_ex handlers")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-01-14 14:02:22 -07:00
Parav Pandit 5474723115 RDMA: Introduce and use rdma_device_to_ibdev()
Introduce and use rdma_device_to_ibdev() API for those drivers which are
registering one sysfs group and also use in ib_core.

In subsequent patch, device->provider_ibdev one-to-one mapping is no
longer holds true during accessing sysfs entries.
Therefore, introduce an API rdma_device_to_ibdev() that provides such
information.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-14 13:12:03 -07:00
Parav Pandit ea4baf7f11 RDMA: Rename port_callback to init_port
Most provider routines are callback routines which ib core invokes.
_callback suffix doesn't convey information about when such callback is
invoked. Therefore, rename port_callback to init_port.

Additionally, store the init_port function pointer in ib_device_ops, so
that it can be accessed in subsequent patches when binding rdma device to
net namespace.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-14 13:05:14 -07:00
Leon Romanovsky 081de9495c RDMA: Clear CTX objects during their allocation
As part of an audit process to update drivers to use rdma_restrack_add()
ensure that CTX objects is cleared before access.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-10 17:08:52 -07:00
Leon Romanovsky 0975890ebe RDMA: Clear CQ objects during their allocation
As part of an audit process to update drivers to use rdma_restrack_add()
ensure that CQ objects is cleared before access.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-10 17:08:52 -07:00
Leon Romanovsky 8cbfaac3d0 RDMA: Clear PD objects during their allocation
As part of an audit process to update drivers to use rdma_restrack_add()
ensure that PD objects is cleared before access.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-10 17:08:52 -07:00
Gal Pressman dbe30dae48 RDMA/qedr: Fix out of bounds index check in query pkey
The pkey table size is QEDR_ROCE_PKEY_TABLE_LEN, index should be tested
for >= QEDR_ROCE_PKEY_TABLE_LEN instead of > QEDR_ROCE_PKEY_TABLE_LEN.

Fixes: a7efd7773e ("qedr: Add support for PD,PKEY and CQ verbs")
Signed-off-by: Gal Pressman <galpress@amazon.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-10 17:07:45 -07:00
Gal Pressman b188940796 RDMA/ocrdma: Fix out of bounds index check in query pkey
The pkey table size is one element, index should be tested for > 0 instead
of > 1.

Fixes: fe2caefcdf ("RDMA/ocrdma: Add driver for Emulex OneConnect IBoE RDMA adapter")
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-10 17:07:45 -07:00
Gal Pressman 4959d5da57 IB/usnic: Fix out of bounds index check in query pkey
The pkey table size is one element, index should be tested for > 0 instead
of > 1.

Fixes: e3cf00d0a8 ("IB/usnic: Add Cisco VIC low-level hardware driver")
Signed-off-by: Gal Pressman <galpress@amazon.com>
Acked-by: Parvi Kaustubhi <pkaustub@cisco.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-10 17:07:45 -07:00
Jason Gunthorpe b0ea0fa543 IB/{core,hw}: Have ib_umem_get extract the ib_ucontext from ib_udata
ib_umem_get() can only be called in a method callback, which always has a
udata parameter. This allows ib_umem_get() to derive the ucontext pointer
directly from the udata without requiring the drivers to find it in some
way or another.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
2019-01-10 17:07:45 -07:00
Shamir Rabinovitch 6fa8f1afd3 IB/{core,uverbs}: Move ib_umem_xxx functions from ib_core to ib_uverbs
The next patch will add dependency from ib_umem_get in to ib_uverbs so
move the required ib_umem_xxx functionality to it's correct module -
ib_uverbs - and avoid circular dependecy from the form of ib_core ->
ib_uverbs -> ib_core in depmod.

Since this now requires all drivers to be build modular if uverbs is
modular, hoist the test a couple drivers had into the main kconfig and
apply it to all drivers uniformly.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-10 17:06:44 -07:00
Leon Romanovsky 9d9f59b420 RDMA/mthca: Clear QP objects during their allocation
As part of audit process to update drivers to use rdma_restrack_add()
ensure that QP objects is cleared before access. Such change fixes the
crash observed with uninitialized non zero sgid attr accessed by
ib_destroy_qp().

CPU: 3 PID: 74 Comm: kworker/u16:1 Not tainted 4.19.10-300.fc29.x86_64
Workqueue: ipoib_wq ipoib_cm_tx_reap [ib_ipoib]
RIP: 0010:rdma_put_gid_attr+0x9/0x30 [ib_core]
RSP: 0018:ffffb7ad819dbde8 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8d1bdf5a2e00 RCX: 0000000000002699
RDX: 206c656e72656af8 RSI: ffff8d1bf7ae6160 RDI: 206c656e72656b20
RBP: 0000000000000000 R08: 0000000000026160 R09: ffffffffc06b45bf
R10: ffffe849887da000 R11: 0000000000000002 R12: ffff8d1be30cb400
R13: ffff8d1bdf681800 R14: ffff8d1be2272400 R15: ffff8d1be30ca000
FS:  0000000000000000(0000) GS:ffff8d1bf7ac0000(0000)
knlGS:0000000000000000
Trace:
 ib_destroy_qp+0xc9/0x240 [ib_core]
 ipoib_cm_tx_reap+0x1f9/0x4e0 [ib_ipoib]
 process_one_work+0x1a1/0x3a0
 worker_thread+0x30/0x380
 ? pwq_unbound_release_workfn+0xd0/0xd0
 kthread+0x112/0x130
 ? kthread_create_worker_on_cpu+0x70/0x70
 ret_from_fork+0x22/0x40

Reported-by: Alexander Murashkin <AlexanderMurashkin@msn.com>
Tested-by: Alexander Murashkin <AlexanderMurashkin@msn.com>
Fixes: 1a1f460ff1 ("RDMA: Hold the sgid_attr inside the struct ib_ah/qp")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-10 17:01:22 -07:00
Adit Ranadive 6325e01b6c RDMA/vmw_pvrdma: Return the correct opcode when creating WR
Since the IB_WR_REG_MR opcode value changed, let's set the PVRDMA device
opcodes explicitly.

Reported-by: Ruishuang Wang <ruishuangw@vmware.com>
Fixes: 9a59739bd0 ("IB/rxe: Revise the ib_wr_opcode enum")
Cc: stable@vger.kernel.org
Reviewed-by: Bryan Tan <bryantan@vmware.com>
Reviewed-by: Ruishuang Wang <ruishuangw@vmware.com>
Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-10 17:00:28 -07:00
Steve Wise 917cb8a72a RDMA/cma: Add cm_id restrack resource based on kernel or user cm_id type
A recent regression causes a null ptr crash when dumping cm_id resources.
The cma is incorrectly adding all cm_id restrack resources as kernel mode.

Fixes: af8d70375d ("RDMA/restrack: Resource-tracker should not use uobject pointers")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08 17:12:33 -07:00
Leon Romanovsky 13859d5df4 RDMA/mlx5: Embed into the code flow the ODP config option
Convert various places to more readable code, which embeds
CONFIG_INFINIBAND_ON_DEMAND_PAGING into the code flow.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08 16:41:38 -07:00
Leon Romanovsky 8b4d5bc5cf RDMA/mlx5: Introduce and reuse helper to identify ODP MR
Consolidate various checks if MR is ODP backed to one simple helper and
update call sites to use it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08 16:41:38 -07:00
Leon Romanovsky e502b8b011 RDMA/core: Don't depend device ODP capabilities on kconfig option
Device capability bits are exposing what specific device supports from HW
perspective. Those bits are not dependent on kernel configurations and
RDMA/core should ensure that proper interfaces to users will be disabled
if CONFIG_INFINIBAND_ON_DEMAND_PAGING is not set.

Fixes: f4056bfd8c ("IB/core: Add on demand paging caps to ib_uverbs_ex_query_device")
Fixes: 8cdd312cfe ("IB/mlx5: Implement the ODP capability query verb")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08 16:41:38 -07:00
Leon Romanovsky 96f87ee181 RDMA: Clean structures from CONFIG_INFINIBAND_ON_DEMAND_PAGING
CONFIG_INFINIBAND_ON_DEMAND_PAGING is used in general structures to
micro-optimize the memory footprint. Remove it, so it will allow us to
simplify various ODP device flows.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08 16:41:38 -07:00
Gustavo A. R. Silva 7a7b0fea6f IB/srp: Use struct_size() in kzalloc()
One of the more common cases of allocation size calculations is finding the
size of a structure that has a zero-sized array at the end, along with memory
for some number of elements for that array. For example:

struct foo {
    int stuff;
    void *entry[];
};

instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08 15:46:59 -07:00
Luis Chamberlain 750afb08ca cross-tree: phase out dma_zalloc_coherent()
We already need to zero out memory for dma_alloc_coherent(), as such
using dma_zalloc_coherent() is superflous. Phase it out.

This change was generated with the following Coccinelle SmPL patch:

@ replace_dma_zalloc_coherent @
expression dev, size, data, handle, flags;
@@

-dma_zalloc_coherent(dev, size, handle, flags)
+dma_alloc_coherent(dev, size, handle, flags)

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
[hch: re-ran the script on the latest tree]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-01-08 07:58:37 -05:00
Leon Romanovsky a9666c1cae RDMA/nldev: Don't expose unsafe global rkey to regular user
Unsafe global rkey is considered dangerous because it exposes memory
registered for all memory in the system. Only users with a QP on the same
PD can use the rkey, and generally those QPs will already know the
value. However, out of caution, do not expose the value to unprivleged
users on the local system. Require CAP_NET_ADMIN instead.

Cc: <stable@vger.kernel.org> # 4.16
Fixes: 29cf1351d4 ("RDMA/nldev: provide detailed PD information")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07 13:35:57 -07:00
Lijun Ou 91fb4d83b8 RDMA/hns: Modify the pbl ba page size for hip08
Modify the pbl ba page size to 16K for in order to support 4G MR size.

Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07 12:14:42 -07:00
Lijun Ou 44754b95dd RDMA/hns: Add constraint on the setting of local ACK timeout
According to IB protocol, local ACK timeout shall be a 5 bit
value. Currently, hip08 could not support the possible max value 31. Fail
the request in this case.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07 12:13:34 -07:00
Lijun Ou 4d103905eb RDMA/hns: Bugfix for the scene without receiver queue
In some application scenario, the user could not have receive queue when
run rdma write or read operation.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07 12:12:50 -07:00
Lijun Ou 9c6ccc035c RDMA/hns: Fix the bug with updating rq head pointer when flush cqe
When flush cqe with srq, the driver disable to update the rq head pointer
into the hardware.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07 12:12:36 -07:00
Potnuri Bharat Teja e6b7b7d8a9 iw_cxgb4: Check for send WR also while posting write with completion WR
Inorder to optimize the NVMEoF read IOPs, iw_cxgb4 posts a FW Write with
Completion WQE that combines an RDMA Write WR and the subsequent RDMA Send
with Invalidate WR.

This patch is an extension to it, where it posts a Write with completion
for RDMA WRITE WR + RDMA SEND WR combination as well.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07 12:02:45 -07:00
Gustavo A. R. Silva 5aad26a7ea IB/core: Use struct_size() in kzalloc()
One of the more common cases of allocation size calculations is finding the
size of a structure that has a zero-sized array at the end, along with memory
for some number of elements for that array. For example:

struct foo {
    int stuff;
    void *entry[];
};

instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07 11:59:33 -07:00
Gustavo A. R. Silva 02fc184841 IB/usnic: Use struct_size() in kmalloc()
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    void *entry[];
};

instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07 11:43:03 -07:00
Gustavo A. R. Silva b5c61b968d IB/cm: Use struct_size() in kmalloc()
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    void *entry[];
};

instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07 11:43:03 -07:00
Gal Pressman f687ccea10 RDMA/uverbs: Fix post send success return value in case of error
If get QP object fails 'ret' must be assigned with a proper error code.

Fixes: 9a0738575f ("RDMA/uverbs: Use uverbs_response() for remaining response copying")
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07 11:42:22 -07:00
Linus Torvalds 3954e1d031 4.21 merge window 2nd pull request
Over the break a few defects were found, so this is a -rc style pull
 request of various small things that have been posted.
 
 - An attempt to shorten RCU grace period driven delays showed crashes
   during heavier testing, and has been entirely reverted
 
 - A missed merge/rebase error between the advise_mr and ib_device_ops
   series
 
 - Some small static analysis driven fixes from Julia and Aditya
 
 - Missed ability to create a XRC_INI in the devx verbs interop series
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlwu4TIACgkQOG33FX4g
 mxqPgw//XU7X2/AbXALQOvZgI6y/qs6BSzucGEkTEEMyJ5KvjS537yJqN7ltfe9d
 BiJLIpCUJ2NKqyUnbah7nHT06Mm7wZM+FkIxtf2N3te/MfYd5HwIvUdIwwmX+VEc
 k1DcRD1EfowZCSgBAVQAqqJu6oBW//Wi48BQ7HNGvyXVJJ/F+uKIM/Am6oGUTV/5
 69yo0ZfqP/+bRfbNvg7cHqWafCL8ed70pIqpoL67hRfHcxUW/TQVV6njw8FNB/MH
 DNL6pN3oncUweyOPDV/Z6Cx+De5BFF498Rbvosugk8OO62wQ780DTvTeA5AlEtxV
 TEjTtd7QqDhWRELzv4WtU9ojrOnp3bzEu36Ok7ANEGAW40WdAL//eWQiaJF423Az
 zcD3w/t9ZE2mIX9h7YcVnMpmDvGpyQorG4mFYPfZgXLVxgrY2phLwiZsOk3B6PY8
 cszL4mJFnk6DKB9/31nWgPpWl+V1/E48JODwU9Fz1d3ov+XvNC4SBp0hM6cfG25c
 insZevsAfMQ+k43Rw+iE62Sz9JTfJZpVekyMmIG5fqCZlzG4UXhB6On5r6TGvWc0
 cnbZ+ELmsZY54DyAloOAKvBUuVY/t8QYaFo3y69v0B5ZiVnY1I00r74FyGEo21Cv
 /uxKbUmQxW4T9rdgZtWtfsSKcuiGrRDLTcLJ5j19c6bqJyF3fao=
 =REsM
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Over the break a few defects were found, so this is a -rc style pull
  request of various small things that have been posted.

   - An attempt to shorten RCU grace period driven delays showed crashes
     during heavier testing, and has been entirely reverted

   - A missed merge/rebase error between the advise_mr and ib_device_ops
     series

   - Some small static analysis driven fixes from Julia and Aditya

   - Missed ability to create a XRC_INI in the devx verbs interop
     series"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  infiniband/qedr: Potential null ptr dereference of qp
  infiniband: bnxt_re: qplib: Check the return value of send_message
  IB/ipoib: drop useless LIST_HEAD
  IB/core: Add advise_mr to the list of known ops
  Revert "IB/mlx5: Fix long EEH recover time with NVMe offloads"
  IB/mlx5: Allow XRC INI usage via verbs in DEVX context
2019-01-05 18:20:51 -08:00
Linus Torvalds 96d4f267e4 Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.

It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access.  But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.

A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model.  And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.

This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.

There were a couple of notable cases:

 - csky still had the old "verify_area()" name as an alias.

 - the iter_iov code had magical hardcoded knowledge of the actual
   values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
   really used it)

 - microblaze used the type argument for a debug printout

but other than those oddities this should be a total no-op patch.

I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something.  Any missed conversion should be trivially fixable, though.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-03 18:57:57 -08:00
Aditya Pakki 9c6260de50 infiniband/qedr: Potential null ptr dereference of qp
idr_find() may fail and return a NULL pointer. The fix checks the return
value of the function and returns an error in case of NULL.

Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-02 16:08:38 -07:00
Aditya Pakki 94edd87a1c infiniband: bnxt_re: qplib: Check the return value of send_message
In bnxt_qplib_map_tc2cos(), bnxt_qplib_rcfw_send_message() can return an
error value but it is lost. Propagate this error to the callers.

Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Acked-By: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-02 16:04:24 -07:00
Julia Lawall 2fb458953a IB/ipoib: drop useless LIST_HEAD
Drop LIST_HEAD where the variable it declares is never used.

Commit 31c02e2157 ("IPoIB: Avoid using stale last_send counter
when reaping AHs") removed the uses, but not the declaration.

The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier x;
@@
- LIST_HEAD(x);
  ... when != x
// </smpl>

Fixes: 31c02e2157 ("IPoIB: Avoid using stale last_send counter when reaping AHs")
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-02 15:42:43 -07:00
Moni Shoua 2f1927b090 IB/core: Add advise_mr to the list of known ops
We need to add advise_mr to the list of operation setters on the ib_device
or otherwise callers to ib_set_device_ops() for advise_mr operation will
not have their callback registered.

When the advise_mr series was merged with the device ops series the
SET_DEVICE_OPS() was missed.

Fixes: 813e90b1ae ("IB/mlx5: Add advise_mr() support")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-02 09:40:34 -07:00
Leon Romanovsky ccffa54548 Revert "IB/mlx5: Fix long EEH recover time with NVMe offloads"
Longer term testing shows this patch didn't play well with MR cache and
caused to call traces during remove_mkeys().

This reverts commit bb7e22a8ab.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-02 09:40:34 -07:00
Yishai Hadas 7422edce73 IB/mlx5: Allow XRC INI usage via verbs in DEVX context
From device point of view both XRC target and initiator are XRC transport
type.

Fix to use the expected UID as was handled for the XRC target case to
allow its usage via verbs in DEVX context.

Fixes: 5aa3771ded ("IB/mlx5: Allow XRC usage via verbs in DEVX context")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-02 09:40:34 -07:00
Linus Torvalds f346b0becb Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:

 - large KASAN update to use arm's "software tag-based mode"

 - a few misc things

 - sh updates

 - ocfs2 updates

 - just about all of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (167 commits)
  kernel/fork.c: mark 'stack_vm_area' with __maybe_unused
  memcg, oom: notify on oom killer invocation from the charge path
  mm, swap: fix swapoff with KSM pages
  include/linux/gfp.h: fix typo
  mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm
  hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
  hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
  memory_hotplug: add missing newlines to debugging output
  mm: remove __hugepage_set_anon_rmap()
  include/linux/vmstat.h: remove unused page state adjustment macro
  mm/page_alloc.c: allow error injection
  mm: migrate: drop unused argument of migrate_page_move_mapping()
  blkdev: avoid migration stalls for blkdev pages
  mm: migrate: provide buffer_migrate_page_norefs()
  mm: migrate: move migrate_page_lock_buffers()
  mm: migrate: lock buffers before migrate_page_move_mapping()
  mm: migration: factor out code to compute expected number of page references
  mm, page_alloc: enable pcpu_drain with zone capability
  kmemleak: add config to select auto scan
  mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init
  ...
2018-12-28 16:55:46 -08:00
Linus Torvalds 5d24ae67a9 4.21 merge window pull request
This has been a fairly typical cycle, with the usual sorts of driver
 updates. Several series continue to come through which improve and
 modernize various parts of the core code, and we finally are starting to
 get the uAPI command interface cleaned up.
 
 - Various driver fixes for bnxt_re, cxgb3/4, hfi1, hns, i40iw, mlx4, mlx5,
   qib, rxe, usnic
 
 - Rework the entire syscall flow for uverbs to be able to run over
   ioctl(). Finally getting past the historic bad choice to use write()
   for command execution
 
 - More functional coverage with the mlx5 'devx' user API
 
 - Start of the HFI1 series for 'TID RDMA'
 
 - SRQ support in the hns driver
 
 - Support for new IBTA defined 2x lane widths
 
 - A big series to consolidate all the driver function pointers into
   a big struct and have drivers provide a 'static const' version of the
   struct instead of open coding initialization
 
 - New 'advise_mr' uAPI to control device caching/loading of page tables
 
 - Support for inline data in SRPT
 
 - Modernize how umad uses the driver core and creates cdev's and sysfs
   files
 
 - First steps toward removing 'uobject' from the view of the drivers
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlwhV2oACgkQOG33FX4g
 mxpF8A/9EkRCg6wCDC59maA53b5PjuNmD//9hXbycQPQSlxntI2PyYtxrzBqc0+2
 yIaFFMehL41XNN6y1zfkl7ndl62McCH2TpiidU8RyTxVw/e3KsDD5sU6++atfHRo
 M82RNfedDtxPG8TcCPKVLof6JHADApGSR1r4dCYfAnu7KFMyvlLmeYyx4r/2E6yC
 iQPmtKVOdbGkuWGeX+brGEA0vg7FUOAvaysnxddjyh9hyem4h0SUR3Af/Ik0N5ME
 PYzC+hMKbkPVBLoCWyg7QwUaqK37uWwguMQLtI2byF7FgbiK/lBQt6TsidR4Fw3p
 EalL7uqxgCTtLYh918vxLFjdYt6laka9j7xKCX8M8d06sy/Lo8iV4hWjiTESfMFG
 usqs7D6p09gA/y1KISji81j6BI7C92CPVK2drKIEnfyLgY5dBNFcv9m2H12lUCH2
 NGbfCNVaTQVX6bFWPpy2Bt2y/Litsfxw5RviehD7jlG0lQjsXGDkZzsDxrMSSlNU
 S79iiTJyK4kUZkXzrSSlN58pLBlbupJwm5MDjKmM+irsrsCHjGIULvc902qtnC3/
 8ImiTtW6XvqLbgWXyy2Th8/ZgRY234p1ybhog+DFaGKUch0XqB7VXTV2OZm0GjcN
 Fp4PUeBt+/gBgYqjpuffqQc1rI4uwXYSoz7wq9RBiOpw5zBFT1E=
 =T0p1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This has been a fairly typical cycle, with the usual sorts of driver
  updates. Several series continue to come through which improve and
  modernize various parts of the core code, and we finally are starting
  to get the uAPI command interface cleaned up.

   - Various driver fixes for bnxt_re, cxgb3/4, hfi1, hns, i40iw, mlx4,
     mlx5, qib, rxe, usnic

   - Rework the entire syscall flow for uverbs to be able to run over
     ioctl(). Finally getting past the historic bad choice to use
     write() for command execution

   - More functional coverage with the mlx5 'devx' user API

   - Start of the HFI1 series for 'TID RDMA'

   - SRQ support in the hns driver

   - Support for new IBTA defined 2x lane widths

   - A big series to consolidate all the driver function pointers into a
     big struct and have drivers provide a 'static const' version of the
     struct instead of open coding initialization

   - New 'advise_mr' uAPI to control device caching/loading of page
     tables

   - Support for inline data in SRPT

   - Modernize how umad uses the driver core and creates cdev's and
     sysfs files

   - First steps toward removing 'uobject' from the view of the drivers"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (193 commits)
  RDMA/srpt: Use kmem_cache_free() instead of kfree()
  RDMA/mlx5: Signedness bug in UVERBS_HANDLER()
  IB/uverbs: Signedness bug in UVERBS_HANDLER()
  IB/mlx5: Allocate the per-port Q counter shared when DEVX is supported
  IB/umad: Start using dev_groups of class
  IB/umad: Use class_groups and let core create class file
  IB/umad: Refactor code to use cdev_device_add()
  IB/umad: Avoid destroying device while it is accessed
  IB/umad: Simplify and avoid dynamic allocation of class
  IB/mlx5: Fix wrong error unwind
  IB/mlx4: Remove set but not used variable 'pd'
  RDMA/iwcm: Don't copy past the end of dev_name() string
  IB/mlx5: Fix long EEH recover time with NVMe offloads
  IB/mlx5: Simplify netdev unbinding
  IB/core: Move query port to ioctl
  RDMA/nldev: Expose port_cap_flags2
  IB/core: uverbs copy to struct or zero helper
  IB/rxe: Reuse code which sets port state
  IB/rxe: Make counters thread safe
  IB/mlx5: Use the correct commands for UMEM and UCTX allocation
  ...
2018-12-28 14:57:10 -08:00
Linus Torvalds 938edb8a31 SCSI misc on 20181224
This is mostly update of the usual drivers: smarpqi, lpfc, qedi,
 megaraid_sas, libsas, zfcp, mpt3sas, hisi_sas.  Additionally, we have
 a pile of annotation, unused variable and minor updates.  The big API
 change is the updates for Christoph's DMA rework which include
 removing the DISABLE_CLUSTERING flag.  And finally there are a couple
 of target tree updates.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXCEUNiYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishdjKAP9vrTTv
 qFaYmAoRSbPq9ZiixaXLMy0K/6o76Uay0gnBqgD/fgn3jg/KQ6alNaCjmfeV3wAj
 u1j3H7tha9j1it+4pUw=
 =GDa+
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This is mostly update of the usual drivers: smarpqi, lpfc, qedi,
  megaraid_sas, libsas, zfcp, mpt3sas, hisi_sas.

  Additionally, we have a pile of annotation, unused variable and minor
  updates.

  The big API change is the updates for Christoph's DMA rework which
  include removing the DISABLE_CLUSTERING flag.

  And finally there are a couple of target tree updates"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (259 commits)
  scsi: isci: request: mark expected switch fall-through
  scsi: isci: remote_node_context: mark expected switch fall-throughs
  scsi: isci: remote_device: Mark expected switch fall-throughs
  scsi: isci: phy: Mark expected switch fall-through
  scsi: iscsi: Capture iscsi debug messages using tracepoints
  scsi: myrb: Mark expected switch fall-throughs
  scsi: megaraid: fix out-of-bound array accesses
  scsi: mpt3sas: mpt3sas_scsih: Mark expected switch fall-through
  scsi: fcoe: remove set but not used variable 'port'
  scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown()
  scsi: smartpqi: fix build warnings
  scsi: smartpqi: update driver version
  scsi: smartpqi: add ofa support
  scsi: smartpqi: increase fw status register read timeout
  scsi: smartpqi: bump driver version
  scsi: smartpqi: add smp_utils support
  scsi: smartpqi: correct lun reset issues
  scsi: smartpqi: correct volume status
  scsi: smartpqi: do not offline disks for transient did no connect conditions
  scsi: smartpqi: allow for larger raid maps
  ...
2018-12-28 14:48:06 -08:00
Jérôme Glisse 5d6527a784 mm/mmu_notifier: use structure for invalidate_range_start/end callback
Patch series "mmu notifier contextual informations", v2.

This patchset adds contextual information, why an invalidation is
happening, to mmu notifier callback.  This is necessary for user of mmu
notifier that wish to maintains their own data structure without having to
add new fields to struct vm_area_struct (vma).

For instance device can have they own page table that mirror the process
address space.  When a vma is unmap (munmap() syscall) the device driver
can free the device page table for the range.

Today we do not have any information on why a mmu notifier call back is
happening and thus device driver have to assume that it is always an
munmap().  This is inefficient at it means that it needs to re-allocate
device page table on next page fault and rebuild the whole device driver
data structure for the range.

Other use case beside munmap() also exist, for instance it is pointless
for device driver to invalidate the device page table when the
invalidation is for the soft dirtyness tracking.  Or device driver can
optimize away mprotect() that change the page table permission access for
the range.

This patchset enables all this optimizations for device drivers.  I do not
include any of those in this series but another patchset I am posting will
leverage this.

The patchset is pretty simple from a code point of view.  The first two
patches consolidate all mmu notifier arguments into a struct so that it is
easier to add/change arguments.  The last patch adds the contextual
information (munmap, protection, soft dirty, clear, ...).

This patch (of 3):

To avoid having to change many callback definition everytime we want to
add a parameter use a structure to group all parameters for the
mmu_notifier invalidate_range_start/end callback.  No functional changes
with this patch.

[akpm@linux-foundation.org: fix drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c kerneldoc]
Link: http://lkml.kernel.org/r/20181205053628.3210-2-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>	[infiniband]
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:50 -08:00
Wei Yongjun f617e5ffe0 RDMA/srpt: Use kmem_cache_free() instead of kfree()
memory allocated by kmem_cache_alloc() should be freed using
kmem_cache_free(), not kfree().

Fixes: 5dabcd0456 ("RDMA/srpt: Add support for immediate data")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-22 16:07:47 -07:00
Dan Carpenter 58f7c0bfb4 RDMA/mlx5: Signedness bug in UVERBS_HANDLER()
The "num_actions" variable needs to be signed for the error handling to
work.  The maximum number of actions is less than 256 so int type is large
enough for that.

Fixes: cbfdd442c4 ("IB/uverbs: Add helper to get array size from ptr attribute")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-22 16:07:13 -07:00
Dan Carpenter 573671a5f6 IB/uverbs: Signedness bug in UVERBS_HANDLER()
The "num_sge" variable needs to be signed for the error handling to work.
The uverbs_attr_ptr_get_array_size() returns int so this change is safe.

Fixes: ad8a449675 ("IB/uverbs: Add support to advise_mr")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-22 16:07:13 -07:00
Yishai Hadas aa74be6eea IB/mlx5: Allocate the per-port Q counter shared when DEVX is supported
The per-port Q counter is some kernel resource and as such may be used by
few UID(s) upon DEVX usage.

To enable using it for QP/RQ when DEVX context is used need to allocate it
with a sharing mode indication to let firmware allows its usage.

The UID = 0xffff was chosen to mark it.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-21 12:15:07 -07:00
Parav Pandit 75bf8a2a2f IB/umad: Start using dev_groups of class
Start using core defined dev_groups of a class which allows to add device
attributes to the core kernel and simplify the umad module.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-21 11:39:41 -07:00
Parav Pandit cdb53b65ae IB/umad: Use class_groups and let core create class file
Use class->class_groups core kernel facility to create the abi version
file instead of open coding.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-21 11:39:40 -07:00
Parav Pandit e9dd5daf88 IB/umad: Refactor code to use cdev_device_add()
Refactor code to use cdev_device_add() and do other minor refactors while
modifying these functions as below.

1. Instead of returning generic -1, return an actual error for
   ib_umad_init_port().

2. Introduce and use ib_umad_init_port_dev() for sm and umad char devices.

3. Instead of kobj, use more light weight kref to refcount ib_umad_device.

4. Use modern cdev_device_add() single code cut down three steps of
   cdev_add(), device_create(). This further helps to move device sysfs
   files to class attributes in subsequent patch.

5. Remove few empty lines while refactoring these functions.

6. Use sizeof() instead of sizeof to avoid checkpatch warning.

7. Use struct_size() for calculation of ib_umad_port.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-21 11:39:38 -07:00
Parav Pandit cf7ad30302 IB/umad: Avoid destroying device while it is accessed
ib_umad_reg_agent2() and ib_umad_reg_agent() access the device name in
dev_notice(), while concurrently, ib_umad_kill_port() can destroy the
device using device_destroy().

        cpu-0                               cpu-1
        -----                               -----
    ib_umad_ioctl()
        [...]                            ib_umad_kill_port()
                                              device_destroy(dev)

        ib_umad_reg_agent()
            dev_notice(dev)

Therefore, first mark ib_dev as NULL, to block any further access in file
ops, unregister the mad agent and destroy the device at the end after
mutex is unlocked.

This ensures that device doesn't get destroyed, while it may get accessed.

Fixes: 0f29b46d49 ("IB/mad: add new ioctl to ABI to support new registration options")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-21 10:39:36 -07:00
Parav Pandit 900d07c12d IB/umad: Simplify and avoid dynamic allocation of class
Simplify code to have a static structure instance for umad class
allocation.

This will allow to have class attributes defined along with class
registration in subsequent patch and allows more class methods definition
similar to ib_core module.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-21 10:39:36 -07:00
Jason Gunthorpe 623d154305 IB/mlx5: Fix wrong error unwind
The destroy_workqueue on error unwind is missing, and the code jumps to
the wrong exit label.

Fixes: 813e90b1ae ("IB/mlx5: Add advise_mr() support")
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 20:49:31 -07:00
YueHaibing e7c4d8e604 IB/mlx4: Remove set but not used variable 'pd'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/infiniband/hw/mlx4/qp.c: In function '_mlx4_ib_destroy_qp':
drivers/infiniband/hw/mlx4/qp.c:1612:22: warning:
 variable 'pd' set but not used [-Wunused-but-set-variable]

Fixes: e00b64f7c5 ("RDMA: Cleanup undesired pd->uobject usage")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 20:47:42 -07:00
Steve Wise d53ec8af56 RDMA/iwcm: Don't copy past the end of dev_name() string
We now use dev_name(&ib_device->dev) instead of ib_device->name in iwpm
messages.  The name field in struct device is a const char *, where as
ib_device->name is a char array of size IB_DEVICE_NAME_MAX, and it is
pre-initialized to zeros.

Since iw_cm_map() was using memcpy() to copy in the device name, and
copying IWPM_DEVNAME_SIZE bytes, it ends up copying past the end of the
source device name string and copying random bytes.  This results in iwpmd
failing the REGISTER_PID request from iwcm.  Thus port mapping is broken.

Validate the device and if names, and use strncpy() to inialize the entire
message field.

Fixes: 896de0090a ("RDMA/core: Use dev_name instead of ibdev->name")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 20:45:56 -07:00
Huy Nguyen bb7e22a8ab IB/mlx5: Fix long EEH recover time with NVMe offloads
On NVMe offloads connection with many IO queues, EEH takes long time to
recover. The culprit is the synchronize_srcu in the destroy_mkey. The
solution is to use synchronize_srcu only for ODP mkey.

Fixes: b4cfe447d4 ("IB/mlx5: Implement on demand paging by adding support for MMU notifiers")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 16:28:47 -07:00
Or Gerlitz 842a9c837e IB/mlx5: Simplify netdev unbinding
When dealing with netdev unregister events, we just need to know that this
is our currently bounded netdev. There's no need to do any further
checks/queries.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 15:18:24 -07:00
Michael Guralnik 641d1207d2 IB/core: Move query port to ioctl
Add a method for query port under the uverbs global methods.  Current
ib_port_attr struct is passed as a single attribute and port_cap_flags2 is
added as a new attribute to the function.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 15:18:24 -07:00
Michael Guralnik 4fa2813d26 RDMA/nldev: Expose port_cap_flags2
port_cap_flags2 represents IBTA PortInfo:CapabilityMask2.

The field safely extends the RDMA_NLDEV_ATTR_CAP_FLAGS operand as it was
exported as 64 bit to allow this kind of extension.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 15:18:24 -07:00
Michael Guralnik 2e8039c656 IB/core: uverbs copy to struct or zero helper
Add a helper to zero fill fields before copying data to
UVERBS_ATTR_STRUCT.

As UVERBS_ATTR_STRUCT can be used as an extensible struct, we want to make
sure that if the user supplies us with a struct that has new fields that
we are not aware of, we return them zeroed to the user.

This helper should be used when using UVERBS_ATTR_STRUCT for an extendable
data structure and there is a need to make sure that extended members of
the struct, that the kernel doesn't handle, are returned zeroed to the
user. This is needed due to the fact that UVERBS_ATTR_STRUCT allows
non-zero values for members after 'last' member.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 15:18:18 -07:00
Yuval Shaia f55c3ec42a IB/rxe: Reuse code which sets port state
Same code is executed in both rxe_param_set_add and rxe_notify functions.
Make one function and call it from both places.

Since both callers already have a rxe object use it directly instead of
deriving it from the net device.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com> 
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 14:20:58 -07:00
Parav Pandit d5108e69fe IB/rxe: Make counters thread safe
Current rxe device counters are not thread safe.
When multiple QPs are used, they can be racy.
Make them thread safe by making it atomic64.

Fixes: 0b1e5b99a4 ("IB/rxe: Add port protocol stats")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 14:09:45 -07:00
Yishai Hadas 6e3722baac IB/mlx5: Use the correct commands for UMEM and UCTX allocation
During testing the command format was changed to close a security
hole. Revise the driver to use the command format that will actually be
supported in GA firmware.

Both the UMEM and UCTX are intended only for use by the kernel and cannot
be executed using a general command.

Since the UMEM and CTX are not part of the general object the caps bits
were moved to be some log_xxx location in the general HCA caps.

The firmware code was adapted as well to match the above.

Fixes: a8b92ca1b0 ("IB/mlx5: Introduce DEVX")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 13:49:48 -07:00
Yishai Hadas 425518cc5e IB/mlx5: Use uid as part of alloc/dealloc transport domain
Use uid as part of alloc/dealloc transport domain to let firmware manages
the resources correctly.

Fixes: d2d19121ae ("IB/mlx5: Set uid as part of TD commands")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 13:39:41 -07:00
Jason Gunthorpe ed50edfb72 Merge branch 'mlx5-next' into rdma.git
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

mlx5 updates taken for dependencies on following patches.

* branche 'mlx5-next': (23 commits)
  IB/mlx5: Introduce uid as part of alloc/dealloc transport domain
  net/mlx5: Add shared Q counter bits
  net/mlx5: Continue driver initialization despite debugfs failure
  net/mlx5: Fold the modify lag code into function
  net/mlx5: Add lag affinity info to log
  net/mlx5: Split the activate lag function into two routines
  net/mlx5: E-Switch, Introduce flow counter affinity
  IB/mlx5: Unify e-switch representors load approach between uplink and VFs
  net/mlx5: Use lowercase 'X' for hex values
  net/mlx5: Remove duplicated include from eswitch.c
  net/mlx5: Remove the get protocol device interface entry
  net/mlx5: Support extended destination format in flow steering command
  net/mlx5: E-Switch, Change vhca id valid bool field to bit flag
  net/mlx5: Introduce extended destination fields
  net/mlx5: Revise gre and nvgre key formats
  net/mlx5: Add monitor commands layout and event data
  net/mlx5: Add support for plugged-disabled cable status in PME
  net/mlx5: Add support for PCIe power slot exceeded error in PME
  net/mlx5: Rework handling of port module events
  net/mlx5: Move flow counters data structures from flow steering header
  ...
2018-12-20 13:24:50 -07:00
David S. Miller 2be09de7d6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Lots of conflicts, by happily all cases of overlapping
changes, parallel adds, things of that nature.

Thanks to Stephen Rothwell, Saeed Mahameed, and others
for their guidance in these resolutions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 11:53:36 -08:00
Devesh Sharma bd1c24ccf9 RDMA/bnxt_re: Increase depth of control path command queue
Increasing the depth of control path command queue to 8K entries to handle
burst of commands. This feature needs support from FW and the driver/fw
compatibility is checked from the interface version number.

Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-19 16:37:33 -07:00
Selvin Xavier 2b827ea192 RDMA/bnxt_re: Query HWRM Interface version from FW
Get HWRM interface major, minor, build and patch version from FW for
checking the FW/Driver compatibility.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-19 16:37:32 -07:00
Parvi Kaustubhi 8036e90f92 IB/usnic: Fix potential deadlock
Acquiring the rtnl lock while holding usdev_lock could result in a
deadlock.

For example:

usnic_ib_query_port()
| mutex_lock(&us_ibdev->usdev_lock)
 | ib_get_eth_speed()
  | rtnl_lock()

rtnl_lock()
| usnic_ib_netdevice_event()
 | mutex_lock(&us_ibdev->usdev_lock)

This commit moves the usdev_lock acquisition after the rtnl lock has been
released.

This is safe to do because usdev_lock is not protecting anything being
accessed in ib_get_eth_speed(). Hence, the correct order of holding locks
(rtnl -> usdev_lock) is not violated.

Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-19 16:30:16 -07:00
Gal Pressman 50c582de1d RDMA/bnxt_re: Make use of destroy AH sleepable flag
When in a sleepable (non-atomic) context, wait for firmware completion
instead of polling for it.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-19 16:28:04 -07:00
Gal Pressman 90e3edd8cc RDMA/bnxt_re: Make use of create AH sleepable flag
When in a sleepable (non-atomic) context, wait for firmware completion
instead of polling for it.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-19 16:28:03 -07:00
Gal Pressman 2553ba217e RDMA: Mark if destroy address handle is in a sleepable context
Introduce a 'flags' field to destroy address handle callback and add a
flag that marks whether the callback is executed in an atomic context or
not.

This will allow drivers to wait for completion instead of polling for it
when it is allowed.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-19 16:28:03 -07:00
Gal Pressman b090c4e3a0 RDMA: Mark if create address handle is in a sleepable context
Introduce a 'flags' field to create address handle callback and add a flag
that marks whether the callback is executed in an atomic context or not.

This will allow drivers to wait for completion instead of polling for it
when it is allowed.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-19 16:17:19 -07:00
Bart Van Assche 5dabcd0456 RDMA/srpt: Add support for immediate data
Modify allocation of the non-SRQ receive queues such that immediate
data is aligned on a 512 byte boundary. That alignment is necessary
to pass the immediate data without copying to the block layer. When
receiving an SRP_CMD with immediate data, postpone the ib_post_recv()
call until target_execute_cmd() has finished. See also
srpt_release_cmd().

Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-19 15:07:26 -05:00
Bart Van Assche 82305f8235 RDMA/srpt: Rework the srpt_alloc_srq() error path
This patch does not change any functionality but makes the next patch
easier to read.

Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-19 15:07:26 -05:00
Bart Van Assche 6feb64ffda RDMA/srpt: Remove driver version and release date
Neither a driver version number nor a release data is useful in
an upstream driver. Remove the word "InfiniBand" from the driver
description because recently RoCE support has been added to this
driver.

Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-19 15:07:26 -05:00
Bart Van Assche c4bbe911c2 RDMA/srpt: Make kernel-doc headers complete
Add documentation for those structure members for which it is missing.

Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-19 15:07:26 -05:00
Bart Van Assche 75d79b801c RDMA/srpt: Join split strings
Make sure that long strings occur on a single line as required by the
coding standard.

Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-19 15:07:26 -05:00
Bart Van Assche ffd5980695 RDMA/srpt: Improve coding style conformance
Use tabs instead of spaces for indentation. Make sure that multi-line
expressions have the operator at the end of a line instead of the start.
Avoid a complaint about a missing space in a ternary expression by
changing '(boolean) ? 1: 0' into 'boolean'.

Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-19 15:07:26 -05:00
Bart Van Assche ed041919f0 RDMA/srpt: Fix a use-after-free in the channel release code
This patch avoids that KASAN sporadically reports the following:

BUG: KASAN: use-after-free in rxe_run_task+0x1e/0x60 [rdma_rxe]
Read of size 1 at addr ffff88801c50d8f4 by task check/24830

CPU: 4 PID: 24830 Comm: check Not tainted 4.20.0-rc6-dbg+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
 dump_stack+0x86/0xca
 print_address_description+0x71/0x239
 kasan_report.cold.5+0x242/0x301
 __asan_load1+0x47/0x50
 rxe_run_task+0x1e/0x60 [rdma_rxe]
 rxe_post_send+0x4bd/0x8d0 [rdma_rxe]
 srpt_zerolength_write+0xe1/0x160 [ib_srpt]
 srpt_close_ch+0x8b/0xe0 [ib_srpt]
 srpt_set_enabled+0xe7/0x150 [ib_srpt]
 srpt_tpg_enable_store+0xc0/0x100 [ib_srpt]
 configfs_write_file+0x157/0x1d0
 __vfs_write+0xd7/0x3d0
 vfs_write+0x102/0x290
 ksys_write+0xab/0x130
 __x64_sys_write+0x43/0x50
 do_syscall_64+0x71/0x210
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Allocated by task 13856:
 save_stack+0x43/0xd0
 kasan_kmalloc+0xc7/0xe0
 kasan_slab_alloc+0x11/0x20
 kmem_cache_alloc+0x105/0x320
 rxe_alloc+0xff/0x1f0 [rdma_rxe]
 rxe_create_qp+0x9f/0x160 [rdma_rxe]
 ib_create_qp+0xf5/0x690 [ib_core]
 rdma_create_qp+0x6a/0x140 [rdma_cm]
 srpt_cm_req_recv.cold.59+0x1588/0x237b [ib_srpt]
 srpt_rdma_cm_req_recv.isra.35+0x1d5/0x220 [ib_srpt]
 srpt_rdma_cm_handler+0x6f/0x100 [ib_srpt]
 cma_listen_handler+0x59/0x60 [rdma_cm]
 cma_ib_req_handler+0xd5b/0x2570 [rdma_cm]
 cm_process_work+0x2e/0x110 [ib_cm]
 cm_work_handler+0x2aae/0x502b [ib_cm]
 process_one_work+0x481/0x9e0
 worker_thread+0x67/0x5b0
 kthread+0x1cf/0x1f0
 ret_from_fork+0x24/0x30

Freed by task 3440:
 save_stack+0x43/0xd0
 __kasan_slab_free+0x139/0x190
 kasan_slab_free+0xe/0x10
 kmem_cache_free+0xbc/0x330
 rxe_elem_release+0x66/0xe0 [rdma_rxe]
 rxe_destroy_qp+0x3f/0x50 [rdma_rxe]
 ib_destroy_qp+0x140/0x360 [ib_core]
 srpt_release_channel_work+0xdc/0x310 [ib_srpt]
 process_one_work+0x481/0x9e0
 worker_thread+0x67/0x5b0
 kthread+0x1cf/0x1f0
 ret_from_fork+0x24/0x30

Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-19 15:07:25 -05:00
Bart Van Assche 882981f4a4 RDMA/srp: Add support for immediate data
Request permission to send immediate data during login. If the SRP
target grants this request, send the payload of write requests <= 8 KB
as immediate data.

Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-19 15:07:25 -05:00
Bart Van Assche 513d564711 RDMA/srp: Rework handling of the maximum information unit length
Move the maximum initiator to target information unit length parameter
from struct srp_target_port into struct srp_rdma_ch. This patch does
not change any functionality but makes the next patch easier to read.

Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-19 15:07:25 -05:00
Bart Van Assche 4f6d498c36 RDMA/srp: Move srp_rdma_ch.max_ti_iu_len declaration
Since srp_rdma_ch.max_ti_iu_len is used in the hot path, move it to the
section with data structure members used in the hot path.

Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-19 15:07:25 -05:00
Bart Van Assche 2ee00f6a98 RDMA/srp: Propagate ib_post_send() failures to the SCSI mid-layer
This patch avoids that the SCSI mid-layer keeps retrying forever if
ib_post_send() fails. This was discovered while testing immediate
data support and passing a too large num_sge value to ib_post_send().

Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-19 15:07:25 -05:00
Bart Van Assche 482fffc43c RDMA/srp: Handle large SCSI CDBs correctly
Reserve additional space for CDBs that contain more than sixteen bytes
and set the add_cdb_len field for such CDBs as required. From the SRP
standard: "The ADDITIONAL CDB LENGTH field contains the length in
dwords of the ADDITIONAL CDB field."

Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-19 15:07:25 -05:00
Bart Van Assche e37df2d5b5 RDMA/srp: Document srp_parse_in() arguments
This patch avoids that a warning is reported when building with W=1.

Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-19 15:07:25 -05:00
Bart Van Assche 16d14e01b7 include/scsi/srp.h: Add support for immediate data
Add constants and data structures to support immediate data. These
changes conform to SRP2r04.

Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-19 15:07:25 -05:00
Bart Van Assche feafa20433 include/scsi/srp.h: Move response flag definitions into this file
This patch moves all constants that come from the SRP standard into the
include/scsi/srp.h header file.

Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-19 15:07:25 -05:00
Doug Ledford c9e585ebdc IB/mlx5: Fix compile issue when ODP disabled
When CONFIG_INFINIBAND_ON_DEMAND_PAGING is not enabled, we were getting
build failures for defined but not used code.  Fix that.

Fixes: 813e90b1ae ("IB/mlx5: Add advise_mr() support")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-19 13:43:17 -05:00
Christoph Hellwig 2a3d4eb8e2 scsi: flip the default on use_clustering
Most SCSI drivers want to enable "clustering", that is merging of
segments so that they might span more than a single page.  Remove the
ENABLE_CLUSTERING define, and require drivers to explicitly set
DISABLE_CLUSTERING to disable this feature.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-12-18 23:13:12 -05:00
Shamir Rabinovitch e00b64f7c5 RDMA: Cleanup undesired pd->uobject usage
Drivers should be using udata to determine if a method is invoked from
user space or kernel space. A pd does not necessarily say a different
objects is kernel or user.

Transforming the tests to use udata eliminates a large number of uobject
references from the drivers.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18 19:15:48 -07:00
Shamir Rabinovitch af8d70375d RDMA/restrack: Resource-tracker should not use uobject pointers
Having uobject pointer embedded in ib core objects is not aligned with a
future shared ib_x model. The resource tracker only does this to keep
track of user/kernel objects - track this directly instead.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18 15:38:26 -07:00
Moni Shoua 813e90b1ae IB/mlx5: Add advise_mr() support
The verb advise_mr() is used to give advice to the kernel about an address
range that belongs to a MR.  Implement the verb and register it on the
device. The current implementation supports the only known advice to date,
prefetch.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18 15:26:19 -07:00
Moni Shoua ad8a449675 IB/uverbs: Add support to advise_mr
Add new ioctl method for the MR object - ADVISE_MR.

This command can be used by users to give an advice or directions to the
kernel about an address range that belongs to memory regions.

A new ib_device callback, advise_mr(), is introduced here to suupport the
new command. This command takes the following arguments:

- pd:		The protection domain to which all memory regions belong
- advice: 	The type of the advice
	  	* IB_UVERBS_ADVISE_MR_ADVICE_PREFETCH - Pre-fetch a range of
		an on-demand paging MR
	  	* IB_UVERBS_ADVISE_MR_ADVICE_PREFETCH_WRITE - Pre-fetch a range
		of an on-demand paging MR with write intention
- flags:	The properties of the advice
		* IB_UVERBS_ADVISE_MR_FLAG_FLUSH - Operation must end before
		return to the caller
- sg_list:	The list of memory ranges
- num_sge:	The number of memory ranges in the list
- attrs:	More attributes to be parsed by the provider

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18 15:19:46 -07:00
Moni Shoua cbfdd442c4 IB/uverbs: Add helper to get array size from ptr attribute
When the parser of an ioctl command has the knowledge that a ptr attribute
in a bundle represents an array of structures, it is useful for it to know
the number of elements in the array. This is done by dividing the
attribute length with the element size.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18 15:19:45 -07:00
Parav Pandit bbc13cda37 RDMA/uverbs: Add an ioctl method to destroy an object
Add an ioctl method to destroy the PD, MR, MW, AH, flow, RWQ indirection
table and XRCD objects by handle which doesn't require any output response
during destruction.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18 14:59:57 -07:00
Jason Gunthorpe 149d3845f4 RDMA/uverbs: Add a method to introspect handles in a context
Introduce a helper function gather_objects_handle() to copy object handles
under a spin lock.

Expose these objects handles via the uverbs ioctl interface.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-18 14:54:54 -07:00
Yuval Shaia 350b4c8ac1 IB/mlx4: Utilize macro to calculate SQ spare size
The macro MLX4_IB_SQ_HEADROOM calculates the spare room needed to be
left. Use it instead of hard-coding the HW prefetch size.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18 14:24:34 -07:00
Dan Carpenter e9dfa53a39 RDMA/hns: Fix an error code in hns_roce_create_srq()
The function accidentally returns success on this error path.

Fixes: c7bcb13442 ("RDMA/hns: Add SRQ support for hip08 kernel mode")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18 14:21:45 -07:00
Dan Carpenter 5050ae5fa3 IB/qib: Fix an error code in qib_sdma_verbs_send()
We accidentally return success on this error path.

Fixes: f931551baf ("IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18 14:20:41 -07:00
Parav Pandit be5914c124 RDMA/core: Delete RoCE GID in hw when corresponding IP is deleted
Currently a RoCE GID entry is removed from the hardware when all
references to the GID entry drop to zero. This is a change in behavior
from before the fixed patch. The GID entry should be removed from the
hardware when GID entry deletion is requested. This allows the driver
terminate ongoing traffic through the RoCE GID.

While a GID is deleted from the hardware, GID slot in the software GID
cache is not freed. GID slot is freed once all references of such GID are
dropped. This continue to ensure that such GID slot of hardware is not
allocated to new GID entry allocation request. It is allocated once all
references to GID entry drop.

This approach allows drivers to put a tombestone of some kind on the HW
GID index to block the traffic.

Fixes: b150c3862d ("IB/core: Introduce GID entry reference counts")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18 14:16:44 -07:00
Gal Pressman ac2f7e623d RDMA/mlx5: Fix function name typo 'fileds' -> 'fields'
Fix typo in 'set_mr_fileds' -> 'set_mr_fields'.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18 14:15:24 -07:00
Kamal Heib b81a327dbc RDMA/i40iw: Make sure to initialize ib_device_ops
The initialization of the ib_device_ops was dropped by mistake when
rebasing the ib_device_ops series, this patch fixes that.

Fixes: 15644f57cb ("RDMA/i40iw: Initialize ib_device_ops struct")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18 14:12:26 -07:00
Leon Romanovsky 8e3b688301 RDMA/mlx5: Delete unreachable handle_atomic code by simplifying SW completion
Handle atomic was left as unimplemented from 2013, remove the code till
this part will be developed.

Remove the dead code by simplifying SW completion logic which is supposed
to be the same for send and receive paths.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> # compile tested
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18 13:59:34 -07:00
Jason Gunthorpe 4785860e04 RDMA/uverbs: Implement an ioctl that can call write and write_ex handlers
Now that the handlers do not process their own udata we can make a
sensible ioctl that wrappers them. The ioctl follows the same format as
the write_ex() and has the user explicitly specify the core and driver
in/out opaque structures and a command number.

This works for all forms of write commands.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-18 14:12:48 -05:00
Aviv Heller 7c34ec19e1 net/mlx5: Make RoCE and SR-IOV LAG modes explicit
With the introduction of SR-IOV LAG, checking whether LAG is active
is no longer good enough, since RoCE and SR-IOV LAG each entails
different behavior by both the core and infiniband drivers.

This patch introduces facilities to discern LAG type, in addition to
mlx5_lag_is_active(). These are implemented in such a way as to allow
more complex mode combinations in the future.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:55 -08:00
Saeed Mahameed 64e4cf0dab Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
mlx5-next shared branch with rdma subtree to avoid mlx5 rdma v.s. netdev
conflicts.

Highlights:
1) Lag refactroing and flow counter affinity bits.
2) mlx5 core cleanups

By Roi Dayan (2) and others
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Fold the modify lag code into function
  net/mlx5: Add lag affinity info to log
  net/mlx5: Split the activate lag function into two routines
  net/mlx5: E-Switch, Introduce flow counter affinity
  IB/mlx5: Unify e-switch representors load approach between uplink and VFs
  net/mlx5: Use lowercase 'X' for hex values
  net/mlx5: Remove duplicated include from eswitch.c

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 11:15:25 -08:00
Mark Bloch 06cc74af05 IB/mlx5: Unify e-switch representors load approach between uplink and VFs
When in switchdev mode and the add function is called by the core
level driver, make sure we only register the callbacks, but don't
create the mlx5 IB device or initialize anything. With this change
all the IB devices in switchdev mode are created only once the load
callback is invoked by the e-switch core sub-module. This follows
the design paradigm under which the all the Eth representors must
be loaded before any of IB reprs is loaded.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 09:58:57 -08:00
Mark Zhang 37fbd834b4 IB/core: Fix oops in netdev_next_upper_dev_rcu()
When support for bonding of RoCE devices was added, there was
necessarily a link between the RoCE device and the paired netdevice that
was part of the bond.  If you remove the mlx4_en module, that paired
association is broken (the RoCE device is still present but the paired
netdevice has been released).  We need to account for this in
is_upper_ndev_bond_master_filter() and filter out those links with a
broken pairing or else we later oops in netdev_next_upper_dev_rcu().

Fixes: 408f1242d9 ("IB/core: Delete lower netdevice default GID entries in bonding scenario")
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-12 12:14:49 -05:00
Kamal Heib 3023a1e936 RDMA: Start use ib_device_ops
Make all the required change to start use the ib_device_ops structure.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-12 07:40:16 -07:00
Kamal Heib 02a42f8e40 RDMA/rdmavt: Initialize ib_device_ops struct
Initialize ib_device_ops with the supported operations using
ib_set_device_ops() and remove the use of check_driver_override().

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-12 07:40:15 -07:00
Kamal Heib 573efc4b3c RDMA/rxe: Initialize ib_device_ops struct
Initialize ib_device_ops with the supported operations using
ib_set_device_ops().

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-12 07:40:15 -07:00
Kamal Heib 20a6b58861 RDMA/vmw_pvrdma: Initialize ib_device_ops struct
Initialize ib_device_ops with the supported operations using
ib_set_device_ops().

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-12 07:40:15 -07:00
Kamal Heib e761058190 RDMA/usnic: Initialize ib_device_ops struct
Initialize ib_device_ops with the supported operations using
ib_set_device_ops().

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-12 07:40:15 -07:00
Kamal Heib 16b0ba9571 RDMA/qib: Initialize ib_device_ops struct
Initialize ib_device_ops with the supported operations using
ib_set_device_ops().

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-12 07:40:14 -07:00
Kamal Heib bd59461e57 RDMA/qedr: Initialize ib_device_ops struct
Initialize ib_device_ops with the supported operations using
ib_set_device_ops().

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-12 07:40:14 -07:00
Kamal Heib a263c1241a RDMA/ocrdma: Initialize ib_device_ops struct
Initialize ib_device_ops with the supported operations using
ib_set_device_ops().

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-12 07:40:14 -07:00
Kamal Heib 5a6c6e71ac RDMA/nes: Initialize ib_device_ops struct
Initialize ib_device_ops with the supported operations using
ib_set_device_ops().

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-12 07:40:14 -07:00
Kamal Heib 56e2a43136 RDMA/mthca: Initialize ib_device_ops struct
Initialize ib_device_ops with the supported operations using
ib_set_device_ops().

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-12 07:40:13 -07:00
Kamal Heib 96458233ee RDMA/mlx5: Initialize ib_device_ops struct
Initialize ib_device_ops with the supported operations using
ib_set_device_ops().

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-12 07:40:13 -07:00
Kamal Heib 4725c4ba8d RDMA/mlx4: Initialize ib_device_ops struct
Initialize ib_device_ops with the supported operations using
ib_set_device_ops().

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 15:15:08 -07:00
Kamal Heib 15644f57cb RDMA/i40iw: Initialize ib_device_ops struct
Initialize ib_device_ops with the supported operations using
ib_set_device_ops().

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 15:15:08 -07:00
Kamal Heib 7f645a58d0 RDMA/hns: Initialize ib_device_ops struct
Initialize ib_device_ops with the supported operations using
ib_set_device_ops().

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 15:15:08 -07:00
Kamal Heib e3c320caa1 RDMA/hfi1: Initialize ib_device_ops struct
Initialize ib_device_ops with the supported operations using
ib_set_device_ops().

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 15:15:07 -07:00
Kamal Heib dad3b05d05 RDMA/cxgb4: Initialize ib_device_ops struct
Initialize ib_device_ops with the supported operations using
ib_set_device_ops().

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 15:15:07 -07:00
Kamal Heib 071b2ca40a RDMA/cxgb3: Initialize ib_device_ops struct
Initialize ib_device_ops with the supported operations using
ib_set_device_ops().

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 15:15:07 -07:00
Kamal Heib 9615f86be9 RDMA/bnxt_re: Initialize ib_device_ops struct
Initialize ib_device_ops with the supported operations using
ib_set_device_ops().

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 15:15:07 -07:00
Kamal Heib 521ed0d92a RDMA/core: Introduce ib_device_ops
This change introduces the ib_device_ops structure that defines all the
InfiniBand device operations in one place, so the code will be more
readable and clean, unlike today when the ops are mixed with ib_device
data members.

The providers will need to define the supported operations and assign them
using ib_set_device_ops(), that will also make the providers code more
readable and clean.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 15:14:09 -07:00
Yuval Shaia 2dd8e44cb4 IB/mlx4: Remove unneeded NULL check
NULL check for kfree is unnecessary, remove it.

Fixes: b42dde478b ("IB/mlx4: Rework special QP creation error path")
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 14:38:18 -07:00
Leon Romanovsky 8cc0698f46 RDMA/ocrdma: Use PCI-ID as an identification in debugfs
In opposite to current implementation of identification based on device
name, PCI-ID identification provides stable names suitable for device
rename. Change ocrdma debugfs representation to use PCI-ID.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 14:38:17 -07:00
Leon Romanovsky 9435ef4cae RDMA/uverbs: Optimize clearing of extra bytes in response
Clear extra bytes in response in batch manner instead
of doing it per-byte.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 14:38:17 -07:00
Gal Pressman a276a4d93b RDMA/vmw_pvrdma: Use atomic memory allocation in create AH
Create address handle callback should not sleep, use GFP_ATOMIC instead of
GFP_KERNEL for memory allocation.

Fixes: 29c8d9eba5 ("IB: Add vmw_pvrdma driver")
Cc: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 14:38:17 -07:00
Yuval Shaia 59590b8ad2 IB/{mlx5,ocrdma,qedr,rxe}: Omit port validation from IB verbs
RDMA core layer already make sure port is valid, no need to check it here
again.

For the pkey validation this depends on commit b3ac5742fead ("RDMA/core:
Validate port number in query_pkey verb")

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 14:38:16 -07:00
Leon Romanovsky a1462351b5 RDMA/mlx5: Fail early if user tries to create flows on IB representors
IB representors don't support creation of RAW ethernet QP flows.  Disable
them by reusing existing RDMA/core support macros.  We do it for both
creation and matcher because latter is not usable if no flow creation is
available.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 14:38:16 -07:00
YueHaibing f94e02ddfd IB/mlx5: Remove duplicated include from mlx5_ib.h
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 14:38:16 -07:00
Michael Guralnik d764970bce IB/mlx5: Add 2X width support to query_port
Report to the user 2x width over MAD interface.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 14:38:16 -07:00
Jason Gunthorpe 28ab1bb0e8 Linux 4.20-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlwNpb0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGwGwH/00UHnXfxww3ixxz
 zwTVDzptA6SPm6s84yJOWatM5fXhPiAltZaHSYF9lzRzNU71NCq7Frhq3fQUIXKM
 OxqDn9nfSTWcjWTk2q5n2keyRV/KIn67YX7UgqFc1bO/mqtVjEgNWaMyblhI+e9E
 giu1ZXayHr43jK1cDOmGExZubXUq7Vsc9TOlrd+d2SwIqeEP7TCMrPhnHDwCNvX2
 UU5dtANpVzGtHaBcr37wJj+L8kODCc0f+PQ3g2ar5jTHst5SLlHp2u0AMRnUmgdi
 VkGx+mu/uk8mtwUqMIMqhplklVoqK6LTeLqsY5Xt32SKruw9UqyJGdphLjW2QP/g
 MkmA1lI=
 =7kaD
 -----END PGP SIGNATURE-----

Merge tag 'v4.20-rc6' into rdma.git for-next

For dependencies in following patches.
2018-12-11 14:24:57 -07:00
Michael Guralnik b874155a5f IB/mlx5: Add HDR speed support to query port
Report HDR speed when HDR is supported in CapabilityMask2 and the actual
speed is HDR.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 13:22:46 -07:00
Michael Guralnik 4106a758f7 IB/mlx5: Report CapabilityMask2 in ib_query_port
CapabilityMask2 exists when IB_PORT_CAP_MASK2_SUP is set in the original
capability mask. In such cases, query its value and report it in query
port.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 13:22:45 -07:00
Michael Guralnik a5a5d19936 IB/core: Add new IB rates
Add the new rates that were added to Infiniband spec as part of HDR and 2x
support.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 13:22:45 -07:00
Yuval Shaia 6db21d8986 IB/rxe: Fix incorrect cache cleanup in error flow
Array iterator stays at the same slot, fix it.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 12:26:35 -07:00
Lijun Ou 0c1c388044 RDMA/hns: Bugfix for RoCE loopback test
This patch implements a cmdq to enable the loopback of ssu module
according to the modified hardware desgin.

The ssu consists of ingress unit, packet buffer and programmable packet
process unit. if the loopback bit of ssu is not enabled, the roce packet
with loopback bit will fail.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 12:04:06 -07:00
Lijun Ou f747b68945 RDMA/hns: Update posting & querying mailbox
This patch updates the implementation of the mailbox command interface by
using command queue instead of operating registers.  With this update, the
software can be well decoupled with the hardware.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 12:04:06 -07:00
Lijun Ou 4af07f01f7 RDMA/hns: Fix the bug while use multi-hop of pbl
It will prevent multiply overflow when defines the pbl for u64 type.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 12:04:06 -07:00
Lijun Ou 233673e422 RDMA/hns: Encapsulate and simplify qp state transition
This patch move the codes of qp state transition into the new function as
well as simplify the logic for other qp states transition.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 12:04:06 -07:00
Lijun Ou 9f50710103 RDMA/hns: Init qp context when modify qp from reset to init
It needs to clear qp context previous when init qp context.  Otherwise,
the newly created qp context residue has the contents of the qp context
before the uninstall, and the qp context content is disordered, causing
the task to fail.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 12:04:06 -07:00
Saeed Mahameed 2f62747c77 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
mlx5-next shared branch with rdma subtree to avoid mlx5 rdma v.s. netdev
conflicts.

Highlights:

1) RDMA ODP  (On Demand Paging) improvements and moving ODP logic to
mlx5 RDMA driver
2) Improved mlx5 core driver and device events handling and provided API
for upper layers to subscribe to device events.
3) RDMA only code cleanup from mlx5 core
4) Add helper to get CQE opcode
5) Rework handling of port module events
6) shared mlx5_ifc.h updates to avoid conflicts

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-10 15:50:50 -08:00
Oz Shlomo 5886a96ad1 net/mlx5: Revise gre and nvgre key formats
GRE RFC defines a 32 bit key field. NVGRE RFC splits the 32 bit
key field to 24 bit VSID (gre_key_h) and 8 bit flow entropy (gre_key_l).

Define the two key parsing alternatives in a union, thus enabling both
access methods.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-10 14:00:08 -08:00
David S. Miller 4cc1feeb6f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several conflicts, seemingly all over the place.

I used Stephen Rothwell's sample resolutions for many of these, if not
just to double check my own work, so definitely the credit largely
goes to him.

The NFP conflict consisted of a bug fix (moving operations
past the rhashtable operation) while chaning the initial
argument in the function call in the moved code.

The net/dsa/master.c conflict had to do with a bug fix intermixing of
making dsa_master_set_mtu() static with the fixing of the tagging
attribute location.

cls_flower had a conflict because the dup reject fix from Or
overlapped with the addition of port range classifiction.

__set_phy_supported()'s conflict was relatively easy to resolve
because Andrew fixed it in both trees, so it was just a matter
of taking the net-next copy.  Or at least I think it was :-)

Joe Stringer's fix to the handling of netns id 0 in bpf_sk_lookup()
intermixed with changes on how the sdif and caller_net are calculated
in these code paths in net-next.

The remaining BPF conflicts were largely about the addition of the
__bpf_md_ptr stuff in 'net' overlapping with adjustments and additions
to the relevant data structure where the MD pointer macros are used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-09 21:43:31 -08:00
Tariq Toukan bdefffd13b IB/mlx5: Use helper to get CQE opcode
Use the new helper that extracts the opcode
from a CQE (completion queue entry) structure.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-09 18:16:16 -08:00
Yuval Shaia 9af3f5cf9d RDMA/core: Validate port number in query_pkey verb
Before calling the driver's function let's make sure port is valid.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-07 14:33:09 -07:00
Jason Gunthorpe fe15bcc6e2 Merge branch 'mlx5-packet-credit-fc' into rdma.git
Danit Goldberg says:

Packet based credit mode

Packet based credit mode is an alternative end-to-end credit mode for QPs
set during their creation. Credits are transported from the responder to
the requester to optimize the use of its receive resources.  In
packet-based credit mode, credits are issued on a per packet basis.

The advantage of this feature comes while sending large RDMA messages
through switches that are short in memory.

The first commit exposes QP creation flag and the HCA capability. The
second commit adds support for a new DV QP creation flag. The last commit
report packet based credit mode capability via the MLX5DV device
capabilities.

* branch 'mlx5-packet-credit-fc':
  IB/mlx5: Report packet based credit mode device capability
  IB/mlx5: Add packet based credit mode support
  net/mlx5: Expose packet based credit mode

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-07 13:25:12 -07:00
Danit Goldberg 7e11b911b5 IB/mlx5: Report packet based credit mode device capability
Report packet based credit mode capability via the mlx5 DV interface.

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-07 13:22:09 -07:00
Danit Goldberg 569c665150 IB/mlx5: Add packet based credit mode support
The device can support two credit modes, message based (default) and
packet based. In order to enable packet based mode, the QP should be
created with special flag that indicates this.

This patch adds support for the new DV QP creation flag that can be used
for RC QPs in order to change the credit mode.

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-07 13:22:09 -07:00
Yuval Shaia e7521d82b3 IB/rxe: Utilize generic function to validate port number
Utilize rdma_is_port_valid to validate the given port.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06 21:18:00 -07:00
Yuval Shaia 1ceb25c885 IB/rxe: Make function rxe_pool_cleanup return void
Since the function always returns 0 make it void.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06 21:16:49 -07:00
Alex Vesker 419822c8b8 IB/mlx5: Enable TX on a DEVX flow table
Flow table can be passed as a DEVX object which is a valid destination in
an EGRESS flow. Fix the original code to allow that.

Fixes: a7ee18bdee ("RDMA/mlx5: Allow creating a matcher for a NIC TX flow table")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06 20:33:09 -07:00
Qian Cai 0fbc9b8b4e mlx4: Use snprintf instead of complicated strcpy
This fixes a compilation warning in sysfs.c

drivers/infiniband/hw/mlx4/sysfs.c:360:2: warning: 'strncpy' output may be
truncated copying 8 bytes from a string of length 31
[-Wstringop-truncation]

By eliminating the temporary stack buffer.

Signed-off-by: Qian Cai <cai@gmx.us>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06 20:23:06 -07:00
Mike Marciniszyn 9aefcabe57 IB/hfi1: Reduce lock contention on iowait_lock for sdma and pio
Commit 4e045572e2 ("IB/hfi1: Add unique txwait_lock for txreq events")
laid the ground work to support per resource waiting locking.

This patch adds that with a lock unique to each sdma engine and pio
sendcontext and makes necessary changes for verbs, PSM, and vnic to use
the new locks.

This is particularly beneficial for smaller messages that will exhaust
resources at a faster rate.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06 20:15:36 -07:00
Mike Marciniszyn 18912c4524 IB/hfi1: Close VNIC sdma_progress sleep window
The call to sdma_progress() is called outside the wait lock.

In this case, there is a race condition where sdma_progress() can return
false and the sdma_engine can idle.  If that happens, there will be no
more sdma interrupts to cause the wakeup and the vnic_sdma xmit will hang.

Fix by moving the lock to enclose the sdma_progress() call.

Also, delete the tx_retry. The need for this was removed by:
commit bcad29137a ("IB/hfi1: Serve the most starved iowait entry first")

Fixes: 64551ede6c ("IB/hfi1: VNIC SDMA support")
Reviewed-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06 20:15:36 -07:00