Commit graph

5728 commits

Author SHA1 Message Date
Linus Torvalds 84e39eeb08 Second round of merge items for 4.8
- hfi1 driver updates
 - Fix for max SGEs allowed via RDMA R/W API
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXoqUzAAoJELgmozMOVy/dNKAP/1/Rzn/k97eda1qFqzWpqsPl
 lMaxDiZZnRIAFJEqEF9Iwo1JLiFIzjpDJnqHB++CKuXZQT0NY6sHW0yrcyUwzsx7
 5gui92ldkVg4vY7PTco171vyzG+79KKRZ1dFS14z7oC8XAg48zQ7yJmfb1op3dEw
 mgxyoLaaMwMF5aLwPoWG4+aPkBMtKUGB/ARb4ehq6M2p71c43lb18GaarJuWLdAz
 1HxakXL/uzttyvGDyJGKDrT6ktXXSyvdCTRO60OrrPFJ67P2xRYXce85TLRr8srp
 Q5RNjyR5fP8uN0qtrQz+hl09mtBeBQHKomyFIOVwkB2r53OKqsR5g5roz3BlpA1X
 7PF/MO0pKy4t8XQnLfohEwtNWgszupvxkyAAISI8MwzLOPra/V8smQ9CpTltx1UB
 hTu3tpAMy1auAjh8TWzzzII1ZoRZz6YCTziWnTaC3bqAljufjt1mnvjrtNmQ1sNi
 MCLeA3yr8HjlKWdwYr+gVfhSR1wEoOxwHZdLsvBsxmC32hFLlh6rbg2x8wceqTlR
 4T8l0AERV1YPjsoSe3/pWVImKUA97qppIfeFcCZiBCBHBPlhpw3ebVt6B1mLVUCV
 hTMuZeFVcV75D+qr0kR5ZuVn4jgEn9zB1VH3tCV9LJnhBfySZFcP4yhATqiELaHG
 RVoVAiTBxq5RgNVOH4Zo
 =cQcp
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull second round of rdma updates from Doug Ledford:
 "This can be split out into just two categories:

   - fixes to the RDMA R/W API in regards to SG list length limits
     (about 5 patches)

   - fixes/features for the Intel hfi1 driver (everything else)

  The hfi1 driver is still being brought to full feature support by
  Intel, and they have a lot of people working on it, so that amounts to
  almost the entirety of this pull request"

* tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (84 commits)
  IB/hfi1: Add cache evict LRU list
  IB/hfi1: Fix memory leak during unexpected shutdown
  IB/hfi1: Remove unneeded mm argument in remove function
  IB/hfi1: Consistently call ops->remove outside spinlock
  IB/hfi1: Use evict mmu rb operation
  IB/hfi1: Add evict operation to the mmu rb handler
  IB/hfi1: Fix TID caching actions
  IB/hfi1: Make the cache handler own its rb tree root
  IB/hfi1: Make use of mm consistent
  IB/hfi1: Fix user SDMA racy user request claim
  IB/hfi1: Fix error condition that needs to clean up
  IB/hfi1: Release node on insert failure
  IB/hfi1: Validate SDMA user iovector count
  IB/hfi1: Validate SDMA user request index
  IB/hfi1: Use the same capability state for all shared contexts
  IB/hfi1: Prevent null pointer dereference
  IB/hfi1: Rename TID mmu_rb_* functions
  IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister()
  IB/hfi1: Restructure hfi1_file_open
  IB/hfi1: Make iovec loop index easy to understand
  ...
2016-08-04 20:26:31 -04:00
Linus Torvalds 0cda611386 Round one of 4.8 code
- Updates/fixes for iw_cxgb4 driver
 - Updates/fixes for mlx5 driver
 - Add flow steering and RSS API
 - Add hardware stats to mlx4 and mlx5 drivers
 - Add firmware version API for RDMA driver use
 - Add the rxe driver (this is a software RoCE driver that makes any
   Ethernet device a RoCE device)
 - Fixes for i40iw driver
 - Support for send only multicast joins in the cma layer
 - Other minor fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXo1vCAAoJELgmozMOVy/d0HcQAJqMi7siD9cSaMViYbu812pq
 3kNkHZbLNB/947uShDPhhFAWFXU0nRxEnTNSvYxRo+nxnDE/9hEEXpx8OzzKLNU+
 GXyDeHsEEriSFcaSne5Tak/QuiFm3PJv73ttXQROCtHG7KxLG9ieVbfusz42Xwiu
 5R21qfp6PZEOC+j7L/fTZh/kEN3cfaDYrGnCgmU3z0ka9xG5Qe2/+uWGNkuioRA5
 phFUR4MS+1n/VrnxPHrLXTrqv3sw8YfCfRImaXSBrxFVMqhno+cDDtEJQCRnmNrq
 7KcJO2KqDMl/QqsjxdwqojNpUTh2t7SeOeQuzUsfXl15yyyetq2Zu7ZurkCGjNtQ
 NtTt6hv5eXq3mNuBmOPKYDDgakSYyYjS0zueoi8wFFqIeSYxRJv4wx4xoeJ/Bsz8
 2LplpaPMQaTM65FhzYXGhYNBKaRkqjL9ihbIl1OcLNvfXAqLElfONM17/Yc/hgVw
 xfDtvNFrZcl7/exIpBBNOnxwbs4h78vvXsXoBiVoN7V/hBnMzDhkiBHNxNCfZXA0
 REGs/cnyy6cpiJOnVCWs77NqL75oK/qb1mEwe1M+A2kaxe/tLixUdYXo/zclDPm8
 3DLTL9lCgJIBIEiZT4q/alxLK+yUKD+SHtQT3lmF2Bfsmv/I38Uy55SXAiFO4yOq
 kwy96TvYtT43SkyNmmBf
 =oZOO
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull base rdma updates from Doug Ledford:
 "Round one of 4.8 code: while this is mostly normal, there is a new
  driver in here (the driver was hosted outside the kernel for several
  years and is actually a fairly mature and well coded driver).  It
  amounts to 13,000 of the 16,000 lines of added code in here.

  Summary:

   - Updates/fixes for iw_cxgb4 driver
   - Updates/fixes for mlx5 driver
   - Add flow steering and RSS API
   - Add hardware stats to mlx4 and mlx5 drivers
   - Add firmware version API for RDMA driver use
   - Add the rxe driver (this is a software RoCE driver that makes any
     Ethernet device a RoCE device)
   - Fixes for i40iw driver
   - Support for send only multicast joins in the cma layer
   - Other minor fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits)
  Soft RoCE driver
  IB/core: Support for CMA multicast join flags
  IB/sa: Add cached attribute containing SM information to SA port
  IB/uverbs: Fix race between uverbs_close and remove_one
  IB/mthca: Clean up error unwind flow in mthca_reset()
  IB/mthca: NULL arg to pci_dev_put is OK
  IB/hfi1: NULL arg to sc_return_credits is OK
  IB/mlx4: Add diagnostic hardware counters
  net/mlx4: Query performance and diagnostics counters
  net/mlx4: Add diagnostic counters capability bit
  Use smaller 512 byte messages for portmapper messages
  IB/ipoib: Report SG feature regardless of HW UD CSUM capability
  IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct
  IB/hfi1: Disable by default
  IB/rdmavt: Disable by default
  IB/mlx5: Fix port counter ID association to QP offset
  IB/mlx5: Fix iteration overrun in GSI qps
  i40iw: Add NULL check for puda buffer
  i40iw: Change dup_ack_thresh to u8
  i40iw: Remove unnecessary check for moving CQ head
  ...
2016-08-04 20:10:31 -04:00
Doug Ledford 7f1d25b47d Merge branches 'misc' and 'rxe' into k.o/for-4.8-1 2016-08-04 11:13:47 -04:00
Moni Shoua 8700e3e7c4 Soft RoCE driver
Soft RoCE (RXE) - The software RoCE driver

ib_rxe implements the RDMA transport and registers to the RDMA core
device as a kernel verbs provider. It also implements the packet IO
layer. On the other hand ib_rxe registers to the Linux netdev stack
as a udp encapsulating protocol, in that case RDMA, for sending and
receiving packets over any Ethernet device.  This yields a RDMA
transport over the UDP/Ethernet network layer forming a RoCEv2
compatible device.

The configuration procedure of the Soft RoCE drivers requires
binding to any existing Ethernet network device. This is done with
/sys interface.

A userspace Soft RoCE library (librxe) provides user applications
the ability to run with Soft RoCE devices.  The use of rxe verbs ins
user space requires the inclusion of librxe as a device specifics
plug-in to libibverbs. librxe is packaged separately.

Architecture:

     +-----------------------------------------------------------+
     |                          Application                      |
     +-----------------------------------------------------------+
                            +-----------------------------------+
                            |             libibverbs            |
User                        +-----------------------------------+
                            +----------------+ +----------------+
                            | librxe         | | HW RoCE lib    |
                            +----------------+ +----------------+
+---------------------------------------------------------------+
     +--------------+                           +------------+
     | Sockets      |                           | RDMA ULP   |
     +--------------+                           +------------+
     +--------------+                  +---------------------+
     | TCP/IP       |                  | ib_core             |
     +--------------+                  +---------------------+
                             +------------+ +----------------+
Kernel                       | ib_rxe     | | HW RoCE driver |
                             +------------+ +----------------+
     +------------------------------------+
     | NIC driver                         |
     +------------------------------------+

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     +-----------------------------------------------------------+
     |                          Application                      |
     +-----------------------------------------------------------+
                            +-----------------------------------+
                            |             libibverbs            |
User                        +-----------------------------------+
                            +----------------+ +----------------+
                            | librxe         | | HW RoCE lib    |
                            +----------------+ +----------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     +--------------+                           +------------+
     | Sockets      |                           | RDMA ULP   |
     +--------------+                           +------------+
     +--------------+                  +---------------------+
     | TCP/IP       |                  | ib_core             |
     +--------------+                  +---------------------+
                             +------------+ +----------------+
Kernel                       | ib_rxe     | | HW RoCE driver |
                             +------------+ +----------------+
     +------------------------------------+
     | NIC driver                         |
     +------------------------------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Soft RoCE resources:

[1[ https://github.com/SoftRoCE/librxe-dev librxe - source code in
Github
[2] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home - Soft RoCE
Wiki page
[3] https://github.com/SoftRoCE/librxe-dev - Soft RoCE userspace library

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-04 11:13:12 -04:00
Krzysztof Kozlowski 00085f1efa dma-mapping: use unsigned long for dma_attrs
The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer.  Thus the pointer can point to const data.
However the attributes do not have to be a bitfield.  Instead unsigned
long will do fine:

1. This is just simpler.  Both in terms of reading the code and setting
   attributes.  Instead of initializing local attributes on the stack
   and passing pointer to it to dma_set_attr(), just set the bits.

2. It brings safeness and checking for const correctness because the
   attributes are passed by value.

Semantic patches for this change (at least most of them):

    virtual patch
    virtual context

    @r@
    identifier f, attrs;

    @@
    f(...,
    - struct dma_attrs *attrs
    + unsigned long attrs
    , ...)
    {
    ...
    }

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

and

    // Options: --all-includes
    virtual patch
    virtual context

    @r@
    identifier f, attrs;
    type t;

    @@
    t f(..., struct dma_attrs *attrs);

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04 08:50:07 -04:00
Doug Ledford 7c41765d8c Merge branches 'hfi1' and 'sge-limit' into k.o/for-4.8-2 2016-08-03 21:51:20 -04:00
Alex Vesker ab15c95a17 IB/core: Support for CMA multicast join flags
Added UCMA and CMA support for multicast join flags. Flags are
passed using UCMA CM join command previously reserved fields.
Currently supporting two join flags indicating two different
multicast JoinStates:

1. Full Member:
   The initiator creates the Multicast group(MCG) if it wasn't
   previously created, can send Multicast messages to the group
   and receive messages from the MCG.

2. Send Only Full Member:
   The initiator creates the Multicast group(MCG) if it wasn't
   previously created, can send Multicast messages to the group
   but doesn't receive any messages from the MCG.

   IB: Send Only Full Member requires a query of ClassPortInfo
       to determine if SM/SA supports this option. If SM/SA
       doesn't support Send-Only there will be no join request
       sent and an error will be returned.

   ETH: When Send Only Full Member is requested no IGMP join
	will be sent.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:06:46 -04:00
Alex Vesker 3d3fd74239 IB/sa: Add cached attribute containing SM information to SA port
Added a new SA port attribute containing SM ClassPortInfo fields,
(ClassPortInfo fields: Table 126 IB Spec 1.3.). This is useful for
checking SM support for specific features. The attribute is cached
to avoid resending queries, caching is done when a successful
ClassPortInfo reply is received on the port. Invalidation of the
attribute is done on SM change events, SM re-registration events,
and SM LID change events. The fields in ClassPortInfo should not
change during SM runtime without an event.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:36 -04:00
Jason Gunthorpe d1e09f304a IB/uverbs: Fix race between uverbs_close and remove_one
Fixes an oops that might happen if uverbs_close races with
remove_one.

Both contexts may run ib_uverbs_cleanup_ucontext, it depends
on the flow.

Currently, there is no protection for a case that remove_one
didn't make the cleanup it runs to its end, the underlying
ib_device was freed then uverbs_close will call
ib_uverbs_cleanup_ucontext and OOPs.

Above might happen if uverbs_close deleted the file from the list
then remove_one didn't find it and runs to its end.

Fixes to protect against that case by a new cleanup lock so that
ib_uverbs_cleanup_ucontext will be called always before that
remove_one is ended.

Fixes: 35d4a0b63d ("IB/uverbs: Fix race between ib_uverbs_open and remove_one")
Reported-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:36 -04:00
Markus Elfring 380bae5bf2 IB/mthca: Clean up error unwind flow in mthca_reset()
The kfree() function was called in a few cases by the mthca_reset()
function during error handling even if the passed variables "bridge_header"
and "hca_header" contained a null pointer.

Adjust jump targets according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:35 -04:00
Markus Elfring 3491ab63b4 IB/mthca: NULL arg to pci_dev_put is OK
The pci_dev_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:35 -04:00
Markus Elfring f7ca535ba0 IB/hfi1: NULL arg to sc_return_credits is OK
The sc_return_credits() function tests whether its argument is NULL
and then returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:35 -04:00
Mark Bloch 3f85f2aaab IB/mlx4: Add diagnostic hardware counters
Expose IB diagnostic hardware counters.
The counters count IB events and are applicable for IB and RoCE.

The counters can be divided into two groups, per device and per port.
Device counters are always exposed.
Port counters are exposed only if the firmware supports per port counters.

rq_num_dup and sq_num_to are only exposed if we have firmware support
for them, if we do, we expose them per device and per port.
rq_num_udsdprd and num_cqovf are device only counters.

rq - denotes responder.
sq - denotes requester.

|-----------------------|---------------------------------------|
|	Name		|	Description			|
|-----------------------|---------------------------------------|
|rq_num_lle		| Number of local length errors		|
|-----------------------|---------------------------------------|
|sq_num_lle		| number of local length errors		|
|-----------------------|---------------------------------------|
|rq_num_lqpoe		| Number of local QP operation errors	|
|-----------------------|---------------------------------------|
|sq_num_lqpoe		| Number of local QP operation errors	|
|-----------------------|---------------------------------------|
|rq_num_lpe		| Number of local protection errors	|
|-----------------------|---------------------------------------|
|sq_num_lpe		| Number of local protection errors	|
|-----------------------|---------------------------------------|
|rq_num_wrfe		| Number of CQEs with error		|
|-----------------------|---------------------------------------|
|sq_num_wrfe		| Number of CQEs with error		|
|-----------------------|---------------------------------------|
|sq_num_mwbe		| Number of Memory Window bind errors	|
|-----------------------|---------------------------------------|
|sq_num_bre		| Number of bad response errors		|
|-----------------------|---------------------------------------|
|sq_num_rire		| Number of Remote Invalid request	|
|			| errors				|
|-----------------------|---------------------------------------|
|rq_num_rire		| Number of Remote Invalid request	|
|			| errors				|
|-----------------------|---------------------------------------|
|sq_num_rae		| Number of remote access errors	|
|-----------------------|---------------------------------------|
|rq_num_rae		| Number of remote access errors	|
|-----------------------|---------------------------------------|
|sq_num_roe		| Number of remote operation errors	|
|-----------------------|---------------------------------------|
|sq_num_tree		| Number of transport retries exceeded	|
|			| errors				|
|-----------------------|---------------------------------------|
|sq_num_rree		| Number of RNR NAK retries exceeded	|
|			| errors				|
|-----------------------|---------------------------------------|
|rq_num_rnr		| Number of RNR NAKs sent		|
|-----------------------|---------------------------------------|
|sq_num_rnr		| Number of RNR NAKs received		|
|-----------------------|---------------------------------------|
|rq_num_oos		| Number of Out of Sequence requests	|
|			| received				|
|-----------------------|---------------------------------------|
|sq_num_oos		| Number of Out of Sequence NAKs	|
|			| received				|
|-----------------------|---------------------------------------|
|rq_num_udsdprd		| Number of UD packets silently		|
|			| discarded on the Receive Queue due to	|
|			| lack of receive descriptor		|
|-----------------------|---------------------------------------|
|rq_num_dup		| Number of duplicate requests received	|
|-----------------------|---------------------------------------|
|sq_num_to		| Number of time out received		|
|-----------------------|---------------------------------------|
|num_cqovf		| Number of CQ overflows		|
|-----------------------|---------------------------------------|

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:34 -04:00
Mustafa Ismail e972dabf9c Use smaller 512 byte messages for portmapper messages
Portmapper messages are short and do not occupy more than 512 bytes.
Lower portmapper message size to 512 bytes. This change significantly
reduces the amount of memory needed when trying to establish a large
number of connections simultaneously. The old value is based on page
size.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:33 -04:00
Yuval Shaia 5faba54695 IB/ipoib: Report SG feature regardless of HW UD CSUM capability
Decouple SG support from HW ability to do UD checksum.
This coupling is for historical reasons and removed with 'commit
ec5f061564 ("net: Kill link between CSUM and SG features.")'

During driver load it is assumed that device does not supports SG. The
final decision is taken after creating UD QP based on device capability.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:32 -04:00
Roland Dreier 0c87b67209 IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct
We allocate a small tracking structure as part of mlx4_ib_resize_cq().
However, we don't need to use GFP_ATOMIC -- immediately after the
allocation, we call mlx4_cq_resize(), which allocates a command
mailbox with GFP_KERNEL and then sleeps on a firmware command, so we
better not be in an atomic context.

This actually has a real impact, because when this GFP_ATOMIC
allocation fails (and GFP_ATOMIC does fail in practice) then a
userspace consumer resizing a CQ will get a spurious failure that we
can easily avoid.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:32 -04:00
Bart Van Assche a154a8cd08 IB/hfi1: Disable by default
There is a strict policy in the Linux kernel that new drivers must be
disabled by default. Hence leave out the "default m" line from Kconfig.

Fixes: f48ad614c1 ("IB/hfi1: Move driver out of staging")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Jubin John <jubin.john@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: <stable@vger.kernel.org> # v4.7+
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:32 -04:00
Bart Van Assche 378fc3201e IB/rdmavt: Disable by default
There is a strict policy in the Linux kernel that new drivers must be
disabled by default. Hence leave out the "default m" line from Kconfig.

Fixes: 0194621b22 ("IB/rdmavt: Create module framework and handle driver registration")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Jubin John <jubin.john@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: <stable@vger.kernel.org> # v4.6+
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:31 -04:00
Doug Ledford 6a89d89d85 Merge branch 'i40iw' into k.o/for-4.8 2016-08-03 21:00:16 -04:00
Doug Ledford 3e5e8e8a9a Merge branches 'cxgb4' and 'mlx5' into k.o/for-4.8 2016-08-03 20:58:45 -04:00
Dean Luick 0636e9ab83 IB/hfi1: Add cache evict LRU list
The original code used a LRU list to evict nodes which were least
recently used.  For correctness the evict code was moved under the
handler->lock, now add back the LRU list.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny 2677a7680e IB/hfi1: Fix memory leak during unexpected shutdown
During an unexpected shutdown, references to tid_rb_node were NULL'ed out
without properly being released.

Fix this by calling clear_tid_node in the mmu notifier remove callback
rather than after these callbacks are called.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick 082b353291 IB/hfi1: Remove unneeded mm argument in remove function
The reworked mmu_rb interface allows the unused mm argument to be removed.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick b85ced9151 IB/hfi1: Consistently call ops->remove outside spinlock
The ops->remove() callback was called by hfi1_mmu_unregister() with a
NULL mm argument while holding a spinlock.  In the case of sdma_rb_remove()
this caused it to pass current->mm to hfi1_release_user_pages()

This had 2 problems.  First this would attempt to acquire the mmap_sem
under a spin lock.  Second the use of current->mm is not always guaranteed
to be the proper mm when the fd is being closed.

Rather than depend on this implicit behavior we move all calls to
ops->remove outside of the spinlock.  This also allows the correct
mm to be used in the remove callback without fear of deadlock.

Because the MMU notifier is not guaranteed to hold mm->mmap_sem, but
usually does, we must delay all remove callbacks until out of the notifier,
when the callbacks can take the mmap_sem if they need to.

Code comments were added to clarify what the expectations are for the
users of the mmu rb tree.

Suggested-by: Jim Foraker <foraker1@llnl.gov>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick b7df192f74 IB/hfi1: Use evict mmu rb operation
Use the new cache evict operation in the SDMA code.  This allows the cache
to properly coordinate evicts and removes, preventing any race.  With this
change, the separate list, lock, and race flag are not needed.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick 1034599805 IB/hfi1: Add evict operation to the mmu rb handler
Allow users to clear nodes from the rb tree based on their evict callback.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick 622c202c4a IB/hfi1: Fix TID caching actions
Per file descriptor TID caching actions depend on a global that can
change midway through the lifetime of that file descriptor.

Make the use of caching consistent for the life of the file descriptor
by using the presence of the cache handler to decide when to use the cache
functions.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick e0b09ac55d IB/hfi1: Make the cache handler own its rb tree root
The objects which use cache handling should reference their own handler
object not the internal data structure it uses to track the nodes.

Have the "users" of the mmu notifier code pass opaque objects which can
then be properly used in the mmu callbacks depending on the owners needs.

This patch has the additional benefit that operations no longer require a
look up in a list to find the handlers.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny 3faa3d9a30 IB/hfi1: Make use of mm consistent
The hfi1 driver registers a mmu_notifier callback when /dev/hfi1_* is
opened, and unregisters it when the device is closed.  The driver
incorrectly assumes that the close will always happen from the same
context as the open.  In particular, closes due to SIGKILL or OOM killer
activity may happen from a different context.  In these cases, the wrong
mm is passed to mmu_notifier_unregister(), which causes improper reference
counting for the victim mm, and eventual memory corruption.

Preserve the mm for all open file descriptors and use this mm rather than
current->mm for memory operations for the lifetime of that fd.  Note: this
patch leaves 1 use of current->mm in place.  This use is removed in a
follow on patch because other functional changes were required prior to
that use being removed.

If registration fails, there is no reason to keep the handler object
around.  Free the handler object rather than add it to the list to
prevent any mmu_notifier operations, including unregister, when
registration fails.

Suggested-by: Jim Foraker <foraker1@llnl.gov>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick 7b3256e331 IB/hfi1: Fix user SDMA racy user request claim
The user SDMA in-use claim bit is in the structure that gets zeroed out
once the claim is made.  Move the request in-use flag into its own bit
array and use that for atomic claims.  This cleans up the claim code and
removes any race possibility.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick 9da7e9a711 IB/hfi1: Fix error condition that needs to clean up
If input validation fails, properly free the request before returning.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick a383f8ec55 IB/hfi1: Release node on insert failure
If unable to insert node into the RB tree cache, node will be freed
before returning from the function.  Null out iovec's pointer to node
so iovec does not try to free it later.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick 9ff73c8715 IB/hfi1: Validate SDMA user iovector count
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick 4fa0d22c9a IB/hfi1: Validate SDMA user request index
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick bdf7752e07 IB/hfi1: Use the same capability state for all shared contexts
Save the current capability state at user context creation
time.  Report this saved value for all shared contexts.

Also get rid of unnecessary hfi1_get_base_kinfo function.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny 53445bb32d IB/hfi1: Prevent null pointer dereference
If a context has not been assigned or assignment failed, pq may be NULL.
Move the unregister within the protection of the null check.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick a7cd2dc5d4 IB/hfi1: Rename TID mmu_rb_* functions
Clarify the names of the TID mmu functions.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick 20a42d0833 IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister()
Checking if the rb tree is empty is redundant with the while loop which is
emptying the rb tree.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny ea3a0ee52d IB/hfi1: Restructure hfi1_file_open
Rearrange the file open call in prep for new changes.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick ff4ce9bde9 IB/hfi1: Make iovec loop index easy to understand
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny 639297b4f0 IB/hfi1: Use "false" not 0
For bool parameters "false" should be used

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny 5ed3b15b05 IB/hfi1: Remove unused sub-context parameter
subctxt is not used, just remove it.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny 3c1091aa94 IB/hfi1: Consolidate __mmu_rb_remove and hfi1_mmu_rb_remove
__mmu_rb_remove was called in only 1 place which was a very simple
call site.  Combine this function into its caller.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick c0946642e5 IB/hfi1: Always expect ops functions
Remove, insert, and invalidate are always provided.  No
need to test.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny 862548dace IB/hfi1: Add parameter names to callback declarations
This makes it more clear what these functions are
operating on.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny ac335e7e80 IB/hfi1: Add parameter names to function declarations
Parameter names to function declarations make it more clear
what those parameters do.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick fc87879ae2 IB/hfi1: Remove unused function hfi1_mmu_rb_search
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick 8e1f52df97 IB/hfi1: Remove unused uctxt->subpid and uctxt->pid
These are no longer needed.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny 72720ddfc6 IB/hfi1: Fix minor format error
Brackets should be on the next line of a function

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny fc0b76c016 IB/hfi1: Expand reported serial number
Expand the serial number space by using more bits
from the GUID.

Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00