Commit graph

6884 commits

Author SHA1 Message Date
Dasaratharaman Chandramouli bfbfd661c9 IB/core: Rename ib_query_ah to rdma_query_ah
Rename ib_query_ah to rdma_query_ah so its in sync with the
rename of the ib address handle attribute

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli 67b985b6c7 IB/core: Rename ib_modify_ah to rdma_modify_ah
Rename ib_modify_ah to rdma_modify_ah so its in sync with the
rename of the ib address handle attribute

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli 0a18cfe4f6 IB/core: Rename ib_create_ah to rdma_create_ah
Rename ib_create_ah to rdma_create_ah so its in sync with the
rename of the ib address handle attribute

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli 90898850ec IB/core: Rename struct ib_ah_attr to rdma_ah_attr
This patch simply renames struct ib_ah_attr to
rdma_ah_attr as these fields specify attributes that are
not necessarily specific to IB.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli eca7ddf965 IB/rxe: Initialize ib_ah_attr during query_ah
Zero out ib_ah_attr before calling query_ah. Set ah_flags
appropriately.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli 4ba66093bd IB/core: Check for global flag when using ah_attr
Read/write grh fields of the ah_attr only if the
ah_flags field has the IB_AH_GRH bit enabled

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli cf0b9395d0 IB/core: Add braces when using sizeof
This patch adds braces around parameters to sizeof
as called out by checkpatch

Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli cfd519358f IB/IPoIB: Remove 'else' when the 'if' has a return.
This patch fixes a checkpatch issue related to not having
to use an 'else' if the 'if' path returns from the function.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli bef4211a72 IB/ocrdma: Add identifier names to function definitions
Address a checkpatch issue on missing identifier names
on function definitions.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli 2196f27162 IB/SA: Add support to query opa classport info.
For OPA devices, SA will query the OPA classport info
instead of the IB defined classport info.
opa classport info exposes additional information and
capabilities that are specific to OPA devices.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 19:29:42 -04:00
Joerg Roedel 461a6946b1 iommu: Remove pci.h include from trace/events/iommu.h
The include file does not need any PCI specifics, so remove
that include. Also fix the places that relied on it.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-29 00:20:49 +02:00
Dasaratharaman Chandramouli aa4656d9a4 IB/core: Move opa_class_port_info definition to header file
Both opa_vnic and the hfi driver use the same opa_classport_info
definition. We will also have ib_sa capable of querying opa class
port info and would need this definition. Move it to ib_mad.h
for everyone to use.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 18:10:05 -04:00
Dasaratharaman Chandramouli ee1c60b1bf IB/SA: Modify SA to implicitly cache Class Port info
SA will query and cache class port info as part of
its initialization. SA will also invalidate and
refresh the cache based on specific events. Callers such
as IPoIB and CM can query the SA to get the classportinfo
information. Apart from making the caller code much simpler,
this change puts the onus on the SA to query and maintain
classportinfo much like how it maitains the address handle to the SM.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 14:00:17 -04:00
Dasaratharaman Chandramouli cb8637660a IB/SA: Move functions update_sm_ah() and ib_sa_event()
Moving these will facilitate changes to these in the
next patchs. This is strictly a move and there are no
changes to the functions in any way.

Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:58:08 -04:00
Dasaratharaman Chandramouli 680562b569 IB/SA: Remove unwanted braces
This fixes a checkpatch issue. The fix is needed
so that some of these functions can be moved around
in the forthcoming patches

Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:58:08 -04:00
Dasaratharaman Chandramouli dbb6c91fd8 IB/SA: Add braces when using sizeof
This fixes a checkpatch issue. The fix is needed
so that some of these functions can be moved around
in the forthcoming patches

Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:58:08 -04:00
Dasaratharaman Chandramouli f96a318714 IB/SA: Fix lines longer than 80 columns
This fixes a checkpatch issue. The fix is needed
so that some of these functions can be moved around
in the forthcoming patches

Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:58:08 -04:00
Dennis Dalessandro 4608e4c8f2 IB/hfi1: Use bool in process_ecn
The process_ecn intends to return a bool value. However it is doing
so incorrectly by ANDing the fecn mask. The fecn bit is bit 31. Bool is
not a native data type and is up to the compiler to implement how it
sees fit. It is conceivable that this upper bit gets washed out.

Fix by converting to a bool properly.

Cc: stable@vger.kernel.org
Fixes: Commit fd2b562edca6 ("IB/hfi1: Pull FECN/BECN processing to a common place")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:56:24 -04:00
Ira Weiny 1222026764 IB/hfi: Protect against writable mmap
The device/port status is not intended to be changed from user space.
Prevent a user from mapping them as write or execute.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:56:23 -04:00
Dennis Dalessandro ee495ada5c IB/hfi1: Fix unbalanced braces around else
Add missing braces around else blocks in a few places to make checkpatch
happy.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:56:23 -04:00
Dennis Dalessandro 08af5916b3 IB/hfi1: Convert %Lx to %llx
According to checkpatch %Lx is not standard C so remove it and use the
suggested %llx.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:56:22 -04:00
Dennis Dalessandro a498fbcd87 IB/hfi1: Fix misspelling in comment
Checkpatch flagged a misspelled word. Fix it.

Fixes: 8764522e52 ("staging/rdma/hfi1: Unexpected link up pkey values are not an error")
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:56:22 -04:00
Neel Desai 53526500f3 IB/hfi1: Permanently enable P_Key checking in HFI
Ingress and egress port P_Key checking should always be performed for
HFIs. This patch will enable ingress and egress P_Key checking when
the port is initialized and will ignore the P_Key information sent by
the FM in the port info structure which is meant to be used only by the
switch.

Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Neel Desai <neel.desai@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:56:21 -04:00
Stuart Summers 98b9ee2002 IB/hfi1: Cache neighbor secure data after link up
Secure data is transferred across the link during verify
cap. This includes Neighbor Guid, Type, and Port Number.
This transfer is not guaranteed to complete until the 8051
firmware has completed processing of the state_complete
frame. Move the consumption of this data from verify cap
handling to link up handling to ensure the data is finalized.

Additionally, do not notify the SM that the link is up until
after this data is actually available.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Stuart Summers <john.s.summers@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:56:20 -04:00
Neel Desai 03e80e9a57 IB/hfi1: Adjust high temperature warning for QSFP cable
When we receive a QSFP_HIGH_TEMP_ALARM or QSFP_HIGH_TEMP_WARNING
interrupt, print a "QSFP cable temperature too high" message.

Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Neel Desai <neel.desai@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:56:20 -04:00
Tadeusz Struk 22546b741a IB/hfi1: Fix softlockup issue
Soft lockups can occur because the mad processing on different CPUs acquire
the spin lock dc8051_lock:

[534552.835870]  [<ffffffffa026f993>] ? read_dev_port_cntr.isra.37+0x23/0x160 [hfi1]
[534552.835880]  [<ffffffffa02775af>] read_dev_cntr+0x4f/0x60 [hfi1]
[534552.835893]  [<ffffffffa028d7cd>] pma_get_opa_portstatus+0x64d/0x8c0 [hfi1]
[534552.835904]  [<ffffffffa0290e7d>] hfi1_process_mad+0x48d/0x18c0 [hfi1]
[534552.835908]  [<ffffffff811dc1f1>] ? __slab_free+0x81/0x2f0
[534552.835936]  [<ffffffffa024c34e>] ? ib_mad_recv_done+0x21e/0xa30 [ib_core]
[534552.835939]  [<ffffffff811dd153>] ? __kmalloc+0x1f3/0x240
[534552.835947]  [<ffffffffa024c3fb>] ib_mad_recv_done+0x2cb/0xa30 [ib_core]
[534552.835955]  [<ffffffffa0237c85>] __ib_process_cq+0x55/0xd0 [ib_core]
[534552.835962]  [<ffffffffa0237d70>] ib_cq_poll_work+0x20/0x60 [ib_core]
[534552.835964]  [<ffffffff810a7f3b>] process_one_work+0x17b/0x470
[534552.835966]  [<ffffffff810a8d76>] worker_thread+0x126/0x410
[534552.835969]  [<ffffffff810a8c50>] ? rescuer_thread+0x460/0x460
[534552.835971]  [<ffffffff810b052f>] kthread+0xcf/0xe0
[534552.835974]  [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
[534552.835977]  [<ffffffff81696418>] ret_from_fork+0x58/0x90
[534552.835980]  [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140

This issue is made worse when the 8051 is busy and the reads take longer.
Fix by using a non-spinning lock procure.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Mike Marciszyn <mike.marciniszyn@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:56:15 -04:00
Mike Marciniszyn b6eac931b9 IB/hfi1: Prevent kernel QP post send hard lockups
The driver progress routines can call cond_resched() when
a timeslice is exhausted and irqs are enabled.

If the ULP had been holding a spin lock without disabling irqs and
the post send directly called the progress routine, the cond_resched()
could yield allowing another thread from the same ULP to deadlock
on that same lock.

Correct by replacing the current hfi1_do_send() calldown with a unique
one for post send and adding an argument to hfi1_do_send() to indicate
that the send engine is running in a thread.   If the routine is not
running in a thread, avoid calling cond_resched().

CC: <stable@vger.kernel.org> # 4.7.x-
Fixes: Commit 831464ce4b ("IB/hfi1: Don't call cond_resched in atomic mode when sending packets")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:48:01 -04:00
Don Hiatt 3d591099a0 IB/hfi1: Use defines from common headers
Move FECN and BECN related defines to common header files

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:48:01 -04:00
Don Hiatt cb42705792 IB/hfi1: Add functions to parse 9B headers
These inline functions improve code readability by
enabling callers to read specific fields from the
header without knowledge of byte offsets.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:48:01 -04:00
Dasaratharaman Chandramouli aad559c21d IB/hfi1: Rename hdr2sc to hfi1_9B_get_sc5
The function really returned the 5-bit sc value from
the header and rhf. hdr2sc didn't quite describe what it did.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:48:01 -04:00
Sebastian Sanchez 3ca4fbc84a IB/hfi1: Return SC2VL mappings to FM with VL15 instead of ILLEGAL_VL
VL15 in the SC2VL table is used to indicate an invalid SC
for the FM, however, internally the driver remaps SCs from
VL15 to ILLEGAL_VL to prevent error counts. This mapping
confuses the FM when performing a sweep, making it return
a table mismatch error. Have SMA convert ILLEGAL_VL
to VL15 entries for the SC2VL table queries.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:48:01 -04:00
Michael J. Ruhl db730894f4 IB/hfi1: Validate the TID count before using it
Improve the safety of the code by validating the user supplied
tidcnt before use.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:48:01 -04:00
Michael J. Ruhl aad9ff97dd IB/rdmavt/hfi1/qib: Use the MGID and MLID for multicast addressing
The Infiniband spec defines "A multicast address is defined by a
MGID and a MLID" (section 10.5).

The current code only uses the MGID for identifying multicast groups.
Update the driver to be compliant with this definition.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:48:01 -04:00
Michael J. Ruhl 8561eae60f IB/core: For multicast functions, verify that LIDs are multicast LIDs
The Infiniband spec defines "A multicast address is defined by a
MGID and a MLID" (section 10.5).  Currently the MLID value is not
validated.

Add check to verify that the MLID value is in the correct address
range.

Fixes: 0c33aeedb2 ("[IB] Add checks to multicast attach and detach")
Cc: stable@vger.kernel.org
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:48:01 -04:00
Michael J. Ruhl 72fb70f5a3 IB/hfi1: Correct MulticastMask/CollectiveMask info to SMA output
The FM uses the values of MulticastMask and CollectiveMask to
determine the number of bits for net masks. The current values of
0 and 0 are incorrect.  The values should be 4 and 1.  Updated the
necessary code to reflect the specified values.

Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:48:01 -04:00
Michael J. Ruhl 20c7840a77 IB/core: If the MGID/MLID pair is not on the list return an error
A list of MGID/MLID pairs is built when doing a multicast attach.  When
the multicast detach is called, the list is searched, and regardless of
the search outcome, the driver detach is called.

If an MGID/MLID pair is not on the list, driver detach should not be
called, and an error should be returned.  Calling the driver without
removing an MGID/MLID pair from the list can leave the core and driver
out of sync.

Fixes: f4e401562c ("IB/uverbs: track multicast group membership for userspace QPs")
Cc: stable@vger.kernel.org
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:45:44 -04:00
Geliang Tang 2443c6cc92 IB/qib: use setup_timer
Use setup_timer() instead of init_timer() to simplify the code.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:19:14 -04:00
Geliang Tang 79d9df5618 IB/nes: use setup_timer
Use setup_timer() instead of init_timer() to simplify the code.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:19:13 -04:00
Geliang Tang 96ff2c11c5 IB/i40iw: use setup_timer
Use setup_timer() instead of init_timer() to simplify the code.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:19:05 -04:00
Leon Romanovsky 1e0729348f IB/nes: Fix incorrect type in assignment
Fix mismatch between types, wqe_words are in le32 format, while opcode
in CPU format.

The following sparse warnings are helped to find it:

drivers/infiniband/hw/nes/nes_hw.c:3058:24: warning: incorrect type in assignment (different base types)
drivers/infiniband/hw/nes/nes_hw.c:3058:24:    expected unsigned int [unsigned] [assigned] [usertype] opcode
drivers/infiniband/hw/nes/nes_hw.c:3058:24:    got restricted __le32 <noident>

CC: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:11:43 -04:00
Leon Romanovsky 313e16d5af IB/usnic: Simplify the code to balance loc/unlock calls
Simplify code in find_free_vf_and_create_qp_grp() to avoid sparse error
regarding call to unlock in the block other than lock was called.

drivers/infiniband/hw/usnic/usnic_ib_verbs.c:206:9: warning: context imbalance
			in 'find_free_vf_and_create_qp_grp' - different lock
			contexts for basic block

CC: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:11:43 -04:00
Leon Romanovsky f5029e752b Ib/usnic: Explicitly include usnic headers
Sparse tool complains about undeclared symbols in usnic_ib_verbs.c
and usnic_ib_sysfs.c This is caused by lack of direct include of
appropriate usnic_ib_verbs.h and usnic_ib_sysfs.h, where all
these functions were declared.

Simple include eliminates 30 warnings similar to the below one:

drivers/infiniband/hw/usnic/usnic_ib_sysfs.c:304:6: warning: symbol
				'usnic_ib_sysfs_unregister_usdev' was
				not declared. Should it be static?

CC: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:11:43 -04:00
Leon Romanovsky 218271adca Ib/core: Mark local uverbs_std_types functions to be static
Functions declared in uverbs_std_types.c are local to that file, but
they lack static declarations. This produces a lot of sparse warnings,
like the one below:

drivers/infiniband/core/uverbs_std_types.c:41:5: warning: symbol
				'uverbs_free_ah' was not declared.
				Should it be static?

So mark them as static.

CC: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:11:43 -04:00
Pan Bian 9ef63f31ad iw_cxgb4: check return value of alloc_skb
Function alloc_skb() will return a NULL pointer when there is no enough
memory. However, the return value of alloc_skb() is directly used
without validation in function send_fw_pass_open_req(). This patches
checks the return value of alloc_skb() against NULL.

Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:09:55 -04:00
Colin Ian King 27b0b83233 IB/rxe: fix typo: "algorithmi" -> "algorithm"
trivial fix to typo in pr_err message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:07:58 -04:00
Dan Carpenter f0bb2d44ca IB/rdmavt: restore IRQs on error path in rvt_create_ah()
We need to call spin_unlock_irqrestore() instead of vanilla
spin_unlock() on this error path.

Fixes: 119a8e708d ("IB/rdmavt: Add AH to rdmavt")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:01:58 -04:00
Paolo Abeni eea40b8f62 infiniband: call ipv6 route lookup via the stub interface
The infiniband address handle can be triggered to resolve an ipv6
address in response to MAD packets, regardless of the ipv6
module being disabled via the kernel command line argument.

That will cause a call into the ipv6 routing code, which is not
initialized, and a conseguent oops.

This commit addresses the above issue replacing the direct lookup
call with an indirect one via the ipv6 stub, which is properly
initialized according to the ipv6 status (e.g. if ipv6 is
disabled, the routing lookup fails gracefully)

Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 12:55:17 -04:00
Amrani, Ram b6acd71fef RDMA/qedr: add support for send+invalidate in poll CQ
Split the poll responder CQ into two functions.
Add support for send+invalidate in poll CQ.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 12:47:57 -04:00
Amrani, Ram 4dd72636c9 RDMA/qedr: destroy CQ only after HW releases it
Wait for all relevant CNQ interrupts before freeing the CQ.
Don't invoke completion handlers for a destroyed CQ.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 12:47:57 -04:00
Amrani, Ram 942b3b2c41 RDMA/qedr: enhance destroy flow for GSI QP
Avoid attempting to release irrelevant (and unused) resources for GSI QP.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 12:47:57 -04:00
Amrani, Ram f92faaba11 RDMA/qedr: properly check atomic capabilities
After checking the path upwards towards root complex, actualy check
root complex atomic_req capability, and not our own NIC.
Verify that the PCIe device control register's atomic egress block
is cleared in the path.
Verify that the PCIe version is at least 2.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 12:47:57 -04:00
Amrani, Ram 08c4cf51e3 RDMA/qedr: reset access control when registering a MR
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 12:47:57 -04:00
Eric Biggers cda37124f4 fs: constify tree_descr arrays passed to simple_fill_super()
simple_fill_super() is passed an array of tree_descr structures which
describe the files to create in the filesystem's root directory.  Since
these arrays are never modified intentionally, they should be 'const' so
that they are placed in .rodata and benefit from memory protection.
This patch updates the function signature and all users, and also
constifies tree_descr.name.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26 23:54:06 -04:00
Artemy Kovalyov db570d7dea IB/mlx5: Add ODP support to MW
Internally MW implemented as KLM MKey and filled by userspace UMR
postsends.  Handle pagefault trigered by operations on this MKeys.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Artemy Kovalyov 1b7dbc26fc IB/mlx5: Extract page fault code
To make page fault handling code more flexible
split pagefault_single_data_segment() function.
Keep MR resolution in pagefault_single_data_segment() and
move actual updates into pagefault_single_mr().

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Artemy Kovalyov 0008b84ea9 IB/umem: Add support to huge ODP
Add IB_ACCESS_HUGETLB ib_reg_mr flag.
Hugetlb region registered with this flag
will use single translation entry per huge page.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Artemy Kovalyov b2ac91885b IB/mlx5: Add contiguous ODP support
Currenlty ODP supports only regular MMU pages.
Add ODP support for regions consisting of physically contiguous chunks
of arbitrary order (huge pages for instance) to improve performance.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Artemy Kovalyov 403cd12e2c IB/umem: Add contiguous ODP support
Currenlty ODP supports only regular MMU pages.
Add ODP support for regions consisting of physically contiguous chunks
of arbitrary order (huge pages for instance) to improve performance.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Artemy Kovalyov 4df4a5bac3 IB/mlx5: Decrease verbosity level of ODP errors
Decrease verbosity level of ODP error flows messages to debug level.
Remove one redundant print since debug level message already exists in
this flow.

Fixes: d9aaed8387 ('{net,IB}/mlx5: Refactor page fault handling')
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Artemy Kovalyov 523791d7c5 IB/mlx5: Fix implicit MR GC
When implicit MR's leaf MKey becomes unused, i.e. when it's
last page being released my MMU invalidation it is marked as "dying"
and scheduled for release by garbage collector.
Currentle consequent page fault may remove "dying" flag.
Treat leaf MKey as non-existent once it was scheduled to removal
by GC.

Fixes: 81713d3788 ('IB/mlx5: Add implicit MR support')
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Artemy Kovalyov 438b228e03 IB/mlx5: Fix UMR size calculation
Translation table updates of large UMR may require multiple post send
operations. The last operations can be in various lengths, but current
code set them to be the same length.

Fixes: 7d0cc6edcc ('IB/mlx5: Add MR cache for large UMR regions')
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Artemy Kovalyov bd174fc2ca IB/mlx5: Fix function updating xlt emergency path
In memory shortage path we fall back to use spare buffer.
mlx5_ib_update_xlt() called from ib_uverbs_reg_mr when ibmr.ucontext
not initialized yet.

Scenario how to test it:
1. trigger memory exhaustion so __get_free_pages(GFP_KERNEL, 4) will fail
2. register MR
3. there should be no kernel oops

Fixes: 7d0cc6edcc ('IB/mlx5: Add MR cache for large UMR regions')
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Artemy Kovalyov 3e7e1193e2 IB: Replace ib_umem page_size by page_shift
Size of pages are held by struct ib_umem in page_size field.

It is better to store it as an exponent, because page size by nature
is always power-of-two and used as a factor, divisor or ilog2's argument.

The conversion of page_size to be page_shift allows to have portable
code and avoid following error while compiling on ARM:

  ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined!

CC: Selvin Xavier <selvin.xavier@broadcom.com>
CC: Steve Wise <swise@chelsio.com>
CC: Lijun Ou <oulijun@huawei.com>
CC: Shiraz Saleem <shiraz.saleem@intel.com>
CC: Adit Ranadive <aditr@vmware.com>
CC: Dennis Dalessandro <dennis.dalessandro@intel.com>
CC: Ram Amrani <Ram.Amrani@Cavium.com>
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Christoph Hellwig 21c433a74b IB/hfi1: Use pcie_flr() instead of duplicating it
Tested-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Doug Ledford <dledford@redhat.com>
2017-04-25 14:36:19 -05:00
Zhu Yanjun 8d2216be28 IB/core: change the return type to void
The function ib_unregister_mad_agent always returns zero. And
this returned value is not checked. As such, chane the return
type to void.

CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:30:26 -04:00
Ira Weiny e8ea95af87 IB/hfi: Fix up comments in engine mapping
Fix off by 1 error in comments documenting the sdma and send context
mappings.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:24:51 -04:00
Vlad Tsyrklevich 4f7f4dcfff infiniband/uverbs: Fix integer overflows
The 'num_sge' variable is verfied to be smaller than the 'sge_count'
variable; however, since both are user-controlled it's possible to cause
an integer overflow for the kmalloc multiply on 32-bit platforms
(num_sge and sge_count are both defined u32). By crafting an input that
causes a smaller-than-expected allocation it's possible to write
controlled data out-of-bounds.

Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:18:02 -04:00
Arnd Bergmann 5b0ff9a007 infiniband: hns: avoid gcc-7.0.1 warning for uninitialized data
hns_roce_v1_cq_set_ci() calls roce_set_bit() on an uninitialized field,
which will then change only a few of its bits, causing a warning with
the latest gcc:

infiniband/hw/hns/hns_roce_hw_v1.c: In function 'hns_roce_v1_cq_set_ci':
infiniband/hw/hns/hns_roce_hw_v1.c:1854:23: error: 'doorbell[1]' is used uninitialized in this function [-Werror=uninitialized]
  roce_set_bit(doorbell[1], ROCEE_DB_OTHERS_H_ROCEE_DB_OTH_HW_SYNS_S, 1);

The code is actually correct since we always set all bits of the
port_vlan field, but gcc correctly points out that the first
access does contain uninitialized data.

This initializes the field to zero first before setting the
individual bits.

Fixes: 9a4435375c ("IB/hns: Add driver files for hns RoCE driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:16:38 -04:00
Petr Mladek 50b6778c44 IB/fmr_pool: Convert the cleanup thread into kthread worker API
Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing,
awakening. In many cases it is unclear to say in which state
it is and sometimes it is done a wrong way.

The plan is to convert kthreads into kthread_worker or workqueues
API. It allows to split the functionality into separate operations.
It helps to make a better structure. Also it defines a clean state
where no locks are taken, IRQs blocked, the kthread might sleep
or even be safely migrated.

The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that it is
available when needed. Also it allows a better control, e.g.
define a scheduling priority.

This patch converts the frm_pool kthread into the kthread worker
API because I am not sure how busy the thread is. It is well
possible that it does not need a dedicated kthread and workqueues
would be perfectly fine. Well, the conversion between kthread
worker API and workqueues is pretty trivial.

The patch moves one iteration from the kthread into the work function.
It is queued only when there is a pending work. Therefore we do not
need to compare flush_ser and req_ser at the beginning. On the contrary,
the same work could be queued only once at a time. Therefore it has to
re-queue itself if some requests are pending.

Otherwise, wake_up_process() is replaced by queuing the work.

Important: The change is only compile tested. I did not find an easy
way how to check it in a real life.

Signed-off-by: Petr Mladek <pmladek@suse.com>
TO: Doug Ledford <dledford@redhat.com>
CC: Sean Hefty <sean.hefty@intel.com>
CC: Hal Rosenstock <hal.rosenstock@gmail.com>
CC: linux-rdma@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 14:24:17 -04:00
Yuval Shaia 4d6f28591f {net,IB}/{rxe,usnic}: Utilize generic mac to eui32 function
This logic seems to be duplicated in (at least) three separate files.
Move it to one place so code can be re-use.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2017-04-25 14:21:34 -04:00
Yuval Shaia a7c81326ca IB/usnic: Remove unused functions
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 14:12:14 -04:00
Colin Ian King 05a24b9b7f IB/iser: fix spelling mistake: "unexepected" -> "unexpected"
trivial fix to spelling mistake in iser_err error message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 14:06:49 -04:00
Ganesh Goudar e821303c42 iw_cxgb4: Use dsgl by default
Enable the use of dsgl by default and determine whether dsgl is
supported from lld info.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Bharat Potnuri <bharat@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 14:04:41 -04:00
Doug Ledford 374cb8610a RDMA/bnxt_re: Use IS_ERR_OR_NULL where appropriate
Constructs such as if (ptr && !IS_ERR(ptr)) can be shorted to
just !IS_ERR_OR_NULL(ptr) instead.  Make substitutions in the bnxt_re
driver where appropriate.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 14:00:59 -04:00
Colin Ian King ebbd1dfb26 RDMA/bnxt_re: remove redundant initialization of rc to zero
rc is initialized to zero but is then updated by calls to
bnxt_qplib_free_fast_reg_page_list and/or bnxt_qpliob_free_mrw
so the initialization is redundant and can be removed.

Detected with CoverityScan, CID#1408448 ("Unused Value")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 13:36:59 -04:00
Noa Osherovich f1b65df5a2 IB/mlx5: Add support for active_width and active_speed in RoCE
Add missing calculation and translation of active_width and
active_speed for RoCE.

Fixes: 3f89a643eb ('IB/mlx5: Extend query_device/port to ...')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:58:41 -04:00
Noa Osherovich 50f22fd8ec IB/mlx5: Set mlx5_query_roce_port's return value to void
In case of an error, the properties reported to user
are zeroed out, so no need for a return value.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:54:51 -04:00
Noa Osherovich 12113a35ad IB/core: Add HDR speed enum
Add high data rate speed to the ib_port_speed enumeration.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:29:31 -04:00
Moni Shoua 12f8fedef2 IB/mlx5: Set correct SL in completion for RoCE
There is a difference when parsing a completion entry between Ethernet
and IB ports. When link layer is Ethernet the bits describe the type of
L3 header in the packet. In the case when link layer is Ethernet and VLAN
header is present the value of SL is equal to the 3 UP bits in the VLAN
header. If VLAN header is not present then the SL is undefined and consumer
of the completion should check if IB_WC_WITH_VLAN is set.

While that, this patch also fills the vlan_id field in the completion if
present.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:29:31 -04:00
Moni Shoua 61c0ddbe97 IB/cma: Send MRA for reply messages
Current implementation of RDMA_CM sends MRA (Message Receipt
Acknowledgment) only for request messages but not for response messages.

As a result, a slow active side of the connection may send a ready-to-use
message to the passive side in a delay that is too long for the passive
side to wait for.

This patch adds a call to ib_send_cm_mra() upon receiving a response
message and by this tells the other side to modify the service timeout
to a bigger value, 16 times than before. As in the request case, MRA
for reply will be sent only if a duplicate response has arrived.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Matan Barak <matan@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:29:31 -04:00
Parav Pandit e1f24a79f4 IB/mlx5: Support congestion related counters
This patch adds support to query the congestion related hardware counters
through new command and links them with other hw counters being available
in hw_counters sysfs location.

In order to reuse existing infrastructure it renames related q_counter
data structures to more generic counters to reflect q_counters and
congestion counters and maybe some other counters in the future.

New hardware counters:
 * rp_cnp_handled - CNP packets handled by the reaction point
 * rp_cnp_ignored - CNP packets ignored by the reaction point
 * np_cnp_sent    - CNP packets sent by notification point to respond to
                     CE marked RoCE packets
 * np_ecn_marked_roce_packets - CE marked RoCE packets received by
                                notification point

It also avoids returning ENOSYS which is specific for invalid
system call and produces the following checkpatch.pl warning.

WARNING: ENOSYS means 'invalid syscall nr' and nothing else
+		return -ENOSYS;

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:29:31 -04:00
Leon Romanovsky a43402af1e IB/mthca: Check validity of output parameter pointer
The mthca driver didn't check supplied pointer to functions
mthca_cmd_poll() and mthca_cmd_wait(). This caused to the following
smatch errors:

drivers/infiniband/hw/mthca/mthca_cmd.c:371 mthca_cmd_poll() error: we previously assumed 'out_param' could be null (see line 353)
drivers/infiniband/hw/mthca/mthca_cmd.c:454 mthca_cmd_wait() error: we previously assumed 'out_param' could be null (see line 432)

In reality all callers of these functions are setting out_is_imm
flag are providing pointer too. However it is better to check
again to remove smatch errors to achieve warning free subsystem.

Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:29:31 -04:00
Slava Shwartsman a22ed86cff IB/mlx5: Add drop flow steering rule support
A drop rule is described by an action drop and no destination.
If a user specified IB_FLOW_SPEC_ACTION_DROP then set the action
to MLX5_FLOW_CONTEXT_ACTION_DROP and clear the destination.

Signed-off-by: Slava Shwartsman <slavash@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:29:31 -04:00
Slava Shwartsman 483a3966b5 IB/core: Introduce drop flow specification
This flow steering specification identifies flow for drop by the HW.
If user create a flow only with the drop specification,
then all the packets that hit this flow will be dropped, otherwise the HW
will drop only the packets that match the other L2/L3/L4 specifications.

Signed-off-by: Slava Shwartsman <slavash@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Ariel Levkovich 19cc75249a IB/mlx5: Use IP version matching to classify IP traffic
This change adds the ability for flow steering to classify IPv4/6
packets with MPLS tag (Ethertype 0x8847 and 0x8848) as standard IP
packets and hit IPv4/6 classifed steering rules.

When user added a flow rule with IP classification, driver was
implicitly adding ethertype matching to the created rule in order
to distinguish between IPv4 and IPv6 protocols.
Since IP packets with MPLS tag header have MPLS ethertype, they missed
the rule and ended up hitting the default filters.
Such behavior prevented from MPLS packets to undergo inbound traffic
load balancing flows (if such were defined by configuring RSS) to
achieve higher throughput - the way that non-MPLS IP packets performed.

Since our device is able to look past the MPLS tag and identify the
next protocol we introduce this solution which replaces Ethertype
matching by the device's capability to perform IP version parsing
and matching in order to distinguish between IPv4 and IPv6.
Therefore, whenever a flow with IP spec is added and device support IP
version matching, driver will implicitly add IP version matching to the
rule (Based on the IP spec type) without Ethertype matching which will
cause relevant MPLS tagged packets to hit this rule as well.
Otherwise (device doesn't support IP version matching), we fall back to
setting Ethertype matching.

If the user's filters specify an L2 ethertype and an IP spec
the rule will then match both the ethertype and the IP version.

The device's support for IP version matching is reported by the
device via dedicated capability bit in query_device_cap and named
outer/inner_ip_version.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Ariel Levkovich 0f750966dc IB/mlx5: Add inner spec and IPv6 validation in user's flow attribute list
This change fixes an incomplete validation of the user's
flow attributes list.

Previous implementation validated only matching of IPv4 Ethertype
to IPv4 spec of outer headers (in case both Ethernet with specified
Ethertype and IP specs were present) and lacked the validation of:
1. Matching of IPv6 Ethertype in Ethernet spec (if such exists) to an
   IPv6 protocol spec (if such exists).
2. Validation of Ethertype to IP protocol matching on inner headers specs.
Which could cause some combinations of unmatching Ethernet and IP
protocols to pass validation and apply on the device.

The fix adds validation of IPv6 Ethertype and IP spec as well as
performing the scan on both outer and inner attributes.

Fixes: 038d2ef875 ("Add flow steering support")
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Bodong Wang 44f2e99ecd IB/mlx5: Fix wrong use of kfree at bad flow in create_cq_user
The kfree was called to free cqb, while it should free *cqb.

Fixes: 1cbe6fc86c ("IB/mlx5: Add support for CQE compressing")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Maor Gottlieb 00b7c2abb6 IB/mlx5: Enlarge autogroup flow table
In order to enlarge the flow group size to 8k, we decrease
the number of flow group types to 6 and increase the flow
table size to 64k.

Flow group size is calculated as follow:
  group_size = table_size / (#group_types + 1)

Fixes: 038d2ef875 ('IB/mlx5: Add flow steering support')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Maor Gottlieb dac388ef4c IB/mlx5: Check supported flow table size
Check that the required flow table size is supported
by device. Return ENOMEM error if no space left.

In addition change the create flow table routine
to return ENOMEM instead of ENOSPC.

Fixes: 038d2ef875 ('IB/mlx5: Add flow steering support')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Maor Gottlieb 1377661298 IB/mlx5: Change vma from shared to private
Anonymous VMA (->vm_ops == NULL) cannot be shared, otherwise
it would lead to SIGBUS.

Remove the shared flags from the vma after we change it to be
anonymous.

This is easily reproduced by doing modprobe -r while running a
user-space application such as raw_ethernet_bw.

Fixes: 7c2344c3bb ('IB/mlx5: Implements disassociate_ucontext API')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Maor Gottlieb ecc7d83be3 IB/mlx5: Take write semaphore when changing the vma struct
When the driver disassociate user context, it changes the vma to
anonymous by setting the vm_ops to null and zap the vma ptes.

In order to avoid race in the kernel, we need to take write lock
before we change the vma entries.

Fixes: 7c2344c3bb ('IB/mlx5: Implements disassociate_ucontext API')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Maor Gottlieb ca37a664a8 IB/mlx4: Change vma from shared to private
Anonymous VMA (->vm_ops == NULL) cannot be shared, otherwise
it would lead to SIGBUS.

Remove the shared flags from the vma after we change it to be
anonymous.

This is easily reproduced by doing modprobe -r while running a
user-space application such as raw_ethernet_bw.

Fixes: ae184ddeca ('IB/mlx4_ib: Disassociate support')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Maor Gottlieb 22c3653d04 IB/mlx4: Take write semaphore when changing the vma struct
When the driver disassociate user context, it changes the vma to
anonymous by setting the vm_ops to null and zap the vma ptes.

In order to avoid race in the kernel, we need to take write lock
before we change the vma entries.

Fixes: ae184ddeca ('IB/mlx4_ib: Disassociate support')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Jack Morgenstein fb7a91746a IB/mlx4: Reduce SRIOV multicast cleanup warning message to debug level
A warning message during SRIOV multicast cleanup should have actually been
a debug level message. The condition generating the warning does no harm
and can fill the message log.

In some cases, during testing, some tests were so intense as to swamp the
message log with these warning messages, causing a stall in the console
message log output task. This stall caused an NMI to be sent to all CPUs
(so that they all dumped their stacks into the message log).
Aside from the message flood causing an NMI, the tests all passed.

Once the message flood which caused the NMI is removed (by reducing the
warning message to debug level), the NMI no longer occurs.

Sample message log (console log) output illustrating the flood and
resultant NMI (snippets with comments and modified with ... instead
of hex digits, to satisfy checkpatch.pl):

 <mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!...
 *** About 4000 almost identical lines in less than one second ***
 <mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!...
 INFO: rcu_sched detected stalls on CPUs/tasks: { 17} (...)
 *** { 17} above indicates that CPU 17 was the one that stalled ***
 sending NMI to all CPUs:
 ...
 NMI backtrace for cpu 17
 CPU: 17 PID: 45909 Comm: kworker/17:2
 Hardware name: HP ProLiant DL360p Gen8, BIOS P71 09/08/2013
 Workqueue: events fb_flashcursor
 task: ffff880478...... ti: ffff88064e...... task.ti: ffff88064e......
 RIP: 0010:[ffffffff81......]  [ffffffff81......] io_serial_in+0x15/0x20
 RSP: 0018:ffff88064e257cb0  EFLAGS: 00000002
 RAX: 0000000000...... RBX: ffffffff81...... RCX: 0000000000......
 RDX: 0000000000...... RSI: 0000000000...... RDI: ffffffff81......
 RBP: ffff88064e...... R08: ffffffff81...... R09: 0000000000......
 R10: 0000000000...... R11: ffff88064e...... R12: 0000000000......
 R13: 0000000000...... R14: ffffffff81...... R15: 0000000000......
 FS:  0000000000......(0000) GS:ffff8804af......(0000) knlGS:000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080......
 CR2: 00007f2a2f...... CR3: 0000000001...... CR4: 0000000000......
 DR0: 0000000000...... DR1: 0000000000...... DR2: 0000000000......
 DR3: 0000000000...... DR6: 00000000ff...... DR7: 0000000000......
 Stack:
 ffff88064e...... ffffffff81...... ffffffff81...... 0000000000......
 ffffffff81...... ffff88064e...... ffffffff81...... ffffffff81......
 ffffffff81...... ffff88064e...... ffffffff81...... 0000000000......
 Call Trace:
[<ffffffff813d099b>] wait_for_xmitr+0x3b/0xa0
[<ffffffff813d0b5c>] serial8250_console_putchar+0x1c/0x30
[<ffffffff813d0b40>] ? serial8250_console_write+0x140/0x140
[<ffffffff813cb5fa>] uart_console_write+0x3a/0x80
[<ffffffff813d0aae>] serial8250_console_write+0xae/0x140
[<ffffffff8107c4d1>] call_console_drivers.constprop.15+0x91/0xf0
[<ffffffff8107d6cf>] console_unlock+0x3bf/0x400
[<ffffffff813503cd>] fb_flashcursor+0x5d/0x140
[<ffffffff81355c30>] ? bit_clear+0x120/0x120
[<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[<ffffffff810a5aef>] kthread+0xcf/0xe0
[<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[<ffffffff81645858>] ret_from_fork+0x58/0x90
[<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
Code: 48 89 e5 d3 e6 48 63 f6 48 03 77 10 8b 06 5d c3 66 0f 1f 44 00 00 66 66 66 6

As indicated in the stack trace above, the console output task got swamped.

Fixes: b9c5d6a643 ("IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV")
Cc: <stable@vger.kernel.org> # v3.6+
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Jack Morgenstein 99e68909d5 IB/mlx4: Fix ib device initialization error flow
In mlx4_ib_add, procedure mlx4_ib_alloc_eqs is called to allocate EQs.

However, in the mlx4_ib_add error flow, procedure mlx4_ib_free_eqs is not
called to free the allocated EQs.

Fixes: e605b743f3 ("IB/mlx4: Increase the number of vectors (EQs) available for ULPs")
Cc: <stable@vger.kernel.org> # v3.4+
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Majd Dibbiny dd77abf8a0 IB/mlx4: Support RAW Ethernet when RoCE is disabled
On some environments, such as certain SR-IOV VF configurations, RoCE
isn't supported for mlx4 Ethernet ports. Currently the driver will
not open IB device on that port.

This is problematic since we do want user-space RAW Ethernet QPs functionality
to remain in place. For that end, enhance the relevant driver flows such that we
do create a device instance in that case.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Jack Morgenstein b312be3d87 IB/core: Fix sysfs registration error flow
The kernel commit cited below restructured ib device management
so that the device kobject is initialized in ib_alloc_device.

As part of the restructuring, the kobject is now initialized in
procedure ib_alloc_device, and is later added to the device hierarchy
in the ib_register_device call stack, in procedure
ib_device_register_sysfs (which calls device_add).

However, in the ib_device_register_sysfs error flow, if an error
occurs following the call to device_add, the cleanup procedure
device_unregister is called. This call results in the device object
being deleted -- which results in various use-after-free crashes.

The correct cleanup call is device_del -- which undoes device_add
without deleting the device object.

The device object will then (correctly) be deleted in the
ib_register_device caller's error cleanup flow, when the caller invokes
ib_dealloc_device.

Fixes: 55aeed0654 ("IB/core: Make ib_alloc_device init the kobject")
Cc: <stable@vger.kernel.org> # v4.2+
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Parav Pandit 4be3a4fa51 IB/core: Fix kernel crash during fail to initialize device
This patch fixes the kernel crash that occurs during ib_dealloc_device()
called due to provider driver fails with an error after
ib_alloc_device() and before it can register using ib_register_device().

This crashed seen in tha lab as below which can occur with any IB device
which fails to perform its device initialization before invoking
ib_register_device().

This patch avoids touching cache and port immutable structures if device
is not yet initialized.
It also releases related memory when cache and port immutable data
structure initialization fails during register_device() state.

[81416.561946] BUG: unable to handle kernel NULL pointer dereference at (null)
[81416.570340] IP: ib_cache_release_one+0x29/0x80 [ib_core]
[81416.576222] PGD 78da66067
[81416.576223] PUD 7f2d7c067
[81416.579484] PMD 0
[81416.582720]
[81416.587242] Oops: 0000 [#1] SMP
[81416.722395] task: ffff8807887515c0 task.stack: ffffc900062c0000
[81416.729148] RIP: 0010:ib_cache_release_one+0x29/0x80 [ib_core]
[81416.735793] RSP: 0018:ffffc900062c3a90 EFLAGS: 00010202
[81416.741823] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[81416.749785] RDX: 0000000000000000 RSI: 0000000000000282 RDI: ffff880859fec000
[81416.757757] RBP: ffffc900062c3aa0 R08: ffff8808536e5ac0 R09: ffff880859fec5b0
[81416.765708] R10: 00000000536e5c01 R11: ffff8808536e5ac0 R12: ffff880859fec000
[81416.773672] R13: 0000000000000000 R14: ffff8808536e5ac0 R15: ffff88084ebc0060
[81416.781621] FS:  00007fd879fab740(0000) GS:ffff88085fac0000(0000) knlGS:0000000000000000
[81416.790522] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[81416.797094] CR2: 0000000000000000 CR3: 00000007eb215000 CR4: 00000000003406e0
[81416.805051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[81416.812997] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[81416.820950] Call Trace:
[81416.824226]  ib_device_release+0x1e/0x40 [ib_core]
[81416.829858]  device_release+0x32/0xa0
[81416.834370]  kobject_cleanup+0x63/0x170
[81416.839058]  kobject_put+0x25/0x50
[81416.843319]  ib_dealloc_device+0x25/0x40 [ib_core]
[81416.848986]  mlx5_ib_add+0x163/0x1990 [mlx5_ib]
[81416.854414]  mlx5_add_device+0x5a/0x160 [mlx5_core]
[81416.860191]  mlx5_register_interface+0x8d/0xc0 [mlx5_core]
[81416.866587]  ? 0xffffffffa09e9000
[81416.870816]  mlx5_ib_init+0x15/0x17 [mlx5_ib]
[81416.876094]  do_one_initcall+0x51/0x1b0
[81416.880861]  ? __vunmap+0x85/0xd0
[81416.885113]  ? kmem_cache_alloc_trace+0x14b/0x1b0
[81416.890768]  ? vfree+0x2e/0x70
[81416.894762]  do_init_module+0x60/0x1fa
[81416.899441]  load_module+0x15f6/0x1af0
[81416.904114]  ? __symbol_put+0x60/0x60
[81416.908709]  ? ima_post_read_file+0x3d/0x80
[81416.913828]  ? security_kernel_post_read_file+0x6b/0x80
[81416.920006]  SYSC_finit_module+0xa6/0xf0
[81416.924888]  SyS_finit_module+0xe/0x10
[81416.929568]  entry_SYSCALL_64_fastpath+0x1a/0xa9
[81416.935089] RIP: 0033:0x7fd879494949
[81416.939543] RSP: 002b:00007ffdbc1b4e58 EFLAGS: 00000202 ORIG_RAX: 0000000000000139
[81416.947982] RAX: ffffffffffffffda RBX: 0000000001b66f00 RCX: 00007fd879494949
[81416.955965] RDX: 0000000000000000 RSI: 000000000041a13c RDI: 0000000000000003
[81416.963926] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000001b652a0
[81416.971861] R10: 0000000000000003 R11: 0000000000000202 R12: 00007ffdbc1b3e70
[81416.979763] R13: 00007ffdbc1b3e50 R14: 0000000000000005 R15: 0000000000000000
[81417.008005] RIP: ib_cache_release_one+0x29/0x80 [ib_core] RSP: ffffc900062c3a90
[81417.016045] CR2: 0000000000000000

Fixes: 55aeed0654 ("IB/core: Make ib_alloc_device init the kobject")
Fixes: 7738613e7c ("IB/core: Add per port immutable struct to ib_device")
Cc: <stable@vger.kernel.org> # v4.2+
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Feras Daoud 3e31a490e0 IB/ipoib: Fix deadlock between ipoib_stop and mcast join flow
Before calling ipoib_stop, rtnl_lock should be taken, then
the flow clears the IPOIB_FLAG_ADMIN_UP and IPOIB_FLAG_OPER_UP
flags, and waits for mcast completion if IPOIB_MCAST_FLAG_BUSY
is set.

On the other hand, the flow of multicast join task initializes
a mcast completion, sets the IPOIB_MCAST_FLAG_BUSY and calls
ipoib_mcast_join. If IPOIB_FLAG_OPER_UP flag is not set, this
call returns EINVAL without setting the mcast completion and
leads to a deadlock.

    ipoib_stop                          |
        |                               |
    clear_bit(IPOIB_FLAG_ADMIN_UP)      |
        |                               |
    Context Switch                      |
        |                       ipoib_mcast_join_task
        |                               |
        |                       spin_lock_irq(lock)
        |                               |
        |                       init_completion(mcast)
        |                               |
        |                       set_bit(IPOIB_MCAST_FLAG_BUSY)
        |                               |
        |                       Context Switch
        |                               |
    clear_bit(IPOIB_FLAG_OPER_UP)       |
        |                               |
    spin_lock_irqsave(lock)             |
        |                               |
    Context Switch                      |
        |                       ipoib_mcast_join
        |                       return (-EINVAL)
        |                               |
        |                       spin_unlock_irq(lock)
        |                               |
        |                       Context Switch
        |                               |
    ipoib_mcast_dev_flush               |
    wait_for_completion(mcast)          |

ipoib_stop will wait for mcast completion for ever, and will
not release the rtnl_lock. As a result panic occurs with the
following trace:

    [13441.639268] Call Trace:
    [13441.640150]  [<ffffffff8168b579>] schedule+0x29/0x70
    [13441.641038]  [<ffffffff81688fc9>] schedule_timeout+0x239/0x2d0
    [13441.641914]  [<ffffffff810bc017>] ? complete+0x47/0x50
    [13441.642765]  [<ffffffff810a690d>] ? flush_workqueue_prep_pwqs+0x16d/0x200
    [13441.643580]  [<ffffffff8168b956>] wait_for_completion+0x116/0x170
    [13441.644434]  [<ffffffff810c4ec0>] ? wake_up_state+0x20/0x20
    [13441.645293]  [<ffffffffa05af170>] ipoib_mcast_dev_flush+0x150/0x190 [ib_ipoib]
    [13441.646159]  [<ffffffffa05ac967>] ipoib_ib_dev_down+0x37/0x60 [ib_ipoib]
    [13441.647013]  [<ffffffffa05a4805>] ipoib_stop+0x75/0x150 [ib_ipoib]

Fixes: 08bc327629 ("IB/ipoib: fix for rare multicast join race condition")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 11:45:55 -04:00
Feras Daoud 9a9b811269 IB/ipoib: Update broadcast object if PKey value was changed in index 0
Update the broadcast address in the priv->broadcast object when the
Pkey value changes in index 0, otherwise the multicast GID value will
keep the previous value of the PKey, and will not be updated.
This leads to interface state down because the interface will keep the
old PKey value.

For example, in SR-IOV environment, if the PF changes the value of PKey
index 0 for one of the VFs, then the VF receives PKey change event that
triggers heavy flush. This flush calls update_parent_pkey that update the
broadcast object and its relevant members. If in this case the multicast
GID will not be updated, the interface state will be down.

Fixes: c290414169 ("IPoIB: Fix pkey change flow for virtualization environments")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 11:45:55 -04:00
yonatanc 4ed6ad1eb3 IB/rxe: Cache dst in QP instead of getting it for each send
In RC QP there is no need to resolve the outgoing interface
for each packet, as this does not change during QP life cycle.

Instead cache the interface on the socket and use that one.
This improves performance by 12% by sparing redundant
calls to rxe_find_route.

ib_send_bw -d rxe0  -x 1 -n 9000 -e  -s $((1024 * 1024 )) -l 100

----------------------------------------------------------------------------------------
|        | bytes   | iterations | BW peak[MB/sec] | BW average[MB/sec] | MsgRate[Mpps] |
----------------------------------------------------------------------------------------
| before | 1048576 | 9000       | inf             | 551.21             | 0.000551      |
| after  | 1048576 | 9000       | inf             | 615.54             | 0.000616      |
----------------------------------------------------------------------------------------

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 10:45:02 -04:00
yonatanc cee2688e3c IB/rxe: Offload CRC calculation when possible
Use CPU ability to perform CRC calculations, by
replacing direct calls to crc32_le() with crypto_shash_updata().

The overall performance gain measured with ib_send_bw tool is 10% and it
was tested on "Intel CPU ES-2660 v2 @ 2.20Ghz" CPU.

ib_send_bw -d rxe0  -x 1 -n 9000 -e  -s $((1024 * 1024 )) -l 100

---------------------------------------------------------------------------------------------
|             | bytes   | iterations | BW peak[MB/sec] | BW average[MB/sec] | MsgRate[Mpps] |
---------------------------------------------------------------------------------------------
| crc32_le    | 1048576 | 9000       | inf             | 497.60             | 0.000498      |
| CRC offload | 1048576 | 9000       | inf             | 546.70             | 0.000547      |
---------------------------------------------------------------------------------------------

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 10:45:02 -04:00
Parav Pandit 0d38ac8a8b IB/rxe: Do not export module's private function
Function rxe_rcv is used internally in RXE and don't need to be
exported. This patch removes such export declaration.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 10:43:28 -04:00
Parav Pandit 99fc12f60e IB/rxe: Avoid accessing timers for non RC QPs
This patch avoids RNR NAK timer and retransmit timer initialization and
cleanup for non RC QPs (such as UD QP, GSI QP).

Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 10:43:28 -04:00
Yonatan Cohen 0b1e5b99a4 IB/rxe: Add port protocol stats
Expose new counters using the get_hw_stats callback.
We expose the following counters:

+---------------------+----------------------------------------+
|      Name           |           Description                  |
|---------------------+----------------------------------------|
|sent_pkts            | number of sent pkts                    |
|---------------------+----------------------------------------|
|rcvd_pkts            | number of received packets             |
|---------------------+----------------------------------------|
|out_of_sequence      | number of errors due to packet         |
|                     | transport sequence number              |
|---------------------+----------------------------------------|
|duplicate_request    | number of received duplicated packets. |
|                     | A request that previously executed is  |
|                     | named duplicated.                      |
|---------------------+----------------------------------------|
|rcvd_rnr_err         | number of received RNR by completer    |
|---------------------+----------------------------------------|
|send_rnr_err         | number of sent RNR by responder        |
|---------------------+----------------------------------------|
|rcvd_seq_err         | number of out of sequence packets      |
|                     | received                               |
|---------------------+----------------------------------------|
|ack_deffered         | number of deferred handling of ack     |
|                     | packets.                               |
|---------------------+----------------------------------------|
|retry_exceeded_err   | number of times retry exceeded         |
|---------------------+----------------------------------------|
|completer_retry_err  | number of times completer decided to   |
|                     | retry                                  |
|---------------------+----------------------------------------|
|send_err             | number of failed send packet           |
+---------------------+----------------------------------------+

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 10:43:28 -04:00
Doug Ledford 339e7575ad cxgb4: Convert PDBG to pr_debug the second
A couple spots were missed in the original patch to implement this
change.  Add those spots.

Fixes: a9a42886d0 (cxgb4: Convert PDBG to pr_debug)
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 22:18:54 -04:00
Markus Elfring 4418b27b52 IB/hns: Use kcalloc() in hns_roce_buddy_init()
* Multiplications for the size determination of memory allocations
  indicated that array data structures should be processed.
  Thus use the corresponding function "kcalloc".

  This issue was detected by using the Coccinelle software.

* Replace the specification of data types by pointer dereferences
  to make the corresponding size determinations a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 16:31:49 -04:00
Markus Elfring e1d717de5d IB/hns: Use kmalloc_array() in hns_roce_cmd_use_events()
* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "kmalloc_array".

  This issue was detected by using the Coccinelle software.

* Replace the specification of a data structure by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 16:31:49 -04:00
Markus Elfring db6f0289f5 IB/hfi1: Coding style improvement (make sizeof use safer)
Replace the specification of a data structure by a reference to
the desired member as the parameter for the operator "sizeof" to make
the corresponding size determination a bit safer according to
the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 16:25:04 -04:00
Markus Elfring e036c2006c IB/hfi1: Remove intermediate var in hfi1_user_sdma_alloc_queues()
* Pass a product for a call of the function "vmalloc_user" without storing
  it in an intermediate variable.

* Delete the local variable "memsize" which became unnecessary with
  this refactoring.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 16:24:05 -04:00
Markus Elfring 147d84e1e3 IB/hfi1: Use kcalloc() in hfi1_user_sdma_alloc_queues()
* Multiplications for the size determination of memory allocations
  indicated that array data structures should be processed.
  Thus reuse the corresponding function "kcalloc".

  This issue was detected by using the Coccinelle software.

* Replace the specification of a data type by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 16:23:25 -04:00
Markus Elfring 4076e5187d IB/hfi1: Use kcalloc() in hfi1_user_exp_rcv_init()
* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus reuse the corresponding function "kcalloc".

  This issue was detected by using the Coccinelle software.

* Replace the specification of a data type by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 16:23:25 -04:00
Joe Perches a9a42886d0 cxgb4: Convert PDBG to pr_debug
Use a more typical logging style.

Miscellanea:

o Obsolete the c4iw_debug module parameter
o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 16:14:13 -04:00
Joe Perches 700456bd25 cxgb4: Use more common logging style
Convert printks to pr_<level>

Miscellanea:

o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 16:13:20 -04:00
Joe Perches b7b37ee0e1 cxgb3: Convert PDBG to pr_debug
Using the normal mechanism, not an indirected one, is clearer.

Miscellanea:

o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 16:13:20 -04:00
Joe Perches 46b2d4e8ec cxgb3: Use more common logging style
Convert printks to pr_<level>

Miscellanea:

o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 16:13:20 -04:00
Erez Shitrit cd565b4b51 IB/IPoIB: Support acceleration options callbacks
IPoIB driver now uses the new set of callback functions.

If the hardware provider supports the new ipoib_options implementation,
the driver uses the callbacks in its data path flows, otherwise it uses the
driver default implementation for all data flows in its code.

The default implementation wasn't change and it is exactly as it was before
introduction of acceleration support.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 15:19:44 -04:00
Erez Shitrit c1048aff7e IB/IPoIB: Use defined function for netdev_priv function
Make ipoib_priv point to netdev_priv where the code calls netdev_priv.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 15:19:44 -04:00
Erez Shitrit 10adcbd2d5 IB/IPoIB: Rename qpn to be dqpn in ipoib_send and post_send functions
Change of function parameter name from qpn to be dqpn.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 15:19:43 -04:00
Erez Shitrit 7ce1a3ee02 IB/IPoIB: Separate control from HW operation on ipoib_open/stop ndo
This patch is preparing the netdev part at the IPoIB driver to be able
to use the ipoib_options.

It deals with the two flows from the .ndo: ipoib_open and ipoib_stop.

The code is rearranged as follows:
 * All operations which deal with the hardware resources, (for example
   change QP state, post-receive etc.) are performed in one place.
 * All operations that are control oriented (like restart multicast task,
   start the reap_ah etc.) are performed in separate place.

The functions that deal with the hardware resources now located at
__ipoib_ib_dev_open for the ipoib_open flow and __ipoib_ib_dev_stop
for ipoib_stop.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 15:19:43 -04:00
Erez Shitrit 515ed4f3aa IB/IPoIB: Separate control and data related initializations
This patch prepares init and teardown flows so we can call them
through ipoib_options function pointers.

It arranges that area of code as the following:
 * All operations which deal with the resource allocation/deletion
   are performed in one place.
 * All operations that are control oriented, meaning that they are not
   connected to a specific hardware, are performed in a separate place.

The operations for allocation of hardware resources are now in the
function ipoib_dev_init_default, and the deletion of all the resources
are in ipoib_dev_uninit_default

The only exception is the creation of the PD object,
which is used both for resource allocation (create QP etc.)
and for control flows like creating AH.

It also does:
 * Move creation of rx_ring and tx_ring to be in the resources
   allocation area.
 * Move the function ipoib_ib_dev_open that does the open device
   to the control area instead of the dev_init which creates resources.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 15:19:42 -04:00
Vishwanathapura, Niranjana 64551ede6c IB/hfi1: VNIC SDMA support
HFI1 VNIC SDMA support enables transmission of VNIC packets over SDMA.
Map VNIC queues to SDMA engines and support halting and wakeup of the
VNIC queues.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 15:19:41 -04:00
Vishwanathapura, Niranjana 2280740f01 IB/hfi1: Virtual Network Interface Controller (VNIC) HW support
HFI1 HW specific support for VNIC functionality.
Dynamically allocate a set of contexts for VNIC when the first vnic
port is instantiated. Allocate VNIC contexts from user contexts pool
and return them back to the same pool while freeing up. Set aside
enough MSI-X interrupts for VNIC contexts and assign them when the
contexts are allocated. On the receive side, use an RSM rule to
spread TCP/UDP streams among VNIC contexts.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 15:19:35 -04:00
Vishwanathapura, Niranjana d4829ea603 IB/hfi1: OPA_VNIC RDMA netdev support
Add support to create and free OPA_VNIC rdma netdev devices.
Implement netstack interface functionality including xmit_skb,
receive side NAPI etc. Also implement rdma netdev control functions.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:03:12 -04:00
Vishwanathapura, Niranjana 1bd671ab3f IB/opa-vnic: VNIC Ethernet Management Agent (VEMA) function
OPA VEMA function interfaces with the Infiniband MAD stack to exchange the
management information packets with the Ethernet Manager (EM).
It interfaces with the OPA VNIC netdev function to SET/GET the management
information. The information exchanged with the EM includes class port
details, encapsulation configuration, various counters, unicast and
multicast MAC list and the MAC table. It also supports sending traps
to the EM.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:38 -04:00
Vishwanathapura, Niranjana cfd34f8eb0 IB/opa-vnic: VNIC Ethernet Management Agent (VEMA) interface
OPA VNIC EMA interface functions are the management interfaces to the OPA
VNIC netdev. Add support to add and remove VNIC ports. Implement the
required GET/SET management interface functions and processing of new
management information. Add support to send trap notifications upon various
events like interface status change, unicast/multicast mac list update and
mac address change.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:38 -04:00
Vishwanathapura, Niranjana 174e03d7e6 IB/opa-vnic: VNIC MAC table support
OPA VNIC MAC table contains the MAC address to DLID mappings provided by
the Ethernet manager. During transmission, the MAC table provides the MAC
address to DLID translation. Implement MAC table using simple hash list.
Also provide support to update/query the MAC table by Ethernet manager.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:38 -04:00
Vishwanathapura, Niranjana 009b7dd40c IB/opa-vnic: VNIC statistics support
OPA VNIC driver statistics support maintains various counters including
standard netdev counters and the Ethernet manager defined counters.
Add the Ethtool hook to read the counters.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:38 -04:00
Vishwanathapura, Niranjana 72dc761440 IB/opa-vnic: VNIC Ethernet Management (EM) structure definitions
Define VNIC EM MAD structures and the associated macros. These structures
are used for information exchange between VNIC EM agent (EMA) on the host
and the Ethernet manager. These include the virtual ethernet switch (vesw)
port information, vesw port mac table, summay and error counters,
vesw port interface mac lists and the EMA trap.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:38 -04:00
Vishwanathapura, Niranjana 7d6f728c67 IB/opa-vnic: Virtual Network Interface Controller (VNIC) netdev
OPA VNIC netdev function supports Ethernet functionality over Omni-Path
fabric by encapsulating Ethernet packets inside Omni-Path packet header.
It allocates a rdma netdev device and interfaces with the network stack to
provide standard Ethernet network interfaces. It overrides HFI1 device's
netdev operations where it is required.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:38 -04:00
Doug Ledford 23790ba2d7 Merge branch 'k.o/for-4.12' into k.o/for-4.12-rdma-netdevice 2017-04-20 12:00:41 -04:00
Matan Barak db1b5ddd53 IB/core: Rename uverbs event file structure
Previously, ib_uverbs_event_file was suffixed by _file as it contained
the actual file information. Since it's now only used as base struct
for ib_uverbs_async_event_file and ib_uverbs_completion_event_file,
we change its name to ib_uverbs_event_queue. This represents its
logical role better.

Fixes: 1e7710f3f6 ('IB/core: Change completion channel to use the reworked objects schema')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 11:44:07 -04:00
Matan Barak e0fcc61113 IB/core: Don't use is_async in event files to infer events size
Previously, we inferred the events size in ib_uverbs_event_read by
using the is_async flag. Instead of that, we pass the event size
directly.

Fixes: 1e7710f3f6 ('IB/core: Change completion channel to use the reworked objects schema')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 11:44:07 -04:00
Matan Barak c52d8114d1 IB/core: A small refactor in destroy WQ handler
Instead of having uverbs_uobject_put both in the error flow and the
good flow, we unite them.

Fixes: fd3c7904db ('IB/core: Change idr objects to use the new schema')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 11:44:07 -04:00
Matan Barak d9edfc5a4f IB/core: Nullify ib_uobject during allocation
Currently, we initialize all fields of ib_uobject straight after
allocation. Therefore, a kmalloc was sufficient. Since ib_uobject
could be embedded in a type specific structure, we nullify it to
spare programmer errors.

Fixes: 3832125624 ('IB/core: Add support for idr types')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 11:44:07 -04:00
Matan Barak f025c48958 IB/core: Don't pass the lock state to _rdma_remove_commit_uobject
The only scenario where this function was called while the lock is
already taken is in the context cleanup scenario. Thus, in order not
to pass the lock state to this function, we just call the remove logic
straight from the cleanup context function.

Fixes: 3832125624 ('IB/core: Add support for idr types')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 11:44:07 -04:00
Matan Barak 30004b861a IB/core: Rename write flag to exclusive in rdma_core
We rename the "write" flags to "exclusive", as it's used for both
WRITE and DESTROY actions.

Fixes: 3832125624 ('IB/core: Add support for idr types')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 11:44:07 -04:00
Erez Shitrit 93d576af3c hw/mlx5: Add New bit to check over QP creation
Add check for bit IB_QP_CREATE_NETIF_QP while creating QP.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:32 -04:00
Saeed Mahameed 258545449b net/mlx5e: IPoIB, Xmit flow
Implement mlx5e's IPoIB SKB transmit using the helper functions provided
by mlx5e ethernet tx flow, the only difference in the code between
mlx5e_xmit and mlx5i_xmit is that IPoIB has some extra fields to fill
(UD datagram segment) in the TX descriptor (WQE) and it doesn't need to
have any vlan handling.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:31 -04:00
David S. Miller 6b6cbc1471 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts were simply overlapping changes.  In the net/ipv4/route.c
case the code had simply moved around a little bit and the same fix
was made in both 'net' and 'net-next'.

In the net/sched/sch_generic.c case a fix in 'net' happened at
the same time that a new argument was added to qdisc_hash_add().

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-15 21:16:30 -04:00
Johannes Berg fceb6435e8 netlink: pass extended ACK struct to parsing functions
Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-13 13:58:22 -04:00
Johannes Berg 2d4bc93368 netlink: extended ACK reporting
Add the base infrastructure and UAPI for netlink extended ACK
reporting. All "manual" calls to netlink_ack() pass NULL for now and
thus don't get extended ACK reporting.

Big thanks goes to Pablo Neira Ayuso for not only bringing up the
whole topic at netconf (again) but also coming up with the nlattr
passing trick and various other ideas.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-13 13:58:20 -04:00
Linus Torvalds 025def92dd Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:

 "There has been work in a number of different areas over the last
  weeks, including:

   - Fix target-core-user (TCMU) back-end bi-directional handling (Xiubo
     Li + Mike Christie + Ilias Tsitsimpis)

   - Fix iscsi-target TMR reference leak during session shutdown (Rob
     Millner + Chu Yuan Lin)

   - Fix target_core_fabric_configfs.c race between LUN shutdown +
     mapped LUN creation (James Shen)

   - Fix target-core unknown fabric callback queue-full errors (Potnuri
     Bharat Teja)

   - Fix iscsi-target + iser-target queue-full handling in order to
     support iw_cxgb4 RNICs. (Potnuri Bharat Teja + Sagi Grimberg)

   - Fix ALUA transition state race between multiple initiator (Mike
     Christie)

   - Drop work-around for legacy GlobalSAN initiator, to allow QLogic
     57840S + 579xx offload HBAs to work out-of-the-box in MSFT
     environments. (Martin Svec + Arun Easi)

  Note that a number are CC'ed for stable, and although the queue-full
  bug-fixes required for iser-target to work with iw_cxgb4 aren't CC'ed
  here, they'll be posted to Greg-KH separately"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  tcmu: Skip Data-Out blocks before gathering Data-In buffer for BIDI case
  iscsi-target: Drop work-around for legacy GlobalSAN initiator
  target: Fix ALUA transition state race between multiple initiators
  iser-target: avoid posting a recv buffer twice
  iser-target: Fix queue-full response handling
  iscsi-target: Propigate queue_data_in + queue_status errors
  target: Fix unknown fabric callback queue-full errors
  tcmu: Fix wrongly calculating of the base_command_size
  tcmu: Fix possible overwrite of t_data_sg's last iov[]
  target: Avoid mappedlun symlink creation during lun shutdown
  iscsi-target: Fix TMR reference leak during session shutdown
  usb: gadget: Correct usb EP argument for BOT status request
  tcmu: Allow cmd_time_out to be set to zero (disabled)
2017-04-11 23:51:58 -07:00
David S. Miller 6f14f443d3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Mostly simple cases of overlapping changes (adding code nearby,
a function whose name changes, for example).

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-06 08:24:51 -07:00
Mike Marciniszyn b58fc80497 IB/hfi1: Eliminate synchronize_rcu() in mr delete
The synchronize_rcu() call can be eliminated to improve memory deregistration
performance.

There are two key fields involved:
- The rcu pointer itself
- the lkey_published field

To close the window between the rcu read of the mregion pointer and the
reference count the code should:

1. To lkey/rkey validation (reader)

Read the rcu pointer.  If the pointer is non-NULL, get a reference.

To the current validation tests use a READ_ONCE() on the lkey_published.

Upon any failure release the reference.

2. To the remove logic (delete)

Insure the published is zeroed prior to setting the pointer to NULL.
This requires using rcu_assign_pointer() to insure lkey_published
is written prior to the NULL.

3. To the insert logic (add)

Insure the published is set use an rcu_assign_pointer() to insure the
pointer is after all MR fields.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Don Hiatt 243d9f436f IB/hfi1: Add transmit fault injection feature
Add ability to fault packets on transmit by opcode.
Dropping by packet can be achieved by setting the mask to 0.

In order to drop non-verbs traffic we set PbcInsertHrc
to NONE (0x2). The packet will still be delivered to
the receiving node but a KHdrHCRCErr (KDETH packet
with a bad HCRC) will be triggered and the packet will
not be delivered to the correct context.

In order to drop regular verbs traffic we set the
PbcTestEbp flag. The packet will still be delivered
to the receiving node but a 'late ebp error' will
be triggered and will be dropped.

A global toggle (/sys/kernel/debug/hfi1/hfi1_X/fault_suppress_err)
has been added to suppress the error messages on the receive
node when a packet was faulted on the sending node.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Don Hiatt 0181ce31b2 IB/hfi1: Add receive fault injection feature
Add fault injection capability:
  - Drop packets unconditionally (fault_by_packet)
  - Drop packets based on opcode (fault_by_opcode)

This feature reacts to the global FAULT_INJECTION
config flag.

The faulting traces have been added:
  - misc/fault_opcode
  - misc/fault_packet

See 'Documentation/fault-injection/fault-injection.txt'
for details.

Examples:
  - Dropping packets by opcode:
    /sys/kernel/debug/hfi1/hfi1_X/fault_opcode
	# Enable fault
	echo Y > fault_by_opcode
	# Setprobability of dropping (0-100%)
	# echo 25 > probability
	# Set opcode
	echo 0x64 > opcode
	# Number of times to fault
	echo 3 > times
	# An optional mask allows you to fault
	# a range of opcodes
	echo 0xf0 > mask
    /sys/kernel/debug/hfi1/hfi1_X/fault_stats
    contains a value in parentheses to indicate
    number of each opcode dropped.

  - Dropping packets unconditionally
    /sys/kernel/debug/hfi1/hfi1_X/fault_packet
	# Enable fault
	echo Y > fault_by_packet
    /sys/kernel/debug/hfi1/hfi1_X/fault_packet/fault_stats
    contains the number of packets dropped.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Michael J. Ruhl f7b4263372 IB/hfi1: Ensure VL index is within bounds
Improve the safety of the code and ensure the array cannot be indexed
out of bounds when picking the CPU for a given SDMA engine.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Mike Marciniszyn 44dcfa4b18 IB/rdmavt: Avoid reseting wqe send_flags in unreserve
The wqe should be read only and in fact the superfluous reset of the
RVT_SEND_RESERVE_USED flag causes an issue where reserved operations
elicit a bad completion to the ULP.

The maintenance of the flag is now entirely within rvt_post_one_wr()
where a reserved operation will set the flag and a non-reserved operation
will insure the operation that is about to be posted has the flag reset.

Fixes: Commit 856cc4c237 ("IB/hfi1: Add the capability for reserved operations")
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Sebastian Sanchez 5f14e4e667 IB/rdmavt, IB/hfi1: Fix timer migration regressions
RC timeout counter isn't getting incremented.
Increment counter and add the trace for it.

Fixes: 87c23b4ab018 ("IB/rdmavt: Adding timer logic to rdmavt")
Reviewed-by: Brian Welty <brian.welty@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Michael J. Ruhl 5e6e94244b IB/hfi1: Add a patch value to the firmware version string
The HFI firmware now includes a patch level in its version.
Updating the necessary code to include the patch version in the
firmware string.

Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Easwar Hariharan fb897ad315 IB/hfi1: Check for QSFP presence before attempting reads
Attempting to read the status of a QSFP cable creates noise in the logs
and misses out on setting an appropriate Offline/Disabled Reason if the
cable is not plugged in. Check for this prior to attempting the read and
attendant retries.

Fixes: 673b975f1f ("IB/hfi1: Add QSFP sanity pre-check")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Tadeusz Struk 62eed66e98 IB/hfi1: Protect the global dev_cntr_names and port_cntr_names
Protect the global dev_cntr_names and port_cntr_names with the global
mutex as they are allocated and freed in a function called per device.
Otherwise there is a danger of double free and memory leaks.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Tadeusz Struk 5d6f08afdd IB/hfi1: Check device id early during init
If there is a wrong device passed to the driver it should fail early,
without trying to initialize the device only to find out that it has
an invalid device later during the init.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Mike Marciniszyn 9260b3541f IB/rdmavt: Add swqe completion trace
The following fields are available for filter/trace:
- wqe
- wr_id
- qpn
- qpt
- length
- idx
- ssn
- (wr)opcode
- (wr)send_flags

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Mike Marciniszyn c6ad9482fc IB/rdmavt: Add tracing for cq entry and poll
The following fields are defined for filtering and triggering:
- wr_id
- status
- opcode
- qpn
- length
- idx

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Mike Marciniszyn 2a1b7c8b8e IB/rdmavt: Add additional fields to post send trace
This fix is to get additional debugging information.

The following fields are added:
- wqe
- qpt
- num_sge
- ssn
- pid
- send_flags

These additional fields provide for more focused filtering
and triggering.

The patch also moves the trace to just before the wqe is
posted to get the most accurate information and future proofs
the code to trace all possible reserved opcodes.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Mike Marciniszyn 43a474aadb IB/rdmavt, IB/hfi1, IB/qib: Make wc opcode translation driver dependent
The work to create a completion helper moved the translation of send
wqe operations to completion opcodes to rdmvat.

This precludes having driver dependent operations.  Make the translation
driver dependent by doing the translation in the driver prior to the
rvt_qp_swqe_complete() call using restored translation tables.

Fixes: Commit f2dc9cdce8 ("IB/rdmavt: Add a send completion helper")
Fixes: Commit 0771da5a6e ("IB/hfi1,IB/qib: Use new send completion helper")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Sebastian Sanchez 5a52a7acf7 IB/hfi1: NULL pointer dereference when freeing rhashtable
A NULL pointer dereference occurs when the driver
is unloaded, and the SDMA rhashtable is freed if
the rhashtable_init() function has not been called.
Prevent this by changing sdma_rht to be a pointer
to a dynamically allocated hash table. The NULL-ness
of the pointer serves as an indication that the hash
table was initialized and that it needs to be
destroyed.

Fixes: 0cb2aa690c ("IB/hfi1: Add sysfs interface for affinity setup")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Michael J. Ruhl 8688426ba6 IB/hfi1: Cache registers during state change
When the LCB is going offline, inopportune port queries can cause
benign error messages to be logged.  To deal with this, cache the
registers just before setting the LCB to offline, allowing queries to
return without eliciting the error.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Michael J. Ruhl 0519c520dc IB/hfi1: Race hazard avoidance in user SDMA driver
Set the errcode before the state and add the smb_wmb() to avoid a
potential race condition with the user.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Dean Luick ec8a142327 IB/hfi1: Force logical link down
If the logical link state does not read as down when
the physical link state is offline, force it to down.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Shamir Rabinovitch 771a525840 IB/IPoIB: ibX: failed to create mcg debug file
When udev renames the netdev devices, ipoib debugfs entries does not
get renamed. As a result, if subsequent probe of ipoib device reuse the
name then creating a debugfs entry for the new device would fail.

Also, moved ipoib_create_debug_files and ipoib_delete_debug_files as part
of ipoib event handling in order to avoid any race condition between these.

Fixes: 1732b0ef3b ([IPoIB] add path record information in debugfs)
Cc: stable@vger.kernel.org # 2.6.15+
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 13:47:24 -04:00
Mark Brown cd6ce4a573 IB/hns: Explicitly include linux/of.h
hns_roce_hw_v1.c uses DT interfaces but relies on implict inclusion of
linux/of.h which means that changes in other headers could break the
build, as happened in -next for arm64 today.  Add an explicit include.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 13:45:53 -04:00
Matan Barak 1e7710f3f6 IB/core: Change completion channel to use the reworked objects schema
This patch adds the standard fd based type - completion_channel.
The completion_channel is now prefixed with ib_uobject, similarly
to the rest of the uobjects.
This requires a few changes:
(1) We define a new completion channel fd based object type.
(2) completion_event and async_event are now two different types.
    This means they use different fops.
(3) We release the completion_channel exactly as we release other
    idr based objects.
(4) Since ib_uobjects are already kref-ed, we only add the kref to the
    async event.

A fd object requires filling out several parameters. Its op pointer
should point to uverbs_fd_ops and its size should be at least the
size if ib_uobject. We use a macro to make the type declaration
easier.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 13:28:04 -04:00
Matan Barak cf8966b347 IB/core: Add support for fd objects
The completion channel we use in verbs infrastructure is FD based.
Previously, we had a separate way to manage this object. Since we
strive for a single way to manage any kind of object in this
infrastructure, we conceptually treat all objects as subclasses
of ib_uobject.

This commit adds the necessary mechanism to support FD based objects
like their IDR counterparts. FD objects release need to be synchronized
with context release. We use the cleanup_mutex on the uverbs_file for
that.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 13:28:04 -04:00
Matan Barak f48b726920 IB/core: Add lock to multicast handlers
When two handlers used the same object in the old schema, we blocked
the process in the kernel. The new schema just returns -EBUSY. This
could lead to different behaviour in applications between the old
schema and the new schema. In most cases, using such handlers
concurrently could lead to crashing the process. For example, if
thread A destroys a QP and thread B modifies it, we could have the
destruction happens before the modification. In this case, we are
accessing freed memory which could lead to crashing the process.
This is true for most cases. However, attaching and detaching
a multicast address from QP concurrently is safe. Therefore, we
preserve the original behaviour by adding a lock there.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 13:28:04 -04:00
Matan Barak fd3c7904db IB/core: Change idr objects to use the new schema
This changes only the handlers which deals with idr based objects to
use the new idr allocation, fetching and destruction schema.
This patch consists of the following changes:
(1) Allocation, fetching and destruction is done via idr ops.
(2) Context initializing and release is done through
    uverbs_initialize_ucontext and uverbs_cleanup_ucontext.
(3) Ditching the live flag. Mostly, this is pretty straight
    forward. The only place that is a bit trickier is in
    ib_uverbs_open_qp. Commit [1] added code to check whether
    the uobject is already live and initialized. This mostly
    happens because of a race between open_qp and events.
    We delayed assigning the uobject's pointer in order to
    eliminate this race without using the live variable.

[1] commit a040f95dc8
	("IB/core: Fix XRC race condition in ib_uverbs_open_qp")

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 13:28:04 -04:00
Matan Barak 6be60aed12 IB/core: Add idr based standard types
This patch adds the standard idr based types. These types are
used in downstream patches in order to initialize, destroy and
lookup IB standard objects which are based on idr objects.

An idr object requires filling out several parameters. Its op pointer
should point to uverbs_idr_ops and its size should be at least the
size of ib_uobject. We add a macro to make the type declaration easier.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 13:28:04 -04:00
Matan Barak 3832125624 IB/core: Add support for idr types
The new ioctl infrastructure supports driver specific objects.
Each such object type has a hot unplug function, allocation size and
an order of destruction.

When a ucontext is created, a new list is created in this ib_ucontext.
This list contains all objects created under this ib_ucontext.
When a ib_ucontext is destroyed, we traverse this list several time
destroying the various objects by the order mentioned in the object
type description. If few object types have the same destruction order,
they are destroyed in an order opposite to their creation.

Adding an object is done in two parts.
First, an object is allocated and added to idr tree. Then, the
command's handlers (in downstream patches) could work on this object
and fill in its required details.
After a successful command, the commit part is called and the user
objects become ucontext visible. If the handler failed, alloc_abort
should be called.

Removing an uboject is done by calling lookup_get with the write flag
and finalizing it with destroy_commit. A major change from the previous
code is that we actually destroy the kernel object itself in
destroy_commit (rather than just the uobject).

We should make sure idr (per-uverbs-file) and list (per-ucontext) could
be accessed concurrently without corrupting them.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 13:28:04 -04:00
Matan Barak 771addf60a IB/core: Refactor idr to be per uverbs_file
The current code creates an idr per type. Since types are currently
common for all drivers and known in advance, this was good enough.
However, the proposed ioctl based infrastructure allows each driver
to declare only some of the common types and declare its own specific
types.

Thus, we decided to implement idr to be per uverbs_file.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 13:28:04 -04:00
Sagi Grimberg 7a56dc8888 iser-target: avoid posting a recv buffer twice
We pre-allocate our send-queues and might overflow them
in case we have multi work-request operations which tend
to occur for large RDMA transfers over devices with limited
allowed sg elements. When we get to a queue-full condition
we might retry again later, so track our receive buffers
so we don't repost them for a retry case.

Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-03-30 20:35:50 -07:00
Nicholas Bellinger 555a65f66c iser-target: Fix queue-full response handling
This patch addresses two queue-full handling bugs in iser-target.

The first is propagating isert_rdma_rw_ctx_post() return back
to target-core via isert_put_datain() + isert_get_dataout()
callbacks, in order to trigger queue-full logic in target-core.
Note target-core expects -EAGAIN or -ENOMEM error to signal
RDMA WRITE/READ data-transfer callbacks should be retried,
after queue-full logic been invoked.

Other types of errors propagated up from RDMA RW API will result
in target-core generating internal CHECK_CONDITION status,
avoiding subsequent isert_put_datain() and isert_get_dataout()
iscsit_transport callback retry attempts.

The second is to use transport_generic_request_failure()
during T10-PI hw-offload errors in isert_rdma_write_done()
and isert_rdma_read_done(), so CHECK_CONDITION queue-full
is handled internally by target-core.

Also add isert_put_response() T10-PI failure case fixme in
isert_rdma_write_done(), which is currently not internally
retried or released until session reinstatement.

Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Potnuri Bharat Teja <bharat@chelsio.com>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Cc: Potnuri Bharat Teja <bharat@chelsio.com>
Reported-by: Steve Wise <swise@opengridcomputing.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-03-30 20:35:07 -07:00
Florian Westphal 282ccf6efb drivers: add explicit interrupt.h includes
These files all use functions declared in interrupt.h, but currently rely
on implicit inclusion of this file (via netns/xfrm.h).

That won't work anymore when the flow cache is removed so include that
header where needed.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-30 11:05:34 -07:00
Greg Kroah-Hartman 57c0eabbd5 Merge 4.11-rc4 into char-misc-next
We want the char-misc fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-27 09:13:04 +02:00
Arnd Bergmann f6aafac184 IB/qib: fix false-postive maybe-uninitialized warning
aarch64-linux-gcc-7 complains about code it doesn't fully understand:

drivers/infiniband/hw/qib/qib_iba7322.c: In function 'qib_7322_txchk_change':
include/asm-generic/bitops/non-atomic.h:105:35: error: 'shadow' may be used uninitialized in this function [-Werror=maybe-uninitialized]

The code is right, and despite trying hard, I could not come up with a version
that I liked better than just adding a fake initialization here to shut up the
warning.

Fixes: f931551baf ("IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 22:44:29 -04:00
Sagi Grimberg ea174c9573 RDMA/iser: Fix possible mr leak on device removal event
When the rdma device is removed, we must cleanup all
the rdma resources within the DEVICE_REMOVAL event
handler to let the device teardown gracefully. When
this happens with live I/O, some memory regions are
occupied. Thus, track them too and dereg all the mr's.

We are safe with mr access by iscsi_iser_cleanup_task.

Reported-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 22:31:19 -04:00
Sagi Grimberg b7363e67b2 IB/device: Convert ib-comp-wq to be CPU-bound
This workqueue is used by our storage target mode ULPs
via the new CQ API. Recent observations when working
with very high-end flash storage devices reveal that
UNBOUND workqueue threads can migrate between cpu cores
and even numa nodes (although some numa locality is accounted
for).

While this attribute can be useful in some workloads,
it does not fit in very nicely with the normal
run-to-completion model we usually use in our target-mode
ULPs and the block-mq irq<->cpu affinity facilities.

The whole block-mq concept is that the completion will
land on the same cpu where the submission was performed.
The fact that our submitter thread is migrating cpus
can break this locality.

We assume that as a target mode ULP, we will serve multiple
initiators/clients and we can spread the load enough without
having to use unbound kworkers.

Also, while we're at it, expose this workqueue via sysfs which
is harmless and can be useful for debug.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>--
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 22:24:04 -04:00
Sagi Grimberg fedd9e1f75 IB/cq: Don't process more than the given budget
The caller might not want this overhead.

Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 22:19:48 -04:00
David Marchand 9fcd67d177 IB/rxe: increment msn only when completing a request
According to C9-147, MSN should only be incremented when the last packet of
a multi packet request has been received.

"Logically, the requester associates a sequential Send Sequence Number
(SSN) with each WQE posted to the send queue. The SSN bears a one-
to-one relationship to the MSN returned by the responder in each re-
sponse packet. Therefore, when the requester receives a response, it in-
terprets the MSN as representing the SSN of the most recent request
completed by the responder to determine which send WQE(s) can be
completed."

Fixes: 8700e3e7c4 ("Soft RoCE driver")

Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 22:07:27 -04:00
Bart Van Assche 0957c29f78 IB/core: Restore I/O MMU, s390 and powerpc support
Avoid that the following error message is reported on the console
while loading an RDMA driver with I/O MMU support enabled:

DMAR: Allocating domain for mlx5_0 failed

Ensure that DMA mapping operations that use to_pci_dev() to
access to struct pci_dev see the correct PCI device. E.g. the s390
and powerpc DMA mapping operations use to_pci_dev() even with I/O
MMU support disabled.

This patch preserves the following changes of the DMA mapping updates
patch series:
- Introduction of dma_virt_ops.
- Removal of ib_device.dma_ops.
- Removal of struct ib_dma_mapping_ops.
- Removal of an if-statement from each ib_dma_*() operation.
- IB HW drivers no longer set dma_device directly.

Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reported-by: Parav Pandit <parav@mellanox.com>
Fixes: commit 99db949403 ("IB/core: Remove ib_device.dma_device")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: parav@mellanox.com
Tested-by: parav@mellanox.com
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 21:51:16 -04:00
Leon Romanovsky a1c5dd1322 IB/rxe: Update documentation link
All Soft-RoCE (rxe) is handled now in rdma-core user space library,
so the documentation. The patch below updates the documentation
link to that new location.

Reported-by: Josh Beavers <josh.beavers@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 21:15:28 -04:00
Dan Carpenter 004d18ea99 RDMA/ocrdma: fix a type issue in ocrdma_put_pd_num()
We want to return zero on success or negative error codes.  The type
should be int and not u8.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 21:11:15 -04:00
Dan Carpenter ded2602353 IB/rxe: double free on error
"goto err;" has it's own kfree_skb() call so it's a double free.  We
only need to free on the "goto exit;" path.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 20:53:32 -04:00
Aditya Sarwade b172679b0d RDMA/vmw_pvrdma: Activate device on ethernet link up
Restore device state when ethernet link changes to active.

Acked-by: George Zhang <georgezhang@vmware.com>
Acked-by: Jorgen Hansen <jhansen@vmware.com>
Acked-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Aditya Sarwade <asarwade@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 20:49:53 -04:00
Adit Ranadive e51c2fb033 RDMA/vmw_pvrdma: Dont hardcode QP header page
Moved the header page count to a macro.

Reported-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Tested-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 20:49:53 -04:00
Adit Ranadive 6332dee83d RDMA/vmw_pvrdma: Cleanup unused variables
Removed the unused nreq and redundant index variables.
Moved hardcoded async and cq ring pages number to macro.

Reported-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Tested-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 20:49:53 -04:00
Jason Gunthorpe cb88645596 infiniband: Fix alignment of mmap cookies to support VIPT caching
When vmalloc_user is used to create memory that is supposed to be mmap'd
to user space, it is necessary for the mmap cookie (eg the offset) to be
aligned to SHMLBA.

This creates a situation where all virtual mappings of the same physical
page share the same virtual cache index and guarantees VIPT coherence.
Otherwise the cache is non-coherent and the kernel will not see writes
by userspace when reading the shared page (or vice-versa).

Reported-by: Josh Beavers <josh.beavers@gmail.com>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 16:50:51 -04:00
Sagi Grimberg 86f46aba8d IB/core: Protect against self-requeue of a cq work item
We need to make sure that the cq work item does not
run when we are destroying the cq. Unlike flush_work,
cancel_work_sync protects against self-requeue of the
work item (which we can do in ib_cq_poll_work).

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>--
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 16:40:31 -04:00
Shiraz Saleem 871a8623d3 i40iw: Receive netdev events post INET_NOTIFIER state
Netdev notification events are de-registered only when all
client iwdev instances are removed. If a single client is closed
and re-opened, netdev events could arrive even before the Control
Queue-Pair (CQP) is created, causing a NULL pointer dereference crash
in i40iw_get_cqp_request. Fix this by allowing netdev event
notification only after we have reached the INET_NOTIFIER state with
respect to device initialization.

Reported-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 16:23:29 -04:00
Logan Gunthorpe 985087157c infiniband: utilize the new cdev_set_parent function
This replaces the suspect looking cdev.kobj.parent lines with the
equivalent cdev_set_parent function. This is a straightforward change
that's largely cosmetic but it does push the kobj.parent ownership
into char_dev.c where it belongs.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-21 06:44:33 +01:00
Jason Gunthorpe a0d78193dc IB/ucm: utilize new cdev_device_add helper function
The use after free is not triggerable here because the cdev holds
the module lock and the only device_unregister is only triggered by
module unload, however make the change for consistency.

To make this work the cdev_del needs to move out of the struct device
release function.

This cleans up the error path significantly and thus also fixes a minor
bug where the devnum would not be released if cdev_add failed.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-21 06:44:33 +01:00
Mintz, Yuval be086e7c53 qed*: Utilize Firmware 8.15.3.0
This patch advances the qed* drivers into using the newer firmware -
This solves several firmware bugs, mostly related [but not limited to]
various init/deinit issues in various offloaded protocols.

It also introduces a major 4-Cached SGE change in firmware, which can be
seen in the storage drivers' changes.

In addition, this firmware is required for supporting the new QL41xxx
series of adapters; While this patch doesn't add the actual support,
the firmware contains the necessary initialization & firmware logic to
operate such adapters [actual support would be added later on].

Changes from Previous versions:
-------------------------------
 - V2 - fix kbuild-test robot warnings

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Chad Dupuis <Chad.Dupuis@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-13 15:33:09 -07:00
Ingo Molnar 0881e7bd34 sched/headers: Prepare to move the get_task_struct()/put_task_struct() and related APIs from <linux/sched.h> to <linux/sched/task.h>
But first update usage sites with the new header dependency.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:40 +01:00
Ingo Molnar 589ee62844 sched/headers: Prepare to remove the <linux/mm_types.h> dependency from <linux/sched.h>
Update code that relied on sched.h including various MM types for them.

This will allow us to remove the <linux/mm_types.h> include from <linux/sched.h>.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:37 +01:00
Ingo Molnar 174cd4b1e5 sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>
Fix up affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:32 +01:00
Ingo Molnar 3f07c01441 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h>
We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/signal.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:29 +01:00
Ingo Molnar 6e84f31522 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h>
We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/mm.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

The APIs that are going to be moved first are:

   mm_alloc()
   __mmdrop()
   mmdrop()
   mmdrop_async_fn()
   mmdrop_async()
   mmget_not_zero()
   mmput()
   mmput_async()
   get_task_mm()
   mm_access()
   mm_release()

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:28 +01:00
Ingo Molnar 0c98d344fe sched/core: Remove the tsk_cpus_allowed() wrapper
So the original intention of tsk_cpus_allowed() was to 'future-proof'
the field - but it's pretty ineffectual at that, because half of
the code uses ->cpus_allowed directly ...

Also, the wrapper makes the code longer than the original expression!

So just get rid of it. This also shrinks <linux/sched.h> a bit.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:24 +01:00
Linus Torvalds 86292b33d4 Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:

 - a few MM remainders

 - misc things

 - autofs updates

 - signals

 - affs updates

 - ipc

 - nilfs2

 - spelling.txt updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (78 commits)
  mm, x86: fix HIGHMEM64 && PARAVIRT build config for native_pud_clear()
  mm: add arch-independent testcases for RODATA
  hfs: atomically read inode size
  mm: clarify mm_struct.mm_{users,count} documentation
  mm: use mmget_not_zero() helper
  mm: add new mmget() helper
  mm: add new mmgrab() helper
  checkpatch: warn when formats use %Z and suggest %z
  lib/vsprintf.c: remove %Z support
  scripts/spelling.txt: add some typo-words
  scripts/spelling.txt: add "followings" pattern and fix typo instances
  scripts/spelling.txt: add "therfore" pattern and fix typo instances
  scripts/spelling.txt: add "overwriten" pattern and fix typo instances
  scripts/spelling.txt: add "overwritting" pattern and fix typo instances
  scripts/spelling.txt: add "deintialize(d)" pattern and fix typo instances
  scripts/spelling.txt: add "disassocation" pattern and fix typo instances
  scripts/spelling.txt: add "omited" pattern and fix typo instances
  scripts/spelling.txt: add "explictely" pattern and fix typo instances
  scripts/spelling.txt: add "applys" pattern and fix typo instances
  scripts/spelling.txt: add "configuartion" pattern and fix typo instances
  ...
2017-02-27 23:09:29 -08:00
Linus Torvalds f7878dc3a9 Merge branch 'for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "Several noteworthy changes.

   - Parav's rdma controller is finally merged. It is very straight
     forward and can limit the abosolute numbers of common rdma
     constructs used by different cgroups.

   - kernel/cgroup.c got too chubby and disorganized. Created
     kernel/cgroup/ subdirectory and moved all cgroup related files
     under kernel/ there and reorganized the core code. This hurts for
     backporting patches but was long overdue.

   - cgroup v2 process listing reimplemented so that it no longer
     depends on allocating a buffer large enough to cache the entire
     result to sort and uniq the output. v2 has always mangled the sort
     order to ensure that users don't depend on the sorted output, so
     this shouldn't surprise anybody. This makes the pid listing
     functions use the same iterators that are used internally, which
     have to have the same iterating capabilities anyway.

   - perf cgroup filtering now works automatically on cgroup v2. This
     patch was posted a long time ago but somehow fell through the
     cracks.

   - misc fixes asnd documentation updates"

* 'for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (27 commits)
  kernfs: fix locking around kernfs_ops->release() callback
  cgroup: drop the matching uid requirement on migration for cgroup v2
  cgroup, perf_event: make perf_event controller work on cgroup2 hierarchy
  cgroup: misc cleanups
  cgroup: call subsys->*attach() only for subsystems which are actually affected by migration
  cgroup: track migration context in cgroup_mgctx
  cgroup: cosmetic update to cgroup_taskset_add()
  rdmacg: Fixed uninitialized current resource usage
  cgroup: Add missing cgroup-v2 PID controller documentation.
  rdmacg: Added documentation for rdmacg
  IB/core: added support to use rdma cgroup controller
  rdmacg: Added rdma cgroup controller
  cgroup: fix a comment typo
  cgroup: fix RCU related sparse warnings
  cgroup: move namespace code to kernel/cgroup/namespace.c
  cgroup: rename functions for consistency
  cgroup: move v1 mount functions to kernel/cgroup/cgroup-v1.c
  cgroup: separate out cgroup1_kf_syscall_ops
  cgroup: refactor mount path and clearly distinguish v1 and v2 paths
  cgroup: move cgroup v1 specific code to kernel/cgroup/cgroup-v1.c
  ...
2017-02-27 21:41:08 -08:00
Vegard Nossum f1f1007644 mm: add new mmgrab() helper
Apart from adding the helper function itself, the rest of the kernel is
converted mechanically using:

  git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)->mm_count);/mmgrab\(\1\);/'
  git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)\.mm_count);/mmgrab\(\&\1\);/'

This is needed for a later patch that hooks into the helper, but might
be a worthwhile cleanup on its own.

(Michal Hocko provided most of the kerneldoc comment.)

Link: http://lkml.kernel.org/r/20161218123229.22952-1-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:48 -08:00
Masahiro Yamada 608595ed9b scripts/spelling.txt: add "therfore" pattern and fix typo instances
Fix typos and add the following to the scripts/spelling.txt:

  therfore||therefore

Besides, tidy up comment blocks for 80-col wrapping.

Link: http://lkml.kernel.org/r/1481573103-11329-31-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:47 -08:00
Masahiro Yamada b8a14f3379 scripts/spelling.txt: add "overwriten" pattern and fix typo instances
Fix typos and add the following to the scripts/spelling.txt:

  overwrien||overwritten

Link: http://lkml.kernel.org/r/1481573103-11329-30-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:47 -08:00
Linus Torvalds ac1820fb28 This is a tree wide change and has been kept separate for that reason.
Bart Van Assche noted that the ib DMA mapping code was significantly
 similar enough to the core DMA mapping code that with a few changes
 it was possible to remove the IB DMA mapping code entirely and
 switch the RDMA stack to use the core DMA mapping code.  This resulted
 in a nice set of cleanups, but touched the entire tree.  This branch
 will be submitted separately to Linus at the end of the merge window
 as per normal practice for tree wide changes like this.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYo06oAAoJELgmozMOVy/d9Z8QALedWHdu98St1L0u2c8sxnR9
 2zo/4sF5Vb9u7FpmdIX32L4SQ9s9KhPE8Qp8NtZLf9v10zlDebIRJDpXknXtKooV
 CAXxX4sxBXV27/UrhbZEfXiPrmm6ccJFyIfRnMU6NlMqh2AtAsRa5AC2/RMp8oUD
 Med97PFiF0o6TD22/UH1VFbRpX1zjaKyqm7a3as5sJfzNA+UGIZAQ7Euz8000DKZ
 xCgVLTEwS0FmOujtBkCst7xa9TjuqR1HLOB4DdGvAhP6BHdz2yamM7Qmh9NN+NEX
 0BtjsuXomtn6j6AszGC+bpipCZh3NUigcwoFAARXCYFHibBvo4DPdFeGsraFgXdy
 1+KyR8CCeQG3Aly5Vwr264RFPGkGpwMj8PsBlXgQVtrlg4rriaCzOJNmIIbfdADw
 ftqhxBOzReZw77aH2s+9p2ILRfcAmPqhynLvFGFo9LBvsik8LVso7YgZN0xGxwcI
 IjI/XGC8UskPVsIZBIYA6sl2bYzgOjtBIHiXjRrPlW3uhduIXLrvKFfLPP/5XLAG
 ehLXK+J0bfsyY9ClmlNS8oH/WdLhXAyy/KNmnj5bRRm9qg6BRJR3bsOBhZJODuoC
 XgEXFfF6/7roNESWxowff7pK0rTkRg/m/Pa4VQpeO+6NWHE7kgZhL6kyIp5nKcwS
 3e7mgpcwC+3XfA/6vU3F
 =e0Si
 -----END PGP SIGNATURE-----

Merge tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma DMA mapping updates from Doug Ledford:
 "Drop IB DMA mapping code and use core DMA code instead.

  Bart Van Assche noted that the ib DMA mapping code was significantly
  similar enough to the core DMA mapping code that with a few changes it
  was possible to remove the IB DMA mapping code entirely and switch the
  RDMA stack to use the core DMA mapping code.

  This resulted in a nice set of cleanups, but touched the entire tree
  and has been kept separate for that reason."

* tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
  IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it
  IB/core: Remove ib_device.dma_device
  nvme-rdma: Switch from dma_device to dev.parent
  RDS: net: Switch from dma_device to dev.parent
  IB/srpt: Modify a debug statement
  IB/srp: Switch from dma_device to dev.parent
  IB/iser: Switch from dma_device to dev.parent
  IB/IPoIB: Switch from dma_device to dev.parent
  IB/rxe: Switch from dma_device to dev.parent
  IB/vmw_pvrdma: Switch from dma_device to dev.parent
  IB/usnic: Switch from dma_device to dev.parent
  IB/qib: Switch from dma_device to dev.parent
  IB/qedr: Switch from dma_device to dev.parent
  IB/ocrdma: Switch from dma_device to dev.parent
  IB/nes: Remove a superfluous assignment statement
  IB/mthca: Switch from dma_device to dev.parent
  IB/mlx5: Switch from dma_device to dev.parent
  IB/mlx4: Switch from dma_device to dev.parent
  IB/i40iw: Remove a superfluous assignment statement
  IB/hns: Switch from dma_device to dev.parent
  ...
2017-02-25 13:45:43 -08:00
Dave Jiang 11bac80004 mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.

Remove the vma parameter to simplify things.

[arnd@arndb.de: fix ARM build]
  Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Linus Torvalds af17fe7a63 Mellanox specific updates for 4.11 merge window
Because the Mellanox code required being based on a net-next tree,
 I keept it separate from the remainder of the RDMA stack submission
 that is based on 4.10-rc3.
 
 This branch contains:
 
 - Various mlx4 and mlx5 fixes and minor changes
 - Support for adding a tag match rule to flow specs
 - Support for cvlan offload operation for raw ethernet QPs
 - A change to the core IB code to recognize raw eth capabilities and
   enumerate them (touches non-Mellanox code)
 - Implicit On-Demand Paging memory registration support
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYrx+WAAoJELgmozMOVy/du70P/1kpW2xY9Le04c3K7na2XOYl
 AUVIDrW/8Go63tpOaM7jBT3k4GlwVFr3IOmBpS24KbW/THxjhyUeP5L5+z2x+go+
 jkQOgtPWWEHr5zP3MzsNyB8fDx1YQOnJwEXxybQRW/cbw4CLjnhP+ezd6FdV/3Yy
 pPEqDVlAErzvNweG+n2r1pjcUbR8uneC3inyMLnyzUBz4CHKmC8fgD3/qJIM+DNb
 gtFT5xHFIXKCigWdQ/EwsTDcHub43V8OXlI5sO7loG6vToOUATMkjI4oOUNhDmYS
 X7XLN3yRK9QHEfb5kutXIZEWzTGh7LiFtUYGaNNYqqzDfSiMRc9NC5kTOfplEXDV
 Uo+AGb6Fh1zYIOzNk7o+tazIv3LaLv6+Fcm+9bbe0VUIqasaylsePqaTwMuIzx/I
 xP5nitmd5lbYo8WdlasVdG6mH1DlJEUbU30v4DpmTpxCP6jGpog7lexyGyF3TgzS
 NhnG0IiIClWh3WQ2/GdsFK/obIdFkpLeASli1hwD81vzPfly9zc2YpgqydZI3WCr
 q6hTXYnANcP6+eciCpQPO7giRdXdiKey08Uoq/2jxb7Qbm4daG6UwopjvH9/lm1F
 m6UDaDvzNYm+Rx+bL/+KSx9JO9+fJB1L51yCmvLGpWi6yJI4ZTfanHNMBsCua46N
 Kev/DSpIAzX1WOBkte+a
 =rspQ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull Mellanox rdma updates from Doug Ledford:
 "Mellanox specific updates for 4.11 merge window

  Because the Mellanox code required being based on a net-next tree, I
  keept it separate from the remainder of the RDMA stack submission that
  is based on 4.10-rc3.

  This branch contains:

   - Various mlx4 and mlx5 fixes and minor changes

   - Support for adding a tag match rule to flow specs

   - Support for cvlan offload operation for raw ethernet QPs

   - A change to the core IB code to recognize raw eth capabilities and
     enumerate them (touches non-Mellanox code)

   - Implicit On-Demand Paging memory registration support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (40 commits)
  IB/mlx5: Fix configuration of port capabilities
  IB/mlx4: Take source GID by index from HW GID table
  IB/mlx5: Fix blue flame buffer size calculation
  IB/mlx4: Remove unused variable from function declaration
  IB: Query ports via the core instead of direct into the driver
  IB: Add protocol for USNIC
  IB/mlx4: Support raw packet protocol
  IB/mlx5: Support raw packet protocol
  IB/core: Add raw packet protocol
  IB/mlx5: Add implicit MR support
  IB/mlx5: Expose MR cache for mlx5_ib
  IB/mlx5: Add null_mkey access
  IB/umem: Indicate that process is being terminated
  IB/umem: Update on demand page (ODP) support
  IB/core: Add implicit MR flag
  IB/mlx5: Support creation of a WQ with scatter FCS offload
  IB/mlx5: Enable QP creation with cvlan offload
  IB/mlx5: Enable WQ creation and modification with cvlan offload
  IB/mlx5: Expose vlan offloads capabilities
  IB/uverbs: Enable QP creation with cvlan offload
  ...
2017-02-23 11:27:49 -08:00
Linus Torvalds 4cc4b9323f First set of updates for 4.11 kernel merge window
- Add new Broadcom bnxt_re RoCE driver
 - rxe driver updates
 - ioctl cleanups
 - ETH_P_IBOE declaration cleanup
 - IPoIB changes
 - Add port state cache
 - Allow srpt driver to accept guids as port names in config
 - Update to hfi1 driver
 - Update to srp driver
 - Lots of misc. minor changes all over
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYrfewAAoJELgmozMOVy/dFnEP/2Qe7NqXRqxLS0ZqsQseFHgQ
 jd236E7R/XtQQTE3PTcrWL0mq0DRF6tMEjfhUASKTbZVfCBTniJAoXYrvWhN/STq
 LxAdigdV/0SPbxO3r9B1Xvk2v5BySaIBkaUDvcEXzT4e7UVQwZgxDkhhsYeY0Z/r
 9bNB5760PzW8uO5cctXccNcWztZnW0IUZuAHVfQCPjZ7svoGwLnNDW6YQx+FsEkW
 tbPdzMXX8VKHlC5RcKbfOOBjdNyrUpWl+uvWEc/7mazKscp4yKVFZL7PcxqPJSfd
 aKdfqXYawhjZZpyws8Kn0rhkfT7xWKD/y9G5STykRJPj9/n1BDScFkmyDQhtP5bJ
 GANzdgH0z7Dt9LkcAs86A8EVBbIdbdT2cpPVu7t0uWEIsJw/O5ThKpgjnrrTm6m+
 89tgqLZooifTEsdj4UkZoyktrD3J9LSNZkgVmWtRn01W3oYFOPbdM4TmBZtg+/Yl
 VGmOJEHMEsNuJBcJcOuRJ1MVz2LebXmPUcB0RXzgmHHgulZ/DqoOtlpg5JNmJcr5
 wpw/yppkBop4V4+etJBlzDsZNmZZlX+AY0ZLqQJsDHNszDjwXgAy5Rn5FYIdMyk4
 ff0FKb5dzASSxHRDxAsu2uoGaREM0NkpA0UYiIZbepGLSO8PuFG2ScQ6qzU47vqu
 9SEzOaaQY2S2uqFFFnYp
 =ugNm
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "First set of updates for 4.11 kernel merge window

   - Add new Broadcom bnxt_re RoCE driver
   - rxe driver updates
   - ioctl cleanups
   - ETH_P_IBOE declaration cleanup
   - IPoIB changes
   - Add port state cache
   - Allow srpt driver to accept guids as port names in config
   - Update to hfi1 driver
   - Update to srp driver
   - Lots of misc minor changes all over"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (114 commits)
  RDMA/bnxt_re: fix for "bnxt_en: Update to firmware interface spec 1.7.0."
  rdma_cm: fail iwarp accepts w/o connection params
  IB/srp: Drain the send queue before destroying a QP
  IB/core: Add support for draining IB_POLL_DIRECT completion queues
  IB/srp: Improve an error path
  IB/srp: Make a diagnostic message more informative
  IB/srp: Document locking conventions
  IB/srp: Fix race conditions related to task management
  IB/srp: Avoid that duplicate responses trigger a kernel bug
  IB/SRP: Avoid using IB_MR_TYPE_SG_GAPS
  RDMA/qedr: Fix some error handling
  RDMA/bnxt_re: add DCB dependency
  IB/hns: include linux/module.h
  IB/vmw_pvrdma: Expose vendor error to ULPs
  vmw_pvrdma: switch to pci_alloc_irq_vectors
  IB/hfi1: use size_t for passing array length
  IB/ipoib: Remove redudant label
  IB/ipoib: remove the unnecessary memory free
  IB/mthca: switch to pci_alloc_irq_vectors
  IB/hfi1: Code reuse with memdup_copy
  ...
2017-02-23 08:27:57 -08:00
Stephen Rothwell db690328a7 RDMA/bnxt_re: fix for "bnxt_en: Update to firmware interface spec 1.7.0."
When the firmware interface spec was updated, a constant element was
renamed.  The rename missed the instances in the bnxt_re driver
because it wasn't upstream yet.  This updates the bnxt_re driver
with the rename.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-22 15:40:17 -05:00
Steve Wise f2625f7db4 rdma_cm: fail iwarp accepts w/o connection params
cma_accept_iw() needs to return an error if conn_params is NULL.
Since this is coming from user space, we can crash.

Reported-by: Shaobo He <shaobo@cs.utah.edu>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-22 15:35:03 -05:00
Linus Torvalds 3051bf36c2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini
      Varadhan.

   2) Simplify classifier state on sk_buff in order to shrink it a bit.
      From Willem de Bruijn.

   3) Introduce SIPHASH and it's usage for secure sequence numbers and
      syncookies. From Jason A. Donenfeld.

   4) Reduce CPU usage for ICMP replies we are going to limit or
      suppress, from Jesper Dangaard Brouer.

   5) Introduce Shared Memory Communications socket layer, from Ursula
      Braun.

   6) Add RACK loss detection and allow it to actually trigger fast
      recovery instead of just assisting after other algorithms have
      triggered it. From Yuchung Cheng.

   7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot.

   8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert.

   9) Export MPLS packet stats via netlink, from Robert Shearman.

  10) Significantly improve inet port bind conflict handling, especially
      when an application is restarted and changes it's setting of
      reuseport. From Josef Bacik.

  11) Implement TX batching in vhost_net, from Jason Wang.

  12) Extend the dummy device so that VF (virtual function) features,
      such as configuration, can be more easily tested. From Phil
      Sutter.

  13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric
      Dumazet.

  14) Add new bpf MAP, implementing a longest prefix match trie. From
      Daniel Mack.

  15) Packet sample offloading support in mlxsw driver, from Yotam Gigi.

  16) Add new aquantia driver, from David VomLehn.

  17) Add bpf tracepoints, from Daniel Borkmann.

  18) Add support for port mirroring to b53 and bcm_sf2 drivers, from
      Florian Fainelli.

  19) Remove custom busy polling in many drivers, it is done in the core
      networking since 4.5 times. From Eric Dumazet.

  20) Support XDP adjust_head in virtio_net, from John Fastabend.

  21) Fix several major holes in neighbour entry confirmation, from
      Julian Anastasov.

  22) Add XDP support to bnxt_en driver, from Michael Chan.

  23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan.

  24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi.

  25) Support GRO in IPSEC protocols, from Steffen Klassert"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits)
  Revert "ath10k: Search SMBIOS for OEM board file extension"
  net: socket: fix recvmmsg not returning error from sock_error
  bnxt_en: use eth_hw_addr_random()
  bpf: fix unlocking of jited image when module ronx not set
  arch: add ARCH_HAS_SET_MEMORY config
  net: napi_watchdog() can use napi_schedule_irqoff()
  tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"
  net/hsr: use eth_hw_addr_random()
  net: mvpp2: enable building on 64-bit platforms
  net: mvpp2: switch to build_skb() in the RX path
  net: mvpp2: simplify MVPP2_PRS_RI_* definitions
  net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT
  net: mvpp2: remove unused register definitions
  net: mvpp2: simplify mvpp2_bm_bufs_add()
  net: mvpp2: drop useless fields in mvpp2_bm_pool and related code
  net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue'
  net: mvpp2: release reference to txq_cpu[] entry after unmapping
  net: mvpp2: handle too large value in mvpp2_rx_time_coal_set()
  net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set()
  net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set
  ...
2017-02-22 10:15:09 -08:00
Linus Torvalds cdc194705d SCSI misc on 20170220
This update includes the usual round of major driver updates (ncr5380,
 ufs, lpfc, be2iscsi, hisi_sas, storvsc, cxlflash, aacraid,
 megaraid_sas, ).  There's also an assortment of minor fixes and the
 major update of switching a bunch of drivers to pci_alloc_irq_vectors
 from Christoph.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJYq5adAAoJEAVr7HOZEZN4bjUP/Atk7CSZVnC75pcYmncbEGCx
 ysOlEHK4uW2HhiAYk3PlYMk+pKrMHet2zsbbM9PHJfopdOHZ7Sq1+UZZVeqE1Zun
 8pe0NhON+fZx7XAnevdEvnSSULQZ+AGfjZO72iUwkJiN3ozYaFtCITOyn49l4GpR
 ra9emskBh7CQOFW2voGn1AKeDijPYGx3+TO4AUrWjVMiByR06gb1bmImx+ljiUrs
 jzRJPfrt90ORcTdpMateyN2EXxudcASMhX03SJ6fRI84hPAhMCROMbTv8RnzOTE4
 DPbnvbYUowlHt43iUhJHSwGdkRRaRBnkzQENBp1fNrNzZgF6vB7+kShxbonrYB2p
 gC4ewaJr0BNj+HsUnvTpe3WseiPOcfsnBsKilPLKBlm2dCKEXqFox/dj/T1uexxg
 HoyFrl3u8fyEqVHrzRS4M9t/njWh0NFmXxb0wBdj+lkVFTRErGSKQ8SfOqshuSGs
 P8NN88jy8vC7uqgzKBJ+UH3ehzn3qfBxasFHIC/e2awY9FqKjHGTxKMmSVpjXVxy
 wCvE2FQ3k/qEj2XSM6f7/NGytlSOlju5q1rFtHPW2M+TFSh0LJWCnmVjR/Zle9em
 pBWmtIgCv8W5b41zL2H94nLWAZbfdrrNU/XnX88l47LKnmorte/PGhpxu36NEsMS
 VCgreQmFMdMRY+WzDWl1
 =cBQx
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This update includes the usual round of major driver updates (ncr5380,
  ufs, lpfc, be2iscsi, hisi_sas, storvsc, cxlflash, aacraid,
  megaraid_sas, ...).

  There's also an assortment of minor fixes and the major update of
  switching a bunch of drivers to pci_alloc_irq_vectors from Christoph"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (188 commits)
  scsi: megaraid_sas: handle dma_addr_t right on 32-bit
  scsi: megaraid_sas: array overflow in megasas_dump_frame()
  scsi: snic: switch to pci_irq_alloc_vectors
  scsi: megaraid_sas: driver version upgrade
  scsi: megaraid_sas: Change RAID_1_10_RMW_CMDS to RAID_1_PEER_CMDS and set value to 2
  scsi: megaraid_sas: Indentation and smatch warning fixes
  scsi: megaraid_sas: Cleanup VD_EXT_DEBUG and SPAN_DEBUG related debug prints
  scsi: megaraid_sas: Increase internal command pool
  scsi: megaraid_sas: Use synchronize_irq to wait for IRQs to complete
  scsi: megaraid_sas: Bail out the driver load if ld_list_query fails
  scsi: megaraid_sas: Change build_mpt_mfi_pass_thru to return void
  scsi: megaraid_sas: During OCR, if get_ctrl_info fails do not continue with OCR
  scsi: megaraid_sas: Do not set fp_possible if TM capable for non-RW syspdIO, change fp_possible to bool
  scsi: megaraid_sas: Remove unused pd_index from megasas_build_ld_nonrw_fusion
  scsi: megaraid_sas: megasas_return_cmd does not memset IO frame to zero
  scsi: megaraid_sas: max_fw_cmds are decremented twice, remove duplicate
  scsi: megaraid_sas: update can_queue only if the new value is less
  scsi: megaraid_sas: Change max_cmd from u32 to u16 in all functions
  scsi: megaraid_sas: set pd_after_lb from MR_BuildRaidContext and initialize pDevHandle to MR_DEVHANDLE_INVALID
  scsi: megaraid_sas: latest controller OCR capability from FW before sending shutdown DCMD
  ...
2017-02-21 11:51:42 -08:00
Linus Torvalds 42e1b14b6e Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Implement wraparound-safe refcount_t and kref_t types based on
     generic atomic primitives (Peter Zijlstra)

   - Improve and fix the ww_mutex code (Nicolai Hähnle)

   - Add self-tests to the ww_mutex code (Chris Wilson)

   - Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr
     Bueso)

   - Micro-optimize the current-task logic all around the core kernel
     (Davidlohr Bueso)

   - Tidy up after recent optimizations: remove stale code and APIs,
     clean up the code (Waiman Long)

   - ... plus misc fixes, updates and cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  fork: Fix task_struct alignment
  locking/spinlock/debug: Remove spinlock lockup detection code
  lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS
  lkdtm: Convert to refcount_t testing
  kref: Implement 'struct kref' using refcount_t
  refcount_t: Introduce a special purpose refcount type
  sched/wake_q: Clarify queue reinit comment
  sched/wait, rcuwait: Fix typo in comment
  locking/mutex: Fix lockdep_assert_held() fail
  locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock()
  locking/rwsem: Reinit wake_q after use
  locking/rwsem: Remove unnecessary atomic_long_t casts
  jump_labels: Move header guard #endif down where it belongs
  locking/atomic, kref: Implement kref_put_lock()
  locking/ww_mutex: Turn off __must_check for now
  locking/atomic, kref: Avoid more abuse
  locking/atomic, kref: Use kref_get_unless_zero() more
  locking/atomic, kref: Kill kref_sub()
  locking/atomic, kref: Add kref_read()
  locking/atomic, kref: Add KREF_INIT()
  ...
2017-02-20 13:23:30 -08:00
Bart Van Assche 9294000d6d IB/srp: Drain the send queue before destroying a QP
A quote from the IB spec:

However, if the Consumer does not wait for the Affiliated Asynchronous
Last WQE Reached Event, then WQE and Data Segment leakage may occur.
Therefore, it is good programming practice to tear down a QP that is
associated with an SRQ by using the following process:
* Put the QP in the Error State;
* wait for the Affiliated Asynchronous Last WQE Reached Event;
* either:
  * drain the CQ by invoking the Poll CQ verb and either wait for CQ
    to be empty or the number of Poll CQ operations has exceeded CQ
    capacity size; or
  * post another WR that completes on the same CQ and wait for this WR to return as a WC;
* and then invoke a Destroy QP or Reset QP.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:55 -05:00
Bart Van Assche f039f44fc3 IB/core: Add support for draining IB_POLL_DIRECT completion queues
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:55 -05:00
Bart Van Assche b02c15360b IB/srp: Improve an error path
Avoid that the following message is printed if login fails:

scsi host0: ib_srp: Sending CM DREQ failed

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:54 -05:00
Bart Van Assche a7139ca80f IB/srp: Make a diagnostic message more informative
Report the destination port GID if connecting fails.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:53 -05:00
Bart Van Assche 93c76dbb74 IB/srp: Document locking conventions
Use lockdep_assert_held() statements to verify at run-time
whether the proper locks are held.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:53 -05:00
Bart Van Assche 0a6fdbdeb1 IB/srp: Fix race conditions related to task management
Avoid that srp_process_rsp() overwrites the status information
in ch if the SRP target response timed out and processing of
another task management function has already started. Avoid that
issuing multiple task management functions concurrently triggers
list corruption. This patch prevents that the following stack
trace appears in the system log:

WARNING: CPU: 8 PID: 9269 at lib/list_debug.c:52 __list_del_entry_valid+0xbc/0xc0
list_del corruption. prev->next should be ffffc90004bb7b00, but was ffff8804052ecc68
CPU: 8 PID: 9269 Comm: sg_reset Tainted: G        W       4.10.0-rc7-dbg+ #3
Call Trace:
 dump_stack+0x68/0x93
 __warn+0xc6/0xe0
 warn_slowpath_fmt+0x4a/0x50
 __list_del_entry_valid+0xbc/0xc0
 wait_for_completion_timeout+0x12e/0x170
 srp_send_tsk_mgmt+0x1ef/0x2d0 [ib_srp]
 srp_reset_device+0x5b/0x110 [ib_srp]
 scsi_ioctl_reset+0x1c7/0x290
 scsi_ioctl+0x12a/0x420
 sd_ioctl+0x9d/0x100
 blkdev_ioctl+0x51e/0x9f0
 block_ioctl+0x38/0x40
 do_vfs_ioctl+0x8f/0x700
 SyS_ioctl+0x3c/0x70
 entry_SYSCALL_64_fastpath+0x18/0xad

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Steve Feeley <Steve.Feeley@sandisk.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:52 -05:00
Bart Van Assche 6cb72bc1b4 IB/srp: Avoid that duplicate responses trigger a kernel bug
After srp_process_rsp() returns there is a short time during which
the scsi_host_find_tag() call will return a pointer to the SCSI
command that is being completed. If during that time a duplicate
response is received, avoid that the following call stack appears:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: srp_recv_done+0x450/0x6b0 [ib_srp]
Oops: 0000 [#1] SMP
CPU: 10 PID: 0 Comm: swapper/10 Not tainted 4.10.0-rc7-dbg+ #1
Call Trace:
 <IRQ>
 __ib_process_cq+0x4b/0xd0 [ib_core]
 ib_poll_handler+0x1d/0x70 [ib_core]
 irq_poll_softirq+0xba/0x120
 __do_softirq+0xba/0x4c0
 irq_exit+0xbe/0xd0
 smp_apic_timer_interrupt+0x38/0x50
 apic_timer_interrupt+0x90/0xa0
 </IRQ>
RIP: srp_recv_done+0x450/0x6b0 [ib_srp] RSP: ffff88046f483e20

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Steve Feeley <Steve.Feeley@sandisk.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:52 -05:00
Bart Van Assche d6c58dc40f IB/SRP: Avoid using IB_MR_TYPE_SG_GAPS
Tests have shown that the following error message is reported when
using SG-GAPS registration with an mlx5 adapter:

scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff880bd4270eb0
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 0f007806 2500002a ad9fafd1
scsi host1: ib_srp: reconnect succeeded
mlx5_0:dump_cqe:262:(pid 7369): dump error cqe
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 0f007806 25000032 00105dd0
scsi host1: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff880b92860138

Hence avoid using SG-GAPS memory registrations. Additionally,
always configure the blk_queue_virt_boundary() to avoid to trigger
a mapping failure when using adapters that support SG-GAPS (e.g.
mlx5).

Fixes: commit ad8e66b4a8 ("IB/srp: fix mr allocation when the device supports sg gaps")
Fixes: commit 509c5f33f4 ("IB/srp: Prevent mapping failures")
Reported-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mark Bloch <markb@mellanox.com>
Cc: Yuval Shaia <yuval.shaia@oracle.com>
Cc: <stable@vger.kernel.org> # 4.7+
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:41 -05:00
Christophe Jaillet 4cd33aafe4 RDMA/qedr: Fix some error handling
'qedr_alloc_pbl_tbl()' can not return NULL.

In qedr_init_user_queue():
 - simplify the test for the return value, no need to test for NULL
 - propagate the error pointer if needed, otherwise 0 (success) is returned.
   This is spurious.

In init_mr_info():
 - test the return value with IS_ERR
 - propagate the error pointer if needed instead of an exlictit -ENOMEM.
   This is a no-op as the only error pointer that we can have here is
   already -ENOMEM

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:27:29 -05:00
Arnd Bergmann e57f774db1 RDMA/bnxt_re: add DCB dependency
When CONFIG_DCB is disabled, we get a link error:

drivers/infiniband/built-in.o: In function `bnxt_re_setup_qos':
trace.c:(.text+0x155774): undefined reference to `dcb_ieee_getapp_mask'
trace.c:(.text+0x155774): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `dcb_ieee_getapp_mask'
trace.c:(.text+0x155794): undefined reference to `dcb_ieee_getapp_mask'
trace.c:(.text+0x155794): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `dcb_ieee_getapp_mask'

Like the other drivers that use this function, a Kconfig dependency is
the correct fix.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:27:29 -05:00
Arnd Bergmann 3ecc16c82c IB/hns: include linux/module.h
I ran into a build error on arm64 randconfig testing:

infiniband/hw/hns/hns_roce_main.c:539:1: error: data definition has no type or storage class [-Werror]
infiniband/hw/hns/hns_roce_main.c:539:1: error: type defaults to 'int' in declaration of 'MODULE_DEVICE_TABLE' [-Werror=implicit-int]
infiniband/hw/hns/hns_roce_main.c:539:1: error: parameter names (without types) in function declaration [-Werror]
infiniband/hw/hns/hns_roce_main.c:979:226: error: data definition has no type or storage class [-Werror]
infiniband/hw/hns/hns_roce_main.c:979:226: error: type defaults to 'int' in declaration of 'module_init' [-Werror=implicit-int]
infiniband/hw/hns/hns_roce_main.c:979:1: error: parameter names (without types) in function declaration [-Werror]

Including the module.h makes it build again.

Fixes: 9a4435375c ("IB/hns: Add driver files for hns RoCE driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:27:28 -05:00
Yuval Shaia c67294b70b IB/vmw_pvrdma: Expose vendor error to ULPs
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:27:28 -05:00
Christoph Hellwig 7bf3976d6c vmw_pvrdma: switch to pci_alloc_irq_vectors
.. and greatly clean up the irq handling boilerplate code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:27:20 -05:00
Arnd Bergmann 64b2ae74e8 IB/hfi1: use size_t for passing array length
gcc-7 produces a mysterious warning about the size argument being potentially out
of range:

drivers/infiniband/hw/hfi1/verbs.c: In function 'init_cntr_names':
drivers/infiniband/hw/hfi1/verbs.c:1644:2: error: 'memcpy': specified size between 18446744071562067968 and 18446744073709551615 exceeds maximum object size 9223372036854775807 [-Werror=stringop-overflow=]

This seems to refer to a the case where an 64-bit size_t gets truncated
into a negative 'int' and subsequently turned into a high 64-bit number
again.

The fix is clearly to use size_t here, which matches the type that gets
used for this value elsewhere.

Fixes: b7481944b0 ("IB/hfi1: Show statistics counters under IB stats interface")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:47 -05:00
Zhu Yanjun 23536dfaa3 IB/ipoib: Remove redudant label
There are 2 labels to mark the same statements. Replace the 2 labels
with one versatile labe.

Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:46 -05:00
Zhu Yanjun c5e8f57b0b IB/ipoib: remove the unnecessary memory free
In the function ipoib_cm_nonsrq_init_rx, the memory is not
allocated successfully. It is not necessary to free it.

CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:46 -05:00
Christoph Hellwig f50cccdd03 IB/mthca: switch to pci_alloc_irq_vectors
Trivial switch to the new API for this driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:45 -05:00
Michael J. Ruhl 1bb0d7b781 IB/hfi1: Code reuse with memdup_copy
Update several usages of kmalloc/user_copy to memdup_copy and
memdup_copy_nul.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:45 -05:00
Don Hiatt 832666c163 IB/hfi1, qib, rdmavt: Move AETH defines to rdma/ib_hdrs.h
Rename RVT AETH defines and export in rdma/ib_hdrs.h

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:44 -05:00
Don Hiatt 881fccb864 IB/hfi1: Add rvt_rnr_tbl_to_usec function
Return usec from an index into ib_rvt_rnr_table.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:44 -05:00
Michael J. Ruhl db069ecb5d IB/hfi1: Do not set physical link state if DC is in the shutdown state
If the DC is in shutdown state, the set link state function will return
an error.  Since this is not a failure in this state, make sure to
only call set link state if the DC is on.

Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:43 -05:00
Jakub Byczkowski c27aad00d1 IB/hfi1: Modify logging frequency of DCC errors
Use rate-limit state to limit number of messages logged
to kernel message buffer for DCC errors. Add new macro
dd_dev_info_ratelimited for that propose. Replace all
dd_dev_info calls in handle_dcc_err function with
rate-limited version.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:43 -05:00
Mike Marciniszyn f9215b5e53 IB/rdmavt, IB/hfi1, IB/qib: Correct ack count for passive (RTR) QPs
The send complete for RC QPs mismanages the ack count when the
responder side is only in RTR.

A QP in that state cannot send requests, but it can be the target
for operations that elicit responses.

Adjust the RC completion logic to correct the count maintenance
by reflecting RECV_OK in a new state test.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:42 -05:00
Brian Welty 3fc4a0906f IB/qib: Updates to use rdmavt's SGE helper routines
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:42 -05:00
Brian Welty 1198fcea8a IB/hfi1, rdmavt: Move SGE state helper routines into rdmavt
To improve code reuse, add small SGE state helper routines to rdmavt_mr.h.
Leverage these in hfi1, including refactoring of hfi1_copy_sge.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:41 -05:00
Brian Welty 0128fceaf9 IB/hfi1, rdmavt: Update copy_sge to use boolean arguments
Convert copy_sge and related SGE state functions to use boolean.
For determining if QP is in user mode, add helper function in rdmavt_qp.h.
This is used to determine if QP needs the last byte ordering.
While here, change rvt_pd.user to a boolean.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:41 -05:00
Venkata Sandeep Dhanalakota b4238e7057 IB/qib: Use new rdmavt timers
Reduce qib code footprint by using the rdmavt timers.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:40 -05:00
Venkata Sandeep Dhanalakota 56acbbfb46 IB/hfi1: Use new rdmavt timers
Reduce hfi1 code footprint by using the rdmavt timers.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:39 -05:00
Venkata Sandeep Dhanalakota 11a10d4bc7 IB/rdmavt: Adding timer logic to rdmavt
To move common code across target to rdmavt for code reuse.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:39 -05:00
Brian Welty 696513e8cf IB/hfi1, qib, rdmavt: Move AETH credit functions into rdmavt
Add rvt_compute_aeth() and rvt_get_credit() as shared functions in
rdmavt, moved from hfi1/qib logic.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:38 -05:00
Brian Welty beb5a04267 IB/hfi1, qib, rdmavt: Move two IB event functions into rdmavt
Add rvt_rc_error() and rvt_comm_est() as shared functions in
rdmavt, moved from hfi1/qib logic.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:38 -05:00
Sebastian Sanchez c03c08d50b IB/hfi1: Check upper-case EFI variables
The EFI variable that provides board ID is named
by the PCI address of the device, which is published
in upper-case, while the HFI1 driver reads the EFI
variable in lower-case.
This prevents returning the correct board id when
queried through sysfs. Read EFI variables in
upper-case if the lower-case read fails.

Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:37 -05:00
Sebastian Sanchez 76327627be IB/hfi1: Reduce oversized fields in struct hfi1_packet
Some fields in struct hfi1_packet are oversized.
Reduce them.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:37 -05:00
Mike Marciniszyn d7c76e91aa IB/hfi1: Add additional fields to qp_stats
The r_psn and s_rnr_retry are missing.

Add with this patch.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:36 -05:00
Sebastian Sanchez b448bf9a0d IB/hfi1: Allocate context data on memory node
There are some memory allocation calls in hfi1_create_ctxtdata()
that do not use the numa function parameter. This
can cause cache lines to be filled over QPI.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:36 -05:00
Sebastian Sanchez 338adfdddf IB/rdmavt: Use per-CPU reference count for MRs
Having per-CPU reference count for each MR prevents
cache-line bouncing across the system. Thus, it
prevents bottlenecks. Use per-CPU reference counts
per MR.

The per-CPU reference count for FMRs is used in
atomic mode to allow accurate testing of the busy
state. Other MR types run in per-CPU mode MR until
they're freed.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:35 -05:00
Sebastian Sanchez f3e862cb68 IB/hfi1: Access hfi1_ibport through rcd pointer
Receive code paths use the QP's device and port
number to access the struct hfi1_ibport. When an
instance of struct hfi1_ctxtdata is present, it can
be used to access struct hfi1_ibport through a pointer.
This makes struct hfi1_ibport lookup time faster as an
array doesn't have to be indexed and access fields in
other cache-lines.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:35 -05:00