Commit graph

664712 commits

Author SHA1 Message Date
Vishwanathapura, Niranjana d4829ea603 IB/hfi1: OPA_VNIC RDMA netdev support
Add support to create and free OPA_VNIC rdma netdev devices.
Implement netstack interface functionality including xmit_skb,
receive side NAPI etc. Also implement rdma netdev control functions.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:03:12 -04:00
Vishwanathapura, Niranjana 1bd671ab3f IB/opa-vnic: VNIC Ethernet Management Agent (VEMA) function
OPA VEMA function interfaces with the Infiniband MAD stack to exchange the
management information packets with the Ethernet Manager (EM).
It interfaces with the OPA VNIC netdev function to SET/GET the management
information. The information exchanged with the EM includes class port
details, encapsulation configuration, various counters, unicast and
multicast MAC list and the MAC table. It also supports sending traps
to the EM.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:38 -04:00
Vishwanathapura, Niranjana cfd34f8eb0 IB/opa-vnic: VNIC Ethernet Management Agent (VEMA) interface
OPA VNIC EMA interface functions are the management interfaces to the OPA
VNIC netdev. Add support to add and remove VNIC ports. Implement the
required GET/SET management interface functions and processing of new
management information. Add support to send trap notifications upon various
events like interface status change, unicast/multicast mac list update and
mac address change.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:38 -04:00
Vishwanathapura, Niranjana 174e03d7e6 IB/opa-vnic: VNIC MAC table support
OPA VNIC MAC table contains the MAC address to DLID mappings provided by
the Ethernet manager. During transmission, the MAC table provides the MAC
address to DLID translation. Implement MAC table using simple hash list.
Also provide support to update/query the MAC table by Ethernet manager.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:38 -04:00
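The MAC table commit above (174e03d7e6) describes a MAC address to DLID translation kept in a simple hash list. The following is a minimal user-space sketch of that idea, not the driver's code: the names `mac_table`, `mac_table_set`, `mac_table_lookup`, the bucket count, and the hash function are all hypothetical and chosen only for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAC_LEN 6
#define MAC_TBL_BUCKETS 64 /* hypothetical size, not taken from the commit */

/* One MAC -> DLID mapping, chained per bucket. */
struct mac_node {
	uint8_t mac[MAC_LEN];
	uint32_t dlid;
	struct mac_node *next;
};

struct mac_table {
	struct mac_node *bucket[MAC_TBL_BUCKETS];
};

/* Illustrative hash over the six MAC bytes (an assumption, not the driver's). */
unsigned int mac_hash(const uint8_t *mac)
{
	unsigned int h = 0;

	for (int i = 0; i < MAC_LEN; i++)
		h = h * 31 + mac[i];
	return h % MAC_TBL_BUCKETS;
}

/* Insert a mapping or update an existing one. Returns 0, or -1 if out of memory. */
int mac_table_set(struct mac_table *t, const uint8_t *mac, uint32_t dlid)
{
	unsigned int b = mac_hash(mac);
	struct mac_node *n;

	for (n = t->bucket[b]; n; n = n->next) {
		if (memcmp(n->mac, mac, MAC_LEN) == 0) {
			n->dlid = dlid;
			return 0;
		}
	}
	n = malloc(sizeof(*n));
	if (!n)
		return -1;
	memcpy(n->mac, mac, MAC_LEN);
	n->dlid = dlid;
	n->next = t->bucket[b];
	t->bucket[b] = n;
	return 0;
}

/* Transmit-side translation: returns 0 and stores the DLID, or -1 if unknown. */
int mac_table_lookup(const struct mac_table *t, const uint8_t *mac, uint32_t *dlid)
{
	const struct mac_node *n;

	for (n = t->bucket[mac_hash(mac)]; n; n = n->next) {
		if (memcmp(n->mac, mac, MAC_LEN) == 0) {
			*dlid = n->dlid;
			return 0;
		}
	}
	return -1;
}
```

A zero-initialized `struct mac_table` is an empty table. The sketch ignores locking and entry removal, which a table updated by the Ethernet manager during transmission would need.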
Vishwanathapura, Niranjana 009b7dd40c IB/opa-vnic: VNIC statistics support
OPA VNIC driver statistics support maintains various counters including
standard netdev counters and the Ethernet manager defined counters.
Add the Ethtool hook to read the counters.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:38 -04:00
Vishwanathapura, Niranjana 72dc761440 IB/opa-vnic: VNIC Ethernet Management (EM) structure definitions
Define VNIC EM MAD structures and the associated macros. These structures
are used for information exchange between VNIC EM agent (EMA) on the host
and the Ethernet manager. These include the virtual ethernet switch (vesw)
port information, vesw port mac table, summary and error counters,
vesw port interface mac lists and the EMA trap.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:38 -04:00
Vishwanathapura, Niranjana 7d6f728c67 IB/opa-vnic: Virtual Network Interface Controller (VNIC) netdev
OPA VNIC netdev function supports Ethernet functionality over Omni-Path
fabric by encapsulating Ethernet packets inside Omni-Path packet header.
It allocates a rdma netdev device and interfaces with the network stack to
provide standard Ethernet network interfaces. It overrides HFI1 device's
netdev operations where it is required.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:38 -04:00
Vishwanathapura, Niranjana 62e4594940 IB/opa-vnic: Virtual Network Interface Controller (VNIC) interface
Define OPA VNIC interface between hardware independent VNIC
functionality and the hardware dependent VNIC functionality.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:38 -04:00
Vishwanathapura, Niranjana 2fc7757264 IB/opa-vnic: RDMA NETDEV interface
Add rdma netdev interface to ib device structure allowing rdma netdev
devices to be allocated by ib clients.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:38 -04:00
Vishwanathapura, Niranjana c73690ca16 IB/opa-vnic: Virtual Network Interface Controller (VNIC) documentation
Add OPA VNIC design document explaining the VNIC architecture and the
driver design.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 12:01:06 -04:00
Doug Ledford 23790ba2d7 Merge branch 'k.o/for-4.12' into k.o/for-4.12-rdma-netdevice 2017-04-20 12:00:41 -04:00
Matan Barak db1b5ddd53 IB/core: Rename uverbs event file structure
Previously, ib_uverbs_event_file was suffixed by _file as it contained
the actual file information. Since it's now only used as base struct
for ib_uverbs_async_event_file and ib_uverbs_completion_event_file,
we change its name to ib_uverbs_event_queue. This represents its
logical role better.

Fixes: 1e7710f3f6 ('IB/core: Change completion channel to use the reworked objects schema')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 11:44:07 -04:00
Matan Barak e0fcc61113 IB/core: Don't use is_async in event files to infer events size
Previously, we inferred the events size in ib_uverbs_event_read by
using the is_async flag. Instead of that, we pass the event size
directly.

Fixes: 1e7710f3f6 ('IB/core: Change completion channel to use the reworked objects schema')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 11:44:07 -04:00
Matan Barak c52d8114d1 IB/core: A small refactor in destroy WQ handler
Instead of having uverbs_uobject_put both in the error flow and the
good flow, we unite them.

Fixes: fd3c7904db ('IB/core: Change idr objects to use the new schema')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 11:44:07 -04:00
Matan Barak d9edfc5a4f IB/core: Nullify ib_uobject during allocation
Currently, we initialize all fields of ib_uobject straight after
allocation. Therefore, a kmalloc was sufficient. Since ib_uobject
could be embedded in a type specific structure, we nullify it to
spare programmer errors.

Fixes: 3832125624 ('IB/core: Add support for idr types')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 11:44:07 -04:00
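The reasoning in d9edfc5a4f above can be illustrated outside the kernel. This is a hedged user-space sketch in which `calloc` stands in for a zeroing kernel allocation. The struct and function names (`uobject`, `event_uobject`, `uobject_type`, `alloc_uobject`) are hypothetical. The point is that when a base object is embedded in a larger type-specific structure, zeroing the whole allocation also covers the fields that the base initializer never touches.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical base object, embedded first in type-specific structures. */
struct uobject {
	int id;
	void *context;
};

/* Hypothetical type descriptor carrying the full allocation size. */
struct uobject_type {
	size_t obj_size;
};

/* Hypothetical type-specific structure with extra fields after the base. */
struct event_uobject {
	struct uobject base;
	int events_pending;
	void *queue;
};

/*
 * Allocate obj_size bytes, all zeroed. Initializing only the base fields
 * after a non-zeroing allocation would leave events_pending and queue
 * holding garbage; zeroing the whole object avoids that class of error.
 */
struct uobject *alloc_uobject(const struct uobject_type *type)
{
	if (type->obj_size < sizeof(struct uobject))
		return NULL;
	return calloc(1, type->obj_size);
}
```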
Matan Barak f025c48958 IB/core: Don't pass the lock state to _rdma_remove_commit_uobject
The only scenario where this function was called while the lock is
already taken is in the context cleanup scenario. Thus, in order not
to pass the lock state to this function, we just call the remove logic
straight from the cleanup context function.

Fixes: 3832125624 ('IB/core: Add support for idr types')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 11:44:07 -04:00
Matan Barak 30004b861a IB/core: Rename write flag to exclusive in rdma_core
We rename the "write" flags to "exclusive", as it's used for both
WRITE and DESTROY actions.

Fixes: 3832125624 ('IB/core: Add support for idr types')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20 11:44:07 -04:00
David S. Miller 70d40b366d Merge branch 'mlx5-RDMA-netdevice'
Saeed Mahameed says:

====================
Mellanox, mlx5 RDMA net device support

This series provides the lower level mlx5 support of RDMA netdevice
creation API [1] suggested and introduced by Intel's HFI OPA VNIC
netdevice driver [2], to enable IPoIB mlx5 RDMA netdevice creation.

mlx5 IPoIB RDMA netdev will serve as an acceleration netdevice for the current
IPoIB ULP generic netdevice, providing:
	- mlx5 RSS support.
	- mlx5 HW RX,TX offloads (checksum, TSO, LRO, etc ..).
	- Full mlx5 HW features transparent to the ULP itself.

The idea here is to reuse and benefit from the already implemented mlx5e netdevice
management and channels API for both Ethernet and RDMA netdevices, since both IPoIB
and Ethernet netdevices share same common mlx5 HW resources (with some small
exceptions) and share most of the control/data path logic, it is more natural to
have them share the same code.

The differences between IPoIB and Ethernet netdevices can be summarized to:

Steering:
In mlx5, IPoIB traffic is sent and received from an underlay special QP, and in Ethernet
the traffic is handled by vports and vport steering is managed by e-switch or FW.

For IPoIB traffic to get steered correctly the only thing we need to do is to create RSS
HW contexts for RX and TX HW contexts for TX (similar to mlx5e) with the underlay QP attached to
them (underlay QP will be 0 in case of Ethernet).

RX,TX:
Since IPoIB traffic is different, slightly modified RX and TX handlers are required,
still we do some code reuse in data path via common helper functions.

All of the other generic netdevice and mlx5 aspects will be shared between mlx5 Ethernet
and IPoIB netdevices, e.g.
	- Channels creation and handling (RQs,SQs,CQs, NAPI, interrupt moderation, etc..)
	- Offloads, checksum, GRO, LRO, TSO, and more.
	- netdevice logic and non Ethernet specific ndos (open/close, etc..)

In order to achieve what we want:

In patches 1 to 3, Erez added support for the underlay QP in mlx5_ifc and refactored
the mlx5 steering code to accept the underlay QP as a parameter for creating steering
objects and enabled flow steering for IB link.

Then we are going to use the mlx5e netdevice profile, which is already used to separate between
NIC and VF representors netdevices, to create a new type of IPoIB netdevice profile.

For that, one small refactoring is required to make mlx5e netdevice profile management
more generic and agnostic to link type which is done in patch #4.

In patch #5, we introduce ipoib.c to host all of mlx5 IPoIB (mlx5i) specific logic and a
skeleton for the IPoIB mlx5 netdevice profile, and we will start filling it in next patches,
using mlx5e already existing APIs.

Patch #6 and #7, Implement init/cleanup RX mlx5i netdev profile handlers to create mlx5 RSS
resources, same as mlx5e but without vlan and L2 steering tables.

Patch #8, Implement init/cleanup TX mlx5i netdev profile handlers, to create TX resources
same as mlx5e but with one TC (tc = 0) support.

Patch #9, Implement mlx5i open/close ndos, where we reuse the mlx5e channels API, to start/stop TX/RX channels.

Patch #10, Create the underlay QP and attach it to mlx5i RSS and TX HW contexts.

Patch #11 and #12, Break down the mlx5e xmit flow into smaller helper functions and implement the
mlx5i IPoIB xmit routine.

Patch #13 and #14, Have an RX handler per netdevice profile. We already do this before this series
in a non clean way to separate between NIC netdev and VF representor RX handlers, in patch 13 we make
the RX handler generic and bound to a profile and in patch 14 we implement the IPoIB RX handlers.

Patch #15, Small cleanup to avoid e-switch with IPoIB netdev.

In order to enable mlx5 IPoIB, a merge between the IPoIB RDMA netdev offload support [3]
- which was already submitted to the rdma mailing list - and this series is required
plus an extra small patch [4] which will connect between both sides and actually enables the offload.

Once both patch-sets are merged into linux we will have to submit the extra small patch [4], to enable
the feature.

Thanks,
Saeed.

[1] https://patchwork.kernel.org/patch/9676637/

[2] https://lwn.net/Articles/715453/
    https://patchwork.kernel.org/patch/9587815/

[3] https://patchwork.kernel.org/patch/9672069/
[4] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/commit/?id=0141db6a686e32294dee015b7d07706162ba48d8
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:33 -04:00
Erez Shitrit 93d576af3c hw/mlx5: Add New bit to check over QP creation
Add check for bit IB_QP_CREATE_NETIF_QP while creating QP.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:32 -04:00
Saeed Mahameed 955bc48081 net/mlx5e: E-switch vport manager is valid for ethernet only
Currently the driver supports only ethernet eswitch, and we want to
protect downstream IPoIB netdev from trying to access it in IB link.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:32 -04:00
Saeed Mahameed 9d6bd752c6 net/mlx5e: IPoIB, RX handler
Implement IPoIB RX SKB handler.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:32 -04:00
Saeed Mahameed 20fd0c193f net/mlx5e: RX handlers per netdev profile
In order to have different RX handler per profile, fix and refactor the
current code to take the rx handler directly from the netdevice profile
rather than computing it on runtime as it was done with the switchdev
mode representor rx handler.

This will also remove the current wrong assumption in mlx5e_alloc_rq
code that mlx5e_priv->ppriv is of the type vport_rep.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:31 -04:00
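The change described in 20fd0c193f above, taking the RX handler from the netdevice profile instead of computing it at runtime, can be sketched as a function pointer stored in a profile structure. This is an illustrative sketch only. The types and names (`netdev_profile`, `rx_queue`, `rx_queue_init`, the two handlers) are hypothetical and do not reproduce the mlx5e definitions.

```c
#include <stddef.h>

struct rx_queue;

/* Hypothetical RX handler signature: process one completion entry. */
typedef int (*rx_handler_t)(struct rx_queue *rq, int cqe);

/* Each netdevice flavor supplies its handler through its profile. */
struct netdev_profile {
	const char *name;
	rx_handler_t handle_rx;
};

struct rx_queue {
	rx_handler_t handle_rx; /* bound once, at queue setup */
	int packets;
};

/* Stand-in handlers; the return values only make them distinguishable. */
int nic_handle_rx(struct rx_queue *rq, int cqe)
{
	(void)cqe;
	rq->packets++;
	return 1;
}

int ipoib_handle_rx(struct rx_queue *rq, int cqe)
{
	(void)cqe;
	rq->packets++;
	return 2;
}

const struct netdev_profile nic_profile = { "nic", nic_handle_rx };
const struct netdev_profile ipoib_profile = { "ipoib", ipoib_handle_rx };

/*
 * The queue copies the handler from the profile, so setup code needs no
 * per-flavor branching and makes no assumption about private data types.
 */
int rx_queue_init(struct rx_queue *rq, const struct netdev_profile *profile)
{
	if (!profile || !profile->handle_rx)
		return -1;
	rq->handle_rx = profile->handle_rx;
	rq->packets = 0;
	return 0;
}
```

With this shape, adding another netdevice flavor means adding a profile entry rather than another conditional in the queue allocation path.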
Saeed Mahameed 258545449b net/mlx5e: IPoIB, Xmit flow
Implement mlx5e's IPoIB SKB transmit using the helper functions provided
by mlx5e ethernet tx flow, the only difference in the code between
mlx5e_xmit and mlx5i_xmit is that IPoIB has some extra fields to fill
(UD datagram segment) in the TX descriptor (WQE) and it doesn't need to
have any vlan handling.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:31 -04:00
Saeed Mahameed 77bdf8950b net/mlx5e: Xmit flow break down
Break current mlx5e xmit flow into smaller blocks (helper functions)
in order to reuse them for IPoIB SKB transmission.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:31 -04:00
Saeed Mahameed ec8fd927b7 net/mlx5e: IPoIB, Underlay QP
Create IPoIB underlay QP needed by the IPoIB netdevice profile for RSS
and TX HW context to perform on IPoIB traffic.

Reset the underlay QP on dev_uninit ndo to stop IPoIB traffic going
through this QP when the ULP IPoIB decides to cleanup.

Implement attach/detach mcast RDMA netdev callbacks for later RDMA
netdev use.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:31 -04:00
Saeed Mahameed 603f4a4521 net/mlx5e: IPoIB, Basic netdev ndos open/close
Implement open/close of IPoIB netdevice ndos using mlx5e's
channels API to manage data path resources (RQs/SQs/CQs).

Set IPoIB netdev address on dev_init ndo.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:30 -04:00
Saeed Mahameed 5426a0b274 net/mlx5e: IPoIB, TX TIS creation
Modify mlx5e tis creation function to accept underlay qp number, which
will be needed by IPoIB.

Implement mlx5i (IPoIB) tx init/cleanup netdevice profile flows to
create one TIS with the IPoIB underlay qp, for IPoIB TX SQs.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:30 -04:00
Saeed Mahameed bc81b9d326 net/mlx5e: IPoIB, RSS flow steering tables
Like the mlx5e ethernet mode, on IPoIB mode we need to create RX steering
tables, but IPoIB does not require MAC and VLAN steering tables so the
only tables we create in here are:
1. TTC Table (Traffic Type Classifier table for RSS steering)
2. ARFS Table (for accelerated RFS support)

Creation of those tables is identical to mlx5e ethernet mode, hence the
use of mlx5e_create_ttc_table and mlx5e_arfs_create_tables.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:30 -04:00
Saeed Mahameed 8f493ffd88 net/mlx5e: IPoIB, RX steering RSS RQTs and TIRs
Implement IPoIB RX RSS (RQTs and TIRs) HW objects creation,
All we do here is simply reuse the mlx5e implementation to create
direct and indirect (RSS) steering HW objects.

For that we just expose
mlx5e_{create,destroy}_{direct,indirect}_{rqt,tir} functions into en.h
and call them from ipoib.c in init/cleanup_rx IPoIB netdevice profile
callbacks.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:30 -04:00
Saeed Mahameed 48935bbb7a net/mlx5e: IPoIB, Add netdevice profile skeleton
Create mlx5e IPoIB netdevice profile skeleton in the new ipoib.c
file with empty implementation.

Downstream patches will provide the full mlx5 rdma netdevice acceleration
support for IPoIB into this new file, by using the mlx5e netdevice
profile and new mlx5_channels APIs and infrastructures.
Same as already done in mlx5e NIC netdevice and switchdev mode VF
representors.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:30 -04:00
Saeed Mahameed 2c3b5beec4 net/mlx5e: More generic netdev management API
In preparation for mlx5e RDMA net_device support, here we generalize
mlx5e_attach/detach in a way that those functions will be agnostic
to link type.  For that we move ethernet specific NIC net device logic out
of those functions into {nic,rep}_{enable/disable} mlx5e NIC and
representor profiles callbacks.

Also some of the logic was moved only to NIC profile since it is not right
to have this logic for representor net device (e.g. set port MTU).

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:29 -04:00
Erez Shitrit ffdb8827ec net/mlx5: Enable flow-steering for IB link
Get the relevant capabilities if the device supports ipoib_enhanced_offloads and
init the flow steering table accordingly.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:29 -04:00
Erez Shitrit b3ba51498b net/mlx5: Refactor create flow table method to accept underlay QP
IB flow tables need the underlay qp to perform flow steering.
Here we change the API of the flow tables creation to accept the
underlay QP number as a parameter in order to support IB (IPoIB) flow
steering.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:29 -04:00
Erez Shitrit 500a3d0ded net/mlx5: Add IPoIB enhanced offloads bits to mlx5_ifc
New capability bit: ipoib_enhanced_offloads, indicates new ability for UD
QP to do RSS and enhanced IPoIB offloads and acceleration.

Add underlay_qpn to the TIS and flow_table objects in order to support
SET_ROOT command, to connect between IPoIB QPs and flow steering tables.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:29 -04:00
Haiyang Zhang f72860afa2 hv_netvsc: Exclude non-TCP port numbers from vRSS hashing
Azure hosts are not supporting non-TCP port numbers in vRSS hashing for
now. For example, UDP packet loss rate will be high if port numbers are
also included in vRSS hash.

So, we created this patch to use only IP numbers for hashing in non-TCP
traffic.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:05:19 -04:00
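The policy in f72860afa2 above, hashing on addresses only for non-TCP traffic and including port numbers only for TCP, can be sketched as follows. This is not the hv_netvsc implementation: `flow4`, `flow_hash`, and the FNV-style mixing step are assumptions made for illustration, standing in for whatever hash the driver and host actually use.

```c
#include <stdint.h>

#define PROTO_TCP 6  /* IANA protocol number for TCP */
#define PROTO_UDP 17 /* IANA protocol number for UDP */

/* Hypothetical IPv4 flow key. */
struct flow4 {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t proto;
};

/* FNV-1a style mixing over 32-bit words; a stand-in hash, not the driver's. */
static uint32_t mix(uint32_t h, uint32_t v)
{
	h ^= v;
	h *= 0x01000193u;
	return h;
}

/*
 * Always hash the addresses. Fold in the ports only for TCP, so every
 * non-TCP packet between the same two hosts maps to the same hash value.
 */
uint32_t flow_hash(const struct flow4 *f)
{
	uint32_t h = 0x811c9dc5u;

	h = mix(h, f->saddr);
	h = mix(h, f->daddr);
	if (f->proto == PROTO_TCP)
		h = mix(h, ((uint32_t)f->sport << 16) | f->dport);
	return h;
}
```

The trade-off is that all UDP flows between a given pair of addresses land on one queue, in exchange for staying within what the host supports.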
Haiyang Zhang 8db91f6a9b hv_netvsc: Fix the queue index computation in forwarding case
If the outgoing skb has a RX queue mapping available, we use the queue
number directly, rather than putting it through the Send Indirection Table.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:05:19 -04:00
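The selection rule in 8db91f6a9b above can be sketched as a small function: a forwarded packet that already carries a recorded RX queue uses that queue number directly, and only other packets go through the Send Indirection Table. The names `pkt`, `pick_tx_queue`, and `SEND_IND_TABLE_SIZE`, as well as the final wrap into the valid queue range, are assumptions for illustration and are not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEND_IND_TABLE_SIZE 16 /* hypothetical table size */

/* Hypothetical per-packet metadata. */
struct pkt {
	bool rx_queue_recorded; /* set when the packet is being forwarded */
	uint16_t rx_queue;      /* queue it was received on */
	uint32_t hash;          /* flow hash for locally generated traffic */
};

/*
 * Pick a transmit queue. send_table must hold SEND_IND_TABLE_SIZE entries.
 * The result is wrapped into [0, num_queues) as a safety measure (an
 * assumption of this sketch). Returns 0 if num_queues is 0.
 */
uint16_t pick_tx_queue(const struct pkt *p, const uint16_t *send_table,
		       uint16_t num_queues)
{
	uint16_t q;

	if (num_queues == 0)
		return 0;
	if (p->rx_queue_recorded)
		q = p->rx_queue;                                /* forwarding case */
	else
		q = send_table[p->hash % SEND_IND_TABLE_SIZE];  /* table lookup */
	return q % num_queues;
}
```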
Vivien Didelot a6a71f19fe net: dsa: isolate legacy code
This patch moves as is the legacy DSA code from dsa.c to legacy.c,
except the few shared symbols which remain in dsa.c.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:03:17 -04:00
David S. Miller 6b6cbc1471 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts were simply overlapping changes.  In the net/ipv4/route.c
case the code had simply moved around a little bit and the same fix
was made in both 'net' and 'net-next'.

In the net/sched/sch_generic.c case a fix in 'net' happened at
the same time that a new argument was added to qdisc_hash_add().

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-15 21:16:30 -04:00
Linus Torvalds 1bf4b1268e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
 "Just a small update to xpad driver to recognize yet another gamepad,
  and another change making sure userio.h is exported"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: xpad - add support for Razer Wildcat gamepad
  uapi: add missing install of userio.h
2017-04-14 17:51:16 -07:00
Linus Torvalds 7e703eccf0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Things seem to be settling down as far as networking is concerned,
  let's hope this trend continues...

   1) Add iov_iter_revert() and use it to fix the behavior of
      skb_copy_datagram_msg() et al., from Al Viro.

   2) Fix the protocol used in the synthetic SKB we cons up for the
      purposes of doing a simulated route lookup for RTM_GETROUTE
      requests. From Florian Larysch.

   3) Don't add noop_qdisc to the per-device qdisc hashes, from Cong
      Wang.

   4) Don't call netdev_change_features with the team lock held, from
      Xin Long.

   5) Revert TCP F-RTO extension to catch more spurious timeouts because
      it interacts very badly with some middle-boxes. From Yuchung
      Cheng.

   6) Fix the loss of error values in l2tp {s,g}etsockopt calls, from
      Guillaume Nault.

   7) ctnetlink uses bit positions where it should be using bit masks,
      fix from Liping Zhang.

   8) Missing RCU locking in netfilter helper code, from Gao Feng.

   9) Avoid double frees and use-after-frees in tcp_disconnect(), from
      Eric Dumazet.

  10) Don't do a changelink before we register the netdevice in
      bridging, from Ido Schimmel.

  11) Lock the ipv6 device address list properly, from Rabin Vincent"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits)
  netfilter: ipt_CLUSTERIP: Fix wrong conntrack netns refcnt usage
  netfilter: nft_hash: do not dump the auto generated seed
  drivers: net: usb: qmi_wwan: add QMI_QUIRK_SET_DTR for Telit PID 0x1201
  ipv6: Fix idev->addr_list corruption
  net: xdp: don't export dev_change_xdp_fd()
  bridge: netlink: register netdevice before executing changelink
  bridge: implement missing ndo_uninit()
  bpf: reference may_access_skb() from __bpf_prog_run()
  tcp: clear saved_syn in tcp_disconnect()
  netfilter: nf_ct_expect: use proper RCU list traversal/update APIs
  netfilter: ctnetlink: skip dumping expect when nfct_help(ct) is NULL
  netfilter: make it safer during the inet6_dev->addr_list traversal
  netfilter: ctnetlink: make it safer when checking the ct helper name
  netfilter: helper: Add the rcu lock when call __nf_conntrack_helper_find
  netfilter: ctnetlink: using bit to represent the ct event
  netfilter: xt_TCPMSS: add more sanity tests on tcph->doff
  net: tcp: Increase TCP_MIB_OUTRSTS even though fail to alloc skb
  l2tp: don't mask errors in pppol2tp_getsockopt()
  l2tp: don't mask errors in pppol2tp_setsockopt()
  tcp: restrict F-RTO to work-around broken middle-boxes
  ...
2017-04-14 17:38:24 -07:00
Linus Torvalds 91174391bf Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A set of small fixes for x86:

   - fix locking in RDT to prevent memory leaks and freeing in use
     memory

   - prevent setting invalid values for vdso32_enabled which cause
     inconsistencies for user space resulting in application crashes.

   - plug a race in the vdso32 code between fork and sysctl which causes
     inconsistencies for user space resulting in application crashes.

   - make MPX signal delivery work in compat mode

   - make the dmesg output of traps and faults readable again"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel_rdt: Fix locking in rdtgroup_schemata_write()
  x86/debug: Fix the printk() debug output of signal_fault(), do_trap() and do_general_protection()
  x86/vdso: Plug race between mapping and ELF header setup
  x86/vdso: Ensure vdso32_enabled gets set to valid values only
  x86/signals: Fix lower/upper bound reporting in compat siginfo
2017-04-14 17:00:01 -07:00
Linus Torvalds 07c7016de7 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "Two small fixes for perf:

   - the move to support cross arch annotation introduced per arch
     initialization requirements, fulfill them for s/390 (Christian
     Borntraeger)

   - add the missing initialization to the LBR entries to avoid exposing
     random or stale data"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32()
  perf annotate s390: Fix perf annotate error -95 (4.10 regression)
2017-04-14 16:58:38 -07:00
Linus Torvalds d295917a47 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 "The irq department provides:

   - two fixes for the CPU affinity spread infrastructure to prevent
     unbalanced spreading in corner cases which leads to horrible
     performance, because interrupts are rather aggregated than spread

   - add a missing spinlock initializer in the imx-gpcv2 init code"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/irq-imx-gpcv2: Fix spinlock initialization
  irq/affinity: Fix extra vecs calculation
  irq/affinity: Fix CPU spread for unbalanced nodes
2017-04-14 16:57:14 -07:00
Linus Torvalds f399ecb4b4 Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Thomas Gleixner:
 "Three fixes from EFI land:

   - prevent accessing a Graphic Output Device (GOP) which the kernel
     does not know how to handle

   - prevent PCI reconfiguration from modifying a BAR which covers the
     framebuffer because that's already in use through the EFI GOP
     interface

   - avoid reserving EFI runtime regions as this results in bogus memory
     mappings"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Don't try to reserve runtime regions
  efi/fb: Avoid reconfiguration of BAR that covers the framebuffer
  efi/libstub: Skip GOP with PIXEL_BLT_ONLY format
2017-04-14 16:55:33 -07:00
Linus Torvalds 4b31ac485d Merge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "Dave Sterba collected a few more fixes for the last rc.

  These aren't marked for stable, but I'm putting them in with a batch
  we're testing/sending by hand for this release"

* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix potential use-after-free for cloned bio
  Btrfs: fix segmentation fault when doing dio read
  Btrfs: fix invalid dereference in btrfs_retry_endio
  btrfs: drop the nossd flag when remounting with -o ssd
2017-04-14 16:53:45 -07:00
Linus Torvalds 5466f4dfce Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull more CIFS fixes from Steve French:
 "As promised, here is the remaining set of cifs/smb3 fixes for stable
  (and a fix for one regression) now that they have had additional
  review and testing"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Fix SMB3 mount without specifying a security mechanism
  CIFS: store results of cifs_reopen_file to avoid infinite wait
  CIFS: remove bad_network_name flag
  CIFS: reconnect thread reschedule itself
  CIFS: handle guest access errors to Windows shares
  CIFS: Fix null pointer deref during read resp processing
2017-04-14 16:51:29 -07:00
Linus Torvalds 82f1faa867 Merge tag 'fbdev-v4.11-rc6' of git://github.com/bzolnier/linux
Pull fbdev fixes from Bartlomiej Zolnierkiewicz:

 - fix probing time checks in omapfb driver (regression fix)

 - fix optional VBAT support in ssd1307fb driver (regression fix)

 - fix connecting to backend in xen-fbfront driver

* tag 'fbdev-v4.11-rc6' of git://github.com/bzolnier/linux:
  fbdev: omapfb: delete check_required_callbacks()
  xen, fbfront: fix connecting to backend
  fbdev/ssd1307fb: fix optional VBAT support
2017-04-14 09:18:17 -07:00
Linus Torvalds e16d8b6e1f Merge tag 'pm-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
 "These fix a cpufreq core regression related to CPU online/offline and
  several issues in the turbostat and cpupower utilities.

  Specifics:

   - Allow CPUs to be put back online even if the cpufreq driver is
     unable to work with them (e.g. due to missing information from
     platform firmware), which was the previous behavior expected by
     users, but changed in the 4.9 time frame (Chen Yu).

   - Fix a few minor issues in the turbostat utility, introduced mostly
     during the recent update of it (Len Brown, Doug Smythies).

   - Fix a cpupower utility bug causing it to report incorrect values
     for turbo frequencies in some cases (Ben Hutchings)"

* tag 'pm-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpupower: Fix turbo frequency reporting for pre-Sandy Bridge cores
  cpufreq: Bring CPUs up even if cpufreq_online() failed
  tools/power turbostat: update version number
  tools/power turbostat: fix impossibly large CPU%c1 value
  tools/power turbostat: turbostat.8 add missing column definitions
  tools/power turbostat: update HWP dump to decimal from hex
  tools/power turbostat: enable package THERM_INTERRUPT dump
  tools/power turbostat: show missing Core and GFX power on SKL and KBL
  tools/power turbostat: bugfix: GFXMHz column not changing
2017-04-14 09:16:23 -07:00
Linus Torvalds 321ae379af Merge tag 'acpi-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:

 "These revert a recent ACPICA commit that turned out to be problematic
  and fix a device enumeration breakage from the 4.8 cycle.

  Specifics:

   - Revert a recent ACPICA commit targeted at catching firmware bugs
     which promptly did that and caused functional problems to appear
     (Rafael Wysocki).

   - Fix a device enumeration problem introduced in the 4.8 time frame
     which caused the ACPI docking station driver to report incorrect
     status via sysfs among other things (Rafael Wysocki)"

* tag 'acpi-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "ACPICA: Resources: Not a valid resource if buffer length too long"
  ACPI / scan: Set the visited flag for all enumerated devices
2017-04-14 09:05:42 -07:00
Linus Torvalds 1882e562d3 Merge tag 'devmem-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull CONFIG_STRICT_DEVMEM fix from Kees Cook:
 "Fixes /dev/mem to read back zeros for System RAM areas in the 1MB
  exception area on x86 to avoid exposing RAM or tripping hardened
  usercopy"

* tag 'devmem-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  mm: Tighten x86 /dev/mem with zeroing reads
2017-04-14 08:57:20 -07:00