1
0
Fork 0
Commit Graph

616 Commits (1204c70d9dcba31164f78ad5d8c88c42335d51f8)

Author SHA1 Message Date
Tariq Toukan 84d1bb2b13 net/mlx5e: kTLS, Limit DUMP wqe size
HW expects the data size in DUMP WQEs to be up to MTU.
Make sure they are in range.

We elevate the frag page refcount by 'n-1', in addition to the
one obtained in tx_sync_info_get(), having an overall of 'n'
references. We bulk increments by using a single page_ref_add()
command, to optimize perfermance.
The refcounts are released one by one, by the corresponding completions.

Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18 12:11:53 -07:00
Tariq Toukan 0c258dec8d net/mlx5e: Tx, Fix assumption of single WQEBB of NOP in cleanup flow
Cited patch removed the assumption only in datapath.
Here we remove it also form control/cleanup flow.

Fixes: 9ab0233728 ("net/mlx5e: Tx, Don't implicitly assume SKB-less wqe has one WQEBB")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18 12:11:51 -07:00
David S. Miller 22c63d9c94 mlx5-updates-2019-09-05
1) Allover mlx5 cleanups
 
 2) Added port congestion counters to ethtool stats:
 
 Add 3 counters per priority to ethtool using PPCNT:
   2.1) rx_prio[p]_buf_discard - the number of packets discarded by device
        due to lack of per host receive buffers
   2.2) rx_prio[p]_cong_discard - the number of packets discarded by device
        due to per host congestion
   2.3) rx_prio[p]_marked - the number of packets ECN marked by device due
        to per host congestion
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl1xgcsACgkQSD+KveBX
 +j7XSwf6A6Ri61kZ4fLRfrKcMn7rq7uzUn855UjtNQUpDlmoKSu9lY+SGtMQMENq
 7AMvUmgZJe2Sw47o3N9mxqPa86HS9uGDFhnF7HOKxa/4uzoqqvvdget4BhP0h1xS
 tRDXScnuZfPIs1nuA0w1obgzYb0FwOJhtB1m3rQ6iywAohwmM8mYe9jPREfIGaoy
 U9p+DZEJHIC2YKy4G7hymbNbaKMgMG9IYl9axNyqbGaA9xPTPO4+pBBFvEXffLh6
 UAk7p3XHV0ZIiKq1STIqDJpz6ucP88ywg9Hnkkrz2U82xuk/E43tBvIM9pxgl/bh
 UhJcoL2Wlr5aHNHgIlBgWZ07Q+J0aQ==
 =iM0A
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2019-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2019-09-05

1) Allover mlx5 cleanups

2) Added port congestion counters to ethtool stats:

Add 3 counters per priority to ethtool using PPCNT:
  2.1) rx_prio[p]_buf_discard - the number of packets discarded by device
       due to lack of per host receive buffers
  2.2) rx_prio[p]_cong_discard - the number of packets discarded by device
       due to per host congestion
  2.3) rx_prio[p]_marked - the number of packets ECN marked by device due
       to per host congestion
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 17:40:18 +02:00
David S. Miller 1e46c09ec1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add the ability to use unaligned chunks in the AF_XDP umem. By
   relaxing where the chunks can be placed, it allows to use an
   arbitrary buffer size and place whenever there is a free
   address in the umem. Helps more seamless DPDK AF_XDP driver
   integration. Support for i40e, ixgbe and mlx5e, from Kevin and
   Maxim.

2) Addition of a wakeup flag for AF_XDP tx and fill rings so the
   application can wake up the kernel for rx/tx processing which
   avoids busy-spinning of the latter, useful when app and driver
   is located on the same core. Support for i40e, ixgbe and mlx5e,
   from Magnus and Maxim.

3) bpftool fixes for printf()-like functions so compiler can actually
   enforce checks, bpftool build system improvements for custom output
   directories, and addition of 'bpftool map freeze' command, from Quentin.

4) Support attaching/detaching XDP programs from 'bpftool net' command,
   from Daniel.

5) Automatic xskmap cleanup when AF_XDP socket is released, and several
   barrier/{read,write}_once fixes in AF_XDP code, from Björn.

6) Relicense of bpf_helpers.h/bpf_endian.h for future libbpf
   inclusion as well as libbpf versioning improvements, from Andrii.

7) Several new BPF kselftests for verifier precision tracking, from Alexei.

8) Several BPF kselftest fixes wrt endianess to run on s390x, from Ilya.

9) And more BPF kselftest improvements all over the place, from Stanislav.

10) Add simple BPF map op cache for nfp driver to batch dumps, from Jakub.

11) AF_XDP socket umem mapping improvements for 32bit archs, from Ivan.

12) Add BPF-to-BPF call and BTF line info support for s390x JIT, from Yauheni.

13) Small optimization in arm64 JIT to spare 1 insns for BPF_MOD, from Jerin.

14) Fix an error check in bpf_tcp_gen_syncookie() helper, from Petar.

15) Various minor fixes and cleanups, from Nathan, Masahiro, Masanari,
    Peter, Wei, Yue.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-06 16:49:17 +02:00
Maxim Mikityanskiy 7f7edefda1 net/mlx5e: Remove unnecessary clear_bit()s
Don't clear MLX5E_SQ_STATE_ENABLED on error in mlx5e_open_txqsq and
mlx5e_open_icosq, because it's not set there, and is 0 by default.

Fixes: acc6c5953a ("net/mlx5e: Split open/close channels to stages")
Fixes: 9d18b5144a ("net/mlx5e: Split open/close ICOSQ into stages")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-05 14:44:42 -07:00
Vlad Buslov daa664a5cd net/mlx5e: Move local var definition into ifdef block
New local variable "struct flow_block_offload *f" was added to
mlx5e_setup_tc() in recent rtnl lock removal patches. The variable is used
in code that is only compiled when CONFIG_MLX5_ESWITCH is enabled. This
results compilation warning about unused variable when CONFIG_MLX5_ESWITCH
is not set. Move the variable definition into eswitch-specific code block
from the beginning of mlx5e_setup_tc() function.

Fixes: c9f14470d0 ("net: sched: add API for registering unlocked offload block callbacks")
Reported-by: tanhuazhong <tanhuazhong@huawei.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-30 15:12:05 -07:00
Marina Varshaver 25948b87dd net/mlx5e: Support TSO and TX checksum offloads for IP-in-IP
tunnels

Add TX offloads support for IP-in-IP tunneled packets by reporting
the needed netdev features.

Signed-off-by: Marina Varshaver <marinav@mellanox.com>
Signed-off-by: Avihu Hagag <avihuh@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-28 11:49:05 -07:00
Marina Varshaver e3a53bc536 net/mlx5e: Improve stateless offload capability check
Use generic function for checking tunnel stateless offload capability
instead of separate macros.

Signed-off-by: Marina Varshaver <marinav@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-28 11:49:05 -07:00
Maxim Mikityanskiy 45f171b118 net/mlx5e: Support LAG TX port affinity distribution
When the VF LAG is in use, round-robin the TX affinity of channels among
the different ports, if supported by the firmware. Create a set of TISes
per port, while doing round-robin of the channels over the different
sets. Let all SQs of a channel share the same set of TISes.

If lag_tx_port_affinity HCA cap bit is supported, num_lag_ports > 1 and
we aren't the LACP owner (PF in the regular use), assign the affinities,
otherwise use tx_affinity == 0 in TIS context to let the FW assign the
affinities itself. The TISes of the LACP owner are mapped only to the
native physical port.

For VFs, the starting port for round-robin is determined by its vhca_id,
because a VF may have only one channel if attached to a single-core VM.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-28 11:49:03 -07:00
Tariq Toukan 3c14562663 net/mlx5e: Expose new function for TIS destroy loop
For better modularity and code sharing.
Function internal change to be introduced in the next patches.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-28 11:49:03 -07:00
Vlad Buslov c9f14470d0 net: sched: add API for registering unlocked offload block callbacks
Extend struct flow_block_offload with "unlocked_driver_cb" flag to allow
registering and unregistering block hardware offload callbacks that do not
require caller to hold rtnl lock. Extend tcf_block with additional
lockeddevcnt counter that is incremented for each non-unlocked driver
callback attached to device. This counter is necessary to conditionally
obtain rtnl lock before calling hardware callbacks in following patches.

Register mlx5 tc block offload callbacks as "unlocked".

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26 14:17:43 -07:00
Eran Ben Elisha cef35af34d net/mlx5e: Add mlx5e HV VHCA stats agent
HV VHCA stats agent is responsible on running a preiodic rx/tx
packets/bytes stats update. Currently the supported format is version
MLX5_HV_VHCA_STATS_VERSION. Block ID 1 is dedicated for statistics data
transfer from the VF to the PF.

The reporter fetch the statistics data from all opened channels, fill it
in a buffer and send it to mlx5_hv_vhca_write_agent.

As the stats layer should include some metadata per block (sequence and
offset), the HV VHCA layer shall modify the buffer before actually send it
over block 1.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-22 00:25:12 -07:00
Aya Levin 8276ea1353 net/mlx5e: Report and recover from CQE with error on RQ
Add support for report and recovery from error on completion on RQ by
setting the queue back to ready state. Handle only errors with a
syndrome indicating the RQ might enter error state and could be
recovered.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20 13:08:18 -07:00
Aya Levin 32c57fb268 net/mlx5e: Report and recover from rx timeout
Add support for report and recovery from rx timeout. On driver open we
post NOP work request on the rx channels to trigger napi in order to
fillup the rx rings. In case napi wasn't scheduled due to a lost
interrupt, perform EQ recovery.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20 13:08:17 -07:00
Aya Levin be5323c837 net/mlx5e: Report and recover from CQE error on ICOSQ
Add support for report and recovery from error on completion on ICOSQ.
Deactivate RQ and flush, then deactivate ICOSQ. Set the queue back to
ready state (firmware) and reset the ICOSQ and the RQ (software
resources). Finally, activate the ICOSQ and the RQ.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20 13:08:17 -07:00
Aya Levin 9d18b5144a net/mlx5e: Split open/close ICOSQ into stages
Align ICOSQ open/close behaviour with RQ and SQ. Split open flow into
open and activate where open handles creation and activate enables the
queue. Do a symmetric thing in close flow: split into close and
deactivate.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20 13:08:17 -07:00
Aya Levin 9032e7192e net/mlx5e: Add support to rx reporter diagnose
Add rx reporter, which supports diagnose call-back. Diagnostics output
include: information common to all RQs: RQ type, RQ size, RQ stride
size, CQ size and CQ stride size. In addition advertise information per
RQ and its related icosq and attached CQ.

$ devlink health diagnose pci/0000:00:0b.0 reporter rx
 Common config:
   RQ:
     type: 2 stride size: 2048 size: 8
   CQ:
     stride size: 64 size: 1024
 RQs:
   channel ix: 0 rqn: 4308 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1
   CQ:
     cqn: 1032 HW status: 0
   channel ix: 1 rqn: 4313 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1
   CQ:
     cqn: 1036 HW status: 0
   channel ix: 2 rqn: 4318 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1
   CQ:
     cqn: 1040 HW status: 0
   channel ix: 3 rqn: 4323 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1
   CQ:
     cqn: 1044 HW status: 0

$ devlink health diagnose pci/0000:00:0b.0 reporter rx -jp
{
    "Common config": {
        "RQ": {
            "type": 2,
            "stride size": 2048,
            "size": 8
        },
        "CQ": {
            "stride size": 64,
            "size": 1024
        }
    },
    "RQs": [ {
            "channel ix": 0,
            "rqn": 4308,
            "HW state": 1,
            "SW state": 3,
            "posted WQEs": 7,
            "cc": 7,
            "ICOSQ HW state": 1,
            "CQ": {
                "cqn": 1032,
                "HW status": 0
            }
        },{
            "channel ix": 1,
            "rqn": 4313,
            "HW state": 1,
            "SW state": 3,
            "posted WQEs": 7,
            "cc": 7,
            "ICOSQ HW state": 1,
            "CQ": {
                "cqn": 1036,
                "HW status": 0
            }
        },{
            "channel ix": 2,
            "rqn": 4318,
            "HW state": 1,
            "SW state": 3,
            "posted WQEs": 7,
            "cc": 7,
            "ICOSQ HW state": 1,
            "CQ": {
                "cqn": 1040,
                "HW status": 0
            }
        },{
            "channel ix": 3,
            "rqn": 4323,
            "HW state": 1,
            "SW state": 3,
            "posted WQEs": 7,
            "cc": 7,
            "ICOSQ HW state": 1,
            "CQ": {
                "cqn": 1044,
                "HW status": 0
            }
        } ]
}

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20 13:08:17 -07:00
Aya Levin 11af6a6d09 net/mlx5e: Add helper functions for reporter's basics
Introduce helper functions for create and destroy reporters and update
channels. In the following patch, rx reporter is added and it will use
these helpers too.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20 13:08:16 -07:00
Aya Levin 06293ae4fa net/mlx5e: Change naming convention for reporter's functions
Change from mlx5e_tx_reporter_* to mlx5e_reporter_tx_*. In the following
patches in the set rx reporter is added, the new naming convention is
more uniformed.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20 13:08:15 -07:00
Aya Levin 4edc17fdfd net/mlx5e: Rename reporter header file
Rename reporter.h -> health.h so patches in the set can use it for
health related functionality.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20 13:08:15 -07:00
David S. Miller 446bf64b61 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge conflict of mlx5 resolved using instructions in merge
commit 9566e650bf.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19 11:54:03 -07:00
Magnus Karlsson 9116e5e2b1 xsk: replace ndo_xsk_async_xmit with ndo_xsk_wakeup
This commit replaces ndo_xsk_async_xmit with ndo_xsk_wakeup. This new
ndo provides the same functionality as before but with the addition of
a new flags field that is used to specifiy if Rx, Tx or both should be
woken up. The previous ndo only woke up Tx, as implied by the
name. The i40e and ixgbe drivers (which are all the supported ones)
are updated with this new interface.

This new ndo will be used by the new need_wakeup functionality of XDP
sockets that need to be able to wake up both Rx and Tx driver
processing.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17 23:07:31 +02:00
Aya Levin 276d197e70 net/mlx5e: Fix error flow of CQE recovery on tx reporter
CQE recovery function begins with test and set of recovery bit. Add an
error flow which ensures clearing of this bit when leaving the recovery
function, to allow further recoveries to take place. This allows removal
of clearing recovery bit on sq activate.

Fixes: de8650a820 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-08 13:01:20 -07:00
David S. Miller 13dfb3fa49 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Just minor overlapping changes in the conflicts here.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 18:44:57 -07:00
Qian Cai 60d60c8fbd net/mlx5e: always initialize frag->last_in_page
The commit 069d11465a ("net/mlx5e: RX, Enhance legacy Receive Queue
memory scheme") introduced an undefined behaviour below due to
"frag->last_in_page" is only initialized in mlx5e_init_frags_partition()
when,

if (next_frag.offset + frag_info[f].frag_stride > PAGE_SIZE)

or after bailed out the loop,

for (i = 0; i < mlx5_wq_cyc_get_size(&rq->wqe.wq); i++)

As the result, there could be some "frag" have uninitialized
value of "last_in_page".

Later, get_frag() obtains those "frag" and check "frag->last_in_page" in
mlx5e_put_rx_frag() and triggers the error during boot. Fix it by always
initializing "frag->last_in_page" to "false" in
mlx5e_init_frags_partition().

UBSAN: Undefined behaviour in
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:325:12
load of value 170 is not a valid value for type 'bool' (aka '_Bool')
Call trace:
 dump_backtrace+0x0/0x264
 show_stack+0x20/0x2c
 dump_stack+0xb0/0x104
 __ubsan_handle_load_invalid_value+0x104/0x128
 mlx5e_handle_rx_cqe+0x8e8/0x12cc [mlx5_core]
 mlx5e_poll_rx_cq+0xca8/0x1a94 [mlx5_core]
 mlx5e_napi_poll+0x17c/0xa30 [mlx5_core]
 net_rx_action+0x248/0x940
 __do_softirq+0x350/0x7b8
 irq_exit+0x200/0x26c
 __handle_domain_irq+0xc8/0x128
 gic_handle_irq+0x138/0x228
 el1_irq+0xb8/0x140
 arch_cpu_idle+0x1a4/0x348
 do_idle+0x114/0x1b0
 cpu_startup_entry+0x24/0x28
 rest_init+0x1ac/0x1dc
 arch_call_rest_init+0x10/0x18
 start_kernel+0x4d4/0x57c

Fixes: 069d11465a ("net/mlx5e: RX, Enhance legacy Receive Queue memory scheme")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-05 11:13:05 -07:00
Aya Levin baf6dfdb10 net/mlx5e: Set tx reporter only on successful creation
When failing to create tx reporter, don't set the reporter's pointer.
Creating a reporter is not mandatory for driver load, avoid
garbage/error pointer.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01 12:33:32 -07:00
Tariq Toukan b431302e92 net/mlx5e: Tx, Soften inline mode VLAN dependencies
If capable, use zero inline mode in TX WQE for non-VLAN packets.
For VLAN ones, keep the enforcement of at least L2 inline mode,
unless the WQE VLAN insertion offload cap is on.

Performance:
Tested single core packet rate of 64Bytes.

NIC: ConnectX-5
CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz

pktgen:
Before: 12.46 Mpps
After:  14.65 Mpps (+17.5%)

XDP_TX:
The MPWQE flow is not affected, as it already has this optimization.
So we test with priv-flag xdp_tx_mpwqe: off.

Before:  9.90 Mpps
After:  10.20 Mpps (+3%)

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Tested-by: Noam Stolero <noams@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01 12:33:31 -07:00
Vlad Buslov 226f2ca307 net/mlx5e: Change flow flags type to unsigned long
To remove dependency on rtnl lock and allow concurrent modification of
'flags' field of tc flow structure, change flow flag type to unsigned long
and use atomic bit ops for reading and changing the flags. Implement
auxiliary functions for setting, resetting and getting specific flag, and
for checking most often used flag values.

Always set flags with smp_mb__before_atomic() to ensure that all
mlx5e_tc_flow are updated before concurrent readers can read new flags
value. Rearrange all code paths to actually set flow->rule[] pointers
before setting the OFFLOADED flag. On read side, use smp_mb__after_atomic()
when accessing flags to ensure that offload-related flow fields are only
read after the flags.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-29 16:40:24 -07:00
Saeed Mahameed 4240196776 net/mlx5e: Avoid warning print when not required
When disabling CQE compression in favor of time-stamping, don't show a
warning when CQE compression is already disabled.

Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-29 16:40:23 -07:00
Huy Nguyen 842a2eb28f net/mlx5e: Print a warning when LRO feature is dropped or not allowed
When user enables LRO via ethtool and if the RQ mode is legacy,
mlx5e_fix_features drops the request without any explanation.
Add netdev_warn to cover this case.

Fixes: 6c3a823e1e ("net/mlx5e: RX, Remove HW LRO support in legacy RQ")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-29 16:40:23 -07:00
Tariq Toukan 694826e366 net/mlx5e: Fix wrong max num channels indication
No XSK support in the enhanced IPoIB driver and representors.
Add a profile property to specify this, and enhance the logic
that calculates the max number of channels to take it into
account.

Fixes: db05815b36 ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-25 13:30:59 -07:00
David S. Miller 114a5c3240 mlx5-fixes-2019-07-11
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl0ng7AACgkQSD+KveBX
 +j41FQgAjcUgjCiq6gmwI62EsMpZjgjwKhGfYz56jwSmNDgG7Ltj9o/K/+b+oL3u
 ZVPO9YgNstWuoPyApE12MzsA+kx/ydRz8GW6IrI5v71JK3yy8Mw/HrYVyUKEnzH4
 BVnSRnvHgvCnjZ3SjmUb6Xs74vmoxti9cYr0w0XtBogzuXlfpSYILYFiqrkks6sH
 VEmD9P3dTBNQUviYwkRcHhJoI7NqdQBApn9/30F4dQsmnUsikoaom5GovrbBw817
 L3Q+U/9+2Xp5k1i7J2MLFSL6Vyaxs/48eRKxs3YubBmrtwBjPj+aOYdRdDOscguQ
 WcDRiYPOW72cGIlFtYyuzApqo+J+xw==
 =mq0x
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2019-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2019-07-11

This series introduces some fixes to mlx5 driver.

Please pull and let me know if there is any problem.

For -stable v4.15
('net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn')

For -stable v5.1
('net/mlx5e: Fix port tunnel GRE entropy control')
('net/mlx5e: Rx, Fix checksum calculation for new hardware')
('net/mlx5e: Fix return value from timeout recover function')
('net/mlx5e: Fix error flow in tx reporter diagnose')

For -stable v5.2
('net/mlx5: E-Switch, Fix default encap mode')

Conflict note: This pull request will produce a small conflict when
merged with net-next.
In drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
Take the hunk from net and replace:
esw_offloads_steering_init(esw, vf_nvports, total_nvports);
with:
esw_offloads_steering_init(esw);
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-11 15:06:37 -07:00
Saeed Mahameed 2f1f5a7731 net/mlx5e: Fix unused variable warning when CONFIG_MLX5_ESWITCH is off
In mlx5e_setup_tc "priv" variable is not being used if
CONFIG_MLX5_ESWITCH is off, one way to fix this is to actually use it.

mlx5e_setup_tc_mqprio also needs the "priv" variable and it extracts it
on its own. We can simply pass priv to mlx5e_setup_tc_mqprio instead of
netdev and avoid extracting the priv var, which will also resolve the
compiler warning.

Fixes: 4e95bc268b ("net: flow_offload: add flow_block_cb_setup_simple()")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
CC: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-11 15:04:38 -07:00
Saeed Mahameed db849faa9b net/mlx5e: Rx, Fix checksum calculation for new hardware
CQE checksum full mode in new HW, provides a full checksum of rx frame.
Covering bytes starting from eth protocol up to last byte in the received
frame (frame_size - ETH_HLEN), as expected by the stack.

Fixing up skb->csum by the driver is not required in such case. This fix
is to avoid wrong checksum calculation in drivers which already support
the new hardware with the new checksum mode.

Fixes: 85327a9c41 ("net/mlx5: Update the list of the PCI supported devices")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-11 11:45:03 -07:00
Pablo Neira Ayuso f9e30088d2 net: flow_offload: rename tc_cls_flower_offload to flow_cls_offload
And any other existing fields in this structure that refer to tc.
Specifically:

* tc_cls_flower_offload_flow_rule() to flow_cls_offload_flow_rule().
* TC_CLSFLOWER_* to FLOW_CLS_*.
* tc_cls_common_offload to tc_cls_common_offload.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:51 -07:00
Pablo Neira Ayuso 955bcb6ea0 drivers: net: use flow block API
This patch updates flow_block_cb_setup_simple() to use the flow block API.
Several drivers are also adjusted to use it.

This patch introduces the per-driver list of flow blocks to account for
blocks that are already in use.

Remove tc_block_offload alias.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:50 -07:00
Pablo Neira Ayuso 4e95bc268b net: flow_offload: add flow_block_cb_setup_simple()
Most drivers do the same thing to set up the flow block callbacks, this
patch adds a helper function to do this.

This preparation patch reduces the number of changes to adapt the
existing drivers to use the flow block callback API.

This new helper function takes a flow block list per-driver, which is
set to NULL until this driver list is used.

This patch also introduces the flow_block_command and
flow_block_binder_type enumerations, which are renamed to use
FLOW_BLOCK_* in follow up patches.

There are three definitions (aliases) in order to reduce the number of
updates in this patch, which go away once drivers are fully adapted to
use this flow block API.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09 14:38:50 -07:00
Ivan Khoronzhuk 1da4bbeffe net: core: page_pool: add user refcnt and reintroduce page_pool_destroy
Jesper recently removed page_pool_destroy() (from driver invocation)
and moved shutdown and free of page_pool into xdp_rxq_info_unreg(),
in-order to handle in-flight packets/pages. This created an asymmetry
in drivers create/destroy pairs.

This patch reintroduce page_pool_destroy and add page_pool user
refcnt. This serves the purpose to simplify drivers error handling as
driver now drivers always calls page_pool_destroy() and don't need to
track if xdp_rxq_info_reg_mem_model() was unsuccessful.

This could be used for a special cases where a single RX-queue (with a
single page_pool) provides packets for two net_device'es, and thus
needs to register the same page_pool twice with two xdp_rxq_info
structures.

This patch is primarily to ease API usage for drivers. The recently
merged netsec driver, actually have a bug in this area, which is
solved by this API change.

This patch is a modified version of Ivan Khoronzhuk's original patch.

Link: https://lore.kernel.org/netdev/20190625175948.24771-2-ivan.khoronzhuk@linaro.org/
Fixes: 5c67bf0ec4 ("net: netsec: Use page_pool API")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 14:58:04 -07:00
Tariq Toukan d2ead1f360 net/mlx5e: Add kTLS TX HW offload support
Add support for transmit side kernel-TLS acceleration.
Offload the crypto encryption to HW.

Per TLS connection:
- Use a separate TIS to maintain the HW context.
- Use a separate encryption key.
- Maintain static and progress HW contexts by posting the proper
  WQEs at creation time, or upon resync.
- Use a special DUMP opcode to replay the previous frags and sync
  the HW context.

To make sure the SQ is able to serve an xmit request, increase
SQ stop room to cover:
- static params WQE,
- progress params WQE, and
- resync DUMP per frag.

Currently supporting TLS 1.2, and key size 128bit.

Tested over SimX simulator.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-05 16:29:20 -07:00
Tariq Toukan 2b257a6e73 net/mlx5e: Re-work TIS creation functions
Let the EN TIS creation function (mlx5e_create_tis) be responsible
for applying common mdev related fields.
Other specific fields must be set by the caller and passed within
the inbox.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-05 16:29:19 -07:00
Tariq Toukan 01614d4f60 net/mlx5e: Tx, Unconstify SQ stop room
Use an SQ field for stop_room, and use the larger value only if TLS
is supported.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-05 16:29:19 -07:00
Tariq Toukan 542578c679 net/mlx5e: Move helper functions to a new txrx datapath header
Take datapath helper functions to a new header file en/txrx.h.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-05 16:29:19 -07:00
David S. Miller 61c2491db7 mlx5-update-2019-07-04
This series adds mlx5 support for devlink fw versions query.
 
 1) Implement the required low level firmware commands
 2) Implement the devlink knobs and callbacks for fw versions query.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl0eZOQACgkQSD+KveBX
 +j6a0gf/SbSY2yDz6iyDZy7Zt0e7onD395UyWIoXbXJAI/aMggvdNE233dh369xa
 d4Zo8zvFijiFpPKmP0p+WZfvR5YPPWgtOYDv1MbaTz5lqsGYUh2kN9FaTyCNZW3O
 jeNidxjsne70VBrpmOZJX2mxduKwRdxchRg2iQPdeZQGO5Uq3nEdOfJVx/mhsIGw
 iRczF4IAQpArZEyKXgqMSrtJ99FJ7JSJXnsqjVJyUC1zyCZ+Cq3VV55+u5VCBokd
 qr/VSYqFd38SEHR5l4nXDXLX23pOwWNhUdPgGopjak8bN3k19vk6GfQ2bzLFnaOW
 4fx1k4uTixKOrbLkgzay/YSsoF2TDw==
 =Bwl7
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2019-07-04-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-update-2019-07-04

This series adds mlx5 support for devlink fw versions query.

1) Implement the required low level firmware commands
2) Implement the devlink knobs and callbacks for fw versions query.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-05 16:24:27 -07:00
Saeed Mahameed e08a976a16 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Misc updates from mlx5-next branch:

1) Add the required HW definitions and structures for upcoming TLS
   support.
2) Add support for MCQI and MCQS hardware registers for fw version query.
3) Added hardware bits and structures definitions for sub-functions
4) Small code cleanup and improvement for PF pci driver.
5) Bluefield (ECPF) updates and refactoring for better E-Switch
   management on ECPF embedded CPU NIC:
   5.1) Consolidate querying eswitch number of VFs
   5.2) Register event handler at the correct E-Switch init stage
   5.3) Setup PF's inline mode and vlan pop when the ECPF is the
        E-Swtich manager ( the host PF is basically a VF ).
   5.4) Handle Vport UC address changes in switchdev mode.

6) Cleanup the rep and netdev reference when unloading IB rep.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

i# All conflicts fixed but you are still merging.
2019-07-04 16:42:59 -04:00
David S. Miller c4cde5804d Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2019-07-03

The following pull-request contains BPF updates for your *net-next* tree.

There is a minor merge conflict in mlx5 due to 8960b38932 ("linux/dim:
Rename externally used net_dim members") which has been pulled into your
tree in the meantime, but resolution seems not that bad ... getting current
bpf-next out now before there's coming more on mlx5. ;) I'm Cc'ing Saeed
just so he's aware of the resolution below:

** First conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:

  <<<<<<< HEAD
  static int mlx5e_open_cq(struct mlx5e_channel *c,
                           struct dim_cq_moder moder,
                           struct mlx5e_cq_param *param,
                           struct mlx5e_cq *cq)
  =======
  int mlx5e_open_cq(struct mlx5e_channel *c, struct net_dim_cq_moder moder,
                    struct mlx5e_cq_param *param, struct mlx5e_cq *cq)
  >>>>>>> e5a3e259ef

Resolution is to take the second chunk and rename net_dim_cq_moder into
dim_cq_moder. Also the signature for mlx5e_open_cq() in ...

  drivers/net/ethernet/mellanox/mlx5/core/en.h +977

... and in mlx5e_open_xsk() ...

  drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +64

... needs the same rename from net_dim_cq_moder into dim_cq_moder.

** Second conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:

  <<<<<<< HEAD
          int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
          struct dim_cq_moder icocq_moder = {0, 0};
          struct net_device *netdev = priv->netdev;
          struct mlx5e_channel *c;
          unsigned int irq;
  =======
          struct net_dim_cq_moder icocq_moder = {0, 0};
  >>>>>>> e5a3e259ef

Take the second chunk and rename net_dim_cq_moder into dim_cq_moder
as well.

Let me know if you run into any issues. Anyway, the main changes are:

1) Long-awaited AF_XDP support for mlx5e driver, from Maxim.

2) Addition of two new per-cgroup BPF hooks for getsockopt and
   setsockopt along with a new sockopt program type which allows more
   fine-grained pass/reject settings for containers. Also add a sock_ops
   callback that can be selectively enabled on a per-socket basis and is
   executed for every RTT to help tracking TCP statistics, both features
   from Stanislav.

3) Follow-up fix from loops in precision tracking which was not propagating
   precision marks and as a result verifier assumed that some branches were
   not taken and therefore wrongly removed as dead code, from Alexei.

4) Fix BPF cgroup release synchronization race which could lead to a
   double-free if a leaf's cgroup_bpf object is released and a new BPF
   program is attached to the one of ancestor cgroups in parallel, from Roman.

5) Support for bulking XDP_TX on veth devices which improves performance
   in some cases by around 9%, from Toshiaki.

6) Allow for lookups into BPF devmap and improve feedback when calling into
   bpf_redirect_map() as lookup is now performed right away in the helper
   itself, from Toke.

7) Add support for fq's Earliest Departure Time to the Host Bandwidth
   Manager (HBM) sample BPF program, from Lawrence.

8) Various cleanups and minor fixes all over the place from many others.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-04 12:48:21 -07:00
Yishai Hadas 38164b7719 net/mlx5: mlx5_core_create_cq() enhancements
Enhance mlx5_core_create_cq() to get the command out buffer from the
callers to let them use the output.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 20:59:32 +03:00
Bodong Wang f6455de0b0 net/mlx5: E-Switch, Refactor eswitch SR-IOV interface
Devlink eswitch mode is not necessarily related to SR-IOV, e.g, ECPF
can be at offload mode when SR-IOV is not enabled.

Rename the interface and eswitch mode names to decouple from SR-IOV,
and cleanup eswitch messages accordingly.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Bodong Wang e1d974d03e net/mlx5: Handle host PF vport mac/guid for ECPF
When ECPF is eswitch manager, it has the privilege to query and
configure the mac and node guid of host PF.

While vport number of host PF is 0, the vport command should be
issued with other_vport set in this case as the cmd is issued by
ECPF vport(0xfffe).

Add a specific function to query own vport mac. Low level functions
are used by vport manager to query/modify any vport mac and node guid.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Gavi Teitz a90f88fe55 net/mlx5e: Don't refresh TIRs when updating representor SQs
Refreshing TIRs is done in order to update the TIRs with the current
state of SQs in the transport domain, so that the TIRs can filter out
undesired self-loopback packets based on the source SQ of the packet.

Representor TIRs will only receive packets that originate from their
associated vport, due to dedicated steering, and therefore will never
receive self-loopback packets, whose source vport will be the vport of
the E-Switch manager, and therefore not the vport associated with the
representor. As such, it is not necessary to refresh the representors'
TIRs, since self-loopback packets can't reach them.

Since representors only exist in switchdev mode, and there is no
scenario in which a representor will exist in the transport domain
alongside a non-representor, it is not necessary to refresh the
transport domain's TIRs upon changing the state of a representor's
queues. Therefore, do not refresh TIRs upon such a change. Achieve
this by adding an update_rx callback to the mlx5e_profile, which
refreshes TIRs for non-representors and does nothing for representors,
and replace instances of mlx5e_refresh_tirs() upon changing the state
of the queues with update_rx().

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-28 16:04:00 -07:00
Ariel Levkovich 5dc9520bf0 net/mlx5e: Report netdevice MPLS features
Set supported device features in the netdevice MPLS features mask.
This will enable HW checksumming and TSO for MPLS tagged traffic.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-28 16:03:59 -07:00