Commit graph

619479 commits

Author SHA1 Message Date
David S. Miller 1678c1134f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2016-09-23

Only two patches this time:

1) Fix a comment reference to struct xfrm_replay_state_esn.
   From Richard Guy Briggs.

2) Convert xfrm_state_lookup to rcu, we don't need the
   xfrm_state_lock anymore in the input path.
   From Florian Westphal.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-24 08:18:19 -04:00
David S. Miller 834d964980 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2016-09-22

This series contains updates to i40e and i40evf only.

Sridhar fixes link state event handling by updating the carrier and
starts/stops the Tx queues based on the link state notification from PF.

Brady fixes an issue where a user defined RSS hash key was not being
set because a user defined indirection table is not supplied when changing
the hash key, so if an indirection table is not supplied now, then a
default one is created and the hash key is correctly set.  Also fixed
an issue where when NPAR was enabled, we were still using pf->mac_seid
to perform the dump port query. Instead, go through the VSI to determine
the correct ID to use in either case.

Mitch provides one fix where a conditional return code was reversed, so
he does a "switheroo" to fix the issue.

Carolyn has two fixes, first fixes an issue in the virt channel code,
where a return code was not checked for NULL when applicable.  Second,
fixes an issue where we were byte swapping the port parameter, then
byte swapping it again in function execution.

Colin Ian King fixes a potential NULL pointer dereference.

Bimmy changes up i40evf_up_complete() to be void since it always returns
success anyways, which allows cleaning up of code which checked the
return code from this function.

Alex fixed an issue where the driver was incorrectly assuming that we
would always be pulling no more than 1 descriptor from each fragment.
So to correct this, we just need to make certain to test all the way to
the end of the fragments as it is possible for us to span 2 descriptors
in the block before us so we need to guarantee that even the last 6
descriptors have enough data to fill a full frame.

v2: dropped patches 1-3, 10 and 12 from the original series since Or
    Gerlitz pointed out several areas of improvement in the implementation
    of the VF Port representor netdev.  Sridhar is re-working the series
    for later submission.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-24 08:14:57 -04:00
David S. Miller 1ad0751d42 Merge branch 'mlx4-vf-vlan-802.1ad'
Tariq Toukan says:

====================
mlx4 VF vlan protocol 802.1ad support

This patchset adds VF VLAN protocol 802.1ad support to the
mlx4 driver.
We extended the VF VLAN API with an additional parameter
for VLAN protocol, and kept 802.1Q as drivers' default.

We prepared a userspace support (ip link tool).
The patch will be submitted to the iproute2 mailing list.

The ip link tool VF VLAN protocol parameter is optional (default: 802.1Q).
A configuration command of VF VLAN that is used prior to this patchset
will result in same functionality as today's (VST with VLAN protocol 802.1Q).

The series generated against net-next commit:
688dc5369a "Merge branch 'mlx4-next'"

All maintainers of the modified modules are in cc.

v3:
  Expand the UAPI to a nested list to support future use-cases.
  Use a more formal feature name.

v2:
  Drop patch 4 ("net/mlx4_core: Add an option to configure SVLAN TPID").
  Patch 1/5: Update commit log.
  2-3/5: Split patch 2 into two patches, to separate between changes
         done in mlx4_core and the ones done in mlx4_en.
  4-5/5: Split patch 3 into two patches, to separate between the
         addition of a protocol parameter and the actual implementation
	 in mlx4_en.
	 In addition, we implement a handshake mechanism so PF and VF
	 exchange their VST QinQ support capability.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-24 08:02:22 -04:00
Moshe Shemesh b42959dc35 net/mlx4: Add VF vlan protocol 802.1ad support
Move the vf to VST 802.1ad mode (mlx4 VST QinQ mode) by setting vf vlan
protocol to 802.1ad.
VST 802.1ad mode in mlx4, is used for STAG strip/insertion by PF, while
the CTAG is set by the VF.
Read current vlan protocol as part of the vf configuration state.

Upon setting vf vlan protocol to 802.1ad, we use a mechanism of handshake
to verify that both the vf and the pf driver version support it.
The handshake uses the command QUERY_FUNC_CAP:
- The vf sets a pre-defined support bit in input modifier.
- A pf that supports the feature sends the request to the vf through a
  pre-defined field in the output mailbox.
- In case vf does not support the feature, the pf will fail the control
  command (in this case, IP link tool command to set the vf vlan
  protocol to 802.1ad).

No change in VST 802.1Q mode.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-24 08:01:27 -04:00
Moshe Shemesh 79aab093a0 net: Update API for VF vlan protocol 802.1ad support
Introduce new rtnl UAPI that exposes a list of vlans per VF, giving
the ability for user-space application to specify it for the VF, as an
option to support 802.1ad.
We adjusted IP Link tool to support this option.

For future use cases, the new UAPI supports multiple vlans. For now we
limit the list size to a single vlan in kernel.
Add IFLA_VF_VLAN_LIST in addition to IFLA_VF_VLAN to keep backward
compatibility with older versions of IP Link tool.

Add a vlan protocol parameter to the ndo_set_vf_vlan callback.
We kept 802.1Q as the drivers' default vlan protocol.
Suitable ip link tool command examples:
  Set vf vlan protocol 802.1ad:
    ip link set eth0 vf 1 vlan 100 proto 802.1ad
  Set vf to VST (802.1Q) mode:
    ip link set eth0 vf 1 vlan 100 proto 802.1Q
  Or by omitting the new parameter
    ip link set eth0 vf 1 vlan 100

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-24 08:01:26 -04:00
Moshe Shemesh 0815fe3a86 net/mlx4_en: Disable vlan HW acceleration when in VF vlan protocol 802.1ad mode
In Ethernet VF, disable vlan HW acceleration on VF
while it is set to VF vlan protocol 802.1ad mode.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-24 08:01:26 -04:00
Moshe Shemesh 7c3d21c815 net/mlx4_core: Preparation for VF vlan protocol 802.1ad
Check device capability to support VF vlan protocol 802.1ad mode.
Add vport attribute vlan protocol.
Init vport vlan protocol by default to 802.1Q.
Add update QP support for VF vlan protocol 802.1ad.
Add func capability vlan_offload_disable to disable all
vlan HW acceleration on VF while the VF is set to VF vlan protocol
802.1ad mode.
No change in VF vlan protocol 802.1Q (VST) mode.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-24 08:01:26 -04:00
Moshe Shemesh c9cc599a96 net/mlx4_core: Fix QUERY FUNC CAP flags
Separate QUERY_FUNC_CAP flags0 from QUERY_FUNC_CAP flags, as 'flags' is
already used for another set of flags in FUNC CAP, while phv bit should be
part of a different set of flags.
Remove QUERY_FUNC_CAP port_flags field, as it is not in use.

Fixes: 77fc29c4bb ('net/mlx4_core: Preparations for 802.1ad VLAN support')
Fixes: 5cc914f108 ('mlx4_core: Added FW commands and their wrappers for supporting SRIOV')
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-24 08:01:26 -04:00
David S. Miller fb2a3d5c7c Revert "xen-netback: create a debugfs node for hash information"
This reverts commit c0c64c1523.

There is no vif->ctrl_task member, so this change broke
the build.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 09:09:31 -04:00
David S. Miller 7a048d5adc Merge branch 'bpf-helper-improvements'
Daniel Borkmann says:

====================
Few minor BPF helper improvements

Just a few minor improvements around BPF helpers, first one is a
fix but given this late stage and that it's not really a critical
one, I think net-next is just fine. For details please see the
individual patches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:40:36 -04:00
Daniel Borkmann 7a4b28c6cc bpf: add helper to invalidate hash
Add a small helper that complements 36bbef52c7 ("bpf: direct packet
write and access for helpers for clsact progs") for invalidating the
current skb->hash after mangling on headers via direct packet write.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:40:28 -04:00
Daniel Borkmann 669dc4d76d bpf: use bpf_get_smp_processor_id_proto instead of raw one
Same motivation as in commit 80b48c4457 ("bpf: don't use raw processor
id in generic helper"), but this time for XDP typed programs. Thus, allow
for preemption checks when we have DEBUG_PREEMPT enabled, and otherwise
use the raw variant.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:40:28 -04:00
Daniel Borkmann 2d48c5f933 bpf: use skb_to_full_sk helper in bpf_skb_under_cgroup
We need to use skb_to_full_sk() helper introduced in commit bd5eb35f16
("xfrm: take care of request sockets") as otherwise we miss tcp synack
messages, since ownership is on request socket and therefore it would
miss the sk_fullsock() check. Use skb_to_full_sk() as also done similarly
in the bpf_get_cgroup_classid() helper via 2309236c13 ("cls_cgroup:
get sk_classid only from full sockets") fix to not let this fall through.

Fixes: 4a482f34af ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:40:27 -04:00
David S. Miller c14fec3969 Merge branch 'hv_netvsc-next'
Stephen Hemminger says:

====================
hv_netvsc changes

These are mostly about improving the handling of interaction between
the virtual network device (netvsc) and the SR-IOV VF network device.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:39:54 -04:00
Stephen Hemminger f7ad75b753 hv_netvsc: count multicast packets received
Useful for debugging issues with multicast and SR-IOV to keep track
of number of received multicast packets.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:39:49 -04:00
Stephen Hemminger 9cbcc42806 hv_netvsc: remove VF in flight counters
Since VF reference is now protected by RCU, no longer need the VF usage
counter and can use device flags to see whether to inject or not.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:39:49 -04:00
Stephen Hemminger f207c10d98 hv_netvsc: use RCU to protect vf_netdev
The vf_netdev pointer in the netvsc device context can simply be protected
by RCU because network device destruction is already RCU synchronized.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:39:49 -04:00
Stephen Hemminger e8ff40d4bf hv_netvsc: improve VF device matching
The code to associate netvsc and VF devices can be made less error prone
by using a better matching algorithms.

On registration, use the permanent address which avoids any possible
issues caused by device MAC address being changed. For all other callbacks,
search by the netdevice pointer value to ensure getting the correct
network device.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:39:49 -04:00
Stephen Hemminger ee837a1373 hv_netvsc: simplify callback event code
The callback handler for netlink events can be simplified:
 * Consolidate check for netlink callback events about this driver itself.
 * Ignore non-Ethernet devices.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:39:49 -04:00
Stephen Hemminger 07d0f0008c hv_netvsc: dev hold/put reference to VF
The netvsc driver holds a pointer to the virtual function network device if
managing SR-IOV association. In order to ensure that the VF network device
does not disappear, it should be using dev_hold/dev_put to get a reference
count.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:39:48 -04:00
Stephen Hemminger 17db4bcef3 hv_netvsc: use consume_skb
Packets that are transmitted in normal path should use consume_skb
instead of kfree_skb. This allows for better tracing of packet drops.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:39:48 -04:00
David S. Miller dd5a3005eb Merge branch 'dsa-port-fast-ageing'
Vivien Didelot says:

====================
net: dsa: add port fast ageing

Today the DSA drivers are in charge of flushing the MAC addresses
associated to a port when its STP state changes from Learning or
Forwarding, to Disabled or Blocking or Listening.

This makes the drivers more complex and hides this generic switch logic.

This patchset introduces a new optional port_fast_age operation to
dsa_switch_ops, to move this logic to the DSA layer and keep drivers
simple. b53 and mv88e6xxx are updated accordingly.
====================

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:38:59 -04:00
Vivien Didelot 749efcb814 net: dsa: mv88e6xxx: implement DSA port fast ageing
Now that the DSA layer handles port fast ageing on correct STP change,
simplify _mv88e6xxx_port_state and implement mv88e6xxx_port_fast_age.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:38:50 -04:00
Vivien Didelot 597698f1e0 net: dsa: b53: implement DSA port fast ageing
Remove the fast ageing logic from b53_br_set_stp_state and implement the
new DSA switch port_fast_age operation instead.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:38:50 -04:00
Vivien Didelot 732f794c1b net: dsa: add port fast ageing
Today the DSA drivers are in charge of flushing the MAC addresses
associated to a port when its STP state changes from Learning or
Forwarding, to Disabled or Blocking or Listening.

This makes the drivers more complex and hides the generic switch logic.
Introduce a new optional port_fast_age operation to dsa_switch_ops, to
move this logic to the DSA layer and keep drivers simple.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:38:50 -04:00
Vivien Didelot 4acfee8143 net: dsa: add port STP state helper
Add a void helper to set the STP state of a port, checking first if the
required routine is provided by the driver.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:38:50 -04:00
Iyappan Subramanian e3978673f5 drivers: net: xgene: Fix MSS programming
Current driver programs static value of MSS in hardware register for TSO
offload engine to segment the TCP payload regardless the MSS value
provided by network stack.

This patch fixes this by programming hardware registers with the
stack provided MSS value.

Since the hardware has the limitation of having only 4 MSS registers,
this patch uses reference count of mss values being used.

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:38:38 -04:00
Colin Ian King e12934d980 cxgb4: fix signed wrap around when decrementing index idx
Change predecrement compare to post decrement compare to avoid an
unsigned integer wrap-around comparison when decrementing idx in
the while loop.

For example, when idx is zero, the current situation will
predecrement idx in the while loop, wrapping idx to the maximum
signed integer and cause out of bounds reads on rxq_info->msix_tbl[idx].

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:25:16 -04:00
David S. Miller c7b9e63341 Merge branch 'mlx5-sriov-vlan-push-pop'
Saeed Mahameed says:

====================
Mellanox 100G SRIOV offloads vlan push/pop

From Or Gerlitz:

This series further enhances the SRIOV TC offloads of mlx5 to handle
the TC vlan push and pop actions. This serves a common use-case in
virtualization systems where the virtual switch add (push) vlan tags
to packets sent from VMs and removes (pop) vlan tags before the packet
is received by the VM. We use the new E-Switch switchdev mode and the
TC vlan action to achieve that also in SW defined SRIOV environments by
offloading TC rules that contain this action along with forwarding
(TC mirred/redirect action) the packet.

In the first patch we add some helpers to access the TC vlan action info
by offloading drivers. The next five patches don't add any new functionality,
they do some refactoring and cleanups in the current code to be used next.

The seventh patch deals with supporting vlans by the mlx5 e-switch in switchdev
mode. The eighth patch does the vlan action offload from TC and the last patch
adds matching for vlans as typically required by TC flows that involve vlan
pop action.

The series was applied on top of commit 524605e "cxgb4: Convert to use simple_open()"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:19 -04:00
Or Gerlitz 095b6cfd69 net/mlx5e: Add TC vlan match parsing
Enhance the parsing of offloaded TC rules matches to handle vlans.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:12 -04:00
Or Gerlitz 8b32580df1 net/mlx5e: Add TC vlan action for SRIOV offloads
Parse TC vlan actions and set the required elements to allow offloading.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:12 -04:00
Or Gerlitz f5f8247609 net/mlx5: E-Switch, Support VLAN actions in the offloads mode
Many virtualization systems use a policy under which a vlan tag is
pushed to packets sent by guests, and popped before the packet is
forwarded to the VM.

The current generation of the mlx5 HW doesn't fully support that on
a per flow level. As such, we are addressing the above common use
case with the SRIOV e-Switch abilities to push vlan into packets
sent by VFs and pop vlan from packets forwarded to VFs.

The HW can match on the correct vlan being present in packets
forwarded to VFs (eSwitch steering is done before stripping
the tag), so this part is offloaded as is.

A common practice for vlans is to avoid both push vlan and pop vlan
for inter-host VM/VM (east-west) communication because in this case,
push on egress cancels out with pop on ingress.

For supporting that, we use a global eswitch vlan pop policy, hence
allowing guest A to communicate with both remote VM B and local VM C.
This works since the HW pops the vlan only if it exists (e.g for
C --> A packets but not for B --> A packets).

On the slow path, when a VF vport has an offloaded flow which involves
pushing vlans, wheres another flow is not currently offloaded, the
packets from the 2nd flow seen by the VF representor on the host have
vlan. The VF rep driver removes such vlan before calling into the host
networking stack.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:12 -04:00
Or Gerlitz 8515c581df net/mlx5e: Refactor retrival of skb from rx completion element (cqe)
Factor the relevant code into a static inline helper (skb_from_cqe)
doing that.

Move the call to napi_gro_receive to be carried out just
after mlx5e_complete_rx_cqe returns.

Both changes are to be used for the VF representor as well
in the next commit.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:12 -04:00
Or Gerlitz 776b12b674 net/mlx5: Put elements related to offloaded TC rule in one struct
Put the representors related to the source and dest vports and the
action in struct mlx5_esw_flow_attr which is used while setting the FDB rule.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:12 -04:00
Or Gerlitz e33dfe316c net/mlx5: E-Switch, Allow fine tuning of eswitch vport push/pop vlan
The HW can be programmed to push vlan, pop vlan or both.

A factorization step towards using the push/pop capabilties in the
eswitch offloads mode. This patch doesn't add new functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:12 -04:00
Or Gerlitz bac9b6aa1d net/mlx5: E-Switch, Set vport representor fields explicitly on registration
The structure we use for the eswitch vport representor (mlx5_eswitch_rep)
has some fields which are set from upper layers in the driver when they
register the rep. Use explicit setting on registration time for them and
avoid global memcpy. This patch doesn't add new functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:12 -04:00
Or Gerlitz 9deb2241f1 net/mlx5: E-Switch, Set the vport when registering the uplink rep
Set the vport value in the PF entry to be that of the uplink so
we can use it blindly over the tc / eswitch offload code without
translating it each time we deal with the uplink representor.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:12 -04:00
Or Gerlitz 53e89941ba net_sched: act_vlan: add helper inlines to access tcf_vlan info
Needed e.g for offloading drivers to pick the relevant attributes.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:11 -04:00
Eric Dumazet fefa569a9d net_sched: sch_fq: account for schedule/timers drifts
It looks like the following patch can make FQ very precise, even in VM
or stressed hosts. It matters at high pacing rates.

We take into account the difference between the time that was programmed
when last packet was sent, and current time (a drift of tens of usecs is
often observed)

Add an EWMA of the unthrottle latency to help diagnostics.

This latency is the difference between current time and oldest packet in
delayed RB-tree. This accounts for the high resolution timer latency,
but can be different under stress, as fq_check_throttled() can be
opportunistically be called from a dequeue() called after an enqueue()
for a different flow.

Tested:
// Start a 10Gbit flow
$ netperf --google-pacing-rate 1250000000 -H lpaa24 -l 10000 -- -K bbr &

Before patch :
$ sar -n DEV 10 5 | grep eth0 | grep Average
Average:         eth0  17106.04 756876.84   1102.75 1119049.02      0.00      0.00      0.52

After patch :
$ sar -n DEV 10 5 | grep eth0 | grep Average
Average:         eth0  17867.00 800245.90   1151.77 1183172.12      0.00      0.00      0.52

A new iproute2 tc can output the 'unthrottle latency' :

$ tc -s qd sh dev eth0 | grep latency
  0 gc, 0 highprio, 32490767 throttled, 2382 ns latency

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:19:06 -04:00
Bert Kenward 429baa6f0e sfc: check async completer is !NULL before calling
Add a NULL check before calling asynchronous MCDI completion functions
during device removal.

Fixes: 7014d7f6 ("sfc: allow asynchronous MCDI without completion function")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:18:35 -04:00
David S. Miller 88e4f75900 Merge branch 'sctp-fix-gap-ack-blocks'
Marcelo Ricardo Leitner says:

====================
sctp: fix the handling of SACK Gap Ack blocks

sctp_acked() is using 32bit arithmetics on 16bits vars, via TSN_lte()
macros, which is weird and confusing.

Once the offset to ctsn is calculated, all wrapping is already handled
and thus to verify the Gap Ack blocks we can just use pure
less/big-or-equal than checks.

Also, rename gap variable to tsn_offset, so it's more meaningful, as
it doesn't point to any gap at all.

Even so, I don't think this discrepancy resulted in any practical bug.

This patch is a preparation for the next one, which will introduce
typecheck() for TSN_lte() macros and would cause a compile error here.

Suggested-by: David Laight <David.Laight@ACULAB.COM>
Reported-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 06:55:04 -04:00
Marcelo Ricardo Leitner 182691d099 sctp: improve how SSN, TSN and ASCONF serial are compared
Make it similar to time_before() macros:
- easier to understand
- make use of typecheck() to avoid working on unexpected variable types
  (made the issue on previous patch visible)
- for _[lg]te versions, slighly faster, as the compiler used to generate
  a sequence of cmp/je/cmp/js instructions and now it's sub/test/jle
  (for _lte):

Before, for sctp_outq_sack:
	if (primary->cacc.changeover_active) {
    1f01:	80 b9 84 02 00 00 00 	cmpb   $0x0,0x284(%rcx)
    1f08:	74 6e                	je     1f78 <sctp_outq_sack+0xe8>
		u8 clear_cycling = 0;

		if (TSN_lte(primary->cacc.next_tsn_at_change, sack_ctsn)) {
    1f0a:	8b 81 80 02 00 00    	mov    0x280(%rcx),%eax
	return ((s) - (t)) & TSN_SIGN_BIT;
}

static inline int TSN_lte(__u32 s, __u32 t)
{
	return ((s) == (t)) || (((s) - (t)) & TSN_SIGN_BIT);
    1f10:	8b 7d bc             	mov    -0x44(%rbp),%edi
    1f13:	39 c7                	cmp    %eax,%edi
    1f15:	74 25                	je     1f3c <sctp_outq_sack+0xac>
    1f17:	39 f8                	cmp    %edi,%eax
    1f19:	78 21                	js     1f3c <sctp_outq_sack+0xac>
			primary->cacc.changeover_active = 0;

After:
	if (primary->cacc.changeover_active) {
    1ee7:	80 b9 84 02 00 00 00 	cmpb   $0x0,0x284(%rcx)
    1eee:	74 73                	je     1f63 <sctp_outq_sack+0xf3>
		u8 clear_cycling = 0;

		if (TSN_lte(primary->cacc.next_tsn_at_change, sack_ctsn)) {
    1ef0:	8b 81 80 02 00 00    	mov    0x280(%rcx),%eax
    1ef6:	2b 45 b4             	sub    -0x4c(%rbp),%eax
    1ef9:	85 c0                	test   %eax,%eax
    1efb:	7e 26                	jle    1f23 <sctp_outq_sack+0xb3>
			primary->cacc.changeover_active = 0;

*_lt() generated pretty much the same code.
Tested with gcc (GCC) 6.1.1 20160621.

This patch also removes SSN_lte as it is not used and cleanups some
comments.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 06:54:58 -04:00
Marcelo Ricardo Leitner a3007446e5 sctp: fix the handling of SACK Gap Ack blocks
sctp_acked() is using 32bit arithmetics on 16bits vars, via TSN_lte()
macros, which is weird and confusing.

Once the offset to ctsn is calculated, all wrapping is already handled
and thus to verify the Gap Ack blocks we can just use pure
less/big-or-equal than checks.

Also, rename gap variable to tsn_offset, so it's more meaningful, as
it doesn't point to any gap at all.

Even so, I don't think this discrepancy resulted in any practical bug.

This patch is a preparation for the next one, which will introduce
typecheck() for TSN_lte() macros and would cause a compile error here.

Suggested-by: David Laight <David.Laight@ACULAB.COM>
Reported-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 06:54:58 -04:00
WANG Cong 21641c2e1f net_sched: check NULL on error path in route4_change()
On error path in route4_change(), 'f' could be NULL,
so we should check NULL before calling tcf_exts_destroy().

Fixes: b9a24bb76b ("net_sched: properly handle failure case of tcf_exts_init()")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 06:51:49 -04:00
David S. Miller d6989d4bbe Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-09-23 06:46:57 -04:00
Lihong Yang 903e68323b i40evf: remove unnecessary error checking against i40e_shutdown_adminq
The i40e_shutdown_adminq function never returns failure. There is no need to
check the non-0 return value. Clean up the unnecessary error checking and
warning against it.

Change-ID: Ibb616f09cfb93bd1a872ebf3241a15fb8354b31b
Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-22 22:33:41 -07:00
Alexander Duyck 841493a3f6 i40e: Limit TX descriptor count in cases where frag size is greater than 16K
The i40e driver was incorrectly assuming that we would always be pulling
no more than 1 descriptor from each fragment.  It is in fact possible for
us to end up with the case where 2 descriptors worth of data may be pulled
when a frame is larger than one of the pieces generated when aligning the
payload to either 4K or pieces smaller than 16K.

To adjust for this we just need to make certain to test all the way to the
end of the fragments as it is possible for us to span 2 descriptors in the
block before us so we need to guarantee that even the last 6 descriptors
have enough data to fill a full frame.

Change-ID: Ic2ecb4d6b745f447d334e66c14002152f50e2f99
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-22 22:33:41 -07:00
Bimmy Pujari cb130a0b41 i40evf: remove unnecessary error checking against i40evf_up_complete
Function i40evf_up_complete() always returns success. Changed this to a
void type and removed the code that checks the return status and prints
an error message.

Change-ID: I8c400f174786b9c855f679e470f35af292fb50ad
Signed-off-by: Bimmy Pujari <bimmy.pujari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-22 22:33:40 -07:00
Sridhar Samudrala 3f341acc1c i40evf: Fix link state event handling
Currently disabling the link state from PF via
	ip link set enp5s0f0 vf 0 state disable
doesn't disable the CARRIER on the VF.

This patch updates the carrier and starts/stops the tx queues based on the
link state notification from PF.

  PF: enp5s0f0, VF: enp5s2
  #modprobe i40e
  #echo 2 > /sys/class/net/enp5s0f0/device/sriov_numvfs
  #ip link set enp5s2 up
  #ip -d link show enp5s2
  175: enp5s2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
      link/ether ea:4d:60:bc:6f:85 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64
  #ip link set enp5s0f0 vf 0 state disable
  #ip -d link show enp5s0f0
  171: enp5s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 68:05:ca:2e:72:68 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64 numtxqueues 72 numrxqueues 72 portid 6805ca2e7268
      vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state disable, trust off
      vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
  #ip -d link show enp5s2
  175: enp5s2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
       link/ether ea:4d:60:bc:6f:85 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64 numtxqueues 16 numrxqueues 16

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-22 22:33:40 -07:00
Colin Ian King ff918912e1 i40e: avoid potential null pointer dereference when assigning len
There is a sanitcy check for desc being null in the first line of
function i40evf_debug_aq.  However, before that, aq_desc is cast from
desc, and aq_desc is being dereferenced on the assignment of len, so
this could be a potential null pointer deference.  Fix this by moving
the initialization of len to the code block where len is being used
and hence at this point we know it is OK to dereference aq_desc.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-22 22:33:40 -07:00