Commit graph

635608 commits

Author SHA1 Message Date
Eric Dumazet e68b6e50fa udp: enable busy polling for all sockets
UDP busy polling is restricted to connected UDP sockets.

This is because sk_busy_loop() only takes care of one NAPI context.

There are cases where it could be extended.

1) Some hosts receive traffic on a single NIC, with one RX queue.

2) Some applications use SO_REUSEPORT and associated BPF filter
   to split the incoming traffic on one UDP socket per RX
queue/thread/cpu

3) Some UDP sockets are used to send/receive traffic for one flow, but
they do not bother with connect()

This patch records the napi_id of first received skb, giving more
reach to busy polling.

Tested:

lpaa23:~# echo 70 >/proc/sys/net/core/busy_read
lpaa24:~# echo 70 >/proc/sys/net/core/busy_read

lpaa23:~# for f in `seq 1 10`; do ./super_netperf 1 -H lpaa24 -t UDP_RR -l 5; done

Before patch :
   27867   28870   37324   41060   41215
   36764   36838   44455   41282   43843
After patch :
   73920   73213   70147   74845   71697
   68315   68028   75219   70082   73707

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18 10:44:31 -05:00
David S. Miller fcd2b0da73 Merge branch 'rds-ha-failover-fixes'
Sowmini Varadhan says:

====================
RDS: TCP: HA/Failover fixes

This series contains a set of fixes for bugs exposed when
we ran the following in a loop between a test machine pair:

 while (1); do
   # modprobe rds-tcp on test nodes
   # run rds-stress in bi-dir mode between test machine pair
   # modprobe -r rds-tcp on test nodes
 done

rds-stress in bi-dir mode will cause both nodes to initiate
RDS-TCP connections at almost the same instant, exposing the
bugs fixed in this series.

Without the fixes, rds-stress reports sporadic packet drops,
and packets arriving out of sequence. After the fixes,we have
been able to run the  test overnight, without any issues.

Each patch has a detailed description of the root-cause fixed
by the patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17 13:35:19 -05:00
Sowmini Varadhan 1a0e100fb2 RDS: TCP: Force every connection to be initiated by numerically smaller IP address
When 2 RDS peers initiate an RDS-TCP connection simultaneously,
there is a potential for "duelling syns" on either/both sides.
See commit 241b271952 ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()") for a description of this
condition, and the arbitration logic which ensures that the
numerically large IP address in the TCP connection is bound to the
RDS_TCP_PORT ("canonical ordering").

The rds_connection should not be marked as RDS_CONN_UP until the
arbitration logic has converged for the following reason. The sender
may start transmitting RDS datagrams as soon as RDS_CONN_UP is set,
and since the sender removes all datagrams from the rds_connection's
cp_retrans queue based on TCP acks. If the TCP ack was sent from
a tcp socket that got reset as part of duel aribitration (but
before data was delivered to the receivers RDS socket layer),
the sender may end up prematurely freeing the datagram, and
the datagram is no longer reliably deliverable.

This patch remedies that condition by making sure that, upon
receipt of 3WH completion state change notification of TCP_ESTABLISHED
in rds_tcp_state_change, we mark the rds_connection as RDS_CONN_UP
if, and only if, the IP addresses and ports for the connection are
canonically ordered. In all other cases, rds_tcp_state_change will
force an rds_conn_path_drop(), and rds_queue_reconnect() on
both peers will restart the connection to ensure canonical ordering.

A side-effect of enforcing this condition in rds_tcp_state_change()
is that rds_tcp_accept_one_path() can now be refactored for simplicity.
It is also no longer possible to encounter an RDS_CONN_UP connection in
the arbitration logic in rds_tcp_accept_one().

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17 13:35:18 -05:00
Sowmini Varadhan 905dd4184e RDS: TCP: Track peer's connection generation number
The RDS transport has to be able to distinguish between
two types of failure events:
(a) when the transport fails (e.g., TCP connection reset)
    but the RDS socket/connection layer on both sides stays
    the same
(b) when the peer's RDS layer itself resets (e.g., due to module
    reload or machine reboot at the peer)
In case (a) both sides must reconnect and continue the RDS messaging
without any message loss or disruption to the message sequence numbers,
and this is achieved by rds_send_path_reset().

In case (b) we should reset all rds_connection state to the
new incarnation of the peer. Examples of state that needs to
be reset are next expected rx sequence number from, or messages to be
retransmitted to, the new incarnation of the peer.

To achieve this, the RDS handshake probe added as part of
commit 5916e2c155 ("RDS: TCP: Enable multipath RDS for TCP")
is enhanced so that sender and receiver of the RDS ping-probe
will add a generation number as part of the RDS_EXTHDR_GEN_NUM
extension header. Each peer stores local and remote generation
numbers as part of each rds_connection. Changes in generation
number will be detected via incoming handshake probe ping
request or response and will allow the receiver to reset rds_connection
state.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17 13:35:18 -05:00
Sowmini Varadhan 315ca6d98e RDS: TCP: set RDS_FLAG_RETRANSMITTED in cp_retrans list
As noted in rds_recv_incoming() sequence numbers on data packets
can decreas for the failover case, and the Rx path is equipped
to recover from this, if the RDS_FLAG_RETRANSMITTED is set
on the rds header of an incoming message with a suspect sequence
number.

The RDS_FLAG_RETRANSMITTED is predicated on the RDS_FLAG_RETRANSMITTED
flag in the rds_message, so make sure the flag is set on messages
queued for retransmission.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17 13:35:18 -05:00
LABBE Corentin b3e5106962 net: stmmac: replace if (netif_msg_type) by their netif_xxx counterpart
As sugested by Joe Perches, we could replace all
if (netif_msg_type(priv)) dev_xxx(priv->devices, ...)
by the simpler macro netif_xxx(priv, hw, priv->dev, ...)

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17 13:30:30 -05:00
LABBE Corentin de9a2165a5 net: stmmac: replace hardcoded function name by __func__
Some printing have the function name hardcoded.
It is better to use __func__ instead.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17 13:30:30 -05:00
LABBE Corentin 38ddc59d65 net: stmmac: replace all pr_xxx by their netdev_xxx counterpart
The stmmac driver use lots of pr_xxx functions to print information.
This is bad since we cannot know which device logs the information.
(moreover if two stmmac device are present)

Furthermore, it seems that it assumes wrongly that all logs will always
be subsequent by using a dev_xxx then some indented pr_xxx like this:
kernel: sun7i-dwmac 1c50000.ethernet: no reset control found
kernel:  Ring mode enabled
kernel:  No HW DMA feature register supported
kernel:  Normal descriptors
kernel:  TX Checksum insertion supported

So this patch replace all pr_xxx by their netdev_xxx counterpart.
Excepts for some printing where netdev "cause" unpretty output like:
sun7i-dwmac 1c50000.ethernet (unnamed net_device) (uninitialized): no reset control found
In those case, I keep dev_xxx.

In the same time I remove some "stmmac:" print since
this will be a duplicate with that dev_xxx displays.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17 13:30:30 -05:00
Eric Dumazet 29c58472ec net_sched: sch_fq: use hash_ptr()
When I wrote sch_fq.c, hash_ptr() on 64bit arches was awful,
and I chose hash_32().

Linus Torvalds and George Spelvin fixed this issue, so we can
use hash_ptr() to get more entropy on 64bit arches with Terabytes
of memory, and avoid the cast games.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17 13:28:56 -05:00
Eric Dumazet d30d9ccbfa net/mlx5e: remove napi_hash_del() calls
Calling napi_hash_del() after netif_napi_del() is pointless.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17 12:06:20 -05:00
Eric Dumazet bb07fafaec net/mlx4_en: remove napi_hash_del() call
There is no need calling napi_hash_del()+synchronize_rcu() before
calling netif_napi_del()

netif_napi_del() does this already.

Using napi_hash_del() in a driver is useful only when dealing with
a batch of NAPI structures, so that a single synchronize_rcu() can
be used. mlx4_en_deactivate_cq() is deactivating a single NAPI.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17 11:17:05 -05:00
David S. Miller 6a02f5eb6a Merge branch 'mlxsw-i2c'
Jiri Pirko says:

====================
mlxsw: Introduce support for I2C bus

Vadim says:

This patchset adds I2C access support for SwitchX, SwitchX2, SwitchIB,
SwitchIB2 and Spectrum silicones.

It contains:
 - Small changes in mlxsw core code, needed for I2C bus support;
 - I2C driver, which obtains I2C input/output mailboxes setting and
   provides command interface implementation.
 - Minimal driver, which works on top of I2C driver and allows running
   of mlxsw command interface over I2C bus;

Use case:
On system, which does not have PCI to ASIC (BMC), hwmon functionality
(sensors, pwm, tacho) will be available through I2C.

Usage (manual probing):
echo mlxsw_minimal 0x48 > /sys/bus/i2c/devices/i2c-2/new_device

Sysfs interface:
/sys/bus/i2c/devices/2-0048/hwmon/hwmon5/pwm1
/sys/bus/i2c/devices/2-0048/hwmon/hwmon5/temp1_input
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 23:29:05 -05:00
Vadim Pasternak d556e92916 mlxsw: minimal: Add I2C support for Mellanox ASICs
Add I2C access support for Mellanox ASICs:
- Virtual Protocol Interconnect switches SwitchX, SwitchX2,
  providing InfiniBand, Ethernet and Fibre Channel connectivity;
- Infiniband switches SwitchIB, SwitchIB2:
- Ethernet switch Spectrum.

Example of probing activation:
echo mlxsw_minimal 0x48 > /sys/bus/i2c/devices/i2c-2/new_device

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 23:29:04 -05:00
Vadim Pasternak 4ebd00bc5f mlxsw: Invoke driver's init/fini methods only if defined
We are going to add a minimal driver on top of the mlxsw core
infrastructure, which will be mainly used for hardware monitoring in
Baseboard management controller (BMC) installations.

Unlike the switch drivers (e.g., spectrum, switchx2), this driver does not
initialize the ASIC and therefore doesn't need to implement the init() and
fini() methods in its 'mlxsw_driver' struct.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 23:29:04 -05:00
Vadim Pasternak 6882b0aee1 mlxsw: Introduce support for I2C bus
Add I2C bus implementation for Mellanox Technologies Switch ASICs.
This includes command interface implementation using input / out mailboxes,
whose location is retrieved from the firmware during probe time.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 23:29:04 -05:00
Vadim Pasternak c711e27a18 mlxsw: Add bus capability flag
The mlxsw core infrastructure currently assumes that communication with
the ASIC is always possible using Ethernet management datagrams (EMADs),
but this is only possible when the PCI bus is used.

The bus capability flag is added to indicate EMAD support and make core
initialize EMAD communication only when it's set. Otherwise, register
access is done using command interface.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 23:29:04 -05:00
Julia Lawall d01eb808c7 net: netcp: replace IS_ERR_OR_NULL by IS_ERR
knav_queue_open always returns an ERR_PTR value, never NULL.  This can be
confirmed by unfolding the function calls and conforms to the function's
documentation.  Thus, replace IS_ERR_OR_NULL by IS_ERR in error checks.

The change is made using the following semantic patch:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x;
statement S;
@@

x = knav_queue_open(...);
if (
-   IS_ERR_OR_NULL
+   IS_ERR
    (x)) S
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 23:26:36 -05:00
Xin Long 7fda702f93 sctp: use new rhlist interface on sctp transport rhashtable
Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key
to hash a node to one chain. If in one host thousands of assocs connect
to one server with the same lport and different laddrs (although it's
not a normal case), all the transports would be hashed into the same
chain.

It may cause to keep returning -EBUSY when inserting a new node, as the
chain is too long and sctp inserts a transport node in a loop, which
could even lead to system hangs there.

The new rhlist interface works for this case that there are many nodes
with the same key in one chain. It puts them into a list then makes this
list be as a node of the chain.

This patch is to replace rhashtable_ interface with rhltable_ interface.
Since a chain would not be too long and it would not return -EBUSY with
this fix when inserting a node, the reinsert loop is also removed here.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 23:22:17 -05:00
David S. Miller 00a636e776 Merge branch 'bnxt_en-next'
Michael Chan says:

====================
bnxt_en: Updates.

New firmware spec. update, autoneg update, and UDP RSS support.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 23:11:07 -05:00
Michael Chan a011952a1a bnxt_en: Add ethtool -n|-N rx-flow-hash support.
To display and modify the RSS hash.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 23:11:07 -05:00
Michael Chan 87da7f796d bnxt_en: Add UDP RSS support for 57X1X chips.
The newer chips have proper support for 4-tuple UDP RSS.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 23:11:07 -05:00
Michael Chan 286ef9d64e bnxt_en: Enhance autoneg support.
On some dual port NICs, the speed setting on one port can affect the
available speed on the other port.  Add logic to detect these changes
and adjust the advertised speed settings when necessary.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 23:11:07 -05:00
Michael Chan 16d663a69f bnxt_en: Update firmware interface spec to 1.5.4.
Use the new FORCE_LINK_DWN bit to shutdown link during close.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 23:11:07 -05:00
Eric Dumazet 89c4b442b7 netpoll: more efficient locking
Callers of netpoll_poll_lock() own NAPI_STATE_SCHED

Callers of netpoll_poll_unlock() have BH blocked between
the NAPI_STATE_SCHED being cleared and poll_lock is released.

We can avoid the spinlock which has no contention, and use cmpxchg()
on poll_owner which we need to set anyway.

This removes a possible lockdep violation after the cited commit,
since sk_busy_loop() re-enables BH before calling busy_poll_stop()

Fixes: 217f697436 ("net: busy-poll: allow preemption in sk_busy_loop()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 18:32:02 -05:00
Rafal Ozieblo 1629dd4f76 cadence: Add LSO support.
New Cadence GEM hardware support Large Segment Offload (LSO):
TCP segmentation offload (TSO) as well as UDP fragmentation
offload (UFO). Support for those features was added to the driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 17:54:55 -05:00
Arnd Bergmann 08348995c4 netronome: don't access real_num_rx_queues directly
The netdev->real_num_rx_queues setting is only available if CONFIG_SYSFS
is enabled, so we now get a build failure when that is turned off:

netronome/nfp/nfp_net_common.c: In function 'nfp_net_ring_swap_enable':
netronome/nfp/nfp_net_common.c:2489:18: error: 'struct net_device' has no member named 'real_num_rx_queues'; did you mean 'real_num_tx_queues'?

As far as I can tell, the check here is only used as an optimization that
we can skip in order to fix the compilation. If sysfs is disabled,
the following netif_set_real_num_rx_queues() has no effect.

Fixes: 164d1e9e5d ("nfp: add support for ethtool .set_channels")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 17:05:48 -05:00
Eric Dumazet 973334a1d3 sfc: remove napi_hash_del() call
Calling napi_hash_del() after netif_napi_del() is pointless.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 17:04:49 -05:00
David Lebrun a23a8f5bc1 lwtunnel: subtract tunnel headroom from mtu on output redirect
This patch changes the lwtunnel_headroom() function which is called
in ipv4_mtu() and ip6_mtu(), to also return the correct headroom
value when the lwtunnel state is OUTPUT_REDIRECT.

This patch enables e.g. SR-IPv6 encapsulations to work without
manually setting the route mtu.

Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 17:01:15 -05:00
Ido Schimmel d331d30360 mlxsw: spectrum_router: Adjust placement of FIB abort warning
The recent merge commit bb598c1b8c ("Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") would cause
the FIB abort warning to fire whenever we flush the FIB tables - either
during module removal or actual abort.

Move it back to its rightful location in the FIB abort function.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 15:18:01 -05:00
Andrew Lunn 0b6e3d0322 net: dsa: mv88e6xxx: Respect SPEED_UNFORCED, don't set force bit
The SPEED_UNFORCED indicates the MAC & PHY should perform
auto-negotiation to determine a speed which works. If this is called
for, don't set the force bit. If it is set, the MAC actually does
10Gbps, why the internal PHYs don't support.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 15:12:51 -05:00
David S. Miller a11485b737 Merge branch 'amd-xgbe-next'
Tom Lendacky says:

====================
amd-xgbe: AMD XGBE driver updates 2016-11-15

This patch series addresses some minor issues found in the recently
accepted patch series for the AMD XGBE driver.

The following fixes are included in this driver update series:

- Fix a possibly uninitialized variable in the debugfs support
- Fix the GPIO pin number constraint check

This patch series is based on net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 13:57:45 -05:00
Lendacky, Thomas 1c1f619e45 amd-xgbe: Fix maximum GPIO value check
The GPIO support in the hardware allows for up to 16 GPIO pins, enumerated
from 0 to 15.  The driver uses the wrong value (16) to validate the GPIO
pin range in the routines to set and clear the GPIO output pins.  Update
the code to use the correct value (15).

Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 13:57:44 -05:00
Lendacky, Thomas 5e4cbaa7fb amd-xgbe: Fix possible uninitialized variable
The debugfs support in the driver uses a common routine to write the
debugfs values. In this routine, if the input file position is non-zero
then the write routine will not return an error and an output parameter
will not have been set. Because an error isn't returned an uninitialized
value will be written into a register.

Fix the common write routine to return an error if the input file position
is non-zero, which will propagate back to the caller.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 13:57:44 -05:00
David S. Miller 54dbf3a577 Merge branch 'nway-reset'
Florian Fainelli says:

====================
net: Implenent ethtool::nway_reset for a few drivers

This patch series depends on "net: phy: Centralize auto-negotation restart"
since it provides phy_ethtool_nway_reset as a helper function.

The drivers here already support PHYLIB, so there really is no reason why
restarting auto-negotiation would not be possible with these.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 13:44:02 -05:00
Florian Fainelli 13f0ac4109 net: ethernet: marvell: pxa168_eth: Implement ethtool::nway_reset
Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
already using dev->phydev all over the place so this comes for free.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 13:44:01 -05:00
Florian Fainelli 00606c4986 net: ethernet: mvpp2: Implement ethtool::nway_reset
Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
already using dev->phydev all over the place so this comes for free.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 13:44:01 -05:00
Florian Fainelli 5489ee8976 net: ethernet: mvneta: Implement ethtool::nway_reset
Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
already using dev->phydev all over the place so this comes for free.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 13:44:01 -05:00
Florian Fainelli 3d3ba5685e net: ethoc: Implement ethtool::nway_reset
Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
already using dev->phydev all over the place so this comes for free.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 13:44:00 -05:00
Florian Fainelli cab247e9df net: stmmac: Implement ethtool::nway_reset
Utilize the generic phy_ethtool_nway_reset() helper function to
implement an autonegotiation restart.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 13:44:00 -05:00
David S. Miller fc3f9146ce Merge branch 'busypoll-preemption-and-other-optimizations'
Eric Dumazet says:

====================
net: busy-poll: allow preemption and other optimizations

It is time to have preemption points in sk_busy_loop() and improve
its scalability.

Also napi_complete() and friends can tell drivers when it is safe to
not re-enable device interrupts, saving some overhead under
high busy polling.

mlx4 and bnx2x are changed accordingly, to show how this busy polling
status can be exploited by drivers.

Next steps will implement Zach Brown suggestion, where NAPI polling
would be enabled all the time for some chosen queues.
This is needed for efficient epoll() support anyway.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 13:40:59 -05:00
Eric Dumazet 80f1c21c53 bnx2x: switch to napi_complete_done()
Switch from napi_complete() to napi_complete_done()
for better GRO support (gro_flush_timeout) and core NAPI
features.

Do not rearm interrupts if we are busy polling,
to reduce bus and interrupts overhead.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 13:40:58 -05:00
Eric Dumazet 2e71328375 net/mlx4_en: use napi_complete_done() return value
Do not rearm interrupts if we are busy polling.

mlx4 uses separate CQ for TX and RX, so number of TX interrupts
does not change, unfortunately.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 13:40:58 -05:00
Eric Dumazet 364b605573 net: busy-poll: return busypolling status to drivers
NAPI drivers use napi_complete_done() or napi_complete() when
they drained RX ring and right before re-enabling device interrupts.

In busy polling, we can avoid interrupts being delivered since
we are polling RX ring in a controlled loop.

Drivers can chose to use napi_complete_done() return value
to reduce interrupts overhead while busy polling is active.

This is optional, legacy drivers should work fine even
if not updated.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 13:40:58 -05:00
Eric Dumazet 21cb84c48c net: busy-poll: remove need_resched() from sk_can_busy_loop()
Now sk_busy_loop() can schedule by itself, we can remove
need_resched() check from sk_can_busy_loop()

Also add a const to its struct sock parameter.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 13:40:58 -05:00
Eric Dumazet 217f697436 net: busy-poll: allow preemption in sk_busy_loop()
After commit 4cd13c21b2 ("softirq: Let ksoftirqd do its job"),
sk_busy_loop() needs a bit of care :
softirqs might be delayed since we do not allow preemption yet.

This patch adds preemptiom points in sk_busy_loop(),
and makes sure no unnecessary cache line dirtying
or atomic operations are done while looping.

A new flag is added into napi->state : NAPI_STATE_IN_BUSY_POLL

This prevents napi_complete_done() from clearing NAPIF_STATE_SCHED,
so that sk_busy_loop() does not have to grab it again.

Similarly, netpoll_poll_lock() is done one time.

This gives about 10 to 20 % improvement in various busy polling
tests, especially when many threads are busy polling in
configurations with large number of NIC queues.

This should allow experimenting with bigger delays without
hurting overall latencies.

Tested:
 On a 40Gb mlx4 NIC, 32 RX/TX queues.

 echo 70 >/proc/sys/net/core/busy_read
 for i in `seq 1 40`; do echo -n $i: ; ./super_netperf $i -H lpaa24 -t UDP_RR -- -N -n; done

    Before:      After:
 1:   90072   92819
 2:  157289  184007
 3:  235772  213504
 4:  344074  357513
 5:  394755  458267
 6:  461151  487819
 7:  549116  625963
 8:  544423  716219
 9:  720460  738446
10:  794686  837612
11:  915998  923960
12:  937507  925107
13: 1019677  971506
14: 1046831 1113650
15: 1114154 1148902
16: 1105221 1179263
17: 1266552 1299585
18: 1258454 1383817
19: 1341453 1312194
20: 1363557 1488487
21: 1387979 1501004
22: 1417552 1601683
23: 1550049 1642002
24: 1568876 1601915
25: 1560239 1683607
26: 1640207 1745211
27: 1706540 1723574
28: 1638518 1722036
29: 1734309 1757447
30: 1782007 1855436
31: 1724806 1888539
32: 1717716 1944297
33: 1778716 1869118
34: 1805738 1983466
35: 1815694 2020758
36: 1893059 2035632
37: 1843406 2034653
38: 1888830 2086580
39: 1972827 2143567
40: 1877729 2181851

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 13:40:57 -05:00
Martin KaFai Lau 2874aa2e46 bpf: Fix compilation warning in __bpf_lru_list_rotate_inactive
gcc-6.2.1 gives the following warning:
kernel/bpf/bpf_lru_list.c: In function ‘__bpf_lru_list_rotate_inactive.isra.3’:
kernel/bpf/bpf_lru_list.c:201:28: warning: ‘next’ may be used uninitialized in this function [-Wmaybe-uninitialized]

The "next" is currently initialized in the while() loop which must have >=1
iterations.

This patch initializes next to get rid of the compiler warning.

Fixes: 3a08c2fd76 ("bpf: LRU List")
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 11:30:56 -05:00
David Lebrun 46738b1317 ipv6: sr: add option to control lwtunnel support
This patch adds a new option CONFIG_IPV6_SEG6_LWTUNNEL to enable/disable
support of encapsulation with the lightweight tunnels. When this option
is enabled, CONFIG_LWTUNNEL is automatically selected.

Fix commit 6c8702c60b ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")

Without a proper option to control lwtunnel support for SR-IPv6, if
CONFIG_LWTUNNEL=n then the IPv6 initialization fails as a consequence
of seg6_iptunnel_init() failure with EOPNOTSUPP:

NET: Registered protocol family 10
IPv6: Attempt to unregister permanent protocol 6
IPv6: Attempt to unregister permanent protocol 136
IPv6: Attempt to unregister permanent protocol 17
NET: Unregistered protocol family 10

Tested (compiling, booting, and loading ipv6 module when relevant)
with possible combinations of CONFIG_IPV6={y,m,n},
CONFIG_IPV6_SEG6_LWTUNNEL={y,n} and CONFIG_LWTUNNEL={y,n}.

Reported-by: Lorenzo Colitti <lorenzo@google.com>
Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 11:29:46 -05:00
David S. Miller 21b23daeba Merge branch 'alx-multiqueue-support'
Tobias Regnery says:

====================
alx: add multi queue support

This patchset lays the groundwork for multi queue support in the alx driver
and enables multi queue support for the tx path by default. The hardware
supports up to 4 tx queues.

Benefits are better utilization of multi core cpus and the usage of the
msi-x support by default which splits the handling of rx / tx and misc
other interrupts.

The rx path is a little bit harder because apparently (based on the limited
information from the downstream driver) the hardware supports up to 8 rss
queues but only has one hardware descriptor ring on the rx side. So the rx
path will be part of another patchset.

Tested on my AR8161 ethernet adapter with different tests:
- there are no regressions observed during my daily usage
- iperf tcp and udp tests shows no performance regressions
- netperf TCP_RR and UDP_RR shows a slight performance increase of about
  1-2% with this patchset applied

This work is based on the downstream driver at github.com/qca/alx

Changes in V2:
	- drop unneeded casts in alx_alloc_rx_ring (Patch 1)
	- add additional information about testing and benefit to the
	  changelog
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15 22:46:31 -05:00
Tobias Regnery d768319cd4 alx: enable multiple tx queues
Enable multiple tx queues by default based on the number of online cpus. The
hardware supports up to four tx queues.

Based on the downstream driver at github.com/qca/alx

Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15 22:46:30 -05:00
Tobias Regnery f58e0f7747 alx: enable msi-x interrupts by default
Remove the module parameter to enable msi-x support and enable msi-x
interrupts unconditionally by default. This is a preparatory step to enable
multi queue support by default, because this is only working with msi-x
interrupts.

Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15 22:46:30 -05:00