1
0
Fork 0
Commit Graph

481940 Commits (9c41f2baa90b95c13007fd9617d5a0251b68968b)

Author SHA1 Message Date
Lendacky, Thomas 5cdec67967 amd-xgbe-phy: Let AMD_XGBE_PHY depend on HAS_IOMEM
The amd-xgbe-phy driver needs to perform ioremap calls, so add HAS_IOMEM
to its build dependency.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 21:50:13 -05:00
Lendacky, Thomas 474809b9e1 amd-xgbe: Let AMD_XGBE depend on HAS_IOMEM
The amd-xgbe driver needs to perform ioremap calls, so add HAS_IOMEM
to its build dependency.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 21:50:13 -05:00
Lendacky, Thomas 0c95a1faaa amd-xgbe-phy: Sync PCS and PHY modes after reset
This patch adds support to sync the states of the PCS and the PHY
after a reset is performed.  If the PCS and the PHY are not in the
same state after reset an extra mode change would be performed. This
extra mode change might not be needed if the PCS and the PHY are
synced up after reset.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 21:50:12 -05:00
Lendacky, Thomas a7beaf2300 amd-xgbe: Fix a spelling error
This patch fixes the spelling of the word "descriptor" in a couple
of locations.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 21:50:12 -05:00
Lendacky, Thomas f6ac862845 amd-xgbe: Add receive side scaling ethtool support
This patch adds support for ethtool receive side scaling (RSS) commands.
Support is added to get/set the RSS hash key and the RSS lookup table.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 21:50:12 -05:00
Lendacky, Thomas 5b9dfe299e amd-xgbe: Provide support for receive side scaling
This patch provides support for receive side scaling (RSS). RSS allows
for spreading incoming network packets across the Rx queues.  When used
in conjunction with the per DMA channel interrupt support, this allows
the receive processing to be spread across multiple processors.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 21:50:12 -05:00
Lendacky, Thomas 9227dc5e57 amd-xgbe: Add support for per DMA channel interrupts
This patch provides support for interrupts that are generated by the
Tx/Rx DMA channel pairs of the device.  This allows for Tx and Rx
processing to run across multiple processsors.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 21:50:12 -05:00
Lendacky, Thomas 174fd2597b amd-xgbe: Implement split header receive support
Provide support for splitting IP packets so that the header and
payload can be sent to different DMA addresses.  This will allow
the IP header to be put into the linear part of the skb while the
payload can be added as frags.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 21:50:12 -05:00
Lendacky, Thomas 08dcc47c06 amd-xgbe: Use page allocations for Rx buffers
Use page allocations for Rx buffers instead of pre-allocating skbs
of a set size.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 21:50:12 -05:00
Lendacky, Thomas aa96bd3c9f amd-xgbe: Use the u32 data type for descriptors
The Tx and Rx descriptors are unsigned 32 bit values.  Use the u32
type, rather than unsigned int, to map these descriptors.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 21:50:12 -05:00
Lendacky, Thomas a9d41981e9 amd-xgbe: Rename pre_xmit function to dev_xmit
The pre_xmit function name implies that it performs operations prior
to transmitting the packet when in fact it is responsible for setting
up the descriptors and initiating the transmit.  Rename this to
function from pre_xmit to dev_xmit, which is consistent with the name
used during receive processing - dev_read.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 21:50:12 -05:00
Lendacky, Thomas 4780b7cae6 amd-xgbe: Move ring allocation to device open
Move the channel and ring tracking structures allocation to device
open.  This will allow for future support to vary the number of Tx/Rx
queues without unloading the module.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 21:50:11 -05:00
Gregory Fong 66f1c44887 bridge: include in6.h in if_bridge.h for struct in6_addr
if_bridge.h uses struct in6_addr ip6, but wasn't including the in6.h
header.  Thomas Backlund originally sent a patch to do this, but this
revealed a redefinition issue: https://lkml.org/lkml/2013/1/13/116

The redefinition issue should have been fixed by the following Linux
commits:
ee262ad827 inet: defines IPPROTO_* needed for module alias generation
cfd280c912 net: sync some IP headers with glibc

and the following glibc commit:
6c82a2f8d7c8e21e39237225c819f182ae438db3 Coordinate IPv6 definitions for Linux and glibc

so actually include the header now.

Reported-by: Colin Guthrie <colin@mageia.org>
Reported-by: Christiaan Welvaart <cjw@daneel.dyndns.org>
Reported-by: Thomas Backlund <tmb@mageia.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Gregory Fong <gregory.0xf0@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 17:13:34 -05:00
Marcelo Leitner 1f37bf87aa tcp: zero retrans_stamp if all retrans were acked
Ueki Kohei reported that when we are using NewReno with connections that
have a very low traffic, we may timeout the connection too early if a
second loss occurs after the first one was successfully acked but no
data was transfered later. Below is his description of it:

When SACK is disabled, and a socket suffers multiple separate TCP
retransmissions, that socket's ETIMEDOUT value is calculated from the
time of the *first* retransmission instead of the *latest*
retransmission.

This happens because the tcp_sock's retrans_stamp is set once then never
cleared.

Take the following connection:

                      Linux                    remote-machine
                        |                           |
         send#1---->(*1)|--------> data#1 --------->|
                  |     |                           |
                 RTO    :                           :
                  |     |                           |
                 ---(*2)|----> data#1(retrans) ---->|
                  | (*3)|<---------- ACK <----------|
                  |     |                           |
                  |     :                           :
                  |     :                           :
                  |     :                           :
                16 minutes (or more)                :
                  |     :                           :
                  |     :                           :
                  |     :                           :
                  |     |                           |
         send#2---->(*4)|--------> data#2 --------->|
                  |     |                           |
                 RTO    :                           :
                  |     |                           |
                 ---(*5)|----> data#2(retrans) ---->|
                  |     |                           |
                  |     |                           |
                RTO*2   :                           :
                  |     |                           |
                  |     |                           |
      ETIMEDOUT<----(*6)|                           |

(*1) One data packet sent.
(*2) Because no ACK packet is received, the packet is retransmitted.
(*3) The ACK packet is received. The transmitted packet is acknowledged.

At this point the first "retransmission event" has passed and been
recovered from. Any future retransmission is a completely new "event".

(*4) After 16 minutes (to correspond with retries2=15), a new data
packet is sent. Note: No data is transmitted between (*3) and (*4).

The socket's timeout SHOULD be calculated from this point in time, but
instead it's calculated from the prior "event" 16 minutes ago.

(*5) Because no ACK packet is received, the packet is retransmitted.
(*6) At the time of the 2nd retransmission, the socket returns
ETIMEDOUT.

Therefore, now we clear retrans_stamp as soon as all data during the
loss window is fully acked.

Reported-by: Ueki Kohei
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:59:49 -05:00
WANG Cong 25de4668d0 ipv6: move INET6_MATCH() to include/net/inet6_hashtables.h
It is only used in net/ipv6/inet6_hashtables.c.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:59:04 -05:00
David S. Miller 51f3d02b98 net: Add and use skb_copy_datagram_msg() helper.
This encapsulates all of the skb_copy_datagram_iovec() callers
with call argument signature "skb, offset, msghdr->msg_iov, length".

When we move to iov_iters in the networking, the iov_iter object will
sit in the msghdr.

Having a helper like this means there will be less places to touch
during that transformation.

Based upon descriptions and patch from Al Viro.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:46:40 -05:00
David S. Miller 1d76c1d028 Merge branch 'gue-next'
Tom Herbert says:

====================
gue: Remote checksum offload

This patch set implements remote checksum offload for
GUE, which is a mechanism that provides checksum offload of
encapsulated packets using rudimentary offload capabilities found in
most Network Interface Card (NIC) devices. The outer header checksum
for UDP is enabled in packets and, with some additional meta
information in the GUE header, a receiver is able to deduce the
checksum to be set for an inner encapsulated packet. Effectively this
offloads the computation of the inner checksum. Enabling the outer
checksum in encapsulation has the additional advantage that it covers
more of the packet than the inner checksum including the encapsulation
headers.

Remote checksum offload is described in:
http://tools.ietf.org/html/draft-herbert-remotecsumoffload-01

The GUE transmit and receive paths are modified to support the
remote checksum offload option. The option contains a checksum
offset and checksum start which are directly derived from values
set in stack when doing CHECKSUM_PARTIAL. On receipt of the option, the
operation is to calculate the packet checksum from "start" to end of
the packet (normally derived for checksum complete), and then set
the resultant value at checksum "offset" (the checksum field has
already been primed with the pseudo header). This emulates a NIC
that implements NETIF_F_HW_CSUM.

The primary purpose of this feature is to eliminate cost of performing
checksum calculation over a packet when encpasulating.

In this patch set:
  - Move fou_build_header into fou.c and split it into a couple of
    functions
  - Enable offloading of outer UDP checksum in encapsulation
  - Change udp_offload to support remote checksum offload, includes
    new GSO type and ensuring encapsulated layers (TCP) doesn't try to
    set a checksum covered by RCO
  - TX support for RCO with GUE. This is configured through ip_tunnel
    and set the option on transmit when packet being encapsulated is
    CHECKSUM_PARTIAL
  - RX support for RCO with GUE for normal and GRO paths. Includes
    resolving the offloaded checksum

v2:
  Address comments from davem: Move accounting for private option
  field in gue_encap_hlen to patch in which we add the remote checksum
  offload option.

Testing:

I ran performance numbers using netperf TCP_STREAM and TCP_RR with 200
streams, comparing GUE with and without remote checksum offload (doing
checksum-unnecessary to complete conversion in both cases). These
were run on mlnx4 and bnx2x. Some mlnx4 results are below.

GRE/GUE
    TCP_STREAM
      IPv4, with remote checksum offload
        9.71% TX CPU utilization
        7.42% RX CPU utilization
        36380 Mbps
      IPv4, without remote checksum offload
        12.40% TX CPU utilization
        7.36% RX CPU utilization
        36591 Mbps
    TCP_RR
      IPv4, with remote checksum offload
        77.79% CPU utilization
	91/144/216 90/95/99% latencies
        1.95127e+06 tps
      IPv4, without remote checksum offload
        78.70% CPU utilization
        89/152/297 90/95/99% latencies
        1.95458e+06 tps

IPIP/GUE
    TCP_STREAM
      With remote checksum offload
        10.30% TX CPU utilization
        7.43% RX CPU utilization
        36486 Mbps
      Without remote checksum offload
        12.47% TX CPU utilization
        7.49% RX CPU utilization
        36694 Mbps
    TCP_RR
      With remote checksum offload
        77.80% CPU utilization
        87/153/270 90/95/99% latencies
        1.98735e+06 tps
      Without remote checksum offload
        77.98% CPU utilization
        87/150/287 90/95/99% latencies
        1.98737e+06 tps

SIT/GUE
    TCP_STREAM
      With remote checksum offload
        9.68% TX CPU utilization
        7.36% RX CPU utilization
        35971 Mbps
      Without remote checksum offload
        12.95% TX CPU utilization
        8.04% RX CPU utilization
        36177 Mbps
    TCP_RR
      With remote checksum offload
        79.32% CPU utilization
        94/158/295 90/95/99% latencies
        1.88842e+06 tps
      Without remote checksum offload
        80.23% CPU utilization
        94/149/226 90/95/99% latencies
        1.90338e+06 tps

VXLAN
    TCP_STREAM
        35.03% TX CPU utilization
        20.85% RX CPU utilization
        36230 Mbps
    TCP_RR
        77.36% CPU utilization
        84/146/270 90/95/99% latencies
        2.08063e+06 tps

We can also look at CPU time in csum_partial using perf (with bnx2x
setup). For GRE with TCP_STREAM I see:

    With remote checksum offload
        0.33% TX
        1.81% RX
    Without remote checksum offload
        6.00% TX
        0.51% RX

I suspect the fact that time in csum_partial noticably increases
with remote checksum offload for RX is due to taking the cache miss on
the encapsulated header in that function. By similar reasoning, if on
the TX side the packet were not in cache (say we did a splice from a
file whose data was never touched by the CPU) the CPU savings for TX
would probably be more pronounced.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:34:47 -05:00
Tom Herbert a8d31c128b gue: Receive side of remote checksum offload
Add processing of the remote checksum offload option in both the normal
path as well as the GRO path. The implements patching the affected
checksum to derive the offloaded checksum.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:04 -05:00
Tom Herbert b17f709a24 gue: TX support for using remote checksum offload option
Add if_tunnel flag TUNNEL_ENCAP_FLAG_REMCSUM to configure
remote checksum offload on an IP tunnel. Add logic in gue_build_header
to insert remote checksum offload option.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:03 -05:00
Tom Herbert c1aa8347e7 gue: Protocol constants for remote checksum offload
Define a private flag for remote checksun offload as well as a length
for the option.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:03 -05:00
Tom Herbert e585f23636 udp: Changes to udp_offload to support remote checksum offload
Add a new GSO type, SKB_GSO_TUNNEL_REMCSUM, which indicates remote
checksum offload being done (in this case inner checksum must not
be offloaded to the NIC).

Added logic in __skb_udp_tunnel_segment to handle remote checksum
offload case.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:03 -05:00
Tom Herbert 5024c33ac3 gue: Add infrastructure for flags and options
Add functions and basic definitions for processing standard flags,
private flags, and control messages. This includes definitions
to compute length of optional fields corresponding to a set of flags.
Flag validation is in validate_gue_flags function. This checks for
unknown flags, and that length of optional fields is <= length
in guehdr hlen.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:03 -05:00
Tom Herbert 4bcb877d25 udp: Offload outer UDP tunnel csum if available
In __skb_udp_tunnel_segment if outer UDP checksums are enabled and
ip_summed is not already CHECKSUM_PARTIAL, set up checksum offload
if device features allow it.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:03 -05:00
Tom Herbert 63487babf0 net: Move fou_build_header into fou.c and refactor
Move fou_build_header out of ip_tunnel.c and into fou.c splitting
it up into fou_build_header, gue_build_header, and fou_build_udp.
This allows for other users for TX of FOU or GUE. Change ip_tunnel_encap
to call fou_build_header or gue_build_header based on the tunnel
encapsulation type. Similarly, added fou_encap_hlen and gue_encap_hlen
functions which are called by ip_encap_hlen. New net/fou.h has
prototypes and defines for this.

Added NET_FOU_IP_TUNNELS configuration. When this is set, IP tunnels
can use FOU/GUE and fou module is also selected.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:02 -05:00
David S. Miller 46d3802627 Merge branch 'stmmac-net'
Giuseppe Cavallaro says:

====================
stmmac: review and fix lock and atomicity

Recently some issues have been reported for the driver for locking mechanism
and atomicity.

In fact, enabling DEBUG support to prove lock and to verify if sleeping while
atomic context some warnings occur at runtime. I have reproduced all on STi
platforms.

Concerning the tx path, I had provided a patch time ago but
I discarded the idea to completely remove locks; in this patch-set we can have
some useful fixes instead of.

This patch-set is to fix the atomicity in the PM stuff where I tried to collect
all the points and advice reported in the past weeks.
As final result, on my side no warnings and no problem when suspend/resume the
driver on STi boxes.

I also added a patch that fixes the locks for the EEE.
As pointed in some thread there was a design problem behind the eee
initialization and I have tried to fix that before.
As final result no issues when proving locks too.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:23:09 -05:00
Giuseppe CAVALLARO 777da230c5 stmmac: fix atomicity in pm routines
This patch is to fix the atomicity when suspend and resume the
driver. The clk api have been changed (as reported by Hao Liang)
and the skb allocation is done out of the hw setup function and
taking care about the GFP flags.

Reported-by: Hao Liang <hliang1025@gmail.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Cc: Hao Liang <hliang1025@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:22:57 -05:00
Giuseppe CAVALLARO 4741cf9cec stmmac: fix concurrency in eee initialization.
This patch aims to fix the concurrency in eee initialization
inside the stmmac driver and related warnings when enable
DEBUG_ATOMIC_SLEEP.

Prior this patch, the stmmac_eee_init could be called in several places
as shown below:

stmmac_open  stmmac_resume         PHY Layer
    |            |                     |
  stmmac_hw_setup           stmmac_adjust_link
    |                                  |           stmmac ethtool
    |__________________________|______________|
                                       |
                                 stmmac_eee_init

The patch removes the stmmac_eee_init call inside the stmmac_hw_setup
that is unnecessary. It is sufficient to call it in the adjust_link to
always guarantee that EEE is always configured at mac level too.

Fixing the lock protection now it is covered another case (not
considered before). The stmmac_eee_init could be called by the ethtool
so critical sections must be protected inside this function too.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:22:57 -05:00
Giuseppe CAVALLARO b9d73704aa stmmac: fix lock in stmmac_set_rx_mode
When compile with CONFIG_PROVE_LOCKING the following warnings happen:

[snip]

    HARDIRQ-ON-W at:
                        [<c0480c1c>] _raw_spin_lock+0x3c/0x4c
                        [<c02c2828>] stmmac_set_rx_mode+0x18/0x3c
                        [<c038b2cc>] dev_set_rx_mode+0x1c/0x28
                        [<c038b38c>] __dev_open+0xb4/0xf8
                        [<c038b5a8>] __dev_change_flags+0x94/0x128
                        [<c038b6a8>] dev_change_flags+0x10/0x48
                        [<c062afe0>] ip_auto_config+0x1d4/0x1084
                        [<c000873c>] do_one_initcall+0x108/0x15c
                        [<c060ec50>] kernel_init_freeable+0x1a8/0x248
                        [<c0472cc0>] kernel_init+0x8/0x160
                        [<c000dfc8>] ret_from_fork+0x14/0x2c
     INITIAL USE at:
                       [<c0480c1c>] _raw_spin_lock+0x3c/0x4c
                       [<c02c2828>] stmmac_set_rx_mode+0x18/0x3c
                       [<c038b2cc>] dev_set_rx_mode+0x1c/0x28
                       [<c038b38c>] __dev_open+0xb4/0xf8
                       [<c038b5a8>] __dev_change_flags+0x94/0x128
                       [<c038b6a8>] dev_change_flags+0x10/0x48
                       [<c062afe0>] ip_auto_config+0x1d4/0x1084
                       [<c000873c>] do_one_initcall+0x108/0x15c
                       [<c060ec50>] kernel_init_freeable+0x1a8/0x248
                       [<c0472cc0>] kernel_init+0x8/0x160
                       [<c000dfc8>] ret_from_fork+0x14/0x2c

so the patch just removes the lock protection in the stmmac_set_rx_mode

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Emilio Lopez <emilio@elopez.com.ar>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:22:56 -05:00
Fabrice Gasnier 758a0ab59b stmmac: release tx lock, in case of dma mapping error.
Add missing spin_unlock when tx frames gets dropped.

Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:22:56 -05:00
Fabrice Gasnier 16ee817e43 stmmac: fix stmmac_tx_avail should be called with TX locked
stmmac_tx_avail() may lie if used unprotected. It's using cur_tx
and dirty_tx index. These index may be already in use by tx_clean
when entering xmit routine. So, this should be called locked.

This can cause transmit queue to be stuck, with following message:
NETDEV WATCHDOG: eth0 (stmmaceth): transmit queue 0 timed out

Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:22:56 -05:00
David S. Miller 890b7916d0 Merge branch 'stmmac-next'
Giuseppe Cavallaro says:

====================
stmmac: review driver Koptions

Recently many Koption options have been added to have new glue logic on several
platforms.

The main goal behind this work is to guarantee that the driver built
fine on all the branches where it is present independently of which
glue logic is selected.

IMHO, it is better to remove all the not necessary Koption(s) that can hide
build problems when something changes in the driver and especially when
the DT compatibility allows us to manage all the platform data.

I compiled the driver w/o any issue on net-next Git for:

  x86, arm and sh4.

In case of there are build problems on some repos now it will be
easy to catch them and cherry-pick patches from mainstream.

For sure, do not hesitate to contact me in case of issue.

Also this set removes STMMAC_DEBUG_FS and BUS_MODE_DA. The latter is useless
and the former can be replaced by DEBUG_FS (always to make safe the build).

V2: patch-set re-based on top of the latest updates for net-next
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:14:54 -05:00
Giuseppe CAVALLARO 98fbebcb6d stmmac: remove BUS_MODE_DA
This is a very old and often unused option to configure
a bit in a register inside the DMA. This support should
not stay under Koption and should be extended for new chips too.
This will be do later maybe via device-tree parameters.
Also no performance impact when remove this setting on STi platforms.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:14:43 -05:00
Giuseppe CAVALLARO 50fb4f7474 stmmac: remove STMMAC_DEBUG_FS
the STMMAC_DEBUG_FS Koption is now removed from the
driver configuration and this support will be built
by default when DEBUG_FS is present. This can also be
useful on building driver verification.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:14:43 -05:00
Giuseppe CAVALLARO c0d540661d stmmac: remove specific SoC Koption from platform.
This patch removes all the Koptions added to build the glue-logic files
for all different architectures: DWMAC_MESON, DWMAC_SUNXI, DWMAC_STI ...
Nowadays the stmmac needs to be compiled on several platforms; in some
case it very convenient to guarantee that its build is always completed
with success on all the branches where the driver is present.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:14:43 -05:00
Chen Gang b994ca6b67 drivers: net: ethernet: xilinx: xilinx_emaclite: revert the original commit "1db3ddff1602edf2390b7667dcbaa0f71512e3ea"
Microblaze is a fpga soft core, it can be customized easily, which may
cause many various hardware version strings.

So the original fix patch based on hard-coded compatible version strings
is not a good idea (although it is correct for current issue). For it,
there will be a new solving way soon (which based on the device tree).

The original issue is related with qemu, so can only change the hardware
version string in qemu for it, then keep the original driver no touch (
qemu is for virtualization which has much easier life than real world).

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:00:51 -05:00
Rasmus Villemoes 9cdb5dbf79 include/linux/socket.h: Fix comment
File descriptors are always closed on exit :-)

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 15:52:45 -05:00
Loganaden Velvindron 219b5f29a5 net: Add missing descriptions for fwmark_reflect for ipv4 and ipv6.
It was initially sent by Lorenzo Colitti, but was subsequently
lost in the final diff he submitted.

Signed-off-by: Loganaden Velvindron <logan@elandsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 15:43:57 -05:00
Jesse Gross d3ca9eafc0 geneve: Unregister pernet subsys on module unload.
The pernet ops aren't ever unregistered, which causes a memory
leak and an OOPs if the module is ever reinserted.

Fixes: 0b5e8b8eea ("net: Add Geneve tunneling protocol driver")
CC: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 15:00:51 -05:00
Jesse Gross 45cac46e51 geneve: Set GSO type on transmit.
Geneve does not currently set the inner protocol type when
transmitting packets. This causes GSO segmentation to fail on NICs
that do not support Geneve offloading.

CC: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 15:00:51 -05:00
Vladimir Zapolskiy 30349bdbc4 net: phy: spi_ks8995: remove sysfs bin file by registered attribute
When a sysfs binary file is asked to be removed, it is found by
attribute name, so strictly speaking this change is not a fix, but
just in case when attribute name is changed in the driver or sysfs
internals are changed, it might be better to remove the previously
created file using right the same binary attribute.

Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 17:18:45 -05:00
Fabian Frederick 6cf1093e58 udp: remove blank line between set and test
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 17:12:10 -05:00
Florent Fourcot 869ba988fe ipv6: trivial, add bracket for the if block
The "else" block is on several lines and use bracket.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 17:10:19 -05:00
David S. Miller 15e4123ba8 Merge branch 'xgene-net'
Iyappan Subramanian says:

====================
drivers: net: xgene: Fix crash for backward compatibility

This patch set fixes the following issues that were reported during regression.

Patch 1,2 : Adds backward compatibility with the older firmware (<= 1.13.28).
Patch 3   : Use separate hardware resources (descriptor ring, prefetch buffer)
	   that are not shared with the firmware
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 17:08:47 -05:00
Iyappan Subramanian bdd330f050 drivers: net: xgene: fix: Use separate resources
This patch fixes the following kernel crash during SGMII based 1GbE probe.

	BUG: Bad page state in process swapper/0  pfn:40fe6ad
	page:ffffffbee37a75d8 count:-1 mapcount:0 mapping:          (null) index:0x0
	flags: 0x0()
	page dumped because: nonzero _count
	Modules linked in:
	CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.0+ #7
	Call trace:
	[<ffffffc000087fa0>] dump_backtrace+0x0/0x12c
	[<ffffffc0000880dc>] show_stack+0x10/0x1c
	[<ffffffc0004d981c>] dump_stack+0x74/0xc4
	[<ffffffc00012fe70>] bad_page+0xd8/0x128
	[<ffffffc000133000>] get_page_from_freelist+0x4b8/0x640
	[<ffffffc000133260>] __alloc_pages_nodemask+0xd8/0x834
	[<ffffffc0004194f8>] __netdev_alloc_frag+0x124/0x1b8
	[<ffffffc00041bfdc>] __netdev_alloc_skb+0x90/0x10c
	[<ffffffc00039ff30>] xgene_enet_refill_bufpool+0x11c/0x280
	[<ffffffc0003a11a4>] xgene_enet_process_ring+0x168/0x340
	[<ffffffc0003a1498>] xgene_enet_napi+0x1c/0x50
	[<ffffffc00042b454>] net_rx_action+0xc8/0x18c
	[<ffffffc0000b0880>] __do_softirq+0x114/0x24c
	[<ffffffc0000b0c34>] irq_exit+0x94/0xc8
	[<ffffffc0000e68a0>] __handle_domain_irq+0x8c/0xf4
	[<ffffffc000081288>] gic_handle_irq+0x30/0x7c

This was due to hardware resource sharing conflict with the firmware. This
patch fixes this crash by using resources (descriptor ring, prefetch buffer)
that are not shared.

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 17:08:42 -05:00
Iyappan Subramanian c3f4465d27 drivers: net: xgene: Backward compatibility with older firmware
This patch adds support when used with older firmware (<= 1.13.28).

- Added xgene_ring_mgr_init() to check whether ring manager is initialized
- Calling xgene_ring_mgr_init() from xgene_port_ops.reset()
- To handle errors, changed the return type of xgene_port_ops.reset()

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 17:08:42 -05:00
Iyappan Subramanian 09c9e0593d dtb: xgene: fix: Backward compatibility with older firmware
The following kernel crash was reported when using older firmware (<= 1.13.28).

[    0.980000] libphy: APM X-Gene MDIO bus: probed
[    1.130000] Unhandled fault: synchronous external abort (0x96000010) at 0xffffff800009a17c
[    1.140000] Internal error: : 96000010 [#1] SMP
[    1.140000] Modules linked in:
[    1.140000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0+ #21
[    1.140000] task: ffffffc3f0110000 ti: ffffffc3f0064000 task.ti: ffffffc3f0064000
[    1.140000] PC is at ioread32+0x58/0x68
[    1.140000] LR is at xgene_enet_setup_ring+0x18c/0x1cc
[    1.140000] pc : [<ffffffc0003cec68>] lr : [<ffffffc00053dad8>] pstate: a0000045
[    1.140000] sp : ffffffc3f0067b20
[    1.140000] x29: ffffffc3f0067b20 x28: ffffffc000aa8ea0
[    1.140000] x27: ffffffc000bb2000 x26: ffffffc000a64270
[    1.140000] x25: ffffffc000b05ad8 x24: ffffffc0ff99ba58
[    1.140000] x23: 0000000000004000 x22: 0000000000004000
[    1.140000] x21: 0000000000000200 x20: 0000000000200000
[    1.140000] x19: ffffffc0ff99ba18 x18: ffffffc0007a6000
[    1.140000] x17: 0000000000000007 x16: 000000000000000e
[    1.140000] x15: 0000000000000001 x14: 0000000000000000
[    1.140000] x13: ffffffbeedb71320 x12: 00000000ffffff80
[    1.140000] x11: 0000000000000002 x10: 0000000000000000
[    1.140000] x9 : 0000000000000000 x8 : ffffffc3eb2a4000
[    1.140000] x7 : 0000000000000000 x6 : 0000000000000000
[    1.140000] x5 : 0000000001080000 x4 : 000000007d654010
[    1.140000] x3 : ffffffffffffffff x2 : 000000000003ffff
[    1.140000] x1 : ffffff800009a17c x0 : ffffff800009a17c

The issue was that the older firmware does not support 10GbE and
SGMII based 1GBE interfaces.

This patch changes the address length of the reg property of sgmii0 and xgmii
nodes and serves as preparatory patch for the fix.

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Reported-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 17:08:42 -05:00
Fabian Frederick 05006e8c59 esp4: remove assignment in if condition
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 16:57:49 -05:00
John W. Linville bf515fb11a This relatively large batch of changes is comprised of the
following:
  * large mac80211-hwsim changes from Ben, Jukka and a bit myself
  * OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical
    University in Prague and Volkswagen Group Research
  * minstrel VHT work from Karl
  * more CSA work from Luca
  * WMM admission control support in mac80211 (myself)
  * various smaller fixes, spelling corrections, and minor API additions
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUWMs/AAoJEDBSmw7B7bqrmBQQAIbfAe7wH1WifRtOnhw3zWQQ
 K36+Edf3HlQ+EIkSs63QousRj2e7pGDOyhzMWLaqsmeTLteUtlGbr7qwiJO1QZdf
 Ml2V5O2s+b8hUIClDBVQF2L6+GGUmRUdQqvDDhkN1guoxD/Nk8cNtsRkSdiXWJWy
 R48NzvYDflBhc8uqPtR8jDb10eM3c00YP9HB+w9hYAfizD+FRue7UNp4MQIqwp9V
 HdKRT6L2n/6QA+Mzse0rMDes5qI7nIUNgj+hjqgJSnhITPMgGR5j/pitnVHrr81M
 ngOipBFG3svsQrwZh8nM4Llp0cM4Gs+GlgCieu9+TJpr2sY00Z3kYcp0pxtDoSxz
 Wblqz9n/bnW9mrkEfl12XqwwT5vguchwHoZ9cXhejDxSawWXoTRx20uW4ahO8ArA
 kWwwjTBVsQ5WMCtOBiqggzNKghwCc2ILmcZnjGdg9aNXcWsmQ4vyeCfG2QxBz/UB
 Grv/f9NSy6mzKQ34yv+lyR7rFZ8XcT03EVAnZSYz8X0ZZGxwtFupRp1RrBh1KPtD
 TJoe6Q71FfHKYRJ2xgygYkQFo+r9d0BKBeerq+Vu2hBeaqyi4aUwSj7d1sUaaq6N
 tL8fmAUqFjVOOUFeH1g07Xke5QD+yrEC7sJKkeRMfcRGB+dEa+2m3I5p4WDz9bWM
 AEvFSsYr/I9KI4d1huXD
 =6GIj
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-john-2014-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg <johannes@sipsolutions.net> says:

"This relatively large batch of changes is comprised of the
following:
 * large mac80211-hwsim changes from Ben, Jukka and a bit myself
 * OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical
   University in Prague and Volkswagen Group Research
 * minstrel VHT work from Karl
 * more CSA work from Luca
 * WMM admission control support in mac80211 (myself)
 * various smaller fixes, spelling corrections, and minor API additions"

Conflicts:
	drivers/net/wireless/ath/wil6210/cfg80211.c

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-11-04 16:18:12 -05:00
David S. Miller 90284c2bc9 Merge branch 'ecn_via_routing_table'
Florian Westphal says:

====================
net: allow setting ecn via routing table

Here is v4 of the patchset, its exactly the same as v3 except in patch3/3
where I added the missing 'const' qualifier to a function argument that
Eric spotted during review.

I preserved Erics Acks so that he doesn't have to resend them.

v3 cover letter:

When using syn cookies, then do not simply trust that the echoed timestamp
was not modified to make sure that ecn is not turned on magically when it
is disabled on the host.

The first two patches, which were not part of earlier series, prepare
the cookie code for the ecn route metrics change by allowing is to
more easily use the existing dst object for ecn validation.

The 3rd patch adds the ecn route metric feature support.
It is almost the same as in v2, except that we'll now also test the
dst_features when decoding a syn cookie timestamp that indicates ecn support.

These three patches then allow turning on explicit congestion notification
based on the destination network.

For example, assuming the default tcp_ecn sysctl '2', the following will
enable ecn (tcp_ecn=1 behaviour, i.e. request ecn to be enabled for a
tcp connection) for all connections to hosts inside the 192.168.2/24 network:

ip route change 192.168.2.0/24 dev eth0 features ecn

Having a more fine-grained per-route setting can be beneficial for
various reasons, for example 1) within data centers, or 2) local ISPs
may deploy ECN support for their own video/streaming services [1], etc.

Joint work with Daniel Borkmann, feature suggested by Hannes Frederic Sowa.

The patch to enable this in iproute2 will be posted shortly, it is currently
also available here:
http://git.breakpoint.cc/cgit/fw/iproute2.git/commit/?h=iproute_features&id=8843d2d8973fb81c78a7efe6d42e3a17d739003e

[1] http://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf, p.15
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 16:06:46 -05:00
Florian Westphal f7b3bec6f5 net: allow setting ecn via routing table
This patch allows to set ECN on a per-route basis in case the sysctl
tcp_ecn is not set to 1. In other words, when ECN is set for specific
routes, it provides a tcp_ecn=1 behaviour for that route while the rest
of the stack acts according to the global settings.

One can use 'ip route change dev $dev $net features ecn' to toggle this.

Having a more fine-grained per-route setting can be beneficial for various
reasons, for example, 1) within data centers, or 2) local ISPs may deploy
ECN support for their own video/streaming services [1], etc.

There was a recent measurement study/paper [2] which scanned the Alexa's
publicly available top million websites list from a vantage point in US,
Europe and Asia:

Half of the Alexa list will now happily use ECN (tcp_ecn=2, most likely
blamed to commit 255cac91c3 ("tcp: extend ECN sysctl to allow server-side
only ECN") ;)); the break in connectivity on-path was found is about
1 in 10,000 cases. Timeouts rather than receiving back RSTs were much
more common in the negotiation phase (and mostly seen in the Alexa
middle band, ranks around 50k-150k): from 12-thousand hosts on which
there _may_ be ECN-linked connection failures, only 79 failed with RST
when _not_ failing with RST when ECN is not requested.

It's unclear though, how much equipment in the wild actually marks CE
when buffers start to fill up.

We thought about a fallback to non-ECN for retransmitted SYNs as another
global option (which could perhaps one day be made default), but as Eric
points out, there's much more work needed to detect broken middleboxes.

Two examples Eric mentioned are buggy firewalls that accept only a single
SYN per flow, and middleboxes that successfully let an ECN flow establish,
but later mark CE for all packets (so cwnd converges to 1).

 [1] http://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf, p.15
 [2] http://ecn.ethz.ch/

Joint work with Daniel Borkmann.

Reference: http://thread.gmane.org/gmane.linux.network/335797
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 16:06:09 -05:00