Commit graph

636827 commits

Author SHA1 Message Date
Linus Torvalds 8bca927f13 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Lots more phydev and probe error path leaks in various drivers by
    Johan Hovold.

 2) Fix race in packet_set_ring(), from Philip Pettersson.

 3) Use after free in dccp_invalid_packet(), from Eric Dumazet.

 4) Signnedness overflow in SO_{SND,RCV}BUFFORCE, also from Eric
    Dumazet.

 5) When tunneling between ipv4 and ipv6 we can be left with the wrong
    skb->protocol value as we enter the IPSEC engine and this causes all
    kinds of problems. Set it before the output path does any
    dst_output() calls, from Eli Cooper.

 6) bcmgenet uses wrong device struct pointer in DMA API calls, fix from
    Florian Fainelli.

 7) Various netfilter nat bug fixes from FLorian Westphal.

 8) Fix memory leak in ipvlan_link_new(), from Gao Feng.

 9) Locking fixes, particularly wrt. socket lookups, in l2tp from
    Guillaume Nault.

10) Avoid invoking rhash teardowns in atomic context by moving netlink
    cb->done() dump completion from a worker thread. Fix from Herbert
    Xu.

11) Buffer refcount problems in tun and macvtap on errors, from Jason
    Wang.

12) We don't set Kconfig symbol DEFAULT_TCP_CONG properly when the user
    selects BBR. Fix from Julian Wollrath.

13) Fix deadlock in transmit path on altera TSE driver, from Lino
    Sanfilippo.

14) Fix unbalanced reference counting in dsa_switch_tree, from Nikita
    Yushchenko.

15) tc_tunnel_key needs to be properly exported to userspace via uapi,
    fix from Roi Dayan.

16) rds_tcp_init_net() doesn't unregister notifier in error path, fix
    from Sowmini Varadhan.

17) Stale packet header pointer access after pskb_expand_head() in
    genenve driver, fix from Sabrina Dubroca.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
  net: avoid signed overflows for SO_{SND|RCV}BUFFORCE
  geneve: avoid use-after-free of skb->data
  tipc: check minimum bearer MTU
  net: renesas: ravb: unintialized return value
  sh_eth: remove unchecked interrupts for RZ/A1
  net: bcmgenet: Utilize correct struct device for all DMA operations
  NET: usb: qmi_wwan: add support for Telit LE922A PID 0x1040
  cdc_ether: Fix handling connection notification
  ip6_offload: check segs for NULL in ipv6_gso_segment.
  RDS: TCP: unregister_netdevice_notifier() in error path of rds_tcp_init_net
  Revert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"
  ipv6: Set skb->protocol properly for local output
  ipv4: Set skb->protocol properly for local output
  packet: fix race condition in packet_set_ring
  net: ethernet: altera: TSE: do not use tx queue lock in tx completion handler
  net: ethernet: altera: TSE: Remove unneeded dma sync for tx buffers
  net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks
  net: ethernet: stmmac: platform: fix outdated function header
  net: ethernet: stmmac: dwmac-meson8b: fix probe error path
  net: ethernet: stmmac: dwmac-generic: fix probe error path
  ...
2016-12-02 11:45:27 -08:00
Eric Dumazet b98b0bc8c4 net: avoid signed overflows for SO_{SND|RCV}BUFFORCE
CAP_NET_ADMIN users should not be allowed to set negative
sk_sndbuf or sk_rcvbuf values, as it can lead to various memory
corruptions, crashes, OOM...

Note that before commit 8298193012 ("net: cleanups in
sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF
and SO_RCVBUF were vulnerable.

This needs to be backported to all known linux kernels.

Again, many thanks to syzkaller team for discovering this gem.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 14:10:14 -05:00
Sabrina Dubroca 5b01014759 geneve: avoid use-after-free of skb->data
geneve{,6}_build_skb can end up doing a pskb_expand_head(), which
makes the ip_hdr(skb) reference we stashed earlier stale. Since it's
only needed as an argument to ip_tunnel_ecn_encap(), move this
directly in the function call.

Fixes: 08399efc63 ("geneve: ensure ECN info is handled properly in all tx/rx paths")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 14:07:11 -05:00
Michal Kubeček 3de81b7588 tipc: check minimum bearer MTU
Qian Zhang (张谦) reported a potential socket buffer overflow in
tipc_msg_build() which is also known as CVE-2016-8632: due to
insufficient checks, a buffer overflow can occur if MTU is too short for
even tipc headers. As anyone can set device MTU in a user/net namespace,
this issue can be abused by a regular user.

As agreed in the discussion on Ben Hutchings' original patch, we should
check the MTU at the moment a bearer is attached rather than for each
processed packet. We also need to repeat the check when bearer MTU is
adjusted to new device MTU. UDP case also needs a check to avoid
overflow when calculating bearer MTU.

Fixes: b97bf3fd8f ("[TIPC] Initial merge")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reported-by: Qian Zhang (张谦) <zhangqian-c@360.cn>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 14:03:20 -05:00
David S. Miller f0d21e8947 linux-can-fixes-for-4.9-20161201
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEES2FAuYbJvAGobdVQPTuqJaypJWoFAlhAkJ0THG1rbEBwZW5n
 dXRyb25peC5kZQAKCRA9O6olrKklarJlCACp3d7A2H/xNs+9mz0KNW/r1Dn+gwjh
 ZMvcHEs0FvYjzdpgstd7I7My1iNGbNsjm4MNwyfsGoLAEivPl+tmDcvVR+rpBJRE
 k5jtHUGAwkKEUubVT199Oj4YY5YVUTi7YmqedArXMdHP4kRx+efhrjtmjI2CoL/j
 XS2W0YA8MmaKgyTS/UTElZ8/SlX6o1ZJp5JrEFUexyURw7qlTThKJ+jDdhrL7pUx
 zxamj3umv6F5N21ZIGZTBIMEzVOSfU3nPZGC5M/K4rNNznM/X7sYM8ALjY/bSPNU
 zykf/Ut5BcLaLXm/tyNZvNfv7Y8LCCImO05U8BY9yDDgigmgF62YhT8w
 =iiHQ
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-4.9-20161201' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2016-12-02

this is a pull request for net/master.

There are two patches by Stephane Grosjean, who adds support for the new
PCAN-USB X6 USB interface to the pcan_usb driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 14:02:13 -05:00
Dan Carpenter 50d5aa4cf8 net: renesas: ravb: unintialized return value
We want to set the other "err" variable here so that we can return it
later.  My version of GCC misses this issue but I caught it with a
static checker.

Fixes: 9f70eb339f ("net: ethernet: renesas: ravb: fix fixed-link phydev leaks")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:59:47 -05:00
David S. Miller ab17cb1fea wireless-drivers-next patches for 4.10
Major changes:
 
 rsi
 
 * filter rx frames
 * configure tx power
 * make it possible to select antenna
 * support 802.11d
 
 brcmfmac
 
 * cleanup of scheduled scan code
 * support for bcm43341 chipset with different chip id
 * support rev6 of PCIe device interface
 
 ath10k
 
 * add spectral scan support for QCA6174 and QCA9377 families
 * show used tx bitrate with 10.4 firmware
 
 wil6210
 
 * add power save mode support
 * add abort scan functionality
 * add support settings retry limit for short frames
 
 bcma
 
 * add Dell Inspiron 3148
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJYQGivAAoJEG4XJFUm622bqG0IAJtSGt4Fxv2jL7GPmPpEUtYK
 F6G1PCk9LxO44rOZ15E/CT1vPk6Bnwqp9brdngmXwl7jc+jGs4MQN7g6cD4UZgPm
 gxjx8cah2HPRVgEE7PeOILthRxwPA+9klycsvwtglkgQ1SpQVmLHDTLpeOAkRluY
 olJGINoGHTD6osud6p3oKK+VP891omJvu8TPqRjhrhLhbQTWAuTxl2Gsdye30yag
 CsdaEZb9wdUEBoS80EVRwvgBzqrdKU5kGDGbuzytcyrFrRHo4flti1KgxDg3nIpI
 jC4Liwg0yE/aYZlfMqi/960rt8AttCJBDt/vwqp0mOE4IwFsE9Yaio6xXUonAC8=
 =a6a/
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-next-for-davem-2016-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.10

Major changes:

rsi

* filter rx frames
* configure tx power
* make it possible to select antenna
* support 802.11d

brcmfmac

* cleanup of scheduled scan code
* support for bcm43341 chipset with different chip id
* support rev6 of PCIe device interface

ath10k

* add spectral scan support for QCA6174 and QCA9377 families
* show used tx bitrate with 10.4 firmware

wil6210

* add power save mode support
* add abort scan functionality
* add support settings retry limit for short frames

bcma

* add Dell Inspiron 3148
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:58:10 -05:00
Chris Brandt 33d446dbba sh_eth: remove unchecked interrupts for RZ/A1
When streaming a lot of data and the RZ/A1 can't keep up, some status bits
will get set that are not being checked or cleared which cause the
following messages and the Ethernet driver to stop working. This
patch fixes that issue.

irq 21: nobody cared (try booting with the "irqpoll" option)
handlers:
[<c036b71c>] sh_eth_interrupt
Disabling IRQ #21

Fixes: db893473d3 ("sh_eth: Add support for r7s72100")
Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:54:50 -05:00
Florian Fainelli 8c4799ac79 net: bcmgenet: Utilize correct struct device for all DMA operations
__bcmgenet_tx_reclaim() and bcmgenet_free_rx_buffers() are not using the
same struct device during unmap that was used for the map operation,
which makes DMA-API debugging warn about it. Fix this by always using
&priv->pdev->dev throughout the driver, using an identical device
reference for all map/unmap calls.

Fixes: 1c1008c793 ("net: bcmgenet: add main driver file")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:53:34 -05:00
David S. Miller 4f4f907a67 Merge branch 'mvneta-64bit'
Gregory CLEMENT says:

====================
Support Armada 37xx SoC (ARMv8 64-bits) in mvneta driver

The Armada 37xx is a new ARMv8 SoC from Marvell using same network
controller as the older Armada 370/38x/XP SoCs. This series adapts the
driver in order to be able to use it on this new SoC. The main changes
are:

- 64-bits support: the first patches allow using the driver on a 64-bit
  architecture.

- MBUS support: the mbus configuration is different on Armada 37xx
  from the older SoCs.

- per cpu interrupt: Armada 37xx do not support per cpu interrupt for
  the NETA IP, the non-per-CPU behavior was added back.

The first patch is an optimization in the rx path in swbm mode.
The second patch remove unnecessary allocation for HWBM.
The first item is solved by patches 4 and 5.
The 2 last items are solved by patch 6.
In patch 7 the dt support is added.

Beside Armada 37xx, this series have been again tested on Armada XP
and Armada 38x (with Hardware Buffer Management and with Software
Buffer Management).

This is the 6th version of the series:
- 1st version:
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/469588.html

- 2nd version:
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/470476.html

- 3rd version:
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/470901.html

- 4th version:
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/471039.html

- 5th version:
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/471478.html

Changelog:
v5 -> v6:
 - Added Tested-by from  Marcin Wojtas on the series
 - Added Reviewed-by from Jisheng Zhang on patch 3
 - Fix eth1 phy mode for Armada 3720 DB board on patch 7

v4 -> v5:
 - remove unnecessary cast in patch 3

v3 -> v4:
 - Adding new patch: "net: mvneta: do not allocate buffer in rxq init
   with HWBM"

 - Simplify the HWBM case in patch 3 as suggested by Marcin

v2 -> v3:
 - Adding patch 1 "Optimize rx path for small frame"

 - Fix the kbuild error by moving the "phys_addr += pp->rx_offset_correction;"
  line from patch 2 to patch 3 where rx_offset_correction is introduced.

 - Move the memory allocation of the buf_virt_addr of the rxq to be
   called by the probe function in order to avoid a memory leak.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:52:02 -05:00
Gregory CLEMENT ea7ae8854a ARM64: dts: marvell: Add network support for Armada 3700
Add neta nodes for network support both in device tree for the SoC and
the board.

Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:52:01 -05:00
Marcin Wojtas 2636ac3cc2 net: mvneta: Add network support for Armada 3700 SoC
Armada 3700 is a new ARMv8 SoC from Marvell using same network controller
as older Armada 370/38x/XP. There are however some differences that
needed taking into account when adding support for it:

* open default MBUS window to 4GB of DRAM - Armada 3700 SoC's Mbus
  configuration for network controller has to be done on two levels:
  global and per-port. The first one is inherited from the
  bootloader. The latter can be opened in a default way, leaving
  arbitration to the bus controller.  Hence filled mbus_dram_target_info
  structure is not needed

* make per-CPU operation optional - Recent patches adding RSS and XPS
  support for Armada 38x/XP enabled per-CPU operation of the controller
  by default. Contrary to older SoC's Armada 3700 SoC's network
  controller is not capable of per-CPU processing due to interrupt lines'
  connectivity.  This patch restores non-per-CPU operation, which is now
  optional and depends on neta_armada3700 flag value in mvneta_port
  structure. In order not to complicate the code, separate interrupt
  subroutine is implemented.

For now, on the Armada 3700, RSS is disabled as the current
implementation depend on the per cpu interrupts.

[gregory.clement@free-electrons.com: extract from a larger patch, replace
some ifdef and port to net-next for v4.10]

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:52:01 -05:00
Gregory CLEMENT f34dacccb4 net: mvneta: Only disable mvneta_bm for 64-bits
Actually only the mvneta_bm support is not 64-bits compatible.
The mvneta code itself can run on 64-bits architecture.

Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:52:01 -05:00
Marcin Wojtas 8d5047cf9c net: mvneta: Convert to be 64 bits compatible
Prepare the mvneta driver in order to be usable on the 64 bits platform
such as the Armada 3700.

[gregory.clement@free-electrons.com]: this patch was extract from a larger
one to ease review and maintenance.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:52:00 -05:00
Gregory CLEMENT f88bee1c4b net: mvneta: Use cacheable memory to store the rx buffer virtual address
Until now the virtual address of the received buffer were stored in the
cookie field of the rx descriptor. However, this field is 32-bits only
which prevents to use the driver on a 64-bits architecture.

With this patch the virtual address is stored in an array not shared with
the hardware (no more need to use the DMA API). Thanks to this, it is
possible to use cache contrary to the access of the rx descriptor member.

The change is done in the swbm path only because the hwbm uses the cookie
field, this also means that currently the hwbm is not usable in 64-bits.

Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Reviewed-by: Jisheng Zhang <jszhang@marvell.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:52:00 -05:00
Gregory CLEMENT e9f6499965 net: mvneta: Do not allocate buffer in rxq init with HWBM
For HWBM all buffers are allocated in mvneta_bm_construct() and in runtime
they are put into descriptors by hardware. There is no need to fill them
at this point.

Suggested-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:52:00 -05:00
Gregory CLEMENT ac83b7ddf2 net: mvneta: Optimize rx path for small frame
For small frame reuse the phys_addr variable instead of accessing the
uncacheable value in the rx descriptor.

Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:52:00 -05:00
Linus Torvalds ed8d747fd2 Fix up a couple of field names in the CREDITS file
Ozgur Karatas reported that the very first entry in the CREDITS file had
the wrong tag for name (M: instead of N: - it happened when moving the
entry from the MAINTAINERS file, where 'M:' stands for "Maintainer").

And when I went looking, I found a couple of other cases of wrong
tagging too.

Reported-by: Ozgur Karatas <mueddib@yandex.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-02 10:48:50 -08:00
David S. Miller b5b5eca9aa Merge branch 'bpf-support-for-sockets'
David Ahern says:

====================
net: Add bpf support for sockets

The recently added VRF support in Linux leverages the bind-to-device
API for programs to specify an L3 domain for a socket. While
SO_BINDTODEVICE has been around for ages, not every ipv4/ipv6 capable
program has support for it. Even for those programs that do support it,
the API requires processes to be started as root (CAP_NET_RAW) which
is not desirable from a general security perspective.

This patch set leverages Daniel Mack's work to attach bpf programs to
a cgroup to provide a capability to set sk_bound_dev_if for all
AF_INET{6} sockets opened by a process in a cgroup when the sockets
are allocated.

For example:
 1. configure vrf (e.g., using ifupdown2)
        auto eth0
        iface eth0 inet dhcp
            vrf mgmt

        auto mgmt
        iface mgmt
            vrf-table auto

 2. configure cgroup
        mount -t cgroup2 none /tmp/cgroupv2
        mkdir /tmp/cgroupv2/mgmt
        test_cgrp2_sock /tmp/cgroupv2/mgmt 15

 3. set shell into cgroup (e.g., can be done at login using pam)
        echo $$ >> /tmp/cgroupv2/mgmt/cgroup.procs

At this point all commands run in the shell (e.g, apt) have sockets
automatically bound to the VRF (see output of ss -ap 'dev == <vrf>'),
including processes not running as root.

This capability enables running any program in a VRF context and is key
to deploying Management VRF, a fundamental configuration for networking
gear, with any Linux OS installation.

This patchset also exports the socket family, type and protocol as
read-only allowing bpf filters to deny a process in a cgroup the ability
to open specific types of AF_INET or AF_INET6 sockets.

v7
- comments from Alexei

v6
- add export of socket family, type and protocol
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:46:21 -05:00
David Ahern 554ae6e792 samples/bpf: add userspace example for prohibiting sockets
Add examples preventing a process in a cgroup from opening a socket
based family, protocol and type.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:46:09 -05:00
David Ahern 4f2e7ae56e samples/bpf: Update bpf loader for cgroup section names
Add support for section names starting with cgroup/skb and cgroup/sock.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:46:09 -05:00
David Ahern aa4c1037a3 bpf: Add support for reading socket family, type, protocol
Add socket family, type and protocol to bpf_sock allowing bpf programs
read-only access.

Add __sk_flags_offset[0] to struct sock before the bitfield to
programmtically determine the offset of the unsigned int containing
protocol and type.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:46:09 -05:00
David Ahern ad2805dc79 samples: bpf: add userspace example for modifying sk_bound_dev_if
Add a simple program to demonstrate the ability to attach a bpf program
to a cgroup that sets sk_bound_dev_if for AF_INET{6} sockets when they
are created.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:46:08 -05:00
David Ahern 6102365876 bpf: Add new cgroup attach type to enable sock modifications
Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to
BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run
any time a process in the cgroup opens an AF_INET or AF_INET6 socket.
Currently only sk_bound_dev_if is exported to userspace for modification
by a bpf program.

This allows a cgroup to be configured such that AF_INET{6} sockets opened
by processes are automatically bound to a specific device. In turn, this
enables the running of programs that do not support SO_BINDTODEVICE in a
specific VRF context / L3 domain.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:46:08 -05:00
David Ahern b2cd12574a bpf: Refactor cgroups code in prep for new type
Code move and rename only; no functional change intended.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:44:56 -05:00
Daniele Palmas 9bd813da24 NET: usb: qmi_wwan: add support for Telit LE922A PID 0x1040
This patch adds support for PID 0x1040 of Telit LE922A.

The qmi adapter requires to have DTR set for proper working,
so QMI_WWAN_QUIRK_DTR has been enabled.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:42:12 -05:00
Kristian Evensen d5c83d0d1d cdc_ether: Fix handling connection notification
Commit bfe9b9d2df ("cdc_ether: Improve ZTE MF823/831/910 handling")
introduced a work-around in usbnet_cdc_status() for devices that exported
cdc carrier on twice on connect. Before the commit, this behavior caused
the link state to be incorrect. It was assumed that all CDC Ethernet
devices would either export this behavior, or send one off and then one on
notification (which seems to be the default behavior).

Unfortunately, it turns out multiple devices sends a connection
notification multiple times per second (via an interrupt), even when
connection state does not change. This has been observed with several
different USB LAN dongles (at least), for example 13b1:0041 (Linksys).
After bfe9b9d2df, the link state has been set as down and then up for
each notification. This has caused a flood of Netlink NEWLINK messages and
syslog to be flooded with messages similar to:

cdc_ether 2-1:2.0 eth1: kevent 12 may have been dropped

This commit fixes the behavior by reverting usbnet_cdc_status() to how it
was before bfe9b9d2df. The work-around has been moved to a separate
status-function which is only called when a known, affect device is
detected.

v1->v2:

* Do not open-code netif_carrier_ok() (thanks Henning Schild).
* Call netif_carrier_off() instead of usb_link_change(). This prevents
calling schedule_work() twice without giving the work queue a chance to be
processed (thanks Bjørn Mork).

Fixes: bfe9b9d2df ("cdc_ether: Improve ZTE MF823/831/910 handling")
Reported-by: Henning Schild <henning.schild@siemens.com>
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:40:09 -05:00
Artem Savkov 6b6ebb6b01 ip6_offload: check segs for NULL in ipv6_gso_segment.
segs needs to be checked for being NULL in ipv6_gso_segment() before calling
skb_shinfo(segs), otherwise kernel can run into a NULL-pointer dereference:

[   97.811262] BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc
[   97.819112] IP: [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
[   97.825214] PGD 0 [   97.827047]
[   97.828540] Oops: 0000 [#1] SMP
[   97.831678] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 rpcsec_gss_krb5
nfsv4 dns_resolver nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
bridge stp llc snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel
snd_hda_codec edac_mce_amd snd_hda_core edac_core snd_hwdep kvm_amd snd_seq kvm snd_seq_device
snd_pcm irqbypass snd_timer ppdev parport_serial snd parport_pc k10temp pcspkr soundcore parport
sp5100_tco shpchp sg wmi i2c_piix4 acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc
ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic pata_acpi amdkfd amd_iommu_v2 radeon
broadcom bcm_phy_lib i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
ttm ahci serio_raw tg3 firewire_ohci libahci pata_atiixp drm ptp libata firewire_core pps_core
i2c_core crc_itu_t fjes dm_mirror dm_region_hash dm_log dm_mod
[   97.927721] CPU: 1 PID: 3504 Comm: vhost-3495 Not tainted 4.9.0-7.el7.test.x86_64 #1
[   97.935457] Hardware name: AMD Snook/Snook, BIOS ESK0726A 07/26/2010
[   97.941806] task: ffff880129a1c080 task.stack: ffffc90001bcc000
[   97.947720] RIP: 0010:[<ffffffff816e52f9>]  [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
[   97.956251] RSP: 0018:ffff88012fc43a10  EFLAGS: 00010207
[   97.961557] RAX: 0000000000000000 RBX: ffff8801292c8700 RCX: 0000000000000594
[   97.968687] RDX: 0000000000000593 RSI: ffff880129a846c0 RDI: 0000000000240000
[   97.975814] RBP: ffff88012fc43a68 R08: ffff880129a8404e R09: 0000000000000000
[   97.982942] R10: 0000000000000000 R11: ffff880129a84076 R12: 00000020002949b3
[   97.990070] R13: ffff88012a580000 R14: 0000000000000000 R15: ffff88012a580000
[   97.997198] FS:  0000000000000000(0000) GS:ffff88012fc40000(0000) knlGS:0000000000000000
[   98.005280] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   98.011021] CR2: 00000000000000cc CR3: 0000000126c5d000 CR4: 00000000000006e0
[   98.018149] Stack:
[   98.020157]  00000000ffffffff ffff88012fc43ac8 ffffffffa017ad0a 000000000000000e
[   98.027584]  0000001300000000 0000000077d59998 ffff8801292c8700 00000020002949b3
[   98.035010]  ffff88012a580000 0000000000000000 ffff88012a580000 ffff88012fc43a98
[   98.042437] Call Trace:
[   98.044879]  <IRQ> [   98.046803]  [<ffffffffa017ad0a>] ? tg3_start_xmit+0x84a/0xd60 [tg3]
[   98.053156]  [<ffffffff815eeee0>] skb_mac_gso_segment+0xb0/0x130
[   98.059158]  [<ffffffff815eefd3>] __skb_gso_segment+0x73/0x110
[   98.064985]  [<ffffffff815ef40d>] validate_xmit_skb+0x12d/0x2b0
[   98.070899]  [<ffffffff815ef5d2>] validate_xmit_skb_list+0x42/0x70
[   98.077073]  [<ffffffff81618560>] sch_direct_xmit+0xd0/0x1b0
[   98.082726]  [<ffffffff815efd86>] __dev_queue_xmit+0x486/0x690
[   98.088554]  [<ffffffff8135c135>] ? cpumask_next_and+0x35/0x50
[   98.094380]  [<ffffffff815effa0>] dev_queue_xmit+0x10/0x20
[   98.099863]  [<ffffffffa09ce057>] br_dev_queue_push_xmit+0xa7/0x170 [bridge]
[   98.106907]  [<ffffffffa09ce161>] br_forward_finish+0x41/0xc0 [bridge]
[   98.113430]  [<ffffffff81627cf2>] ? nf_iterate+0x52/0x60
[   98.118735]  [<ffffffff81627d6b>] ? nf_hook_slow+0x6b/0xc0
[   98.124216]  [<ffffffffa09ce32c>] __br_forward+0x14c/0x1e0 [bridge]
[   98.130480]  [<ffffffffa09ce120>] ? br_dev_queue_push_xmit+0x170/0x170 [bridge]
[   98.137785]  [<ffffffffa09ce4bd>] br_forward+0x9d/0xb0 [bridge]
[   98.143701]  [<ffffffffa09cfbb7>] br_handle_frame_finish+0x267/0x560 [bridge]
[   98.150834]  [<ffffffffa09d0064>] br_handle_frame+0x174/0x2f0 [bridge]
[   98.157355]  [<ffffffff8102fb89>] ? sched_clock+0x9/0x10
[   98.162662]  [<ffffffff810b63b2>] ? sched_clock_cpu+0x72/0xa0
[   98.168403]  [<ffffffff815eccf5>] __netif_receive_skb_core+0x1e5/0xa20
[   98.174926]  [<ffffffff813659f9>] ? timerqueue_add+0x59/0xb0
[   98.180580]  [<ffffffff815ed548>] __netif_receive_skb+0x18/0x60
[   98.186494]  [<ffffffff815ee625>] process_backlog+0x95/0x140
[   98.192145]  [<ffffffff815edccd>] net_rx_action+0x16d/0x380
[   98.197713]  [<ffffffff8170cff1>] __do_softirq+0xd1/0x283
[   98.203106]  [<ffffffff8170b2bc>] do_softirq_own_stack+0x1c/0x30
[   98.209107]  <EOI> [   98.211029]  [<ffffffff8108a5c0>] do_softirq+0x50/0x60
[   98.216166]  [<ffffffff815ec853>] netif_rx_ni+0x33/0x80
[   98.221386]  [<ffffffffa09eeff7>] tun_get_user+0x487/0x7f0 [tun]
[   98.227388]  [<ffffffffa09ef3ab>] tun_sendmsg+0x4b/0x60 [tun]
[   98.233129]  [<ffffffffa0b68932>] handle_tx+0x282/0x540 [vhost_net]
[   98.239392]  [<ffffffffa0b68c25>] handle_tx_kick+0x15/0x20 [vhost_net]
[   98.245916]  [<ffffffffa0abacfe>] vhost_worker+0x9e/0xf0 [vhost]
[   98.251919]  [<ffffffffa0abac60>] ? vhost_umem_alloc+0x40/0x40 [vhost]
[   98.258440]  [<ffffffff81003a47>] ? do_syscall_64+0x67/0x180
[   98.264094]  [<ffffffff810a44d9>] kthread+0xd9/0xf0
[   98.268965]  [<ffffffff810a4400>] ? kthread_park+0x60/0x60
[   98.274444]  [<ffffffff8170a4d5>] ret_from_fork+0x25/0x30
[   98.279836] Code: 8b 93 d8 00 00 00 48 2b 93 d0 00 00 00 4c 89 e6 48 89 df 66 89 93 c2 00 00 00 ff 10 48 3d 00 f0 ff ff 49 89 c2 0f 87 52 01 00 00 <41> 8b 92 cc 00 00 00 48 8b 80 d0 00 00 00 44 0f b7 74 10 06 66
[   98.299425] RIP  [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
[   98.305612]  RSP <ffff88012fc43a10>
[   98.309094] CR2: 00000000000000cc
[   98.312406] ---[ end trace 726a2c7a2d2d78d0 ]---

Signed-off-by: Artem Savkov <asavkov@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:34:58 -05:00
Eric Dumazet 7f7bf1606f mlx4: fix use-after-free in mlx4_en_fold_software_stats()
My recent commit to get more precise rx/tx counters in ndo_get_stats64()
can lead to crashes at device dismantle, as Jesper found out.

We must prevent mlx4_en_fold_software_stats() trying to access
tx/rx rings if they are deleted.

Fix this by adding a test against priv->port_up in
mlx4_en_fold_software_stats()

Calling mlx4_en_fold_software_stats() from mlx4_en_stop_port()
allows us to eventually broadcast the latest/current counters to
rtnetlink monitors.

Fixes: 40931b8511 ("mlx4: give precise rx/tx bytes/packets counters")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-and-bisected-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@dev.mellanox.co.il>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:33:32 -05:00
Sunil Goutham bd3ad7d3a1 net: thunderx: Fix transmit queue timeout issue
Transmit queue timeout issue is seen in two cases
- Due to a race condition btw setting stop_queue at xmit()
  and checking for stopped_queue in NAPI poll routine, at times
  transmission from a SQ comes to a halt. This is fixed
  by using barriers and also added a check for SQ free descriptors,
  incase SQ is stopped and there are only CQE_RX i.e no CQE_TX.
- Contrary to an assumption, a HW errata where HW doesn't stop transmission
  even though there are not enough CQEs available for a CQE_TX is
  not fixed in T88 pass 2.x. This results in a Qset error with
  'CQ_WR_FULL' stalling transmission. This is fixed by adjusting
  RXQ's  RED levels for CQ level such that there is always enough
  space left for CQE_TXs.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:32:59 -05:00
Sowmini Varadhan 721c7443dc RDS: TCP: unregister_netdevice_notifier() in error path of rds_tcp_init_net
If some error is encountered in rds_tcp_init_net, make sure to
unregister_netdevice_notifier(), else we could trigger a panic
later on, when the modprobe from a netns fails.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:29:26 -05:00
David S. Miller 9aac3c1879 Merge branch 'offloading-tc-rules-hw'
Hadar Hen Zion says:

====================
Offloading tc rules using underline Hardware device

This series adds flower classifier support in offloading tc rules when the
Software ingress device is different from the Hardware ingress device,
such as when dealing with IP tunnels

The first two patches are a small fixes to flower, checking the skip_hw flag
wasn't set before calling the Hardware offloading functions which will try to
offload the rule.

The next two patches are infrastructure patches, a preparation for the fourth
patch which is adding support in flower to offload rules when the ingress
device is not a Hardware device and therefore can't offload.
In this case ndo_setup_tc is called with the mirred (egress) device.

The last three patchs are adding mlx5e support to offload rules using the new
"egress_device" flag.

Thanks,
Hadar

Changes from v0:
- check if CONFIG_NET_CLS_ACT is defined befor calling tc_action_ops get_dev()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:28:38 -05:00
Hadar Hen Zion ebe06875ff net/mlx5e: Support adding ingress tc rule when egress device flag is set
When ndo_setup_tc is called with an egress_dev flag set, it means that
the ndo call was executed on the mirred action (egress) device and not
on the ingress device.

In order to support this kind of ndo_setup_tc call, and insert the
correct decap rule to the hardware, the uplink device on the same eswitch
should be found.

Currently, we use this resolution between the mirred device and the
uplink on the same eswitch to offload vxlan shared device decap rules.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:28:38 -05:00
Hadar Hen Zion 726293f1f8 net/mlx5e: Save the represntor netdevice as part of the representor
Replace the representor private data to a net_device pointer holding the
representor netdevice, instead of void pointer holding mlx5e_priv.

It will be used by a new eswitch service function, returning the uplink representor
netdevice.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:28:37 -05:00
Hadar Hen Zion 718f13e72b net/mlx5e: Bring back representor's ndos that were accidentally removed
The VF Representor udp tunnel ndo entries were removed by mistake,
return them.

Fixes: 370bad0f9a ('net/mlx5e: Support HW (offloaded) and SW counters for SRIOV switchdev mode')
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:28:37 -05:00
Hadar Hen Zion 7091d8c705 net/sched: cls_flower: Add offload support using egress Hardware device
In order to support hardware offloading when the device given by the tc
rule is different from the Hardware underline device, extract the mirred
(egress) device from the tc action when a filter is added, using the new
tc_action_ops, get_dev().

Flower caches the information about the mirred device and use it for
calling ndo_setup_tc in filter change, update stats and delete.

Calling ndo_setup_tc of the mirred (egress) device instead of the
ingress device will allow a resolution between the software ingress
device and the underline hardware device.

The resolution will take place inside the offloading driver using
'egress_device' flag added to tc_to_netdev struct which is provided to
the offloading driver.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:28:37 -05:00
Hadar Hen Zion 255cb30425 net/sched: act_mirred: Add new tc_action_ops get_dev()
Adding support to a new tc_action_ops.
get_dev is a general option which allows to get the underline
device when trying to offload a tc rule.

In case of mirred action the returned device is the mirred (egress)
device.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:28:36 -05:00
Hadar Hen Zion 3036dab670 net/sched: cls_flower: Provide a filter to replace/destroy hardware filter functions
Instead of providing many arguments to fl_hw_{replace/destroy}_filter
functions, just provide cls_fl_filter struct that includes all the relevant
args.

This patches doesn't add any new functionality.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:28:36 -05:00
Hadar Hen Zion 796852197c net/sched: cls_flower: Try to offload only if skip_hw flag isn't set
Check skip_hw flag isn't set before calling
fl_hw_{replace/destroy}_filter and fl_hw_update_stats functions.

Replace the call to tc_should_offload with tc_can_offload.
tc_can_offload only checks if the device supports offloading, the check for
skip_hw flag is done earlier in the flow.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:28:36 -05:00
Hadar Hen Zion 55330f0596 net/sched: Add separate check for skip_hw flag
Creating a difference between two possible cases:
1. Not offloading tc rule since the user sets 'skip_hw' flag.
2. Not offloading tc rule since the device doesn't support offloading.

This patch doesn't add any new functionality.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:28:36 -05:00
Florian Westphal 25429d7b7d tcp: allow to turn tcp timestamp randomization off
Eric says: "By looking at tcpdump, and TS val of xmit packets of multiple
flows, we can deduct the relative qdisc delays (think of fq pacing).
This should work even if we have one flow per remote peer."

Having random per flow (or host) offsets doesn't allow that anymore so add
a way to turn this off.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 12:49:59 -05:00
Florian Westphal 95a22caee3 tcp: randomize tcp timestamp offsets for each connection
jiffies based timestamps allow for easy inference of number of devices
behind NAT translators and also makes tracking of hosts simpler.

commit ceaa1fef65 ("tcp: adding a per-socket timestamp offset")
added the main infrastructure that is needed for per-connection ts
randomization, in particular writing/reading the on-wire tcp header
format takes the offset into account so rest of stack can use normal
tcp_time_stamp (jiffies).

So only two items are left:
 - add a tsoffset for request sockets
 - extend the tcp isn generator to also return another 32bit number
   in addition to the ISN.

Re-use of ISN generator also means timestamps are still monotonically
increasing for same connection quadruple, i.e. PAWS will still work.

Includes fixes from Eric Dumazet.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 12:49:59 -05:00
David S. Miller 7df5358d47 Merge branch 'qed-iscsi'
Manish Rangankar says:

====================
Add QLogic FastLinQ iSCSI (qedi) driver.

This series introduces hardware offload iSCSI initiator driver for the
41000 Series Converged Network Adapters (579xx chip) by Qlogic. The overall
driver design includes a common module ('qed') and protocol specific
dependent modules ('qedi' for iSCSI).

This is an open iSCSI driver, modifications to open iSCSI user components
'iscsid', 'iscsiuio', etc. are required for the solution to work. The user
space changes are also in the process of being submitted.

    https://groups.google.com/forum/#!forum/open-iscsi

The 'qed' common module, under drivers/net/ethernet/qlogic/qed/, is
enhanced with functionality required for the iSCSI support. This series
is based on:

    net tree base: Merge of net and net-next as of 11/29/2016

Changes from RFC v2:

  1. qedi patches are squashed into single patch to prevent krobot
     warning.
  2. Fixed 'hw_p_cpuq' incompatible pointer type.
  3. Fixed sparse incompatible types in comparison expression.
  4. Misc fixes with latest 'checkpatch --strict' option.
  5. Remove int_mode option from MODULE_PARAM.
  6. Prefix all MODULE_PARAM params with qedi_*.
  7. Use CONFIG_QED_ISCSI instead of CONFIG_QEDI
  8. Added bad task mem access fix.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 12:44:38 -05:00
Yuval Mintz 1d6cff4fca qed: Add iSCSI out of order packet handling.
This patch adds out of order packet handling for hardware offloaded
iSCSI. Out of order packet handling requires driver buffer allocation
and assistance.

Signed-off-by: Arun Easi <arun.easi@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 12:44:38 -05:00
Yuval Mintz fc831825f9 qed: Add support for hardware offloaded iSCSI.
This adds the backbone required for the various HW initalizations
which are necessary for the iSCSI driver (qedi) for QLogic FastLinQ
4xxxx line of adapters - FW notification, resource initializations, etc.

Signed-off-by: Arun Easi <arun.easi@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 12:44:37 -05:00
Eli Cooper 80d1106aea Revert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"
This reverts commit ae148b0858
("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()").

skb->protocol is now set in __ip_local_out() and __ip6_local_out() before
dst_output() is called. It is no longer necessary to do it for each tunnel.

Cc: stable@vger.kernel.org
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 12:34:22 -05:00
Eli Cooper b4e479a96f ipv6: Set skb->protocol properly for local output
When xfrm is applied to TSO/GSO packets, it follows this path:

    xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()

where skb_gso_segment() relies on skb->protocol to function properly.

This patch sets skb->protocol to ETH_P_IPV6 before dst_output() is called,
fixing a bug where GSO packets sent through an ipip6 tunnel are dropped
when xfrm is involved.

Cc: stable@vger.kernel.org
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 12:34:22 -05:00
Eli Cooper f418043910 ipv4: Set skb->protocol properly for local output
When xfrm is applied to TSO/GSO packets, it follows this path:

    xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()

where skb_gso_segment() relies on skb->protocol to function properly.

This patch sets skb->protocol to ETH_P_IP before dst_output() is called,
fixing a bug where GSO packets sent through a sit tunnel are dropped
when xfrm is involved.

Cc: stable@vger.kernel.org
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 12:34:22 -05:00
Philip Pettersson 84ac726023 packet: fix race condition in packet_set_ring
When packet_set_ring creates a ring buffer it will initialize a
struct timer_list if the packet version is TPACKET_V3. This value
can then be raced by a different thread calling setsockopt to
set the version to TPACKET_V1 before packet_set_ring has finished.

This leads to a use-after-free on a function pointer in the
struct timer_list when the socket is closed as the previously
initialized timer will not be deleted.

The bug is fixed by taking lock_sock(sk) in packet_setsockopt when
changing the packet version while also taking the lock at the start
of packet_set_ring.

Fixes: f6fb8f100b ("af-packet: TPACKET_V3 flexible buffer implementation.")
Signed-off-by: Philip Pettersson <philip.pettersson@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 12:16:49 -05:00
Linus Torvalds 4aa675aaf2 KVM fixes for v4.9-rc8
All architectures avoid memory corruption in an error path.
 ARM prevents bogus acknowledgement of interrupts.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJYQaDPAAoJEED/6hsPKofoq8gH/iJR/fcYg1ovboEaDIDdm/PI
 XzbNgrYZID8Nk04chU2Dh1eD8k3DG64txuOEs+jf3XBPYNnU8TAlw6qHVMG6kzGJ
 zA0CLGgH62DKXLuvnDJ75mpiJmzioGd4hdk0G8CIb9W2ySSUgrcmXMI3AoVP44lY
 LKTCITKq6ePfQ7AIbd3a6YXaR0ZTNP52e1Y4vx+Hsl9WcrMUGKyCmd9IcDI9DrZr
 ahMn+wx3Wzvb/NzH25OYkMAC9X5C7+b6O0IZm0ie8F8iU+JLlgGNiAHxQ5yAbu28
 hzINTTUnwIxgoi/ZN0M8i+fo0RLKq5OCzPMTnUUgdloBL786XREW2t3Ca0kqavg=
 =XdK2
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
 "All architectures avoid memory corruption in an error path. ARM
  prevents bogus acknowledgement of interrupts"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: use after free in kvm_ioctl_create_device()
  KVM: arm/arm64: vgic: Don't notify EOI for non-SPIs
2016-12-02 09:15:26 -08:00