1
0
Fork 0
Commit Graph

58935 Commits (redonkable)

Author SHA1 Message Date
Andrey Zhizhikin ec63282b52 This is the 5.4.93 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmARRPQACgkQONu9yGCS
 aT7jAg//SFgHtf8wdnuWP7vANyU+MV8fGTs2No729MXuDEZLMwI9uwlkegcNRatI
 G9zCbuPpoXyQFo5wHVYmS1z97dt+SbAY8bO6qjGJBO6e3Pxbo+DEiGCl70Lm6qqu
 8Z3yuECpNID6A3rAgkE2jDBnMr6QolU4hjKnsf8VEVRwDYDjWxTaxZvtS0tGZJf9
 em/F7+1T1cFd2va6FLhyrin1Mu6J/YgZ9NcZTotx/wV5UUwsp/TCxkciUUa4MgkX
 Tv0rt2LSGx2DKw9pGoQi/oXLpyFbQFAM37KWSto7oS7cPzY1FJ7z1Yxcu18J+v5Y
 bsVpCrtDqmrjI8vkcO+8cGcGPXPTT0liUWpWzLX3wiXAZW876fuJrUPFg1LszZoN
 oztyaQTLSCgfYrS21aKOsP3DP2PPRl2TUCslOQwABJrGJ6nLhTyjiF3g+bV2GPlH
 N5f1vutsyp90YkNqywWdK9rito7JFgawqlw63oS65EYsFLmVFeBAdU+b/ecAw53O
 k09HqkZt7RZXVKRpNkLbGfBkGY3wrsiV33SCMpuSQ2lZWyUfwaFmANuaQWeJ5pvl
 nfv09NfzK0cTlMCkHDl5HIqhErvuTSDC6DowEitpQhv23JddgU9KVvWcS5xrJFpP
 9+9QfqkMk9uq0DIYKTSKC+tOrMl7xxsaSG7Xm5MOxr3GXPg1BPE=
 =Fkdf
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmASfSkACgkQ7G51OISz
 Hs2FIRAAkHxX7RhYxsEFknN7akxJ5yI661vpgvt+ZRFIkn+3zb9uC/rlP1r+PnuP
 G/YNkjnLpm0w42NqFXVmYl1wPRRfiKr0NAImFkmoSHrMQsYxXTozTTgIqsOhQdC7
 0mpRZ6JRatp5FHYPG4/oeGmvvnyCjB9uJAqBB/9VAfUziWNCVnxQK/VeVRJKrPSl
 zkmXwqx46vFkjQC4TGDjSPKKUGvwSpPjsF220wA+cZXjwZdiTurYAZEeNe6CT3yl
 IIzeHVF7kdvCW+7ZvjK0pHRuTipysgrRCnEqSDgIDe/XZJnY08ETcwiNWHCeGs6P
 e93m+Meb8ZMjmAHbctwoqrx3LP46Sq8hL/Pzk85lX5P/mXzp6O0+GQu39uO6uvqm
 R/brXylELjL2b4Qcozkjo4aC6fIi3KtJKqFP2nVRKJRTTBJDvIb+f1R0+iNUBun9
 3SFQTOdQgPjQfTQn+0YQ5nywanMFSVkKlsyQ+CtdBxUrR2B1GP8nFWSLaHzM36cr
 Yo9bOzjBAQMtNu1SYX0RDM1P5zuenKxdA3LK0zEBobHsiUT8YCGMCiyZFyFrXuN3
 pKXZGhVKHznVjIhn+VdEeXaKTR1DZlZcwSMGxC8Rrzx3JTJlCMYtI6CNdGR+8XM1
 GOsPolmu9KrDM6kpnZmy+4JT1l4cflPcMQ5R167Jfe5kVOCqSN8=
 =KDG9
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.93' into 5.4-2.2.x-imx

This is the 5.4.93 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-01-28 09:00:22 +00:00
Enke Chen f7020c437e tcp: fix TCP_USER_TIMEOUT with zero window
commit 9d9b1ee0b2 upstream.

The TCP session does not terminate with TCP_USER_TIMEOUT when data
remain untransmitted due to zero window.

The number of unanswered zero-window probes (tcp_probes_out) is
reset to zero with incoming acks irrespective of the window size,
as described in tcp_probe_timer():

    RFC 1122 4.2.2.17 requires the sender to stay open indefinitely
    as long as the receiver continues to respond probes. We support
    this by default and reset icsk_probes_out with incoming ACKs.

This counter, however, is the wrong one to be used in calculating the
duration that the window remains closed and data remain untransmitted.
Thanks to Jonathan Maxwell <jmaxwell37@gmail.com> for diagnosing the
actual issue.

In this patch a new timestamp is introduced for the socket in order to
track the elapsed time for the zero-window probes that have not been
answered with any non-zero window ack.

Fixes: 9721e709fa ("tcp: simplify window probe aborting on USER_TIMEOUT")
Reported-by: William McCall <william.mccall@gmail.com>
Co-developed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210115223058.GA39267@localhost.localdomain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:47:55 +01:00
Eric Dumazet 945d182a04 tcp: do not mess with cloned skbs in tcp_add_backlog()
commit b160c28548 upstream.

Heiner Kallweit reported that some skbs were sent with
the following invalid GSO properties :
- gso_size > 0
- gso_type == 0

This was triggerring a WARN_ON_ONCE() in rtl8169_tso_csum_v2.

Juerg Haefliger was able to reproduce a similar issue using
a lan78xx NIC and a workload mixing TCP incoming traffic
and forwarded packets.

The problem is that tcp_add_backlog() is writing
over gso_segs and gso_size even if the incoming packet will not
be coalesced to the backlog tail packet.

While skb_try_coalesce() would bail out if tail packet is cloned,
this overwriting would lead to corruptions of other packets
cooked by lan78xx, sharing a common super-packet.

The strategy used by lan78xx is to use a big skb, and split
it into all received packets using skb_clone() to avoid copies.
The drawback of this strategy is that all the small skb share a common
struct skb_shared_info.

This patch rewrites TCP gso_size/gso_segs handling to only
happen on the tail skb, since skb_try_coalesce() made sure
it was not cloned.

Fixes: 4f693b55c3 ("tcp: implement coalescing on backlog queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Bisected-by: Juerg Haefliger <juergh@canonical.com>
Tested-by: Juerg Haefliger <juergh@canonical.com>
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209423
Link: https://lore.kernel.org/r/20210119164900.766957-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:47:55 +01:00
Tariq Toukan ff64094dc7 net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled
commit a3eb4e9d4c upstream.

With NETIF_F_HW_TLS_RX packets are decrypted in HW. This cannot be
logically done when RXCSUM offload is off.

Fixes: 14136564c8 ("net: Add TLS RX offload feature")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Boris Pismenny <borisp@nvidia.com>
Link: https://lore.kernel.org/r/20210117151538.9411-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:47:54 +01:00
Matteo Croce b47a3c32c4 ipv6: set multicast flag on the multicast route
commit ceed9038b2 upstream.

The multicast route ff00::/8 is created with type RTN_UNICAST:

  $ ip -6 -d route
  unicast ::1 dev lo proto kernel scope global metric 256 pref medium
  unicast fe80::/64 dev eth0 proto kernel scope global metric 256 pref medium
  unicast ff00::/8 dev eth0 proto kernel scope global metric 256 pref medium

Set the type to RTN_MULTICAST which is more appropriate.

Fixes: e8478e80e5 ("net/ipv6: Save route type in rt6_info")
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:47:54 +01:00
Eric Dumazet b778940f2a net_sched: reject silly cell_log in qdisc_get_rtab()
commit e4bedf48aa upstream.

iproute2 probably never goes beyond 8 for the cell exponent,
but stick to the max shift exponent for signed 32bit.

UBSAN reported:
UBSAN: shift-out-of-bounds in net/sched/sch_api.c:389:22
shift exponent 130 is too large for 32-bit type 'int'
CPU: 1 PID: 8450 Comm: syz-executor586 Not tainted 5.11.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x183/0x22e lib/dump_stack.c:120
 ubsan_epilogue lib/ubsan.c:148 [inline]
 __ubsan_handle_shift_out_of_bounds+0x432/0x4d0 lib/ubsan.c:395
 __detect_linklayer+0x2a9/0x330 net/sched/sch_api.c:389
 qdisc_get_rtab+0x2b5/0x410 net/sched/sch_api.c:435
 cbq_init+0x28f/0x12c0 net/sched/sch_cbq.c:1180
 qdisc_create+0x801/0x1470 net/sched/sch_api.c:1246
 tc_modify_qdisc+0x9e3/0x1fc0 net/sched/sch_api.c:1662
 rtnetlink_rcv_msg+0xb1d/0xe60 net/core/rtnetlink.c:5564
 netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2494
 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
 netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1330
 netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg net/socket.c:672 [inline]
 ____sys_sendmsg+0x5a2/0x900 net/socket.c:2345
 ___sys_sendmsg net/socket.c:2399 [inline]
 __sys_sendmsg+0x319/0x400 net/socket.c:2432
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/r/20210114160637.1660597-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:47:54 +01:00
Eric Dumazet 4ed347901f net_sched: avoid shift-out-of-bounds in tcindex_set_parms()
commit bcd0cf19ef upstream.

tc_index being 16bit wide, we need to check that TCA_TCINDEX_SHIFT
attribute is not silly.

UBSAN: shift-out-of-bounds in net/sched/cls_tcindex.c:260:29
shift exponent 255 is too large for 32-bit type 'int'
CPU: 0 PID: 8516 Comm: syz-executor228 Not tainted 5.10.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:120
 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
 valid_perfect_hash net/sched/cls_tcindex.c:260 [inline]
 tcindex_set_parms.cold+0x1b/0x215 net/sched/cls_tcindex.c:425
 tcindex_change+0x232/0x340 net/sched/cls_tcindex.c:546
 tc_new_tfilter+0x13fb/0x21b0 net/sched/cls_api.c:2127
 rtnetlink_rcv_msg+0x8b6/0xb80 net/core/rtnetlink.c:5555
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
 netlink_sendmsg+0x907/0xe40 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:672
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2336
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2390
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2423
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20210114185229.1742255-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:47:54 +01:00
Matteo Croce bc757ba6dc ipv6: create multicast route with RTPROT_KERNEL
commit a826b04303 upstream.

The ff00::/8 multicast route is created without specifying the fc_protocol
field, so the default RTPROT_BOOT value is used:

  $ ip -6 -d route
  unicast ::1 dev lo proto kernel scope global metric 256 pref medium
  unicast fe80::/64 dev eth0 proto kernel scope global metric 256 pref medium
  unicast ff00::/8 dev eth0 proto boot scope global metric 256 pref medium

As the documentation says, this value identifies routes installed during
boot, but the route is created when interface is set up.
Change the value to RTPROT_KERNEL which is a better value.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:47:53 +01:00
Guillaume Nault 60fb547a3d udp: mask TOS bits in udp_v4_early_demux()
commit 8d2b51b008 upstream.

udp_v4_early_demux() is the only function that calls
ip_mc_validate_source() with a TOS that hasn't been masked with
IPTOS_RT_MASK.

This results in different behaviours for incoming multicast UDPv4
packets, depending on if ip_mc_validate_source() is called from the
early-demux path (udp_v4_early_demux) or from the regular input path
(ip_route_input_noref).

ECN would normally not be used with UDP multicast packets, so the
practical consequences should be limited on that side. However,
IPTOS_RT_MASK is used to also masks the TOS' high order bits, to align
with the non-early-demux path behaviour.

Reproducer:

  Setup two netns, connected with veth:
  $ ip netns add ns0
  $ ip netns add ns1
  $ ip -netns ns0 link set dev lo up
  $ ip -netns ns1 link set dev lo up
  $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
  $ ip -netns ns0 link set dev veth01 up
  $ ip -netns ns1 link set dev veth10 up
  $ ip -netns ns0 address add 192.0.2.10 peer 192.0.2.11/32 dev veth01
  $ ip -netns ns1 address add 192.0.2.11 peer 192.0.2.10/32 dev veth10

  In ns0, add route to multicast address 224.0.2.0/24 using source
  address 198.51.100.10:
  $ ip -netns ns0 address add 198.51.100.10/32 dev lo
  $ ip -netns ns0 route add 224.0.2.0/24 dev veth01 src 198.51.100.10

  In ns1, define route to 198.51.100.10, only for packets with TOS 4:
  $ ip -netns ns1 route add 198.51.100.10/32 tos 4 dev veth10

  Also activate rp_filter in ns1, so that incoming packets not matching
  the above route get dropped:
  $ ip netns exec ns1 sysctl -wq net.ipv4.conf.veth10.rp_filter=1

  Now try to receive packets on 224.0.2.11:
  $ ip netns exec ns1 socat UDP-RECVFROM:1111,ip-add-membership=224.0.2.11:veth10,ignoreeof -

  In ns0, send packet to 224.0.2.11 with TOS 4 and ECT(0) (that is,
  tos 6 for socat):
  $ echo test0 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6

  The "test0" message is properly received by socat in ns1, because
  early-demux has no cached dst to use, so source address validation
  is done by ip_route_input_mc(), which receives a TOS that has the
  ECN bits masked.

  Now send another packet to 224.0.2.11, still with TOS 4 and ECT(0):
  $ echo test1 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6

  The "test1" message isn't received by socat in ns1, because, now,
  early-demux has a cached dst to use and calls ip_mc_validate_source()
  immediately, without masking the ECN bits.

Fixes: bc044e8db7 ("udp: perform source validation for mcast early demux")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:47:53 +01:00
Alexander Lobakin 5a3890bad3 skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() too
commit 66c556025d upstream.

Commit 3226b158e6 ("net: avoid 32 x truesize under-estimation for
tiny skbs") ensured that skbs with data size lower than 1025 bytes
will be kmalloc'ed to avoid excessive page cache fragmentation and
memory consumption.
However, the fix adressed only __napi_alloc_skb() (primarily for
virtio_net and napi_get_frags()), but the issue can still be achieved
through __netdev_alloc_skb(), which is still used by several drivers.
Drivers often allocate a tiny skb for headers and place the rest of
the frame to frags (so-called copybreak).
Mirror the condition to __netdev_alloc_skb() to handle this case too.

Since v1 [0]:
 - fix "Fixes:" tag;
 - refine commit message (mention copybreak usecase).

[0] https://lore.kernel.org/netdev/20210114235423.232737-1-alobakin@pm.me

Fixes: a1c7fff7e1 ("net: netdev_alloc_skb() use build_skb()")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Link: https://lore.kernel.org/r/20210115150354.85967-1-alobakin@pm.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:47:53 +01:00
Guillaume Nault 8a0b8e26f7 netfilter: rpfilter: mask ecn bits before fib lookup
commit 2e5a6266fb upstream.

RT_TOS() only masks one of the two ECN bits. Therefore rpfilter_mt()
treats Not-ECT or ECT(1) packets in a different way than those with
ECT(0) or CE.

Reproducer:

  Create two netns, connected with a veth:
  $ ip netns add ns0
  $ ip netns add ns1
  $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
  $ ip -netns ns0 link set dev veth01 up
  $ ip -netns ns1 link set dev veth10 up
  $ ip -netns ns0 address add 192.0.2.10/32 dev veth01
  $ ip -netns ns1 address add 192.0.2.11/32 dev veth10

  Add a route to ns1 in ns0:
  $ ip -netns ns0 route add 192.0.2.11/32 dev veth01

  In ns1, only packets with TOS 4 can be routed to ns0:
  $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10

  Ping from ns0 to ns1 works regardless of the ECN bits, as long as TOS
  is 4:
  $ ip netns exec ns0 ping -Q 4 192.0.2.11   # TOS 4, Not-ECT
    ... 0% packet loss ...
  $ ip netns exec ns0 ping -Q 5 192.0.2.11   # TOS 4, ECT(1)
    ... 0% packet loss ...
  $ ip netns exec ns0 ping -Q 6 192.0.2.11   # TOS 4, ECT(0)
    ... 0% packet loss ...
  $ ip netns exec ns0 ping -Q 7 192.0.2.11   # TOS 4, CE
    ... 0% packet loss ...

  Now use iptable's rpfilter module in ns1:
  $ ip netns exec ns1 iptables-legacy -t raw -A PREROUTING -m rpfilter --invert -j DROP

  Not-ECT and ECT(1) packets still pass:
  $ ip netns exec ns0 ping -Q 4 192.0.2.11   # TOS 4, Not-ECT
    ... 0% packet loss ...
  $ ip netns exec ns0 ping -Q 5 192.0.2.11   # TOS 4, ECT(1)
    ... 0% packet loss ...

  But ECT(0) and ECN packets are dropped:
  $ ip netns exec ns0 ping -Q 6 192.0.2.11   # TOS 4, ECT(0)
    ... 100% packet loss ...
  $ ip netns exec ns0 ping -Q 7 192.0.2.11   # TOS 4, CE
    ... 100% packet loss ...

After this patch, rpfilter doesn't drop ECT(0) and CE packets anymore.

Fixes: 8f97339d3f ("netfilter: add ipv4 reverse path filter match")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:47:52 +01:00
Andrey Zhizhikin 0f940c106e This is the 5.4.92 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmAMOqAACgkQONu9yGCS
 aT5y+A//dHc3oRvCuXWaRS2Zhmx2KyZNOMkmElQnqi1aMcnrRhyIzNZ5gwCftYp6
 9EzhryrjioTZMHd14eYwwjyT2yckoBFKNsW+cPJ4YgqB8TtVD5a/2ygYAXBrHVkW
 Fj3fXeJZmkRk9U156Gw/O8GP/BJ2ld/lk89IYYNkdjXwjjKyyOotBDGMSou4Swjl
 8EciEzb3fyn8DvbD2bCFit5RgaNH2OMr0uTITS7RyLNmhBoZSfJo62KbFxYbnFti
 I3EKxVhnJemNzU+jWNpczZxTyOodMAzcOWbpttJTIxpGDsivWSXM3kDbIq1HT7pe
 xAfYEtkL+kgLb4EPIzdNue6GRQlRKbgwsfs/ralQ9iPFvL9GHP4zvMj6wGV1Qzjw
 4PI+wc76ZNlQMtkntGrOWRDmYrTICL1UY3Uh93SmaYKWSMRATuHK6LFe+y+7tIK7
 Yo/XAdlAzzmc3cGh4ikC1zj4WchRG9/GlfucnFGqxBuxZGXq8WBStBIOkHda4vFg
 a5Ncli+PyOID22AtXb8It6JFI70arZ53CUAwCRqRA7FYlrzZrcsZe15uuB72yDTZ
 mPeaNplWiIXPn8vWMDGFBX5Zhysgb/8FGXtSaFCOnE3QUVHPIE2hoLUlClfJIqxf
 f4uGh5HfquTXZUXzlvoM8tgKPzfpkrqZe1JKNdCh+khI6VzxX8Q=
 =B0JT
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmAMjpwACgkQ7G51OISz
 Hs0jEA/7BXp59JKq3heiHRkYUgZHyYwZ9/9TjP9Ax9EzZUU930T+3ra2pFs0UPfI
 uHIEoYS0tai68lRppxCh50x6d24dfVPyo0ZGKG6AWPmpEnn+HbYT1tHKbzVugciI
 DQ+JvcwIv1OWkA7ueYnVjGshGYnmOV4t9WUU1Z9IXI0JDDoJnfoPrFVEN3wRWgpy
 PRoTqGAYmAVseWJAjQEDfzBZN7BMbA70Cxj7mf5Z37VEE9XG3k5mAvqIR540X/9I
 L7251rTjOvb/ke5EDwpX92iq9dBtqsj7t3j2/oXLI8hHnt6ot1M14YF0xcST4Ljv
 qTK4KwBgTKfUJ6MhedGeR8+UhG6GDgq/UP25Rs6uSZpRzHR/ubn1i++AJMiFDsdC
 tRVo54NcxibrESqsZ5ynHlUzlGY6F7aIV+AY9uEV3EplMjdgI7s02grIazvT3M7I
 g5vRzkXGtCc8WQ9qS/XDj1cKBWbmmTalCUqbfs6Apr0CmuVCRMa91NJaVA5oDtNc
 T3MOZKm6kuDwbLOevhgpANIU+Run1TzG33tfbE4LWbb6wDz7S052iNN1PZWs728L
 TqyIqFrrRwZ6icNBoNwvzBWz0iRC0daTpUZGnmbhQA3pn/PsHWZpLQwnA3jajrx9
 GwYOkLp4fFz0KXsTQ3RV3DKhWAQ6kpn4/mLVH5Kv3ZaHVA1QIfo=
 =Kbmk
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.92' into 5.4-2.2.x-imx

This is the 5.4.92 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-01-23 21:01:13 +00:00
Lorenzo Bianconi d04c7938d0 mac80211: check if atf has been disabled in __ieee80211_schedule_txq
commit c13cf5c159 upstream.

Check if atf has been disabled in __ieee80211_schedule_txq() in order to
avoid a given sta is always put to the beginning of the active_txqs list
and never moved to the end since deficit is not decremented in
ieee80211_sta_register_airtime()

Fixes: b4809e9484 ("mac80211: Add airtime accounting and scheduling to TXQs")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://lore.kernel.org/r/93889406c50f1416214c079ca0b8c9faecc5143e.1608975195.git.lorenzo@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23 15:58:00 +01:00
Felix Fietkau d46996cb4b mac80211: do not drop tx nulldata packets on encrypted links
commit 2463ec86cd upstream.

ieee80211_tx_h_select_key drops any non-mgmt packets without a key when
encryption is used. This is wrong for nulldata packets that can't be
encrypted and are sent out for probing clients and indicating 4-address
mode.

Reported-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Fixes: a0761a3017 ("mac80211: drop data frames without key on encrypted links")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20201218191525.1168-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23 15:58:00 +01:00
Hoang Le 56e8947bcf tipc: fix NULL deref in tipc_link_xmit()
[ Upstream commit b774134464 ]

The buffer list can have zero skb as following path:
tipc_named_node_up()->tipc_node_xmit()->tipc_link_xmit(), so
we need to check the list before casting an &sk_buff.

Fault report:
 [] tipc: Bulk publication failure
 [] general protection fault, probably for non-canonical [#1] PREEMPT [...]
 [] KASAN: null-ptr-deref in range [0x00000000000000c8-0x00000000000000cf]
 [] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 5.10.0-rc4+ #2
 [] Hardware name: Bochs ..., BIOS Bochs 01/01/2011
 [] RIP: 0010:tipc_link_xmit+0xc1/0x2180
 [] Code: 24 b8 00 00 00 00 4d 39 ec 4c 0f 44 e8 e8 d7 0a 10 f9 48 [...]
 [] RSP: 0018:ffffc90000006ea0 EFLAGS: 00010202
 [] RAX: dffffc0000000000 RBX: ffff8880224da000 RCX: 1ffff11003d3cc0d
 [] RDX: 0000000000000019 RSI: ffffffff886007b9 RDI: 00000000000000c8
 [] RBP: ffffc90000007018 R08: 0000000000000001 R09: fffff52000000ded
 [] R10: 0000000000000003 R11: fffff52000000dec R12: ffffc90000007148
 [] R13: 0000000000000000 R14: 0000000000000000 R15: ffffc90000007018
 [] FS:  0000000000000000(0000) GS:ffff888037400000(0000) knlGS:000[...]
 [] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [] CR2: 00007fffd2db5000 CR3: 000000002b08f000 CR4: 00000000000006f0

Fixes: af9b028e27 ("tipc: make media xmit call outside node spinlock context")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Link: https://lore.kernel.org/r/20210108071337.3598-1-hoang.h.le@dektech.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23 15:58:00 +01:00
Daniel Borkmann 55bac51762 net, sctp, filter: remap copy_from_user failure error
[ no upstream commit ]

Fix a potential kernel address leakage for the prerequisite where there is
a BPF program attached to the cgroup/setsockopt hook. The latter can only
be attached under root, however, if the attached program returns 1 to then
run the related kernel handler, an unprivileged program could probe for
kernel addresses that way. The reason this is possible is that we're under
set_fs(KERNEL_DS) when running the kernel setsockopt handler. Aside from
old cBPF there is also SCTP's struct sctp_getaddrs_old which contains
pointers in the uapi struct that further need copy_from_user() inside the
handler. In the normal case this would just return -EFAULT, but under a
temporary KERNEL_DS setting the memory would be copied and we'd end up at
a different error code, that is, -EINVAL, for both cases given subsequent
validations fail, which then allows the app to distinguish and make use of
this fact for probing the address space. In case of later kernel versions
this issue won't work anymore thanks to Christoph Hellwig's work that got
rid of the various temporary set_fs() address space overrides altogether.
One potential option for 5.4 as the only affected stable kernel with the
least complexity would be to remap those affected -EFAULT copy_from_user()
error codes with -EINVAL such that they cannot be probed anymore. Risk of
breakage should be rather low for this particular error case.

Fixes: 0d01da6afc ("bpf: implement getsockopt and setsockopt hooks")
Reported-by: Ryota Shiga (Flatt Security)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23 15:58:00 +01:00
David Howells 52e0b20c8c rxrpc: Fix handling of an unsupported token type in rxrpc_read()
[ Upstream commit d52e419ac8 ]

Clang static analysis reports the following:

net/rxrpc/key.c:657:11: warning: Assigned value is garbage or undefined
                toksize = toksizes[tok++];
                        ^ ~~~~~~~~~~~~~~~

rxrpc_read() contains two consecutive loops.  The first loop calculates the
token sizes and stores the results in toksizes[] and the second one uses
the array.  When there is an error in identifying the token in the first
loop, the token is skipped, no change is made to the toksizes[] array.
When the same error happens in the second loop, the token is not skipped.
This will cause the toksizes[] array to be out of step and will overrun
past the calculated sizes.

Fix this by making both loops log a message and return an error in this
case.  This should only happen if a new token type is incompletely
implemented, so it should normally be impossible to trigger this.

Fixes: 9a059cd5ca ("rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read()")
Reported-by: Tom Rix <trix@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/161046503122.2445787.16714129930607546635.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23 15:58:00 +01:00
Eric Dumazet 5c466480d7 net: avoid 32 x truesize under-estimation for tiny skbs
[ Upstream commit 3226b158e6 ]

Both virtio net and napi_get_frags() allocate skbs
with a very small skb->head

While using page fragments instead of a kmalloc backed skb->head might give
a small performance improvement in some cases, there is a huge risk of
under estimating memory usage.

For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32 allocations
per page (order-3 page in x86), or even 64 on PowerPC

We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
but consuming far more memory for TCP buffers than instructed in tcp_mem[2]

Even if we force napi_alloc_skb() to only use order-0 pages, the issue
would still be there on arches with PAGE_SIZE >= 32768

This patch makes sure that small skb head are kmalloc backed, so that
other objects in the slab page can be reused instead of being held as long
as skbs are sitting in socket queues.

Note that we might in the future use the sk_buff napi cache,
instead of going through a more expensive __alloc_skb()

Another idea would be to use separate page sizes depending
on the allocated length (to never have more than 4 frags per page)

I would like to thank Greg Thelen for his precious help on this matter,
analysing crash dumps is always a time consuming task.

Fixes: fd11a83dd3 ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20210113161819.1155526-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23 15:57:59 +01:00
Jakub Kicinski f6499a78e5 net: sit: unregister_netdevice on newlink's error path
[ Upstream commit 47e4bb147a ]

We need to unregister the netdevice if config failed.
.ndo_uninit takes care of most of the heavy lifting.

This was uncovered by recent commit c269a24ce0 ("net: make
free_netdev() more lenient with unregistering devices").
Previously the partially-initialized device would be left
in the system.

Reported-and-tested-by: syzbot+2393580080a2da190f04@syzkaller.appspotmail.com
Fixes: e2f1f072db ("sit: allow to configure 6rd tunnels via netlink")
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20210114012947.2515313-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23 15:57:59 +01:00
Baptiste Lepers c213d85cae rxrpc: Call state should be read with READ_ONCE() under some circumstances
[ Upstream commit a95d25dd7b ]

The call state may be changed at any time by the data-ready routine in
response to received packets, so if the call state is to be read and acted
upon several times in a function, READ_ONCE() must be used unless the call
state lock is held.

As it happens, we used READ_ONCE() to read the state a few lines above the
unmarked read in rxrpc_input_data(), so use that value rather than
re-reading it.

Fixes: a158bdd324 ("rxrpc: Fix call timeouts")
Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/161046715522.2450566.488819910256264150.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23 15:57:59 +01:00
Petr Machata 6d57b582fb net: dcb: Accept RTM_GETDCB messages carrying set-like DCB commands
[ Upstream commit df85bc140a ]

In commit 826f328e2b ("net: dcb: Validate netlink message in DCB
handler"), Linux started rejecting RTM_GETDCB netlink messages if they
contained a set-like DCB_CMD_ command.

The reason was that privileges were only verified for RTM_SETDCB messages,
but the value that determined the action to be taken is the command, not
the message type. And validation of message type against the DCB command
was the obvious missing piece.

Unfortunately it turns out that mlnx_qos, a somewhat widely deployed tool
for configuration of DCB, accesses the DCB set-like APIs through
RTM_GETDCB.

Therefore do not bounce the discrepancy between message type and command.
Instead, in addition to validating privileges based on the actual message
type, validate them also based on the expected message type. This closes
the loophole of allowing DCB configuration on non-admin accounts, while
maintaining backward compatibility.

Fixes: 2f90b8657e ("ixgbe: this patch adds support for DCB to the kernel and ixgbe driver")
Fixes: 826f328e2b ("net: dcb: Validate netlink message in DCB handler")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/a3edcfda0825f2aa2591801c5232f2bbf2d8a554.1610384801.git.me@pmachata.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23 15:57:59 +01:00
Petr Machata d52f5929d9 net: dcb: Validate netlink message in DCB handler
[ Upstream commit 826f328e2b ]

DCB uses the same handler function for both RTM_GETDCB and RTM_SETDCB
messages. dcb_doit() bounces RTM_SETDCB mesasges if the user does not have
the CAP_NET_ADMIN capability.

However, the operation to be performed is not decided from the DCB message
type, but from the DCB command. Thus DCB_CMD_*_GET commands are used for
reading DCB objects, the corresponding SET and DEL commands are used for
manipulation.

The assumption is that set-like commands will be sent via an RTM_SETDCB
message, and get-like ones via RTM_GETDCB. However, this assumption is not
enforced.

It is therefore possible to manipulate DCB objects without CAP_NET_ADMIN
capability by sending the corresponding command in an RTM_GETDCB message.
That is a bug. Fix it by validating the type of the request message against
the type used for the response.

Fixes: 2f90b8657e ("ixgbe: this patch adds support for DCB to the kernel and ixgbe driver")
Signed-off-by: Petr Machata <me@pmachata.org>
Link: https://lore.kernel.org/r/a2a9b88418f3a58ef211b718f2970128ef9e3793.1608673640.git.me@pmachata.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23 15:57:58 +01:00
Willem de Bruijn 814e047762 esp: avoid unneeded kmap_atomic call
[ Upstream commit 9bd6b629c3 ]

esp(6)_output_head uses skb_page_frag_refill to allocate a buffer for
the esp trailer.

It accesses the page with kmap_atomic to handle highmem. But
skb_page_frag_refill can return compound pages, of which
kmap_atomic only maps the first underlying page.

skb_page_frag_refill does not return highmem, because flag
__GFP_HIGHMEM is not set. ESP uses it in the same manner as TCP.
That also does not call kmap_atomic, but directly uses page_address,
in skb_copy_to_page_nocache. Do the same for ESP.

This issue has become easier to trigger with recent kmap local
debugging feature CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP.

Fixes: cac2661c53 ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f6a ("esp6: Avoid skb_cow_data whenever possible")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23 15:57:58 +01:00
Aya Levin ff6d4e8da7 net: ipv6: Validate GSO SKB before finish IPv6 processing
[ Upstream commit b210de4f8c ]

There are cases where GSO segment's length exceeds the egress MTU:
 - Forwarding of a TCP GRO skb, when DF flag is not set.
 - Forwarding of an skb that arrived on a virtualisation interface
   (virtio-net/vhost/tap) with TSO/GSO size set by other network
   stack.
 - Local GSO skb transmitted on an NETIF_F_TSO tunnel stacked over an
   interface with a smaller MTU.
 - Arriving GRO skb (or GSO skb in a virtualised environment) that is
   bridged to a NETIF_F_TSO tunnel stacked over an interface with an
   insufficient MTU.

If so:
 - Consume the SKB and its segments.
 - Issue an ICMP packet with 'Packet Too Big' message containing the
   MTU, allowing the source host to reduce its Path MTU appropriately.

Note: These cases are handled in the same manner in IPv4 output finish.
This patch aligns the behavior of IPv6 and the one of IPv4.

Fixes: 9e50849054 ("netfilter: ipv6: move POSTROUTING invocation before fragmentation")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/1610027418-30438-1-git-send-email-ayal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23 15:57:57 +01:00
Baptiste Lepers 982e763ea3 udp: Prevent reuseport_select_sock from reading uninitialized socks
[ Upstream commit fd2ddef043 ]

reuse->socks[] is modified concurrently by reuseport_add_sock. To
prevent reading values that have not been fully initialized, only read
the array up until the last known safe index instead of incorrectly
re-reading the last index of the array.

Fixes: acdcecc612 ("udp: correct reuseport selection with connected sockets")
Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20210107051110.12247-1-baptiste.lepers@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23 15:57:56 +01:00
Andrey Zhizhikin a6f0cb3ff6 This is the 5.4.91 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmAHFkkACgkQONu9yGCS
 aT5DMg//TWHV1loe76Jy6mT7SavddKkO+C6YXdGMYN4vVKJqYzASSqqmkIGYZVOj
 G5GnILybNjA9aJIqX4vXTXs3YslWZN+rd//GYRyBTE7SwlNI8Lho1ZJq8VqtWo+x
 jxm+2QNX8wBb9QuCqsnLOVidWVOQ9dcz0GC6/N8gKcAWJ71B2RpwKQxnEXjlJp3f
 m5cX+Vnm3XnJkdT4mmycV3h4gnOrwhIUGbu8iLbPTmfZf5aZ14eD2Su8gpcunWat
 7JY2z1u4jSpkKspG5eVn8wmL1aB5+WhkqU5+rOtHZ+KJZvRY0wTnmIQEBCw0bAW+
 49tIthuJF8wC7oa3hXoXMNG8K112ffeeF2Hm29WFbpFYRinIjGt/MPmg2A1sM+C1
 jVQewVOArNLA0lo5m1jun2/c56EEGFKKODzJR7Epphdi+bsY7DSttIfIIzwUqTc5
 9wgZG81+l9uP/ohTm7vG8hQcANt0DN+X8wet+HqpuO5Mj5T6150dKW4zQhdOljBH
 GL/O/31DfIUmLJL50+X6kn47c0noZlwEmZc+buVxdO5bC27cK6awEE3gQeCTgsWj
 Ok1Sa+3FwwEPnKs8zInYP69U/obvNxBhdxrccrUOViGBxsXKHMPEnXG2bUuiV/7v
 KnuO9z1Pj3+YAdZTwWygdJcZNdCAwGL4ekQV9N/Pxeg6ejq2E3Q=
 =TOgX
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmAHVv8ACgkQ7G51OISz
 Hs16CQ//Y47jRspSDzkvfrtOEyTgatnuEmHyAT44CeEMCg1mSp9ESao3fP9e+SPu
 qzNGeAKdlsI9FxA3T0N4HohuOU6fxnKPcD+cGISgM4/5lwfVXm27g8M4TqkyCoqm
 MRwNUIF5vdKOW6czEDFdMRBpoNMEvB9gnWTx0QRBX7ngV6zXwwfdh0WIEuFsSe4P
 LRFMTIrUfCcSxYKT/QRm2k2Jah/h+MdXs19Ig0djcwqHCltGIeYM8hq31DLL8U6Q
 qSS8l5Ql5IE3rozGTdV93ySCr1apsrHiV6dNN00rVsHKfNQjiq0YJD6jjyCP1d+R
 e4RAUPREhODv+J27qYKRp3bSovKJvIn6FFcyjgE2ScGDhNfZalw0XiClwmC/QFUX
 nsDZVtW9dAePJVS2gsWKrUSPwGT7posr3cc5coYeZ/i4p3NjDhqsUl3AerLaYHqq
 VSKNjwT38yHTelB/6BaDdw8eHkaSuLGOIp2CA6HaGoS4xPiDTqvI+AZROFMWcK/m
 gV/VuRlfB8vLObjN7zxolsHT+UT3+WM8hR3sDCEyU3NsJvVxeuxTJtL2qb4zcrRR
 hoDeX5ATOoiKyx56VQR5V/iwr31RXzxFPMRxOc3SePYiZAuJAfy/+SIfquL8sJ4u
 HP8Bx5jieCbLsCazyk0X2QfA6fCsI+avOrOwY9PAWdsB6XKqx8s=
 =IdJz
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.91' into 5.4-2.2.x-imx

This is the 5.4.91 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-01-19 22:02:36 +00:00
Florian Westphal 516bd00e5a netfilter: nft_compat: remove flush counter optimization
commit 2f941622fd upstream.

WARNING: CPU: 1 PID: 16059 at lib/refcount.c:31 refcount_warn_saturate+0xdf/0xf
[..]
 __nft_mt_tg_destroy+0x42/0x50 [nft_compat]
 nft_target_destroy+0x63/0x80 [nft_compat]
 nf_tables_expr_destroy+0x1b/0x30 [nf_tables]
 nf_tables_rule_destroy+0x3a/0x70 [nf_tables]
 nf_tables_exit_net+0x186/0x3d0 [nf_tables]

Happens when a compat expr is destoyed from abort path.
There is no functional impact; after this work queue is flushed
unconditionally if its pending.

This removes the waitcount optimization.  Test of repeated
iptables-restore of a ~60k kubernetes ruleset doesn't indicate
a slowdown.  In case the counter is needed after all for some workloads
we can revert this and increment the refcount for the
!= NFT_PREPARE_TRANS case to avoid the increment/decrement imbalance.

While at it, also flush for match case, this was an oversight
in the original patch.

Fixes: ffe8923f10 ("netfilter: nft_compat: make sure xtables destructors have run")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:26:19 +01:00
Dinghao Liu 9351148633 netfilter: nf_nat: Fix memleak in nf_nat_init
commit 869f4fdaf4 upstream.

When register_pernet_subsys() fails, nf_nat_bysource
should be freed just like when nf_ct_extend_register()
fails.

Fixes: 1cd472bf03 ("netfilter: nf_nat: add nat hook register functions to nf_nat")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:26:19 +01:00
Jesper Dangaard Brouer 49fc6d92b4 netfilter: conntrack: fix reading nf_conntrack_buckets
commit f6351c3f1c upstream.

The old way of changing the conntrack hashsize runtime was through changing
the module param via file /sys/module/nf_conntrack/parameters/hashsize. This
was extended to sysctl change in commit 3183ab8997 ("netfilter: conntrack:
allow increasing bucket size via sysctl too").

The commit introduced second "user" variable nf_conntrack_htable_size_user
which shadow actual variable nf_conntrack_htable_size. When hashsize is
changed via module param this "user" variable isn't updated. This results in
sysctl net/netfilter/nf_conntrack_buckets shows the wrong value when users
update via the old way.

This patch fix the issue by always updating "user" variable when reading the
proc file. This will take care of changes to the actual variable without
sysctl need to be aware.

Fixes: 3183ab8997 ("netfilter: conntrack: allow increasing bucket size via sysctl too")
Reported-by: Yoel Caspersen <yoel@kviknet.dk>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:26:19 +01:00
j.nixdorf@avm.de a34294774a net: sunrpc: interpret the return value of kstrtou32 correctly
commit 86b53fbf08 upstream.

A return value of 0 means success. This is documented in lib/kstrtox.c.

This was found by trying to mount an NFS share from a link-local IPv6
address with the interface specified by its index:

  mount("[fe80::1%1]:/srv/nfs", "/mnt", "nfs", 0, "nolock,addr=fe80::1%1")

Before this commit this failed with EINVAL and also caused the following
message in dmesg:

  [...] NFS: bad IP address specified: addr=fe80::1%1

The syscall using the same address based on the interface name instead
of its index succeeds.

Credits for this patch go to my colleague Christian Speich, who traced
the origin of this bug to this line of code.

Signed-off-by: Johannes Nixdorf <j.nixdorf@avm.de>
Fixes: 00cfaa943e ("replace strict_strto calls")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:26:18 +01:00
Vasily Averin c8dd8af4b3 netfilter: ipset: fixes possible oops in mtype_resize
[ Upstream commit 2b33d6ffa9 ]

currently mtype_resize() can cause oops

        t = ip_set_alloc(htable_size(htable_bits));
        if (!t) {
                ret = -ENOMEM;
                goto out;
        }
        t->hregion = ip_set_alloc(ahash_sizeof_regions(htable_bits));

Increased htable_bits can force htable_size() to return 0.
In own turn ip_set_alloc(0) returns not 0 but ZERO_SIZE_PTR,
so follwoing access to t->hregion should trigger an OOPS.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-19 18:26:14 +01:00
Andrey Zhizhikin 873af59ba2 This is the 5.4.90 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmAENzgACgkQONu9yGCS
 aT7khA//eTBSPP1vAJIqph0YgQbgCCzvzQTj5enM6F1cCZqVha8s0ZjY4fl9Mkky
 MTVmQdGEem4MoqypzFgAQPQn8KpoM//sQue+b9evny3wU/cmgry5Hs7H3F1/Y7Yv
 q27Q5jzRTmvcy4Up21FhpFE58FXCXiO5H58FrtKEuJtoCxk+akyGuF8Z0UH3Rvp/
 FTKjAKnfzQ9b3MjBJY16W3EqZnpLB+sFMhimS+QyHAr4biTXgIhM/ZebyKxYOGDw
 fq9MX5XCSM5Aka9RfWIGl8FF5y1IICkBQ0Il+xI7zsQwONFD9UIMhAcTE2LxybQT
 YsV/GJ7r/nZWSTcup+vD+tTNceXQoBY2EDGIKeX3rNme8cLWWJeDbTc7KbIkIi35
 ctRFeEcUiFMoQEhIXyi7c8DcOU4xjmTUXtigjhcLLzAODuOBriWbIsM81RuLwNGC
 i/jLYEWhQ+tXozLsmb1/7fL8mvAlZfD3Vwkm4aTSSPul1i52tqBnRZBSut0+KRMa
 +SOpxytl+H5tFV6Z3bI0lrtJ0xnKdr0oJj367JsxIG1yeOpkqe8CEFWW+14TsjqV
 R1ETqDTtqi8YTGfIgp4Q3EUe9LdoJwUQFKh1lv0SMKYac6vtz/C+MxziJXHPValE
 dNK3MocE1zpfMgnZpHP/IwbLOeiWfNl+ZL/wpD73EUr1PvUiRvQ=
 =4Noe
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmAEuNMACgkQ7G51OISz
 Hs0XMw//UEpM14N+UcLb3yTHYyr/vksat/Rz0fT3R2bBMgmlxmTsUgDv5OZLiXlO
 mWMbN2k1kDvKl4EuiJfmOK+5FICoFkdBpXugomjLFI0BeJukywdz6l4omDTqHkE7
 dK3ntPdpTVFP0zJI7lMOk5DsKW/GprOOeFaw0PEO8255yoo59l8Ayxi3k6rWU3f+
 a1laEtpwyitIFBAW4KCOT50/f1vpVnqlhB+2fRpFUwGBceuv42p0fBTZ7vi98Mmu
 KtKRVHnPKQIROx07oFUlxeeNsyhRacJXXoSvyyfBGc9pf3vs8nCGXhITXSVuQHsh
 EZh4Ur007Q40sDDaH7FkC0jxqGOc8A/oCAwF9WCVoWy0yx0t/CKpMcgsoBl2H3MC
 v0/DusVQQod1ohcH0lG6fxM3roKkbbOF2HelRDnz+n7mw9biboKKL3Q+i2CNUmam
 KutJixqfW2NsfZo87ObSmn2iv7xcDAZWLO4axMgAfygTdoL39v/Ws7ZJ3IEujh/a
 zKAx9Mb4/TAv1OlmI+3wpsq+efSZ90fUAXnW+ymJazh+sJj/Do+61RcgoyDrG6U+
 SGfIa/rbrnG44jywPisAdLMjtg4YBC3ccSh0oqdbO3lrXyT+afL7XLuH8BfT/rYY
 e33E+xb4qOc4/bfWCFt/rVyZR4PoOo4TJHyoFugQ+N32U7V921c=
 =UIOp
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.90' into 5.4-2.2.x-imx

This is the 5.4.90 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-01-17 22:23:12 +00:00
Vasily Averin bbb2fee395 net: drop bogus skb with CHECKSUM_PARTIAL and offset beyond end of trimmed packet
commit 54970a2fbb upstream.

syzbot reproduces BUG_ON in skb_checksum_help():
tun creates (bogus) skb with huge partial-checksummed area and
small ip packet inside. Then ip_rcv trims the skb based on size
of internal ip packet, after that csum offset points beyond of
trimmed skb. Then checksum_tg() called via netfilter hook
triggers BUG_ON:

        offset = skb_checksum_start_offset(skb);
        BUG_ON(offset >= skb_headlen(skb));

To work around the problem this patch forces pskb_trim_rcsum_slow()
to return -EINVAL in described scenario. It allows its callers to
drop such kind of packets.

Link: https://syzkaller.appspot.com/bug?id=b419a5ca95062664fe1a60b764621eb4526e2cd0
Reported-by: syzbot+7010af67ced6105e5ab6@syzkaller.appspotmail.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/1b2494af-2c56-8ee2-7bc0-923fcad1cdf8@virtuozzo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17 14:05:38 +01:00
Ido Schimmel e6931e3eb0 nexthop: Unlink nexthop group entry in error path
[ Upstream commit 7b01e53eee ]

In case of error, remove the nexthop group entry from the list to which
it was previously added.

Fixes: 430a049190 ("nexthop: Add support for nexthop groups")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17 14:05:32 +01:00
Ido Schimmel 3cecab93f2 nexthop: Fix off-by-one error in error path
[ Upstream commit 07e61a979c ]

A reference was not taken for the current nexthop entry, so do not try
to put it in the error path.

Fixes: 430a049190 ("nexthop: Add support for nexthop groups")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17 14:05:32 +01:00
Florian Westphal 12e10b1212 net: ip: always refragment ip defragmented packets
[ Upstream commit bb4cc1a188 ]

Conntrack reassembly records the largest fragment size seen in IPCB.
However, when this gets forwarded/transmitted, fragmentation will only
be forced if one of the fragmented packets had the DF bit set.

In that case, a flag in IPCB will force fragmentation even if the
MTU is large enough.

This should work fine, but this breaks with ip tunnels.
Consider client that sends a UDP datagram of size X to another host.

The client fragments the datagram, so two packets, of size y and z, are
sent. DF bit is not set on any of these packets.

Middlebox netfilter reassembles those packets back to single size-X
packet, before routing decision.

packet-size-vs-mtu checks in ip_forward are irrelevant, because DF bit
isn't set.  At output time, ip refragmentation is skipped as well
because x is still smaller than the mtu of the output device.

If ttransmit device is an ip tunnel, the packet size increases to
x+overhead.

Also, tunnel might be configured to force DF bit on outer header.

In this case, packet will be dropped (exceeds MTU) and an ICMP error is
generated back to sender.

But sender already respects the announced MTU, all the packets that
it sent did fit the announced mtu.

Force refragmentation as per original sizes unconditionally so ip tunnel
will encapsulate the fragments instead.

The only other solution I see is to place ip refragmentation in
the ip_tunnel code to handle this case.

Fixes: d6b915e29f ("ip_fragment: don't forward defragmented DF packet")
Reported-by: Christian Perle <christian.perle@secunet.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17 14:05:32 +01:00
Florian Westphal 41bfd41112 net: fix pmtu check in nopmtudisc mode
[ Upstream commit 50c661670f ]

For some reason ip_tunnel insist on setting the DF bit anyway when the
inner header has the DF bit set, EVEN if the tunnel was configured with
'nopmtudisc'.

This means that the script added in the previous commit
cannot be made to work by adding the 'nopmtudisc' flag to the
ip tunnel configuration. Doing so breaks connectivity even for the
without-conntrack/netfilter scenario.

When nopmtudisc is set, the tunnel will skip the mtu check, so no
icmp error is sent to client. Then, because inner header has DF set,
the outer header gets added with DF bit set as well.

IP stack then sends an error to itself because the packet exceeds
the device MTU.

Fixes: 23a3647bc4 ("ip_tunnels: Use skb-len to PMTU check.")
Cc: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17 14:05:32 +01:00
Sean Tranchetti 7694654168 net: ipv6: fib: flush exceptions when purging route
[ Upstream commit d8f5c29653 ]

Route removal is handled by two code paths. The main removal path is via
fib6_del_route() which will handle purging any PMTU exceptions from the
cache, removing all per-cpu copies of the DST entry used by the route, and
releasing the fib6_info struct.

The second removal location is during fib6_add_rt2node() during a route
replacement operation. This path also calls fib6_purge_rt() to handle
cleaning up the per-cpu copies of the DST entries and releasing the
fib6_info associated with the older route, but it does not flush any PMTU
exceptions that the older route had. Since the older route is removed from
the tree during the replacement, we lose any way of accessing it again.

As these lingering DSTs and the fib6_info struct are holding references to
the underlying netdevice struct as well, unregistering that device from the
kernel can never complete.

Fixes: 2b760fcf5c ("ipv6: hook up exception table to store dst cache")
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/1609892546-11389-1-git-send-email-stranche@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17 14:05:31 +01:00
Jakub Kicinski 37e6368a8d net: vlan: avoid leaks on register_vlan_dev() failures
[ Upstream commit 55b7ab1178 ]

VLAN checks for NETREG_UNINITIALIZED to distinguish between
registration failure and unregistration in progress.

Since commit cb626bf566 ("net-sysfs: Fix reference count leak")
registration failure may, however, result in NETREG_UNREGISTERED
as well as NETREG_UNINITIALIZED.

This fix is similer to cebb69754f ("rtnetlink: Fix
memory(net_device) leak when ->newlink fails")

Fixes: cb626bf566 ("net-sysfs: Fix reference count leak")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17 14:05:31 +01:00
Andrey Zhizhikin 761cab513d This is the 5.4.89 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl/99ZgACgkQONu9yGCS
 aT609BAAg3AcT6t2WQFfY0LZwaT4u8Y7mg7gx2995vDhzWOei/o6AasogDpnv+ey
 fDIu0NwMTK73K5bDSas5pWirEi/+eCk1S0xxg8rLkHgHOYJD7z6Ktq5DlNv5nfNN
 KUl1jnEcZznk4Y3ogxDwJTHmXVCRZAlckn46YiCpYKZeZbA/IqHlzzle9Dwd3eLN
 ElZN6Vdq5vagJOxTuFAEdHLy8mxIWySN0Kh6Ac0VKaaxLbE3GsXXEUtin7nLe/nj
 19/98ije7vQaTUNdqMSu5FIQsZGHg+XNji7EGLvmF/nITEUdwzIWuMsP5/ArVpJn
 rjnmz2J3IuQix7X08PGcde/0T1scXxnspOrQyVnMgGEl9J/5NpewrIItGZGt3H0u
 /fTvohGXx1nvaavDii3u7/y+s038v9HeP9Br6ISlprwZP8Pg4arm0sPQ2aHbPQ1v
 GQZSqat6hOm8DvpkLr0mO4w/+RYgRaVLRCIf8jWoStPvS/pm4APaDvYPAjZdqPRm
 xPSOa9Irvg0UaiwIxiXJdPBvFELvUHexpSxTNGQWsXdNHfMROnK+B4c3MScbDVt8
 vevIh3PVYqENW5Nsn7mSwdWPRzmNaouW/2fWqYjCWxhaSGfqweOz/JawHrwuTTQj
 GRdTgEn9w6o3uj8hQIt7c0+QfGLSvZlHfyvl7JYk/cV6SoofI40=
 =0wHl
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAl/+FjUACgkQ7G51OISz
 Hs1oTxAAnXqdA+W4GsMTJYy8tN5PPGFh2yKAyrgfpsHVF8iHHWSDPZXD4x4w8p18
 NOk5lzfGYQD9KeJfqQmW3exvdSRzOBbff1F+tTtSwxG5VHTvZQeqqHCPO7ba9e5O
 b4A+lv3t4Oy+8F88NSdY32KEA9y6W1mZ3O4sojB1R48yBckIRGSbYZgLq7Zcd/2B
 /pEuDeG1PkXmz+vXKhNKIohfddXngxcv5B41bv+YuyzBdAcs44sks9tEh9ZzhppC
 zRal9uVMlIIF3OJSIYZ9mQDp70zZuirmN/Jf/4RQ2vNqBwvR1KbLH4EyCvFRZmDk
 7xrGBTLn99hM4Rw62JWzEWMCMD8BRfZegsBtXNO2UEg0VWOZONPDRlo2RVKbOVDm
 cFykPWCs32PuFa5Ys75clt2uL3Y2sw9SnAQUOLiMoWTdccsCP9HCXvMQPs+QlVWQ
 Y+W96naQzbt3Ri55t+mQ+esKmq65KEq1qdbU1dfd412Y+gME/2DCEdHSLL3vwu1M
 aeYBJOre93xKLIwZ0izmPuDnVI5FHqBzd6ZLGHEvnra0/uJuDJ3j2J7L9NPFL0D9
 s0+WjZImfh3E8kYDAJgCEftk3UadxewLtF6Y65KxpQB7mytR5SPSnv06fK9W5Lgf
 l/Fk5xWKcIltjPBjh6df0FMyhFCSH2IOCIFn5W44xglaWl937HI=
 =MeY4
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.89' into 5.4-2.2.x-imx

This is the 5.4.89 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-01-12 21:35:46 +00:00
Pablo Neira Ayuso a798b367a0 netfilter: nft_dynset: report EOPNOTSUPP on missing set feature
commit 95cd4bca7b upstream.

If userspace requests a feature which is not available the original set
definition, then bail out with EOPNOTSUPP. If userspace sends
unsupported dynset flags (new feature not supported by this kernel),
then report EOPNOTSUPP to userspace. EINVAL should be only used to
report malformed netlink messages from userspace.

Fixes: 22fe54d5fe ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:16:24 +01:00
Florian Westphal 5e401ea716 netfilter: xt_RATEEST: reject non-null terminated string from userspace
commit 6cb56218ad upstream.

syzbot reports:
detected buffer overflow in strlen
[..]
Call Trace:
 strlen include/linux/string.h:325 [inline]
 strlcpy include/linux/string.h:348 [inline]
 xt_rateest_tg_checkentry+0x2a5/0x6b0 net/netfilter/xt_RATEEST.c:143

strlcpy assumes src is a c-string. Check info->name before its used.

Reported-by: syzbot+e86f7c428c8c50db65b4@syzkaller.appspotmail.com
Fixes: 5859034d7e ("[NETFILTER]: x_tables: add RATEEST target")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:16:24 +01:00
Vasily Averin 1dd6a790c2 netfilter: ipset: fix shift-out-of-bounds in htable_bits()
commit 5c8193f568 upstream.

htable_bits() can call jhash_size(32) and trigger shift-out-of-bounds

UBSAN: shift-out-of-bounds in net/netfilter/ipset/ip_set_hash_gen.h:151:6
shift exponent 32 is too large for 32-bit type 'unsigned int'
CPU: 0 PID: 8498 Comm: syz-executor519
 Not tainted 5.10.0-rc7-next-20201208-syzkaller #0
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:120
 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
 htable_bits net/netfilter/ipset/ip_set_hash_gen.h:151 [inline]
 hash_mac_create.cold+0x58/0x9b net/netfilter/ipset/ip_set_hash_gen.h:1524
 ip_set_create+0x610/0x1380 net/netfilter/ipset/ip_set_core.c:1115
 nfnetlink_rcv_msg+0xecc/0x1180 net/netfilter/nfnetlink.c:252
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
 nfnetlink_rcv+0x1ac/0x420 net/netfilter/nfnetlink.c:600
 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
 netlink_sendmsg+0x907/0xe40 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:672
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2345
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2399
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2432
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

This patch replaces htable_bits() by simple fls(hashsize - 1) call:
it alone returns valid nbits both for round and non-round hashsizes.
It is normal to set any nbits here because it is validated inside
following htable_size() call which returns 0 for nbits>31.

Fixes: 1feab10d7e6d("netfilter: ipset: Unified hash type generation")
Reported-by: syzbot+d66bfadebca46cf61a2b@syzkaller.appspotmail.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:16:24 +01:00
Subash Abhinov Kasiviswanathan e0281bb5a8 netfilter: x_tables: Update remaining dereference to RCU
commit 443d6e86f8 upstream.

This fixes the dereference to fetch the RCU pointer when holding
the appropriate xtables lock.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: cc00bcaa58 ("netfilter: x_tables: Switch synchronization to RCU")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:16:24 +01:00
Davide Caratti 3f2a28930a net/sched: sch_taprio: ensure to reset/destroy all child qdiscs
[ Upstream commit 698285da79 ]

taprio_graft() can insert a NULL element in the array of child qdiscs. As
a consquence, taprio_reset() might not reset child qdiscs completely, and
taprio_destroy() might leak resources. Fix it by ensuring that loops that
iterate over q->qdiscs[] don't end when they find the first NULL item.

Fixes: 44d4775ca5 ("net/sched: sch_taprio: reset child qdiscs before freeing them")
Fixes: 5a781ccbd1 ("tc: Add support for configuring the taprio scheduler")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/13edef6778fef03adc751582562fba4a13e06d6a.1608240532.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:16:16 +01:00
Cong Wang 2b8aa896b1 erspan: fix version 1 check in gre_parse_header()
[ Upstream commit 085c7c4e1c ]

Both version 0 and version 1 use ETH_P_ERSPAN, but version 0 does not
have an erspan header. So the check in gre_parse_header() is wrong,
we have to distinguish version 1 from version 0.

We can just check the gre header length like is_erspan_type1().

Fixes: cb73ee40b1 ("net: ip_gre: use erspan key field for tunnel lookup")
Reported-by: syzbot+f583ce3d4ddf9836b27a@syzkaller.appspotmail.com
Cc: William Tu <u9012063@gmail.com>
Cc: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:16:15 +01:00
Randy Dunlap e40b5fc791 net: sched: prevent invalid Scell_log shift count
[ Upstream commit bd1248f1dd ]

Check Scell_log shift size in red_check_params() and modify all callers
of red_check_params() to pass Scell_log.

This prevents a shift out-of-bounds as detected by UBSAN:
  UBSAN: shift-out-of-bounds in ./include/net/red.h:252:22
  shift exponent 72 is too large for 32-bit type 'int'

Fixes: 8afa10cbe2 ("net_sched: red: Avoid illegal values")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: syzbot+97c5bd9cc81eca63d36e@syzkaller.appspotmail.com
Cc: Nogah Frankel <nogahf@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:16:14 +01:00
Guillaume Nault b16f883e71 ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst()
[ Upstream commit 21fdca22eb ]

RT_TOS() only clears one of the ECN bits. Therefore, when
fib_compute_spec_dst() resorts to a fib lookup, it can return
different results depending on the value of the second ECN bit.

For example, ECT(0) and ECT(1) packets could be treated differently.

  $ ip netns add ns0
  $ ip netns add ns1
  $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
  $ ip -netns ns0 link set dev lo up
  $ ip -netns ns1 link set dev lo up
  $ ip -netns ns0 link set dev veth01 up
  $ ip -netns ns1 link set dev veth10 up

  $ ip -netns ns0 address add 192.0.2.10/24 dev veth01
  $ ip -netns ns1 address add 192.0.2.11/24 dev veth10

  $ ip -netns ns1 address add 192.0.2.21/32 dev lo
  $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10 src 192.0.2.21
  $ ip netns exec ns1 sysctl -wq net.ipv4.icmp_echo_ignore_broadcasts=0

With TOS 4 and ECT(1), ns1 replies using source address 192.0.2.21
(ping uses -Q to set all TOS and ECN bits):

  $ ip netns exec ns0 ping -c 1 -b -Q 5 192.0.2.255
  [...]
  64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.544 ms

But with TOS 4 and ECT(0), ns1 replies using source address 192.0.2.11
because the "tos 4" route isn't matched:

  $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
  [...]
  64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.597 ms

After this patch the ECN bits don't affect the result anymore:

  $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
  [...]
  64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.591 ms

Fixes: 35ebf65e85 ("ipv4: Create and use fib_compute_spec_dst() helper.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:16:14 +01:00
Antoine Tenart 8602c20a91 net-sysfs: take the rtnl lock when accessing xps_rxqs_map and num_tc
[ Upstream commit 4ae2bb8164 ]

Accesses to dev->xps_rxqs_map (when using dev->num_tc) should be
protected by the rtnl lock, like we do for netif_set_xps_queue. I didn't
see an actual bug being triggered, but let's be safe here and take the
rtnl lock while accessing the map in sysfs.

Fixes: 8af2c06ff4 ("net-sysfs: Add interface for Rx queue(s) map per Tx queue")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:16:13 +01:00
Antoine Tenart 1f6b04a2b2 net-sysfs: take the rtnl lock when storing xps_rxqs
[ Upstream commit 2d57b4f142 ]

Two race conditions can be triggered when storing xps rxqs, resulting in
various oops and invalid memory accesses:

1. Calling netdev_set_num_tc while netif_set_xps_queue:

   - netif_set_xps_queue uses dev->tc_num as one of the parameters to
     compute the size of new_dev_maps when allocating it. dev->tc_num is
     also used to access the map, and the compiler may generate code to
     retrieve this field multiple times in the function.

   - netdev_set_num_tc sets dev->tc_num.

   If new_dev_maps is allocated using dev->tc_num and then dev->tc_num
   is set to a higher value through netdev_set_num_tc, later accesses to
   new_dev_maps in netif_set_xps_queue could lead to accessing memory
   outside of new_dev_maps; triggering an oops.

2. Calling netif_set_xps_queue while netdev_set_num_tc is running:

   2.1. netdev_set_num_tc starts by resetting the xps queues,
        dev->tc_num isn't updated yet.

   2.2. netif_set_xps_queue is called, setting up the map with the
        *old* dev->num_tc.

   2.3. netdev_set_num_tc updates dev->tc_num.

   2.4. Later accesses to the map lead to out of bound accesses and
        oops.

   A similar issue can be found with netdev_reset_tc.

One way of triggering this is to set an iface up (for which the driver
uses netdev_set_num_tc in the open path, such as bnx2x) and writing to
xps_rxqs in a concurrent thread. With the right timing an oops is
triggered.

Both issues have the same fix: netif_set_xps_queue, netdev_set_num_tc
and netdev_reset_tc should be mutually exclusive. We do that by taking
the rtnl lock in xps_rxqs_store.

Fixes: 8af2c06ff4 ("net-sysfs: Add interface for Rx queue(s) map per Tx queue")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:16:13 +01:00