redonkable/alistair23-linux

Author	SHA1	Message	Date
Andrew Lunn	fd55199d3b	net: ethtool: cabletest: Make ethnl_act_cable_test_tdr_cfg static kbuild test robot is reporting: net/ethtool/cabletest.c:230:5: warning: no previous prototype for Mark the function as static. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-29 17:28:30 -07:00
YueHaibing	8298a419a0	tipc: remove set but not used variable 'prev' Fixes gcc '-Wunused-but-set-variable' warning: net/tipc/msg.c: In function 'tipc_msg_append': net/tipc/msg.c:215:24: warning: variable 'prev' set but not used [-Wunused-but-set-variable] commit `0a3e060f34` ("tipc: add test for Nagle algorithm effectiveness") left behind this, remove it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-29 16:59:04 -07:00
Hangbin Liu	96d10d5b19	neigh: fix ARP retransmit timer guard In commit `19e16d220f` ("neigh: support smaller retrans_time settting") we add more accurate control for ARP and NS. But for ARP I forgot to update the latest guard in neigh_timer_handler(), then the next retransmit would be reset to jiffies + HZ/2 if we set the retrans_time less than 500ms. Fix it by setting the time_before() check to HZ/100. IPv6 does not have this issue. Reported-by: Jianwen Ji <jiji@redhat.com> Fixes: `19e16d220f` ("neigh: support smaller retrans_time settting") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-29 16:56:53 -07:00
Vladimir Oltean	04198499b2	net: dsa: tag_8021q: stop restoring VLANs from bridge Right now, our only tag_8021q user, sja1105, has the ability to restore bridge VLANs on its own, so this logic is unnecessary. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-29 16:45:46 -07:00
David S. Miller	f9e0ce3ddc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Alexei Starovoitov says: ==================== pull-request: bpf 2020-05-29 The following pull-request contains BPF updates for your net tree. We've added 6 non-merge commits during the last 7 day(s) which contain a total of 4 files changed, 55 insertions(+), 34 deletions(-). The main changes are: 1) minor verifier fix for fmod_ret progs, from Alexei. 2) af_xdp overflow check, from Bjorn. 3) minor verifier fix for 32bit assignment, from John. 4) powerpc has non-overlapping addr space, from Petr. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-29 15:59:08 -07:00
Christoph Hellwig	5a892ff2fa	net: remove kernel_setsockopt No users left. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-29 13:10:39 -07:00
Christoph Hellwig	c0425a4249	net: add a new bind_add method The SCTP protocol allows to bind multiple address to a socket. That feature is currently only exposed as a socket option. Add a bind_add method struct proto that allows to bind additional addresses, and switch the dlm code to use the method instead of going through the socket option from kernel space. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-29 13:10:39 -07:00
Christoph Hellwig	05bfd36614	sctp: refactor sctp_setsockopt_bindx Split out a sctp_setsockopt_bindx_kernel that takes a kernel pointer to the sockaddr and make sctp_setsockopt_bindx a small wrapper around it. This prepares for adding a new bind_add proto op. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-29 13:10:39 -07:00
David S. Miller	942110fdf2	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2020-05-29 1) Several fixes for ESP gro/gso in transport and beet mode when IPv6 extension headers are present. From Xin Long. 2) Fix a wrong comment on XFRMA_OFFLOAD_DEV. From Antony Antony. 3) Fix sk_destruct callback handling on ESP in TCP encapsulation. From Sabrina Dubroca. 4) Fix a use after free in xfrm_output_gso when used with vxlan. From Xin Long. 5) Fix secpath handling of VTI when used wiuth IPCOMP. From Xin Long. 6) Fix an oops when deleting a x-netns xfrm interface. From Nicolas Dichtel. 7) Fix a possible warning on policy updates. We had a case where it was possible to add two policies with the same lookup keys. From Xin Long. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-29 13:05:56 -07:00
David S. Miller	f26e9b2c0b	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2020-05-29 1) Add IPv6 encapsulation support for ESP over UDP and TCP. From Sabrina Dubroca. 2) Remove unneeded reference when initializing xfrm interfaces. From Nicolas Dichtel. 3) Remove some indirect calls from the state_afinfo. From Florian Westphal. Please note that this pull request has two merge conflicts between commit: `0c922a4850` ("xfrm: Always set XFRM_TRANSFORMED in xfrm{4,6}_output_finish") from Linus' tree and commit: `2ab6096db2` ("xfrm: remove output_finish indirection from xfrm_state_afinfo") from the ipsec-next tree. and between commit: `3986912f6a` ("ipv6: move SIOCADDRT and SIOCDELRT handling into ->compat_ioctl") from the net-next tree and commit: `0146dca70b` ("xfrm: add support for UDPv6 encapsulation of ESP") from the ipsec-next tree. Both conflicts can be resolved as done in linux-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-29 13:02:33 -07:00
Sebastian Andrzej Siewior	e6da0edc24	Bluetooth: Acquire sk_lock.slock without disabling interrupts There was a lockdep which led to commit `fad003b6c8` ("Bluetooth: Fix inconsistent lock state with RFCOMM") Lockdep noticed that `sk->sk_lock.slock' was acquired without disabling the softirq while the lock was also used in softirq context. Unfortunately the solution back then was to disable interrupts before acquiring the lock which however made lockdep happy. It would have been enough to simply disable the softirq. Disabling interrupts before acquiring a spinlock_t is not allowed on PREEMPT_RT because these locks are converted to 'sleeping' spinlocks. Use spin_lock_bh() in order to acquire the `sk_lock.slock'. Reported-by: Luis Claudio R. Goncalves <lclaudio@uudg.org> Reported-by: kbuild test robot <lkp@intel.com> [missing unlock] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-05-29 13:48:46 +02:00
Xin Long	f6a23d85d0	xfrm: fix a NULL-ptr deref in xfrm_local_error This patch is to fix a crash: [ ] kasan: GPF could be caused by NULL-ptr deref or user memory access [ ] general protection fault: 0000 [#1] SMP KASAN PTI [ ] RIP: 0010:ipv6_local_error+0xac/0x7a0 [ ] Call Trace: [ ] xfrm6_local_error+0x1eb/0x300 [ ] xfrm_local_error+0x95/0x130 [ ] __xfrm6_output+0x65f/0xb50 [ ] xfrm6_output+0x106/0x46f [ ] udp_tunnel6_xmit_skb+0x618/0xbf0 [ip6_udp_tunnel] [ ] vxlan_xmit_one+0xbc6/0x2c60 [vxlan] [ ] vxlan_xmit+0x6a0/0x4276 [vxlan] [ ] dev_hard_start_xmit+0x165/0x820 [ ] __dev_queue_xmit+0x1ff0/0x2b90 [ ] ip_finish_output2+0xd3e/0x1480 [ ] ip_do_fragment+0x182d/0x2210 [ ] ip_output+0x1d0/0x510 [ ] ip_send_skb+0x37/0xa0 [ ] raw_sendmsg+0x1b4c/0x2b80 [ ] sock_sendmsg+0xc0/0x110 This occurred when sending a v4 skb over vxlan6 over ipsec, in which case skb->protocol == htons(ETH_P_IPV6) while skb->sk->sk_family == AF_INET in xfrm_local_error(). Then it will go to xfrm6_local_error() where it tries to get ipv6 info from a ipv4 sk. This issue was actually fixed by Commit `628e341f31` ("xfrm: make local error reporting more robust"), but brought back by Commit `844d48746e` ("xfrm: choose protocol family by skb protocol"). So to fix it, we should call xfrm6_local_error() only when skb->protocol is htons(ETH_P_IPV6) and skb->sk->sk_family is AF_INET6. Fixes: `844d48746e` ("xfrm: choose protocol family by skb protocol") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-05-29 12:10:22 +02:00
Xiyu Yang	037e910b52	SUNRPC: Remove unreachable error condition in rpcb_getport_async() rpcb_getport_async() invokes rpcb_call_async(), which return the value of rpc_run_task() to "child". Since rpc_run_task() is impossible to return an ERR pointer, there is no need to add the IS_ERR() condition on "child" here. So we need to remove it. Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-05-28 18:15:00 -04:00
NeilBrown	24c5efe41c	sunrpc: clean up properly in gss_mech_unregister() gss_mech_register() calls svcauth_gss_register_pseudoflavor() for each flavour, but gss_mech_unregister() does not call auth_domain_put(). This is unbalanced and makes it impossible to reload the module. Change svcauth_gss_register_pseudoflavor() to return the registered auth_domain, and save it for later release. Cc: stable@vger.kernel.org (v2.6.12+) Link: https://bugzilla.kernel.org/show_bug.cgi?id=206651 Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-05-28 18:15:00 -04:00
NeilBrown	d47a5dc288	sunrpc: svcauth_gss_register_pseudoflavor must reject duplicate registrations. There is no valid case for supporting duplicate pseudoflavor registrations. Currently the silent acceptance of such registrations is hiding a bug. The rpcsec_gss_krb5 module registers 2 flavours but does not unregister them, so if you load, unload, reload the module, it will happily continue to use the old registration which now has pointers to the memory were the module was originally loaded. This could lead to unexpected results. So disallow duplicate registrations. Link: https://bugzilla.kernel.org/show_bug.cgi?id=206651 Cc: stable@vger.kernel.org (v2.6.12+) Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-05-28 18:15:00 -04:00
NeilBrown	f45db2b909	sunrpc: check that domain table is empty at module unload. The domain table should be empty at module unload. If it isn't there is a bug somewhere. So check and report. Link: https://bugzilla.kernel.org/show_bug.cgi?id=206651 Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-05-28 18:15:00 -04:00
Jonas Falkevik	45ebf73ebc	sctp: check assoc before SCTP_ADDR_{MADE_PRIM, ADDED} event Make sure SCTP_ADDR_{MADE_PRIM,ADDED} are sent only for associations that have been established. These events are described in rfc6458#section-6.1 SCTP_PEER_ADDR_CHANGE: This tag indicates that an address that is part of an existing association has experienced a change of state (e.g., a failure or return to service of the reachability of an endpoint via a specific transport address). Signed-off-by: Jonas Falkevik <jonas.falkevik@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 12:47:02 -07:00
Christoph Hellwig	095ae61253	tipc: call tsk_set_importance from tipc_topsrv_create_listener Avoid using kernel_setsockopt for the TIPC_IMPORTANCE option when we can just use the internal helper. The only change needed is to pass a struct sock instead of tipc_sock, which is private to socket.c Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:46 -07:00
Christoph Hellwig	298cd88a66	rxrpc: add rxrpc_sock_set_min_security_level Add a helper to directly set the RXRPC_MIN_SECURITY_LEVEL sockopt from kernel space without going through a fake uaccess. Thanks to David Howells for the documentation updates. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:46 -07:00
Christoph Hellwig	7d7207c2d5	ipv6: add ip6_sock_set_recvpktinfo Add a helper to directly set the IPV6_RECVPKTINFO sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:46 -07:00
Christoph Hellwig	18d5ad6232	ipv6: add ip6_sock_set_addr_preferences Add a helper to directly set the IPV6_ADD_PREFERENCES sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:46 -07:00
Christoph Hellwig	fce934949c	ipv6: add ip6_sock_set_recverr Add a helper to directly set the IPV6_RECVERR sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:45 -07:00
Christoph Hellwig	9b115749ac	ipv6: add ip6_sock_set_v6only Add a helper to directly set the IPV6_V6ONLY sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:45 -07:00
Christoph Hellwig	c1f9ec5776	ipv4: add ip_sock_set_pktinfo Add a helper to directly set the IP_PKTINFO sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:45 -07:00
Christoph Hellwig	2de569bda2	ipv4: add ip_sock_set_mtu_discover Add a helper to directly set the IP_MTU_DISCOVER sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Howells <dhowells@redhat.com> [rxrpc bits] Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:45 -07:00
Christoph Hellwig	db45c0ef25	ipv4: add ip_sock_set_recverr Add a helper to directly set the IP_RECVERR sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:45 -07:00
Christoph Hellwig	c4e446bf5a	ipv4: add ip_sock_set_freebind Add a helper to directly set the IP_FREEBIND sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:45 -07:00
Christoph Hellwig	6ebf71bab9	ipv4: add ip_sock_set_tos Add a helper to directly set the IP_TOS sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:45 -07:00
Christoph Hellwig	480aeb9639	tcp: add tcp_sock_set_keepcnt Add a helper to directly set the TCP_KEEPCNT sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:45 -07:00
Christoph Hellwig	d41ecaac90	tcp: add tcp_sock_set_keepintvl Add a helper to directly set the TCP_KEEPINTVL sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:45 -07:00
Christoph Hellwig	71c48eb81c	tcp: add tcp_sock_set_keepidle Add a helper to directly set the TCP_KEEP_IDLE sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:45 -07:00
Christoph Hellwig	c488aeadcb	tcp: add tcp_sock_set_user_timeout Add a helper to directly set the TCP_USER_TIMEOUT sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:45 -07:00
Christoph Hellwig	557eadfcc5	tcp: add tcp_sock_set_syncnt Add a helper to directly set the TCP_SYNCNT sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:45 -07:00
Christoph Hellwig	ddd061b8da	tcp: add tcp_sock_set_quickack Add a helper to directly set the TCP_QUICKACK sockopt from kernel space without going through a fake uaccess. Cleanup the callers to avoid pointless wrappers now that this is a simple function call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:45 -07:00
Christoph Hellwig	12abc5ee78	tcp: add tcp_sock_set_nodelay Add a helper to directly set the TCP_NODELAY sockopt from kernel space without going through a fake uaccess. Cleanup the callers to avoid pointless wrappers now that this is a simple function call. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:45 -07:00
Christoph Hellwig	db10538a4b	tcp: add tcp_sock_set_cork Add a helper to directly set the TCP_CORK sockopt from kernel space without going through a fake uaccess. Cleanup the callers to avoid pointless wrappers now that this is a simple function call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:45 -07:00
Christoph Hellwig	fe31a326a4	net: add sock_set_reuseport Add a helper to directly set the SO_REUSEPORT sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:45 -07:00
Christoph Hellwig	26cfabf9cd	net: add sock_set_rcvbuf Add a helper to directly set the SO_RCVBUFFORCE sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:44 -07:00
Christoph Hellwig	ce3d9544ce	net: add sock_set_keepalive Add a helper to directly set the SO_KEEPALIVE sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:44 -07:00
Christoph Hellwig	783da70e83	net: add sock_enable_timestamps Add a helper to directly enable timestamps instead of setting the SO_TIMESTAMP* sockopts from kernel space and going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:44 -07:00
Christoph Hellwig	7594888c78	net: add sock_bindtoindex Add a helper to directly set the SO_BINDTOIFINDEX sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:44 -07:00
Christoph Hellwig	76ee0785f4	net: add sock_set_sndtimeo Add a helper to directly set the SO_SNDTIMEO_NEW sockopt from kernel space without going through a fake uaccess. The interface is simplified to only pass the seconds value, as that is the only thing needed at the moment. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:44 -07:00
Christoph Hellwig	6e43496745	net: add sock_set_priority Add a helper to directly set the SO_PRIORITY sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:44 -07:00
Christoph Hellwig	c433594c07	net: add sock_no_linger Add a helper to directly set the SO_LINGER sockopt from kernel space with onoff set to true and a linger time of 0 without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:44 -07:00
Christoph Hellwig	b58f0e8f38	net: add sock_set_reuseaddr Add a helper to directly set the SO_REUSEADDR sockopt from kernel space without going through a fake uaccess. For this the iscsi target now has to formally depend on inet to avoid a mostly theoretical compile failure. For actual operation it already did depend on having ipv4 or ipv6 support. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:11:44 -07:00
Eric Dumazet	d29245692a	tcp: ipv6: support RFC 6069 (TCP-LD) Make tcp_ld_RTO_revert() helper available to IPv6, and implement RFC 6069 : Quoting this RFC : 3. Connectivity Disruption Indication For Internet Protocol version 6 (IPv6) [RFC2460], the counterpart of the ICMP destination unreachable message of code 0 (net unreachable) and of code 1 (host unreachable) is the ICMPv6 destination unreachable message of code 0 (no route to destination) [RFC4443]. As with IPv4, a router should generate an ICMPv6 destination unreachable message of code 0 in response to a packet that cannot be delivered to its destination address because it lacks a matching entry in its routing table. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:02:46 -07:00
David S. Miller	2200313aa0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Uninitialized when used in __nf_conntrack_update(), from Nathan Chancellor. 2) Comparison of unsigned expression in nf_confirm_cthelper(). 3) Remove 'const' type qualifier with no effect. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 10:54:18 -07:00
Markus Theil	a7528198ad	mac80211: support control port TX status reporting Add support for TX status reporting for the control port TX API; this will be used by hostapd when it moves to the control port TX API. Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20200527160334.19224-1-markus.theil@tu-ilmenau.de [fix commit message, it was referring to nl80211] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-05-28 09:02:14 +02:00
Christoph Hellwig	7a15b2e013	net: remove kernel_getsockopt No users left. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 15:11:33 -07:00
Vladimir Oltean	2b86cb8299	net: dsa: declare lockless TX feature for slave ports Be there a platform with the following layout: Regular NIC \| +----> DSA master for switch port \| +----> DSA master for another switch port After changing DSA back to static lockdep class keys in commit `1a33e10e4a` ("net: partially revert dynamic lockdep key changes"), this kernel splat can be seen: [ 13.361198] ============================================ [ 13.366524] WARNING: possible recursive locking detected [ 13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 Not tainted [ 13.377874] -------------------------------------------- [ 13.383201] swapper/0/0 is trying to acquire lock: [ 13.388004] ffff0000668ff298 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.397879] [ 13.397879] but task is already holding lock: [ 13.403727] ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.413593] [ 13.413593] other info that might help us debug this: [ 13.420140] Possible unsafe locking scenario: [ 13.420140] [ 13.426075] CPU0 [ 13.428523] ---- [ 13.430969] lock(&dsa_slave_netdev_xmit_lock_key); [ 13.435946] lock(&dsa_slave_netdev_xmit_lock_key); [ 13.440924] [ 13.440924] * DEADLOCK * [ 13.440924] [ 13.446860] May be due to missing lock nesting notation [ 13.446860] [ 13.453668] 6 locks held by swapper/0/0: [ 13.457598] #0: ffff800010003de0 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400 [ 13.466593] #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x0/0x560 [ 13.474803] #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x64/0xb10 [ 13.483886] #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0 [ 13.492793] #4: ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.503094] #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0 [ 13.512000] [ 13.512000] stack backtrace: [ 13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 [ 13.530421] Call trace: [ 13.532871] dump_backtrace+0x0/0x1d8 [ 13.536539] show_stack+0x24/0x30 [ 13.539862] dump_stack+0xe8/0x150 [ 13.543271] __lock_acquire+0x1030/0x1678 [ 13.547290] lock_acquire+0xf8/0x458 [ 13.550873] _raw_spin_lock+0x44/0x58 [ 13.554543] __dev_queue_xmit+0x84c/0xbe0 [ 13.558562] dev_queue_xmit+0x24/0x30 [ 13.562232] dsa_slave_xmit+0xe0/0x128 [ 13.565988] dev_hard_start_xmit+0xf4/0x448 [ 13.570182] __dev_queue_xmit+0x808/0xbe0 [ 13.574200] dev_queue_xmit+0x24/0x30 [ 13.577869] neigh_resolve_output+0x15c/0x220 [ 13.582237] ip6_finish_output2+0x244/0xb10 [ 13.586430] __ip6_finish_output+0x1dc/0x298 [ 13.590709] ip6_output+0x84/0x358 [ 13.594116] mld_sendpack+0x2bc/0x560 [ 13.597786] mld_ifc_timer_expire+0x210/0x390 [ 13.602153] call_timer_fn+0xcc/0x400 [ 13.605822] run_timer_softirq+0x588/0x6e0 [ 13.609927] __do_softirq+0x118/0x590 [ 13.613597] irq_exit+0x13c/0x148 [ 13.616918] __handle_domain_irq+0x6c/0xc0 [ 13.621023] gic_handle_irq+0x6c/0x160 [ 13.624779] el1_irq+0xbc/0x180 [ 13.627927] cpuidle_enter_state+0xb4/0x4d0 [ 13.632120] cpuidle_enter+0x3c/0x50 [ 13.635703] call_cpuidle+0x44/0x78 [ 13.639199] do_idle+0x228/0x2c8 [ 13.642433] cpu_startup_entry+0x2c/0x48 [ 13.646363] rest_init+0x1ac/0x280 [ 13.649773] arch_call_rest_init+0x14/0x1c [ 13.653878] start_kernel+0x490/0x4bc Lockdep keys themselves were added in commit `ab92d68fc2` ("net: core: add generic lockdep keys"), and it's very likely that this splat existed since then, but I have no real way to check, since this stacked platform wasn't supported by mainline back then. >From Taehee's own words: This patch was considered that all stackable devices have LLTX flag. But the dsa doesn't have LLTX, so this splat happened. After this patch, dsa shares the same lockdep class key. On the nested dsa interface architecture, which you illustrated, the same lockdep class key will be used in __dev_queue_xmit() because dsa doesn't have LLTX. So that lockdep detects deadlock because the same lockdep class key is used recursively although actually the different locks are used. There are some ways to fix this problem. 1. using NETIF_F_LLTX flag. If possible, using the LLTX flag is a very clear way for it. But I'm so sorry I don't know whether the dsa could have LLTX or not. 2. using dynamic lockdep again. It means that each interface uses a separate lockdep class key. So, lockdep will not detect recursive locking. But this way has a problem that it could consume lockdep class key too many. Currently, lockdep can have 8192 lockdep class keys. - you can see this number with the following command. cat /proc/lockdep_stats lock-classes: 1251 [max: 8192] ... The [max: 8192] means that the maximum number of lockdep class keys. If too many lockdep class keys are registered, lockdep stops to work. So, using a dynamic(separated) lockdep class key should be considered carefully. In addition, updating lockdep class key routine might have to be existing. (lockdep_register_key(), lockdep_set_class(), lockdep_unregister_key()) 3. Using lockdep subclass. A lockdep class key could have 8 subclasses. The different subclass is considered different locks by lockdep infrastructure. But "lock-classes" is not counted by subclasses. So, it could avoid stopping lockdep infrastructure by an overflow of lockdep class keys. This approach should also have an updating lockdep class key routine. (lockdep_set_subclass()) 4. Using nonvalidate lockdep class key. The lockdep infrastructure supports nonvalidate lockdep class key type. It means this lockdep is not validated by lockdep infrastructure. So, the splat will not happen but lockdep couldn't detect real deadlock case because lockdep really doesn't validate it. I think this should be used for really special cases. (lockdep_set_novalidate_class()) Further discussion here: https://patchwork.ozlabs.org/project/netdev/patch/20200503052220.4536-2-xiyou.wangcong@gmail.com/ There appears to be no negative side-effect to declaring lockless TX for the DSA virtual interfaces, which means they handle their own locking. So that's what we do to make the splat go away. Patch tested in a wide variety of cases: unicast, multicast, PTP, etc. Fixes: `ab92d68fc2` ("net: core: add generic lockdep keys") Suggested-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 15:09:28 -07:00
Jonas Falkevik	50ce4c099b	sctp: fix typo sctp_ulpevent_nofity_peer_addr_change change typo in function name "nofity" to "notify" sctp_ulpevent_nofity_peer_addr_change -> sctp_ulpevent_notify_peer_addr_change Signed-off-by: Jonas Falkevik <jonas.falkevik@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 15:08:02 -07:00
Tariq Toukan	b3ae2459f8	net/tls: Add force_resync for driver resync This patch adds a field to the tls rx offload context which enables drivers to force a send_resync call. This field can be used by drivers to request a resync at the next possible tls record. It is beneficial for hardware that provides the resync sequence number asynchronously. In such cases, the packet that triggered the resync does not contain the information required for a resync. Instead, the driver requests resync for all the following TLS record until the asynchronous notification with the resync request TCP sequence arrives. A following series for mlx5e ConnectX-6DX TLS RX offload support will use this mechanism. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 15:07:13 -07:00
Cong Wang	759ae57f1b	net_sched: get rid of unnecessary dev_qdisc_reset() Resetting old qdisc on dev_queue->qdisc_sleeping in dev_qdisc_reset() is redundant, because this qdisc, even if not same with dev_queue->qdisc, is reset via qdisc_put() right after calling dev_graft_qdisc() when hitting refcnt 0. This is very easy to observe with qdisc_reset() tracepoint and stack traces. Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz> Tested-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 15:05:49 -07:00
Cong Wang	70f5096533	net_sched: avoid resetting active qdisc for multiple times Except for sch_mq and sch_mqprio, each dev queue points to the same root qdisc, so when we reset the dev queues with netdev_for_each_tx_queue() we end up resetting the same instance of the root qdisc for multiple times. Avoid this by checking the __QDISC_STATE_DEACTIVATED bit in each iteration, so for sch_mq/sch_mqprio, we still reset all of them like before, for the rest, we only reset it once. Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz> Tested-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 15:05:49 -07:00
Cong Wang	f5a7833e83	net_sched: add a tracepoint for qdisc creation With this tracepoint, we could know when qdisc's are created, especially those default qdisc's. Sample output: tc-736 [001] ...1 56.230107: qdisc_create: dev=ens3 kind=pfifo parent=1:0 tc-736 [001] ...1 56.230113: qdisc_create: dev=ens3 kind=hfsc parent=ffff:ffff tc-738 [001] ...1 56.256816: qdisc_create: dev=ens3 kind=pfifo parent=1:100 tc-739 [001] ...1 56.267584: qdisc_create: dev=ens3 kind=pfifo parent=1:200 tc-740 [001] ...1 56.279649: qdisc_create: dev=ens3 kind=fq_codel parent=1:100 tc-741 [001] ...1 56.289996: qdisc_create: dev=ens3 kind=pfifo_fast parent=1:200 tc-745 [000] .N.1 111.687483: qdisc_create: dev=ens3 kind=ingress parent=ffff:fff1 Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 15:05:49 -07:00
Cong Wang	a34dac0b90	net_sched: add tracepoints for qdisc_reset() and qdisc_destroy() Add two tracepoints for qdisc_reset() and qdisc_destroy() to track qdisc resetting and destroying. Sample output: tc-756 [000] ...3 138.355662: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0 tc-756 [000] ...1 138.355720: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0 tc-756 [000] ...1 138.355867: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0 tc-756 [000] ...1 138.355930: qdisc_destroy: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0 tc-757 [000] ...2 143.073780: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0 tc-757 [000] ...1 143.073878: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0 tc-757 [000] ...1 143.074114: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0 tc-757 [000] ...1 143.074228: qdisc_destroy: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0 Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 15:05:49 -07:00
Cong Wang	4909daba37	net_sched: use qdisc_reset() in qdisc_destroy() qdisc_destroy() calls ops->reset() and cleans up qdisc->gso_skb and qdisc->skb_bad_txq, these are nearly same with qdisc_reset(), so just call it directly, and cosolidate the code for the next patch. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 15:05:49 -07:00
Eric Dumazet	a12daf13a4	tcp: rename tcp_v4_err() skb parameter This essentially reverts `4d1a2d9ec1` ("Revert Backoff [v3]: Rename skb to icmp_skb in tcp_v4_err()") Now we have tcp_ld_RTO_revert() helper, we can use the usual name for sk_buff parameter, so that tcp_v4_err() and tcp_v6_err() use similar names. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 14:57:27 -07:00
Eric Dumazet	f745664257	tcp: add tcp_ld_RTO_revert() helper RFC 6069 logic has been implemented for IPv4 only so far, right in the middle of tcp_v4_err() and was error prone. Move this code to one helper, to make tcp_v4_err() more readable and to eventually expand RFC 6069 to IPv6 in the future. Also perform sock_owned_by_user() check a bit sooner. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 14:57:27 -07:00
Pablo Neira Ayuso	5b6743fb2c	netfilter: nf_tables: skip flowtable hooknum and priority on device updates On device updates, the hooknum and priority attributes are not required. This patch makes optional these two netlink attributes. Moreover, bail out with EOPNOTSUPP if userspace tries to update the hooknum and priority for existing flowtables. While at this, turn EINVAL into EOPNOTSUPP in case the hooknum is not ingress. EINVAL is reserved for missing netlink attribute / malformed netlink messages. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-27 22:20:35 +02:00
Pablo Neira Ayuso	05abe4456f	netfilter: nf_tables: allow to register flowtable with no devices A flowtable might be composed of dynamic interfaces only. Such dynamic interfaces might show up at a later stage. This patch allows users to register a flowtable with no devices. Once the dynamic interface becomes available, the user adds the dynamic devices to the flowtable. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-27 22:20:34 +02:00
Pablo Neira Ayuso	abadb2f865	netfilter: nf_tables: delete devices from flowtable This patch allows users to delete devices from existing flowtables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-27 22:20:34 +02:00
Pablo Neira Ayuso	78d9f48f7f	netfilter: nf_tables: add devices to existing flowtable This patch allows users to add devices to an existing flowtable. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-27 22:20:34 +02:00
Pablo Neira Ayuso	c42d8bda69	netfilter: nf_tables: pass hook list to flowtable event notifier Update the flowtable netlink notifier to take the list of hooks as input. This allows to reuse this function in incremental flowtable hook updates. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-27 22:20:34 +02:00
Pablo Neira Ayuso	389a2cbcb7	netfilter: nf_tables: add nft_flowtable_hooks_destroy() This patch adds a helper function destroy the flowtable hooks. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-27 22:20:34 +02:00
Pablo Neira Ayuso	f9382669cf	netfilter: nf_tables: pass hook list to nft_{un,}register_flowtable_net_hooks() This patch prepares for incremental flowtable hook updates. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-27 22:20:34 +02:00
Pablo Neira Ayuso	d9246a5375	netfilter: nf_tables: generalise flowtable hook parsing Update nft_flowtable_parse_hook() to take the flowtable hook list as parameter. This allows to reuse this function to update the hooks. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-27 22:20:34 +02:00
Romain Bellan	cb8aa9a3af	netfilter: ctnetlink: add kernel side filtering for dump Conntrack dump does not support kernel side filtering (only get exists, but it returns only one entry. And user has to give a full valid tuple) It means that userspace has to implement filtering after receiving many irrelevant entries, consuming resources (conntrack table is sometimes very huge, much more than a routing table for example). This patch adds filtering in kernel side. To achieve this goal, we: * Add a new CTA_FILTER netlink attributes, actually a flag list to parametize filtering * Convert some nlattr_to_tuple() functions, to allow a partial parsing of CTA_TUPLE_ORIG and CTA_TUPLE_REPLY (so nf_conntrack_tuple it not fully set) Filtering is now possible on: IP SRC/DST values * Ports for TCP and UDP flows * IMCP(v6) codes types and IDs Filtering is done as an "AND" operator. For example, when flags PROTO_SRC_PORT, PROTO_NUM and IP_SRC are sets, only entries matching all values are dumped. Changes since v1: Set NLM_F_DUMP_FILTERED in nlm flags if entries are filtered Changes since v2: Move several constants to nf_internals.h Move a fix on netlink values check in a separate patch Add a check on not-supported flags Return EOPNOTSUPP if CDA_FILTER is set in ctnetlink_flush_conntrack (not yet implemented) Code style issues Changes since v3: Fix compilation warning reported by kbuild test robot Changes since v4: Fix a regression introduced in v3 (returned EINVAL for valid netlink messages without CTA_MARK) Changes since v5: Change definition of CTA_FILTER_F_ALL Fix a regression when CTA_TUPLE_ZONE is not set Signed-off-by: Romain Bellan <romain.bellan@wifirst.fr> Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-27 22:20:34 +02:00
Leon Romanovsky	8094ba0ace	RDMA/cma: Provide ECE reject reason IBTA declares "vendor option not supported" reject reason in REJ messages if passive side doesn't want to accept proposed ECE options. Due to the fact that ECE is managed by userspace, there is a need to let users to provide such rejected reason. Link: https://lore.kernel.org/r/20200526103304.196371-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-27 16:05:05 -03:00
Arnd Bergmann	b3b6a84c6a	bridge: multicast: work around clang bug Clang-10 and clang-11 run into a corner case of the register allocator on 32-bit ARM, leading to excessive stack usage from register spilling: net/bridge/br_multicast.c:2422:6: error: stack frame size of 1472 bytes in function 'br_multicast_get_stats' [-Werror,-Wframe-larger-than=] Work around this by marking one of the internal functions as noinline_for_stack. Link: https://bugs.llvm.org/show_bug.cgi?id=45802#c9 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 11:34:48 -07:00
Horatiu Vultur	20f6a05ef6	bridge: mrp: Rework the MRP netlink interface This patch reworks the MRP netlink interface. Before, each attribute represented a binary structure which made it hard to be extended. Therefore update the MRP netlink interface such that each existing attribute to be a nested attribute which contains the fields of the binary structures. In this way the MRP netlink interface can be extended without breaking the backwards compatibility. It is also using strict checking for attributes under the MRP top attribute. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 11:30:43 -07:00
Nathan Chancellor	d8e79f1dbc	nexthop: Fix type of event_type in call_nexthop_notifiers Clang warns: net/ipv4/nexthop.c:841:30: warning: implicit conversion from enumeration type 'enum nexthop_event_type' to different enumeration type 'enum fib_event_type' [-Wenum-conversion] call_nexthop_notifiers(net, NEXTHOP_EVENT_DEL, nh); ~~~~~~~~~~~~~~~~~~~~~~ ^~~~~~~~~~~~~~~~~ 1 warning generated. Use the right type for event_type so that clang does not warn. Fixes: `8590ceedb7` ("nexthop: add support for notifiers") Link: https://github.com/ClangBuiltLinux/linux/issues/1038 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 11:22:37 -07:00
Stefano Garzarella	7e0afbdfd1	vsock: fix timeout in vsock_accept() The accept(2) is an "input" socket interface, so we should use SO_RCVTIMEO instead of SO_SNDTIMEO to set the timeout. So this patch replace sock_sndtimeo() with sock_rcvtimeo() to use the right timeout in the vsock_accept(). Fixes: `d021c34405` ("VSOCK: Introduce VM Sockets") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 11:20:23 -07:00
Davide Caratti	bb2f930d6d	net/sched: fix infinite loop in sch_fq_pie this command hangs forever: # tc qdisc add dev eth0 root fq_pie flows 65536 watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [tc:1028] [...] CPU: 1 PID: 1028 Comm: tc Not tainted 5.7.0-rc6+ #167 RIP: 0010:fq_pie_init+0x60e/0x8b7 [sch_fq_pie] Code: 4c 89 65 50 48 89 f8 48 c1 e8 03 42 80 3c 30 00 0f 85 2a 02 00 00 48 8d 7d 10 4c 89 65 58 48 89 f8 48 c1 e8 03 42 80 3c 30 00 <0f> 85 a7 01 00 00 48 8d 7d 18 48 c7 45 10 46 c3 23 00 48 89 f8 48 RSP: 0018:ffff888138d67468 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: 1ffff9200018d2b2 RBX: ffff888139c1c400 RCX: ffffffffffffffff RDX: 000000000000c5e8 RSI: ffffc900000e5000 RDI: ffffc90000c69590 RBP: ffffc90000c69580 R08: fffffbfff79a9699 R09: fffffbfff79a9699 R10: 0000000000000700 R11: fffffbfff79a9698 R12: ffffc90000c695d0 R13: 0000000000000000 R14: dffffc0000000000 R15: 000000002347c5e8 FS: 00007f01e1850e40(0000) GS:ffff88814c880000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000067c340 CR3: 000000013864c000 CR4: 0000000000340ee0 Call Trace: qdisc_create+0x3fd/0xeb0 tc_modify_qdisc+0x3be/0x14a0 rtnetlink_rcv_msg+0x5f3/0x920 netlink_rcv_skb+0x121/0x350 netlink_unicast+0x439/0x630 netlink_sendmsg+0x714/0xbf0 sock_sendmsg+0xe2/0x110 ____sys_sendmsg+0x5b4/0x890 ___sys_sendmsg+0xe9/0x160 __sys_sendmsg+0xd3/0x170 do_syscall_64+0x9a/0x370 entry_SYSCALL_64_after_hwframe+0x44/0xa9 we can't accept 65536 as a valid number for 'nflows', because the loop on 'idx' in fq_pie_init() will never end. The extack message is correct, but it doesn't say that 0 is not a valid number for 'flows': while at it, fix this also. Add a tdc selftest to check correct validation of 'flows'. CC: Ivan Vecera <ivecera@redhat.com> Fixes: `ec97ecf1eb` ("net: sched: add Flow Queue PIE packet scheduler") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 11:11:18 -07:00
Fedor Tokarev	118917d696	net: sunrpc: Fix off-by-one issues in 'rpc_ntop6' Fix off-by-one issues in 'rpc_ntop6': - 'snprintf' returns the number of characters which would have been written if enough space had been available, excluding the terminating null byte. Thus, a return value of 'sizeof(scopebuf)' means that the last character was dropped. - 'strcat' adds a terminating null byte to the string, thus if len == buflen, the null byte is written past the end of the buffer. Signed-off-by: Fedor Tokarev <ftokarev@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-05-27 10:08:26 -04:00
Pablo Neira Ayuso	4946ea5c12	netfilter: nf_conntrack_pptp: fix compilation warning with W=1 build >> include/linux/netfilter/nf_conntrack_pptp.h:13:20: warning: 'const' type qualifier on return type has no effect [-Wignored-qualifiers] extern const char *const pptp_msg_name(u_int16_t msg); ^~~~~~ Reported-by: kbuild test robot <lkp@intel.com> Fixes: `4c559f15ef` ("netfilter: nf_conntrack_pptp: prevent buffer overflows in debug code") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-27 13:39:08 +02:00
Pablo Neira Ayuso	94945ad2b3	netfilter: conntrack: comparison of unsigned in cthelper confirmation net/netfilter/nf_conntrack_core.c: In function nf_confirm_cthelper: net/netfilter/nf_conntrack_core.c:2117:15: warning: comparison of unsigned expression in < 0 is always false [-Wtype-limits] 2117 \| if (protoff < 0 \|\| (frag_off & htons(~0x7)) != 0) \| ^ ipv6_skip_exthdr() returns a signed integer. Reported-by: Colin Ian King <colin.king@canonical.com> Fixes: `703acd70f2` ("netfilter: nfnetlink_cthelper: unbreak userspace helper support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-27 13:39:06 +02:00
Nathan Chancellor	46c1e0621a	netfilter: conntrack: Pass value of ctinfo to __nf_conntrack_update Clang warns: net/netfilter/nf_conntrack_core.c:2068:21: warning: variable 'ctinfo' is uninitialized when used here [-Wuninitialized] nf_ct_set(skb, ct, ctinfo); ^~~~~~ net/netfilter/nf_conntrack_core.c:2024:2: note: variable 'ctinfo' is declared here enum ip_conntrack_info ctinfo; ^ 1 warning generated. nf_conntrack_update was split up into nf_conntrack_update and __nf_conntrack_update, where the assignment of ctinfo is in nf_conntrack_update but it is used in __nf_conntrack_update. Pass the value of ctinfo from nf_conntrack_update to __nf_conntrack_update so that uninitialized memory is not used and everything works properly. Fixes: `ee04805ff5` ("netfilter: conntrack: make conntrack userspace helpers work again") Link: https://github.com/ClangBuiltLinux/linux/issues/1039 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-27 13:38:58 +02:00
Jerry Lee	890bd0f899	libceph: ignore pool overlay and cache logic on redirects OSD client should ignore cache/overlay flag if got redirect reply. Otherwise, the client hangs when the cache tier is in forward mode. [ idryomov: Redirects are effectively deprecated and no longer used or tested. The original tiering modes based on redirects are inherently flawed because redirects can race and reorder, potentially resulting in data corruption. The new proxy and readproxy tiering modes should be used instead of forward and readforward. Still marking for stable as obviously correct, though. ] Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/23296 URL: https://tracker.ceph.com/issues/36406 Signed-off-by: Jerry Lee <leisurelysw24@gmail.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-05-27 12:43:35 +02:00
Johannes Berg	c112992433	mac80211: fix HT-Control field reception for management frames If we receive management frames with an HT-Control field, we cannot parse them properly, as we assume a fixed length management header. Since we don't even need the HTC field (for these frames, or really at all), just remove it at the beginning of RX. Reported-by: Haggai Abramovsky <haggai.abramovsky@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200526143346.cf5ce70521c5.I333251a084ec4cfe67b7ef7efe2d2f1a33883931@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-05-27 10:03:27 +02:00
Patrick Steinhardt	a3b018febc	cfg80211: fix CFG82011_CRDA_SUPPORT still mentioning internal regdb Back with commit `c8c240e284` (cfg80211: reg: remove support for built-in regdb, 2015-10-15), support for using CFG80211_INTERNAL_REGDB was removed in favor of loading the regulatory database as firmware file. The documentation of CFG80211_CRDA_SUPPORT was not adjusted, though, which is why it still mentions mentions the old way of loading via the internal regulatory database. Remove it so that the kernel option only mentions using the firmware file. Signed-off-by: Patrick Steinhardt <ps@pks.im> Link: https://lore.kernel.org/r/c56e60207fbd0512029de8c6276ee00f73491924.1589732954.git.ps@pks.im Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-05-27 10:03:27 +02:00
Tamizh Chelvam	9a5f648862	nl80211: Add support to configure TID specific Tx rate configuration This patch adds support to configure per TID Tx Rate configuration through NL80211_TID_CONFIG_ATTR_TX_RATE* attributes. And it uses nl80211_parse_tx_bitrate_mask api to validate the Tx rate mask. Signed-off-by: Tamizh Chelvam <tamizhr@codeaurora.org> Link: https://lore.kernel.org/r/1589357504-10175-1-git-send-email-tamizhr@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-05-27 10:03:25 +02:00
Johannes Berg	1ea02224af	mac80211: allow SA-QUERY processing in userspace As discussed with Mathy almost two years ago in http://lore.kernel.org/r/20180806224857.14853-1-Mathy.Vanhoef@cs.kuleuven.be we should let userspace process SA-QUERY frames if it wants to, so that it can handle OCV (operating channel validation) which mac80211 doesn't know how to. Evidently I had been expecting Mathy to (re)send such a patch, but he never did, perhaps expecting me to do it after our discussion. In any case, this came up now with OCV getting more attention, so move the code around as discussed there to let userspace handle it, and do it properly. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200526103131.1f9cf7e5b6db.Iae5b42b09ad2b1cbcbe13492002c43f0d1d51dfc@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-05-27 10:02:04 +02:00
Markus Theil	dca9ca2d58	nl80211: add ability to report TX status for control port TX This adds the necessary capabilities in nl80211 to allow drivers to assign a cookie to control port TX frames (returned via extack in the netlink ACK message of the command) and then later report the frame's status. Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20200508144202.7678-2-markus.theil@tu-ilmenau.de [use extack cookie instead of explicit message, recombine patches] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-05-27 10:02:04 +02:00
Gustavo A. R. Silva	3c23215ba8	mac80211: Replace zero-length array with flexible-array The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20200507185907.GA15102@embeddedor Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-05-27 10:02:04 +02:00
Gustavo A. R. Silva	396fba0a59	cfg80211: Replace zero-length array with flexible-array The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20200507183909.GA12993@embeddedor Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-05-27 10:02:03 +02:00
Thomas Pedersen	2032f3b2f9	nl80211: support scan frequencies in KHz If the driver advertises NL80211_EXT_FEATURE_SCAN_FREQ_KHZ userspace can omit NL80211_ATTR_SCAN_FREQUENCIES in favor of an NL80211_ATTR_SCAN_FREQ_KHZ. To get scan results in KHz userspace must also set the NL80211_SCAN_FLAG_FREQ_KHZ. This lets nl80211 remain compatible with older userspaces while not requring and sending redundant (and potentially incorrect) scan frequency sets. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200430172554.18383-4-thomas@adapt-ip.com [use just nla_nest_start() (not _noflag) for NL80211_ATTR_SCAN_FREQ_KHZ] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-05-27 10:02:03 +02:00
Thomas Pedersen	942ba88ba9	nl80211: add KHz frequency offset for most wifi commands cfg80211 recently gained the ability to understand a frequency offset component in KHz. Expose this in nl80211 through the new attributes NL80211_ATTR_WIPHY_FREQ_OFFSET, NL80211_FREQUENCY_ATTR_OFFSET, NL80211_ATTR_CENTER_FREQ1_OFFSET, and NL80211_BSS_FREQUENCY_OFFSET. These add support to send and receive a KHz offset component with the following NL80211 commands: - NL80211_CMD_FRAME - NL80211_CMD_GET_SCAN - NL80211_CMD_AUTHENTICATE - NL80211_CMD_ASSOCIATE - NL80211_CMD_CONNECT Along with any other command which takes a chandef, ie: - NL80211_CMD_SET_CHANNEL - NL80211_CMD_SET_WIPHY - NL80211_CMD_START_AP - NL80211_CMD_RADAR_DETECT - NL80211_CMD_NOTIFY_RADAR - NL80211_CMD_CHANNEL_SWITCH - NL80211_JOIN_IBSS - NL80211_CMD_REMAIN_ON_CHANNEL - NL80211_CMD_JOIN_OCB - NL80211_CMD_JOIN_MESH - NL80211_CMD_TDLS_CHANNEL_SWITCH If the driver advertises a band containing channels with frequency offset, it must also verify support for frequency offset channels in its cfg80211 ops, or return an error. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200430172554.18383-3-thomas@adapt-ip.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-05-27 10:02:02 +02:00
Thomas Pedersen	e76fede8bf	cfg80211: add KHz variants of frame RX API Drivers may wish to report the RX frequency in units of KHz. Provide cfg80211_rx_mgmt_khz() and wrap it with cfg80211_rx_mgmt() so exisiting drivers which can't report KHz anyway don't need to change. Add a similar wrapper for cfg80211_report_obss_beacon() so the frequency units stay somewhat consistent. This doesn't actually change the nl80211 API yet. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200430172554.18383-2-thomas@adapt-ip.com [fix mac80211 calling the non-khz version of obss beacon report, drop trace point name changes] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-05-27 10:02:02 +02:00
Sergey Matyukevich	c03369558c	nl80211: simplify peer specific TID configuration Current rule for applying TID configuration for specific peer looks overly complicated. No need to reject new TID configuration when override flag is specified. Another call with the same TID configuration, but without override flag, allows to apply new configuration anyway. Use the same approach as for the 'all peers' case: if override flag is specified, then reset existing TID configuration and immediately apply a new one. Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com> Link: https://lore.kernel.org/r/20200424112905.26770-5-sergey.matyukevich.os@quantenna.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-05-27 10:02:02 +02:00
Sergey Matyukevich	33462e6823	cfg80211: add support for TID specific AMSDU configuration This patch adds support to control per TID MSDU aggregation using the NL80211_TID_CONFIG_ATTR_AMSDU_CTRL attribute. Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com> Link: https://lore.kernel.org/r/20200424112905.26770-4-sergey.matyukevich.os@quantenna.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-05-27 10:02:01 +02:00
Sergey Matyukevich	60c2ef0ef0	mac80211: fix variable names in TID config methods Fix all variable names from 'tid' to 'tids' to avoid confusion. Now this is not TID number, but TID mask. Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com> Link: https://lore.kernel.org/r/20200424112905.26770-3-sergey.matyukevich.os@quantenna.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-05-27 10:02:01 +02:00
Andrew Lunn	f2bc8ad31a	net: ethtool: Allow PHY cable test TDR data to configured Allow the user to configure where on the cable the TDR data should be retrieved, in terms of first and last sample, and the step between samples. Also add the ability to ask for TDR data for just one pair. If this configuration is not provided, it defaults to 1-150m at 1m intervals for all pairs. Signed-off-by: Andrew Lunn <andrew@lunn.ch> v3: Move the TDR configuration into a structure Add a range check on step Use NL_SET_ERR_MSG_ATTR() when appropriate Move TDR configuration into a nest Document attributes in the request Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 23:22:21 -07:00
Andrew Lunn	6b4a0fc106	net: ethtool: Add helpers for cable test TDR data Add helpers for returning raw TDR helpers in netlink messages. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 23:22:20 -07:00
Andrew Lunn	1a644de29f	net: ethtool: Add generic parts of cable test TDR Add the generic parts of the code used to trigger a cable test and return raw TDR data. Any PHY driver which support this must implement the new driver op. Signed-off-by: Andrew Lunn <andrew@lunn.ch> v2 Update nxp-tja11xx for API change. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 23:21:48 -07:00
Chris Packham	9a233d329e	net: sctp: Fix spelling in Kconfig help Change 'handeled' to 'handled' in the Kconfig help for SCTP. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:32:06 -07:00
Florian Westphal	4e637c70b5	mptcp: attempt coalescing when moving skbs to mptcp rx queue We can try to coalesce skbs we take from the subflows rx queue with the tail of the mptcp rx queue. If successful, the skb head can be discarded early. We can also free the skb extensions, we do not access them after this. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:29:32 -07:00
Dmitry Vyukov	09d0310f07	net/smc: mark smc_pnet_policy as const Netlink policies are generally declared as const. This is safer and prevents potential bugs. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:19:13 -07:00
Paolo Abeni	0a82e230c6	mptcp: avoid NULL-ptr derefence on fallback In the MPTCP receive path we must cope with TCP fallback on blocking recvmsg(). Currently in such code path we detect the fallback condition, but we don't fetch the struct socket required for fallback. The above allowed syzkaller to trigger a NULL pointer dereference: general protection fault, probably for non-canonical address 0xdffffc0000000004: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027] CPU: 1 PID: 7226 Comm: syz-executor523 Not tainted 5.7.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:sock_recvmsg_nosec net/socket.c:886 [inline] RIP: 0010:sock_recvmsg+0x92/0x110 net/socket.c:904 Code: 5b 41 5c 41 5d 41 5e 41 5f 5d c3 44 89 6c 24 04 e8 53 18 1d fb 4d 8d 6f 20 4c 89 e8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 4c 89 ef e8 20 12 5b fb bd a0 00 00 00 49 03 6d RSP: 0018:ffffc90001077b98 EFLAGS: 00010202 RAX: 0000000000000004 RBX: ffffc90001077dc0 RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffffffff86565e59 R09: ffffed10115afeaa R10: ffffed10115afeaa R11: 0000000000000000 R12: 1ffff9200020efbc R13: 0000000000000020 R14: ffffc90001077de0 R15: 0000000000000000 FS: 00007fc6a3abe700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004d0050 CR3: 00000000969f0000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: mptcp_recvmsg+0x18d5/0x19b0 net/mptcp/protocol.c:891 inet_recvmsg+0xf6/0x1d0 net/ipv4/af_inet.c:838 sock_recvmsg_nosec net/socket.c:886 [inline] sock_recvmsg net/socket.c:904 [inline] __sys_recvfrom+0x2f3/0x470 net/socket.c:2057 __do_sys_recvfrom net/socket.c:2075 [inline] __se_sys_recvfrom net/socket.c:2071 [inline] __x64_sys_recvfrom+0xda/0xf0 net/socket.c:2071 do_syscall_64+0xf3/0x1b0 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x49/0xb3 Address the issue initializing the struct socket reference before entering the fallback code. Reported-and-tested-by: syzbot+c6bfc3db991edc918432@syzkaller.appspotmail.com Suggested-by: Ondrej Mosnacek <omosnace@redhat.com> Fixes: `8ab183deb2` ("mptcp: cope with later TCP fallback") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:18:24 -07:00
David S. Miller	745bd6f44c	One batch of changes, containing: * hwsim improvements from Jouni and myself, to be able to test more scenarios easily * some more HE (802.11ax) support * some initial S1G (sub 1 GHz) work for fractional MHz channels * some (action) frame registration updates to help DPP support * along with other various improvements/fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl7L1A8ACgkQB8qZga/f l8T5RQ/9EUxFqBs0cWojwyad5nkesyl51eOnbvSCJJF14W93s2oMeikCynTPe8Vg km36041QZqGbwmU0yWC9Lmm4y3ja5qQGI+QW+vT6tutGQx6FgK5TzUfYXqiFZqf6 asqkvHpH4VqmbG1KEp0PZjIpW/OVK96pbvtXVnkrcMmjl2JjbRtAhyZQVNtt9ufJ 6wqKf8e6iYqMIInMFPLX+rl7UEknxDKVcqPbMMJmY8/iM1z9Elkg3rkRSMehC+mE 8cznZ6BsjAGCbMiA8K9fUo15lcMfZCJ1hAPzkD4TsJtMEJ0gYDo5jDB8TIpr5uoL 95OnlF8jokJIsO+1g4CyaNSQsmFIuDo84vW8LtGRu9qzTP0UwelxhjZLgE3xlP6b W+z5HomxfWkYhJhaNywLP3B1VPtJwX8dL/wpECOWHzNKXG7Rb6GqzUwaCRFb6Jjo TmFJ5wLoEZHhsXYO2dvcyTzCUCXviXvfq60a56IyCJN8wDqmcubePv0+NOHUmj3c E71NTYymM3j9agdSpXdCFLBXA1OgyIydeSNHuBlaPA4sK6tr4ikUtbOrABjYTaQz 2BB5fHEi8gs4EiHbSXqLFBot3JHljKJPsSN0wAgzQffN+6Kts9FG6HcrLsL+duDg lRdAzRrunE85S0QhsxeVIX216rX4W08sl0B1rJR+dTMX9ByblAk= =MVBJ -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-net-next-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== One batch of changes, containing: * hwsim improvements from Jouni and myself, to be able to test more scenarios easily * some more HE (802.11ax) support * some initial S1G (sub 1 GHz) work for fractional MHz channels * some (action) frame registration updates to help DPP support * along with other various improvements/fixes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:17:35 -07:00
David Ahern	1fd1c768f3	ipv4: nexthop version of fib_info_nh_uses_dev Similar to the last path, need to fix fib_info_nh_uses_dev for external nexthops to avoid referencing multiple nh_grp structs. Move the device check in fib_info_nh_uses_dev to a helper and create a nexthop version that is called if the fib_info uses an external nexthop. Fixes: `430a049190` ("nexthop: Add support for nexthop groups") Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 16:06:07 -07:00
David Ahern	af7888ad9e	ipv4: Refactor nhc evaluation in fib_table_lookup FIB lookups can return an entry that references an external nexthop. While walking the nexthop struct we do not want to make multiple calls into the nexthop code which can result in 2 different structs getting accessed - one returning the number of paths the rest of the loop seeing a different nh_grp struct. If the nexthop group shrunk, the result is an attempt to access a fib_nh_common that does not exist for the new nh_grp struct but did for the old one. To fix that move the device evaluation code to a helper that can be used for inline fib_nh path as well as external nexthops. Update the existing check for fi->nh in fib_table_lookup to call a new helper, nexthop_get_nhc_lookup, which walks the external nexthop with a single rcu dereference. Fixes: `430a049190` ("nexthop: Add support for nexthop groups") Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 16:06:07 -07:00
Nikolay Aleksandrov	90f33bffa3	nexthops: don't modify published nexthop groups We must avoid modifying published nexthop groups while they might be in use, otherwise we might see NULL ptr dereferences. In order to do that we allocate 2 nexthoup group structures upon nexthop creation and swap between them when we have to delete an entry. The reason is that we can't fail nexthop group removal, so we can't handle allocation failure thus we move the extra allocation on creation where we can safely fail and return ENOMEM. Fixes: `430a049190` ("nexthop: Add support for nexthop groups") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 16:06:06 -07:00
David Ahern	ac21753a5c	nexthops: Move code from remove_nexthop_from_groups to remove_nh_grp_entry Move nh_grp dereference and check for removing nexthop group due to all members gone into remove_nh_grp_entry. Fixes: `430a049190` ("nexthop: Add support for nexthop groups") Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 16:06:06 -07:00
Guillaume Nault	61aec25a6d	cls_flower: Support filtering on multiple MPLS Label Stack Entries With struct flow_dissector_key_mpls now recording the first FLOW_DIS_MPLS_MAX labels, we can extend Flower to filter on any of these LSEs independently. In order to avoid creating new netlink attributes for every possible depth, let's define a new TCA_FLOWER_KEY_MPLS_OPTS nested attribute that contains the list of LSEs to match. Each LSE is represented by another attribute, TCA_FLOWER_KEY_MPLS_OPTS_LSE, which then contains the attributes representing the depth and the MPLS fields to match at this depth (label, TTL, etc.). For each MPLS field, the mask is always set to all-ones, as this is what the original API did. We could allow user configurable masks in the future if there is demand for more flexibility. The new API also allows to only specify an LSE depth. In that case, Flower only verifies that the MPLS label stack depth is greater or equal to the provided depth (that is, an LSE exists at this depth). Filters that only match on one (or more) fields of the first LSE are dumped using the old netlink attributes, to avoid confusing user space programs that don't understand the new API. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 15:22:58 -07:00
Guillaume Nault	58cff782cc	flow_dissector: Parse multiple MPLS Label Stack Entries The current MPLS dissector only parses the first MPLS Label Stack Entry (second LSE can be parsed too, but only to set a key_id). This patch adds the possibility to parse several LSEs by making __skb_flow_dissect_mpls() return FLOW_DISSECT_RET_PROTO_AGAIN as long as the Bottom Of Stack bit hasn't been seen, up to a maximum of FLOW_DIS_MPLS_MAX entries. FLOW_DIS_MPLS_MAX is arbitrarily set to 7. This should be enough for many practical purposes, without wasting too much space. To record the parsed values, flow_dissector_key_mpls is modified to store an array of stack entries, instead of just the values of the first one. A bit field, "used_lses", is also added to keep track of the LSEs that have been set. The objective is to avoid defining a new FLOW_DISSECTOR_KEY_MPLS_XX for each level of the MPLS stack. TC flower is adapted for the new struct flow_dissector_key_mpls layout. Matching on several MPLS Label Stack Entries will be added in the next patch. The NFP and MLX5 drivers are also adapted: nfp_flower_compile_mac() and mlx5's parse_tunnel() now verify that the rule only uses the first LSE and fail if it doesn't. Finally, the behaviour of the FLOW_DISSECTOR_KEY_MPLS_ENTROPY key is slightly modified. Instead of recording the first Entropy Label, it now records the last one. This shouldn't have any consequences since there doesn't seem to have any user of FLOW_DISSECTOR_KEY_MPLS_ENTROPY in the tree. We'd probably better do a hash of all parsed MPLS labels instead (excluding reserved labels) anyway. That'd give better entropy and would probably also simplify the code. But that's not the purpose of this patch, so I'm keeping that as a future possible improvement. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 15:22:58 -07:00
David S. Miller	fb8ddaa915	This cleanup patchset includes the following patches: - Fix revert dynamic lockdep key changes for batman-adv, by Sven Eckelmann - use rcu_replace_pointer() where appropriate, by Antonio Quartulli - Revert "disable ethtool link speed detection when auto negotiation off", by Sven Eckelmann -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAl7M5/oWHHN3QHNpbW9u d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoRbjD/9+7kbUOYkMEz1x0izlXdXLSlMK EVU0iaJAecRxwN1+o7l1XstGIn5+Y77WsjaB9KKaOHOqWddZN8bQ1/b1lpp4s47n GR650eZcuPjXgSPVmiq86iztRNspOAokuCzFUkVh+AVeLMe/yonSxMpA8QFoYZMD KLxqxlG1xlPqvI+AULLaXSKWlt2tPNWBdpRwxL7TjupmblBMMwVxvi4G7J2Hxest PV//Sakm250kZnQXtoTT3vftDwBwoTFNOZGk9j/2gPrkSWx6l1CAOxp0KKP+sHkU ZlJtqabv0YCFPh17moOCB1Fy/RAvwAgwps1wb3+sK+YSsKg6R5VyAE9eajQx9+nL ypxLSFMs4JWSE+7qw9g6TnoQa1QsXIKmYNqyy5CwGfLg4F42rzA2obSYVA/I93ny KG00tqJM5FOaybXH59fKh5pti+PevWtPc1L5N0QCM6RX99dw79QSThx9wI7OeKfy xLTClx6y0Y2dZC80+zJUQgxDCFnA8+cvNznXNL/TYmf3kqrgUeC1G0prAdBuDfsd 1AKwnVMVsVN4qUmxIrFFQHCtxu39GU2tU5RZy6WIAkpvN9zIYDvlaDOTnxZgfPpw 59FmX/+0Gj9mh9FUWkJkyuZlkqFaztIkRVb10W1kXfALAIgp30pfSYGsDi0+0Y5J g4/cmMGRYNCN5RcAIg== =vDHL -----END PGP SIGNATURE----- Merge tag 'batadv-next-for-davem-20200526' of git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This cleanup patchset includes the following patches: - Fix revert dynamic lockdep key changes for batman-adv, by Sven Eckelmann - use rcu_replace_pointer() where appropriate, by Antonio Quartulli - Revert "disable ethtool link speed detection when auto negotiation off", by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 15:19:29 -07:00
Tuong Lien	0a3e060f34	tipc: add test for Nagle algorithm effectiveness When streaming in Nagle mode, we try to bundle small messages from user as many as possible if there is one outstanding buffer, i.e. not ACK-ed by the receiving side, which helps boost up the overall throughput. So, the algorithm's effectiveness really depends on when Nagle ACK comes or what the specific network latency (RTT) is, compared to the user's message sending rate. In a bad case, the user's sending rate is low or the network latency is small, there will not be many bundles, so making a Nagle ACK or waiting for it is not meaningful. For example: a user sends its messages every 100ms and the RTT is 50ms, then for each messages, we require one Nagle ACK but then there is only one user message sent without any bundles. In a better case, even if we have a few bundles (e.g. the RTT = 300ms), but now the user sends messages in medium size, then there will not be any difference at all, that says 3 x 1000-byte data messages if bundled will still result in 3 bundles with MTU = 1500. When Nagle is ineffective, the delay in user message sending is clearly wasted instead of sending directly. Besides, adding Nagle ACKs will consume some processor load on both the sending and receiving sides. This commit adds a test on the effectiveness of the Nagle algorithm for an individual connection in the network on which it actually runs. Particularly, upon receipt of a Nagle ACK we will compare the number of bundles in the backlog queue to the number of user messages which would be sent directly without Nagle. If the ratio is good (e.g. >= 2), Nagle mode will be kept for further message sending. Otherwise, we will leave Nagle and put a 'penalty' on the connection, so it will have to spend more 'one-way' messages before being able to re-enter Nagle. In addition, the 'ack-required' bit is only set when really needed that the number of Nagle ACKs will be reduced during Nagle mode. Testing with benchmark showed that with the patch, there was not much difference in throughput for small messages since the tool continuously sends messages without a break, so Nagle would still take in effect. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 15:16:52 -07:00
Tuong Lien	03b6fefd9b	tipc: add support for broadcast rcv stats dumping This commit enables dumping the statistics of a broadcast-receiver link like the traditional 'broadcast-link' one (which is for broadcast- sender). The link dumping can be triggered via netlink (e.g. the iproute2/tipc tool) by the link flag - 'TIPC_NLA_LINK_BROADCAST' as the indicator. The name of a broadcast-receiver link of a specific peer will be in the format: 'broadcast-link:<peer-id>'. For example: Link <broadcast-link:1001002> Window:50 packets RX packets:7841 fragments:2408/440 bundles:0/0 TX packets:0 fragments:0/0 bundles:0/0 RX naks:0 defs:124 dups:0 TX naks:21 acks:0 retrans:0 Congestion link:0 Send queue max:0 avg:0 In addition, the broadcast-receiver link statistics can be reset in the usual way via netlink by specifying that link name in command. Note: the 'tipc_link_name_ext()' is removed because the link name can now be retrieved simply via the 'l->name'. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 15:16:52 -07:00
Tuong Lien	a91d55d162	tipc: enable broadcast retrans via unicast In some environment, broadcast traffic is suppressed at high rate (i.e. a kind of bandwidth limit setting). When it is applied, TIPC broadcast can still run successfully. However, when it comes to a high load, some packets will be dropped first and TIPC tries to retransmit them but the packet retransmission is intentionally broadcast too, so making things worse and not helpful at all. This commit enables the broadcast retransmission via unicast which only retransmits packets to the specific peer that has really reported a gap i.e. not broadcasting to all nodes in the cluster, so will prevent from being suppressed, and also reduce some overheads on the other peers due to duplicates, finally improve the overall TIPC broadcast performance. Note: the functionality can be turned on/off via the sysctl file: echo 1 > /proc/sys/net/tipc/bc_retruni echo 0 > /proc/sys/net/tipc/bc_retruni Default is '0', i.e. the broadcast retransmission still works as usual. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 15:16:52 -07:00
Tuong Lien	c6ed7a5cc2	tipc: add back link trace events In the previous commit ("tipc: add Gap ACK blocks support for broadcast link"), we have removed the following link trace events due to the code changes: - tipc_link_bc_ack - tipc_link_retrans This commit adds them back along with some minor changes to adapt to the new code. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 15:16:52 -07:00
Tuong Lien	d7626b5acf	tipc: introduce Gap ACK blocks for broadcast link As achieved through commit `9195948fbf` ("tipc: improve TIPC throughput by Gap ACK blocks"), we apply the same mechanism for the broadcast link as well. The 'Gap ACK blocks' data field in a 'PROTOCOL/STATE_MSG' will consist of two parts built for both the broadcast and unicast types: 31 16 15 0 +-------------+-------------+-------------+-------------+ \| bgack_cnt \| ugack_cnt \| len \| +-------------+-------------+-------------+-------------+ - \| gap \| ack \| \| +-------------+-------------+-------------+-------------+ > bc gacks : : : \| +-------------+-------------+-------------+-------------+ - \| gap \| ack \| \| +-------------+-------------+-------------+-------------+ > uc gacks : : : \| +-------------+-------------+-------------+-------------+ - which is "automatically" backward-compatible. We also increase the max number of Gap ACK blocks to 128, allowing upto 64 blocks per type (total buffer size = 516 bytes). Besides, the 'tipc_link_advance_transmq()' function is refactored which is applicable for both the unicast and broadcast cases now, so some old functions can be removed and the code is optimized. With the patch, TIPC broadcast is more robust regardless of packet loss or disorder, latency, ... in the underlying network. Its performance is boost up significantly. For example, experiment with a 5% packet loss rate results: $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 real 0m 42.46s user 0m 1.16s sys 0m 17.67s Without the patch: $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 real 8m 27.94s user 0m 0.55s sys 0m 2.38s Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 15:16:52 -07:00
Eric Dumazet	239174945d	tcp: tcp_v4_err() icmp skb is named icmp_skb I missed the fact that tcp_v4_err() differs from tcp_v6_err(). After commit `4d1a2d9ec1` ("Rename skb to icmp_skb in tcp_v4_err()") the skb argument has been renamed to icmp_skb only in one function. I will in a future patch reconciliate these functions to avoid this kind of confusion. Fixes: `45af29ca76` ("tcp: allow traceroute -Mtcp for unpriv users") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 15:12:45 -07:00
Sven Eckelmann	9ad346c905	batman-adv: Revert "disable ethtool link speed detection when auto negotiation off" The commit `8c46fcd783` ("batman-adv: disable ethtool link speed detection when auto negotiation off") disabled the usage of ethtool's link_ksetting when auto negotation was enabled due to invalid values when used with tun/tap virtual net_devices. According to the patch, automatic measurements should be used for these kind of interfaces. But there are major flaws with this argumentation: * automatic measurements are not implemented * auto negotiation has nothing to do with the validity of the retrieved values The first point has to be fixed by a longer patch series. The "validity" part of the second point must be addressed in the same patch series by dropping the usage of ethtool's link_ksetting (thus always doing automatic measurements over ethernet). Drop the patch again to have more default values for various net_device types/configurations. The user can still overwrite them using the batadv_hardif's BATADV_ATTR_THROUGHPUT_OVERRIDE. Reported-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-05-26 09:23:33 +02:00
David S. Miller	963bdfc75c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Set VLAN tag in tcp reset/icmp unreachable packets to reject connections in the bridge family, from Michael Braun. 2) Incorrect subcounter flag update in ipset, from Phil Sutter. 3) Possible buffer overflow in the pptp conntrack helper, based on patch from Dan Carpenter. 4) Restore userspace conntrack helper hook logic that broke after hook consolidation rework. 5) Unbreak userspace conntrack helper registration via nfnetlink_cthelper. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-25 18:28:42 -07:00
David S. Miller	1a6da4fcdd	A few changes: * fix a debugfs vs. wiphy rename crash * fix an invalid HE spec definition * fix a mesh timer crash -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl7L0skACgkQB8qZga/f l8RV0hAAnJaaF7hnBm3KuTgFWYdCUEc5IbaYZnD6TUM5xIX5IRrP4HtIrxL9K0sc h8AypNpvPU3OrDZOwoywMjD1LbgRo+91QK+3uo+ObUaGLpTytfeQVWu7x/yl7s+m SCbLBn9ahOipv3ODrR5JZ0PnNmbV7D9TSdtXHElQBpFd1KVJZMHeCvAzi6dZjT6H wVJ0JMHfQtl8prEBIqDFN8IbxYsMqYDBBoScqD0LOMg7TFGgSRlSLazE9pXPV7uT Q6wgSPmABa1C0lXI0TZcAT5Vkz5+9NpqC+lUtkBV4Eyrse5b8WNTeTWhvuybKMTf wdlDOoJg8CSWAbaMq5E8txzoimOZyqi9YKtWg2fbC4mtuM9Ur9JH+iO5oy9LoTkG DjR2dPEg3XQvczFJLlL/VmFp3c8amsEGd2DD00mIm9U1y9EDy/3GkoMQmDndBE3T /tvUDJkrH0pnntIIvn4kiDKMG47BV5Xm1wPfLKkwOY2K4z25Ze+D6ikzrFhenkqv s32J9D0m2jc4UygJcy+zdfqlNvckhrvrbhl+o0YaVHnqHpOYJpYkyq4nt+WdumBe fmEtUaPE4gC5PYiQPCz5Lnf4WtoC5fsj4jiRFBaJgotGEyZqBYN4zRhRXmIXOxRV /lJXHX3Uu9qY3RhSNWD/HDcz+p9D+tYTI+h4Sx9ffZiYKMolAPA= =+uUG -----END PGP SIGNATURE----- Merge tag 'mac80211-for-net-2020-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A few changes: * fix a debugfs vs. wiphy rename crash * fix an invalid HE spec definition * fix a mesh timer crash ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-25 18:18:26 -07:00
Horatiu Vultur	617504c67e	bridge: mrp: Fix out-of-bounds read in br_mrp_parse The issue was reported by syzbot. When the function br_mrp_parse was called with a valid net_bridge_port, the net_bridge was an invalid pointer. Therefore the check br->stp_enabled could pass/fail depending where it was pointing in memory. The fix consists of setting the net_bridge pointer if the port is a valid pointer. Reported-by: syzbot+9c6f0f1f8e32223df9a4@syzkaller.appspotmail.com Fixes: `6536993371` ("bridge: mrp: Integrate MRP into the bridge") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-25 18:09:26 -07:00
Eric Dumazet	45af29ca76	tcp: allow traceroute -Mtcp for unpriv users Unpriv users can use traceroute over plain UDP sockets, but not TCP ones. $ traceroute -Mtcp 8.8.8.8 You do not have enough privileges to use this traceroute method. $ traceroute -n -Mudp 8.8.8.8 traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets 1 192.168.86.1 3.631 ms 3.512 ms 3.405 ms 2 10.1.10.1 4.183 ms 4.125 ms 4.072 ms 3 96.120.88.125 20.621 ms 19.462 ms 20.553 ms 4 96.110.177.65 24.271 ms 25.351 ms 25.250 ms 5 69.139.199.197 44.492 ms 43.075 ms 44.346 ms 6 68.86.143.93 27.969 ms 25.184 ms 25.092 ms 7 96.112.146.18 25.323 ms 96.112.146.22 25.583 ms 96.112.146.26 24.502 ms 8 72.14.239.204 24.405 ms 74.125.37.224 16.326 ms 17.194 ms 9 209.85.251.9 18.154 ms 209.85.247.55 14.449 ms 209.85.251.9 26.296 ms^C We can easily support traceroute over TCP, by queueing an error message into socket error queue. Note that applications need to set IP_RECVERR/IPV6_RECVERR option to enable this feature, and that the error message is only queued while in SYN_SNT state. socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 3 setsockopt(3, SOL_IPV6, IPV6_RECVERR, [1], 4) = 0 setsockopt(3, SOL_SOCKET, SO_TIMESTAMP_OLD, [1], 4) = 0 setsockopt(3, SOL_IPV6, IPV6_UNICAST_HOPS, [5], 4) = 0 connect(3, {sa_family=AF_INET6, sin6_port=htons(8787), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "2002:a05:6608:297::", &sin6_addr), sin6_scope_id=0}, 28) = -1 EHOSTUNREACH (No route to host) recvmsg(3, {msg_name={sa_family=AF_INET6, sin6_port=htons(8787), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "2002:a05:6608:297::", &sin6_addr), sin6_scope_id=0}, msg_namelen=1024->28, msg_iov=[{iov_base="`\r\337\320\0004\6\1&\7\370\260\200\231\16\27\0\0\0\0\0\0\0\0 \2\n\5f\10\2\227"..., iov_len=1024}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SO_TIMESTAMP_OLD, cmsg_data={tv_sec=1590340680, tv_usec=272424}}, {cmsg_len=60, cmsg_level=SOL_IPV6, cmsg_type=IPV6_RECVERR}], msg_controllen=96, msg_flags=MSG_ERRQUEUE}, MSG_ERRQUEUE) = 144 Suggested-by: Maciej Żenczykowski <maze@google.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-25 17:54:06 -07:00
Dan Carpenter	6a1015b0b4	ipv4: potential underflow in compat_ip_setsockopt() The value of "n" is capped at 0x1ffffff but it checked for negative values. I don't think this causes a problem but I'm not certain and it's harmless to prevent it. Fixes: `2e04172875` ("ipv4: do compat setsockopt for MCAST_MSFILTER directly") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-25 17:51:28 -07:00
Vinay Kumar Yadav	0cada33241	net/tls: fix race condition causing kernel panic tls_sw_recvmsg() and tls_decrypt_done() can be run concurrently. // tls_sw_recvmsg() if (atomic_read(&ctx->decrypt_pending)) crypto_wait_req(-EINPROGRESS, &ctx->async_wait); else reinit_completion(&ctx->async_wait.completion); //tls_decrypt_done() pending = atomic_dec_return(&ctx->decrypt_pending); if (!pending && READ_ONCE(ctx->async_notify)) complete(&ctx->async_wait.completion); Consider the scenario tls_decrypt_done() is about to run complete() if (!pending && READ_ONCE(ctx->async_notify)) and tls_sw_recvmsg() reads decrypt_pending == 0, does reinit_completion(), then tls_decrypt_done() runs complete(). This sequence of execution results in wrong completion. Consequently, for next decrypt request, it will not wait for completion, eventually on connection close, crypto resources freed, there is no way to handle pending decrypt response. This race condition can be avoided by having atomic_read() mutually exclusive with atomic_dec_return(),complete().Intoduced spin lock to ensure the mutual exclution. Addressed similar problem in tx direction. v1->v2: - More readable commit message. - Corrected the lock to fix new race scenario. - Removed barrier which is not needed now. Fixes: `a42055e8d2` ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-25 17:41:40 -07:00
Björn Töpel	b16a87d0ae	xsk: Add overflow check for u64 division, stored into u32 The npgs member of struct xdp_umem is an u32 entity, and stores the number of pages the UMEM consumes. The calculation of npgs npgs = size / PAGE_SIZE can overflow. To avoid overflow scenarios, the division is now first stored in a u64, and the result is verified to fit into 32b. An alternative would be storing the npgs as a u64, however, this wastes memory and is an unrealisticly large packet area. Fixes: `c0c77d8fb7` ("xsk: add user memory registration support sockopt") Reported-by: "Minh Bùi Quang" <minhquangbui99@gmail.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/CACtPs=GGvV-_Yj6rbpzTVnopgi5nhMoCcTkSkYrJHGQHJWFZMQ@mail.gmail.com/ Link: https://lore.kernel.org/bpf/20200525080400.13195-1-bjorn.topel@gmail.com	2020-05-26 00:06:00 +02:00
Pablo Neira Ayuso	703acd70f2	netfilter: nfnetlink_cthelper: unbreak userspace helper support Restore helper data size initialization and fix memcopy of the helper data size. Fixes: `157ffffeb5` ("netfilter: nfnetlink_cthelper: reject too large userspace allocation requests") Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-25 20:39:14 +02:00
Pablo Neira Ayuso	ee04805ff5	netfilter: conntrack: make conntrack userspace helpers work again Florian Westphal says: "Problem is that after the helper hook was merged back into the confirm one, the queueing itself occurs from the confirm hook, i.e. we queue from the last netfilter callback in the hook-list. Therefore, on return, the packet bypasses the confirm action and the connection is never committed to the main conntrack table. To fix this there are several ways: 1. revert the 'Fixes' commit and have a extra helper hook again. Works, but has the drawback of adding another indirect call for everyone. 2. Special case this: split the hooks only when userspace helper gets added, so queueing occurs at a lower priority again, and normal enqueue reinject would eventually call the last hook. 3. Extend the existing nf_queue ct update hook to allow a forced confirmation (plus run the seqadj code). This goes for 3)." Fixes: `827318feb6` ("netfilter: conntrack: remove helper hook again") Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-25 20:39:14 +02:00
Pablo Neira Ayuso	4c559f15ef	netfilter: nf_conntrack_pptp: prevent buffer overflows in debug code Dan Carpenter says: "Smatch complains that the value for "cmd" comes from the network and can't be trusted." Add pptp_msg_name() helper function that checks for the array boundary. Fixes: `f09943fefe` ("[NETFILTER]: nf_conntrack/nf_nat: add PPTP helper port") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-25 20:39:14 +02:00
Phil Sutter	a164b95ad6	netfilter: ipset: Fix subcounter update skip If IPSET_FLAG_SKIP_SUBCOUNTER_UPDATE is set, user requested to not update counters in sub sets. Therefore IPSET_FLAG_SKIP_COUNTER_UPDATE must be set, not unset. Fixes: `6e01781d1c` ("netfilter: ipset: set match: add support to match the counters") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-25 20:39:14 +02:00
Michael Braun	e9c284ec4b	netfilter: nft_reject_bridge: enable reject with bridge vlan Currently, using the bridge reject target with tagged packets results in untagged packets being sent back. Fix this by mirroring the vlan id as well. Fixes: `85f5b3086a` ("netfilter: bridge: add reject support") Signed-off-by: Michael Braun <michael-dev@fami-braun.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-25 20:39:05 +02:00
Valdis Kl ē tnieks	9f64fbdb77	bpfilter: document build requirements for bpfilter_umh It's not intuitively obvious that bpfilter_umh is a statically linked binary. Mention the toolchain requirement in the Kconfig help, so people have an easier time figuring out what's needed. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2020-05-26 00:03:16 +09:00
Xin Long	ed17b8d377	xfrm: fix a warning in xfrm_policy_insert_list This waring can be triggered simply by: # ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \ priority 1 mark 0 mask 0x10 #[1] # ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \ priority 2 mark 0 mask 0x1 #[2] # ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \ priority 2 mark 0 mask 0x10 #[3] Then dmesg shows: [ ] WARNING: CPU: 1 PID: 7265 at net/xfrm/xfrm_policy.c:1548 [ ] RIP: 0010:xfrm_policy_insert_list+0x2f2/0x1030 [ ] Call Trace: [ ] xfrm_policy_inexact_insert+0x85/0xe50 [ ] xfrm_policy_insert+0x4ba/0x680 [ ] xfrm_add_policy+0x246/0x4d0 [ ] xfrm_user_rcv_msg+0x331/0x5c0 [ ] netlink_rcv_skb+0x121/0x350 [ ] xfrm_netlink_rcv+0x66/0x80 [ ] netlink_unicast+0x439/0x630 [ ] netlink_sendmsg+0x714/0xbf0 [ ] sock_sendmsg+0xe2/0x110 The issue was introduced by Commit `7cb8a93968` ("xfrm: Allow inserting policies with matching mark and different priorities"). After that, the policies [1] and [2] would be able to be added with different priorities. However, policy [3] will actually match both [1] and [2]. Policy [1] was matched due to the 1st 'return true' in xfrm_policy_mark_match(), and policy [2] was matched due to the 2nd 'return true' in there. It caused WARN_ON() in xfrm_policy_insert_list(). This patch is to fix it by only (the same value and priority) as the same policy in xfrm_policy_mark_match(). Thanks to Yuehaibing, we could make this fix better. v1->v2: - check policy->mark.v == pol->mark.v only without mask. Fixes: `7cb8a93968` ("xfrm: Allow inserting policies with matching mark and different priorities") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-05-25 16:04:12 +02:00
Johannes Berg	0bbab5f030	cfg80211: fix debugfs rename crash Removing the "if (IS_ERR(dir)) dir = NULL;" check only works if we adjust the remaining code to not rely on it being NULL. Check IS_ERR_OR_NULL() before attempting to dereference it. I'm not actually entirely sure this fixes the syzbot crash as the kernel config indicates that they do have DEBUG_FS in the kernel, but this is what I found when looking there. Cc: stable@vger.kernel.org Fixes: `d82574a8e5` ("cfg80211: no need to check return value of debugfs_create functions") Reported-by: syzbot+fd5332e429401bf42d18@syzkaller.appspotmail.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20200525113816.fc4da3ec3d4b.Ica63a110679819eaa9fb3bc1b7437d96b1fd187d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-05-25 13:12:32 +02:00
Linus Lüssing	e2d4a80f93	mac80211: mesh: fix discovery timer re-arming issue / crash On a non-forwarding 802.11s link between two fairly busy neighboring nodes (iperf with -P 16 at ~850MBit/s TCP; 1733.3 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 4), so with frequent PREQ retries, usually after around 30-40 seconds the following crash would occur: [ 1110.822428] Unable to handle kernel read from unreadable memory at virtual address 00000000 [ 1110.830786] Mem abort info: [ 1110.833573] Exception class = IABT (current EL), IL = 32 bits [ 1110.839494] SET = 0, FnV = 0 [ 1110.842546] EA = 0, S1PTW = 0 [ 1110.845678] user pgtable: 4k pages, 48-bit VAs, pgd = ffff800076386000 [ 1110.852204] [0000000000000000] pgd=00000000f6322003, pud=00000000f62de003, *pmd=0000000000000000 [ 1110.861167] Internal error: Oops: 86000004 [#1] PREEMPT SMP [ 1110.866730] Modules linked in: pppoe ppp_async batman_adv ath10k_pci ath10k_core ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt compat nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 usb_storage xhci_plat_hcd xhci_pci xhci_hcd dwc3 usbcore usb_common [ 1110.932190] Process swapper/3 (pid: 0, stack limit = 0xffff0000090c8000) [ 1110.938884] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.162 #0 [ 1110.944965] Hardware name: LS1043A RGW Board (DT) [ 1110.949658] task: ffff8000787a81c0 task.stack: ffff0000090c8000 [ 1110.955568] PC is at 0x0 [ 1110.958097] LR is at call_timer_fn.isra.27+0x24/0x78 [ 1110.963055] pc : [<0000000000000000>] lr : [<ffff0000080ff29c>] pstate: 00400145 [ 1110.970440] sp : ffff00000801be10 [ 1110.973744] x29: ffff00000801be10 x28: ffff000008bf7018 [ 1110.979047] x27: ffff000008bf87c8 x26: ffff000008c160c0 [ 1110.984352] x25: 0000000000000000 x24: 0000000000000000 [ 1110.989657] x23: dead000000000200 x22: 0000000000000000 [ 1110.994959] x21: 0000000000000000 x20: 0000000000000101 [ 1111.000262] x19: ffff8000787a81c0 x18: 0000000000000000 [ 1111.005565] x17: ffff0000089167b0 x16: 0000000000000058 [ 1111.010868] x15: ffff0000089167b0 x14: 0000000000000000 [ 1111.016172] x13: ffff000008916788 x12: 0000000000000040 [ 1111.021475] x11: ffff80007fda9af0 x10: 0000000000000001 [ 1111.026777] x9 : ffff00000801bea0 x8 : 0000000000000004 [ 1111.032080] x7 : 0000000000000000 x6 : ffff80007fda9aa8 [ 1111.037383] x5 : ffff00000801bea0 x4 : 0000000000000010 [ 1111.042685] x3 : ffff00000801be98 x2 : 0000000000000614 [ 1111.047988] x1 : 0000000000000000 x0 : 0000000000000000 [ 1111.053290] Call trace: [ 1111.055728] Exception stack(0xffff00000801bcd0 to 0xffff00000801be10) [ 1111.062158] bcc0: 0000000000000000 0000000000000000 [ 1111.069978] bce0: 0000000000000614 ffff00000801be98 0000000000000010 ffff00000801bea0 [ 1111.077798] bd00: ffff80007fda9aa8 0000000000000000 0000000000000004 ffff00000801bea0 [ 1111.085618] bd20: 0000000000000001 ffff80007fda9af0 0000000000000040 ffff000008916788 [ 1111.093437] bd40: 0000000000000000 ffff0000089167b0 0000000000000058 ffff0000089167b0 [ 1111.101256] bd60: 0000000000000000 ffff8000787a81c0 0000000000000101 0000000000000000 [ 1111.109075] bd80: 0000000000000000 dead000000000200 0000000000000000 0000000000000000 [ 1111.116895] bda0: ffff000008c160c0 ffff000008bf87c8 ffff000008bf7018 ffff00000801be10 [ 1111.124715] bdc0: ffff0000080ff29c ffff00000801be10 0000000000000000 0000000000400145 [ 1111.132534] bde0: ffff8000787a81c0 ffff00000801bde8 0000ffffffffffff 000001029eb19be8 [ 1111.140353] be00: ffff00000801be10 0000000000000000 [ 1111.145220] [< (null)>] (null) [ 1111.149917] [<ffff0000080ff77c>] run_timer_softirq+0x184/0x398 [ 1111.155741] [<ffff000008081938>] __do_softirq+0x100/0x1fc [ 1111.161130] [<ffff0000080a2e28>] irq_exit+0x80/0xd8 [ 1111.166002] [<ffff0000080ea708>] __handle_domain_irq+0x88/0xb0 [ 1111.171825] [<ffff000008081678>] gic_handle_irq+0x68/0xb0 [ 1111.177213] Exception stack(0xffff0000090cbe30 to 0xffff0000090cbf70) [ 1111.183642] be20: 0000000000000020 0000000000000000 [ 1111.191461] be40: 0000000000000001 0000000000000000 00008000771af000 0000000000000000 [ 1111.199281] be60: ffff000008c95180 0000000000000000 ffff000008c19360 ffff0000090cbef0 [ 1111.207101] be80: 0000000000000810 0000000000000400 0000000000000098 ffff000000000000 [ 1111.214920] bea0: 0000000000000001 ffff0000089167b0 0000000000000000 ffff0000089167b0 [ 1111.222740] bec0: 0000000000000000 ffff000008c198e8 ffff000008bf7018 ffff000008c19000 [ 1111.230559] bee0: 0000000000000000 0000000000000000 ffff8000787a81c0 ffff000008018000 [ 1111.238380] bf00: ffff00000801c000 ffff00000913ba34 ffff8000787a81c0 ffff0000090cbf70 [ 1111.246199] bf20: ffff0000080857cc ffff0000090cbf70 ffff0000080857d0 0000000000400145 [ 1111.254020] bf40: ffff000008018000 ffff00000801c000 ffffffffffffffff ffff0000080fa574 [ 1111.261838] bf60: ffff0000090cbf70 ffff0000080857d0 [ 1111.266706] [<ffff0000080832e8>] el1_irq+0xe8/0x18c [ 1111.271576] [<ffff0000080857d0>] arch_cpu_idle+0x10/0x18 [ 1111.276880] [<ffff0000080d7de4>] do_idle+0xec/0x1b8 [ 1111.281748] [<ffff0000080d8020>] cpu_startup_entry+0x20/0x28 [ 1111.287399] [<ffff00000808f81c>] secondary_start_kernel+0x104/0x110 [ 1111.293662] Code: bad PC value [ 1111.296710] ---[ end trace 555b6ca4363c3edd ]--- [ 1111.301318] Kernel panic - not syncing: Fatal exception in interrupt [ 1111.307661] SMP: stopping secondary CPUs [ 1111.311574] Kernel Offset: disabled [ 1111.315053] CPU features: 0x0002000 [ 1111.318530] Memory Limit: none [ 1111.321575] Rebooting in 3 seconds.. With some added debug output / delays we were able to push the crash from the timer callback runner into the callback function and by that shedding some light on which object holding the timer gets corrupted: [ 401.720899] Unable to handle kernel read from unreadable memory at virtual address 00000868 [...] [ 402.335836] [<ffff0000088fafa4>] _raw_spin_lock_bh+0x14/0x48 [ 402.341548] [<ffff000000dbe684>] mesh_path_timer+0x10c/0x248 [mac80211] [ 402.348154] [<ffff0000080ff29c>] call_timer_fn.isra.27+0x24/0x78 [ 402.354150] [<ffff0000080ff77c>] run_timer_softirq+0x184/0x398 [ 402.359974] [<ffff000008081938>] __do_softirq+0x100/0x1fc [ 402.365362] [<ffff0000080a2e28>] irq_exit+0x80/0xd8 [ 402.370231] [<ffff0000080ea708>] __handle_domain_irq+0x88/0xb0 [ 402.376053] [<ffff000008081678>] gic_handle_irq+0x68/0xb0 The issue happens due to the following sequence of events: 1) mesh_path_start_discovery(): -> spin_unlock_bh(&mpath->state_lock) before mesh_path_sel_frame_tx() 2) mesh_path_free_rcu() -> del_timer_sync(&mpath->timer) [...] -> kfree_rcu(mpath) 3) mesh_path_start_discovery(): -> mod_timer(&mpath->timer, ...) [...] -> rcu_read_unlock() 4) mesh_path_free_rcu()'s kfree_rcu(): -> kfree(mpath) 5) mesh_path_timer() starts after timeout, using freed mpath object So a use-after-free issue due to a timer re-arming bug caused by an early spin-unlocking. This patch fixes this issue by re-checking if mpath is about to be free'd and if so bails out of re-arming the timer. Cc: stable@vger.kernel.org Fixes: `050ac52cbe` ("mac80211: code for on-demand Hybrid Wireless Mesh Protocol") Cc: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Linus Lüssing <ll@simonwunderlich.de> Link: https://lore.kernel.org/r/20200522170413.14973-1-linus.luessing@c0d3.blue Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-05-25 10:31:16 +02:00
David S. Miller	13209a8f73	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net The MSCC bug fix in 'net' had to be slightly adjusted because the register accesses are done slightly differently in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 13:47:27 -07:00
Linus Torvalds	caffb99b69	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: 1) Fix RCU warnings in ipv6 multicast router code, from Madhuparna Bhowmik. 2) Nexthop attributes aren't being checked properly because of mis-initialized iterator, from David Ahern. 3) Revert iop_idents_reserve() change as it caused performance regressions and was just working around what is really a UBSAN bug in the compiler. From Yuqi Jin. 4) Read MAC address properly from ROM in bmac driver (double iteration proceeds past end of address array), from Jeremy Kerr. 5) Add Microsoft Surface device IDs to r8152, from Marc Payne. 6) Prevent reference to freed SKB in __netif_receive_skb_core(), from Boris Sukholitko. 7) Fix ACK discard behavior in rxrpc, from David Howells. 8) Preserve flow hash across packet scrubbing in wireguard, from Jason A. Donenfeld. 9) Cap option length properly for SO_BINDTODEVICE in AX25, from Eric Dumazet. 10) Fix encryption error checking in kTLS code, from Vadim Fedorenko. 11) Missing BPF prog ref release in flow dissector, from Jakub Sitnicki. 12) dst_cache must be used with BH disabled in tipc, from Eric Dumazet. 13) Fix use after free in mlxsw driver, from Jiri Pirko. 14) Order kTLS key destruction properly in mlx5 driver, from Tariq Toukan. 15) Check devm_platform_ioremap_resource() return value properly in several drivers, from Tiezhu Yang. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits) net: smsc911x: Fix runtime PM imbalance on error net/mlx4_core: fix a memory leak bug. net: ethernet: ti: cpsw: fix ASSERT_RTNL() warning during suspend net: phy: mscc: fix initialization of the MACsec protocol mode net: stmmac: don't attach interface until resume finishes net: Fix return value about devm_platform_ioremap_resource() net/mlx5: Fix error flow in case of function_setup failure net/mlx5e: CT: Correctly get flow rule net/mlx5e: Update netdev txq on completions during closure net/mlx5: Annotate mutex destroy for root ns net/mlx5: Don't maintain a case of del_sw_func being null net/mlx5: Fix cleaning unmanaged flow tables net/mlx5: Fix memory leak in mlx5_events_init net/mlx5e: Fix inner tirs handling net/mlx5e: kTLS, Destroy key object after destroying the TIS net/mlx5e: Fix allowed tc redirect merged eswitch offload cases net/mlx5: Avoid processing commands before cmdif is ready net/mlx5: Fix a race when moving command interface to events mode net/mlx5: Add command entry handling completion rxrpc: Fix a memory leak in rxkad_verify_response() ...	2020-05-23 17:16:18 -07:00
Heiner Kallweit	316107119f	ethtool: propagate get_coalesce return value get_coalesce returns 0 or ERRNO, but the return value isn't checked. The returned coalesce data may be invalid if an ERRNO is set, therefore better check and propagate the return value. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-23 16:57:00 -07:00
Bartosz Golaszewski	cd16627fc0	net: devres: provide devm_register_netdev() Provide devm_register_netdev() - a device resource managed variant of register_netdev(). This new helper will only work for net_device structs that are also already managed by devres. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-23 16:56:17 -07:00
Bartosz Golaszewski	f75063abc3	net: devres: define a separate devres structure for devm_alloc_etherdev() Not using a proxy structure to store struct net_device doesn't save anything in terms of compiled code size or memory usage but significantly decreases the readability of the code with all the pointer casting. Define struct net_device_devres and use it in devm_alloc_etherdev_mqs(). Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-23 16:56:17 -07:00
Bartosz Golaszewski	cb8a14b205	net: move devres helpers into a separate source file There's currently only a single devres helper in net/ - devm variant of alloc_etherdev. Let's move it to net/devres.c with the intention of assing a second one: devm_register_netdev(). This new routine will need to know the address of the release function of devm_alloc_etherdev() so that it can verify (using devres_find()) that the struct net_device that's being passed to it is also resource managed. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-23 16:56:17 -07:00
Randy Dunlap	07a7f30819	net: psample: fix build error when CONFIG_INET is not enabled Fix psample build error when CONFIG_INET is not set/enabled by bracketing the tunnel code in #ifdef CONFIG_NET / #endif. ../net/psample/psample.c: In function ‘__psample_ip_tun_to_nlattr’: ../net/psample/psample.c:216:25: error: implicit declaration of function ‘ip_tunnel_info_opts’; did you mean ‘ip_tunnel_info_opts_set’? [-Werror=implicit-function-declaration] Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Yotam Gigi <yotam.gi@gmail.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-23 16:36:05 -07:00
David S. Miller	a152b85984	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2020-05-23 The following pull-request contains BPF updates for your net-next tree. We've added 50 non-merge commits during the last 8 day(s) which contain a total of 109 files changed, 2776 insertions(+), 2887 deletions(-). The main changes are: 1) Add a new AF_XDP buffer allocation API to the core in order to help lowering the bar for drivers adopting AF_XDP support. i40e, ice, ixgbe as well as mlx5 have been moved over to the new API and also gained a small improvement in performance, from Björn Töpel and Magnus Karlsson. 2) Add getpeername()/getsockname() attach types for BPF sock_addr programs in order to allow for e.g. reverse translation of load-balancer backend to service address/port tuple from a connected peer, from Daniel Borkmann. 3) Improve the BPF verifier is_branch_taken() logic to evaluate pointers being non-NULL, e.g. if after an initial test another non-NULL test on that pointer follows in a given path, then it can be pruned right away, from John Fastabend. 4) Larger rework of BPF sockmap selftests to make output easier to understand and to reduce overall runtime as well as adding new BPF kTLS selftests that run in combination with sockmap, also from John Fastabend. 5) Batch of misc updates to BPF selftests including fixing up test_align to match verifier output again and moving it under test_progs, allowing bpf_iter selftest to compile on machines with older vmlinux.h, and updating config options for lirc and v6 segment routing helpers, from Stanislav Fomichev, Andrii Nakryiko and Alan Maguire. 6) Conversion of BPF tracing samples outdated internal BPF loader to use libbpf API instead, from Daniel T. Lee. 7) Follow-up to BPF kernel test infrastructure in order to fix a flake in the XDP selftests, from Jesper Dangaard Brouer. 8) Minor improvements to libbpf's internal hashmap implementation, from Ian Rogers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 18:30:34 -07:00
Qiushi Wu	f45d01f4f3	rxrpc: Fix a memory leak in rxkad_verify_response() A ticket was not released after a call of the function "rxkad_decrypt_ticket" failed. Thus replace the jump target "temporary_error_free_resp" by "temporary_error_free_ticket". Fixes: `8c2f826dc3` ("rxrpc: Don't put crypto buffers on the stack") Signed-off-by: Qiushi Wu <wu000273@umn.edu> Signed-off-by: David Howells <dhowells@redhat.com> cc: Markus Elfring <Markus.Elfring@web.de>	2020-05-23 00:35:46 +01:00
Horatiu Vultur	4fb13499d3	bridge: mrp: Restore port state when deleting MRP instance When a MRP instance is deleted, then restore the port according to the bridge state. If the bridge is up then the ports will be in forwarding state otherwise will be in disabled state. Fixes: `9a9f26e8f7` ("bridge: mrp: Connect MRP API with the switchdev API") Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 16:17:15 -07:00
Horatiu Vultur	7aa38018be	bridge: mrp: Add br_mrp_unique_ifindex function It is not allow to have the same net bridge port part of multiple MRP rings. Therefore add a check if the port is used already in a different MRP. In that case return failure. Fixes: `9a9f26e8f7` ("bridge: mrp: Connect MRP API with the switchdev API") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 16:17:15 -07:00
David S. Miller	4629ed2e48	RxRPC fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl7FQH4ACgkQ+7dXa6fL C2u05BAAlilXCYGZU23/yQmMxxkmcG+jW+oDV9ySNDl6iJT+OtKxHEGReDD4D2f0 rhaBivgGcOnZy89AGjNrSVROGlwSBXl7ArJcjsfsx8AuNzxHUHQKWlW/k8n87qEt NCTze7f65IT6NowgYAFgJn5kIpY/9iKuNiCf6NGL3Z35wqxPvwNs6AQSGM495uvB el/ddkr8QzzjI9Ejsgzj94x4DAOjk4T4WzfWMAgyr1OEqz6vKNKkCwSKPySOsQAK 72JRaGhWA9rfAOkA7nAZpnjHdfFYnkFBOVQzmswOJYRYe3D/QY5D9PUlGIQ5OSjL yV5YOi/+AUrSif79NfEYXga0r/NFJMFqBg2zo/eiSrhfZZFZMDcagnGhzpGjbYF1 IaeIu4q/MQOQybi8m1GJhvFfPOhdKRn731jlsUvEoxK0TonSu/u64eus+qelQxOd uiIcu/kLxfPZSznUd8cXZ+Pffce0uBIRWq0nRQZ703TyHY+/gYo7ZGHr/FZNKaK4 lRNP4Nu3goLQCI40R7y7USnpX+kWfd4mYC9zl+VBSXG1JymYbOezXYrNATBCqpo6 9VoYtqDdo8ESksFUBqM7fGRDZ20nah6KdRGmnrPU+rpODHZEZmNN7D/rayU31wua VIbVw1WluvSVnQ8+b1BwJwvhJQ4CazyGcnbxDqx1zd07EjUiiJI= =jJvy -----END PGP SIGNATURE----- Merge tag 'rxrpc-fixes-20200520' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fix retransmission timeout and ACK discard Here are a couple of fixes and an extra tracepoint for AF_RXRPC: (1) Calculate the RTO pretty much as TCP does, rather than making something up, including an initial 4s timeout (which causes return probes from the fileserver to fail if a packet goes missing), and add backoff. (2) Fix the discarding of out-of-order received ACKs. We mustn't let the hard-ACK point regress, nor do we want to do unnecessary retransmission because the soft-ACK list regresses. This is not trivial, however, due to some loose wording in various old protocol specs, the ACK field that should be used for this sometimes has the wrong information in it. (3) Add a tracepoint to log a discarded ACK. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 15:53:30 -07:00
Edward Cree	060b6381ef	net: flow_offload: simplify hw stats check handling Make FLOW_ACTION_HW_STATS_DONT_CARE be all bits, rather than none, so that drivers and __flow_action_hw_stats_check can use simple bitwise checks. Pre-fill all actions with DONT_CARE in flow_rule_alloc(), rather than relying on implicit semantics of zero from kzalloc, so that callers which don't configure action stats themselves (i.e. netfilter) get the correct behaviour by default. Only the kernel's internal API semantics change; the TC uAPI is unaffected. v4: move DONT_CARE setting to flow_rule_alloc() for robustness and simplicity. v3: set DONT_CARE in nft and ct offload. v2: rebased on net-next, removed RFC tags. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 15:52:08 -07:00
Vadim Fedorenko	1515aa70c0	mpls: Add support for IPv6 tunnels Add support for IPv6 tunnel devices in AF_MPLS. Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 15:49:31 -07:00
Vadim Fedorenko	f200e98d97	ip6_tunnel: add generic MPLS receive support Add support for MPLS in receive side. Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 15:49:31 -07:00
Vadim Fedorenko	f234efac2c	tunnel6: support for IPPROTO_MPLS This patch is just preparation for MPLS support in ip6_tunnel Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 15:49:30 -07:00
Vadim Fedorenko	6c11fbf97e	ip6_tunnel: add MPLS transmit support Add ETH_P_MPLS_UC as supported protocol. Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 15:49:30 -07:00
Vadim Fedorenko	e7bb18e6c8	ip6_tunnel: simplify transmit path Merge ip{4,6}ip6_tnl_xmit functions into one universal ipxip6_tnl_xmit in preparation for adding MPLS support. Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 15:49:30 -07:00
Jere Leppänen	d3e8e4c118	sctp: Start shutdown on association restart if in SHUTDOWN-SENT state and socket is closed Commit `bdf6fa52f0` ("sctp: handle association restarts when the socket is closed.") starts shutdown when an association is restarted, if in SHUTDOWN-PENDING state and the socket is closed. However, the rationale stated in that commit applies also when in SHUTDOWN-SENT state - we don't want to move an association to ESTABLISHED state when the socket has been closed, because that results in an association that is unreachable from user space. The problem scenario: 1. Client crashes and/or restarts. 2. Server (using one-to-one socket) calls close(). SHUTDOWN is lost. 3. Client reconnects using the same addresses and ports. 4. Server's association is restarted. The association and the socket move to ESTABLISHED state, even though the server process has closed its descriptor. Also, after step 4 when the server process exits, some resources are leaked in an attempt to release the underlying inet sock structure in ESTABLISHED state: IPv4: Attempt to release TCP socket in state 1 00000000377288c7 Fix by acting the same way as in SHUTDOWN-PENDING state. That is, if an association is restarted in SHUTDOWN-SENT state and the socket is closed, then start shutdown and don't move the association or the socket to ESTABLISHED state. Fixes: `bdf6fa52f0` ("sctp: handle association restarts when the socket is closed.") Signed-off-by: Jere Leppänen <jere.leppanen@nokia.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 15:47:59 -07:00
Eric Dumazet	1378817486	tipc: block BH before using dst_cache dst_cache_get() documents it must be used with BH disabled. sysbot reported : BUG: using smp_processor_id() in preemptible [00000000] code: /21697 caller is dst_cache_get+0x3a/0xb0 net/core/dst_cache.c:68 CPU: 0 PID: 21697 Comm: Not tainted 5.7.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 check_preemption_disabled lib/smp_processor_id.c:47 [inline] debug_smp_processor_id.cold+0x88/0x9b lib/smp_processor_id.c:57 dst_cache_get+0x3a/0xb0 net/core/dst_cache.c:68 tipc_udp_xmit.isra.0+0xb9/0xad0 net/tipc/udp_media.c:164 tipc_udp_send_msg+0x3e6/0x490 net/tipc/udp_media.c:244 tipc_bearer_xmit_skb+0x1de/0x3f0 net/tipc/bearer.c:526 tipc_enable_bearer+0xb2f/0xd60 net/tipc/bearer.c:331 __tipc_nl_bearer_enable+0x2bf/0x390 net/tipc/bearer.c:995 tipc_nl_bearer_enable+0x1e/0x30 net/tipc/bearer.c:1003 genl_family_rcv_msg_doit net/netlink/genetlink.c:673 [inline] genl_family_rcv_msg net/netlink/genetlink.c:718 [inline] genl_rcv_msg+0x627/0xdf0 net/netlink/genetlink.c:735 netlink_rcv_skb+0x15a/0x410 net/netlink/af_netlink.c:2469 genl_rcv+0x24/0x40 net/netlink/genetlink.c:746 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6bf/0x7e0 net/socket.c:2362 ___sys_sendmsg+0x100/0x170 net/socket.c:2416 __sys_sendmsg+0xec/0x1b0 net/socket.c:2449 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x45ca29 Fixes: `e9c1a79321` ("tipc: add dst_cache support for udp media") Cc: Xin Long <lucien.xin@gmail.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 15:39:00 -07:00
David S. Miller	d3b968bc2d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2020-05-22 The following pull-request contains BPF updates for your net tree. We've added 3 non-merge commits during the last 3 day(s) which contain a total of 5 files changed, 69 insertions(+), 11 deletions(-). The main changes are: 1) Fix to reject mmap()'ing read-only array maps as writable since BPF verifier relies on such map content to be frozen, from Andrii Nakryiko. 2) Fix breaking audit from secid_to_secctx() LSM hook by avoiding to use call_int_hook() since this hook is not stackable, from KP Singh. 3) Fix BPF flow dissector program ref leak on netns cleanup, from Jakub Sitnicki. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 14:35:35 -07:00
Todd Malsbary	bd6972226f	mptcp: use untruncated hash in ADD_ADDR HMAC There is some ambiguity in the RFC as to whether the ADD_ADDR HMAC is the rightmost 64 bits of the entire hash or of the leftmost 160 bits of the hash. The intention, as clarified with the author of the RFC, is the entire hash. This change returns the entire hash from mptcp_crypto_hmac_sha (instead of only the first 160 bits), and moves any truncation/selection operation on the hash to the caller. Fixes: `12555a2d97` ("mptcp: use rightmost 64 bits in ADD_ADDR HMAC") Reviewed-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Todd Malsbary <todd.malsbary@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 14:21:24 -07:00
Roopa Prabhu	8590ceedb7	nexthop: add support for notifiers This patch adds nexthop add/del notifiers. To be used by vxlan driver in a later patch. Could possibly be used by switchdev drivers in the future. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 14:00:38 -07:00
Roopa Prabhu	1274e1cc42	vxlan: ecmp support for mac fdb entries Todays vxlan mac fdb entries can point to multiple remote ips (rdsts) with the sole purpose of replicating broadcast-multicast and unknown unicast packets to those remote ips. E-VPN multihoming [1,2,3] requires bridged vxlan traffic to be load balanced to remote switches (vteps) belonging to the same multi-homed ethernet segment (E-VPN multihoming is analogous to multi-homed LAG implementations, but with the inter-switch peerlink replaced with a vxlan tunnel). In other words it needs support for mac ecmp. Furthermore, for faster convergence, E-VPN multihoming needs the ability to update fdb ecmp nexthops independent of the fdb entries. New route nexthop API is perfect for this usecase. This patch extends the vxlan fdb code to take a nexthop id pointing to an ecmp nexthop group. Changes include: - New NDA_NH_ID attribute for fdbs - Use the newly added fdb nexthop groups - makes vxlan rdsts and nexthop handling code mutually exclusive - since this is a new use-case and the requirement is for ecmp nexthop groups, the fdb add and update path checks that the nexthop is really an ecmp nexthop group. This check can be relaxed in the future, if we want to introduce replication fdb nexthop groups and allow its use in lieu of current rdst lists. - fdb update requests with nexthop id's only allowed for existing fdb's that have nexthop id's - learning will not override an existing fdb entry with nexthop group - I have wrapped the switchdev offload code around the presence of rdst [1] E-VPN RFC https://tools.ietf.org/html/rfc7432 [2] E-VPN with vxlan https://tools.ietf.org/html/rfc8365 [3] http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf Includes a null check fix in vxlan_xmit from Nikolay v2 - Fixed build issue: Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 14:00:38 -07:00
Roopa Prabhu	38428d6871	nexthop: support for fdb ecmp nexthops This patch introduces ecmp nexthops and nexthop groups for mac fdb entries. In subsequent patches this is used by the vxlan driver fdb entries. The use case is E-VPN multihoming [1,2,3] which requires bridged vxlan traffic to be load balanced to remote switches (vteps) belonging to the same multi-homed ethernet segment (This is analogous to a multi-homed LAG but over vxlan). Changes include new nexthop flag NHA_FDB for nexthops referenced by fdb entries. These nexthops only have ip. This patch includes appropriate checks to avoid routes referencing such nexthops. example: $ip nexthop add id 12 via 172.16.1.2 fdb $ip nexthop add id 13 via 172.16.1.3 fdb $ip nexthop add id 102 group 12/13 fdb $bridge fdb add 02:02:00:00:00:13 dev vxlan1000 nhid 101 self [1] E-VPN https://tools.ietf.org/html/rfc7432 [2] E-VPN VxLAN: https://tools.ietf.org/html/rfc8365 [3] LPC talk with mention of nexthop groups for L2 ecmp http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf v4 - fixed uninitialized variable reported by kernel test robot Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 14:00:38 -07:00
Antonio Quartulli	cf78bb0bbc	batman-adv: use rcu_replace_pointer() where appropriate In commit `a63fc6b75c` ("rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer()") a new helper macro named rcu_replace_pointer() was introduced to simplify code requiring to switch an rcu pointer to a new value while extracting the old one. Use rcu_replace_pointer() where appropriate to make code slimer. Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-05-22 14:19:24 +02:00
Sven Eckelmann	2092c910e2	batman-adv: Revert "Drop lockdep.h include for soft-interface.c" The commit `1a33e10e4a` ("net: partially revert dynamic lockdep key changes") reverts the commit `ab92d68fc2` ("net: core: add generic lockdep keys"). But it forgot to also revert the commit `5759af0682` ("batman-adv: Drop lockdep.h include for soft-interface.c") which depends on the latter. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-05-22 14:19:23 +02:00
Jakub Sitnicki	5cf65922bb	flow_dissector: Drop BPF flow dissector prog ref on netns cleanup When attaching a flow dissector program to a network namespace with bpf(BPF_PROG_ATTACH, ...) we grab a reference to bpf_prog. If netns gets destroyed while a flow dissector is still attached, and there are no other references to the prog, we leak the reference and the program remains loaded. Leak can be reproduced by running flow dissector tests from selftests/bpf: # bpftool prog list # ./test_flow_dissector.sh ... selftests: test_flow_dissector [PASS] # bpftool prog list 4: flow_dissector name _dissect tag e314084d332a5338 gpl loaded_at 2020-05-20T18:50:53+0200 uid 0 xlated 552B jited 355B memlock 4096B map_ids 3,4 btf_id 4 # Fix it by detaching the flow dissector program when netns is going away. Fixes: `d58e468b11` ("flow_dissector: implements flow dissector BPF hook") Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20200521083435.560256-1-jakub@cloudflare.com	2020-05-21 17:52:45 -07:00
Björn Töpel	26062b185e	xsk: Explicitly inline functions and move definitions In order to reduce the number of function calls, the struct xsk_buff_pool definition is moved to xsk_buff_pool.h. The functions xp_get_dma(), xp_dma_sync_for_cpu(), xp_dma_sync_for_device(), xp_validate_desc() and various helper functions are explicitly inlined. Further, move xp_get_handle() and xp_release() to xsk.c, to allow for the compiler to perform inlining. rfc->v1: Make sure xp_validate_desc() is inlined for Tx perf. (Maxim) Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-15-bjorn.topel@gmail.com	2020-05-21 17:31:27 -07:00
Björn Töpel	82c41671ca	xdp: Simplify xdp_return_{frame, frame_rx_napi, buff} The xdp_return_{frame,frame_rx_napi,buff} function are never used, except in xdp_convert_zc_to_xdp_frame(), by the MEM_TYPE_XSK_BUFF_POOL memory type. To simplify and reduce code, change so that xdp_convert_zc_to_xdp_frame() calls xsk_buff_free() directly since the type is know, and remove MEM_TYPE_XSK_BUFF_POOL from the switch statement in __xdp_return() function. Suggested-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-14-bjorn.topel@gmail.com	2020-05-21 17:31:27 -07:00
Björn Töpel	0807892ecb	xsk: Remove MEM_TYPE_ZERO_COPY and corresponding code There are no users of MEM_TYPE_ZERO_COPY. Remove all corresponding code, including the "handle" member of struct xdp_buff. rfc->v1: Fixed spelling in commit message. (Björn) Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-13-bjorn.topel@gmail.com	2020-05-21 17:31:27 -07:00
Björn Töpel	2b43470add	xsk: Introduce AF_XDP buffer allocation API In order to simplify AF_XDP zero-copy enablement for NIC driver developers, a new AF_XDP buffer allocation API is added. The implementation is based on a single core (single producer/consumer) buffer pool for the AF_XDP UMEM. A buffer is allocated using the xsk_buff_alloc() function, and returned using xsk_buff_free(). If a buffer is disassociated with the pool, e.g. when a buffer is passed to an AF_XDP socket, a buffer is said to be released. Currently, the release function is only used by the AF_XDP internals and not visible to the driver. Drivers using this API should register the XDP memory model with the new MEM_TYPE_XSK_BUFF_POOL type. The API is defined in net/xdp_sock_drv.h. The buffer type is struct xdp_buff, and follows the lifetime of regular xdp_buffs, i.e. the lifetime of an xdp_buff is restricted to a NAPI context. In other words, the API is not replacing xdp_frames. In addition to introducing the API and implementations, the AF_XDP core is migrated to use the new APIs. rfc->v1: Fixed build errors/warnings for m68k and riscv. (kbuild test robot) Added headroom/chunk size getter. (Maxim/Björn) v1->v2: Swapped SoBs. (Maxim) v2->v3: Initialize struct xdp_buff member frame_sz. (Björn) Add API to query the DMA address of a frame. (Maxim) Do DMA sync for CPU till the end of the frame to handle possible growth (frame_sz). (Maxim) Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-6-bjorn.topel@gmail.com	2020-05-21 17:31:26 -07:00
Björn Töpel	89e4a376e3	xsk: Move defines only used by AF_XDP internals to xsk.h Move the XSK_NEXT_PG_CONTIG_{MASK,SHIFT}, and XDP_UMEM_USES_NEED_WAKEUP defines from xdp_sock.h to the AF_XDP internal xsk.h file. Also, start using the BIT{,_ULL} macro instead of explicit shifts. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-5-bjorn.topel@gmail.com	2020-05-21 17:31:26 -07:00
Magnus Karlsson	a71506a4fd	xsk: Move driver interface to xdp_sock_drv.h Move the AF_XDP zero-copy driver interface to its own include file called xdp_sock_drv.h. This, hopefully, will make it more clear for NIC driver implementors to know what functions to use for zero-copy support. v4->v5: Fix -Wmissing-prototypes by include header file. (Jakub) Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-4-bjorn.topel@gmail.com	2020-05-21 17:31:26 -07:00
Björn Töpel	d20a1676df	xsk: Move xskmap.c to net/xdp/ The XSKMAP is partly implemented by net/xdp/xsk.c. Move xskmap.c from kernel/bpf/ to net/xdp/, which is the logical place for AF_XDP related code. Also, move AF_XDP struct definitions, and function declarations only used by AF_XDP internals into net/xdp/xsk.h. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-3-bjorn.topel@gmail.com	2020-05-21 17:31:26 -07:00
Sabrina Dubroca	41b4bd986f	net: don't return invalid table id error when we fall back to PF_UNSPEC In case we can't find a ->dumpit callback for the requested (family,type) pair, we fall back to (PF_UNSPEC,type). In effect, we're in the same situation as if userspace had requested a PF_UNSPEC dump. For RTM_GETROUTE, that handler is rtnl_dump_all, which calls all the registered RTM_GETROUTE handlers. The requested table id may or may not exist for all of those families. commit `ae677bbb44` ("net: Don't return invalid table id error when dumping all families") fixed the problem when userspace explicitly requests a PF_UNSPEC dump, but missed the fallback case. For example, when we pass ipv6.disable=1 to a kernel with CONFIG_IP_MROUTE=y and CONFIG_IP_MROUTE_MULTIPLE_TABLES=y, the (PF_INET6, RTM_GETROUTE) handler isn't registered, so we end up in rtnl_dump_all, and listing IPv6 routes will unexpectedly print: # ip -6 r Error: ipv4: MR table does not exist. Dump terminated commit `ae677bbb44` introduced the dump_all_families variable, which gets set when userspace requests a PF_UNSPEC dump. However, we can't simply set the family to PF_UNSPEC in rtnetlink_rcv_msg in the fallback case to get dump_all_families == true, because some messages types (for example RTM_GETRULE and RTM_GETNEIGH) only register the PF_UNSPEC handler and use the family to filter in the kernel what is dumped to userspace. We would then export more entries, that userspace would have to filter. iproute does that, but other programs may not. Instead, this patch removes dump_all_families and updates the RTM_GETROUTE handlers to check if the family that is being dumped is their own. When it's not, which covers both the intentional PF_UNSPEC dumps (as dump_all_families did) and the fallback case, ignore the missing table id error. Fixes: `cb167893f4` ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-21 17:25:50 -07:00
Vadim Fedorenko	57ebc8f085	net: ipip: fix wrong address family in init error path In case of error with MPLS support the code is misusing AF_INET instead of AF_MPLS. Fixes: `1b69e7e6c4` ("ipip: support MPLS over IPv4") Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-21 17:24:25 -07:00
Vadim Fedorenko	635d939817	net/tls: free record only on encryption error We cannot free record on any transient error because it leads to losing previos data. Check socket error to know whether record must be freed or not. Fixes: `d10523d0b3` ("net/tls: free the record on encryption error") Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-21 17:20:06 -07:00
Vadim Fedorenko	a7bff11f6f	net/tls: fix encryption error checking bpf_exec_tx_verdict() can return negative value for copied variable. In that case this value will be pushed back to caller and the real error code will be lost. Fix it using signed type and checking for positive value. Fixes: `d10523d0b3` ("net/tls: free the record on encryption error") Fixes: `d3b18ad31f` ("tls: add bpf support to sk_msg handling") Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-21 17:20:06 -07:00
Oleksij Rempel	8066021915	ethtool: provide UAPI for PHY Signal Quality Index (SQI) Signal Quality Index is a mandatory value required by "OPEN Alliance SIG" for the 100Base-T1 PHYs [1]. This indicator can be used for cable integrity diagnostic and investigating other noise sources and implement by at least two vendors: NXP[2] and TI[3]. [1] http://www.opensig.org/download/document/218/Advanced_PHY_features_for_automotive_Ethernet_V1.0.pdf [2] https://www.nxp.com/docs/en/data-sheet/TJA1100.pdf [3] https://www.ti.com/product/DP83TC811R-Q1 Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-21 17:18:00 -07:00
Manivannan Sadhasivam	d28ea1fbbf	net: qrtr: Fix passing invalid reference to qrtr_local_enqueue() Once the traversal of the list is completed with list_for_each_entry(), the iterator (node) will point to an invalid object. So passing this to qrtr_local_enqueue() which is outside of the iterator block is erroneous eventhough the object is not used. So fix this by passing NULL to qrtr_local_enqueue(). Fixes: `bdabad3e36` ("net: Add Qualcomm IPC router") Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-21 17:04:53 -07:00
Chris Mi	d8bed686ab	net: psample: Add tunnel support Currently, psample can only send the packet bits after decapsulation. The tunnel information is lost. Add the tunnel support. If the sampled packet has no tunnel info, the behavior is the same as before. If it has, add a nested metadata field named PSAMPLE_ATTR_TUNNEL and include the tunnel subfields if applicable. Increase the metadata length for sampled packet with the tunnel info. If new subfields of tunnel info should be included, update the metadata length accordingly. Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-21 17:04:07 -07:00
Michal Kubecek	7c87e32d2e	ethtool: count header size in reply size estimate As ethnl_request_ops::reply_size handlers do not include common header size into calculated/estimated reply size, it needs to be added in ethnl_default_doit() and ethnl_default_notify() before allocating the message. On the other hand, strset_reply_size() should not add common header size. Fixes: `728480f124` ("ethtool: default handlers for GET requests") Reported-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-21 16:59:19 -07:00
Jason Gunthorpe	eafd47fc20	Linux 5.7-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl7BzV8eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGg8EH/A2pXMTxtc96RI4S sttEsUQqbakFS0Z/2tQPpMGr/qW2e5eHgsTX/a3SiUeZiIXk6f4lMFkMuctzBf7p X77cNEDwGOEdbtCXTsMcmKSde7sP2zCXsPB8xTWLyE6rnaFRgikwwkeqgkIKhp1h bvOQV0t9HNGvxGAM0iZeOvQAvFl4vd7nS123/MYbir9cugfQUSJRueQ4BiCiJqVE 6cNA7/vFzDJuFGszzIrJ7HXn/IdQMMWHkvTDjgBw0GZw1mDbGFbfbZwOeTz1ojCt smUQ4tIFxBa/VA5zx7dOy2P2keHbSVf4VLkZRPcceT7OqVS65ETmFDp+qt5NdWM5 vZ8+7/0= =CyYH -----END PGP SIGNATURE----- Merge tag 'v5.7-rc6' into rdma.git for-next Linux 5.7-rc6 Conflict in drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c resolved by deleting dr_cq_event, matching how netdev resolved it. Required for dependencies in the following patches. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-21 17:08:27 -03:00
J. Bruce Fields	6670ee2ef2	Merge branch 'nfsd-5.8' of git://linux-nfs.org/~cel/cel-2.6 into for-5.8-incoming Highlights of this series: * Remove serialization of sending RPC/RDMA Replies * Convert the TCP socket send path to use xdr_buf::bvecs (pre-requisite for RPC-on-TLS) * Fix svcrdma backchannel sendto return code * Convert a number of dprintk call sites to use tracepoints * Fix the "suggest braces around empty body in an 'else' statement" warning	2020-05-21 10:58:15 -04:00
Stephen Worley	d69100b8ee	net: nlmsg_cancel() if put fails for nhmsg Fixes data remnant seen when we fail to reserve space for a nexthop group during a larger dump. If we fail the reservation, we goto nla_put_failure and cancel the message. Reproduce with the following iproute2 commands: ===================== ip link add dummy1 type dummy ip link add dummy2 type dummy ip link add dummy3 type dummy ip link add dummy4 type dummy ip link add dummy5 type dummy ip link add dummy6 type dummy ip link add dummy7 type dummy ip link add dummy8 type dummy ip link add dummy9 type dummy ip link add dummy10 type dummy ip link add dummy11 type dummy ip link add dummy12 type dummy ip link add dummy13 type dummy ip link add dummy14 type dummy ip link add dummy15 type dummy ip link add dummy16 type dummy ip link add dummy17 type dummy ip link add dummy18 type dummy ip link add dummy19 type dummy ip link add dummy20 type dummy ip link add dummy21 type dummy ip link add dummy22 type dummy ip link add dummy23 type dummy ip link add dummy24 type dummy ip link add dummy25 type dummy ip link add dummy26 type dummy ip link add dummy27 type dummy ip link add dummy28 type dummy ip link add dummy29 type dummy ip link add dummy30 type dummy ip link add dummy31 type dummy ip link add dummy32 type dummy ip link set dummy1 up ip link set dummy2 up ip link set dummy3 up ip link set dummy4 up ip link set dummy5 up ip link set dummy6 up ip link set dummy7 up ip link set dummy8 up ip link set dummy9 up ip link set dummy10 up ip link set dummy11 up ip link set dummy12 up ip link set dummy13 up ip link set dummy14 up ip link set dummy15 up ip link set dummy16 up ip link set dummy17 up ip link set dummy18 up ip link set dummy19 up ip link set dummy20 up ip link set dummy21 up ip link set dummy22 up ip link set dummy23 up ip link set dummy24 up ip link set dummy25 up ip link set dummy26 up ip link set dummy27 up ip link set dummy28 up ip link set dummy29 up ip link set dummy30 up ip link set dummy31 up ip link set dummy32 up ip link set dummy33 up ip link set dummy34 up ip link set vrf-red up ip link set vrf-blue up ip link set dummyVRFred up ip link set dummyVRFblue up ip ro add 1.1.1.1/32 dev dummy1 ip ro add 1.1.1.2/32 dev dummy2 ip ro add 1.1.1.3/32 dev dummy3 ip ro add 1.1.1.4/32 dev dummy4 ip ro add 1.1.1.5/32 dev dummy5 ip ro add 1.1.1.6/32 dev dummy6 ip ro add 1.1.1.7/32 dev dummy7 ip ro add 1.1.1.8/32 dev dummy8 ip ro add 1.1.1.9/32 dev dummy9 ip ro add 1.1.1.10/32 dev dummy10 ip ro add 1.1.1.11/32 dev dummy11 ip ro add 1.1.1.12/32 dev dummy12 ip ro add 1.1.1.13/32 dev dummy13 ip ro add 1.1.1.14/32 dev dummy14 ip ro add 1.1.1.15/32 dev dummy15 ip ro add 1.1.1.16/32 dev dummy16 ip ro add 1.1.1.17/32 dev dummy17 ip ro add 1.1.1.18/32 dev dummy18 ip ro add 1.1.1.19/32 dev dummy19 ip ro add 1.1.1.20/32 dev dummy20 ip ro add 1.1.1.21/32 dev dummy21 ip ro add 1.1.1.22/32 dev dummy22 ip ro add 1.1.1.23/32 dev dummy23 ip ro add 1.1.1.24/32 dev dummy24 ip ro add 1.1.1.25/32 dev dummy25 ip ro add 1.1.1.26/32 dev dummy26 ip ro add 1.1.1.27/32 dev dummy27 ip ro add 1.1.1.28/32 dev dummy28 ip ro add 1.1.1.29/32 dev dummy29 ip ro add 1.1.1.30/32 dev dummy30 ip ro add 1.1.1.31/32 dev dummy31 ip ro add 1.1.1.32/32 dev dummy32 ip next add id 1 via 1.1.1.1 dev dummy1 ip next add id 2 via 1.1.1.2 dev dummy2 ip next add id 3 via 1.1.1.3 dev dummy3 ip next add id 4 via 1.1.1.4 dev dummy4 ip next add id 5 via 1.1.1.5 dev dummy5 ip next add id 6 via 1.1.1.6 dev dummy6 ip next add id 7 via 1.1.1.7 dev dummy7 ip next add id 8 via 1.1.1.8 dev dummy8 ip next add id 9 via 1.1.1.9 dev dummy9 ip next add id 10 via 1.1.1.10 dev dummy10 ip next add id 11 via 1.1.1.11 dev dummy11 ip next add id 12 via 1.1.1.12 dev dummy12 ip next add id 13 via 1.1.1.13 dev dummy13 ip next add id 14 via 1.1.1.14 dev dummy14 ip next add id 15 via 1.1.1.15 dev dummy15 ip next add id 16 via 1.1.1.16 dev dummy16 ip next add id 17 via 1.1.1.17 dev dummy17 ip next add id 18 via 1.1.1.18 dev dummy18 ip next add id 19 via 1.1.1.19 dev dummy19 ip next add id 20 via 1.1.1.20 dev dummy20 ip next add id 21 via 1.1.1.21 dev dummy21 ip next add id 22 via 1.1.1.22 dev dummy22 ip next add id 23 via 1.1.1.23 dev dummy23 ip next add id 24 via 1.1.1.24 dev dummy24 ip next add id 25 via 1.1.1.25 dev dummy25 ip next add id 26 via 1.1.1.26 dev dummy26 ip next add id 27 via 1.1.1.27 dev dummy27 ip next add id 28 via 1.1.1.28 dev dummy28 ip next add id 29 via 1.1.1.29 dev dummy29 ip next add id 30 via 1.1.1.30 dev dummy30 ip next add id 31 via 1.1.1.31 dev dummy31 ip next add id 32 via 1.1.1.32 dev dummy32 i=100 while [ $i -le 200 ] do ip next add id $i group 1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19 echo $i ((i++)) done ip next add id 999 group 1/2/3/4/5/6 ip next ls ======================== Fixes: `ab84be7e54` ("net: Initial nexthop code") Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-20 21:00:40 -07:00
Eric Dumazet	687775cec0	ax25: fix setsockopt(SO_BINDTODEVICE) syzbot was able to trigger this trace [1], probably by using a zero optlen. While we are at it, cap optlen to IFNAMSIZ - 1 instead of IFNAMSIZ. [1] BUG: KMSAN: uninit-value in strnlen+0xf9/0x170 lib/string.c:569 CPU: 0 PID: 8807 Comm: syz-executor483 Not tainted 5.7.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:121 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 strnlen+0xf9/0x170 lib/string.c:569 dev_name_hash net/core/dev.c:207 [inline] netdev_name_node_lookup net/core/dev.c:277 [inline] __dev_get_by_name+0x75/0x2b0 net/core/dev.c:778 ax25_setsockopt+0xfa3/0x1170 net/ax25/af_ax25.c:654 __compat_sys_setsockopt+0x4ed/0x910 net/compat.c:403 __do_compat_sys_setsockopt net/compat.c:413 [inline] __se_compat_sys_setsockopt+0xdd/0x100 net/compat.c:410 __ia32_compat_sys_setsockopt+0x62/0x80 net/compat.c:410 do_syscall_32_irqs_on arch/x86/entry/common.c:339 [inline] do_fast_syscall_32+0x3bf/0x6d0 arch/x86/entry/common.c:398 entry_SYSENTER_compat+0x68/0x77 arch/x86/entry/entry_64_compat.S:139 RIP: 0023:0xf7f57dd9 Code: 90 e8 0b 00 00 00 f3 90 0f ae e8 eb f9 8d 74 26 00 89 3c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 RSP: 002b:00000000ffae8c1c EFLAGS: 00000217 ORIG_RAX: 000000000000016e RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000000101 RDX: 0000000000000019 RSI: 0000000020000000 RDI: 0000000000000004 RBP: 0000000000000012 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Local variable ----devname@ax25_setsockopt created at: ax25_setsockopt+0xe6/0x1170 net/ax25/af_ax25.c:536 ax25_setsockopt+0xe6/0x1170 net/ax25/af_ax25.c:536 Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-20 20:59:07 -07:00
Al Viro	0edecc020b	atm: switch do_atmif_sioc() to direct use of atm_dev_ioctl() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 20:31:36 -04:00
Al Viro	8cacb41659	atm: lift copyin from atm_dev_ioctl() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 20:31:35 -04:00
Al Viro	36085049bc	atm: switch do_atm_iobuf() to direct use of atm_getnames() ... and sod the compat_alloc_user_space() with its complications Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 20:31:35 -04:00
Al Viro	a3929484af	atm: move copyin from atm_getnames() into the caller Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 20:31:34 -04:00
Al Viro	8c2348e36a	atm: separate ATM_GETNAMES handling from the rest of atm_dev_ioctl() atm_dev_ioctl() does copyin in two different ways - one for ATM_GETNAMES, another for everything else. Start with separating the former into a new helper (atm_getnames()). The next step will be to lift the copyin into the callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 20:31:33 -04:00
Al Viro	38c53ca3c1	batadv_socket_read(): get rid of pointless access_ok() address is passed only to copy_to_user() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 20:31:33 -04:00
Al Viro	bbced07d99	get rid of compat_mc_setsockopt() not used anymore Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 20:31:32 -04:00
Al Viro	b212c322c8	handle the group_source_req options directly Native ->setsockopt() handling of these options (MCAST_..._SOURCE_GROUP and MCAST_{,UN}BLOCK_SOURCE) consists of copyin + call of a helper that does the actual work. The only change needed for ->compat_setsockopt() is a slightly different copyin - the helpers can be reused as-is. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 20:31:32 -04:00
Al Viro	fcfa0b09d3	ipv6: take handling of group_source_req options into a helper Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 20:31:31 -04:00
Al Viro	2bbf8c1ead	ipv4: take handling of group_source_req options into a helper Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 20:31:31 -04:00
Al Viro	2f984f11fd	ipv[46]: do compat setsockopt for MCAST_{JOIN,LEAVE}_GROUP directly direct parallel to the way these two are handled in the native ->setsockopt() instances - the helpers that do the real work are already separated and can be reused as-is in this case. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 20:31:30 -04:00
Al Viro	168a2cca81	ipv6: do compat setsockopt for MCAST_MSFILTER directly similar to the ipv4 counterpart of that patch - the same trick used to align the tail array properly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 20:31:29 -04:00
Al Viro	d59eb177c8	ip6_mc_msfilter(): pass the address list separately that way we'll be able to reuse it for compat case Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 20:31:29 -04:00
Al Viro	2e04172875	ipv4: do compat setsockopt for MCAST_MSFILTER directly Parallel to what the native setsockopt() does, except that unlike the native setsockopt() we do not use memdup_user() - we want the sockaddr_storage fields properly aligned, so we allocate 4 bytes more and copy compat_group_filter at the offset 4, which yields the proper alignments. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 20:31:28 -04:00
Al Viro	e986d4dabc	set_mcast_msfilter(): take the guts of setsockopt(MCAST_MSFILTER) into a helper Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 20:31:28 -04:00
Al Viro	0dfe6581a7	get rid of compat_mc_getsockopt() now we can do MCAST_MSFILTER in compat ->getsockopt() without playing silly buggers with copying things back and forth. We can form a native struct group_filter (sans the variable-length tail) on stack, pass that + pointer to the tail of original request to the helper doing the bulk of the work, then do the rest of copyout - same as the native getsockopt() does. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 20:31:27 -04:00
Al Viro	931ca7ab7f	ip*_mc_gsfget(): lift copyout of struct group_filter into callers pass the userland pointer to the array in its tail, so that part gets copied out by our functions; copyout of everything else is done in the callers. Rationale: reuse for compat; the array is the same in native and compat, the layout of parts before it is different for compat. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 20:31:27 -04:00
Al Viro	e9c375fb5e	compat_ip{,v6}_setsockopt(): enumerate MCAST_... options explicitly We want to check if optname is among the MCAST_... ones; do that as an explicit switch. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 20:31:26 -04:00
Al Viro	63287de66d	lift compat definitions of mcast [sg]etsockopt requests into net/compat.h We want to get rid of compat_mc_[sg]etsockopt() and to have that stuff handled without compat_alloc_user_space(), extra copying through userland, etc. To do that we'll need ipv4 and ipv6 instances of ->compat_[sg]etsockopt() to manipulate the 32bit variants of mcast requests, so we need to move the definitions of those out of net/compat.c and into a public header. This patch just does a mechanical move to include/net/compat.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 20:31:25 -04:00
Chuck Lever	8954c5c212	SUNRPC: Clean up request deferral tracepoints - Rename these so they are easy to enable and search for as a set - Move the tracepoints to get a more accurate sense of control flow - Tracepoints should not fire on xprt shutdown - Display memory address in case data structure had been corrupted - Abandon dprintk in these paths I haven't ever gotten one of these tracepoints to trigger. I wonder if we should simply remove them. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-20 17:30:44 -04:00
Chuck Lever	fff1ebb269	SUNRPC: Restructure svc_udp_recvfrom() Clean up. At this point, we are not ready yet to support bio_vecs in the UDP transport implementation. However, we can clean up svc_udp_recvfrom() to match the tracing and straight-lining recently changes made in svc_tcp_recvfrom(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-20 17:30:24 -04:00
Chuck Lever	ca07eda33e	SUNRPC: Refactor svc_recvfrom() This function is not currently "generic" so remove the documenting comment and rename it appropriately. Its internals are converted to use bio_vecs for reading from the transport socket. In existing typical sunrpc uses of bio_vecs, the bio_vec array is allocated dynamically. Here, instead, an array of bio_vecs is added to svc_rqst. The lifetime of this array can be greater than one call to xpo_recvfrom(): - Multiple calls to xpo_recvfrom() might be needed to read an RPC message completely. - At some later point, rq_arg.bvecs will point to this array and it will carry the received message into svc_process(). I also expect that a future optimization will remove either the rq_vec or rq_pages array in favor of rq_bvec, thus conserving the size of struct svc_rqst. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-20 17:30:12 -04:00
John Hubbard	f78cdbd75a	rds: fix crash in rds_info_getsockopt() The conversion to pin_user_pages() had a bug: it overlooked the case of allocation of pages failing. Fix that by restoring an equivalent check. Reported-by: syzbot+118ac0af4ac7f785a45b@syzkaller.appspotmail.com Fixes: `dbfe7d7437` ("rds: convert get_user_pages() --> pin_user_pages()") Cc: David S. Miller <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-rdma@vger.kernel.org Cc: rds-devel@oss.oracle.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-20 14:08:06 -07:00
Chuck Lever	cca557a5a6	SUNRPC: Clean up svc_release_skb() functions Rename these functions using the convention used for other xpo method entry points. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-20 15:44:18 -04:00
Chuck Lever	6be8c59491	SUNRPC: Refactor recvfrom path dealing with incomplete TCP receives Clean up: move exception processing out of the main path. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-20 15:43:35 -04:00
Chuck Lever	7dae1dd726	SUNRPC: Replace dprintk() call sites in TCP receive path Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-20 15:30:51 -04:00
Christoph Hellwig	b8d9e7f241	fs: make the pipe_buf_operations ->confirm operation optional Just return 0 for success if it is not present. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 12:11:26 -04:00
Christoph Hellwig	76887c2567	fs: make the pipe_buf_operations ->steal operation optional Just return 1 for failure if it is not present. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-20 12:11:26 -04:00
Luiz Augusto von Dentz	755dfcbca8	Bluetooth: Fix assuming EIR flags can result in SSP authentication EIR flags should just hint if SSP may be supported but we shall verify this with use of the actual features as the SSP bits may be disabled in the lower layers which would result in legacy authentication to be used. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-05-20 16:33:43 +02:00
David Howells	441fdee1ea	rxrpc: Fix ack discard The Rx protocol has a "previousPacket" field in it that is not handled in the same way by all protocol implementations. Sometimes it contains the serial number of the last DATA packet received, sometimes the sequence number of the last DATA packet received and sometimes the highest sequence number so far received. AF_RXRPC is using this to weed out ACKs that are out of date (it's possible for ACK packets to get reordered on the wire), but this does not work with OpenAFS which will just stick the sequence number of the last packet seen into previousPacket. The issue being seen is that big AFS FS.StoreData RPC (eg. of ~256MiB) are timing out when partly sent. A trace was captured, with an additional tracepoint to show ACKs being discarded in rxrpc_input_ack(). Here's an excerpt showing the problem. 52873.203230: rxrpc_tx_data: c=000004ae DATA ed1a3584:00000002 0002449c q=00024499 fl=09 A DATA packet with sequence number 00024499 has been transmitted (the "q=" field). ... 52873.243296: rxrpc_rx_ack: c=000004ae 00012a2b DLY r=00024499 f=00024497 p=00024496 n=0 52873.243376: rxrpc_rx_ack: c=000004ae 00012a2c IDL r=0002449b f=00024499 p=00024498 n=0 52873.243383: rxrpc_rx_ack: c=000004ae 00012a2d OOS r=0002449d f=00024499 p=0002449a n=2 The Out-Of-Sequence ACK indicates that the server didn't see DATA sequence number 00024499, but did see seq 0002449a (previousPacket, shown as "p=", skipped the number, but firstPacket, "f=", which shows the bottom of the window is set at that point). 52873.252663: rxrpc_retransmit: c=000004ae q=24499 a=02 xp=14581537 52873.252664: rxrpc_tx_data: c=000004ae DATA ed1a3584:00000002 000244bc q=00024499 fl=0b RETRANS The packet has been retransmitted. Retransmission recurs until the peer says it got the packet. 52873.271013: rxrpc_rx_ack: c=000004ae 00012a31 OOS r=000244a1 f=00024499 p=0002449e n=6 More OOS ACKs indicate that the other packets that are already in the transmission pipeline are being received. The specific-ACK list is up to 6 ACKs and NAKs. ... 52873.284792: rxrpc_rx_ack: c=000004ae 00012a49 OOS r=000244b9 f=00024499 p=000244b6 n=30 52873.284802: rxrpc_retransmit: c=000004ae q=24499 a=0a xp=63505500 52873.284804: rxrpc_tx_data: c=000004ae DATA ed1a3584:00000002 000244c2 q=00024499 fl=0b RETRANS 52873.287468: rxrpc_rx_ack: c=000004ae 00012a4a OOS r=000244ba f=00024499 p=000244b7 n=31 52873.287478: rxrpc_rx_ack: c=000004ae 00012a4b OOS r=000244bb f=00024499 p=000244b8 n=32 At this point, the server's receive window is full (n=32) with presumably 1 NAK'd packet and 31 ACK'd packets. We can't transmit any more packets. 52873.287488: rxrpc_retransmit: c=000004ae q=24499 a=0a xp=61327980 52873.287489: rxrpc_tx_data: c=000004ae DATA ed1a3584:00000002 000244c3 q=00024499 fl=0b RETRANS 52873.293850: rxrpc_rx_ack: c=000004ae 00012a4c DLY r=000244bc f=000244a0 p=00024499 n=25 And now we've received an ACK indicating that a DATA retransmission was received. 7 packets have been processed (the occupied part of the window moved, as indicated by f= and n=). 52873.293853: rxrpc_rx_discard_ack: c=000004ae r=00012a4c 000244a0<00024499 00024499<000244b8 However, the DLY ACK gets discarded because its previousPacket has gone backwards (from p=000244b8, in the ACK at 52873.287478 to p=00024499 in the ACK at 52873.293850). We then end up in a continuous cycle of retransmit/discard. kafs fails to update its window because it's discarding the ACKs and can't transmit an extra packet that would clear the issue because the window is full. OpenAFS doesn't change the previousPacket value in the ACKs because no new DATA packets are received with a different previousPacket number. Fix this by altering the discard check to only discard an ACK based on previousPacket if there was no advance in the firstPacket. This allows us to transmit a new packet which will cause previousPacket to advance in the next ACK. The check, however, needs to allow for the possibility that previousPacket may actually have had the serial number placed in it instead - in which case it will go outside the window and we should ignore it. Fixes: `1a2391c30c` ("rxrpc: Fix detection of out of order acks") Reported-by: Dave Botsch <botsch@cnf.cornell.edu> Signed-off-by: David Howells <dhowells@redhat.com>	2020-05-20 15:33:41 +01:00
David Howells	d1f129470e	rxrpc: Trace discarded ACKs Add a tracepoint to track received ACKs that are discarded due to being outside of the Tx window. Signed-off-by: David Howells <dhowells@redhat.com>	2020-05-20 15:33:41 +01:00
Luiz Augusto von Dentz	3ca44c16b0	Bluetooth: Consolidate encryption handling in hci_encrypt_cfm This makes hci_encrypt_cfm calls hci_connect_cfm in case the connection state is BT_CONFIG so callers don't have to check the state. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-05-20 16:30:33 +02:00
Eric Dumazet	4f65e2f483	net: unexport skb_gro_receive() skb_gro_receive() used to be used by SCTP, it is no longer the case. skb_gro_receive_list() is in the same category : never used from modules. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-19 19:55:36 -07:00
Neil Horman	20a785aa52	sctp: Don't add the shutdown timer if its already been added This BUG halt was reported a while back, but the patch somehow got missed: PID: 2879 TASK: c16adaa0 CPU: 1 COMMAND: "sctpn" #0 [f418dd28] crash_kexec at c04a7d8c #1 [f418dd7c] oops_end at c0863e02 #2 [f418dd90] do_invalid_op at c040aaca #3 [f418de28] error_code (via invalid_op) at c08631a5 EAX: f34baac0 EBX: 00000090 ECX: f418deb0 EDX: f5542950 EBP: 00000000 DS: 007b ESI: f34ba800 ES: 007b EDI: f418dea0 GS: 00e0 CS: 0060 EIP: c046fa5e ERR: ffffffff EFLAGS: 00010286 #4 [f418de5c] add_timer at c046fa5e #5 [f418de68] sctp_do_sm at f8db8c77 [sctp] #6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp] #7 [f418df48] inet_shutdown at c080baf9 #8 [f418df5c] sys_shutdown at c079eedf #9 [f418df70] sys_socketcall at c079fe88 EAX: ffffffda EBX: 0000000d ECX: bfceea90 EDX: 0937af98 DS: 007b ESI: 0000000c ES: 007b EDI: b7150ae4 SS: 007b ESP: bfceea7c EBP: bfceeaa8 GS: 0033 CS: 0073 EIP: b775c424 ERR: 00000066 EFLAGS: 00000282 It appears that the side effect that starts the shutdown timer was processed multiple times, which can happen as multiple paths can trigger it. This of course leads to the BUG halt in add_timer getting called. Fix seems pretty straightforward, just check before the timer is added if its already been started. If it has mod the timer instead to min(current expiration, new expiration) Its been tested but not confirmed to fix the problem, as the issue has only occured in production environments where test kernels are enjoined from being installed. It appears to be a sane fix to me though. Also, recentely, Jere found a reproducer posted on list to confirm that this resolves the issues Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: jere.leppanen@nokia.com CC: marcelo.leitner@gmail.com CC: netdev@vger.kernel.org Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-19 15:46:52 -07:00
Christoph Hellwig	8e3db0bbb2	ipv6: use ->ndo_tunnel_ctl in addrconf_set_dstaddr Use the new ->ndo_tunnel_ctl instead of overriding the address limit and using ->ndo_do_ioctl just to do a pointless user copy. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-19 15:45:12 -07:00
Christoph Hellwig	68ad6886dd	ipv6: streamline addrconf_set_dstaddr Factor out a addrconf_set_sit_dstaddr helper for the actual work if we found a SIT device, and only hold the rtnl lock around the device lookup and that new helper, as there is no point in holding it over a copy_from_user call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-19 15:45:12 -07:00
Christoph Hellwig	f098846044	ipv6: stub out even more of addrconf_set_dstaddr if SIT is disabled There is no point in copying the structure from userspace or looking up a device if SIT support is not disabled and we'll eventually return -ENODEV anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-19 15:45:12 -07:00
Christoph Hellwig	f60fe2df93	sit: impement ->ndo_tunnel_ctl Implement the ->ndo_tunnel_ctl method, and use ip_tunnel_ioctl to handle userspace requests for the SIOCGETTUNNEL, SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-19 15:45:12 -07:00
Christoph Hellwig	fd5d687b76	sit: refactor ipip6_tunnel_ioctl Split the ioctl handler into one function per command instead of having a all the logic sit in one giant switch statement. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-19 15:45:12 -07:00
Christoph Hellwig	c7e3670516	impr: use ->ndo_tunnel_ctl in ipmr_new_tunnel Use the new ->ndo_tunnel_ctl instead of overriding the address limit and using ->ndo_do_ioctl just to do a pointless user copy. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-19 15:45:12 -07:00
Christoph Hellwig	607259a695	net: add a new ndo_tunnel_ioctl method This method is used to properly allow kernel callers of the IPv4 route management ioctls. The exsting ip_tunnel_ioctl helper is renamed to ip_tunnel_ctl to better reflect that it doesn't directly implement ioctls touching user memory, and is used for the guts of ndo_tunnel_ctl implementations. A new ip_tunnel_ioctl helper is added that can be wired up directly to the ndo_do_ioctl method and takes care of the copy to and from userspace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-19 15:45:11 -07:00
Christoph Hellwig	c1fd1182c4	ipv4: consolidate the VIFF_TUNNEL handling in ipmr_new_tunnel Also move the dev_set_allmulti call and the error handling into the ioctl helper. This allows reusing already looked up tunnel_dev pointer and the set up argument structure for the deletion in the error handler. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-19 15:45:11 -07:00
Christoph Hellwig	c384b8a70c	ipv4: streamline ipmr_new_tunnel Reduce a few level of indentation to simplify the function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-19 15:45:11 -07:00
Boris Sukholitko	c0bbbdc32f	__netif_receive_skb_core: pass skb by reference __netif_receive_skb_core may change the skb pointer passed into it (e.g. in rx_handler). The original skb may be freed as a result of this operation. The callers of __netif_receive_skb_core may further process original skb by using pt_prev pointer returned by __netif_receive_skb_core thus leading to unpleasant effects. The solution is to pass skb by reference into __netif_receive_skb_core. v2: Added Fixes tag and comment regarding ppt_prev and skb invariant. Fixes: `88eb1944e1` ("net: core: propagate SKB lists through packet_type lookup") Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-19 15:38:00 -07:00
Martin KaFai Lau	88d7fcfa3b	net: inet_csk: Fix so_reuseport bind-address cache in tb->fast* The commit `637bc8bbe6` ("inet: reset tb->fastreuseport when adding a reuseport sk") added a bind-address cache in tb->fast. The tb->fast caches the address of a sk which has successfully been binded with SO_REUSEPORT ON. The idea is to avoid the expensive conflict search in inet_csk_bind_conflict(). There is an issue with wildcard matching where sk_reuseport_match() should have returned false but it is currently returning true. It ends up hiding bind conflict. For example, bind("[::1]:443"); /* without SO_REUSEPORT. Succeed. / bind("[::2]:443"); / with SO_REUSEPORT. Succeed. / bind("[::]:443"); / with SO_REUSEPORT. Still Succeed where it shouldn't / The last bind("[::]:443") with SO_REUSEPORT on should have failed because it should have a conflict with the very first bind("[::1]:443") which has SO_REUSEPORT off. However, the address "[::2]" is cached in tb->fast in the second bind. In the last bind, the sk_reuseport_match() returns true because the binding sk's wildcard addr "[::]" matches with the "[::2]" cached in tb->fast. The correct bind conflict is reported by removing the second bind such that tb->fast cache is not involved and forces the bind("[::]:443") to go through the inet_csk_bind_conflict(): bind("[::1]:443"); /* without SO_REUSEPORT. Succeed. / bind("[::]:443"); / with SO_REUSEPORT. -EADDRINUSE / The expected behavior for sk_reuseport_match() is, it should only allow the "cached" tb->fast address to be used as a wildcard match but not the address of the binding sk. To do that, the current "bool match_wildcard" arg is split into "bool match_sk1_wildcard" and "bool match_sk2_wildcard". This change only affects the sk_reuseport_match() which is only used by inet_csk (e.g. TCP). The other use cases are calling inet_rcv_saddr_equal() and this patch makes it pass the same "match_wildcard" arg twice to the "ipv[46]_rcv_saddr_equal(..., match_wildcard, match_wildcard)". Cc: Josef Bacik <jbacik@fb.com> Fixes: `637bc8bbe6` ("inet: reset tb->fastreuseport when adding a reuseport sk") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-19 15:36:21 -07:00
Julian Wiedmann	e9a36ca5f6	net/af_iucv: clean up function prototypes Remove a bunch of forward declarations (trivially shifting code around where needed), and make a few functions static. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-19 12:50:14 -07:00
Julian Wiedmann	dca1262f97	net/af_iucv: remove a redundant zero initialization txmsg is declared as {0}, no need to clear individual fields later on. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-19 12:50:14 -07:00
Julian Wiedmann	0d1c7664ed	net/af_iucv: replace open-coded U16_MAX Improve the readability of a range check. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-19 12:50:14 -07:00
Julian Wiedmann	585bc22095	net/af_iucv: remove pm support commit `394216275c` ("s390: remove broken hibernate / power management support") removed support for ARCH_HIBERNATION_POSSIBLE from s390. So drop the unused pm ops from the s390-only af_iucv socket code. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-19 12:50:14 -07:00
Julian Wiedmann	4b32f86bf1	net/iucv: remove pm support commit `394216275c` ("s390: remove broken hibernate / power management support") removed support for ARCH_HIBERNATION_POSSIBLE from s390. So drop the unused pm ops from the s390-only iucv bus driver. CC: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-19 12:50:14 -07:00
Todd Malsbary	12555a2d97	mptcp: use rightmost 64 bits in ADD_ADDR HMAC This changes the HMAC used in the ADD_ADDR option from the leftmost 64 bits to the rightmost 64 bits as described in RFC 8684, section 3.4.1. This issue was discovered while adding support to packetdrill for the ADD_ADDR v1 option. Fixes: `3df523ab58` ("mptcp: Add ADD_ADDR handling") Signed-off-by: Todd Malsbary <todd.malsbary@linux.intel.com> Acked-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-19 12:35:51 -07:00
Daniel Borkmann	1b66d25361	bpf: Add get{peer, sock}name attach types for sock_addr As stated in `983695fa67` ("bpf: fix unconnected udp hooks"), the objective for the existing cgroup connect/sendmsg/recvmsg/bind BPF hooks is to be transparent to applications. In Cilium we make use of these hooks [0] in order to enable E-W load balancing for existing Kubernetes service types for all Cilium managed nodes in the cluster. Those backends can be local or remote. The main advantage of this approach is that it operates as close as possible to the socket, and therefore allows to avoid packet-based NAT given in connect/sendmsg/recvmsg hooks we only need to xlate sock addresses. This also allows to expose NodePort services on loopback addresses in the host namespace, for example. As another advantage, this also efficiently blocks bind requests for applications in the host namespace for exposed ports. However, one missing item is that we also need to perform reverse xlation for inet{,6}_getname() hooks such that we can return the service IP/port tuple back to the application instead of the remote peer address. The vast majority of applications does not bother about getpeername(), but in a few occasions we've seen breakage when validating the peer's address since it returns unexpectedly the backend tuple instead of the service one. Therefore, this trivial patch allows to customise and adds a getpeername() as well as getsockname() BPF cgroup hook for both IPv4 and IPv6 in order to address this situation. Simple example: # ./cilium/cilium service list ID Frontend Service Type Backend 1 1.2.3.4:80 ClusterIP 1 => 10.0.0.10:80 Before; curl's verbose output example, no getpeername() reverse xlation: # curl --verbose 1.2.3.4 * Rebuilt URL to: 1.2.3.4/ * Trying 1.2.3.4... * TCP_NODELAY set * Connected to 1.2.3.4 (10.0.0.10) port 80 (#0) > GET / HTTP/1.1 > Host: 1.2.3.4 > User-Agent: curl/7.58.0 > Accept: / [...] After; with getpeername() reverse xlation: # curl --verbose 1.2.3.4 * Rebuilt URL to: 1.2.3.4/ * Trying 1.2.3.4... * TCP_NODELAY set * Connected to 1.2.3.4 (1.2.3.4) port 80 (#0) > GET / HTTP/1.1 > Host: 1.2.3.4 > User-Agent: curl/7.58.0 > Accept: / [...] Originally, I had both under a BPF_CGROUP_INET{4,6}_GETNAME type and exposed peer to the context similar as in inet{,6}_getname() fashion, but API-wise this is suboptimal as it always enforces programs having to test for ctx->peer which can easily be missed, hence BPF_CGROUP_INET{4,6}_GET{PEER,SOCK}NAME split. Similarly, the checked return code is on tnum_range(1, 1), but if a use case comes up in future, it can easily be changed to return an error code instead. Helper and ctx member access is the same as with connect/sendmsg/etc hooks. [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.c Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Andrey Ignatov <rdna@fb.com> Link: https://lore.kernel.org/bpf/61a479d759b2482ae3efb45546490bacd796a220.1589841594.git.daniel@iogearbox.net	2020-05-19 11:32:04 -07:00
Jesper Dangaard Brouer	d800bad67d	bpf: Fix too large copy from user in bpf_test_init Commit `bc56c919fc` ("bpf: Add xdp.frame_sz in bpf_prog_test_run_xdp().") recently changed bpf_prog_test_run_xdp() to use larger frames for XDP in order to test tail growing frames (via bpf_xdp_adjust_tail) and to have memory backing frame better resemble drivers. The commit contains a bug, as it tries to copy the max data size from userspace, instead of the size provided by userspace. This cause XDP unit tests to fail sporadically with EFAULT, an unfortunate behavior. The fix is to only copy the size specified by userspace. Fixes: `bc56c919fc` ("bpf: Add xdp.frame_sz in bpf_prog_test_run_xdp().") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/158980712729.256597.6115007718472928659.stgit@firesoul	2020-05-19 17:56:34 +02:00
Alexey Gladkov	9d78edeaec	proc: proc_pid_ns takes super_block as an argument syzbot found that touch /proc/testfile causes NULL pointer dereference at tomoyo_get_local_path() because inode of the dentry is NULL. Before `c59f415a7c`, Tomoyo received pid_ns from proc's s_fs_info directly. Since proc_pid_ns() can only work with inode, using it in the tomoyo_get_local_path() was wrong. To avoid creating more functions for getting proc_ns, change the argument type of the proc_pid_ns() function. Then, Tomoyo can use the existing super_block to get pid_ns. Link: https://lkml.kernel.org/r/0000000000002f0c7505a5b0e04c@google.com Link: https://lkml.kernel.org/r/20200518180738.2939611-1-gladkov.alexey@gmail.com Reported-by: syzbot+c1af344512918c61362c@syzkaller.appspotmail.com Fixes: `c59f415a7c` ("Use proc_pid_ns() to get pid_namespace from the proc superblock") Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2020-05-19 07:07:50 -05:00
Christoph Hellwig	dc13c8761c	ipv4,appletalk: move SIOCADDRT and SIOCDELRT handling into ->compat_ioctl To prepare removing the global routing_ioctl hack start lifting the code into the ipv4 and appletalk ->compat_ioctl handlers. Unlike the existing handler we don't bother copying in the name - there are no compat issues for char arrays. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-18 17:35:02 -07:00
Christoph Hellwig	a500492354	appletalk: factor out a atrtr_ioctl_addrt helper Add a helper than can be shared with the upcoming compat ioctl handler. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-18 17:35:02 -07:00
Christoph Hellwig	3986912f6a	ipv6: move SIOCADDRT and SIOCDELRT handling into ->compat_ioctl To prepare removing the global routing_ioctl hack start lifting the code into a newly added ipv6 ->compat_ioctl handler. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-18 17:35:02 -07:00
Christoph Hellwig	7c1552da90	ipv6: lift copy_from_user out of ipv6_route_ioctl Prepare for better compat ioctl handling by moving the user copy out of ipv6_route_ioctl. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-18 17:35:02 -07:00
Chuck Lever	a5cda73e49	SUNRPC: Restructure svc_tcp_recv_record() Refactor: svc_recvfrom() is going to be converted to read into bio_vecs in a moment. Unhook the only other caller, svc_tcp_recv_record(), which just wants to read the 4-byte stream record marker into a kvec. While we're in the area, streamline this helper by straight-lining the hot path, replace dprintk call sites with tracepoints, and reduce the number of atomic bit operations in this path. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-18 10:21:23 -04:00
Chuck Lever	02648908d1	SUNRPC: Rename svc_sock::sk_reclen Clean up. I find the name of the svc_sock::sk_reclen field confusing, so I've changed it to better reflect its function. This field is not read directly to get the record length. Rather, it is a buffer containing a record marker that needs to be decoded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-18 10:21:23 -04:00
Chuck Lever	b4af59328c	SUNRPC: Trace server-side rpcbind registration events Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-18 10:21:22 -04:00
Chuck Lever	a0469f46fa	SUNRPC: Replace dprintk call sites in TCP state change callouts Report TCP socket state changes and accept failures via tracepoints, replacing dprintk() call sites. No tracepoint is added in svc_tcp_listen_data_ready. There's no information available there that isn't also reported by the svcsock_new_socket and the accept failure tracepoints. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-18 10:21:22 -04:00
Chuck Lever	998024dee1	SUNRPC: Add more svcsock tracepoints In addition to tracing recently-updated socket sendto events, this commit adds a trace event class that can be used for additional svcsock-related tracepoints in subsequent patches. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-18 10:21:22 -04:00
Chuck Lever	d998882b4b	SUNRPC: Remove "#include <trace/events/skb.h>" Clean up: Commit `850cbaddb5` ("udp: use it's own memory accounting schema") removed the last skb-related tracepoint from svcsock.c, so it is no longer necessary to include trace/events/skb.h. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-18 10:21:22 -04:00
Chuck Lever	11bbb0f76e	SUNRPC: Trace a few more generic svc_xprt events In lieu of dprintks or tracepoints in each individual transport implementation, introduce tracepoints in the generic part of the RPC layer. These typically fire for connection lifetime events, so shouldn't contribute a lot of noise. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-18 10:21:22 -04:00
Chuck Lever	4b8f380e46	SUNRPC: Tracepoint to record errors in svc_xpo_create() Capture transport creation failures. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-18 10:21:22 -04:00
Chuck Lever	e979a173a0	svcrdma: Add tracepoints to report ->xpo_accept failures Failure to accept a connection is typically due to a problem specific to a transport type. Also, ->xpo_accept returns NULL on error rather than reporting a specific problem. So, add failure-specific tracepoints in svc_rdma_accept(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-18 10:21:22 -04:00
Chuck Lever	decc13f7eb	svcrdma: Displayed remote IP address should match stored address Clean up: After commit `1e091c3bbf` ("svcrdma: Ignore source port when computing DRC hash"), the IP address stored in xpt_remote always has a port number of zero. Thus, there's no need to display the port number when displaying the IP address of a remote NFS/RDMA client. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-18 10:21:22 -04:00
Chuck Lever	27ce629444	svcrdma: Rename tracepoints that record header decoding errors Clean up: Use a consistent naming convention so that these trace points can be enabled quickly via a glob. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-18 10:21:21 -04:00
Chuck Lever	f5046b8f43	svcrdma: Remove backchannel dprintk call sites Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-18 10:21:21 -04:00
Chuck Lever	ea740bd5f5	svcrdma: Fix backchannel return code Way back when I was writing the RPC/RDMA server-side backchannel code, I misread the TCP backchannel reply handler logic. When svc_tcp_recvfrom() successfully receives a backchannel reply, it does not return -EAGAIN. It sets XPT_DATA and returns zero. Update svc_rdma_recvfrom() to return zero. Here, XPT_DATA doesn't need to be set again: it is set whenever a new message is received, behind a spin lock in a single threaded context. Also, if handling the cb reply is not successful, the message is simply dropped. There's no special message framing to deal with as there is in the TCP case. Now that the handle_bc_reply() return value is ignored, I've removed the dprintk call sites in the error exit of handle_bc_reply() in favor of trace points in other areas that already report the error cases. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-18 10:21:21 -04:00
Chuck Lever	dbc17acd5d	svcrdma: trace undersized Write chunks Clean up: Replace a dprintk call site. This is the last remaining dprintk call site in svc_rdma_rw.c, so remove dprintk infrastructure as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-18 10:21:21 -04:00
Chuck Lever	9d20063892	svcrdma: Trace page overruns when constructing RDMA Reads Clean up: Replace a dprintk call site with a tracepoint. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-18 10:21:21 -04:00
Chuck Lever	f4e53e1ce3	svcrdma: Clean up handling of get_rw_ctx errors Clean up: Replace two dprintk call sites with a tracepoint. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-18 10:21:21 -04:00
Chuck Lever	2abfbe7e3a	svcrdma: Clean up the tracing for rw_ctx_init errors - De-duplicate code - Rename the tracepoint with "_err" to allow enabling via glob - Report the sg_cnt for the failing rw_ctx - Fix a dumb signage issue Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-18 10:21:21 -04:00
Chuck Lever	ca4faf543a	SUNRPC: Move xpt_mutex into socket xpo_sendto methods It appears that the RPC/RDMA transport does not need serialization of calls to its xpo_sendto method. Move the mutex into the socket methods that still need that serialization. Tail latencies are unambiguously better with this patch applied. fio randrw 8KB 70/30 on NFSv3, smaller numbers are better: clat percentiles (usec): With xpt_mutex: r \| 99.99th=[ 8848] w \| 99.99th=[ 9634] Without xpt_mutex: r \| 99.99th=[ 8586] w \| 99.99th=[ 8979] Serializing the construction of RPC/RDMA transport headers is not really necessary at this point, because the Linux NFS server implementation never changes its credit grant on a connection. If that should change, then svc_rdma_sendto will need to serialize access to the transport's credit grant fields. Reported-by: kbuild test robot <lkp@intel.com> [ cel: fix uninitialized variable warning ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-05-18 10:21:21 -04:00
Hsin-Yu Chao	56b5453a86	Bluetooth: Add SCO fallback for invalid LMP parameters error Bluetooth PTS test case HFP/AG/ACC/BI-12-I accepts SCO connection with invalid parameter at the first SCO request expecting AG to attempt another SCO request with the use of "safe settings" for given codec, base on section 5.7.1.2 of HFP 1.7 specification. This patch addresses it by adding "Invalid LMP Parameters" (0x1e) to the SCO fallback case. Verified with below log: < HCI Command: Setup Synchronous Connection (0x01\|0x0028) plen 17 Handle: 256 Transmit bandwidth: 8000 Receive bandwidth: 8000 Max latency: 13 Setting: 0x0003 Input Coding: Linear Input Data Format: 1's complement Input Sample Size: 8-bit # of bits padding at MSB: 0 Air Coding Format: Transparent Data Retransmission effort: Optimize for link quality (0x02) Packet type: 0x0380 3-EV3 may not be used 2-EV5 may not be used 3-EV5 may not be used > HCI Event: Command Status (0x0f) plen 4 Setup Synchronous Connection (0x01\|0x0028) ncmd 1 Status: Success (0x00) > HCI Event: Number of Completed Packets (0x13) plen 5 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Max Slots Change (0x1b) plen 3 Handle: 256 Max slots: 1 > HCI Event: Synchronous Connect Complete (0x2c) plen 17 Status: Invalid LMP Parameters / Invalid LL Parameters (0x1e) Handle: 0 Address: 00:1B:DC:F2:21:59 (OUI 00-1B-DC) Link type: eSCO (0x02) Transmission interval: 0x00 Retransmission window: 0x02 RX packet length: 0 TX packet length: 0 Air mode: Transparent (0x03) < HCI Command: Setup Synchronous Connection (0x01\|0x0028) plen 17 Handle: 256 Transmit bandwidth: 8000 Receive bandwidth: 8000 Max latency: 8 Setting: 0x0003 Input Coding: Linear Input Data Format: 1's complement Input Sample Size: 8-bit # of bits padding at MSB: 0 Air Coding Format: Transparent Data Retransmission effort: Optimize for link quality (0x02) Packet type: 0x03c8 EV3 may be used 2-EV3 may not be used 3-EV3 may not be used 2-EV5 may not be used 3-EV5 may not be used > HCI Event: Command Status (0x0f) plen 4 Setup Synchronous Connection (0x01\|0x0028) ncmd 1 Status: Success (0x00) > HCI Event: Max Slots Change (0x1b) plen 3 Handle: 256 Max slots: 5 > HCI Event: Max Slots Change (0x1b) plen 3 Handle: 256 Max slots: 1 > HCI Event: Synchronous Connect Complete (0x2c) plen 17 Status: Success (0x00) Handle: 257 Address: 00:1B:DC:F2:21:59 (OUI 00-1B-DC) Link type: eSCO (0x02) Transmission interval: 0x06 Retransmission window: 0x04 RX packet length: 30 TX packet length: 30 Air mode: Transparent (0x03) Signed-off-by: Hsin-Yu Chao <hychao@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-05-18 10:00:22 +02:00
Łukasz Rymanowski	49c06c9eb1	Bluetooth: Fix for GAP/SEC/SEM/BI-10-C Security Mode 1 level 4, force us to use have key size 16 octects long. This patch adds check for that. This is required for the qualification test GAP/SEC/SEM/BI-10-C Logs from test when ATT is configured with sec level BT_SECURITY_FIPS < ACL Data TX: Handle 3585 flags 0x00 dlen 11 #28 [hci0] 3.785965 SMP: Pairing Request (0x01) len 6 IO capability: DisplayYesNo (0x01) OOB data: Authentication data not present (0x00) Authentication requirement: Bonding, MITM, SC, No Keypresses (0x0d) Max encryption key size: 16 Initiator key distribution: EncKey Sign (0x05) Responder key distribution: EncKey IdKey Sign (0x07) > ACL Data RX: Handle 3585 flags 0x02 dlen 11 #35 [hci0] 3.883020 SMP: Pairing Response (0x02) len 6 IO capability: DisplayYesNo (0x01) OOB data: Authentication data not present (0x00) Authentication requirement: Bonding, MITM, SC, No Keypresses (0x0d) Max encryption key size: 7 Initiator key distribution: EncKey Sign (0x05) Responder key distribution: EncKey IdKey Sign (0x07) < ACL Data TX: Handle 3585 flags 0x00 dlen 6 #36 [hci0] 3.883136 SMP: Pairing Failed (0x05) len 1 Reason: Encryption key size (0x06) Signed-off-by: Łukasz Rymanowski <lukasz.rymanowski@codecoup.pl> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-05-18 09:58:53 +02:00
Xin Long	3ffb93ba32	esp4: improve xfrm4_beet_gso_segment() to be more readable This patch is to improve the code to make xfrm4_beet_gso_segment() more readable, and keep consistent with xfrm6_beet_gso_segment(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-05-18 09:55:19 +02:00
John Hubbard	dbfe7d7437	rds: convert get_user_pages() --> pin_user_pages() This code was using get_user_pages_fast(), in a "Case 2" scenario (DMA/RDMA), using the categorization from [1]. That means that it's time to convert the get_user_pages_fast() + put_page() calls to pin_user_pages_fast() + unpin_user_pages() calls. There is some helpful background in [2]: basically, this is a small part of fixing a long-standing disconnect between pinning pages, and file systems' use of those pages. [1] Documentation/core-api/pin_user_pages.rst [2] "Explicit pinning of user-space pages": https://lwn.net/Articles/807108/ Cc: David S. Miller <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-rdma@vger.kernel.org Cc: rds-devel@oss.oracle.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-17 12:37:45 -07:00
Florian Westphal	4930f4831b	net: allow __skb_ext_alloc to sleep mptcp calls this from the transmit side, from process context. Allow a sleeping allocation instead of unconditional GFP_ATOMIC. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-17 12:35:34 -07:00
Florian Westphal	5c8264435d	mptcp: remove inner wait loop from mptcp_sendmsg_frag previous patches made sure we only call into this function when these prerequisites are met, so no need to wait on the subflow socket anymore. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/7 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-17 12:35:34 -07:00
Florian Westphal	17091708d1	mptcp: fill skb page frag cache outside of mptcp_sendmsg_frag The mptcp_sendmsg_frag helper contains a loop that will wait on the subflow sk. It seems preferrable to only wait in mptcp_sendmsg() when blocking io is requested. mptcp_sendmsg already has such a wait loop that is used when no subflow socket is available for transmission. This is another preparation patch that makes sure we call mptcp_sendmsg_frag only if the page frag cache has been refilled. Followup patch will remove the wait loop from mptcp_sendmsg_frag(). The retransmit worker doesn't need to do this refill as it won't transmit new mptcp-level data. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-17 12:35:34 -07:00
Florian Westphal	149f7c71e2	mptcp: fill skb extension cache outside of mptcp_sendmsg_frag The mptcp_sendmsg_frag helper contains a loop that will wait on the subflow sk. It seems preferrable to only wait in mptcp_sendmsg() when blocking io is requested. mptcp_sendmsg already has such a wait loop that is used when no subflow socket is available for transmission. This is a preparation patch that makes sure we call mptcp_sendmsg_frag only if a skb extension has been allocated. Moreover, such allocation currently uses GFP_ATOMIC while it could use sleeping allocation instead. Followup patches will remove the wait loop from mptcp_sendmsg_frag() and will allow to do a sleeping allocation for the extension. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-17 12:35:34 -07:00
Florian Westphal	72511aab95	mptcp: avoid blocking in tcp_sendpages The transmit loop continues to xmit new data until an error is returned or all data was transmitted. For the blocking i/o case, this means that tcp_sendpages() may block on the subflow until more space becomes available, i.e. we end up sleeping with the mptcp socket lock held. Instead we should check if a different subflow is ready to be used. This restarts the subflow sk lookup when the tx operation succeeded and the tcp subflow can't accept more data or if tcp_sendpages indicates -EAGAIN on a blocking mptcp socket. In that case we also need to set the NOSPACE bit to make sure we get notified once memory becomes available. In case all subflows are busy, the existing logic will wait until a subflow is ready, releasing the mptcp socket lock while doing so. The mptcp worker already sets DONTWAIT, so no need to make changes there. v2: * set NOSPACE bit * add a comment to clarify that mptcp-sk sndbuf limits need to be checked as well. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-17 12:35:34 -07:00
Florian Westphal	fb529e62d3	mptcp: break and restart in case mptcp sndbuf is full Its not enough to check for available tcp send space. We also hold on to transmitted data for mptcp-level retransmits. Right now we will send more and more data if the peer can ack data at the tcp level fast enough, since that frees up tcp send buffer space. But we also need to check that data was acked and reclaimed at the mptcp level. Therefore add needed check in mptcp_sendmsg, flush tcp data and wait until more mptcp snd space becomes available if we are over the limit. Before we wait for more data, also make sure we start the retransmit timer if we ran out of sndbuf space. Otherwise there is a very small chance that we wait forever: * receiver is waiting for data * sender is blocked because mptcp socket buffer is full * at tcp level, all data was acked * mptcp-level snd_una was not updated, because last ack that acknowledged the last data packet carried an older MPTCP-ack. Restarting the retransmit timer avoids this problem: if TCP subflow is idle, data is retransmitted from the RTX queue. New data will make the peer send a new, updated MPTCP-Ack. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-17 12:35:34 -07:00
Florian Westphal	a0e17064d4	mptcp: move common nospace-pattern to a helper Paolo noticed that ssk_check_wmem() has same pattern, so add/use common helper for both places. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-17 12:35:34 -07:00
Yuqi Jin	a6211caa63	net: revert "net: get rid of an signed integer overflow in ip_idents_reserve()" Commit `adb03115f4` ("net: get rid of an signed integer overflow in ip_idents_reserve()") used atomic_cmpxchg to replace "atomic_add_return" inside the function "ip_idents_reserve". The reason was to avoid UBSAN warning. However, this change has caused performance degrade and in GCC-8, fno-strict-overflow is now mapped to -fwrapv -fwrapv-pointer and signed integer overflow is now undefined by default at all optimization levels[1]. Moreover, it was a bug in UBSAN vs -fwrapv /-fno-strict-overflow, so Let's revert it safely. [1] https://gcc.gnu.org/gcc-8/changes.html Suggested-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Jiong Wang <jiongwang@huawei.com> Signed-off-by: Yuqi Jin <jinyuqi@huawei.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-17 12:33:51 -07:00
David Ahern	84be69b869	nexthop: Fix attribute checking for groups For nexthop groups, attributes after NHA_GROUP_TYPE are invalid, but nh_check_attr_group starts checking at NHA_GROUP. The group type defaults to multipath and the NHA_GROUP_TYPE is currently optional so this has slipped through so far. Fix the attribute checking to handle support of new group types. Fixes: `430a049190` ("nexthop: Add support for nexthop groups") Signed-off-by: ASSOGBA Emery <assogba.emery@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-17 10:41:24 -07:00
Masahiro Yamada	8a2cc0505c	bpfilter: use 'userprogs' syntax to build bpfilter_umh The user mode helper should be compiled for the same architecture as the kernel. This Makefile reused the 'hostprogs' syntax by overriding HOSTCC with CC. Use the new syntax 'userprogs' to fix the Makefile mess. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Sam Ravnborg <sam@ravnborg.org>	2020-05-17 18:52:01 +09:00
Masahiro Yamada	b1183b6dca	bpfilter: check if $(CC) can link static libc in Kconfig On Fedora, linking static glibc requires the glibc-static RPM package, which is not part of the glibc-devel package. CONFIG_CC_CAN_LINK does not check the capability of static linking, so you can enable CONFIG_BPFILTER_UMH, then fail to build: HOSTLD net/bpfilter/bpfilter_umh /usr/bin/ld: cannot find -lc collect2: error: ld returned 1 exit status Add CONFIG_CC_CAN_LINK_STATIC, and make CONFIG_BPFILTER_UMH depend on it. Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org>	2020-05-17 18:52:01 +09:00
Masahiro Yamada	9371f86ecb	bpfilter: match bit size of bpfilter_umh to that of the kernel bpfilter_umh is built for the default machine bit of the compiler, which may not match to the bit size of the kernel. This happens in the scenario below: You can use biarch GCC that defaults to 64-bit for building the 32-bit kernel. In this case, Kbuild passes -m32 to teach the compiler to produce 32-bit kernel space objects. However, it is missing when building bpfilter_umh. It is built as a 64-bit ELF, and then embedded into the 32-bit kernel. The 32-bit kernel and 64-bit umh is a bad combination. In theory, we can have 32-bit umh running on 64-bit kernel, but we do not have a good reason to support such a usecase. The best is to match the bit size between them. Pass -m32 or -m64 to the umh build command if it is found in $(KBUILD_CFLAGS). Evaluate CC_CAN_LINK against the kernel bit-size. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2020-05-17 18:52:01 +09:00
Jakub Kicinski	75c36dbb1c	ethtool: don't call set_channels in drivers if config didn't change Don't call drivers if nothing changed. Netlink code already contains this logic. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-16 13:56:30 -07:00
Jakub Kicinski	7be92514b9	ethtool: check if there is at least one channel for TX/RX in the core Having a channel config with no ability to RX or TX traffic is clearly wrong. Check for this in the core so the drivers don't have to. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-16 13:56:30 -07:00
Christoph Paasch	a0c1d0eafd	mptcp: Use 32-bit DATA_ACK when possible RFC8684 allows to send 32-bit DATA_ACKs as long as the peer is not sending 64-bit data-sequence numbers. The 64-bit DSN is only there for extreme scenarios when a very high throughput subflow is combined with a long-RTT subflow such that the high-throughput subflow wraps around the 32-bit sequence number space within an RTT of the high-RTT subflow. It is thus a rare scenario and we should try to use the 32-bit DATA_ACK instead as long as possible. It allows to reduce the TCP-option overhead by 4 bytes, thus makes space for an additional SACK-block. It also makes tcpdumps much easier to read when the DSN and DATA_ACK are both either 32 or 64-bit. Signed-off-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-16 13:51:10 -07:00
DENG Qingfang	5e5502e012	net: dsa: mt7530: fix roaming from DSA user ports When a client moves from a DSA user port to a software port in a bridge, it cannot reach any other clients that connected to the DSA user ports. That is because SA learning on the CPU port is disabled, so the switch ignores the client's frames from the CPU port and still thinks it is at the user port. Fix it by enabling SA learning on the CPU port. To prevent the switch from learning from flooding frames from the CPU port, set skb->offload_fwd_mark to 1 for unicast and broadcast frames, and let the switch flood them instead of trapping to the CPU port. Multicast frames still need to be trapped to the CPU port for snooping, so set the SA_DIS bit of the MTK tag to 1 when transmitting those frames to disable SA learning. Fixes: `b8f126a8d5` ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: DENG Qingfang <dqfext@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-16 13:49:28 -07:00
Nicolas Dichtel	9efd6a3cec	netns: enable to inherit devconf from current netns The goal is to be able to inherit the initial devconf parameters from the current netns, ie the netns where this new netns has been created. This is useful in a containers environment where /proc/sys is read only. For example, if a pod is created with specifics devconf parameters and has the capability to create netns, the user expects to get the same parameters than his 'init_net', which is not the real init_net in this case. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-16 13:46:37 -07:00
Madhuparna Bhowmik	b6dd5acde3	ipv6: Fix suspicious RCU usage warning in ip6mr This patch fixes the following warning: ============================= WARNING: suspicious RCU usage 5.7.0-rc4-next-20200507-syzkaller #0 Not tainted ----------------------------- net/ipv6/ip6mr.c:124 RCU-list traversed in non-reader section!! ipmr_new_table() returns an existing table, but there is no table at init. Therefore the condition: either holding rtnl or the list is empty is used. Fixes: `d1db275dd3` ("ipv6: ip6mr: support multiple tables") Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-16 13:41:53 -07:00
Linus Torvalds	12bf0b632e	NFS client bugfixes for Linux 5.7 Highlights include: Stable fixes: - nfs: fix NULL deference in nfs4_get_valid_delegation Bugfixes: - Fix corruption of the return value in cachefiles_read_or_alloc_pages() - Fix several fscache cookie issues - Fix a fscache queuing race that can trigger a BUG_ON - NFS: Fix 2 use-after-free regressions due to the RPC_TASK_CRED_NOREF flag - SUNRPC: Fix a use-after-free regression in rpc_free_client_work() - SUNRPC: Fix a race when tearing down the rpc client debugfs directory - SUNRPC: Signalled ASYNC tasks need to exit - NFSv3: fix rpc receive buffer size for MOUNT call -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl6/AkgACgkQZwvnipYK APICLQ/9ENY+mQmVMSbtw2VGlBphS+GLN44k70NgmjAaI9n8f3ILZyLl0/iHEPFU ZaCWhPWEs76olLfCWwoQFzISIUTHWVUow8mJn7gTh4mq8n9UFv8zgUxtJ3HTfEHt rtujG9xsK0Oa351UPJWE5yC7PvFzMohcc1vVD8WQeGJQ3sMBVHTuOuPatIv3vK6s 8MflKTbtz/gUkcbWMCk9ljMPXr6/Ksgu9GZDnDFAYZBfFkwx//RNmq6K+z1Ru15s tkmPPZGNMCfKblrnUXUmPt78wxSExmWrXroSMNas2fyeOXgPL0ogNx8vfdFcFsxs sHpMkF97+npntNj1y5om6GrdU4SRYUFqXT7pNqV4wuGguOfFELIXrJIQuCDaKrGD ApEoo9UDisGCLdqs738ascZFHZiTQoy6drbpR8moalqhYkTI7Al/pPxHnozGDYsJ +wElaFXZX2hlPc6ih1q54RcB+D4qswDC9QudArKc9hJEKPv+SsmiVhBG/f+X+Jca M19UJGWZvRtY8L+0yJdG22O9Hwo0zSK917gtOZNgkwtkkKgzjj2kcNHTK2jGBEuR pqEIQCreUH8Le0WR9cPeJeYc2/HeCEWDHrf/q+gFRClaZMe+0Sfu3z3pBH3v0T9q qoZ1VvCDUhggsZyJZebcPTCL+ghqOXFJajHLlcA6BSdGJ/iXq2M= =lqFu -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.7-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable fixes: - nfs: fix NULL deference in nfs4_get_valid_delegation Bugfixes: - Fix corruption of the return value in cachefiles_read_or_alloc_pages() - Fix several fscache cookie issues - Fix a fscache queuing race that can trigger a BUG_ON - NFS: Fix two use-after-free regressions due to the RPC_TASK_CRED_NOREF flag - SUNRPC: Fix a use-after-free regression in rpc_free_client_work() - SUNRPC: Fix a race when tearing down the rpc client debugfs directory - SUNRPC: Signalled ASYNC tasks need to exit - NFSv3: fix rpc receive buffer size for MOUNT call" * tag 'nfs-for-5.7-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv3: fix rpc receive buffer size for MOUNT call SUNRPC: 'Directory with parent 'rpc_clnt' already present!' NFS/pnfs: Don't use RPC_TASK_CRED_NOREF with pnfs NFS: Don't use RPC_TASK_CRED_NOREF with delegreturn SUNRPC: Signalled ASYNC tasks need to exit nfs: fix NULL deference in nfs4_get_valid_delegation SUNRPC: fix use-after-free in rpc_free_client_work() cachefiles: Fix race between read_waiter and read_copier involving op->to_do NFSv4: Fix fscache cookie aux_data to ensure change_attr is included NFS: Fix fscache super_cookie allocation NFS: Fix fscache super_cookie index_key from changing after umount cachefiles: Fix corruption of the return value in cachefiles_read_or_alloc_pages()	2020-05-15 14:03:13 -07:00
David S. Miller	da07f52d3c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Move the bpf verifier trace check into the new switch statement in HEAD. Resolve the overlapping changes in hinic, where bug fixes overlap the addition of VF support. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-15 13:48:59 -07:00
Linus Torvalds	f85c1598dd	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: 1) Fix sk_psock reference count leak on receive, from Xiyu Yang. 2) CONFIG_HNS should be invisible, from Geert Uytterhoeven. 3) Don't allow locking route MTUs in ipv6, RFCs actually forbid this, from Maciej Żenczykowski. 4) ipv4 route redirect backoff wasn't actually enforced, from Paolo Abeni. 5) Fix netprio cgroup v2 leak, from Zefan Li. 6) Fix infinite loop on rmmod in conntrack, from Florian Westphal. 7) Fix tcp SO_RCVLOWAT hangs, from Eric Dumazet. 8) Various bpf probe handling fixes, from Daniel Borkmann. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits) selftests: mptcp: pm: rm the right tmp file dpaa2-eth: properly handle buffer size restrictions bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier bpf: Add bpf_probe_read_{user, kernel}_str() to do_refine_retval_range bpf: Restrict bpf_probe_read{, str}() only to archs where they work MAINTAINERS: Mark networking drivers as Maintained. ipmr: Add lockdep expression to ipmr_for_each_table macro ipmr: Fix RCU list debugging warning drivers: net: hamradio: Fix suspicious RCU usage warning in bpqether.c net: phy: broadcom: fix BCM54XX_SHD_SCR3_TRDDAPD value for BCM54810 tcp: fix error recovery in tcp_zerocopy_receive() MAINTAINERS: Add Jakub to networking drivers. MAINTAINERS: another add of Karsten Graul for S390 networking drivers: ipa: fix typos for ipa_smp2p structure doc pppoe: only process PADT targeted at local interfaces selftests/bpf: Enforce returning 0 for fentry/fexit programs bpf: Enforce returning 0 for fentry/fexit progs net: stmmac: fix num_por initialization security: Fix the default value of secid_to_secctx hook libbpf: Fix register naming in PT_REGS s390 macros ...	2020-05-15 13:10:06 -07:00
Paolo Abeni	729cd6436f	mptcp: cope better with MP_JOIN failure Currently, on MP_JOIN failure we reset the child socket, but leave the request socket untouched. tcp_check_req will deal with it according to the 'tcp_abort_on_overflow' sysctl value - by default the req socket will stay alive. The above leads to inconsistent behavior on MP JOIN failure, and bad listener overflow accounting. This patch addresses the issue leveraging the infrastructure just introduced to ask the TCP stack to drop the req on failure. The child socket is not freed anymore by subflow_syn_recv_sock(), instead it's moved to a dead state and will be disposed by the next sock_put done by the TCP stack, so that listener overflow accounting is not affected by MP JOIN failure. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-15 12:30:13 -07:00
Paolo Abeni	2f8a397d0a	inet_connection_sock: factor out destroy helper. Move the steps to prepare an inet_connection_sock for forced disposal inside a separate helper. No functional changes inteded, this will just simplify the next patch. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-15 12:30:13 -07:00
Paolo Abeni	90bf45134d	mptcp: add new sock flag to deal with join subflows MP_JOIN subflows must not land into the accept queue. Currently tcp_check_req() calls an mptcp specific helper to detect such scenario. Such helper leverages the subflow context to check for MP_JOIN subflows. We need to deal also with MP JOIN failures, even when the subflow context is not available due allocation failure. A possible solution would be changing the syn_recv_sock() signature to allow returning a more descriptive action/ error code and deal with that in tcp_check_req(). Since the above need is MPTCP specific, this patch instead uses a TCP request socket hole to add a MPTCP specific flag. Such flag is used by the MPTCP syn_recv_sock() to tell tcp_check_req() how to deal with the request socket. This change is a no-op for !MPTCP build, and makes the MPTCP code simpler. It allows also the next patch to deal correctly with MP JOIN failure. v1 -> v2: - be more conservative on drop_req initialization (Mat) RFC -> v1: - move the drop_req bit inside tcp_request_sock (Eric) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Reviewed-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-15 12:30:13 -07:00
David S. Miller	3430223d39	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-05-15 The following pull-request contains BPF updates for your net-next tree. We've added 37 non-merge commits during the last 1 day(s) which contain a total of 67 files changed, 741 insertions(+), 252 deletions(-). The main changes are: 1) bpf_xdp_adjust_tail() now allows to grow the tail as well, from Jesper. 2) bpftool can probe CONFIG_HZ, from Daniel. 3) CAP_BPF is introduced to isolate user processes that use BPF infra and to secure BPF networking services by dropping CAP_SYS_ADMIN requirement in certain cases, from Alexei. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-15 10:43:52 -07:00
Vlad Buslov	0348451db9	net: sched: cls_flower: implement terse dump support Implement tcf_proto_ops->terse_dump() callback for flower classifier. Only dump handle, flags and action data in terse mode. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-15 10:23:11 -07:00
Vlad Buslov	ca44b738e5	net: sched: implement terse dump support in act Extend tcf_action_dump() with boolean argument 'terse' that is used to request terse-mode action dump. In terse mode only essential data needed to identify particular action (action kind, cookie, etc.) and its stats is put to resulting skb and everything else is omitted. Implement tcf_exts_terse_dump() helper in cls API that is intended to be used to request terse dump of all exts (actions) attached to the filter. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-15 10:23:11 -07:00
Vlad Buslov	f8ab1807a9	net: sched: introduce terse dump flag Add new TCA_DUMP_FLAGS attribute and use it in cls API to request terse filter output from classifiers with TCA_DUMP_FLAGS_TERSE flag. This option is intended to be used to improve performance of TC filter dump when userland only needs to obtain stats and not the whole classifier/action data. Extend struct tcf_proto_ops with new terse_dump() callback that must be defined by supporting classifier implementations. Support of the options in specific classifiers and actions is implemented in following patches in the series. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-15 10:23:11 -07:00
Tobias Waldekranz	2e186a2cf8	net: core: recursively find netdev by device node The assumption that a device node is associated either with the netdev's device, or the parent of that device, does not hold for all drivers. E.g. Freescale's DPAA has two layers of platform devices above the netdev. Instead, recursively walk up the tree from the netdev, allowing any parent to match against the sought after node. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-15 10:18:49 -07:00
Alexei Starovoitov	2c78ee898d	bpf: Implement CAP_BPF Implement permissions as stated in uapi/linux/capability.h In order to do that the verifier allow_ptr_leaks flag is split into four flags and they are set as: env->allow_ptr_leaks = bpf_allow_ptr_leaks(); env->bypass_spec_v1 = bpf_bypass_spec_v1(); env->bypass_spec_v4 = bpf_bypass_spec_v4(); env->bpf_capable = bpf_capable(); The first three currently equivalent to perfmon_capable(), since leaking kernel pointers and reading kernel memory via side channel attacks is roughly equivalent to reading kernel memory with cap_perfmon. 'bpf_capable' enables bounded loops, precision tracking, bpf to bpf calls and other verifier features. 'allow_ptr_leaks' enable ptr leaks, ptr conversions, subtraction of pointers. 'bypass_spec_v1' disables speculative analysis in the verifier, run time mitigations in bpf array, and enables indirect variable access in bpf programs. 'bypass_spec_v4' disables emission of sanitation code by the verifier. That means that the networking BPF program loaded with CAP_BPF + CAP_NET_ADMIN will have speculative checks done by the verifier and other spectre mitigation applied. Such networking BPF program will not be able to leak kernel pointers and will not be able to access arbitrary kernel memory. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200513230355.7858-3-alexei.starovoitov@gmail.com	2020-05-15 17:29:41 +02:00
Jesper Dangaard Brouer	bc56c919fc	bpf: Add xdp.frame_sz in bpf_prog_test_run_xdp(). Update the memory requirements, when adding xdp.frame_sz in BPF test_run function bpf_prog_test_run_xdp() which e.g. is used by XDP selftests. Specifically add the expected reserved tailroom, but also allocated a larger memory area to reflect that XDP frames usually comes in this format. Limit the provided packet data size to 4096 minus headroom + tailroom, as this also reflect a common 3520 bytes MTU limit with XDP. Note that bpf_test_init already use a memory allocation method that clears memory. Thus, this already guards against leaking uninit kernel memory. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158945349549.97035.15316291762482444006.stgit@firesoul	2020-05-14 21:21:56 -07:00
Jesper Dangaard Brouer	ddb47d518c	xdp: Clear grow memory in bpf_xdp_adjust_tail() Clearing memory of tail when grow happens, because it is too easy to write a XDP_PASS program that extend the tail, which expose this memory to users that can run tcpdump. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158945349039.97035.5262100484553494.stgit@firesoul	2020-05-14 21:21:56 -07:00
Jesper Dangaard Brouer	c8741e2bfe	xdp: Allow bpf_xdp_adjust_tail() to grow packet size Finally, after all drivers have a frame size, allow BPF-helper bpf_xdp_adjust_tail() to grow or extend packet size at frame tail. Remember that helper/macro xdp_data_hard_end have reserved some tailroom. Thus, this helper makes sure that the BPF-prog don't have access to this tailroom area. V2: Remove one chicken check and use WARN_ONCE for other Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158945348530.97035.12577148209134239291.stgit@firesoul	2020-05-14 21:21:56 -07:00
Jesper Dangaard Brouer	34cc0b338a	xdp: Xdp_frame add member frame_sz and handle in convert_to_xdp_frame Use hole in struct xdp_frame, when adding member frame_sz, which keeps same sizeof struct (32 bytes) Drivers ixgbe and sfc had bug cases where the necessary/expected tailroom was not reserved. This can lead to some hard to catch memory corruption issues. Having the drivers frame_sz this can be detected when packet length/end via xdp->data_end exceed the xdp_data_hard_end pointer, which accounts for the reserved the tailroom. When detecting this driver issue, simply fail the conversion with NULL, which results in feedback to driver (failing xdp_do_redirect()) causing driver to drop packet. Given the lack of consistent XDP stats, this can be hard to troubleshoot. And given this is a driver bug, we want to generate some more noise in form of a WARN stack dump (to ID the driver code that inlined convert_to_xdp_frame). Inlining the WARN macro is problematic, because it adds an asm instruction (on Intel CPUs ud2) what influence instruction cache prefetching. Thus, introduce xdp_warn and macro XDP_WARN, to avoid this and at the same time make identifying the function and line of this inlined function easier. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158945337313.97035.10015729316710496600.stgit@firesoul	2020-05-14 21:21:54 -07:00
Jesper Dangaard Brouer	a075767bbd	net: XDP-generic determining XDP frame size The SKB "head" pointer points to the data area that contains skb_shared_info, that can be found via skb_end_pointer(). Given xdp->data_hard_start have been established (basically pointing to skb->head), frame size is between skb_end_pointer() and data_hard_start, plus the size reserved to skb_shared_info. Change the bpf_xdp_adjust_tail offset adjust of skb->len, to be a positive offset number on grow, and negative number on shrink. As this seems more natural when reading the code. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158945336804.97035.7164852191163722056.stgit@firesoul	2020-05-14 21:21:54 -07:00
David S. Miller	d00f26b623	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-05-14 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) Merged tag 'perf-for-bpf-2020-05-06' from tip tree that includes CAP_PERFMON. 2) support for narrow loads in bpf_sock_addr progs and additional helpers in cg-skb progs, from Andrey. 3) bpf benchmark runner, from Andrii. 4) arm and riscv JIT optimizations, from Luke. 5) bpf iterator infrastructure, from Yonghong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-14 20:31:21 -07:00
Andrey Ignatov	f307fa2cb4	bpf: Introduce bpf_sk_{, ancestor_}cgroup_id helpers With having ability to lookup sockets in cgroup skb programs it becomes useful to access cgroup id of retrieved sockets so that policies can be implemented based on origin cgroup of such socket. For example, a container running in a cgroup can have cgroup skb ingress program that can lookup peer socket that is sending packets to a process inside the container and decide whether those packets should be allowed or denied based on cgroup id of the peer. More specifically such ingress program can implement intra-host policy "allow incoming packets only from this same container and not from any other container on same host" w/o relying on source IP addresses since quite often it can be the case that containers share same IP address on the host. Introduce two new helpers for this use-case: bpf_sk_cgroup_id() and bpf_sk_ancestor_cgroup_id(). These helpers are similar to existing bpf_skb_{,ancestor_}cgroup_id helpers with the only difference that sk is used to get cgroup id instead of skb, and share code with them. See documentation in UAPI for more details. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/f5884981249ce911f63e9b57ecd5d7d19154ff39.1589486450.git.rdna@fb.com	2020-05-14 18:41:07 -07:00
Andrey Ignatov	06d3e4c9f1	bpf: Allow skb_ancestor_cgroup_id helper in cgroup skb cgroup skb programs already can use bpf_skb_cgroup_id. Allow bpf_skb_ancestor_cgroup_id as well so that container policies can be implemented for a container that can have sub-cgroups dynamically created, but policies should still be implemented based on cgroup id of container itself not on an id of a sub-cgroup. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/8874194d6041eba190356453ea9f6071edf5f658.1589486450.git.rdna@fb.com	2020-05-14 18:41:07 -07:00
Andrey Ignatov	d56c2f95ad	bpf: Allow sk lookup helpers in cgroup skb Currently sk lookup helpers are allowed in tc, xdp, sk skb, and cgroup sock_addr programs. But they would be useful in cgroup skb as well so that for example cgroup skb ingress program can lookup a peer socket a packet comes from on same host and make a decision whether to allow or deny this packet based on the properties of that socket, e.g. cgroup that peer socket belongs to. Allow the following sk lookup helpers in cgroup skb: * bpf_sk_lookup_tcp; * bpf_sk_lookup_udp; * bpf_sk_release; * bpf_skc_lookup_tcp. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/f8c7ee280f1582b586629436d777b6db00597d63.1589486450.git.rdna@fb.com	2020-05-14 18:41:07 -07:00
Andrey Ignatov	7aebfa1b38	bpf: Support narrow loads from bpf_sock_addr.user_port bpf_sock_addr.user_port supports only 4-byte load and it leads to ugly code in BPF programs, like: volatile __u32 user_port = ctx->user_port; __u16 port = bpf_ntohs(user_port); Since otherwise clang may optimize the load to be 2-byte and it's rejected by verifier. Add support for 1- and 2-byte loads same way as it's supported for other fields in bpf_sock_addr like user_ip4, msg_src_ip4, etc. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/c1e983f4c17573032601d0b2b1f9d1274f24bc16.1589420814.git.rdna@fb.com	2020-05-14 18:30:57 -07:00
Amol Grover	7013908c2d	ipmr: Add lockdep expression to ipmr_for_each_table macro During the initialization process, ipmr_new_table() is called to create new tables which in turn calls ipmr_get_table() which traverses net->ipv4.mr_tables without holding the writer lock. However, this is safe to do so as no tables exist at this time. Hence add a suitable lockdep expression to silence the following false-positive warning: ============================= WARNING: suspicious RCU usage 5.7.0-rc3-next-20200428-syzkaller #0 Not tainted ----------------------------- net/ipv4/ipmr.c:136 RCU-list traversed in non-reader section!! ipmr_get_table+0x130/0x160 net/ipv4/ipmr.c:136 ipmr_new_table net/ipv4/ipmr.c:403 [inline] ipmr_rules_init net/ipv4/ipmr.c:248 [inline] ipmr_net_init+0x133/0x430 net/ipv4/ipmr.c:3089 Fixes: `f0ad0860d0` ("ipv4: ipmr: support multiple tables") Reported-by: syzbot+1519f497f2f9f08183c6@syzkaller.appspotmail.com Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Amol Grover <frextrite@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-14 18:01:07 -07:00
Amol Grover	a14fbcd4f1	ipmr: Fix RCU list debugging warning ipmr_for_each_table() macro uses list_for_each_entry_rcu() for traversing outside of an RCU read side critical section but under the protection of rtnl_mutex. Hence, add the corresponding lockdep expression to silence the following false-positive warning at boot: [ 4.319347] ============================= [ 4.319349] WARNING: suspicious RCU usage [ 4.319351] 5.5.4-stable #17 Tainted: G E [ 4.319352] ----------------------------- [ 4.319354] net/ipv4/ipmr.c:1757 RCU-list traversed in non-reader section!! Fixes: `f0ad0860d0` ("ipv4: ipmr: support multiple tables") Signed-off-by: Amol Grover <frextrite@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-14 18:01:07 -07:00
Jakub Kicinski	5a46b062e2	devlink: refactor end checks in devlink_nl_cmd_region_read_dumpit Clean up after recent fixes, move address calculations around and change the variable init, so that we can have just one start_offset == end_offset check. Make the check a little stricter to preserve the -EINVAL error if requested start offset is larger than the region itself. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-14 17:36:25 -07:00
Eric Dumazet	e776af608f	tcp: fix error recovery in tcp_zerocopy_receive() If user provides wrong virtual address in TCP_ZEROCOPY_RECEIVE operation we want to return -EINVAL error. But depending on zc->recv_skip_hint content, we might return -EIO error if the socket has SOCK_DONE set. Make sure to return -EINVAL in this case. BUG: KMSAN: uninit-value in tcp_zerocopy_receive net/ipv4/tcp.c:1833 [inline] BUG: KMSAN: uninit-value in do_tcp_getsockopt+0x4494/0x6320 net/ipv4/tcp.c:3685 CPU: 1 PID: 625 Comm: syz-executor.0 Not tainted 5.7.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:121 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 tcp_zerocopy_receive net/ipv4/tcp.c:1833 [inline] do_tcp_getsockopt+0x4494/0x6320 net/ipv4/tcp.c:3685 tcp_getsockopt+0xf8/0x1f0 net/ipv4/tcp.c:3728 sock_common_getsockopt+0x13f/0x180 net/core/sock.c:3131 __sys_getsockopt+0x533/0x7b0 net/socket.c:2177 __do_sys_getsockopt net/socket.c:2192 [inline] __se_sys_getsockopt+0xe1/0x100 net/socket.c:2189 __x64_sys_getsockopt+0x62/0x80 net/socket.c:2189 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:297 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45c829 Code: 0d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f1deeb72c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000037 RAX: ffffffffffffffda RBX: 00000000004e01e0 RCX: 000000000045c829 RDX: 0000000000000023 RSI: 0000000000000006 RDI: 0000000000000009 RBP: 000000000078bf00 R08: 0000000020000200 R09: 0000000000000000 R10: 00000000200001c0 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000001d8 R14: 00000000004d3038 R15: 00007f1deeb736d4 Local variable ----zc@do_tcp_getsockopt created at: do_tcp_getsockopt+0x1a74/0x6320 net/ipv4/tcp.c:3670 do_tcp_getsockopt+0x1a74/0x6320 net/ipv4/tcp.c:3670 Fixes: `05255b823a` ("tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-14 15:12:08 -07:00
J. Bruce Fields	933496e9cc	SUNRPC: 'Directory with parent 'rpc_clnt' already present!' Each rpc_client has a cl_clid which is allocated from a global ida, and a debugfs directory which is named after cl_clid. We're releasing the cl_clid before we free the debugfs directory named after it. As soon as the cl_clid is released, that value is available for another newly created client. That leaves a window where another client may attempt to create a new debugfs directory with the same name as the not-yet-deleted debugfs directory from the dying client. Symptoms are log messages like Directory 4 with parent 'rpc_clnt' already present! Fixes: `7c4310ff56` "SUNRPC: defer slow parts of rpc_free_client() to a workqueue." Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-05-14 16:31:09 -04:00
David S. Miller	1b54f4fa4d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix gcc-10 compilation warning in nf_conntrack, from Arnd Bergmann. 2) Add NF_FLOW_HW_PENDING to avoid races between stats and deletion commands, from Paul Blakey. 3) Remove WQ_MEM_RECLAIM from the offload workqueue, from Roi Dayan. 4) Infinite loop when removing nf_conntrack module, from Florian Westphal. 5) Set NF_FLOW_TEARDOWN bit on expiration to avoid races when refreshing the timeout from the software path. 6) Missing nft_set_elem_expired() check in the rbtree, from Phil Sutter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-14 13:15:02 -07:00
Xin Long	56b1b7c667	esp6: calculate transport_header correctly when sel.family != AF_INET6 In esp6_init_state() for beet mode when x->sel.family != AF_INET6: x->props.header_len = sizeof(struct ip_esp_hdr) + crypto_aead_ivsize(aead) + IPV4_BEET_PHMAXLEN + (sizeof(struct ipv6hdr) - sizeof(struct iphdr)) In xfrm6_beet_gso_segment() skb->transport_header is supposed to move to the end of the ph header for IPPROTO_BEETPH, so if x->sel.family != AF_INET6 and it's IPPROTO_BEETPH, it should do: skb->transport_header -= (sizeof(struct ipv6hdr) - sizeof(struct iphdr)); skb->transport_header += ph->hdrlen * 8; And IPV4_BEET_PHMAXLEN is only reserved for PH header, so if x->sel.family != AF_INET6 and it's not IPPROTO_BEETPH, it should do: skb->transport_header -= (sizeof(struct ipv6hdr) - sizeof(struct iphdr)); skb->transport_header -= IPV4_BEET_PHMAXLEN; Thanks Sabrina for looking deep into this issue. Fixes: `7f9e40eb18` ("esp6: add gso_segment for esp6 beet mode") Reported-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-05-14 12:13:05 +02:00
Christoph Hellwig	1b2f08df0a	ipv6: set msg_control_is_user in do_ipv6_getsockopt While do_ipv6_getsockopt does not call the high-level recvmsg helper, the msghdr eventually ends up being passed to put_cmsg anyway, and thus needs msg_control_is_user set to the proper value. Fixes: `1f466e1f15` ("net: cleanly handle kernel vs user buffers for ->msg_control") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-13 13:11:45 -07:00
Tuong Lien	88690b1079	tipc: fix failed service subscription deletion When a service subscription is expired or canceled by user, it needs to be deleted from the subscription list, so that new subscriptions can be registered (max = 65535 per net). However, there are two issues in code that can cause such an unused subscription to persist: 1) The 'tipc_conn_delete_sub()' has a loop on the subscription list but it makes a break shortly when the 1st subscription differs from the one specified, so the subscription will not be deleted. 2) In case a subscription is canceled, the code to remove the 'TIPC_SUB_CANCEL' flag from the subscription filter does not work if it is a local subscription (i.e. the little endian isn't involved). So, it will be no matches when looking for the subscription to delete later. The subscription(s) will be removed eventually when the user terminates its topology connection but that could be a long time later. Meanwhile, the number of available subscriptions may be exhausted. This commit fixes the two issues above, so as needed a subscription can be deleted correctly. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-13 12:33:19 -07:00
Tuong Lien	0771d7df81	tipc: fix memory leak in service subscripting Upon receipt of a service subscription request from user via a topology connection, one 'sub' object will be allocated in kernel, so it will be able to send an event of the service if any to the user correspondingly then. Also, in case of any failure, the connection will be shutdown and all the pertaining 'sub' objects will be freed. However, there is a race condition as follows resulting in memory leak: receive-work connection send-work \| \| \| sub-1 \|<------//-------\| \| sub-2 \|<------//-------\| \| \| \|<---------------\| evt for sub-x sub-3 \|<------//-------\| \| : : : : : : \| /--------\| \| \| \| * peer closed \| \| \| \| \| \| \| \|<-------X-------\| evt for sub-y \| \| \|<===============\| sub-n \|<------/ X shutdown \| -> orphan \| \| That is, the 'receive-work' may get the last subscription request while the 'send-work' is shutting down the connection due to peer close. We had a 'lock' on the connection, so the two actions cannot be carried out simultaneously. If the last subscription is allocated e.g. 'sub-n', before the 'send-work' closes the connection, there will be no issue at all, the 'sub' objects will be freed. In contrast the last subscription will become orphan since the connection was closed, and we released all references. This commit fixes the issue by simply adding one test if the connection remains in 'connected' state right after we obtain the connection lock, then a subscription object can be created as usual, otherwise we ignore it. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Reported-by: Thang Ngo <thang.h.ngo@dektech.com.au> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-13 12:33:18 -07:00
Tuong Lien	c726858945	tipc: fix large latency in smart Nagle streaming Currently when a connection is in Nagle mode, we set the 'ack_required' bit in the last sending buffer and wait for the corresponding ACK prior to pushing more data. However, on the receiving side, the ACK is issued only when application really reads the whole data. Even if part of the last buffer is received, we will not do the ACK as required. This might cause an unnecessary delay since the receiver does not always fetch the message as fast as the sender, resulting in a large latency in the user message sending, which is: [one RTT + the receiver processing time]. The commit makes Nagle ACK as soon as possible i.e. when a message with the 'ack_required' arrives in the receiving side's stack even before it is processed or put in the socket receive queue... This way, we can limit the streaming latency to one RTT as committed in Nagle mode. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-13 12:33:18 -07:00
Christoph Hellwig	6e8a4f9dda	net: ignore sock_from_file errors in __scm_install_fd The code had historically been ignoring these errors, and my recent refactoring changed that, which broke ssh in some setups. Fixes: `2618d530dd` ("net/scm: cleanup scm_detach_fds") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Ido Schimmel <idosch@mellanox.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-13 12:30:54 -07:00
Yonghong Song	3c32cc1bce	bpf: Enable bpf_iter targets registering ctx argument types Commit `b121b341e5` ("bpf: Add PTR_TO_BTF_ID_OR_NULL support") adds a field btf_id_or_null_non0_off to bpf_prog->aux structure to indicate that the first ctx argument is PTR_TO_BTF_ID reg_type and all others are PTR_TO_BTF_ID_OR_NULL. This approach does not really scale if we have other different reg types in the future, e.g., a pointer to a buffer. This patch enables bpf_iter targets registering ctx argument reg types which may be different from the default one. For example, for pointers to structures, the default reg_type is PTR_TO_BTF_ID for tracing program. The target can register a particular pointer type as PTR_TO_BTF_ID_OR_NULL which can be used by the verifier to enforce accesses. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200513180221.2949882-1-yhs@fb.com	2020-05-13 12:30:50 -07:00
Yonghong Song	ab2ee4fcb9	bpf: Change func bpf_iter_unreg_target() signature Change func bpf_iter_unreg_target() parameter from target name to target reg_info, similar to bpf_iter_reg_target(). Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200513180220.2949737-1-yhs@fb.com	2020-05-13 12:30:50 -07:00
Yonghong Song	15172a46fa	bpf: net: Refactor bpf_iter target registration Currently bpf_iter_reg_target takes parameters from target and allocates memory to save them. This is really not necessary, esp. in the future we may grow information passed from targets to bpf_iter manager. The patch refactors the code so target reg_info becomes static and bpf_iter manager can just take a reference to it. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200513180219.2949605-1-yhs@fb.com	2020-05-13 12:30:50 -07:00
David S. Miller	6cd35888a0	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2020-05-13 Here's a second attempt at a bluetooth-next pull request which supercedes the one dated 2020-05-09. This should have the issues discovered by Jakub fixed. - Add support for Intel Typhoon Peak device (8087:0032) - Add device tree bindings for Realtek RTL8723BS device - Add device tree bindings for Qualcomm QCA9377 device - Add support for experimental features configuration through mgmt - Add driver hook to prevent wake from suspend - Add support for waiting for L2CAP disconnection response - Multiple fixes & cleanups to the btbcm driver - Add support for LE scatternet topology for selected devices - A few other smaller fixes & cleanups Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-13 12:20:12 -07:00
Archie Pusaka	5b440676c1	Bluetooth: L2CAP: add support for waiting disconnection resp Whenever we disconnect a L2CAP connection, we would immediately report a disconnection event (EPOLLHUP) to the upper layer, without waiting for the response of the other device. This patch offers an option to wait until we receive a disconnection response before reporting disconnection event, by using the "how" parameter in l2cap_sock_shutdown(). Therefore, upper layer can opt to wait for disconnection response by shutdown(sock, SHUT_WR). This can be used to enforce proper disconnection order in HID, where the disconnection of the interrupt channel must be complete before attempting to disconnect the control channel. Signed-off-by: Archie Pusaka <apusaka@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-05-13 10:03:51 +02:00
Sonny Sasaka	adf1d69264	Bluetooth: Handle Inquiry Cancel error after Inquiry Complete After sending Inquiry Cancel command to the controller, it is possible that Inquiry Complete event comes before Inquiry Cancel command complete event. In this case the Inquiry Cancel command will have status of Command Disallowed since there is no Inquiry session to be cancelled. This case should not be treated as error, otherwise we can reach an inconsistent state. Example of a btmon trace when this happened: < HCI Command: Inquiry Cancel (0x01\|0x0002) plen 0 > HCI Event: Inquiry Complete (0x01) plen 1 Status: Success (0x00) > HCI Event: Command Complete (0x0e) plen 4 Inquiry Cancel (0x01\|0x0002) ncmd 1 Status: Command Disallowed (0x0c) Signed-off-by: Sonny Sasaka <sonnysasaka@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-05-13 09:35:17 +02:00
Abhishek Pandit-Subedi	81dafad53c	Bluetooth: Add hook for driver to prevent wake from suspend Let drivers have a hook to disable configuring scanning during suspend. Drivers should use the device_may_wakeup function call to determine whether hci should be configured for wakeup. For example, an implementation for btusb may look like the following: bool btusb_prevent_wake(struct hci_dev hdev) { struct btusb_data data = hci_get_drvdata(hdev); return !device_may_wakeup(&data->udev->dev); } Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Reviewed-by: Alain Michaud <alainm@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-05-13 09:12:04 +02:00
Abhishek Pandit-Subedi	0d2c9825e4	Bluetooth: Rename BT_SUSPEND_COMPLETE Renamed BT_SUSPEND_COMPLETE to BT_SUSPEND_CONFIGURE_WAKE since it sets up the event filter and whitelist for wake-up. Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Reviewed-by: Alain Michaud <alainm@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-05-13 09:12:04 +02:00
Abhishek Pandit-Subedi	91779665c1	Bluetooth: Modify LE window and interval for suspend When a device is suspended, it doesn't need to be as responsive to connection events. Increase the interval to 640ms (creating a duty cycle of roughly 1.75%) so that passive scanning uses much less power (vs previous duty cycle of 18.75%). The new window + interval combination has been tested to work with HID devices (which are currently the only devices capable of wake up). Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-05-13 08:50:56 +02:00
Abhishek Pandit-Subedi	aaebf8e608	Bluetooth: Fix incorrect type for window and interval The types for window and interval should be uint16, not uint8. Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2020-05-13 08:50:56 +02:00
Paolo Abeni	eead1c2ea2	netlabel: cope with NULL catmap The cipso and calipso code can set the MLS_CAT attribute on successful parsing, even if the corresponding catmap has not been allocated, as per current configuration and external input. Later, selinux code tries to access the catmap if the MLS_CAT flag is present via netlbl_catmap_getlong(). That may cause null ptr dereference while processing incoming network traffic. Address the issue setting the MLS_CAT flag only if the catmap is really allocated. Additionally let netlbl_catmap_getlong() cope with NULL catmap. Reported-by: Matthew Sheets <matthew.sheets@gd-ms.com> Fixes: `4b8feff251` ("netlabel: fix the horribly broken catmap functions") Fixes: `ceba1832b1` ("calipso: Set the calipso socket label to match the secattr.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-12 18:12:40 -07:00
Vladimir Oltean	fb9f2e9286	net: dsa: tag_sja1105: appease sparse checks for ethertype accessors A comparison between a value from the packet and an integer constant value needs to be done by converting the value from the packet from net->host, or the constant from host->net. Not the other way around. Even though it makes no practical difference, correct that. Fixes: `38b5beeae7` ("net: dsa: sja1105: prepare tagger for handling DSA tags and VLAN simultaneously") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-12 18:02:42 -07:00
William Tu	51fa960d3b	erspan: Check IFLA_GRE_ERSPAN_VER is set. Add a check to make sure the IFLA_GRE_ERSPAN_VER is provided by users. Fixes: `f989d546a2` ("erspan: Add type I version 0 support.") Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-12 13:11:41 -07:00
Vladimir Oltean	84eeb5d460	net: dsa: tag_sja1105: implement sub-VLAN decoding Create a subvlan_map as part of each port's tagger private structure. This keeps reverse mappings of bridge-to-dsa_8021q VLAN retagging rules. Note that as of this patch, this piece of code is never engaged, due to the fact that the driver hasn't installed any retagging rule, so we'll always see packets with a subvlan code of 0 (untagged). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-12 13:08:08 -07:00
Vladimir Oltean	3eaae1d05f	net: dsa: tag_8021q: support up to 8 VLANs per port using sub-VLANs For switches that support VLAN retagging, such as sja1105, we extend dsa_8021q by encoding a "sub-VLAN" into the remaining 3 free bits in the dsa_8021q tag. A sub-VLAN is nothing more than a number in the range 0-7, which serves as an index into a per-port driver lookup table. The sub-VLAN value of zero means that traffic is untagged (this is also backwards-compatible with dsa_8021q without retagging). The switch should be configured to retag VLAN-tagged traffic that gets transmitted towards the CPU port (and towards the CPU only). Example: bridge vlan add dev sw1p0 vid 100 The switch retags frames received on port 0, going to the CPU, and having VID 100, to the VID of 1104 (0x0450). In dsa_8021q language: \| 11 \| 10 \| 9 \| 8 \| 7 \| 6 \| 5 \| 4 \| 3 \| 2 \| 1 \| 0 \| +-----------+-----+-----------------+-----------+-----------------------+ \| DIR \| SVL \| SWITCH_ID \| SUBVLAN \| PORT \| +-----------+-----+-----------------+-----------+-----------------------+ 0x0450 means: - DIR = 0b01: this is an RX VLAN - SUBVLAN = 0b001: this is subvlan #1 - SWITCH_ID = 0b001: this is switch 1 (see the name "sw1p0") - PORT = 0b0000: this is port 0 (see the name "sw1p0") The driver also remembers the "1 -> 100" mapping. In the hotpath, if the sub-VLAN from the tag encodes a non-untagged frame, this mapping is used to create a VLAN hwaccel tag, with the value of 100. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-12 13:08:08 -07:00
Vladimir Oltean	38b5beeae7	net: dsa: sja1105: prepare tagger for handling DSA tags and VLAN simultaneously In VLAN-unaware mode, sja1105 uses VLAN tags with a custom TPID of 0xdadb. While in the yet-to-be introduced best_effort_vlan_filtering mode, it needs to work with normal VLAN TPID values. A complication arises when we must transmit a VLAN-tagged packet to the switch when it's in VLAN-aware mode. We need to construct a packet with 2 VLAN tags, and the switch will use the outer header for routing and pop it on egress. But sadly, here the 2 hardware generations don't behave the same: - E/T switches won't pop an ETH_P_8021AD tag on egress, it seems (packets will remain double-tagged). - P/Q/R/S switches will drop a packet with 2 ETH_P_8021Q tags (it looks like it tries to prevent VLAN hopping). But looks like the reverse is also true: - E/T switches have no problem popping the outer tag from packets with 2 ETH_P_8021Q tags. - P/Q/R/S will have no problem popping a single tag even if that is ETH_P_8021AD. So it is clear that if we want the hardware to work with dsa_8021q tagging in VLAN-aware mode, we need to send different TPIDs depending on revision. Keep that information in priv->info->qinq_tpid. The per-port tagger structure will hold an xmit_tpid value that depends not only upon the qinq_tpid, but also upon the VLAN awareness state itself (in case we must transmit using 0xdadb). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-12 13:08:08 -07:00
Vladimir Oltean	ec5ae61076	net: dsa: sja1105: save/restore VLANs using a delta commit method Managing the VLAN table that is present in hardware will become very difficult once we add a third operating state (best_effort_vlan_filtering). That is because correct cleanup (not too little, not too much) becomes virtually impossible, when VLANs can be added from the bridge layer, from dsa_8021q for basic tagging, for cross-chip bridging, as well as retagging rules for sub-VLANs and cross-chip sub-VLANs. So we need to rethink VLAN interaction with the switch in a more scalable way. In preparation for that, use the priv->expect_dsa_8021q boolean to classify any VLAN request received through .port_vlan_add or .port_vlan_del towards either one of 2 internal lists: bridge VLANs and dsa_8021q VLANs. Then, implement a central sja1105_build_vlan_table method that creates a VLAN configuration from scratch based on the 2 lists of VLANs kept by the driver, and based on the VLAN awareness state. Currently, if we are VLAN-unaware, install the dsa_8021q VLANs, otherwise the bridge VLANs. Then, implement a delta commit procedure that identifies which VLANs from this new configuration are actually different from the config previously committed to hardware. We apply the delta through the dynamic configuration interface (we don't reset the switch). The result is that the hardware should see the exact sequence of operations as before this patch. This also helps remove the "br" argument passed to dsa_8021q_crosschip_bridge_join, which it was only using to figure out whether it should commit the configuration back to us or not, based on the VLAN awareness state of the bridge. We can simplify that, by always allowing those VLANs inside of our dsa_8021q_vlans list, and committing those to hardware when necessary. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-12 13:08:08 -07:00
Vladimir Oltean	1f66b0f0ae	net: dsa: tag_8021q: introduce a vid_is_dsa_8021q helper This function returns a boolean denoting whether the VLAN passed as argument is part of the 1024-3071 range that the dsa_8021q tagging scheme uses. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-12 13:08:07 -07:00
Russell King	54a0ed0df4	net: dsa: provide an option for drivers to always receive bridge VLANs DSA assumes that a bridge which has vlan filtering disabled is not vlan aware, and ignores all vlan configuration. However, the kernel software bridge code allows configuration in this state. This causes the kernel's idea of the bridge vlan state and the hardware state to disagree, so "bridge vlan show" indicates a correct configuration but the hardware lacks all configuration. Even worse, enabling vlan filtering on a DSA bridge immediately blocks all traffic which, given the output of "bridge vlan show", is very confusing. Provide an option that drivers can set to indicate they want to receive vlan configuration even when vlan filtering is disabled. At the very least, this is safe for Marvell DSA bridges, which do not look up ingress traffic in the VTU if the port is in 8021Q disabled state. It is also safe for the Ocelot switch family. Whether this change is suitable for all DSA bridges is not known. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-12 13:08:07 -07:00
Eric Dumazet	24adbc1676	tcp: fix SO_RCVLOWAT hangs with fat skbs We autotune rcvbuf whenever SO_RCVLOWAT is set to account for 100% overhead in tcp_set_rcvlowat() This works well when skb->len/skb->truesize ratio is bigger than 0.5 But if we receive packets with small MSS, we can end up in a situation where not enough bytes are available in the receive queue to satisfy RCVLOWAT setting. As our sk_rcvbuf limit is hit, we send zero windows in ACK packets, preventing remote peer from sending more data. Even autotuning does not help, because it only triggers at the time user process drains the queue. If no EPOLLIN is generated, this can not happen. Note poll() has a similar issue, after commit `c7004482e8` ("tcp: Respect SO_RCVLOWAT in tcp_poll().") Fixes: `03f45c883c` ("tcp: avoid extra wakeups for SO_RCVLOWAT users") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-12 12:49:47 -07:00
Christoph Paasch	64d950ae0b	mptcp: Initialize map_seq upon subflow establishment When the other MPTCP-peer uses 32-bit data-sequence numbers, we rely on map_seq to indicate how to expand to a 64-bit data-sequence number in expand_seq() when receiving data. For new subflows, this field is not initialized, thus results in an "invalid" mapping being discarded. Fix this by initializing map_seq upon subflow establishment time. Fixes: `f296234c98` ("mptcp: Add handling of incoming MP_JOIN requests") Signed-off-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-12 12:08:22 -07:00
Phil Sutter	340eaff651	netfilter: nft_set_rbtree: Add missing expired checks Expired intervals would still match and be dumped to user space until garbage collection wiped them out. Make sure they stop matching and disappear (from users' perspective) as soon as they expire. Fixes: `8d8540c4f5` ("netfilter: nft_set_rbtree: add timeout support") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-12 13:19:34 +02:00
Pablo Neira Ayuso	9ed81c8e0d	netfilter: flowtable: set NF_FLOW_TEARDOWN flag on entry expiration If the flow timer expires, the gc sets on the NF_FLOW_TEARDOWN flag. Otherwise, the flowtable software path might race to refresh the timeout, leaving the state machine in inconsistent state. Fixes: `c29f74e0df` ("netfilter: nf_flow_table: hardware offload support") Reported-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-12 13:19:08 +02:00
Xiongfeng Wang	746c6237ec	sunrpc: add missing newline when printing parameter 'pool_mode' by sysfs When I cat parameter '/sys/module/sunrpc/parameters/pool_mode', it displays as follows. It is better to add a newline for easy reading. [root@hulk-202 ~]# cat /sys/module/sunrpc/parameters/pool_mode global[root@hulk-202 ~]# Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2020-05-11 20:20:40 -04:00
Christoph Hellwig	1f466e1f15	net: cleanly handle kernel vs user buffers for ->msg_control The msg_control field in struct msghdr can either contain a user pointer when used with the recvmsg system call, or a kernel pointer when used with sendmsg. To complicate things further kernel_recvmsg can stuff a kernel pointer in and then use set_fs to make the uaccess helpers accept it. Replace it with a union of a kernel pointer msg_control field, and a user pointer msg_control_user one, and allow kernel_recvmsg operate on a proper kernel pointer using a bitfield to override the normal choice of a user pointer for recvmsg. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-11 16:59:16 -07:00
Christoph Hellwig	2618d530dd	net/scm: cleanup scm_detach_fds Factor out two helpes to keep the code tidy. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-11 16:59:16 -07:00
Christoph Hellwig	0462b6bdb6	net: add a CMSG_USER_DATA macro Add a variant of CMSG_DATA that operates on user pointer to avoid sparse warnings about casting to/from user pointers. Also fix up CMSG_DATA to rely on the gcc extension that allows void pointer arithmetics to cut down on the amount of casts. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-11 16:59:16 -07:00
Florian Fainelli	097f024454	net: dsa: tag_sja1105: Constify dsa_device_ops sja1105_netdev_ops should be const since that is what the DSA layer expects. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-11 16:50:45 -07:00
Florian Fainelli	2fa3888bb7	net: dsa: ocelot: Constify dsa_device_ops ocelot_netdev_ops should be const since that is what the DSA layer expects. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-11 16:50:45 -07:00
Linus Torvalds	152036d137	Fixes: - Resolve a data integrity problem with NFSD that I inadvertently introduced last year. The change I made makes the NFS server's duplicate reply cache ineffective when krb5i or krb5p are in use, thus allowing the replay of non-idempotent NFS requests such as RENAME, SETATTR, or even WRITEs. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJerCuyAAoJEDNqszNvZn+XxvAQAJmUW5412OO7mkI2IW5PDP71 ZnBAuTs4UpLBgp1VpS3ai0LYnOX9o8WLqolzGuFxGfK69ZZdh7U7fzX2aEytoTSP KkW3dNo+NzRppWOhMBEfMBLnAu22YF+F689RvwEqd0C1AgGugaFfzlF1ECrJVpA7 g1WVhTi0ihfArhzSWTWO4LiuwjRd5TNF8gEci2j3DuHn1Hp6BagbKOv0rFdgK99X BbK8IaEalBUjtpGAPgRU/WY/WznzhgARVeOX7Rh/P/zFdFB1G1M4kycaadBk6uaU SHbdWBwDsYatDNuhZUI3Wv2g+DQ5LJRrjNNesLRot+kC3XD12sBCMsSI3owoz7Jt u0s48YmOJO8uWi4kDenR9XV8bAaDmX7R/+XGZm1lethNrpBKat9EIrqSHNvqAXZ4 b3cC8/A/aCcOrWXtZnWqvJdqjx2EgL6DbcpaFheaPEekRofuiyOaAbXdlJQvzcwY Sv4EC4ymABpQRg0si+Sya5Int7bZ9ryLZTSCMiLA+L1TnoW26XjMlGAaRqYi7Tx7 Qg4Bt400IIDE0FlE/76vE7b7YWQj7GfErA6moIyDio5AInRU9sHDFyB8iCfdpKxh ajNl1NuEO/FSoXOGQvOo1uHD0vKvNVK21T6vQsRCT1f6JXtpiwTn6eLX4Wn9YLdI iKqg2YXfdCbJnAuoxzGi =hT3x -----END PGP SIGNATURE----- Merge tag 'nfsd-5.7-rc-2' of git://git.linux-nfs.org/projects/cel/cel-2.6 Pull nfsd fixes from Chuck Lever: "Resolve a data integrity problem with NFSD that I inadvertently introduced last year. The change I made makes the NFS server's duplicate reply cache ineffective when krb5i or krb5p are in use, thus allowing the replay of non-idempotent NFS requests such as RENAME, SETATTR, or even WRITEs" * tag 'nfsd-5.7-rc-2' of git://git.linux-nfs.org/projects/cel/cel-2.6: SUNRPC: Revert `241b1f419f` ("SUNRPC: Remove xdr_buf_trim()") SUNRPC: Fix GSS privacy computation of auth->au_ralign SUNRPC: Add "@len" parameter to gss_unwrap()	2020-05-11 12:04:52 -07:00
Chuck Lever	ce99aa62e1	SUNRPC: Signalled ASYNC tasks need to exit Ensure that signalled ASYNC rpc_tasks exit immediately instead of spinning until a timeout (or forever). To avoid checking for the signal flag on every scheduler iteration, the check is instead introduced in the client's finite state machine. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Fixes: `ae67bd3821` ("SUNRPC: Fix up task signalling") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-05-11 14:06:50 -04:00
Florian Westphal	54ab49fde9	netfilter: conntrack: fix infinite loop on rmmod 'rmmod nf_conntrack' can hang forever, because the netns exit gets stuck in nf_conntrack_cleanup_net_list(): i_see_dead_people: busy = 0; list_for_each_entry(net, net_exit_list, exit_list) { nf_ct_iterate_cleanup(kill_all, net, 0, 0); if (atomic_read(&net->ct.count) != 0) busy = 1; } if (busy) { schedule(); goto i_see_dead_people; } When nf_ct_iterate_cleanup iterates the conntrack table, all nf_conn structures can be found twice: once for the original tuple and once for the conntracks reply tuple. get_next_corpse() only calls the iterator when the entry is in original direction -- the idea was to avoid unneeded invocations of the iterator callback. When support for clashing entries was added, the assumption that all nf_conn objects are added twice, once in original, once for reply tuple no longer holds -- NF_CLASH_BIT entries are only added in the non-clashing reply direction. Thus, if at least one NF_CLASH entry is in the list then nf_conntrack_cleanup_net_list() always skips it completely. During normal netns destruction, this causes a hang of several seconds, until the gc worker removes the entry (NF_CLASH entries always have a 1 second timeout). But in the rmmod case, the gc worker has already been stopped, so ct.count never becomes 0. We can fix this in two ways: 1. Add a second test for CLASH_BIT and call iterator for those entries as well, or: 2. Skip the original tuple direction and use the reply tuple. 2) is simpler, so do that. Fixes: `6a757c07e5` ("netfilter: conntrack: allow insertion of clashing entries") Reported-by: Chen Yi <yiche@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-11 17:46:24 +02:00
Roi Dayan	1d10da0eb0	netfilter: flowtable: Remove WQ_MEM_RECLAIM from workqueue This workqueue is in charge of handling offloaded flow tasks like add/del/stats we should not use WQ_MEM_RECLAIM flag. The flag can result in the following warning. [ 485.557189] ------------[ cut here ]------------ [ 485.562976] workqueue: WQ_MEM_RECLAIM nf_flow_table_offload:flow_offload_worr [ 485.562985] WARNING: CPU: 7 PID: 3731 at kernel/workqueue.c:2610 check_flush0 [ 485.590191] Kernel panic - not syncing: panic_on_warn set ... [ 485.597100] CPU: 7 PID: 3731 Comm: kworker/u112:8 Not tainted 5.7.0-rc1.21802 [ 485.606629] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/177 [ 485.615487] Workqueue: nf_flow_table_offload flow_offload_work_handler [nf_f] [ 485.624834] Call Trace: [ 485.628077] dump_stack+0x50/0x70 [ 485.632280] panic+0xfb/0x2d7 [ 485.636083] ? check_flush_dependency+0x110/0x130 [ 485.641830] __warn.cold.12+0x20/0x2a [ 485.646405] ? check_flush_dependency+0x110/0x130 [ 485.652154] ? check_flush_dependency+0x110/0x130 [ 485.657900] report_bug+0xb8/0x100 [ 485.662187] ? sched_clock_cpu+0xc/0xb0 [ 485.666974] do_error_trap+0x9f/0xc0 [ 485.671464] do_invalid_op+0x36/0x40 [ 485.675950] ? check_flush_dependency+0x110/0x130 [ 485.681699] invalid_op+0x28/0x30 Fixes: `7da182a998` ("netfilter: flowtable: Use work entry per offload command") Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-11 17:45:59 +02:00
David Howells	c410bf0193	rxrpc: Fix the excessive initial retransmission timeout rxrpc currently uses a fixed 4s retransmission timeout until the RTT is sufficiently sampled. This can cause problems with some fileservers with calls to the cache manager in the afs filesystem being dropped from the fileserver because a packet goes missing and the retransmission timeout is greater than the call expiry timeout. Fix this by: (1) Copying the RTT/RTO calculation code from Linux's TCP implementation and altering it to fit rxrpc. (2) Altering the various users of the RTT to make use of the new SRTT value. (3) Replacing the use of rxrpc_resend_timeout to use the calculated RTO value instead (which is needed in jiffies), along with a backoff. Notes: (1) rxrpc provides RTT samples by matching the serial numbers on outgoing DATA packets that have the RXRPC_REQUEST_ACK set and PING ACK packets against the reference serial number in incoming REQUESTED ACK and PING-RESPONSE ACK packets. (2) Each packet that is transmitted on an rxrpc connection gets a new per-connection serial number, even for retransmissions, so an ACK can be cross-referenced to a specific trigger packet. This allows RTT information to be drawn from retransmitted DATA packets also. (3) rxrpc maintains the RTT/RTO state on the rxrpc_peer record rather than on an rxrpc_call because many RPC calls won't live long enough to generate more than one sample. (4) The calculated SRTT value is in units of 8ths of a microsecond rather than nanoseconds. The (S)RTT and RTO values are displayed in /proc/net/rxrpc/peers. Fixes: `17926a7932` ([AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both"") Signed-off-by: David Howells <dhowells@redhat.com>	2020-05-11 16:42:28 +01:00
Paul Blakey	2c8897953f	netfilter: flowtable: Add pending bit for offload work Gc step can queue offloaded flow del work or stats work. Those work items can race each other and a flow could be freed before the stats work is executed and querying it. To avoid that, add a pending bit that if a work exists for a flow don't queue another work for it. This will also avoid adding multiple stats works in case stats work didn't complete but gc step started again. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-11 16:26:33 +02:00
Florian Westphal	7d4343d501	xfrm: fix unused variable warning if CONFIG_NETFILTER=n After recent change 'x' is only used when CONFIG_NETFILTER is set: net/ipv4/xfrm4_output.c: In function '__xfrm4_output': net/ipv4/xfrm4_output.c:19:21: warning: unused variable 'x' [-Wunused-variable] 19 \| struct xfrm_state *x = skb_dst(skb)->xfrm; Expand the CONFIG_NETFILTER scope to avoid this. Fixes: `2ab6096db2` ("xfrm: remove output_finish indirection from xfrm_state_afinfo") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-05-11 15:12:27 +02:00
Marcel Holtmann	e625e50cee	Bluetooth: Introduce debug feature when dynamic debug is disabled In case dynamic debug is disabled, this feature allows a vendor platform to provide debug statement printing. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-05-11 12:16:27 +02:00
Marcel Holtmann	a10c907ce0	Bluetooth: Add support for experimental features configuration To enable platform specific experimental features, introduce this new set of management commands and events. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-05-11 12:13:38 +02:00
Marcel Holtmann	568602457c	Bluetooth: Replace BT_DBG with bt_dev_dbg for security manager support The security manager operates on a specific controller and thus use bt_dev_dbg to indetify the controller for each debug message. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-05-11 12:13:38 +02:00
Marcel Holtmann	d5cc6626b3	Bluetooth: Introduce HCI_MGMT_HDEV_OPTIONAL option When setting HCI_MGMT_HDEV_OPTIONAL it is possible to target a specific conntroller or a global interface. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-05-11 12:13:38 +02:00
Marcel Holtmann	181d695352	Bluetooth: Replace BT_DBG with bt_dev_dbg for management support The majority of management interaction are based on a controller index and have a hci_dev associated with it. So use bt_dev_dbg to have a clean way of indentifying the controller the debug message belongs to. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-05-11 12:13:38 +02:00

... 5 6 7 8 9 ...

60985 commits