alistair23-linux

redonkable

Author	SHA1	Message	Date
Chuck Lever	5e54dafbe0	SUNRPC: Remove XDRBUF_SPARSE_PAGES flag in gss_proxy upcall There's no need to defer allocation of pages for the receive buffer. - This upcall is quite infrequent - gssp_alloc_receive_pages() can allocate the pages with GFP_KERNEL, unlike the transport - gssp_alloc_receive_pages() knows exactly how many pages are needed Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Olga Kornievskaia <kolga@netapp.com>	2020-12-09 09:38:34 -05:00
Roberto Bergantinos Corpas	4b5cff7ed8	sunrpc: clean-up cache downcall We can simplify code around cache_downcall unifying memory allocations using kvmalloc. This has the benefit of getting rid of cache_slow_downcall (and queue_io_mutex), and also matches userland allocation size and limits. Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:38:34 -05:00
Brett Mastbergen	2d94b20b95	netfilter: nft_ct: Remove confirmation check for NFT_CT_ID Since commit `656c8e9cc1` ("netfilter: conntrack: Use consistent ct id hash calculation") the ct id will not change from initialization to confirmation. Removing the confirmation check allows for things like adding an element to a 'typeof ct id' set in prerouting upon reception of the first packet of a new connection, and then being able to reference that set consistently both before and after the connection is confirmed. Fixes: `656c8e9cc1` ("netfilter: conntrack: Use consistent ct id hash calculation") Signed-off-by: Brett Mastbergen <brett.mastbergen@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-12-09 10:31:58 +01:00
Florent Revest	b60da4955f	bpf: Only provide bpf_sock_from_file with CONFIG_NET This moves the bpf_sock_from_file definition into net/core/filter.c which only gets compiled with CONFIG_NET and also moves the helper proto usage next to other tracing helpers that are conditional on CONFIG_NET. This avoids ld: kernel/trace/bpf_trace.o: in function `bpf_sock_from_file': bpf_trace.c:(.text+0xe23): undefined reference to `sock_from_file' When compiling a kernel with BPF and without NET. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/bpf/20201208173623.1136863-1-revest@chromium.org	2020-12-08 18:23:36 -08:00
Eric Dumazet	72d05c00d7	tcp: select sane initial rcvq_space.space for big MSS Before commit `a337531b94` ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") small tcp_rmem[1] values were overridden by tcp_fixup_rcvbuf() to accommodate various MSS. This is no longer the case, and Hazem Mohamed Abuelfotoh reported that DRS would not work for MTU 9000 endpoints receiving regular (1500 bytes) frames. Root cause is that tcp_init_buffer_space() uses tp->rcv_wnd for upper limit of rcvq_space.space computation, while it can select later a smaller value for tp->rcv_ssthresh and tp->window_clamp. ss -temoi on receiver would show : skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) rcv_space:62496 rcv_ssthresh:56596 This means that TCP can not increase its window in tcp_grow_window(), and that DRS can never kick. Fix this by making sure that rcvq_space.space is not bigger than number of bytes that can be held in TCP receive queue. People unable/unwilling to change their kernel can work around this issue by selecting a bigger tcp_rmem[1] value as in : echo "4096 196608 6291456" >/proc/sys/net/ipv4/tcp_rmem Based on an initial report and patch from Hazem Mohamed Abuelfotoh https://lore.kernel.org/netdev/20201204180622.14285-1-abuehaze@amazon.com/ Fixes: `a337531b94` ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") Fixes: `041a14d267` ("tcp: start receiver buffer autotuning sooner") Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-08 16:27:48 -08:00
Zheng Yongjun	5e359044c1	net: openvswitch: conntrack: simplify the return expression of ovs_ct_limit_get_default_limit() Simplify the return expression. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Reviewed-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-08 16:22:54 -08:00
Zheng Yongjun	8daa76a52d	net: core: devlink: simplify the return expression of devlink_nl_cmd_trap_set_doit() Simplify the return expression. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-08 16:22:54 -08:00
Zheng Yongjun	9faad250ce	net: ipv6: rpl_iptunnel: simplify the return expression of rpl_do_srh() Simplify the return expression. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-08 16:22:54 -08:00
Zheng Yongjun	57b0637d00	net/sched: cls_u32: simplify the return expression of u32_reoffload_knode() Simplify the return expression. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-08 16:22:53 -08:00
Colin Ian King	8354bcbebd	net: sched: fix spelling mistake in Kconfig "trys" -> "tries" There is a spelling mistake in the Kconfig help text. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-08 16:01:56 -08:00
David S. Miller	e1be4b5990	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2020-12-07 Here's the main bluetooth-next pull request for the 5.11 kernel. - Updated Bluetooth entries in MAINTAINERS to include Luiz von Dentz - Added support for Realtek 8822CE and 8852A devices - Added support for MediaTek MT7615E device - Improved workarounds for fake CSR devices - Fix Bluetooth qualification test case L2CAP/COS/CFD/BV-14-C - Fixes for LL Privacy support - Enforce 16 byte encryption key size for FIPS security level - Added new mgmt commands for extended advertising support - Multiple other smaller fixes & improvements Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-08 15:58:49 -08:00
Julian Wiedmann	97f8841e04	net/af_iucv: use DECLARE_SOCKADDR to cast from sockaddr This gets us compile-time size checking. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-08 15:56:53 -08:00
Cengiz Can	0398ba9e5a	net: tipc: prevent possible null deref of link `tipc_node_apply_property` does a null check on a `tipc_link_entry` pointer but also accesses the same pointer out of the null check block. This triggers a warning on Coverity Static Analyzer because we're implying that `e->link` can BE null. Move "Update MTU for node link entry" line into if block to make sure that we're not in a state that `e->link` is null. Signed-off-by: Cengiz Can <cengiz@kernel.wtf> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-08 15:53:41 -08:00
Pablo Neira Ayuso	42f1c27120	netfilter: nftables: comment indirect serialization of commit_mutex with rtnl_mutex Add an explicit comment in the code to describe the indirect serialization of the holders of the commit_mutex with the rtnl_mutex. Commit `90d2723c6d` ("netfilter: nf_tables: do not hold reference on netdevice from preparation phase") already describes this, but a comment in this case is better for reference. Reported-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-12-08 21:53:48 +01:00
Pablo Neira Ayuso	917d80d376	netfilter: nft_dynset: fix timeouts later than 23 days Use nf_msecs_to_jiffies64 and nf_jiffies64_to_msecs as provided by `8e1102d5a1` ("netfilter: nf_tables: support timeouts larger than 23 days"), otherwise ruleset listing breaks. Fixes: `a8b1e36d0d` ("netfilter: nft_dynset: fix element timeout for HZ != 1000") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-12-08 20:42:11 +01:00
Rasmus Villemoes	bdc40a3f4b	net: dsa: print the MTU value that could not be set These warnings become somewhat more informative when they include the MTU value that could not be set and not just the errno. Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Link: https://lore.kernel.org/r/20201205133944.10182-1-rasmus.villemoes@prevas.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-08 11:24:07 -08:00
Björn Töpel	3546b9b8ec	xsk: Validate socket state in xsk_recvmsg, prior touching socket members In AF_XDP the socket state needs to be checked, prior touching the members of the socket. This was not the case for the recvmsg implementation. Fix that by moving the xsk_is_bound() call. Fixes: `45a8668184` ("xsk: Add support for recvmsg()") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20201207082008.132263-1-bjorn.topel@gmail.com	2020-12-08 17:11:58 +01:00
Subash Abhinov Kasiviswanathan	cc00bcaa58	netfilter: x_tables: Switch synchronization to RCU When running concurrent iptables rules replacement with data, the per CPU sequence count is checked after the assignment of the new information. The sequence count is used to synchronize with the packet path without the use of any explicit locking. If there are any packets in the packet path using the table information, the sequence count is incremented to an odd value and is incremented to an even after the packet process completion. The new table value assignment is followed by a write memory barrier so every CPU should see the latest value. If the packet path has started with the old table information, the sequence counter will be odd and the iptables replacement will wait till the sequence count is even prior to freeing the old table info. However, this assumes that the new table information assignment and the memory barrier is actually executed prior to the counter check in the replacement thread. If CPU decides to execute the assignment later as there is no user of the table information prior to the sequence check, the packet path in another CPU may use the old table information. The replacement thread would then free the table information under it leading to a use after free in the packet processing context- Unable to handle kernel NULL pointer dereference at virtual address 000000000000008e pc : ip6t_do_table+0x5d0/0x89c lr : ip6t_do_table+0x5b8/0x89c ip6t_do_table+0x5d0/0x89c ip6table_filter_hook+0x24/0x30 nf_hook_slow+0x84/0x120 ip6_input+0x74/0xe0 ip6_rcv_finish+0x7c/0x128 ipv6_rcv+0xac/0xe4 __netif_receive_skb+0x84/0x17c process_backlog+0x15c/0x1b8 napi_poll+0x88/0x284 net_rx_action+0xbc/0x23c __do_softirq+0x20c/0x48c This could be fixed by forcing instruction order after the new table information assignment or by switching to RCU for the synchronization. Fixes: `80055dab5d` ("netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore") Reported-by: Sean Tranchetti <stranche@codeaurora.org> Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-12-08 12:57:39 +01:00
Jakub Kicinski	819f56bad1	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2020-12-07 1) Sysbot reported fixes for the new 64/32 bit compat layer. From Dmitry Safonov. 2) Fix a memory leak in xfrm_user_policy that was introduced by adding the 64/32 bit compat layer. From Yu Kuai. * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: net: xfrm: fix memory leak in xfrm_user_policy() xfrm/compat: Don't allocate memory with __GFP_ZERO xfrm/compat: memset(0) 64-bit padding at right place xfrm/compat: Translate by copying XFRMA_UNSPEC attribute ==================== Link: https://lore.kernel.org/r/20201207093937.2874932-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-07 18:29:54 -08:00
Jianguo Wu	f55628b3e7	mptcp: print new line in mptcp_seq_show() if mptcp isn't in use When do cat /proc/net/netstat, the output isn't append with a new line, it looks like this: [root@localhost ~]# cat /proc/net/netstat ... MPTcpExt: 0 0 0 0 0 0 0 0 0 0 0 0 0[root@localhost ~]# This is because in mptcp_seq_show(), if mptcp isn't in use, net->mib.mptcp_statistics is NULL, so it just puts all 0 after "MPTcpExt:", and return, forgot the '\n'. After this patch: [root@localhost ~]# cat /proc/net/netstat ... MPTcpExt: 0 0 0 0 0 0 0 0 0 0 0 0 0 [root@localhost ~]# Fixes: `fc518953bc` ("mptcp: add and use MIB counter infrastructure") Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Acked-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/142e2fd9-58d9-bb13-fb75-951cccc2331e@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-07 17:45:29 -08:00
Joseph Huang	851d0a73c9	bridge: Fix a deadlock when enabling multicast snooping When enabling multicast snooping, bridge module deadlocks on multicast_lock if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2 network. The deadlock was caused by the following sequence: While holding the lock, br_multicast_open calls br_multicast_join_snoopers, which eventually causes IP stack to (attempt to) send out a Listener Report (in igmp6_join_group). Since the destination Ethernet address is a multicast address, br_dev_xmit feeds the packet back to the bridge via br_multicast_rcv, which in turn calls br_multicast_add_group, which then deadlocks on multicast_lock. The fix is to move the call br_multicast_join_snoopers outside of the critical section. This works since br_multicast_join_snoopers only deals with IP and does not modify any multicast data structures of the bridge, so there's no need to hold the lock. Steps to reproduce: 1. sysctl net.ipv6.conf.all.force_mld_version=1 2. have another querier 3. ip link set dev bridge type bridge mcast_snooping 0 && \ ip link set dev bridge type bridge mcast_snooping 1 < deadlock > A typical call trace looks like the following: [ 936.251495] _raw_spin_lock+0x5c/0x68 [ 936.255221] br_multicast_add_group+0x40/0x170 [bridge] [ 936.260491] br_multicast_rcv+0x7ac/0xe30 [bridge] [ 936.265322] br_dev_xmit+0x140/0x368 [bridge] [ 936.269689] dev_hard_start_xmit+0x94/0x158 [ 936.273876] __dev_queue_xmit+0x5ac/0x7f8 [ 936.277890] dev_queue_xmit+0x10/0x18 [ 936.281563] neigh_resolve_output+0xec/0x198 [ 936.285845] ip6_finish_output2+0x240/0x710 [ 936.290039] __ip6_finish_output+0x130/0x170 [ 936.294318] ip6_output+0x6c/0x1c8 [ 936.297731] NF_HOOK.constprop.0+0xd8/0xe8 [ 936.301834] igmp6_send+0x358/0x558 [ 936.305326] igmp6_join_group.part.0+0x30/0xf0 [ 936.309774] igmp6_group_added+0xfc/0x110 [ 936.313787] __ipv6_dev_mc_inc+0x1a4/0x290 [ 936.317885] ipv6_dev_mc_inc+0x10/0x18 [ 936.321677] br_multicast_open+0xbc/0x110 [bridge] [ 936.326506] br_multicast_toggle+0xec/0x140 [bridge] Fixes: `4effd28c12` ("bridge: join all-snoopers multicast address") Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20201204235628.50653-1-Joseph.Huang@garmin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-07 17:14:43 -08:00
Cong Wang	e3366884b3	lwt_bpf: Replace preempt_disable() with migrate_disable() migrate_disable() is just a wrapper for preempt_disable() in non-RT kernel. It is safe to replace it, and RT kernel will benefit. Note that it is introduced since Feb 2020. Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201205075946.497763-2-xiyou.wangcong@gmail.com	2020-12-07 11:53:40 -08:00
Dongdong Wang	d9054a1ff5	lwt: Disable BH too in run_lwt_bpf() The per-cpu bpf_redirect_info is shared among all skb_do_redirect() and BPF redirect helpers. Callers on RX path are all in BH context, disabling preemption is not sufficient to prevent BH interruption. In production, we observed strange packet drops because of the race condition between LWT xmit and TC ingress, and we verified this issue is fixed after we disable BH. Although this bug was technically introduced from the beginning, that is commit `3a0af8fd61` ("bpf: BPF for lightweight tunnel infrastructure"), at that time call_rcu() had to be call_rcu_bh() to match the RCU context. So this patch may not work well before RCU flavor consolidation has been completed around v5.0. Update the comments above the code too, as call_rcu() is now BH friendly. Signed-off-by: Dongdong Wang <wangdongdong.6@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/bpf/20201205075946.497763-1-xiyou.wangcong@gmail.com	2020-12-07 11:53:39 -08:00
Marcel Holtmann	e6ed8b78ea	Bluetooth: Increment management interface revision Increment the mgmt revision due to the recently added new commands. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-12-07 17:02:00 +02:00
Abhishek Pandit-Subedi	dce0a4be80	Bluetooth: Set missing suspend task bits When suspending, mark SUSPEND_SCAN_ENABLE and SUSPEND_SCAN_DISABLE tasks correctly when either classic or le scanning is modified. Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Howard Chung <howardchung@google.com> Reviewed-by: Alain Michaud <alainm@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-12-07 17:01:46 +02:00
Daniel Winkler	4d9b952857	Bluetooth: Change MGMT security info CMD to be more generic For advertising, we wish to know the LE tx power capabilities of the controller in userspace, so this patch edits the Security Info MGMT command to be more generic, such that other various controller capabilities can be included in the EIR data. This change also includes the LE min and max tx power into this newly-named command. The change was tested by manually verifying that the MGMT command returns the tx power range as expected in userspace. Reviewed-by: Sonny Sasaka <sonnysasaka@chromium.org> Signed-off-by: Daniel Winkler <danielwinkler@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-12-07 17:01:42 +02:00
Daniel Winkler	7c395ea521	Bluetooth: Query LE tx power on startup Queries tx power via HCI_LE_Read_Transmit_Power command when the hci device is initialized, and stores resulting min/max LE power in hdev struct. If command isn't available (< BT5 support), min/max values both default to HCI_TX_POWER_INVALID. This patch is manually verified by ensuring BT5 devices correctly query and receive controller tx power range. Reviewed-by: Sonny Sasaka <sonnysasaka@chromium.org> Signed-off-by: Daniel Winkler <danielwinkler@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-12-07 17:01:38 +02:00
Daniel Winkler	9bf9f4b630	Bluetooth: Use intervals and tx power from mgmt cmds This patch takes the min/max intervals and tx power optionally provided in mgmt interface, stores them in the advertisement struct, and uses them when configuring the hci requests. While tx power is not used if extended advertising is unavailable, software rotation will use the min and max advertising intervals specified by the client. This change is validated manually by ensuring the min/max intervals are propagated to the controller on both hatch (extended advertising) and kukui (no extended advertising) chromebooks, and that tx power is propagated correctly on hatch. These tests are performed with multiple advertisements simultaneously. Reviewed-by: Sonny Sasaka <sonnysasaka@chromium.org> Signed-off-by: Daniel Winkler <danielwinkler@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-12-07 17:01:33 +02:00
Daniel Winkler	1241057283	Bluetooth: Break add adv into two mgmt commands This patch adds support for the new advertising add interface, with the first command setting advertising parameters and the second to set advertising data. The set parameters command allows the caller to leave some fields "unset", with a params bitfield defining which params were purposefully set. Unset parameters will be given defaults when calling hci_add_adv_instance. The data passed to the param mgmt command is allowed to be flexible, so in the future if bluetoothd passes a larger structure with new params, the mgmt command will ignore the unknown members at the end. This change has been validated on both hatch (extended advertising) and kukui (no extended advertising) chromebooks running bluetoothd that support this new interface. I ran the following manual tests: - Set several (3) advertisements using modified test_advertisement.py - For each, validate correct data and parameters in btmon trace - Verified both for software rotation and extended adv Automatic test suite also run, testing many (25) scenarios of single and multi-advertising for data/parameter correctness. Reviewed-by: Sonny Sasaka <sonnysasaka@chromium.org> Signed-off-by: Daniel Winkler <danielwinkler@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-12-07 17:01:28 +02:00
Daniel Winkler	31aab5c22e	Bluetooth: Add helper to set adv data We wish to handle advertising data separately from advertising parameters in our new MGMT requests. This change adds a helper that allows the advertising data and scan response to be updated for an existing advertising instance. Reviewed-by: Sonny Sasaka <sonnysasaka@chromium.org> Signed-off-by: Daniel Winkler <danielwinkler@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-12-07 17:01:25 +02:00
Howard Chung	80af16a3e4	Bluetooth: Add toggle to switch off interleave scan This patch add a configurable parameter to switch off the interleave scan feature. Signed-off-by: Howard Chung <howardchung@google.com> Reviewed-by: Alain Michaud <alainm@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-12-07 17:01:00 +02:00
Howard Chung	3bc615fa93	Bluetooth: Refactor read default sys config for various types Refactor read default system configuration function so that it's capable of returning different types than u16 Signed-off-by: Howard Chung <howardchung@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-12-07 17:00:56 +02:00
Howard Chung	422bb17f8a	Bluetooth: Handle active scan case This patch adds code to handle the active scan during interleave scan. The interleave scan will be canceled when users start active scan, and it will be restarted after active scan stopped. Signed-off-by: Howard Chung <howardchung@google.com> Reviewed-by: Alain Michaud <alainm@chromium.org> Reviewed-by: Manish Mandlik <mmandlik@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-12-07 17:00:52 +02:00
Howard Chung	36afe87ac1	Bluetooth: Handle system suspend resume case This patch adds code to handle the system suspension during interleave scan. The interleave scan will be canceled when the system is going to sleep, and will be restarted after waking up. Signed-off-by: Howard Chung <howardchung@google.com> Reviewed-by: Alain Michaud <alainm@chromium.org> Reviewed-by: Manish Mandlik <mmandlik@chromium.org> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Reviewed-by: Miao-chen Chou <mcchou@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-12-07 17:00:47 +02:00
Howard Chung	c4f1f40816	Bluetooth: Interleave with allowlist scan This patch implements the interleaving between allowlist scan and no-filter scan. It'll be used to save power when at least one monitor is registered and at least one pending connection or one device to be scanned for. The durations of the allowlist scan and the no-filter scan are controlled by MGMT command: Set Default System Configuration. The default values are set randomly for now. Signed-off-by: Howard Chung <howardchung@google.com> Reviewed-by: Alain Michaud <alainm@chromium.org> Reviewed-by: Manish Mandlik <mmandlik@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-12-07 17:00:43 +02:00
Edward Vear	a31489d2a3	Bluetooth: Fix attempting to set RPA timeout when unsupported During controller initialization, an LE Set RPA Timeout command is sent to the controller if supported. However, the value checked to determine if the command is supported is incorrect. Page 1921 of the Bluetooth Core Spec v5.2 shows that bit 2 of octet 35 of the Supported_Commands field corresponds to the LE Set RPA Timeout command, but currently bit 6 of octet 35 is checked. This patch checks the correct value instead. This issue led to the error seen in the following btmon output during initialization of an adapter (rtl8761b) and prevented initialization from completing. < HCI Command: LE Set Resolvable Private Address Timeout (0x08\|0x002e) plen 2 Timeout: 900 seconds > HCI Event: Command Complete (0x0e) plen 4 LE Set Resolvable Private Address Timeout (0x08\|0x002e) ncmd 2 Status: Unsupported Remote Feature / Unsupported LMP Feature (0x1a) = Close Index: 00:E0:4C:6B:E5:03 The error did not appear when running with this patch. Signed-off-by: Edward Vear <edwardvear@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-12-07 17:00:38 +02:00
Luiz Augusto von Dentz	aeeae47d34	Bluetooth: Rename get_adv_instance_scan_rsp This renames get_adv_instance_scan_rsp to adv_instance_is_scannable and make it return a bool since it was not actually properly return the size of the scan response as one could expect. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-12-07 17:00:33 +02:00
Luiz Augusto von Dentz	a76a0d3650	Bluetooth: Fix not sending Set Extended Scan Response Current code is actually failing on the following tests of mgmt-tester because get_adv_instance_scan_rsp_len did not account for flags that cause scan response data to be included resulting in non-scannable instance when in fact it should be scannable. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-12-07 17:00:29 +02:00
Jimmy Wahlberg	5b8ec15d02	Bluetooth: Fix for Bluetooth SIG test L2CAP/COS/CFD/BV-14-C This test case is meant to verify that multiple unknown options is included in the response. BLUETOOTH CORE SPECIFICATION Version 5.2 \| Vol 3, Part A page 1057 'On an unknown option failure (Result=0x0003), the option(s) that contain anoption type field that is not understood by the recipient of the L2CAP_CONFIGURATION_REQ packet shall be included in the L2CAP_CONFIGURATION_RSP packet unless they are hints.' Before this patch: > ACL Data RX: Handle 11 flags 0x02 dlen 24 L2CAP: Configure Request (0x04) ident 18 len 16 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [mandatory] 10 00 11 02 11 00 12 02 12 00 < ACL Data TX: Handle 11 flags 0x00 dlen 17 L2CAP: Configure Response (0x05) ident 18 len 9 Source CID: 64 Flags: 0x0000 Result: Failure - unknown options (0x0003) Option: Unknown (0x10) [mandatory] 12 After this patch: > ACL Data RX: Handle 11 flags 0x02 dlen 24 L2CAP: Configure Request (0x04) ident 5 len 16 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [mandatory] 10 00 11 02 11 00 12 02 12 00 < ACL Data TX: Handle 11 flags 0x00 dlen 23 L2CAP: Configure Response (0x05) ident 5 len 15 Source CID: 64 Flags: 0x0000 Result: Failure - unknown options (0x0003) Option: Unknown (0x10) [mandatory] 10 11 01 11 12 01 12 Signed-off-by: Jimmy Wahlberg <jimmywa@spotify.com> Reviewed-by: Luiz Augusto Von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-12-07 17:00:23 +02:00
Wei Yongjun	f6b8c6b554	Bluetooth: sco: Fix crash when using BT_SNDMTU/BT_RCVMTU option This commit add the invalid check for connected socket, without it will causes the following crash due to sco_pi(sk)->conn being NULL: KASAN: null-ptr-deref in range [0x0000000000000050-0x0000000000000057] CPU: 3 PID: 4284 Comm: test_sco Not tainted 5.10.0-rc3+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 RIP: 0010:sco_sock_getsockopt+0x45d/0x8e0 Code: 48 c1 ea 03 80 3c 02 00 0f 85 ca 03 00 00 49 8b 9d f8 04 00 00 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 50 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e b5 03 00 00 8b 43 50 48 8b 0c RSP: 0018:ffff88801bb17d88 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff83a4ecdf RDX: 000000000000000a RSI: ffffc90002fce000 RDI: 0000000000000050 RBP: 1ffff11003762fb4 R08: 0000000000000001 R09: ffff88810e1008c0 R10: ffffffffbd695dcf R11: fffffbfff7ad2bb9 R12: 0000000000000000 R13: ffff888018ff1000 R14: dffffc0000000000 R15: 000000000000000d FS: 00007fb4f76c1700(0000) GS:ffff88811af80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005555e3b7a938 CR3: 00000001117be001 CR4: 0000000000770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: ? sco_skb_put_cmsg+0x80/0x80 ? sco_skb_put_cmsg+0x80/0x80 __sys_getsockopt+0x12a/0x220 ? __ia32_sys_setsockopt+0x150/0x150 ? syscall_enter_from_user_mode+0x18/0x50 ? rcu_read_lock_bh_held+0xb0/0xb0 __x64_sys_getsockopt+0xba/0x150 ? syscall_enter_from_user_mode+0x1d/0x50 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: `0fc1a726f8` ("Bluetooth: sco: new getsockopt options BT_SNDMTU/BT_RCVMTU") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Luiz Augusto Von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-12-07 17:00:15 +02:00
Reo Shiseki	353021588c	Bluetooth: fix typo in struct name Signed-off-by: Reo Shiseki <reoshiseki@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2020-12-07 16:51:22 +02:00
Xin Long	10c678bd0a	udp: fix the proto value passed to ip_protocol_deliver_rcu for the segments Guillaume noticed that: for segments udp_queue_rcv_one_skb() returns the proto, and it should pass "ret" unmodified to ip_protocol_deliver_rcu(). Otherwize, with a negtive value passed, it will underflow inet_protos. This can be reproduced with IPIP FOU: # ip fou add port 5555 ipproto 4 # ethtool -K eth1 rx-gro-list on Fixes: `cf329aa42b` ("udp: cope with UDP GRO packet misdirection") Reported-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-07 00:32:11 -08:00
Jakub Kicinski	78d6bb584d	This cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - update include for min/max helpers, by Sven Eckelmann - add infrastructure and netlink functions for routing algo selection, by Sven Eckelmann (2 patches) - drop deprecated debugfs and sysfs support and obsoleted functionality, by Sven Eckelmann (3 patches) - drop unused include in fragmentation.c, by Simon Wunderlich -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAl/KWIkWHHN3QHNpbW9u d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoTrTEACUImdOCWv4+NnEQfChQv6Y3i18 gJABXoOkWLfFpGBUlw/uYzFKpMEWZ0orHig9gucC+rmjNc8veWwAOugJoTPTKQJZ /4yndhM0x39vWex03rdDmyqzCEh1V1Q9VcdEuD6XbJDaK5F4jDu3NQVneOijIkN+ 5PzhlvtUlfe8csykOCOoC9Y5wy82fEhcEvuSq+Z6dU3Cb3EGHtEUtZ4orDkpnnml 7XEcn5C5+OFGlz/ikiszKumTtNK+dmGluOxoyfAzEjQHK7PoTorcXFS2YUoSWeqQ gmYZ56RBqEHjo4eqcaEgcqq5v8cTPCEMCB8UQjAffxrhloRKHRhQOysG1+OnzGA8 IQ2ARHLQCVPVraXF2ixE0D3BvjKmtMmcvZOCXwhCHDajn9jFKAh0+hnInDyv6Fp1 7eUfpHACL9EQDxKWXeQg37X2mk3hHJ+4zgZOYidahVeKbiiexe2heaHTYAbr9rIf 8hvtlgMg4AnwL3IxadrKwsbJ5t7TEPLTInf47hPvpRg7SmthDTcgso4VDwmWgB8W Tlug8/NoXXDCmDhXUpvyi9+idHe0J8xvHl/2xGC7aSsPAbuhqOuefMKr36YhXJTY vBA5Ih5ppylJ8Dzwa0TbonvQbOAinA8YTa6izKxY4e+xTPB2jz/WT2vciEZv2+ig vNIPFrLZ7OFMohCDfQ== =tNKF -----END PGP SIGNATURE----- Merge tag 'batadv-next-pullrequest-20201204' of git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - update include for min/max helpers, by Sven Eckelmann - add infrastructure and netlink functions for routing algo selection, by Sven Eckelmann (2 patches) - drop deprecated debugfs and sysfs support and obsoleted functionality, by Sven Eckelmann (3 patches) - drop unused include in fragmentation.c, by Simon Wunderlich * tag 'batadv-next-pullrequest-20201204' of git://git.open-mesh.org/linux-merge: batman-adv: Drop unused soft-interface.h include in fragmentation.c batman-adv: Drop legacy code for auto deleting mesh interfaces batman-adv: Drop deprecated debugfs support batman-adv: Drop deprecated sysfs support batman-adv: Allow selection of routing algorithm over rtnetlink batman-adv: Prepare infrastructure for newlink settings batman-adv: Add new include for min/max helpers batman-adv: Start new development cycle ==================== Link: https://lore.kernel.org/r/20201204154631.21063-1-sw@simonwunderlich.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-05 15:08:06 -08:00
Bongsu Jeon	bcd684aace	net/nfc/nci: Support NCI 2.x initial sequence implement the NCI 2.x initial sequence to support NCI 2.x NFCC. Since NCI 2.0, CORE_RESET and CORE_INIT sequence have been changed. If NFCEE supports NCI 2.x, then NCI 2.x initial sequence will work. In NCI 1.0, Initial sequence and payloads are as below: (DH) (NFCC) \| -- CORE_RESET_CMD --> \| \| <-- CORE_RESET_RSP -- \| \| -- CORE_INIT_CMD --> \| \| <-- CORE_INIT_RSP -- \| CORE_RESET_RSP payloads are Status, NCI version, Configuration Status. CORE_INIT_CMD payloads are empty. CORE_INIT_RSP payloads are Status, NFCC Features, Number of Supported RF Interfaces, Supported RF Interface, Max Logical Connections, Max Routing table Size, Max Control Packet Payload Size, Max Size for Large Parameters, Manufacturer ID, Manufacturer Specific Information. In NCI 2.0, Initial Sequence and Parameters are as below: (DH) (NFCC) \| -- CORE_RESET_CMD --> \| \| <-- CORE_RESET_RSP -- \| \| <-- CORE_RESET_NTF -- \| \| -- CORE_INIT_CMD --> \| \| <-- CORE_INIT_RSP -- \| CORE_RESET_RSP payloads are Status. CORE_RESET_NTF payloads are Reset Trigger, Configuration Status, NCI Version, Manufacturer ID, Manufacturer Specific Information Length, Manufacturer Specific Information. CORE_INIT_CMD payloads are Feature1, Feature2. CORE_INIT_RSP payloads are Status, NFCC Features, Max Logical Connections, Max Routing Table Size, Max Control Packet Payload Size, Max Data Packet Payload Size of the Static HCI Connection, Number of Credits of the Static HCI Connection, Max NFC-V RF Frame Size, Number of Supported RF Interfaces, Supported RF Interfaces. Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com> Link: https://lore.kernel.org/r/20201202223147.3472-1-bongsu.jeon@samsung.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 17:47:35 -08:00
Hoang Le	43fcd906d9	tipc: support 128bit node identity for peer removing We add the support to remove a specific node down with 128bit node identifier, as an alternative to legacy 32-bit node address. example: $tipc peer remove identiy <1001002\|16777777> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Link: https://lore.kernel.org/r/20201203035045.4564-1-hoang.h.le@dektech.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 17:40:27 -08:00
Eric Dumazet	905b2032fa	mac80211: mesh: fix mesh_pathtbl_init() error path If tbl_mpp can not be allocated, we call mesh_table_free(tbl_path) while tbl_path rhashtable has not yet been initialized, which causes panics. Simply factorize the rhashtable_init() call into mesh_table_alloc() WARNING: CPU: 1 PID: 8474 at kernel/workqueue.c:3040 __flush_work kernel/workqueue.c:3040 [inline] WARNING: CPU: 1 PID: 8474 at kernel/workqueue.c:3040 __cancel_work_timer+0x514/0x540 kernel/workqueue.c:3136 Modules linked in: CPU: 1 PID: 8474 Comm: syz-executor663 Not tainted 5.10.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__flush_work kernel/workqueue.c:3040 [inline] RIP: 0010:__cancel_work_timer+0x514/0x540 kernel/workqueue.c:3136 Code: 5d c3 e8 bf ae 29 00 0f 0b e9 f0 fd ff ff e8 b3 ae 29 00 0f 0b 43 80 3c 3e 00 0f 85 31 ff ff ff e9 34 ff ff ff e8 9c ae 29 00 <0f> 0b e9 dc fe ff ff 89 e9 80 e1 07 80 c1 03 38 c1 0f 8c 7d fd ff RSP: 0018:ffffc9000165f5a0 EFLAGS: 00010293 RAX: ffffffff814b7064 RBX: 0000000000000001 RCX: ffff888021c80000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff888024039ca0 R08: dffffc0000000000 R09: fffffbfff1dd3e64 R10: fffffbfff1dd3e64 R11: 0000000000000000 R12: 1ffff920002cbebd R13: ffff888024039c88 R14: 1ffff11004807391 R15: dffffc0000000000 FS: 0000000001347880(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000140 CR3: 000000002cc0a000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: rhashtable_free_and_destroy+0x25/0x9c0 lib/rhashtable.c:1137 mesh_table_free net/mac80211/mesh_pathtbl.c:69 [inline] mesh_pathtbl_init+0x287/0x2e0 net/mac80211/mesh_pathtbl.c:785 ieee80211_mesh_init_sdata+0x2ee/0x530 net/mac80211/mesh.c:1591 ieee80211_setup_sdata+0x733/0xc40 net/mac80211/iface.c:1569 ieee80211_if_add+0xd5c/0x1cd0 net/mac80211/iface.c:1987 ieee80211_add_iface+0x59/0x130 net/mac80211/cfg.c:125 rdev_add_virtual_intf net/wireless/rdev-ops.h:45 [inline] nl80211_new_interface+0x563/0xb40 net/wireless/nl80211.c:3855 genl_family_rcv_msg_doit net/netlink/genetlink.c:739 [inline] genl_family_rcv_msg net/netlink/genetlink.c:783 [inline] genl_rcv_msg+0xe4e/0x1280 net/netlink/genetlink.c:800 netlink_rcv_skb+0x190/0x3a0 net/netlink/af_netlink.c:2494 genl_rcv+0x24/0x40 net/netlink/genetlink.c:811 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x780/0x930 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x9a8/0xd40 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:671 [inline] ____sys_sendmsg+0x519/0x800 net/socket.c:2353 ___sys_sendmsg net/socket.c:2407 [inline] __sys_sendmsg+0x2b1/0x360 net/socket.c:2440 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: `60854fd945` ("mac80211: mesh: convert path table to rhashtable") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Link: https://lore.kernel.org/r/20201204162428.2583119-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 17:34:25 -08:00
Wang Hai	bb2da7651a	openvswitch: fix error return code in validate_and_copy_dec_ttl() Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Changing 'return start' to 'return action_start' can fix this bug. Fixes: `69929d4c49` ("net: openvswitch: fix TTL decrement action netlink message format") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/20201204114314.1596-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 15:43:14 -08:00
Zhang Changzhong	ee4f52a8de	net: bridge: vlan: fix error return code in __vlan_add() Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: `f8ed289fab` ("bridge: vlan: use br_vlan_(get\|put)_master to deal with refcounts") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/1607071737-33875-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 15:41:06 -08:00
Zhang Changzhong	b410f04eb5	ipv4: fix error return code in rtm_to_fib_config() Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: `d15662682d` ("ipv4: Allow ipv6 gateway with ipv4 routes") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/1607071695-33740-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 15:38:16 -08:00
Davide Caratti	4eef8b1f36	net/sched: fq_pie: initialize timer earlier in fq_pie_init() with the following tdc testcase: 83be: (qdisc, fq_pie) Create FQ-PIE with invalid number of flows as fq_pie_init() fails, fq_pie_destroy() is called to clean up. Since the timer is not yet initialized, it's possible to observe a splat like this: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 0 PID: 975 Comm: tc Not tainted 5.10.0-rc4+ #298 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014 Call Trace: dump_stack+0x99/0xcb register_lock_class+0x12dd/0x1750 __lock_acquire+0xfe/0x3970 lock_acquire+0x1c8/0x7f0 del_timer_sync+0x49/0xd0 fq_pie_destroy+0x3f/0x80 [sch_fq_pie] qdisc_create+0x916/0x1160 tc_modify_qdisc+0x3c4/0x1630 rtnetlink_rcv_msg+0x346/0x8e0 netlink_unicast+0x439/0x630 netlink_sendmsg+0x719/0xbf0 sock_sendmsg+0xe2/0x110 ____sys_sendmsg+0x5ba/0x890 ___sys_sendmsg+0xe9/0x160 __sys_sendmsg+0xd3/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 [...] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: 0x0 WARNING: CPU: 0 PID: 975 at lib/debugobjects.c:508 debug_print_object+0x162/0x210 [...] Call Trace: debug_object_assert_init+0x268/0x380 try_to_del_timer_sync+0x6a/0x100 del_timer_sync+0x9e/0xd0 fq_pie_destroy+0x3f/0x80 [sch_fq_pie] qdisc_create+0x916/0x1160 tc_modify_qdisc+0x3c4/0x1630 rtnetlink_rcv_msg+0x346/0x8e0 netlink_rcv_skb+0x120/0x380 netlink_unicast+0x439/0x630 netlink_sendmsg+0x719/0xbf0 sock_sendmsg+0xe2/0x110 ____sys_sendmsg+0x5ba/0x890 ___sys_sendmsg+0xe9/0x160 __sys_sendmsg+0xd3/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 fix it moving timer_setup() before any failure, like it was done on 'red' with former commit `608b4adab1` ("net_sched: initialize timer earlier in red_init()"). Fixes: `ec97ecf1eb` ("net: sched: add Flow Queue PIE packet scheduler") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/2e78e01c504c633ebdff18d041833cf2e079a3a4.1607020450.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 14:15:01 -08:00
Arjun Roy	94ab9eb9b2	net-zerocopy: Defer vm zap unless actually needed. Zapping pages is required only if we are calling vm_insert_page into a region where pages had previously been mapped. Receive zerocopy allows reusing such regions, and hitherto called zap_page_range() before calling vm_insert_page() in that range. zap_page_range() can also be triggered from userspace with madvise(MADV_DONTNEED). If userspace is configured to call this before reusing a segment, or if there was nothing mapped at this virtual address to begin with, we can avoid calling zap_page_range() under the socket lock. That said, if userspace does not do that, then we are still responsible for calling zap_page_range(). This patch adds a flag that the user can use to hint to the kernel that a zap is not required. If the flag is not set, or if an older user application does not have a flags field at all, then the kernel calls zap_page_range as before. Also, if the flag is set but a zap is still required, the kernel performs that zap as necessary. Thus incorrectly indicating that a zap can be avoided does not change the correctness of operation. It also increases the batchsize for vm_insert_pages and prefetches the page struct for the batch since we're about to bump the refcount. An alternative mechanism could be to not have a flag, assume by default a zap is not needed, and fall back to zapping if needed. However, this would harm performance for older applications for which a zap is necessary, and thus we implement it with an explicit flag so newer applications can opt in. When using RPC-style traffic with medium sized (tens of KB) RPCs, this change yields an efficency improvement of about 30% for QPS/CPU usage. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:40:53 -08:00
Arjun Roy	0c3936d32f	net-zerocopy: Set zerocopy hint when data is copied Set zerocopy hint, event when falling back to copy, so that the pending data can be efficiently received using zerocopy when possible. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:40:53 -08:00
Arjun Roy	f21a3c4803	net-zerocopy: Introduce short-circuit small reads. Sometimes, we may call tcp receive zerocopy when inq is 0, or inq < PAGE_SIZE, or inq is generally small enough that it is cheaper to copy rather than remap pages. In these cases, we may want to either return early (inq=0) or attempt to use the provided copy buffer to simply copy the received data. This allows us to save both system call overhead and the latency of acquiring mmap_sem in read mode for cases where it would be useless to do so. This patchset enables this behaviour by: 1. Returning quickly if inq is 0. 2. Attempting to perform a regular copy if a hybrid copybuffer is provided and it is large enough to absorb all available bytes. 3. Return quickly if no such buffer was provided and there are less than PAGE_SIZE bytes available. For small RPC ping-pong workloads, normally we would have 1 getsockopt(), 1 recvmsg() and 1 sendmsg() call per RPC. With this change, we remove the recvmsg() call entirely, reducing the syscall overhead by about 33%. In testing with small (hundreds of bytes) RPC traffic, this yields a syscall reduction of about 33% and an efficiency gain of about 3-5% when defined as QPS/CPU Util. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:40:53 -08:00
Arjun Roy	936ced4157	net-zerocopy: Fast return if inq < PAGE_SIZE Sometimes, we may call tcp receive zerocopy when inq is 0, or inq < PAGE_SIZE, in which case we cannot remap pages. In this case, simply return the appropriate hint for regular copying without taking mmap_sem. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:40:53 -08:00
Arjun Roy	98917cf0d6	net-zerocopy: Refactor frag-is-remappable test. Refactor frag-is-remappable test for tcp receive zerocopy. This is part of a patch set that introduces short-circuited hybrid copies for small receive operations, which results in roughly 33% fewer syscalls for small RPC scenarios. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:40:52 -08:00
Arjun Roy	7fba5309ef	net-zerocopy: Refactor skb frag fast-forward op. Refactor skb frag fast-forwarding for tcp receive zerocopy. This is part of a patch set that introduces short-circuited hybrid copies for small receive operations, which results in roughly 33% fewer syscalls for small RPC scenarios. skb_advance_to_frag(), given a skb and an offset into the skb, iterates from the first frag for the skb until we're at the frag specified by the offset. Assuming the offset provided refers to how many bytes in the skb are already read, the returned frag points to the next frag we may read from, while offset_frag is set to the number of bytes from this frag that we have already read. If frag is not null and offset_frag is equal to 0, then we may be able to map this frag's page into the process address space with vm_insert_page(). However, if offset_frag is not equal to 0, then we cannot do so. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:40:52 -08:00
Arjun Roy	2cd8116184	net-tcp: Introduce tcp_recvmsg_locked(). Refactor tcp_recvmsg() by splitting it into locked and unlocked portions. Callers already holding the socket lock and not using ERRQUEUE/cmsg/busy polling can simply call tcp_recvmsg_locked(). This is in preparation for a short-circuit copy performed by TCP receive zerocopy for small (< PAGE_SIZE, or otherwise requested by the user) reads. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:40:52 -08:00
Arjun Roy	18fb76ed53	net-zerocopy: Copy straggler unaligned data for TCP Rx. zerocopy. When TCP receive zerocopy does not successfully map the entire requested space, it outputs a 'hint' that the caller should recvmsg(). Augment zerocopy to accept a user buffer that it tries to copy this hint into - if it is possible to copy the entire hint, it will do so. This elides a recvmsg() call for received traffic that isn't exactly page-aligned in size. This was tested with RPC-style traffic of arbitrary sizes. Normally, each received message required at least one getsockopt() call, and one recvmsg() call for the remaining unaligned data. With this change, almost all of the recvmsg() calls are eliminated, leading to a savings of about 25%-50% in number of system calls for RPC-style workloads. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:40:52 -08:00
Florent Revest	a50a85e40c	bpf: Expose bpf_sk_storage_* to iterator programs Iterators are currently used to expose kernel information to userspace over fast procfs-like files but iterators could also be used to manipulate local storage. For example, the task_file iterator could be used to initialize a socket local storage with associations between processes and sockets or to selectively delete local storage values. Signed-off-by: Florent Revest <revest@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20201204113609.1850150-3-revest@google.com	2020-12-04 22:32:40 +01:00
Florent Revest	dba4a9256b	net: Remove the err argument from sock_from_file Currently, the sock_from_file prototype takes an "err" pointer that is either not set or set to -ENOTSOCK IFF the returned socket is NULL. This makes the error redundant and it is ignored by a few callers. This patch simplifies the API by letting callers deduce the error based on whether the returned socket is NULL or not. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Florent Revest <revest@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20201204113609.1850150-1-revest@google.com	2020-12-04 22:32:40 +01:00
Andrea Mayer	20a081b798	seg6: add VRF support for SRv6 End.DT6 behavior SRv6 End.DT6 is defined in the SRv6 Network Programming [1]. The Linux kernel already offers an implementation of the SRv6 End.DT6 behavior which permits IPv6 L3 VPNs over SRv6 networks. This implementation is not particularly suitable in contexts where we need to deploy IPv6 L3 VPNs among different tenants which share the same network address schemes. The underlying problem lies in the fact that the current version of DT6 (called legacy DT6 from now on) needs a complex configuration to be applied on routers which requires ad-hoc routes and routing policy rules to ensure the correct isolation of tenants. Consequently, a new implementation of DT6 has been introduced with the aim of simplifying the construction of IPv6 L3 VPN services in the multi-tenant environment using SRv6 networks. To accomplish this task, we reused the same VRF infrastructure and SRv6 core components already exploited for implementing the SRv6 End.DT4 behavior. Currently the two End.DT6 implementations coexist seamlessly and can be used depending on the context and the user preferences. So, in order to support both versions of DT6 a new attribute (vrftable) has been introduced which allows us to differentiate the implementation of the behavior to be used. A SRv6 End.DT6 legacy behavior is still instantiated using a command like the following one: $ ip -6 route add 2001:db8::1 encap seg6local action End.DT6 table 100 dev eth0 While to instantiate the SRv6 End.DT6 in VRF mode, the command is still pretty straight forward: $ ip -6 route add 2001:db8::1 encap seg6local action End.DT6 vrftable 100 dev eth0. Obviously as in the case of SRv6 End.DT4, the VRF strict_mode parameter must be set (net.vrf.strict_mode=1) and the VRF associated with table 100 must exist. Please note that the instances of SRv6 End.DT6 legacy and End.DT6 VRF mode can coexist in the same system/configuration without problems. [1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:30:51 -08:00
Andrea Mayer	664d6f8686	seg6: add support for the SRv6 End.DT4 behavior SRv6 End.DT4 is defined in the SRv6 Network Programming [1]. The SRv6 End.DT4 is used to implement IPv4 L3VPN use-cases in multi-tenants environments. It decapsulates the received packets and it performs IPv4 routing lookup in the routing table of the tenant. The SRv6 End.DT4 Linux implementation leverages a VRF device in order to force the routing lookup into the associated routing table. To make the End.DT4 work properly, it must be guaranteed that the routing table used for routing lookup operations is bound to one and only one VRF during the tunnel creation. Such constraint has to be enforced by enabling the VRF strict_mode sysctl parameter, i.e: $ sysctl -wq net.vrf.strict_mode=1. At JANOG44, LINE corporation presented their multi-tenant DC architecture using SRv6 [2]. In the slides, they reported that the Linux kernel is missing the support of SRv6 End.DT4 behavior. The SRv6 End.DT4 behavior can be instantiated using a command similar to the following: $ ip route add 2001:db8::1 encap seg6local action End.DT4 vrftable 100 dev eth0 We introduce the "vrftable" extension in iproute2 in a following patch. [1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming [2] https://speakerdeck.com/line_developers/line-data-center-networking-with-srv6 Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:30:50 -08:00
Andrea Mayer	cfdf64a034	seg6: add callbacks for customizing the creation/destruction of a behavior We introduce two callbacks used for customizing the creation/destruction of a SRv6 behavior. Such callbacks are defined in the new struct seg6_local_lwtunnel_ops and hereafter we provide a brief description of them: - build_state(...): used for calling the custom constructor of the behavior during its initialization phase and after all the attributes have been parsed successfully; - destroy_state(...): used for calling the custom destructor of the behavior before it is completely destroyed. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:30:50 -08:00
Andrea Mayer	0a3021f1d4	seg6: add support for optional attributes in SRv6 behaviors Before this patch, each SRv6 behavior specifies a set of required attributes that must be provided by the userspace application when such behavior is going to be instantiated. If at least one of the required attributes is not provided, the creation of the behavior fails. The SRv6 behavior framework lacks a way to manage optional attributes. By definition, an optional attribute for a SRv6 behavior consists of an attribute which may or may not be provided by the userspace. Therefore, if an optional attribute is missing (and thus not supplied by the user) the creation of the behavior goes ahead without any issue. This patch explicitly differentiates the required attributes from the optional attributes. In particular, each behavior can declare a set of required attributes and a set of optional ones. The semantic of the required attributes remains totally unaffected by this patch. The introduction of the optional attributes does NOT impact on the backward compatibility of the existing SRv6 behaviors. It is essential to note that if an (optional or required) attribute is supplied to a SRv6 behavior which does not expect it, the behavior simply discards such attribute without generating any error or warning. This operating mode remained unchanged both before and after the introduction of the optional attributes extension. The optional attributes are one of the key components used to implement the SRv6 End.DT6 behavior based on the Virtual Routing and Forwarding (VRF) framework. The optional attributes make possible the coexistence of the already existing SRv6 End.DT6 implementation with the new SRv6 End.DT6 VRF-based implementation without breaking any backward compatibility. Further details on the SRv6 End.DT6 behavior (VRF mode) are reported in subsequent patches. From the userspace point of view, the support for optional attributes DO NOT require any changes to the userspace applications, i.e: iproute2 unless new attributes (required or optional) are needed. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:30:50 -08:00
Andrea Mayer	964adce526	seg6: improve management of behavior attributes Depending on the attribute (i.e.: SEG6_LOCAL_SRH, SEG6_LOCAL_TABLE, etc), the parse() callback performs some validity checks on the provided input and updates the tunnel state (slwt) with the result of the parsing operation. However, an attribute may also need to reserve some additional resources (i.e.: memory or setting up an eBPF program) in the parse() callback to complete the parsing operation. The parse() callbacks are invoked by the parse_nla_action() for each attribute belonging to a specific behavior. Given a behavior with N attributes, if the parsing of the i-th attribute fails, the parse_nla_action() returns immediately with an error. Nonetheless, the resources acquired during the parsing of the i-1 attributes are not freed by the parse_nla_action(). Attributes which acquire resources must release them in an explicit way in both the seg6_local_{build/destroy}_state(). However, adding a new attribute of this type requires changes to seg6_local_{build/destroy}_state() to release the resources correctly. The seg6local infrastructure still lacks a simple and structured way to release the resources acquired in the parse() operations. We introduced a new callback in the struct seg6_action_param named destroy(). This callback releases any resource which may have been acquired in the parse() counterpart. Each attribute may or may not implement the destroy() callback depending on whether it needs to free some acquired resources. The destroy() callback comes with several of advantages: 1) we can have many attributes as we want for a given behavior with no need to explicitly free the taken resources; 2) As in case of the seg6_local_build_state(), the seg6_local_destroy_state() does not need to handle the release of resources directly. Indeed, it calls the destroy_attrs() function which is in charge of calling the destroy() callback for every set attribute. We do not need to patch seg6_local_{build/destroy}_state() anymore as we add new attributes; 3) the code is more readable and better structured. Indeed, all the information needed to handle a given attribute are contained in only one place; 4) it facilitates the integration with new features introduced in further patches. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:30:50 -08:00
Jakub Kicinski	846c3c9cfe	wireless-drivers-next patches for v5.11 First set of patches for v5.11. rtw88 getting improvements to work better with Bluetooth and other driver also getting some new features. mhi-ath11k-immutable branch was pulled from mhi tree to avoid conflicts with mhi tree. Major changes: rtw88 * major bluetooth co-existance improvements wilc1000 * Wi-Fi Multimedia (WMM) support ath11k * Fast Initial Link Setup (FILS) discovery and unsolicited broadcast probe response support * qcom,ath11k-calibration-variant Device Tree setting * cold boot calibration support * new DFS region: JP wnc36xx * enable connection monitoring and keepalive in firmware ath10k * firmware IRAM recovery feature mhi * merge mhi-ath11k-immutable branch to make MHI API change go smoothly -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJfyTQyAAoJEG4XJFUm622bCdcIAIyVnqdW7pnoDmWIyQmAEnD9 vGARkzghPHXnufpOzohyDdxT12X9klhrxSVIgzEgH1/pl3i1PpnF6KXyGFCC44Lw wrLXhQygPzmIW1IZtJJE3G72WExXoRjWx6LD1I7C7oEIduqFixXADmK2tKzFp795 Jxum+sOeT6+Dk1OvO/fIroBHX73mRE9zAuiTIMpt2G1j8uXs9QVfcTbTrUshLASN 0sX9J6JutltBuM4G7+bFpVzKnLnlQ7ebUaF6nvTCQsgHWZwkS7yAubSWX9sFohbR UXgQHNE83s/esOg7nBxAfqTKP8mbxsobmxZtxE5GR5vFY5FJDxqP9Zc2KzPp39w= =CbX/ -----END PGP SIGNATURE----- Merge tag 'wireless-drivers-next-2020-12-03' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for v5.11 First set of patches for v5.11. rtw88 getting improvements to work better with Bluetooth and other driver also getting some new features. mhi-ath11k-immutable branch was pulled from mhi tree to avoid conflicts with mhi tree. Major changes: rtw88 * major bluetooth co-existance improvements wilc1000 * Wi-Fi Multimedia (WMM) support ath11k * Fast Initial Link Setup (FILS) discovery and unsolicited broadcast probe response support * qcom,ath11k-calibration-variant Device Tree setting * cold boot calibration support * new DFS region: JP wnc36xx * enable connection monitoring and keepalive in firmware ath10k * firmware IRAM recovery feature mhi * merge mhi-ath11k-immutable branch to make MHI API change go smoothly * tag 'wireless-drivers-next-2020-12-03' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next: (180 commits) wl1251: remove trailing semicolon in macro definition airo: remove trailing semicolon in macro definition wilc1000: added queue support for WMM wilc1000: call complete() for failure in wilc_wlan_txq_add_cfg_pkt() wilc1000: free resource in wilc_wlan_txq_add_mgmt_pkt() for failure path wilc1000: free resource in wilc_wlan_txq_add_net_pkt() for failure path wilc1000: added 'ndo_set_mac_address' callback support brcmfmac: expose firmware config files through modinfo wlcore: Switch to using the new API kobj_to_dev() rtw88: coex: add feature to enhance HID coexistence performance rtw88: coex: upgrade coexistence A2DP mechanism rtw88: coex: add action for coexistence in hardware initial rtw88: coex: add function to avoid cck lock rtw88: coex: change the coexistence mechanism for WLAN connected rtw88: coex: change the coexistence mechanism for HID rtw88: coex: update AFH information while in free-run mode rtw88: coex: update the mechanism for A2DP + PAN rtw88: coex: add debug message rtw88: coex: run coexistence when WLAN entering/leaving LPS Revert "rtl8xxxu: Add Buffalo WI-U3-866D to list of supported devices" ... ==================== Link: https://lore.kernel.org/r/20201203185732.9CFA5C433ED@smtp.codeaurora.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 10:56:37 -08:00
Zhang Changzhong	12c8a8ca11	xsk: Return error code if force_zc is set If force_zc is set, we should exit out with an error, not fall back to copy mode. Fixes: `921b68692a` ("xsk: Enable sharing of dma mappings") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/1607077277-41995-1-git-send-email-zhangchangzhong@huawei.com	2020-12-04 16:48:31 +01:00
Jakub Kicinski	a1dd1d8697	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-12-03 The main changes are: 1) Support BTF in kernel modules, from Andrii. 2) Introduce preferred busy-polling, from Björn. 3) bpf_ima_inode_hash() and bpf_bprm_opts_set() helpers, from KP Singh. 4) Memcg-based memory accounting for bpf objects, from Roman. 5) Allow bpf_{s,g}etsockopt from cgroup bind{4,6} hooks, from Stanislav. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (118 commits) selftests/bpf: Fix invalid use of strncat in test_sockmap libbpf: Use memcpy instead of strncpy to please GCC selftests/bpf: Add fentry/fexit/fmod_ret selftest for kernel module selftests/bpf: Add tp_btf CO-RE reloc test for modules libbpf: Support attachment of BPF tracing programs to kernel modules libbpf: Factor out low-level BPF program loading helper bpf: Allow to specify kernel module BTFs when attaching BPF programs bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier selftests/bpf: Add CO-RE relocs selftest relying on kernel module BTF selftests/bpf: Add support for marking sub-tests as skipped selftests/bpf: Add bpf_testmod kernel module for testing libbpf: Add kernel module BTF support for CO-RE relocations libbpf: Refactor CO-RE relocs to not assume a single BTF object libbpf: Add internal helper to load BTF data by FD bpf: Keep module's btf_data_size intact after load bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address() selftests/bpf: Add Userspace tests for TCP_WINDOW_CLAMP bpf: Adds support for setting window clamp samples/bpf: Fix spelling mistake "recieving" -> "receiving" bpf: Fix cold build of test_progs-no_alu32 ... ==================== Link: https://lore.kernel.org/r/20201204021936.85653-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 07:48:12 -08:00
Borwankar, Antara	bdeca45a0c	mac80211: set SDATA_STATE_RUNNING for monitor interfaces During restarrt, mac80211 is supposed to reconfigure the driver. When there's a monitor interface, the interface is added and the channel context for it was created, but not assigned to it as it was not considered running during the restart. Fix this by setting SDATA_STATE_RUNNING while adding monitor interfaces. Signed-off-by: Borwankar, Antara <antara.borwankar@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20201129172929.e1df99693a4c.I494579f28018c2d0b9d4083a664cf872c28405ae@changeid [reword commit log] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-12-04 12:45:25 +01:00
Sara Sharon	f495acd885	cfg80211: initialize rekey_data In case we have old supplicant, the akm field is uninitialized. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20201129172929.930f0ab7ebee.Ic546e384efab3f4a89f318eafddc3eb7d556aecb@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-12-04 12:35:58 +01:00
Wen Gong	8fca2b8706	mac80211: fix return value of ieee80211_chandef_he_6ghz_oper ieee80211_chandef_he_6ghz_oper() needs to return true if it determined a value 6 GHz chandef, fix that. Fixes: `1d00ce807e` ("mac80211: support S1G association") Signed-off-by: Wen Gong <wgong@codeaurora.org> Link: https://lore.kernel.org/r/1606121152-3452-1-git-send-email-wgong@codeaurora.org [rewrite commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-12-04 12:35:58 +01:00
Simon Wunderlich	34a14c2e63	batman-adv: Drop unused soft-interface.h include in fragmentation.c The commit `992b03b88e` ("batman-adv: Don't always reallocate the fragmentation skb head") removed the last user of functions from soft-interface.h. Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-12-04 08:41:16 +01:00
Sven Eckelmann	a962cb29bb	batman-adv: Drop legacy code for auto deleting mesh interfaces The only way to automatically drop batadv mesh interfaces when all soft interfaces were removed was dropped with the sysfs support. It is no longer needed to have them handled by kernel anymore. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-12-04 08:40:52 +01:00
Sven Eckelmann	aff6f5a68b	batman-adv: Drop deprecated debugfs support The debugfs support in batman-adv was marked as deprecated by the commit `00caf6a2b3` ("batman-adv: Mark debugfs functionality as deprecated") and scheduled for removal in 2021. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-12-04 08:40:52 +01:00
Sven Eckelmann	76e9f27628	batman-adv: Drop deprecated sysfs support The sysfs in batman-adv support was marked as deprecated by the commit `42cdd52148` ("batman-adv: ABI: Mark sysfs files as deprecated") and scheduled for removal in 2021. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-12-04 08:40:52 +01:00
Sven Eckelmann	a5ad457eea	batman-adv: Allow selection of routing algorithm over rtnetlink A batadv net_device is associated to a B.A.T.M.A.N. routing algorithm. This algorithm has to be selected before the interface is initialized and cannot be changed after that. The only way to select this algorithm was a module parameter which specifies the default algorithm used during the creation of the net_device. This module parameter is writeable over /sys/module/batman_adv/parameters/routing_algo and thus allows switching of the routing algorithm: 1. change routing_algo parameter 2. create new batadv net_device But this is not race free because another process can be scheduled between 1 + 2 and in that time frame change the routing_algo parameter again. It is much cleaner to directly provide this information inside the rtnetlink's RTM_NEWLINK message. The two processes would be (in regards of the creation parameter of their batadv interfaces) be isolated. This also eases the integration of batadv devices inside tools like network-manager or systemd-networkd which are not expecting to operate on /sys before a new net_device is created. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-12-04 08:40:52 +01:00
Sven Eckelmann	128254ceea	batman-adv: Prepare infrastructure for newlink settings The batadv generic netlink family can be used to retrieve the current state and set various configuration settings. But there are also settings which must be set before the actual interface is created. The rtnetlink already uses IFLA_INFO_DATA to allow net_device families to transfer such configurations. The minimal required functionality for this is now available for the batadv rtnl_link_ops. Also a new IFLA class of attributes will be attached to it because rtnetlink only allows 51 different attributes but batadv_nl_attrs already contains 62 attributes. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-12-04 08:40:52 +01:00
Sven Eckelmann	fcd193e1df	batman-adv: Add new include for min/max helpers The commit `b296a6d533` ("kernel.h: split out min()/max() et al. helpers") moved the min/max helper functionality from kernel.h to minmax.h. Adjust the kernel code accordingly to avoid fragile indirect includes. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-12-04 08:40:52 +01:00
Simon Wunderlich	fee3e9554a	batman-adv: Start new development cycle Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-12-04 08:40:52 +01:00
Andrii Nakryiko	22dc4a0f5e	bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier Remove a permeating assumption thoughout BPF verifier of vmlinux BTF. Instead, wherever BTF type IDs are involved, also track the instance of struct btf that goes along with the type ID. This allows to gradually add support for kernel module BTFs and using/tracking module types across BPF helper calls and registers. This patch also renames btf_id() function to btf_obj_id() to minimize naming clash with using btf_id to denote BTF type ID, rather than BTF object's ID. Also, altough btf_vmlinux can't get destructed and thus doesn't need refcounting, module BTFs need that, so apply BTF refcounting universally when BPF program is using BTF-powered attachment (tp_btf, fentry/fexit, etc). This makes for simpler clean up code. Now that BTF type ID is not enough to uniquely identify a BTF type, extend BPF trampoline key to include BTF object ID. To differentiate that from target program BPF ID, set 31st bit of type ID. BTF type IDs (at least currently) are not allowed to take full 32 bits, so there is no danger of confusing that bit with a valid BTF type ID. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-10-andrii@kernel.org	2020-12-03 17:38:21 -08:00
Prankur gupta	cb81110997	bpf: Adds support for setting window clamp Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_WINDOW_CLAMP, which sets the maximum receiver window size. It will be useful for limiting receiver window based on RTT. Signed-off-by: Prankur gupta <prankgup@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201202213152.435886-2-prankgup@fb.com	2020-12-03 17:23:24 -08:00
Jakub Kicinski	55fd59b003	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: drivers/net/ethernet/ibm/ibmvnic.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-03 15:44:09 -08:00
Florian Westphal	3ecfbe3e82	mptcp: emit tcp reset when a join request fails RFC 8684 says: If the token is unknown or the host wants to refuse subflow establishment (for example, due to a limit on the number of subflows it will permit), the receiver will send back a reset (RST) signal, analogous to an unknown port in TCP, containing an MP_TCPRST option (Section 3.6) with an "MPTCP specific error" reason code. mptcp-next doesn't support MP_TCPRST yet, this can be added in another change. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-03 12:56:03 -08:00
Florian Westphal	7ea851d19b	tcp: merge 'init_req' and 'route_req' functions The Multipath-TCP standard (RFC 8684) says that an MPTCP host should send a TCP reset if the token in a MP_JOIN request is unknown. At this time we don't do this, the 3whs completes and the 'new subflow' is reset afterwards. There are two ways to allow MPTCP to send the reset. 1. override 'send_synack' callback and emit the rst from there. The drawback is that the request socket gets inserted into the listeners queue just to get removed again right away. 2. Send the reset from the 'route_req' function instead. This avoids the 'add&remove request socket', but route_req lacks the skb that is required to send the TCP reset. Instead of just adding the skb to that function for MPTCP sake alone, Paolo suggested to merge init_req and route_req functions. This saves one indirection from syn processing path and provides the skb to the merged function at the same time. 'send reset on unknown mptcp join token' is added in next patch. Suggested-by: Paolo Abeni <pabeni@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-03 12:56:03 -08:00
Davide Caratti	9608fa6530	net/sched: act_mpls: ensure LSE is pullable before reading it when 'act_mpls' is used to mangle the LSE, the current value is read from the packet dereferencing 4 bytes at mpls_hdr(): ensure that the label is contained in the skb "linear" area. Found by code inspection. v2: - use MPLS_HLEN instead of sizeof(new_lse), thanks to Jakub Kicinski Fixes: `2a2ea50870` ("net: sched: add mpls manipulation actions to TC") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Guillaume Nault <gnault@redhat.com> Link: https://lore.kernel.org/r/3243506cba43d14858f3bd21ee0994160e44d64a.1606987058.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-03 11:13:37 -08:00
Davide Caratti	43c13605ba	net: openvswitch: ensure LSE is pullable before reading it when openvswitch is configured to mangle the LSE, the current value is read from the packet dereferencing 4 bytes at mpls_hdr(): ensure that the label is contained in the skb "linear" area. Found by code inspection. Fixes: `d27cf5c59a` ("net: core: add MPLS update core helper and use in OvS") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/aa099f245d93218b84b5c056b67b6058ccf81a66.1606987185.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-03 11:13:29 -08:00
Davide Caratti	13de4ed9e3	net: skbuff: ensure LSE is pullable before decrementing the MPLS ttl skb_mpls_dec_ttl() reads the LSE without ensuring that it is contained in the skb "linear" area. Fix this calling pskb_may_pull() before reading the current ttl. Found by code inspection. Fixes: `2a2ea50870` ("net: sched: add mpls manipulation actions to TC") Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/53659f28be8bc336c113b5254dc637cc76bbae91.1606987074.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-03 11:13:21 -08:00
Roman Gushchin	819a4f3235	bpf: Eliminate rlimit-based memory accounting for xskmap maps Do not use rlimit-based memory accounting for xskmap maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-31-guro@fb.com	2020-12-02 18:32:47 -08:00
Roman Gushchin	0d2c4f9640	bpf: Eliminate rlimit-based memory accounting for sockmap and sockhash maps Do not use rlimit-based memory accounting for sockmap and sockhash maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-29-guro@fb.com	2020-12-02 18:32:47 -08:00
Roman Gushchin	28e1dcdef0	bpf: Refine memcg-based memory accounting for xskmap maps Extend xskmap memory accounting to include the memory taken by the xsk_map_node structure. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201201215900.3569844-18-guro@fb.com	2020-12-02 18:32:46 -08:00
Roman Gushchin	7846dd9f83	bpf: Refine memcg-based memory accounting for sockmap and sockhash maps Include internal metadata into the memcg-based memory accounting. Also include the memory allocated on updating an element. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201201215900.3569844-17-guro@fb.com	2020-12-02 18:32:46 -08:00
Dan Carpenter	6ee50c8e26	net/x25: prevent a couple of overflows The .x25_addr[] address comes from the user and is not necessarily NUL terminated. This leads to a couple problems. The first problem is that the strlen() in x25_bind() can read beyond the end of the buffer. The second problem is more subtle and could result in memory corruption. The call tree is: x25_connect() --> x25_write_internal() --> x25_addr_aton() The .x25_addr[] buffers are copied to the "addresses" buffer from x25_write_internal() so it will lead to stack corruption. Verify that the strings are NUL terminated and return -EINVAL if they are not. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Fixes: `a9288525d2` ("X25: Dont let x25_bind use addresses containing characters") Reported-by: "kiyin(尹亮)" <kiyin@tencent.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Martin Schiller <ms@dev.tdt.de> Link: https://lore.kernel.org/r/X8ZeAKm8FnFpN//B@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-02 17:26:36 -08:00
Xuan Zhuo	3413f04141	xsk: Change the tx writeable condition Modify the tx writeable condition from the queue is not full to the number of present tx queues is less than the half of the total number of queues. Because the tx queue not full is a very short time, this will cause a large number of EPOLLOUT events, and cause a large number of process wake up. Fixes: `35fcde7f8d` ("xsk: support for Tx") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/508fef55188d4e1160747ead64c6dcda36735880.1606555939.git.xuanzhuo@linux.alibaba.com	2020-12-03 01:14:38 +01:00
Xuan Zhuo	f5da54187e	xsk: Replace datagram_poll by sock_poll_wait datagram_poll will judge the current socket status (EPOLLIN, EPOLLOUT) based on the traditional socket information (eg: sk_wmem_alloc), but this does not apply to xsk. So this patch uses sock_poll_wait instead of datagram_poll, and the mask is calculated by xsk_poll. Fixes: `c497176cb2` ("xsk: add Rx receive functions and poll support") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/e82f4697438cd63edbf271ebe1918db8261b7c09.1606555939.git.xuanzhuo@linux.alibaba.com	2020-12-03 01:14:14 +01:00
Stanislav Fomichev	427167c0b0	bpf: Allow bpf_{s,g}etsockopt from cgroup bind{4,6} hooks I have to now lock/unlock socket for the bind hook execution. That shouldn't cause any overhead because the socket is unbound and shouldn't receive any traffic. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrey Ignatov <rdna@fb.com> Link: https://lore.kernel.org/bpf/20201202172516.3483656-3-sdf@google.com	2020-12-02 13:25:11 -08:00
Eric Dumazet	05e3ecea4a	mptcp: avoid potential infinite loop in mptcp_recvmsg() If a packet is ready in receive queue, and application isssues a recvmsg()/recvfrom()/recvmmsg() request asking for zero bytes, we hang in mptcp_recvmsg(). Fixes: `ea4ca586b1` ("mptcp: refine MPTCP-level ack scheduling") Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Link: https://lore.kernel.org/r/20201202171657.1185108-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-02 12:06:12 -08:00
Antoine Tenart	832ba59649	net: ip6_gre: set dev->hard_header_len when using header_ops syzkaller managed to crash the kernel using an NBMA ip6gre interface. I could reproduce it creating an NBMA ip6gre interface and forwarding traffic to it: skbuff: skb_under_panic: text:ffffffff8250e927 len:148 put:44 head:ffff8c03c7a33 ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:109! Call Trace: skb_push+0x10/0x10 ip6gre_header+0x47/0x1b0 neigh_connected_output+0xae/0xf0 ip6gre tunnel provides its own header_ops->create, and sets it conditionally when initializing the tunnel in NBMA mode. When header_ops->create is used, dev->hard_header_len should reflect the length of the header created. Otherwise, when not used, dev->needed_headroom should be used. Fixes: `eb95f52fc7` ("net: ipv6_gre: Fix GRO to work on IPv6 over GRE tap") Cc: Maria Pasechnik <mariap@mellanox.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Link: https://lore.kernel.org/r/20201130161911.464106-1-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-02 11:16:12 -08:00
Fedor Tokarev	35a6d39672	net: sunrpc: Fix 'snprintf' return value check in 'do_xprt_debugfs' 'snprintf' returns the number of characters which would have been written if enough space had been available, excluding the terminating null byte. Thus, the return value of 'sizeof(buf)' means that the last character has been dropped. Signed-off-by: Fedor Tokarev <ftokarev@gmail.com> Fixes: `2f34b8bfae` ("SUNRPC: add links for all client xprts to debugfs") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:54 -05:00
Trond Myklebust	eee1f54964	SUNRPC: Fix open coded xdr_stream_remaining() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:54 -05:00
Trond Myklebust	0279024f22	SUNRPC: Fix up xdr_set_page() While we always want to align to the next page and/or the beginning of the tail buffer when we call xdr_set_next_page(), the functions xdr_align_data() and xdr_expand_hole() really want to align to the next object in that next page or tail. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:54 -05:00
Trond Myklebust	9ed5af268e	SUNRPC: Clean up the handling of page padding in rpc_prepare_reply_pages() rpc_prepare_reply_pages() currently expects the 'hdrsize' argument to contain the length of the data that we expect to want placed in the head kvec plus a count of 1 word of padding that is placed after the page data. This is very confusing when trying to read the code, and sometimes leads to callers adding an arbitrary value of '1' just in order to satisfy the requirement (whether or not the page data actually needs such padding). This patch aims to clarify the code by changing the 'hdrsize' argument to remove that 1 word of padding. This means we need to subtract the padding from all the existing callers. Fixes: `02ef04e432` ("NFS: Account for XDR pad of buf->pages") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:53 -05:00
Trond Myklebust	1d97316692	SUNRPC: Fix up xdr_read_pages() to take arbitrary object lengths Fix up xdr_read_pages() so that it can handle object lengths that are larger than the page length, by simply aligning to the next object in the buffer tail. The function will continue to return the length of the truncate object data that actually fit into the pages. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:53 -05:00
Trond Myklebust	8d86e373b0	SUNRPC: Clean up helpers xdr_set_iov() and xdr_set_page_base() Allow xdr_set_iov() to set a base so that we can use it to set the cursor to a specific position in the kvec buffer. If the new base overflows the kvec/pages buffer in either xdr_set_iov() or xdr_set_page_base(), then truncate it so that we point to the end of the buffer. Finally, change both function to return the number of bytes remaining to read in their buffers. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:53 -05:00
Trond Myklebust	2b1f83d108	SUNRPC: Fix up typo in xdr_init_decode() We already know that the head buffer and page are empty, so if there is any data, it is in the tail. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:53 -05:00
Trond Myklebust	4aceaaea5e	SUNRPC: Fix up open coded kmemdup_nul() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:53 -05:00
Trond Myklebust	c87b056e58	SUNRPC: Remove unused function xprt_load_transport() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:53 -05:00
Trond Myklebust	1fc5f13186	SUNRPC: Add a helper to return the transport identifier given a netid Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:52 -05:00
Trond Myklebust	9bccd26461	SUNRPC: Close a race with transport setup and module put After we've looked up the transport module, we need to ensure it can't go away until we've finished running the transport setup code. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:52 -05:00
Trond Myklebust	d5aa6b22e2	SUNRPC: xprt_load_transport() needs to support the netid "rdma6" According to RFC5666, the correct netid for an IPv6 addressed RDMA transport is "rdma6", which we've supported as a mount option since Linux-4.7. The problem is when we try to load the module "xprtrdma6", that will fail, since there is no modulealias of that name. Fixes: `181342c5eb` ("xprtrdma: Add rdma6 option to support NFS/RDMA IPv6") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:52 -05:00
Trond Myklebust	e4c72201b6	SUNRPC: rpc_wake_up() should wake up tasks in the correct order Currently, we wake up the tasks by priority queue ordering, which means that we ignore the batching that is supposed to help with QoS issues. Fixes: `c049f8ea9a` ("SUNRPC: Remove the bh-safe lock requirement on the rpc_wait_queue->lock") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2020-12-02 14:05:51 -05:00
Chuck Lever	0359af7ac3	SUNRPC: Remove XDRBUF_SPARSE_PAGES flag in gss_proxy upcall There's no need to defer allocation of pages for the receive buffer. - This upcall is quite infrequent - gssp_alloc_receive_pages() can allocate the pages with GFP_KERNEL, unlike the transport - gssp_alloc_receive_pages() knows exactly how many pages are needed Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2020-12-02 10:06:52 -05:00
Guvenc Gulce	a3db10efcc	net/smc: Add support for obtaining SMCR device list Deliver SMCR device information via netlink based diagnostic interface. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 17:56:13 -08:00
Guvenc Gulce	aaf95523d5	net/smc: Add support for obtaining SMCD device list Deliver SMCD device information via netlink based diagnostic interface. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 17:56:13 -08:00
Guvenc Gulce	8f9dde4bf2	net/smc: Add SMC-D Linkgroup diagnostic support Deliver SMCD Linkgroup information via netlink based diagnostic interface. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 17:56:13 -08:00
Guvenc Gulce	5a7e09d58f	net/smc: Introduce SMCR get link command Introduce get link command which loops through all available links of all available link groups. It uses the SMC-R linkgroup list as entry point, not the socket list, which makes linkgroup diagnosis possible, in case linkgroup does not contain active connections anymore. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 17:56:13 -08:00
Guvenc Gulce	e9b8c845cb	net/smc: Introduce SMCR get linkgroup command Introduce get linkgroup command which loops through all available SMCR linkgroups. It uses the SMC-R linkgroup list as entry point, not the socket list, which makes linkgroup diagnosis possible, in case linkgroup does not contain active connections anymore. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 17:56:13 -08:00
Guvenc Gulce	099b990bd1	net/smc: Add support for obtaining system information Add new netlink command to obtain system information of the smc module. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 17:56:13 -08:00
Guvenc Gulce	e8372d9d21	net/smc: Introduce generic netlink interface for diagnostic purposes Introduce generic netlink interface infrastructure to expose the diagnostic information regarding smc linkgroups, links and devices. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 17:56:12 -08:00
Guvenc Gulce	49407ae2bc	net/smc: Refactor smc ism v2 capability handling Encapsulate the smc ism v2 capability boolean value in a function for better information hiding. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 17:56:12 -08:00
Guvenc Gulce	6443b2f60e	net/smc: Add diagnostic information to link structure During link creation add net-device ifindex and ib-device name to link structure. This is needed for diagnostic purposes. When diagnostic information is gathered, we need to traverse device, linkgroup and link structures, to be able to do that we need to hold a spinlock for the linkgroup list, without this diagnostic information in link structure, another device list mutex holding would be necessary to dereference the device pointer in the link structure which would be impossible when holding a spinlock already. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 17:56:12 -08:00
Guvenc Gulce	3d453f53c7	net/smc: Add diagnostic information to smc ib-device During smc ib-device creation, add network device ifindex to smc ib-device structure. Register for netdevice changes and update ib-device accordingly. This is needed for diagnostic purposes. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 17:56:12 -08:00
Guvenc Gulce	ddc992866f	net/smc: Add link counters for IB device ports Add link counters to the structure of the smc ib device, one counter per ib port. Increase/decrease the counters as needed in the corresponding routines. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 17:56:12 -08:00
Guvenc Gulce	07d51580ff	net/smc: Add connection counters for links Add connection counters to the structure of the link. Increase/decrease the counters as needed in the corresponding routines. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 17:56:12 -08:00
Guvenc Gulce	8b2f0f44f0	net/smc: Use active link of the connection Use active link of the connection directly and not via linkgroup array structure when obtaining link data of the connection. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 17:56:12 -08:00
Karsten Graul	8cf3f3e423	net/smc: use helper smc_conn_abort() in listen processing The helper smc_connect_abort() can be used by the listen processing functions, too. And rename this helper to smc_conn_abort() to make the purpose clearer. No functional change. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 17:56:12 -08:00
Rohit Maheshwari	d31c080075	net/tls: make sure tls offload sets salt_size Recent changes made to remove AES constants started using protocol aware salt_size. ctx->prot_info's salt_size is filled in tls sw case, but not in tls offload mode, and was working so far because of the hard coded value was used. Fixes: `6942a284fb` ("net/tls: make inline helpers protocol-aware") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Link: https://lore.kernel.org/r/20201201090752.27355-1-rohitm@chelsio.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 17:51:30 -08:00
Jason Gunthorpe	2b0a999ba0	Linux 5.10-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl/EM9oeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG/3kH/RNkFyTlHlUkZpJx 8Ks2yWgUln7YhZcmOaG/IcIyWnhCgo3l35kiaH7XxM+rPMZzidp51MHUllaTAQDc u+5EFHMJsmTWUfE8ocHPb1cPdYEDSoVr6QUsixbL9+uADpRz+VZVtWMb89EiyMrC wvLIzpnqY5UNriWWBxD0hrmSsT4g9XCsauer4k2KB+zvebwg6vFOMCFLFc2qz7fb ABsrPFqLZOMp+16chGxyHP7LJ6ygI/Hwf7tPW8ppv4c+hes4HZg7yqJxXhV02QbJ s10s6BTcEWMqKg/T6L/VoScsMHWUcNdvrr3uuPQhgup240XdmB1XO8rOKddw27e7 VIjrjNw= =4ZaP -----END PGP SIGNATURE----- Merge tag 'v5.10-rc6' into rdma.git for-next For dependencies in following patches Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-01 20:40:50 -04:00
Vladimir Oltean	c214550ff8	net: delete __dev_getfirstbyhwtype The last user of the RTNL brother of dev_getfirstbyhwtype (the latter being synchronized under RCU) has been deleted in commit `b4db2b35fc` ("afs: Use core kernel UUID generation"). Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Howells <dhowells@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20201129200550.2433401-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:40:01 -08:00
Randy Dunlap	637b77fdca	net/tipc: fix all function Return: notation Fix Return: kernel-doc notation in all net/tipc/ source files. Also keep ReST list notation intact for output formatting. Fix a few typos in comments. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:38:17 -08:00
Randy Dunlap	f172f4b81a	net/tipc: fix socket.c kernel-doc Fix socket.c kernel-doc warnings in preparation for adding to the networking docbook. Also, for rcvbuf_limit(), use bullet notation so that the lines do not run together. ../net/tipc/socket.c:130: warning: Function parameter or member 'cong_links' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'probe_unacked' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'snd_win' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'peer_caps' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'rcv_win' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'group' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'oneway' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'nagle_start' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'snd_backlog' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'msg_acc' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'pkt_cnt' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'expect_ack' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'nodelay' not described in 'tipc_sock' ../net/tipc/socket.c:130: warning: Function parameter or member 'group_is_open' not described in 'tipc_sock' ../net/tipc/socket.c:267: warning: Function parameter or member 'sk' not described in 'tsk_advance_rx_queue' ../net/tipc/socket.c:295: warning: Function parameter or member 'sk' not described in 'tsk_rej_rx_queue' ../net/tipc/socket.c:295: warning: Function parameter or member 'error' not described in 'tsk_rej_rx_queue' ../net/tipc/socket.c:894: warning: Function parameter or member 'tsk' not described in 'tipc_send_group_msg' ../net/tipc/socket.c:1187: warning: Function parameter or member 'net' not described in 'tipc_sk_mcast_rcv' ../net/tipc/socket.c:1323: warning: Function parameter or member 'inputq' not described in 'tipc_sk_conn_proto_rcv' ../net/tipc/socket.c:1323: warning: Function parameter or member 'xmitq' not described in 'tipc_sk_conn_proto_rcv' ../net/tipc/socket.c:1885: warning: Function parameter or member 'sock' not described in 'tipc_recvmsg' ../net/tipc/socket.c:1993: warning: Function parameter or member 'sock' not described in 'tipc_recvstream' ../net/tipc/socket.c:2313: warning: Function parameter or member 'xmitq' not described in 'tipc_sk_filter_rcv' ../net/tipc/socket.c:2404: warning: Function parameter or member 'xmitq' not described in 'tipc_sk_enqueue' ../net/tipc/socket.c:2456: warning: Function parameter or member 'net' not described in 'tipc_sk_rcv' ../net/tipc/socket.c:2693: warning: Function parameter or member 'kern' not described in 'tipc_accept' ../net/tipc/socket.c:3816: warning: Excess function parameter 'sysctl_tipc_sk_filter' description in 'tipc_sk_filtering' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:38:12 -08:00
Randy Dunlap	4476441e48	net/tipc: fix node.c kernel-doc Fix node.c kernel-doc warnings in preparation for adding to the networking docbook. ../net/tipc/node.c:141: warning: Function parameter or member 'kref' not described in 'tipc_node' ../net/tipc/node.c:141: warning: Function parameter or member 'bc_entry' not described in 'tipc_node' ../net/tipc/node.c:141: warning: Function parameter or member 'failover_sent' not described in 'tipc_node' ../net/tipc/node.c:141: warning: Function parameter or member 'peer_id' not described in 'tipc_node' ../net/tipc/node.c:141: warning: Function parameter or member 'peer_id_string' not described in 'tipc_node' ../net/tipc/node.c:141: warning: Function parameter or member 'conn_sks' not described in 'tipc_node' ../net/tipc/node.c:141: warning: Function parameter or member 'keepalive_intv' not described in 'tipc_node' ../net/tipc/node.c:141: warning: Function parameter or member 'timer' not described in 'tipc_node' ../net/tipc/node.c:141: warning: Function parameter or member 'peer_net' not described in 'tipc_node' ../net/tipc/node.c:141: warning: Function parameter or member 'peer_hash_mix' not described in 'tipc_node' ../net/tipc/node.c:273: warning: Function parameter or member '__n' not described in 'tipc_node_crypto_rx' ../net/tipc/node.c:822: warning: Function parameter or member 'n' not described in '__tipc_node_link_up' ../net/tipc/node.c:822: warning: Function parameter or member 'bearer_id' not described in '__tipc_node_link_up' ../net/tipc/node.c:822: warning: Function parameter or member 'xmitq' not described in '__tipc_node_link_up' ../net/tipc/node.c:888: warning: Function parameter or member 'n' not described in 'tipc_node_link_up' ../net/tipc/node.c:888: warning: Function parameter or member 'bearer_id' not described in 'tipc_node_link_up' ../net/tipc/node.c:888: warning: Function parameter or member 'xmitq' not described in 'tipc_node_link_up' ../net/tipc/node.c:948: warning: Function parameter or member 'n' not described in '__tipc_node_link_down' ../net/tipc/node.c:948: warning: Function parameter or member 'bearer_id' not described in '__tipc_node_link_down' ../net/tipc/node.c:948: warning: Function parameter or member 'xmitq' not described in '__tipc_node_link_down' ../net/tipc/node.c:948: warning: Function parameter or member 'maddr' not described in '__tipc_node_link_down' ../net/tipc/node.c:1537: warning: Function parameter or member 'net' not described in 'tipc_node_get_linkname' ../net/tipc/node.c:1537: warning: Function parameter or member 'len' not described in 'tipc_node_get_linkname' ../net/tipc/node.c:1891: warning: Function parameter or member 'n' not described in 'tipc_node_check_state' ../net/tipc/node.c:1891: warning: Function parameter or member 'xmitq' not described in 'tipc_node_check_state' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:38:09 -08:00
Randy Dunlap	5c5d6796d4	net/tipc: fix name_table.c kernel-doc Fix name_table.c kernel-doc warnings in preparation for adding to the networking docbook. ../net/tipc/name_table.c:115: warning: Function parameter or member 'start' not described in 'service_range_foreach_match' ../net/tipc/name_table.c:115: warning: Function parameter or member 'end' not described in 'service_range_foreach_match' ../net/tipc/name_table.c:127: warning: Function parameter or member 'start' not described in 'service_range_match_first' ../net/tipc/name_table.c:127: warning: Function parameter or member 'end' not described in 'service_range_match_first' ../net/tipc/name_table.c:176: warning: Function parameter or member 'start' not described in 'service_range_match_next' ../net/tipc/name_table.c:176: warning: Function parameter or member 'end' not described in 'service_range_match_next' ../net/tipc/name_table.c:225: warning: Function parameter or member 'type' not described in 'tipc_publ_create' ../net/tipc/name_table.c:225: warning: Function parameter or member 'lower' not described in 'tipc_publ_create' ../net/tipc/name_table.c:225: warning: Function parameter or member 'upper' not described in 'tipc_publ_create' ../net/tipc/name_table.c:225: warning: Function parameter or member 'scope' not described in 'tipc_publ_create' ../net/tipc/name_table.c:225: warning: Function parameter or member 'node' not described in 'tipc_publ_create' ../net/tipc/name_table.c:225: warning: Function parameter or member 'port' not described in 'tipc_publ_create' ../net/tipc/name_table.c:225: warning: Function parameter or member 'key' not described in 'tipc_publ_create' ../net/tipc/name_table.c:252: warning: Function parameter or member 'type' not described in 'tipc_service_create' ../net/tipc/name_table.c:252: warning: Function parameter or member 'hd' not described in 'tipc_service_create' ../net/tipc/name_table.c:367: warning: Function parameter or member 'sr' not described in 'tipc_service_remove_publ' ../net/tipc/name_table.c:367: warning: Function parameter or member 'node' not described in 'tipc_service_remove_publ' ../net/tipc/name_table.c:367: warning: Function parameter or member 'key' not described in 'tipc_service_remove_publ' ../net/tipc/name_table.c:383: warning: Function parameter or member 'pa' not described in 'publication_after' ../net/tipc/name_table.c:383: warning: Function parameter or member 'pb' not described in 'publication_after' ../net/tipc/name_table.c:401: warning: Function parameter or member 'service' not described in 'tipc_service_subscribe' ../net/tipc/name_table.c:401: warning: Function parameter or member 'sub' not described in 'tipc_service_subscribe' ../net/tipc/name_table.c:546: warning: Function parameter or member 'net' not described in 'tipc_nametbl_translate' ../net/tipc/name_table.c:546: warning: Function parameter or member 'type' not described in 'tipc_nametbl_translate' ../net/tipc/name_table.c:546: warning: Function parameter or member 'instance' not described in 'tipc_nametbl_translate' ../net/tipc/name_table.c:546: warning: Function parameter or member 'dnode' not described in 'tipc_nametbl_translate' ../net/tipc/name_table.c:762: warning: Function parameter or member 'net' not described in 'tipc_nametbl_withdraw' ../net/tipc/name_table.c:762: warning: Function parameter or member 'type' not described in 'tipc_nametbl_withdraw' ../net/tipc/name_table.c:762: warning: Function parameter or member 'lower' not described in 'tipc_nametbl_withdraw' ../net/tipc/name_table.c:762: warning: Function parameter or member 'upper' not described in 'tipc_nametbl_withdraw' ../net/tipc/name_table.c:762: warning: Function parameter or member 'key' not described in 'tipc_nametbl_withdraw' ../net/tipc/name_table.c:796: warning: Function parameter or member 'sub' not described in 'tipc_nametbl_subscribe' ../net/tipc/name_table.c:826: warning: Function parameter or member 'sub' not described in 'tipc_nametbl_unsubscribe' ../net/tipc/name_table.c:876: warning: Function parameter or member 'net' not described in 'tipc_service_delete' ../net/tipc/name_table.c:876: warning: Function parameter or member 'sc' not described in 'tipc_service_delete' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:38:05 -08:00
Randy Dunlap	cb67296e8c	net/tipc: fix name_distr.c kernel-doc Fix name_distr.c kernel-doc warnings in preparation for adding to the networking docbook. ../net/tipc/name_distr.c:55: warning: Function parameter or member 'i' not described in 'publ_to_item' ../net/tipc/name_distr.c:55: warning: Function parameter or member 'p' not described in 'publ_to_item' ../net/tipc/name_distr.c:70: warning: Function parameter or member 'net' not described in 'named_prepare_buf' ../net/tipc/name_distr.c:70: warning: Function parameter or member 'type' not described in 'named_prepare_buf' ../net/tipc/name_distr.c:70: warning: Function parameter or member 'size' not described in 'named_prepare_buf' ../net/tipc/name_distr.c:70: warning: Function parameter or member 'dest' not described in 'named_prepare_buf' ../net/tipc/name_distr.c:88: warning: Function parameter or member 'net' not described in 'tipc_named_publish' ../net/tipc/name_distr.c:88: warning: Function parameter or member 'publ' not described in 'tipc_named_publish' ../net/tipc/name_distr.c:116: warning: Function parameter or member 'net' not described in 'tipc_named_withdraw' ../net/tipc/name_distr.c:116: warning: Function parameter or member 'publ' not described in 'tipc_named_withdraw' ../net/tipc/name_distr.c:147: warning: Function parameter or member 'net' not described in 'named_distribute' ../net/tipc/name_distr.c:147: warning: Function parameter or member 'seqno' not described in 'named_distribute' ../net/tipc/name_distr.c:199: warning: Function parameter or member 'net' not described in 'tipc_named_node_up' ../net/tipc/name_distr.c:199: warning: Function parameter or member 'dnode' not described in 'tipc_named_node_up' ../net/tipc/name_distr.c:199: warning: Function parameter or member 'capabilities' not described in 'tipc_named_node_up' ../net/tipc/name_distr.c:225: warning: Function parameter or member 'net' not described in 'tipc_publ_purge' ../net/tipc/name_distr.c:225: warning: Function parameter or member 'publ' not described in 'tipc_publ_purge' ../net/tipc/name_distr.c:225: warning: Function parameter or member 'addr' not described in 'tipc_publ_purge' ../net/tipc/name_distr.c:272: warning: Function parameter or member 'net' not described in 'tipc_update_nametbl' ../net/tipc/name_distr.c:272: warning: Function parameter or member 'i' not described in 'tipc_update_nametbl' ../net/tipc/name_distr.c:272: warning: Function parameter or member 'node' not described in 'tipc_update_nametbl' ../net/tipc/name_distr.c:272: warning: Function parameter or member 'dtype' not described in 'tipc_update_nametbl' ../net/tipc/name_distr.c:353: warning: Function parameter or member 'net' not described in 'tipc_named_rcv' ../net/tipc/name_distr.c:353: warning: Function parameter or member 'namedq' not described in 'tipc_named_rcv' ../net/tipc/name_distr.c:353: warning: Function parameter or member 'rcv_nxt' not described in 'tipc_named_rcv' ../net/tipc/name_distr.c:353: warning: Function parameter or member 'open' not described in 'tipc_named_rcv' ../net/tipc/name_distr.c:383: warning: Function parameter or member 'net' not described in 'tipc_named_reinit' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:38:01 -08:00
Randy Dunlap	a99df449b0	net/tipc: fix link.c kernel-doc Fix link.c kernel-doc warnings in preparation for adding to the networking docbook. ../net/tipc/link.c:200: warning: Function parameter or member 'session' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'snd_nxt_state' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'rcv_nxt_state' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'in_session' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'active' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'if_name' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'rst_cnt' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'drop_point' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'failover_reasm_skb' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'failover_deferdq' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'transmq' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'backlog' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'snd_nxt' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'rcv_unacked' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'deferdq' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'window' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'min_win' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'ssthresh' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'max_win' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'cong_acks' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'checkpoint' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'reasm_tnlmsg' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'last_gap' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'last_ga' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'bc_rcvlink' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'bc_sndlink' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'nack_state' not described in 'tipc_link' ../net/tipc/link.c:200: warning: Function parameter or member 'bc_peer_is_up' not described in 'tipc_link' ../net/tipc/link.c:473: warning: Function parameter or member 'self' not described in 'tipc_link_create' ../net/tipc/link.c:473: warning: Function parameter or member 'peer_id' not described in 'tipc_link_create' ../net/tipc/link.c:473: warning: Excess function parameter 'ownnode' description in 'tipc_link_create' ../net/tipc/link.c:544: warning: Function parameter or member 'ownnode' not described in 'tipc_link_bc_create' ../net/tipc/link.c:544: warning: Function parameter or member 'peer' not described in 'tipc_link_bc_create' ../net/tipc/link.c:544: warning: Function parameter or member 'peer_id' not described in 'tipc_link_bc_create' ../net/tipc/link.c:544: warning: Function parameter or member 'peer_caps' not described in 'tipc_link_bc_create' ../net/tipc/link.c:544: warning: Function parameter or member 'bc_sndlink' not described in 'tipc_link_bc_create' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:37:56 -08:00
Randy Dunlap	ec6a1649fe	net/tipc: fix bearer.c for kernel-doc Fix kernel-doc warnings in bearer.c: ../net/tipc/bearer.c:77: warning: Function parameter or member 'name' not described in 'tipc_media_find' ../net/tipc/bearer.c:91: warning: Function parameter or member 'type' not described in 'media_find_id' ../net/tipc/bearer.c:105: warning: Function parameter or member 'buf' not described in 'tipc_media_addr_printf' ../net/tipc/bearer.c:105: warning: Function parameter or member 'len' not described in 'tipc_media_addr_printf' ../net/tipc/bearer.c:105: warning: Function parameter or member 'a' not described in 'tipc_media_addr_printf' ../net/tipc/bearer.c:174: warning: Function parameter or member 'net' not described in 'tipc_bearer_find' ../net/tipc/bearer.c:174: warning: Function parameter or member 'name' not described in 'tipc_bearer_find' ../net/tipc/bearer.c:238: warning: Function parameter or member 'net' not described in 'tipc_enable_bearer' ../net/tipc/bearer.c:238: warning: Function parameter or member 'name' not described in 'tipc_enable_bearer' ../net/tipc/bearer.c:238: warning: Function parameter or member 'disc_domain' not described in 'tipc_enable_bearer' ../net/tipc/bearer.c:238: warning: Function parameter or member 'prio' not described in 'tipc_enable_bearer' ../net/tipc/bearer.c:238: warning: Function parameter or member 'attr' not described in 'tipc_enable_bearer' ../net/tipc/bearer.c:350: warning: Function parameter or member 'net' not described in 'tipc_reset_bearer' ../net/tipc/bearer.c:350: warning: Function parameter or member 'b' not described in 'tipc_reset_bearer' ../net/tipc/bearer.c:374: warning: Function parameter or member 'net' not described in 'bearer_disable' ../net/tipc/bearer.c:374: warning: Function parameter or member 'b' not described in 'bearer_disable' ../net/tipc/bearer.c:462: warning: Function parameter or member 'net' not described in 'tipc_l2_send_msg' ../net/tipc/bearer.c:479: warning: Function parameter or member 'net' not described in 'tipc_l2_send_msg' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:37:51 -08:00
Randy Dunlap	5fcb7d47fe	net/tipc: fix various kernel-doc warnings kernel-doc and Sphinx fixes to eliminate lots of warnings in preparation for adding to the networking docbook. ../net/tipc/crypto.c:57: warning: cannot understand function prototype: 'enum ' ../net/tipc/crypto.c:69: warning: cannot understand function prototype: 'enum ' ../net/tipc/crypto.c:130: warning: Function parameter or member 'tfm' not described in 'tipc_tfm' ../net/tipc/crypto.c:130: warning: Function parameter or member 'list' not described in 'tipc_tfm' ../net/tipc/crypto.c:172: warning: Function parameter or member 'stat' not described in 'tipc_crypto_stats' ../net/tipc/crypto.c:232: warning: Function parameter or member 'flags' not described in 'tipc_crypto' ../net/tipc/crypto.c:329: warning: Function parameter or member 'ukey' not described in 'tipc_aead_key_validate' ../net/tipc/crypto.c:329: warning: Function parameter or member 'info' not described in 'tipc_aead_key_validate' ../net/tipc/crypto.c:482: warning: Function parameter or member 'aead' not described in 'tipc_aead_tfm_next' ../net/tipc/trace.c:43: warning: cannot understand function prototype: 'unsigned long sysctl_tipc_sk_filter[5] __read_mostly = ' Documentation/networking/tipc:57: ../net/tipc/msg.c:584: WARNING: Unexpected indentation. Documentation/networking/tipc:63: ../net/tipc/name_table.c:536: WARNING: Unexpected indentation. Documentation/networking/tipc:63: ../net/tipc/name_table.c:537: WARNING: Block quote ends without a blank line; unexpected unindent. Documentation/networking/tipc:78: ../net/tipc/socket.c:3809: WARNING: Unexpected indentation. Documentation/networking/tipc:78: ../net/tipc/socket.c:3807: WARNING: Inline strong start-string without end-string. Documentation/networking/tipc:72: ../net/tipc/node.c:904: WARNING: Unexpected indentation. Documentation/networking/tipc:39: ../net/tipc/crypto.c:97: WARNING: Block quote ends without a blank line; unexpected unindent. Documentation/networking/tipc:39: ../net/tipc/crypto.c:98: WARNING: Block quote ends without a blank line; unexpected unindent. Documentation/networking/tipc:39: ../net/tipc/crypto.c:141: WARNING: Inline strong start-string without end-string. ../net/tipc/discover.c:82: warning: Function parameter or member 'skb' not described in 'tipc_disc_init_msg' ../net/tipc/msg.c:69: warning: Function parameter or member 'gfp' not described in 'tipc_buf_acquire' ../net/tipc/msg.c:382: warning: Function parameter or member 'offset' not described in 'tipc_msg_build' ../net/tipc/msg.c:708: warning: Function parameter or member 'net' not described in 'tipc_msg_lookup_dest' ../net/tipc/subscr.c:65: warning: Function parameter or member 'seq' not described in 'tipc_sub_check_overlap' ../net/tipc/subscr.c:65: warning: Function parameter or member 'found_lower' not described in 'tipc_sub_check_overlap' ../net/tipc/subscr.c:65: warning: Function parameter or member 'found_upper' not described in 'tipc_sub_check_overlap' ../net/tipc/udp_media.c:75: warning: Function parameter or member 'proto' not described in 'udp_media_addr' ../net/tipc/udp_media.c:75: warning: Function parameter or member 'port' not described in 'udp_media_addr' ../net/tipc/udp_media.c:75: warning: Function parameter or member 'ipv4' not described in 'udp_media_addr' ../net/tipc/udp_media.c:75: warning: Function parameter or member 'ipv6' not described in 'udp_media_addr' ../net/tipc/udp_media.c:98: warning: Function parameter or member 'rcast' not described in 'udp_bearer' Also fixed a typo of "duest" to "dest". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:37:46 -08:00
Randy Dunlap	ff10527e89	net/tipc: fix tipc header files for kernel-doc Fix tipc header files for adding to the networking docbook. Remove some uses of "/**" that were not kernel-doc notation. Fix some source formatting to eliminate Sphinx warnings. Add missing struct member and function argument kernel-doc descriptions. Correct the description of a couple of struct members that were marked as "(FIXME)". Documentation/networking/tipc:18: ../net/tipc/name_table.h:65: WARNING: Unexpected indentation. Documentation/networking/tipc:18: ../net/tipc/name_table.h:66: WARNING: Block quote ends without a blank line; unexpected unindent. ../net/tipc/bearer.h:128: warning: Function parameter or member 'min_win' not described in 'tipc_media' ../net/tipc/bearer.h:128: warning: Function parameter or member 'max_win' not described in 'tipc_media' ../net/tipc/bearer.h:171: warning: Function parameter or member 'min_win' not described in 'tipc_bearer' ../net/tipc/bearer.h:171: warning: Function parameter or member 'max_win' not described in 'tipc_bearer' ../net/tipc/bearer.h:171: warning: Function parameter or member 'disc' not described in 'tipc_bearer' ../net/tipc/bearer.h:171: warning: Function parameter or member 'up' not described in 'tipc_bearer' ../net/tipc/bearer.h:171: warning: Function parameter or member 'refcnt' not described in 'tipc_bearer' ../net/tipc/name_distr.h:68: warning: Function parameter or member 'port' not described in 'distr_item' ../net/tipc/name_table.h:111: warning: Function parameter or member 'services' not described in 'name_table' ../net/tipc/name_table.h:111: warning: Function parameter or member 'cluster_scope_lock' not described in 'name_table' ../net/tipc/name_table.h:111: warning: Function parameter or member 'rc_dests' not described in 'name_table' ../net/tipc/name_table.h:111: warning: Function parameter or member 'snd_nxt' not described in 'name_table' ../net/tipc/subscr.h:67: warning: Function parameter or member 'kref' not described in 'tipc_subscription' ../net/tipc/subscr.h:67: warning: Function parameter or member 'net' not described in 'tipc_subscription' ../net/tipc/subscr.h:67: warning: Function parameter or member 'service_list' not described in 'tipc_subscription' ../net/tipc/subscr.h:67: warning: Function parameter or member 'conid' not described in 'tipc_subscription' ../net/tipc/subscr.h:67: warning: Function parameter or member 'inactive' not described in 'tipc_subscription' ../net/tipc/subscr.h:67: warning: Function parameter or member 'lock' not described in 'tipc_subscription' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:37:41 -08:00
Hoang Le	0643334902	tipc: fix incompatible mtu of transmission In commit `682cd3cf94` ("tipc: confgiure and apply UDP bearer MTU on running links"), we introduced a function to change UDP bearer MTU and applied this new value across existing per-link. However, we did not apply this new MTU value at node level. This lead to packet dropped at link level if its size is greater than new MTU value. To fix this issue, we also apply this new MTU value for node level. Fixes: `682cd3cf94` ("tipc: confgiure and apply UDP bearer MTU on running links") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Link: https://lore.kernel.org/r/20201130025544.3602-1-hoang.h.le@dektech.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:26:57 -08:00
Danielle Ratson	22ec19f3ae	bridge: switchdev: Notify about VLAN protocol changes Drivers that support bridge offload need to be notified about changes to the bridge's VLAN protocol so that they could react accordingly and potentially veto the change. Add a new switchdev attribute to communicate the change to drivers. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:21:13 -08:00
Lukas Bulwahn	9e39394fae	net/ipv6: propagate user pointer annotation For IPV6_2292PKTOPTIONS, do_ipv6_getsockopt() stores the user pointer optval in the msg_control field of the msghdr. Hence, sparse rightfully warns at ./net/ipv6/ipv6_sockglue.c:1151:33: warning: incorrect type in assignment (different address spaces) expected void msg_control got char [noderef] __user optval Since commit `1f466e1f15` ("net: cleanly handle kernel vs user buffers for ->msg_control"), user pointers shall be stored in the msg_control_user field, and kernel pointers in the msg_control field. This allows to propagate __user annotations nicely through this struct. Store optval in msg_control_user to properly record and propagate the memory space annotation of this pointer. Note that msg_control_is_user is set to true, so the key invariant, i.e., use msg_control_user if and only if msg_control_is_user is true, holds. The msghdr is further used in the six alternative put_cmsg() calls, with msg_control_is_user being true, put_cmsg() picks msg_control_user preserving the __user annotation and passes that properly to copy_to_user(). No functional change. No change in object code. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20201127093421.21673-1-lukas.bulwahn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 11:42:33 -08:00
Marco Elver	fa69ee5aa4	net: switch to storing KCOV handle directly in sk_buff It turns out that usage of skb extensions can cause memory leaks. Ido Schimmel reported: "[...] there are instances that blindly overwrite 'skb->extensions' by invoking skb_copy_header() after __alloc_skb()." Therefore, give up on using skb extensions for KCOV handle, and instead directly store kcov_handle in sk_buff. Fixes: `6370cc3bbd` ("net: add kcov handle to skb extensions") Fixes: `85ce50d337` ("net: kcov: don't select SKB_EXTENSIONS when there is no NET") Fixes: `97f53a08cb` ("net: linux/skbuff.h: combine SKB_EXTENSIONS + KCOV handling") Link: https://lore.kernel.org/linux-wireless/20201121160941.GA485907@shredder.lan/ Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Marco Elver <elver@google.com> Link: https://lore.kernel.org/r/20201125224840.2014773-1-elver@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 11:26:19 -08:00
Vlad Buslov	0fca55ed98	net: sched: remove redundant 'rtnl_held' argument Functions tfilter_notify_chain() and tcf_get_next_proto() are always called with rtnl lock held in current implementation. Moreover, attempting to call them without rtnl lock would cause a warning down the call chain in function __tcf_get_next_proto() that requires the lock to be held by callers. Remove the 'rtnl_held' argument in order to simplify the code and make rtnl lock requirement explicit. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Link: https://lore.kernel.org/r/20201127151205.23492-1-vladbu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 11:24:56 -08:00
Jan Engelhardt	04295878be	netfilter: use actual socket sk for REJECT action True to the message of commit v5.10-rc1-105-g46d6c5ae953c, _do_ actually make use of state->sk when possible, such as in the REJECT modules. Reported-by: Minqiang Chen <ptpt52@gmail.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-12-01 14:33:55 +01:00
Wang Shanker	f7583f02a5	netfilter: nfnl_acct: remove data from struct net This patch removes nfnl_acct_list from struct net to reduce the default memory footprint for the netns structure. Signed-off-by: Miao Wang <shankerwangmiao@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-12-01 09:45:29 +01:00
Kaixu Xia	0ef083d51b	netfilter: Remove unnecessary conversion to bool Here we could use the '!=' expression to fix the following coccicheck warning: ./net/netfilter/xt_nfacct.c:30:41-46: WARNING: conversion to bool not needed here Reported-by: Tosk Robot <tencent_os_robot@tencent.com> Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-12-01 09:45:29 +01:00
Paolo Abeni	6e628cd3a8	mptcp: use mptcp release_cb for delayed tasks We have some tasks triggered by the subflow receive path which require to access the msk socket status, specifically: mptcp_clean_una() and mptcp_push_pending() We have almost everything in place to defer to the msk release_cb such tasks when the msk sock is owned. Since the worker is no more used to clean the acked data, for fallback sockets we need to explicitly flush them. As an added bonus we can move the wake-up code in __mptcp_clean_una(), simplify a lot mptcp_poll() and move the timer update under the data lock. The worker is now used only to process and send DATA_FIN packets and do the mptcp-level retransmissions. Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 17:55:23 -08:00
Paolo Abeni	7439d687b7	mptcp: avoid a few atomic ops in the rx path Extending the data_lock scope in mptcp_incoming_option we can use that to protect both snd_una and wnd_end. In the typical case, we will have a single atomic op instead of 2 Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 17:55:23 -08:00
Paolo Abeni	724cfd2ee8	mptcp: allocate TX skbs in msk context Move the TX skbs allocation in mptcp_sendmsg() scope, and tentatively pre-allocate a skbs number proportional to the sendmsg() length. Use the ssk tx skb cache to prevent the subflow allocation. This allows removing the msk skb extension cache and will make possible the later patches. Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 17:55:23 -08:00
Paolo Abeni	879526030c	mptcp: protect the rx path with the msk socket spinlock Such spinlock is currently used only to protect the 'owned' flag inside the socket lock itself. With this patch, we extend its scope to protect the whole msk receive path and sk_forward_memory. Given the above, we can always move data into the msk receive queue (and OoO queue) from the subflow. We leverage the previous commit, so that we need to acquire the spinlock in the tx path only when moving fwd memory. recvmsg() must now explicitly acquire the socket spinlock when moving skbs out of sk_receive_queue. To reduce the number of lock operations required we use a second rx queue and splice the first into the latter in mptcp_lock_sock(). Additionally rmem allocated memory is bulk-freed via release_cb() Acked-by: Florian Westphal <fw@strlen.de> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 17:55:23 -08:00
Paolo Abeni	e93da92896	mptcp: implement wmem reservation This leverages the previous commit to reserve the wmem required for the sendmsg() operation when the msk socket lock is first acquired. Some heuristics are used to get a reasonable [over] estimation of the whole memory required. If we can't forward alloc such amount fallback to a reasonable small chunk, otherwise enter the wait for memory path. When sendmsg() needs more memory it looks at wmem_reserved first and if that is exhausted, move more space from sk_forward_alloc. The reserved memory is not persistent and is released at the next socket unlock via the release_cb(). Overall this will simplify the next patch. Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 17:55:23 -08:00
Paolo Abeni	ad80b0fc6e	mptcp: open code mptcp variant for lock_sock This allows invoking an additional callback under the socket spin lock. Will be used by the next patches to avoid additional spin lock contention. Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 17:55:23 -08:00
Björn Töpel	b02e5a0ebb	xsk: Propagate napi_id to XDP socket Rx path Add napi_id to the xdp_rxq_info structure, and make sure the XDP socket pick up the napi_id in the Rx path. The napi_id is used to find the corresponding NAPI structure for socket busy polling. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/bpf/20201130185205.196029-7-bjorn.topel@gmail.com	2020-12-01 00:09:25 +01:00
Björn Töpel	a0731952d9	xsk: Add busy-poll support for {recv,send}msg() Wire-up XDP socket busy-poll support for recvmsg() and sendmsg(). If the XDP socket prefers busy-polling, make sure that no wakeup/IPI is performed. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20201130185205.196029-6-bjorn.topel@gmail.com	2020-12-01 00:09:25 +01:00
Björn Töpel	e392081837	xsk: Check need wakeup flag in sendmsg() Add a check for need wake up in sendmsg(), so that if a user calls sendmsg() when no wakeup is needed, do not trigger a wakeup. To simplify the need wakeup check in the syscall, unconditionally enable the need wakeup flag for Tx. This has a side-effect for poll(); If poll() is called for a socket without enabled need wakeup, a Tx wakeup is unconditionally performed. The wakeup matrix for AF_XDP now looks like: need wakeup \| poll() \| sendmsg() \| recvmsg() ------------+--------------+-------------+------------ disabled \| wake Tx \| wake Tx \| nop enabled \| check flag; \| check flag; \| check flag; \| wake Tx/Rx \| wake Tx \| wake Rx Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20201130185205.196029-5-bjorn.topel@gmail.com	2020-12-01 00:09:25 +01:00
Björn Töpel	45a8668184	xsk: Add support for recvmsg() Add support for non-blocking recvmsg() to XDP sockets. Previously, only sendmsg() was supported by XDP socket. Now, for symmetry and the upcoming busy-polling support, recvmsg() is added. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20201130185205.196029-4-bjorn.topel@gmail.com	2020-12-01 00:09:25 +01:00
Björn Töpel	7c951cafc0	net: Add SO_BUSY_POLL_BUDGET socket option This option lets a user set a per socket NAPI budget for busy-polling. If the options is not set, it will use the default of 8. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/20201130185205.196029-3-bjorn.topel@gmail.com	2020-12-01 00:09:25 +01:00
Björn Töpel	7fd3253a7d	net: Introduce preferred busy-polling The existing busy-polling mode, enabled by the SO_BUSY_POLL socket option or system-wide using the /proc/sys/net/core/busy_read knob, is an opportunistic. That means that if the NAPI context is not scheduled, it will poll it. If, after busy-polling, the budget is exceeded the busy-polling logic will schedule the NAPI onto the regular softirq handling. One implication of the behavior above is that a busy/heavy loaded NAPI context will never enter/allow for busy-polling. Some applications prefer that most NAPI processing would be done by busy-polling. This series adds a new socket option, SO_PREFER_BUSY_POLL, that works in concert with the napi_defer_hard_irqs and gro_flush_timeout knobs. The napi_defer_hard_irqs and gro_flush_timeout knobs were introduced in commit `6f8b12d661` ("net: napi: add hard irqs deferral feature"), and allows for a user to defer interrupts to be enabled and instead schedule the NAPI context from a watchdog timer. When a user enables the SO_PREFER_BUSY_POLL, again with the other knobs enabled, and the NAPI context is being processed by a softirq, the softirq NAPI processing will exit early to allow the busy-polling to be performed. If the application stops performing busy-polling via a system call, the watchdog timer defined by gro_flush_timeout will timeout, and regular softirq handling will resume. In summary; Heavy traffic applications that prefer busy-polling over softirq processing should use this option. Example usage: $ echo 2 \| sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs $ echo 200000 \| sudo tee /sys/class/net/ens785f1/gro_flush_timeout Note that the timeout should be larger than the userspace processing window, otherwise the watchdog will timeout and fall back to regular softirq processing. Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/20201130185205.196029-2-bjorn.topel@gmail.com	2020-12-01 00:09:25 +01:00
Björn Töpel	ed1182dc00	xdp: Handle MEM_TYPE_XSK_BUFF_POOL correctly in xdp_return_buff() It turns out that it does exist a path where xdp_return_buff() is being passed an XDP buffer of type MEM_TYPE_XSK_BUFF_POOL. This path is when AF_XDP zero-copy mode is enabled, and a buffer is redirected to a DEVMAP with an attached XDP program that drops the buffer. This change simply puts the handling of MEM_TYPE_XSK_BUFF_POOL back into xdp_return_buff(). Fixes: `82c41671ca` ("xdp: Simplify xdp_return_{frame, frame_rx_napi, buff}") Reported-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Maxim Mikityanskiy <maximmi@nvidia.com> Link: https://lore.kernel.org/bpf/20201127171726.123627-1-bjorn.topel@gmail.com	2020-11-30 23:00:26 +01:00
Chuck Lever	c1346a1216	NFSD: Replace the internals of the READ_BUF() macro Convert the READ_BUF macro in nfs4xdr.c from open code to instead use the new xdr_stream-style decoders already in use by the encode side (and by the in-kernel NFS client implementation). Once this conversion is done, each individual NFSv4 argument decoder can be independently cleaned up to replace these macros with C code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 14:46:36 -05:00
Chuck Lever	5191955d6f	SUNRPC: Prepare for xdr_stream-style decoding on the server-side A "permanent" struct xdr_stream is allocated in struct svc_rqst so that it is usable by all server-side decoders. A per-rqst scratch buffer is also allocated to handle decoding XDR data items that cross page boundaries. To demonstrate how it will be used, add the first call site for the new svcxdr_init_decode() API. As an additional part of the overall conversion, add symbolic constants for successful and failed XDR operations. Returning "0" is overloaded. Sometimes it means something failed, but sometimes it means success. To make it more clear when XDR decoding functions succeed or fail, introduce symbolic constants. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 14:46:35 -05:00
Chuck Lever	0ae4c3e8a6	SUNRPC: Add xdr_set_scratch_page() and xdr_reset_scratch_buffer() Clean up: De-duplicate some frequently-used code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 14:46:35 -05:00
Chuck Lever	156708adf2	SUNRPC: Move the svc_xdr_recvfrom() tracepoint Commit `c509f15a58` ("SUNRPC: Split the xdr_buf event class") added display of the rqst's XID to the svc_xdr_buf_class. However, when the recvfrom tracepoint fires, rq_xid has yet to be filled in with the current XID. So it ends up recording the previous XID that was handled by that svc_rqst. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:24 -05:00
Chuck Lever	d7cc739726	svcrdma: support multiple Read chunks per RPC An efficient way to handle multiple Read chunks is to post them all together and then take a single completion. This is also how the code is already structured: when the Read completion fires, all portions of the incoming RPC message are available to be assembled. The difficult problem is setting up the Read sink buffers so that the server pulls the client's data into place, making subsequent pull-up unnecessary. There are several cases: * No Read chunks. No-op. * One data item Read chunk. This is the fast case, where the inline part of the RPC-over-RDMA message becomes the head and tail, and the data item chunk is placed in buf->pages. * A Position-zero Read chunk. Treated like TCP: the Read chunk is pulled into contiguous pages. + A Position-zero Read chunk with data item chunks. Treated like TCP: all of the Read chunks are pulled into contiguous pages. + Multiple data item chunks. Treated like TCP: the inline part is copied and the data item chunks are pulled into contiguous pages. The "*" cases are already supported. This patch adds support for the "+" cases. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	d96962e6d0	svcrdma: Use the new parsed chunk list when pulling Read chunks As a pre-requisite for handling multiple Read chunks in each Read list, convert svc_rdma_recv_read_chunk() to use the new parsed Read chunk list. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	bafe9c27d5	svcrdma: Rename info::ri_chunklen I'm about to change the purpose of ri_chunklen: Instead of tracking the number of bytes in one Read chunk, it will track the total number of bytes in the Read list. Rename it for clarity. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	b704be09dc	svcrdma: Clean up chunk tracepoints We already have trace_svcrdma_decode_rseg(), which records each ingress Read segment. Instead of reporting those again when they are about to be posted as RDMA Reads, let's fire one tracepoint before posting each type of chunk. So we'll get: nfsd-1998 [002] 321.666615: svcrdma_decode_rseg: cq.id=4 cid=42 segno=0 position=0 192@0x013ca9ebfae14000:0xb0010b05 nfsd-1998 [002] 321.666615: svcrdma_decode_rseg: cq.id=4 cid=42 segno=1 position=0 7688@0x013ca9ebf914e000:0xb0010a05 nfsd-1998 [002] 321.666615: svcrdma_decode_rseg: cq.id=4 cid=42 segno=2 position=0 28@0x013ca9ebfae15000:0xb0010905 nfsd-1998 [002] 321.666622: svcrdma_decode_rqst: cq.id=4 cid=42 xid=0x013ca9eb vers=1 credits=128 proc=RDMA_NOMSG hdrlen=100 nfsd-1998 [002] 321.666642: svcrdma_post_read_chunk: cq.id=3 cid=112 sqecount=3 kworker/2:1H-221 [002] 321.673949: svcrdma_wc_read: cq.id=3 cid=112 status=SUCCESS (0/0x0) Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	7954c8503b	svcrdma: Remove chunk list pointers Clean up: These pointers are no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	41bc163ffe	svcrdma: Support multiple Write chunks in svc_rdma_send_reply_chunk Refactor svc_rdma_send_reply_chunk() so that it Sends only the parts of rq_res that do not contain a result payload. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	2371bcc056	svcrdma: Support multiple Write chunks in svc_rdma_map_reply_msg() Refactor: svc_rdma_map_reply_msg() is restructured to DMA map only the parts of rq_res that do not contain a result payload. This change has been tested to confirm that it does not cause a regression in the no Write chunk and single Write chunk cases. Multiple Write chunks have not been tested. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:23 -05:00
Chuck Lever	9d0b09d5ef	svcrdma: Support multiple write chunks when pulling up When counting the number of SGEs needed to construct a Send request, do not count result payloads. And, when copying the Reply message into the pull-up buffer, result payloads are not to be copied to the Send buffer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	6911f3e10c	svcrdma: Use parsed chunk lists to encode Reply transport headers Refactor: Instead of re-parsing the ingress RPC Call transport header when constructing the egress RPC Reply transport header, use the new parsed Write list and Reply chunk, which are version- agnostic and already XDR decoded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	7a1cbfa180	svcrdma: Use parsed chunk lists to construct RDMA Writes Refactor: Instead of re-parsing the ingress RPC Call transport header when constructing RDMA Writes, use the new parsed chunk lists for the Write list and Reply chunk, which are version-agnostic and already XDR-decoded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	58b2e0fefa	svcrdma: Use parsed chunk lists to detect reverse direction replies Refactor: Don't duplicate header decoding smarts here. Instead, use the new parsed chunk lists. Note that the XID sanity test is also removed. The XID is already looked up by the cb handler, and is rejected if it's not recognized. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	eb3de6a49d	svcrdma: Use parsed chunk lists to derive the inv_rkey Refactor: Don't duplicate header decoding smarts here. Instead, use the new parsed chunk lists. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	78147ca8b4	svcrdma: Add a "parsed chunk list" data structure This simple data structure binds the location of each data payload inside of an RPC message to the chunk that will be used to push it to or pull it from the client. There are several benefits to this small additional overhead: * It enables support for more than one chunk in incoming Read and Write lists. * It translates the version-specific on-the-wire format into a generic in-memory structure, enabling support for multiple versions of the RPC/RDMA transport protocol. * It enables the server to re-organize a chunk list if it needs to adjust where Read chunk data lands in server memory without altering the contents of the XDR-encoded Receive buffer. Construction of these lists is done while sanity checking each incoming RPC/RDMA header. Subsequent patches will make use of the generated data structures. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	ded380f100	svcrdma: Clean up svc_rdma_encode_reply_chunk() Refactor: Match the control flow of svc_rdma_encode_write_list(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	f6ad77590a	svcrdma: Post RDMA Writes while XDR encoding replies The only RPC/RDMA ordering requirement between RDMA Writes and RDMA Sends is that the responder must post the Writes on the Send queue before posting the Send that conveys the RPC Reply for that Write payload. The Linux NFS server implementation now has a transport method that can post result Payload Writes earlier than svc_rdma_sendto: ->xpo_result_payload() This gets RDMA Writes going earlier so they are more likely to be complete at the remote end before the Send completes. Some care must be taken with pulled-up Replies. We don't want to push the Write chunk and then send the same payload data via Send. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	76e5492b16	NFSD: Invoke svc_encode_result_payload() in "read" NFSD encoders Have the NFSD encoders annotate the boundaries of every direct-data-placement eligible result data payload. Then change svcrdma to use that annotation instead of the xdr->page_len when handling Write chunks. For NFSv4 on RDMA, that enables the ability to recognize multiple result payloads per compound. This is a pre-requisite for supporting multiple Write chunks per RPC transaction. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:22 -05:00
Chuck Lever	03493bca08	SUNRPC: Rename svc_encode_read_payload() Clean up: "result payload" is a less confusing name for these payloads. "READ payload" reflects only the NFS usage. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:21 -05:00
Chuck Lever	ab1394ee7a	svcrdma: Refactor the RDMA Write path Refactor for subsequent changes. Constify the xdr_buf argument to ensure the code here does not modify it, and to enable callers to pass in a "const struct xdr_buf *". At the same time, rename the helper functions, which emit RDMA Writes, not RDMA Sends, and add documenting comments. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:21 -05:00
Chuck Lever	51bad8cc13	svcrdma: Const-ify the xdr_buf arguments Clean up: Ensure the code in rw.c does not modify the argument, and enable callers to also use "const struct xdr_buf *". Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:21 -05:00
Chuck Lever	5a7e702670	SUNRPC: Adjust synopsis of xdr_buf_subsegment() Clean up: This enables xdr_buf_subsegment()'s callers to pass in a const pointer to that buffer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:21 -05:00
Chuck Lever	e5decb2eb5	svcrdma: Catch another Reply chunk overflow case When space in the Reply chunk runs out in the middle of a segment, we end up passing a zero-length SGL to rdma_rw_ctx_init(), and it oopses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 13:00:21 -05:00
Jakub Kicinski	bd2d5c54dc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Fix insufficient validation of IPSET_ATTR_IPADDR_IPV6 reported by syzbot. 2) Remove spurious reports on nf_tables when lockdep gets disabled, from Florian Westphal. 3) Fix memleak in the error path of error path of ip_vs_control_net_init(), from Wang Hai. 4) Fix missing control data in flow dissector, otherwise IP address matching in hardware offload infra does not work. 5) Fix hardware offload match on prefix IP address when userspace does not send a bitwise expression to represent the prefix. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf: netfilter: nftables_offload: build mask based from the matching bytes netfilter: nftables_offload: set address type in control dissector ipvs: fix possible memory leak in ip_vs_control_net_init netfilter: nf_tables: avoid false-postive lockdep splat netfilter: ipset: prevent uninit-value in hash_ip6_add ==================== Link: https://lore.kernel.org/r/20201127190313.24947-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-28 13:20:23 -08:00
Guillaume Nault	1ebf179037	ipv4: Fix tos mask in inet_rtm_getroute() When inet_rtm_getroute() was converted to use the RCU variants of ip_route_input() and ip_route_output_key(), the TOS parameters stopped being masked with IPTOS_RT_MASK before doing the route lookup. As a result, "ip route get" can return a different route than what would be used when sending real packets. For example: $ ip route add 192.0.2.11/32 dev eth0 $ ip route add unreachable 192.0.2.11/32 tos 2 $ ip route get 192.0.2.11 tos 2 RTNETLINK answers: No route to host But, packets with TOS 2 (ECT(0) if interpreted as an ECN bit) would actually be routed using the first route: $ ping -c 1 -Q 2 192.0.2.11 PING 192.0.2.11 (192.0.2.11) 56(84) bytes of data. 64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.173 ms --- 192.0.2.11 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.173/0.173/0.173/0.000 ms This patch re-applies IPTOS_RT_MASK in inet_rtm_getroute(), to return results consistent with real route lookups. Fixes: `3765d35ed8` ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup") Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/b2d237d08317ca55926add9654a48409ac1b8f5b.1606412894.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-28 13:14:23 -08:00
Jakub Kicinski	3771b82242	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2020-11-28 1) Do not reference the skb for xsk's generic TX side since when looped back into RX it might crash in generic XDP, from Björn Töpel. 2) Fix umem cleanup on a partially set up xsk socket when being destroyed, from Magnus Karlsson. 3) Fix an incorrect netdev reference count when failing xsk_bind() operation, from Marek Majtyka. 4) Fix bpftool to set an error code on failed calloc() in build_btf_type_table(), from Zhen Lei. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Add MAINTAINERS entry for BPF LSM bpftool: Fix error return value in build_btf_type_table net, xsk: Avoid taking multiple skbuff references xsk: Fix incorrect netdev reference count xsk: Fix umem cleanup bug at socket destruct MAINTAINERS: Update XDP and AF_XDP entries ==================== Link: https://lore.kernel.org/r/20201128005104.1205-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-28 12:04:57 -08:00
Jakub Kicinski	28d35ad083	Here are some batman-adv bugfixes: - Fix head/tailroom issues for fragments, by Sven Eckelmann (3 patches) -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAl/BORsWHHN3QHNpbW9u d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoX+AD/wJYmiGL8beD3eCBQdTjmvJNRbZ MIlzse2LsiSk/4G/UbrEr11M4pc6gFDghH/VXs2MxD2hd6qJsJH7ulAQKMPzTEqD WhwlyJ08O9HIivaN2Ri1ibwmLIDAYlvR2bsWu95weI/hDdE5ups/kIvBudM8b+fC HKQ/e9wGaiB+zUUobvl6bP0Av6u3xPNTHFkXgGTo1shgLISAYFsmEoYQ3yZYrYKy P9ckwLoJwMD6AODMRvWVmW4dzi5QSBWN/84XgQSoqdlZeNklIcTY2P5XWdePo7cv wTnSxm4gJwdU7i8dBXLbtOYzmavzq0JkQHnRcFcKm6RWMbaeNNwfYUwMLp4h4ZtG Mjh0qt4IyuRb5Tf7jkoZ9UR4fIftpuMNMMvE37Cfr7wUZKWtVbG9jHyydU2MZ678 vAF5NigYHa88i+MSdbXq8fPflmhddiHsN05nzI3olzX7tBFWNDf+71P82Gm4zKuM YC4ZlZH7Rrb79C13CJm5xKN0MI5TYGDYJeFLV4Km7hZ9H9EyDJcsuSFPgV0R0hg5 Y5rWIWRrK/lU1aSXPSEuyL1P+tbH5z5xeYxdPUz2WroqNTyre6ycXy+wuoyCw5Ij t91wqD8pNC1/3nynBbI2kcLeGFPjDTEfWYcYN7OE3Ts6AXr3fs8x1XUcUsseAZ1P oxaUxzzfCcDno3abtg== =mORx -----END PGP SIGNATURE----- Merge tag 'batadv-net-pullrequest-20201127' of git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here are some batman-adv bugfixes: - Fix head/tailroom issues for fragments, by Sven Eckelmann (3 patches) * tag 'batadv-net-pullrequest-20201127' of git://git.open-mesh.org/linux-merge: batman-adv: Don't always reallocate the fragmentation skb head batman-adv: Reserve needed_*room for fragments batman-adv: Consider fragmentation for needed_headroom ==================== Link: https://lore.kernel.org/r/20201127173849.19208-1-sw@simonwunderlich.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-28 11:56:06 -08:00
Antoine Tenart	44f64f23ba	netfilter: bridge: reset skb->pkt_type after NF_INET_POST_ROUTING traversal Netfilter changes PACKET_OTHERHOST to PACKET_HOST before invoking the hooks as, while it's an expected value for a bridge, routing expects PACKET_HOST. The change is undone later on after hook traversal. This can be seen with pairs of functions updating skb>pkt_type and then reverting it to its original value: For hook NF_INET_PRE_ROUTING: setup_pre_routing / br_nf_pre_routing_finish For hook NF_INET_FORWARD: br_nf_forward_ip / br_nf_forward_finish But the third case where netfilter does this, for hook NF_INET_POST_ROUTING, the packet type is changed in br_nf_post_routing but never reverted. A comment says: /* We assume any code from br_dev_queue_push_xmit onwards doesn't care * about the value of skb->pkt_type. */ But when having a tunnel (say vxlan) attached to a bridge we have the following call trace: br_nf_pre_routing br_nf_pre_routing_ipv6 br_nf_pre_routing_finish br_nf_forward_ip br_nf_forward_finish br_nf_post_routing <- pkt_type is updated to PACKET_HOST br_nf_dev_queue_xmit <- but not reverted to its original value vxlan_xmit vxlan_xmit_one skb_tunnel_check_pmtu <- a check on pkt_type is performed In this specific case, this creates issues such as when an ICMPv6 PTB should be sent back. When CONFIG_BRIDGE_NETFILTER is enabled, the PTB isn't sent (as skb_tunnel_check_pmtu checks if pkt_type is PACKET_HOST and returns early). If the comment is right and no one cares about the value of skb->pkt_type after br_dev_queue_push_xmit (which isn't true), resetting it to its original value should be safe. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20201123174902.622102-1-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-28 11:46:51 -08:00
Marcelo Ricardo Leitner	3567e23379	net/sched: act_ct: enable stats for HW offloaded entries By setting NF_FLOWTABLE_COUNTER. Otherwise, the updates added by commit `ef803b3cf9` ("netfilter: flowtable: add counter support in HW offload") are not effective when using act_ct. While at it, now that we have the flag set, protect the call to nf_ct_acct_update() by commit `beb97d3a31` ("net/sched: act_ct: update nf_conn_acct for act_ct SW offload in flowtable") with the check on NF_FLOWTABLE_COUNTER, as also done on other places. Note that this shouldn't impact performance as these stats are only enabled when net.netfilter.nf_conntrack_acct is enabled. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: wenxu <wenxu@ucloud.cn> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://lore.kernel.org/r/481a65741261fd81b0a0813e698af163477467ec.1606415787.git.marcelo.leitner@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-28 11:43:40 -08:00
Jakub Kicinski	5c39f26e67	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Trivial conflict in CAN, keep the net-next + the byteswap wrapper. Conflicts: drivers/net/can/usb/gs_usb.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 18:25:27 -08:00
Jon Maloy	b6f88d9c2f	tipc: update address terminology in code We update the terminology in the code so that deprecated structure names and macros are replaced with those currently recommended in the user API. struct tipc_portid -> struct tipc_socket_addr struct tipc_name -> struct tipc_service_addr struct tipc_name_seq -> struct tipc_service_range TIPC_ADDR_ID -> TIPC_SOCKET_ADDR TIPC_ADDR_NAME -> TIPC_SERVICE_ADDR TIPC_ADDR_NAMESEQ -> TIPC_SERVICE_RANGE TIPC_CFG_SRV -> TIPC_NODE_STATE Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 17:34:01 -08:00
Jon Maloy	5f75e0a0e9	tipc: make node number calculation reproducible The 32-bit node number, aka node hash or node address, is calculated based on the 128-bit node identity when it is not set explicitly by the user. In future commits we will need to perform this hash operation on peer nodes while feeling safe that we obtain the same result. We do this by interpreting the initial hash as a network byte order number. Whenever we need to use the number locally on a node we must therefore translate it to host byte order to obtain an architecure independent result. Furthermore, given the context where we use this number, we must not allow it to be zero unless the node identity also is zero. Hence, in the rare cases when the xor-ed hash value may end up as zero we replace it with a fix number, knowing that the code anyway is capable of handling hash collisions. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 17:34:01 -08:00
Jon Maloy	60c102eede	tipc: refactor tipc_sk_bind() function We refactor the tipc_sk_bind() function, so that the lock handling is handled separately from the logics. We also move some sanity tests to earlier in the call chain, to the function tipc_bind(). Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 17:34:01 -08:00
Martin Schiller	139d6eb149	net/x25: remove x25_kill_by_device() Remove obsolete function x25_kill_by_device(). It's not used any more. Signed-off-by: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 17:22:51 -08:00
Martin Schiller	d023b2b9cc	net/x25: fix restart request/confirm handling We have to take the actual link state into account to handle restart requests/confirms well. Signed-off-by: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 17:22:51 -08:00
Martin Schiller	62480b992b	net/lapb: fix t1 timer handling for LAPB_STATE_0 1. DTE interface changes immediately to LAPB_STATE_1 and start sending SABM(E). 2. DCE interface sends N2-times DM and changes to LAPB_STATE_1 afterwards if there is no response in the meantime. Signed-off-by: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 17:22:51 -08:00
Martin Schiller	a4989fa911	net/lapb: support netdev events This patch allows layer2 (LAPB) to react to netdev events itself and avoids the detour via layer3 (X.25). 1. Establish layer2 on NETDEV_UP events, if the carrier is already up. 2. Call lapb_disconnect_request() on NETDEV_GOING_DOWN events to signal the peer that the connection will go down. (Only when the carrier is up.) 3. When a NETDEV_DOWN event occur, clear all queues, enter state LAPB_STATE_0 and stop all timers. 4. The NETDEV_CHANGE event makes it possible to handle carrier loss and detection. In case of Carrier Loss, clear all queues, enter state LAPB_STATE_0 and stop all timers. In case of Carrier Detection, we start timer t1 on a DCE interface, and on a DTE interface we change to state LAPB_STATE_1 and start sending SABM(E). Signed-off-by: Martin Schiller <ms@dev.tdt.de> Acked-by: Xie He <xie.he.0141@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 17:22:51 -08:00
Martin Schiller	7eed751b3b	net/x25: handle additional netdev events 1. Add / remove x25_link_device by NETDEV_REGISTER/UNREGISTER and also by NETDEV_POST_TYPE_CHANGE/NETDEV_PRE_TYPE_CHANGE. This change is needed so that the x25_neigh struct for an interface is already created when it shows up and is kept independently if the interface goes UP or DOWN. This is used in an upcomming commit, where x25 params of an neighbour will get configurable through ioctls. 2. NETDEV_CHANGE event makes it possible to handle carrier loss and detection. If carrier is lost, clean up everything related to this neighbour by calling x25_link_terminated(). 3. Also call x25_link_terminated() for NETDEV_DOWN events and remove the call to x25_clear_forward_by_dev() in x25_route_device_down(), as this is already called by x25_kill_by_neigh() which gets called by x25_link_terminated(). 4. Do nothing for NETDEV_UP and NETDEV_GOING_DOWN events, as these will be handled in layer 2 (LAPB) and layer3 (X.25) will be informed by layer2 when layer2 link is established and layer3 link should be initiated. Signed-off-by: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 17:22:51 -08:00
wenxu	c129412f74	net/sched: sch_frag: add generic packet fragment support. Currently kernel tc subsystem can do conntrack in cat_ct. But when several fragment packets go through the act_ct, function tcf_ct_handle_fragments will defrag the packets to a big one. But the last action will redirect mirred to a device which maybe lead the reassembly big packet over the mtu of target device. This patch add support for a xmit hook to mirred, that gets executed before xmiting the packet. Then, when act_ct gets loaded, it configs that hook. The frag xmit hook maybe reused by other modules. Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 14:36:02 -08:00
wenxu	fa6d639930	net/sched: act_mirred: refactor the handle of xmit This one is prepare for the next patch. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 14:36:02 -08:00
wenxu	aadaca9e7c	net/sched: fix miss init the mru in qdisc_skb_cb The mru in the qdisc_skb_cb should be init as 0. Only defrag packets in the act_ct will set the value. Fixes: `038ebb1a71` ("net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 14:36:02 -08:00
Vadim Fedorenko	74ea610602	net/tls: add CHACHA20-POLY1305 configuration Add ChaCha-Poly specific configuration code. Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 14:32:37 -08:00
Vadim Fedorenko	a6acbe6235	net/tls: add CHACHA20-POLY1305 specific behavior RFC 7905 defines special behavior for ChaCha-Poly TLS sessions. The differences are in the calculation of nonce and the absence of explicit IV. This behavior is like TLSv1.3 partly. Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 14:32:37 -08:00
Vadim Fedorenko	6942a284fb	net/tls: make inline helpers protocol-aware Inline functions defined in tls.h have a lot of AES-specific constants. Remove these constants and change argument to struct tls_prot_info to have an access to cipher type in later patches Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 14:32:37 -08:00
Zhu Yanjun	bb1b25cab0	xdp: Remove the functions xsk_map_inc and xsk_map_put The functions xsk_map_put() and xsk_map_inc() are simple wrappers and as such, replace these functions with the functions bpf_map_inc() and bpf_map_put() and remove some error testing code. Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/1606402998-12562-1-git-send-email-yanjunz@nvidia.com	2020-11-27 23:00:51 +01:00
Jakub Kicinski	d0742c49ca	linux-can-fixes-for-5.10-20201127 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAl/Ay8sTHG1rbEBwZW5n dXRyb25peC5kZQAKCRCpyVqK+u3vqZvTB/9TfdD9pr5E4zIfStIlir7bvamDRK3y QVQd4+gaB6gLKNAQo0Kh+osbMZAm43ixzEtoLEHDOd+XaaNDyJdUpNUXyf+/ZZgs twMoCuY1vnO/1LPtVhLuggiyUWhc8YFPs7ZuFYmAWS+kYDMG4qxF3rMHcTjKw+bD TcD383ZldO8I2cTmdhT8ldx3tk0UUSCvG284Db43asUkTOhPs8SrkFLVlAZZfcNc d3gxzQYuMyR/s/ektyFP39ed9PO/OOQcg401XbT7aXvnCgaecTJHLIDfSvz6f69M HHybn8CWs68+Ne4k6kC4c5tGJzz3QGt+q1dO1J1IENG99DsDpoEnfNiS =xlpC -----END PGP SIGNATURE----- Merge tag 'linux-can-fixes-for-5.10-20201127' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2020-11-27 The first patch is by me and target the gs_usb driver and fixes the endianess problem with candleLight firmware. Another patch by me for the mcp251xfd driver add sanity checking to bail out if no IRQ is configured. The next three patches target the m_can driver. A patch by me removes the hardcoded IRQF_TRIGGER_FALLING from the request_threaded_irq() as this clashes with the trigger level specified in the DT. Further a patch by me fixes the nominal bitiming tseg2 min value for modern m_can cores. Pankaj Sharma's patch add support for cores version 3.3.x. The last patch by Oliver Hartkopp is for af_can and converts a WARN() into a pr_warn(), which is triggered by the syzkaller. It was able to create a situation where the closing of a socket runs simultaneously to the notifier call chain for removing the CAN network device in use. * tag 'linux-can-fixes-for-5.10-20201127' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: af_can: can_rx_unregister(): remove WARN() statement from list operation sanity check can: m_can: m_can_dev_setup(): add support for bosch mcan version 3.3.0 can: m_can: fix nominal bitiming tseg2 min for version >= 3.1 can: m_can: m_can_open(): remove IRQF_TRIGGER_FALLING from request_threaded_irq()'s flags can: mcp251xfd: mcp251xfd_probe(): bail out if no IRQ was given can: gs_usb: fix endianess problem with candleLight firmware ==================== Link: https://lore.kernel.org/r/20201127100301.512603-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 11:13:39 -08:00
Willem de Bruijn	985f733742	sock: set sk_err to ee_errno on dequeue from errq When setting sk_err, set it to ee_errno, not ee_origin. Commit `f5f99309fa` ("sock: do not set sk_err in sock_dequeue_err_skb") disabled updating sk_err on errq dequeue, which is correct for most error types (origins): - sk->sk_err = err; Commit `38b257938a` ("sock: reset sk_err when the error queue is empty") reenabled the behavior for IMCP origins, which do require it: + if (icmp_next) + sk->sk_err = SKB_EXT_ERR(skb_next)->ee.ee_origin; But read from ee_errno. Fixes: `38b257938a` ("sock: reset sk_err when the error queue is empty") Reported-by: Ayush Ranjan <ayushranjan@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Link: https://lore.kernel.org/r/20201126151220.2819322-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 11:07:06 -08:00
Paolo Abeni	d3ab78858f	mptcp: fix NULL ptr dereference on bad MPJ If an msk listener receives an MPJ carrying an invalid token, it will zero the request socket msk entry. That should later cause fallback and subflow reset - as per RFC - at subflow_syn_recv_sock() time due to failing hmac validation. Since commit `4cf8b7e48a` ("subflow: introduce and use mptcp_can_accept_new_subflow()"), we unconditionally dereference - in mptcp_can_accept_new_subflow - the subflow request msk before performing hmac validation. In the above scenario we hit a NULL ptr dereference. Address the issue doing the hmac validation earlier. Fixes: `4cf8b7e48a` ("subflow: introduce and use mptcp_can_accept_new_subflow()") Tested-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/03b2cfa3ac80d8fc18272edc6442a9ddf0b1e34e.1606400227.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 11:05:31 -08:00
Eelco Chaudron	69929d4c49	net: openvswitch: fix TTL decrement action netlink message format Currently, the openvswitch module is not accepting the correctly formated netlink message for the TTL decrement action. For both setting and getting the dec_ttl action, the actions should be nested in the OVS_DEC_TTL_ATTR_ACTION attribute as mentioned in the openvswitch.h uapi. When the original patch was sent, it was tested with a private OVS userspace implementation. This implementation was unfortunately not upstreamed and reviewed, hence an erroneous version of this patch was sent out. Leaving the patch as-is would cause problems as the kernel module could interpret additional attributes as actions and vice-versa, due to the actions not being encapsulated/nested within the actual attribute, but being concatinated after it. Fixes: `744676e777` ("openvswitch: add TTL decrement action") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/160622121495.27296.888010441924340582.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 11:03:06 -08:00
Pablo Neira Ayuso	a5d45bc0dc	netfilter: nftables_offload: build mask based from the matching bytes Userspace might match on prefix bytes of header fields if they are on the byte boundary, this requires that the mask is adjusted accordingly. Use NFT_OFFLOAD_MATCH_EXACT() for meta since prefix byte matching is not allowed for this type of selector. The bitwise expression might be optimized out by userspace, hence the kernel needs to infer the prefix from the number of payload bytes to match on. This patch adds nft_payload_offload_mask() to calculate the bitmask to match on the prefix. Fixes: `c9626a2cbd` ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-11-27 12:10:47 +01:00
Pablo Neira Ayuso	3c78e9e0d3	netfilter: nftables_offload: set address type in control dissector This patch adds nft_flow_rule_set_addr_type() to set the address type from the nft_payload expression accordingly. If the address type is not set in the control dissector then a rule that matches either on source or destination IP address does not work. After this patch, nft hardware offload generates the flow dissector configuration as tc-flower does to match on an IP address. This patch has been also tested functionally to make sure packets are filtered out by the NIC. This is also getting the code aligned with the existing netfilter flow offload infrastructure which is also setting the control dissector. Fixes: `c9626a2cbd` ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-11-27 12:10:46 +01:00
Wang Hai	4bc3c8dc9f	ipvs: fix possible memory leak in ip_vs_control_net_init kmemleak report a memory leak as follows: BUG: memory leak unreferenced object 0xffff8880759ea000 (size 256): backtrace: [<00000000c0bf2deb>] kmem_cache_zalloc include/linux/slab.h:656 [inline] [<00000000c0bf2deb>] __proc_create+0x23d/0x7d0 fs/proc/generic.c:421 [<000000009d718d02>] proc_create_reg+0x8e/0x140 fs/proc/generic.c:535 [<0000000097bbfc4f>] proc_create_net_data+0x8c/0x1b0 fs/proc/proc_net.c:126 [<00000000652480fc>] ip_vs_control_net_init+0x308/0x13a0 net/netfilter/ipvs/ip_vs_ctl.c:4169 [<000000004c927ebe>] __ip_vs_init+0x211/0x400 net/netfilter/ipvs/ip_vs_core.c:2429 [<00000000aa6b72d9>] ops_init+0xa8/0x3c0 net/core/net_namespace.c:151 [<00000000153fd114>] setup_net+0x2de/0x7e0 net/core/net_namespace.c:341 [<00000000be4e4f07>] copy_net_ns+0x27d/0x530 net/core/net_namespace.c:482 [<00000000f1c23ec9>] create_new_namespaces+0x382/0xa30 kernel/nsproxy.c:110 [<00000000098a5757>] copy_namespaces+0x2e6/0x3b0 kernel/nsproxy.c:179 [<0000000026ce39e9>] copy_process+0x220a/0x5f00 kernel/fork.c:2072 [<00000000b71f4efe>] _do_fork+0xc7/0xda0 kernel/fork.c:2428 [<000000002974ee96>] __do_sys_clone3+0x18a/0x280 kernel/fork.c:2703 [<0000000062ac0a4d>] do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46 [<0000000093f1ce2c>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 In the error path of ip_vs_control_net_init(), remove_proc_entry() needs to be called to remove the added proc entry, otherwise a memory leak will occur. Also, add some '#ifdef CONFIG_PROC_FS' because proc_create_net* return NULL when PROC is not used. Fixes: `b17fc9963f` ("IPVS: netns, ip_vs_stats and its procfs") Fixes: `61b1ab4583` ("IPVS: netns, add basic init per netns.") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-11-27 12:10:46 +01:00
Ingo Molnar	a787bdaff8	Merge branch 'linus' into sched/core, to resolve semantic conflict Signed-off-by: Ingo Molnar <mingo@kernel.org>	2020-11-27 11:10:50 +01:00
Antony Antony	c7a5899eb2	xfrm: redact SA secret with lockdown confidentiality redact XFRM SA secret in the netlink response to xfrm_get_sa() or dumpall sa. Enable lockdown, confidentiality mode, at boot or at run time. e.g. when enabled: cat /sys/kernel/security/lockdown none integrity [confidentiality] ip xfrm state src 172.16.1.200 dst 172.16.1.100 proto esp spi 0x00000002 reqid 2 mode tunnel replay-window 0 aead rfc4106(gcm(aes)) 0x0000000000000000000000000000000000000000 96 note: the aead secret is redacted. Redacting secret is also a FIPS 140-2 requirement. v1->v2 - add size checks before memset calls v2->v3 - replace spaces with tabs for consistency v3->v4 - use kernel lockdown instead of a /proc setting v4->v5 - remove kconfig option Reviewed-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-11-27 11:03:06 +01:00
Oliver Hartkopp	d73ff9b7c4	can: af_can: can_rx_unregister(): remove WARN() statement from list operation sanity check To detect potential bugs in CAN protocol implementations (double removal of receiver entries) a WARN() statement has been used if no matching list item was found for removal. The fault injection issued by syzkaller was able to create a situation where the closing of a socket runs simultaneously to the notifier call chain for removing the CAN network device in use. This case is very unlikely in real life but it doesn't break anything. Therefore we just replace the WARN() statement with pr_warn() to preserve the notification for the CAN protocol development. Reported-by: syzbot+381d06e0c8eaacb8706f@syzkaller.appspotmail.com Reported-by: syzbot+d0ddd88c9a7432f041e6@syzkaller.appspotmail.com Reported-by: syzbot+76d62d3b8162883c7d11@syzkaller.appspotmail.com Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/r/20201126192140.14350-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2020-11-27 10:49:28 +01:00
Sven Eckelmann	992b03b88e	batman-adv: Don't always reallocate the fragmentation skb head When a packet is fragmented by batman-adv, the original batman-adv header is not modified. Only a new fragmentation is inserted between the original one and the ethernet header. The code must therefore make sure that it has a writable region of this size in the skbuff head. But it is not useful to always reallocate the skbuff by this size even when there would be more than enough headroom still in the skb. The reallocation is just to costly during in this codepath. Fixes: `ee75ed8887` ("batman-adv: Fragment and send skbs larger than mtu") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-11-27 08:02:55 +01:00
Sven Eckelmann	c5cbfc8755	batman-adv: Reserve needed_*room for fragments The batadv net_device is trying to propagate the needed_headroom and needed_tailroom from the lower devices. This is needed to avoid cost intensive reallocations using pskb_expand_head during the transmission. But the fragmentation code split the skb's without adding extra room at the end/beginning of the various fragments. This reduced the performance of transmissions over complex scenarios (batadv on vxlan on wireguard) because the lower devices had to perform the reallocations at least once. Fixes: `ee75ed8887` ("batman-adv: Fragment and send skbs larger than mtu") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-11-27 08:02:55 +01:00
Sven Eckelmann	4ca23e2c20	batman-adv: Consider fragmentation for needed_headroom If a batman-adv packets has to be fragmented, then the original batman-adv packet header is not stripped away. Instead, only a new header is added in front of the packet after it was split. This size must be considered to avoid cost intensive reallocations during the transmission through the various device layers. Fixes: `7bca68c784` ("batman-adv: Add lower layer needed_(head\|tail)room to own ones") Reported-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2020-11-27 08:02:55 +01:00
Maxim Mikityanskiy	025cc2fb6a	net/tls: Protect from calling tls_dev_del for TLS RX twice tls_device_offload_cleanup_rx doesn't clear tls_ctx->netdev after calling tls_dev_del if TLX TX offload is also enabled. Clearing tls_ctx->netdev gets postponed until tls_device_gc_task. It leaves a time frame when tls_device_down may get called and call tls_dev_del for RX one extra time, confusing the driver, which may lead to a crash. This patch corrects this racy behavior by adding a flag to prevent tls_device_down from calling tls_dev_del the second time. Fixes: `e8f6979981` ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20201125221810.69870-1-saeedm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-25 17:31:06 -08:00
Parav Pandit	a7b4364950	devlink: Make sure devlink instance and port are in same net namespace When devlink reload operation is not used, netdev of an Ethernet port may be present in different net namespace than the net namespace of the devlink instance. Ensure that both the devlink instance and devlink port netdev are located in same net namespace. Fixes: `070c63f20f` ("net: devlink: allow to change namespaces during reload") Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-25 17:26:34 -08:00
Parav Pandit	b187c9b417	devlink: Hold rtnl lock while reading netdev attributes A netdevice of a devlink port can be moved to different net namespace than its parent devlink instance. This scenario occurs when devlink reload is not used. When netdevice is undergoing migration to net namespace, its ifindex and name may change. In such use case, devlink port query may read stale netdev attributes. Fix it by reading them under rtnl lock. Fixes: `bfcd3a4661` ("Introduce devlink infrastructure") Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-25 17:26:34 -08:00
Florian Westphal	c0700dfa2c	netfilter: nf_tables: avoid false-postive lockdep splat There are reports wrt lockdep splat in nftables, e.g.: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 31416 at net/netfilter/nf_tables_api.c:622 lockdep_nfnl_nft_mutex_not_held+0x28/0x38 [nf_tables] ... These are caused by an earlier, unrelated bug such as a n ABBA deadlock in a different subsystem. In such an event, lockdep is disabled and lockdep_is_held returns true unconditionally. This then causes the WARN() in nf_tables. Make the WARN conditional on lockdep still active to avoid this. Fixes: `f102d66b33` ("netfilter: nf_tables: use dedicated mutex to guard transactions") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Link: https://lore.kernel.org/linux-kselftest/CA+G9fYvFUpODs+NkSYcnwKnXm62tmP=ksLeBPmB+KFrB2rvCtQ@mail.gmail.com/ Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-11-26 00:09:42 +01:00
Eric Dumazet	68ad89de91	netfilter: ipset: prevent uninit-value in hash_ip6_add syzbot found that we are not validating user input properly before copying 16 bytes [1]. Using NLA_BINARY in ipaddr_policy[] for IPv6 address is not correct, since it ensures at most 16 bytes were provided. We should instead make sure user provided exactly 16 bytes. In old kernels (before v4.20), fix would be to remove the NLA_BINARY, since NLA_POLICY_EXACT_LEN() was not yet available. [1] BUG: KMSAN: uninit-value in hash_ip6_add+0x1cba/0x3a50 net/netfilter/ipset/ip_set_hash_gen.h:892 CPU: 1 PID: 11611 Comm: syz-executor.0 Not tainted 5.10.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x21c/0x280 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197 hash_ip6_add+0x1cba/0x3a50 net/netfilter/ipset/ip_set_hash_gen.h:892 hash_ip6_uadt+0x976/0xbd0 net/netfilter/ipset/ip_set_hash_ip.c:267 call_ad+0x329/0xd00 net/netfilter/ipset/ip_set_core.c:1720 ip_set_ad+0x111f/0x1440 net/netfilter/ipset/ip_set_core.c:1808 ip_set_uadd+0xf6/0x110 net/netfilter/ipset/ip_set_core.c:1833 nfnetlink_rcv_msg+0xc7d/0xdf0 net/netfilter/nfnetlink.c:252 netlink_rcv_skb+0x70a/0x820 net/netlink/af_netlink.c:2494 nfnetlink_rcv+0x4f0/0x4380 net/netfilter/nfnetlink.c:600 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x11da/0x14b0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x173c/0x1840 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:671 [inline] ____sys_sendmsg+0xc7a/0x1240 net/socket.c:2353 ___sys_sendmsg net/socket.c:2407 [inline] __sys_sendmsg+0x6d5/0x830 net/socket.c:2440 __do_sys_sendmsg net/socket.c:2449 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45deb9 Code: 0d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fe2e503fc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000029ec0 RCX: 000000000045deb9 RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003 RBP: 000000000118bf60 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118bf2c R13: 000000000169fb7f R14: 00007fe2e50409c0 R15: 000000000118bf2c Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline] kmsan_internal_chain_origin+0xad/0x130 mm/kmsan/kmsan.c:289 __msan_chain_origin+0x57/0xa0 mm/kmsan/kmsan_instr.c:147 ip6_netmask include/linux/netfilter/ipset/pfxlen.h:49 [inline] hash_ip6_netmask net/netfilter/ipset/ip_set_hash_ip.c:185 [inline] hash_ip6_uadt+0xb1c/0xbd0 net/netfilter/ipset/ip_set_hash_ip.c:263 call_ad+0x329/0xd00 net/netfilter/ipset/ip_set_core.c:1720 ip_set_ad+0x111f/0x1440 net/netfilter/ipset/ip_set_core.c:1808 ip_set_uadd+0xf6/0x110 net/netfilter/ipset/ip_set_core.c:1833 nfnetlink_rcv_msg+0xc7d/0xdf0 net/netfilter/nfnetlink.c:252 netlink_rcv_skb+0x70a/0x820 net/netlink/af_netlink.c:2494 nfnetlink_rcv+0x4f0/0x4380 net/netfilter/nfnetlink.c:600 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x11da/0x14b0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x173c/0x1840 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:671 [inline] ____sys_sendmsg+0xc7a/0x1240 net/socket.c:2353 ___sys_sendmsg net/socket.c:2407 [inline] __sys_sendmsg+0x6d5/0x830 net/socket.c:2440 __do_sys_sendmsg net/socket.c:2449 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline] kmsan_internal_chain_origin+0xad/0x130 mm/kmsan/kmsan.c:289 kmsan_memcpy_memmove_metadata+0x25e/0x2d0 mm/kmsan/kmsan.c:226 kmsan_memcpy_metadata+0xb/0x10 mm/kmsan/kmsan.c:246 __msan_memcpy+0x46/0x60 mm/kmsan/kmsan_instr.c:110 ip_set_get_ipaddr6+0x2cb/0x370 net/netfilter/ipset/ip_set_core.c:310 hash_ip6_uadt+0x439/0xbd0 net/netfilter/ipset/ip_set_hash_ip.c:255 call_ad+0x329/0xd00 net/netfilter/ipset/ip_set_core.c:1720 ip_set_ad+0x111f/0x1440 net/netfilter/ipset/ip_set_core.c:1808 ip_set_uadd+0xf6/0x110 net/netfilter/ipset/ip_set_core.c:1833 nfnetlink_rcv_msg+0xc7d/0xdf0 net/netfilter/nfnetlink.c:252 netlink_rcv_skb+0x70a/0x820 net/netlink/af_netlink.c:2494 nfnetlink_rcv+0x4f0/0x4380 net/netfilter/nfnetlink.c:600 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x11da/0x14b0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x173c/0x1840 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:671 [inline] ____sys_sendmsg+0xc7a/0x1240 net/socket.c:2353 ___sys_sendmsg net/socket.c:2407 [inline] __sys_sendmsg+0x6d5/0x830 net/socket.c:2440 __do_sys_sendmsg net/socket.c:2449 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline] kmsan_internal_poison_shadow+0x5c/0xf0 mm/kmsan/kmsan.c:104 kmsan_slab_alloc+0x8d/0xe0 mm/kmsan/kmsan_hooks.c:76 slab_alloc_node mm/slub.c:2906 [inline] __kmalloc_node_track_caller+0xc61/0x15f0 mm/slub.c:4512 __kmalloc_reserve net/core/skbuff.c:142 [inline] __alloc_skb+0x309/0xae0 net/core/skbuff.c:210 alloc_skb include/linux/skbuff.h:1094 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1176 [inline] netlink_sendmsg+0xdb8/0x1840 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:671 [inline] ____sys_sendmsg+0xc7a/0x1240 net/socket.c:2353 ___sys_sendmsg net/socket.c:2407 [inline] __sys_sendmsg+0x6d5/0x830 net/socket.c:2440 __do_sys_sendmsg net/socket.c:2449 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: `a7b4f989a6` ("netfilter: ipset: IP set core support") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-11-26 00:09:42 +01:00
Yunsheng Lin	6454eca81e	net: Use lockdep_assert_in_softirq() in napi_consume_skb() Use napi_consume_skb() to assert the case when it is not called in a atomic softirq context. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-25 15:08:38 -08:00
Paolo Abeni	fd8976790a	mptcp: be careful on MPTCP-level ack. We can enter the main mptcp_recvmsg() loop even when no subflows are connected. As note by Eric, that would result in a divide by zero oops on ack generation. Address the issue by checking the subflow status before sending the ack. Additionally protect mptcp_recvmsg() against invocation with weird socket states. v1 -> v2: - removed unneeded inline keyword - Jakub Reported-and-suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: `ea4ca586b1` ("mptcp: refine MPTCP-level ack scheduling") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/5370c0ae03449239e3d1674ddcfb090cf6f20abe.1606253206.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-25 13:36:16 -08:00
Horatiu Vultur	bfd042321a	bridge: mrp: Implement LC mode for MRP Extend MRP to support LC mode(link check) for the interconnect port. This applies only to the interconnect ring. Opposite to RC mode(ring check) the LC mode is using CFM frames to detect when the link goes up or down and based on that the userspace will need to react. One advantage of the LC mode over RC mode is that there will be fewer frames in the normal rings. Because RC mode generates InTest on all ports while LC mode sends CFM frame only on the interconnect port. All 4 nodes part of the interconnect ring needs to have the same mode. And it is not possible to have running LC and RC mode at the same time on a node. Whenever the MIM starts it needs to detect the status of the other 3 nodes in the interconnect ring so it would send a frame called InLinkStatus, on which the clients needs to reply with their link status. This patch adds InLinkStatus frame type and extends existing rules on how to forward this frame. Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20201124082525.273820-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-25 13:33:35 -08:00
Vlad Buslov	f460019b4c	net: sched: alias action flags with TCA_ACT_ prefix Currently both filter and action flags use same "TCA_" prefix which makes them hard to distinguish to code and confusing for users. Create aliases for existing action flags constants with "TCA_ACT_" prefix. Signed-off-by: Vlad Buslov <vlad@buslov.dev> Link: https://lore.kernel.org/r/20201124164054.893168-1-vlad@buslov.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-25 12:34:44 -08:00
Florian Westphal	b6d69fc8e8	mptcp: put reference in mptcp timeout timer On close this timer might be scheduled. mptcp uses sk_reset_timer for this, so the a reference on the mptcp socket is taken. This causes a refcount leak which can for example be reproduced with 'mp_join_server_v4.pkt' from the mptcp-packetdrill repo. The leak has nothing to do with join requests, v1_mp_capable_bind_no_cs.pkt works too when replacing the last ack mpcapable to v1 instead of v0. unreferenced object 0xffff888109bba040 (size 2744): comm "packetdrill", [..] backtrace: [..] sk_prot_alloc.isra.0+0x2b/0xc0 [..] sk_clone_lock+0x2f/0x740 [..] mptcp_sk_clone+0x33/0x1a0 [..] subflow_syn_recv_sock+0x2b1/0x690 [..] Fixes: `e16163b6e2` ("mptcp: refactor shutdown and close") Cc: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20201124162446.11448-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-25 12:32:45 -08:00
Eric Dumazet	2543a6000e	gro_cells: reduce number of synchronize_net() calls After cited commit, gro_cells_destroy() became damn slow on hosts with a lot of cores. This is because we have one additional synchronize_net() per cpu as stated in the changelog. gro_cells_init() is setting NAPI_STATE_NO_BUSY_POLL, and this was enough to not have one synchronize_net() call per netif_napi_del() We can factorize all the synchronize_net() to a single one, right before freeing per-cpu memory. Fixes: `5198d545db` ("net: remove napi_hash_del() from driver-facing API") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201124203822.1360107-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-25 11:28:12 -08:00
Wang Hai	e255e11e66	ipv6: addrlabel: fix possible memory leak in ip6addrlbl_net_init kmemleak report a memory leak as follows: unreferenced object 0xffff8880059c6a00 (size 64): comm "ip", pid 23696, jiffies 4296590183 (age 1755.384s) hex dump (first 32 bytes): 20 01 00 10 00 00 00 00 00 00 00 00 00 00 00 00 ............... 1c 00 00 00 00 00 00 00 00 00 00 00 07 00 00 00 ................ backtrace: [<00000000aa4e7a87>] ip6addrlbl_add+0x90/0xbb0 [<0000000070b8d7f1>] ip6addrlbl_net_init+0x109/0x170 [<000000006a9ca9d4>] ops_init+0xa8/0x3c0 [<000000002da57bf2>] setup_net+0x2de/0x7e0 [<000000004e52d573>] copy_net_ns+0x27d/0x530 [<00000000b07ae2b4>] create_new_namespaces+0x382/0xa30 [<000000003b76d36f>] unshare_nsproxy_namespaces+0xa1/0x1d0 [<0000000030653721>] ksys_unshare+0x3a4/0x780 [<0000000007e82e40>] __x64_sys_unshare+0x2d/0x40 [<0000000031a10c08>] do_syscall_64+0x33/0x40 [<0000000099df30e7>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 We should free all rules when we catch an error in ip6addrlbl_net_init(). otherwise a memory leak will occur. Fixes: `2a8cc6c890` ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Link: https://lore.kernel.org/r/20201124071728.8385-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-25 11:20:16 -08:00
Jakub Kicinski	26c8996526	Here is a batman-adv bugfix: - set module owner to THIS_MODULE, by Taehee Yoo -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAl+9DbQWHHN3QHNpbW9u d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoWlxD/0TOgtU9fVykvjMG3U0rnB6pvp7 rFH1Srqg5hsJ2PNLAF8fr0o4PDTyiAztirdJ4C4+ietug0ozxe4GZARB+/rPEk/3 u558cDMZaqLOo195g70s+8cr0XXPy8Bt4ip/X4uW8DNTluKMiQcKNDrVkc6Cysa/ 5cnT4wf6RftG9eMbtG1wdEzqG/goQpmrP+bo+J1QdBQEycqMwegTWYZyU9behS+m u47FIa20XQ8FmoZx3oooGNb7z9iDHzwVc7PMHw19Xv14bahtz1BwydERdJMde3hP UD4HeAQla6Gl0ZO2pS5MuCQxEu8aphel75a3g+D4XCdyyIl1B3FzK1F+DWdoHlnx g0g7xv0nnfpF2wVhGkgIKfssQXino9YPyIjl/zLJ7YXXrTZCzkOjGiumSvrvXgox mTtvDvbSIOylQksPF5UJtoOo6mWihg7HERyrOwfIV6UxKuqbKhv3e17gUlkZPIcp ec4iFjGheE0H00zSzBkeb+76M0XQmDTbvmjUfOxaxP4o+u5W27QE+HznOTrQ9uBa FrMt++XjxSfmVsiFrpqE4LQyYfcFjO3Ei5yYfR3tYfDx4qegDvqksLbzb3Fkp2G9 Pz9HEc+PYJk/w3N2Q+zSpbDJfUvHjyIPDeqsTdtbbUPUyfkGmghT7F9V+vof2jqm 6pijD3ji8JN1wmh9yw== =Qr0B -----END PGP SIGNATURE----- Merge tag 'batadv-net-pullrequest-20201124' of git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - set module owner to THIS_MODULE, by Taehee Yoo * tag 'batadv-net-pullrequest-20201124' of git://git.open-mesh.org/linux-merge: batman-adv: set .owner to THIS_MODULE ==================== Link: https://lore.kernel.org/r/20201124134417.17269-1-sw@simonwunderlich.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 16:49:14 -08:00
Alexander Duyck	407c85c7dd	tcp: Set ECT0 bit in tos/tclass for synack when BPF needs ECN When a BPF program is used to select between a type of TCP congestion control algorithm that uses either ECN or not there is a case where the synack for the frame was coming up without the ECT0 bit set. A bit of research found that this was due to the final socket being configured to dctcp while the listener socket was staying in cubic. To reproduce it all that is needed is to monitor TCP traffic while running the sample bpf program "samples/bpf/tcp_cong_kern.c". What is observed, assuming tcp_dctcp module is loaded or compiled in and the traffic matches the rules in the sample file, is that for all frames with the exception of the synack the ECT0 bit is set. To address that it is necessary to make one additional call to tcp_bpf_ca_needs_ecn using the request socket and then use the output of that to set the ECT0 bit for the tos/tclass of the packet. Fixes: `91b5b21c7c` ("bpf: Add support for changing congestion control") Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/160593039663.2604.1374502006916871573.stgit@localhost.localdomain Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 14:12:55 -08:00
Heiner Kallweit	1d155dfdf5	net: warn if gso_type isn't set for a GSO SKB In bug report [0] a warning in r8169 driver was reported that was caused by an invalid GSO SKB (gso_type was 0). See [1] for a discussion about this issue. Still the origin of the invalid GSO SKB isn't clear. It shouldn't be a network drivers task to check for invalid GSO SKB's. Also, even if issue [0] can be fixed, we can't be sure that a similar issue doesn't pop up again at another place. Therefore let gso_features_check() check for such invalid GSO SKB's. [0] https://bugzilla.kernel.org/show_bug.cgi?id=209423 [1] https://www.spinics.net/lists/netdev/msg690794.html Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/97c78d21-7f0b-d843-df17-3589f224d2cf@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 14:02:10 -08:00
Björn Töpel	36ccdf8582	net, xsk: Avoid taking multiple skbuff references Commit `642e450b6b` ("xsk: Do not discard packet when NETDEV_TX_BUSY") addressed the problem that packets were discarded from the Tx AF_XDP ring, when the driver returned NETDEV_TX_BUSY. Part of the fix was bumping the skbuff reference count, so that the buffer would not be freed by dev_direct_xmit(). A reference count larger than one means that the skbuff is "shared", which is not the case. If the "shared" skbuff is sent to the generic XDP receive path, netif_receive_generic_xdp(), and pskb_expand_head() is entered the BUG_ON(skb_shared(skb)) will trigger. This patch adds a variant to dev_direct_xmit(), __dev_direct_xmit(), where a user can select the skbuff free policy. This allows AF_XDP to avoid bumping the reference count, but still keep the NETDEV_TX_BUSY behavior. Fixes: `642e450b6b` ("xsk: Do not discard packet when NETDEV_TX_BUSY") Reported-by: Yonghong Song <yhs@fb.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20201123175600.146255-1-bjorn.topel@gmail.com	2020-11-24 22:39:56 +01:00
Moshe Shemesh	5204bb683c	devlink: Fix reload stats structure Fix reload stats structure exposed to the user. Change stats structure hierarchy to have the reload action as a parent of the stat entry and then stat entry includes value per limit. This will also help to avoid string concatenation on iproute2 output. Reload stats structure before this fix: "stats": { "reload": { "driver_reinit": 2, "fw_activate": 1, "fw_activate_no_reset": 0 } } After this fix: "stats": { "reload": { "driver_reinit": { "unspecified": 2 }, "fw_activate": { "unspecified": 1, "no_reset": 0 } } Fixes: `a254c26426` ("devlink: Add reload stats") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/1606109785-25197-1-git-send-email-moshe@mellanox.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 13:04:04 -08:00
Ido Schimmel	f0a5013e29	devlink: Add blackhole_nexthop trap Add a packet trap to report packets that were dropped due to a blackhole nexthop. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 12:14:56 -08:00
Jakub Kicinski	23c01ed3b0	rxrpc development -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl+8E4YACgkQ+7dXa6fL C2tehQ/8C1O+p1UBmQ4k/eXkYqsIvRQJGXGheSClxvym8tQiut/sZEKsCqucFmO8 Y2gKEYJB3xYIqbZjjJUFYbL70Q5qG/cXgKn3rCpMjhn/0UAoUSwaGyoURcLwaz2B asipDSvbvZk7OwX4ScUTi/b/7AHtO6l16Y6yRzUqneOsSkXYk1+HhK2/4rPDKEJf cxocGIjjWs4d1erG2Aqgn3BZ2GsDNb88NJYi5qgc0ywFgRacX7oL0WFeu6pD1Ja+ 8vZLjFYB/bDeLBlvzpnWal99lUiSee8BbUoQn1iGJ3EekUVWkkNNfjhpYRjzIIR5 GsJfbDHP16PHhLZgxBvNBwkWoqqBw3X/Zn0dkBDZgjlZQvLbeds6QRd6Ag/M2CJI D1QtmBDBtr4UZfRFm+5hfpTqf2jeOc4o68WROH3FEdsnKvICkttKucE9LaykokFD gQLNmkn2tb1D1yS2Y4iPucV3kYUU1Y6yw++Ck0cE2isBauN0gHtr9UAXZEc57ILc hXCFrJTbGkph3TimQDd1QbXZ28eExRBnn+H5RyYHeEidv03d2UO5uFZaJ5b5UGX0 sE+3aeWLTH3fFTBvMnYPyFjzalSky8V5zOlUltwVCx7PpiDe6P7rKpO+ZXEi4OXP mHz7HM6h1iEnnrqEGNz7L39pTqKVbfBxDMvFra07dyKkvVthF1w= =TvbK -----END PGP SIGNATURE----- Merge tag 'rxrpc-next-20201123' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Prelude to gssapi support Here are some patches that do some reorganisation of the security class handling in rxrpc to allow implementation of the RxGK security class that will allow AF_RXRPC to use GSSAPI-negotiated tokens and better crypto. The RxGK security class is not included in this patchset. It does the following things: (1) Add a keyrings patch to provide the original key description, as provided to add_key(), to the payload preparser so that it can interpret the content on that basis. Unfortunately, the rxrpc_s key type wasn't written to interpret its payload as anything other than a string of bytes comprising a key, but for RxGK, more information is required as multiple Kerberos enctypes are supported. (2) Remove the rxk5 security class key parsing. The rxk5 class never got rolled out in OpenAFS and got replaced with rxgk. (3) Support the creation of rxrpc keys with multiple tokens of different types. If some types are not supported, the ENOPKG error is suppressed if at least one other token's type is supported. (4) Punt the handling of server keys (rxrpc_s type) to the appropriate security class. (5) Organise the security bits in the rxrpc_connection struct into a union to make it easier to override for other classes. (6) Move some bits from core code into rxkad that won't be appropriate to rxgk. * tag 'rxrpc-next-20201123' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: rxrpc: Ask the security class how much space to allow in a packet rxrpc: rxkad: Don't use pskb_pull() to advance through the response packet rxrpc: Organise connection security to use a union rxrpc: Don't reserve security header in Tx DATA skbuff rxrpc: Merge prime_packet_security into init_connection_security rxrpc: Fix example key name in a comment rxrpc: Ignore unknown tokens in key payload unless no known tokens rxrpc: Make the parsing of xdr payloads more coherent rxrpc: Allow security classes to give more info on server keys rxrpc: Don't leak the service-side session key to userspace rxrpc: Hand server key parsing off to the security class rxrpc: Split the server key type (rxrpc_s) into its own file rxrpc: Don't retain the server key in the connection rxrpc: Support keys with multiple authentication tokens rxrpc: List the held token types in the key description in /proc/keys rxrpc: Remove the rxk5 security class as it's now defunct keys: Provide the original description to the key preparser ==================== Link: https://lore.kernel.org/r/160616220405.830164.2239716599743995145.stgit@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 12:05:58 -08:00
Peter Zijlstra	545b8c8df4	smp: Cleanup smp_call_function*() Get rid of the __call_single_node union and cleanup the API a little to avoid external code relying on the structure layout as much. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>	2020-11-24 16:47:49 +01:00
Christophe JAILLET	5112cf59d7	sctp: Fix some typo s/tranport/transport/ Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20201122180704.1366636-1-christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-23 17:44:11 -08:00
Eyal Birger	d549699048	net/packet: fix packet receive on L3 devices without visible hard header In the patchset merged by commit `b9fcf0a0d8` ("Merge branch 'support-AF_PACKET-for-layer-3-devices'") L3 devices which did not have header_ops were given one for the purpose of protocol parsing on af_packet transmit path. That change made af_packet receive path regard these devices as having a visible L3 header and therefore aligned incoming skb->data to point to the skb's mac_header. Some devices, such as ipip, xfrmi, and others, do not reset their mac_header prior to ingress and therefore their incoming packets became malformed. Ideally these devices would reset their mac headers, or af_packet would be able to rely on dev->hard_header_len being 0 for such cases, but it seems this is not the case. Fix by changing af_packet RX ll visibility criteria to include the existence of a '.create()' header operation, which is used when creating a device hard header - via dev_hard_header() - by upper layers, and does not exist in these L3 devices. As this predicate may be useful in other situations, add it as a common dev_has_header() helper in netdevice.h. Fixes: `b9fcf0a0d8` ("Merge branch 'support-AF_PACKET-for-layer-3-devices'") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20201121062817.3178900-1-eyal.birger@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-23 17:29:36 -08:00
Jakub Kicinski	cc69837fca	net: don't include ethtool.h from netdevice.h linux/netdevice.h is included in very many places, touching any of its dependecies causes large incremental builds. Drop the linux/ethtool.h include, linux/netdevice.h just needs a forward declaration of struct ethtool_ops. Fix all the places which made use of this implicit include. Acked-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Link: https://lore.kernel.org/r/20201120225052.1427503-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-23 17:27:04 -08:00
Kurt Kanzenbach	8551fad63c	net: dsa: tag_hellcreek: Cleanup includes Remove unused and add needed includes. No functional change. Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-23 16:57:21 -08:00
Stefano Garzarella	3fe356d58e	vsock/virtio: discard packets only when socket is really closed Starting from commit `8692cefc43` ("virtio_vsock: Fix race condition in virtio_transport_recv_pkt"), we discard packets in virtio_transport_recv_pkt() if the socket has been released. When the socket is connected, we schedule a delayed work to wait the RST packet from the other peer, also if SHUTDOWN_MASK is set in sk->sk_shutdown. This is done to complete the virtio-vsock shutdown algorithm, releasing the port assigned to the socket definitively only when the other peer has consumed all the packets. If we discard the RST packet received, the socket will be closed only when the VSOCK_CLOSE_TIMEOUT is reached. Sergio discovered the issue while running ab(1) HTTP benchmark using libkrun [1] and observing a latency increase with that commit. To avoid this issue, we discard packet only if the socket is really closed (SOCK_DONE flag is set). We also set SOCK_DONE in virtio_transport_release() when we don't need to wait any packets from the other peer (we didn't schedule the delayed work). In this case we remove the socket from the vsock lists, releasing the port assigned. [1] https://github.com/containers/libkrun Fixes: `8692cefc43` ("virtio_vsock: Fix race condition in virtio_transport_recv_pkt") Cc: justin.he@arm.com Reported-by: Sergio Lopez <slp@redhat.com> Tested-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Jia He <justin.he@arm.com> Link: https://lore.kernel.org/r/20201120104736.73749-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-23 16:36:29 -08:00
Ricardo Dias	01770a1661	tcp: fix race condition when creating child sockets from syncookies When the TCP stack is in SYN flood mode, the server child socket is created from the SYN cookie received in a TCP packet with the ACK flag set. The child socket is created when the server receives the first TCP packet with a valid SYN cookie from the client. Usually, this packet corresponds to the final step of the TCP 3-way handshake, the ACK packet. But is also possible to receive a valid SYN cookie from the first TCP data packet sent by the client, and thus create a child socket from that SYN cookie. Since a client socket is ready to send data as soon as it receives the SYN+ACK packet from the server, the client can send the ACK packet (sent by the TCP stack code), and the first data packet (sent by the userspace program) almost at the same time, and thus the server will equally receive the two TCP packets with valid SYN cookies almost at the same instant. When such event happens, the TCP stack code has a race condition that occurs between the momement a lookup is done to the established connections hashtable to check for the existence of a connection for the same client, and the moment that the child socket is added to the established connections hashtable. As a consequence, this race condition can lead to a situation where we add two child sockets to the established connections hashtable and deliver two sockets to the userspace program to the same client. This patch fixes the race condition by checking if an existing child socket exists for the same client when we are adding the second child socket to the established connections socket. If an existing child socket exists, we drop the packet and discard the second child socket to the same client. Signed-off-by: Ricardo Dias <rdias@singlestore.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201120111133.GA67501@rdias-suse-pc.lan Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-23 16:32:33 -08:00
Paul Moore	3df98d7921	lsm,selinux: pass flowi_common instead of flowi to the LSM hooks As pointed out by Herbert in a recent related patch, the LSM hooks do not have the necessary address family information to use the flowi struct safely. As none of the LSMs currently use any of the protocol specific flowi information, replace the flowi pointers with pointers to the address family independent flowi_common struct. Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2020-11-23 18:36:21 -05:00
Jason Gunthorpe	ed92f6a52b	Linux 5.10-rc5 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl+69egeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGTSYH/ifRBlaxy5UiHFc0 2zdR7pkjWrYfDTTT3sazIAhdlzzcfnkUqgFxOP45F4ZIqeTzunH3sUY+5UlT9IX7 liUgnLxQ/1R9Gx8kPGQfu+tLCey78xVFydGsqJoW9sPRw2R+apMdGGa/lOrk+OXz DXIN+dDnGFqwCCNJpK+rxQQhFf++IPpSI8z6Y23moOFhsDZrEziHuVFy2FGyRM6z prZ/us/tcobE8ptCk1RmOxLoJ1DR6UxpA2vLimTE+JD8siOsSWPbjE0KudnWCnd5 BLqIjrsPJbSxyuzzK3v9dnO5wMv7tMDuMIuYM/MQTXDttNwtsqt/aP6gdnUCym7N 5eHEj5g= =MuO1 -----END PGP SIGNATURE----- Merge tag 'v5.10-rc5' into rdma.git for-next For dependencies in following patches Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-23 16:50:59 -04:00
David Howells	d7d775b1ff	rxrpc: Ask the security class how much space to allow in a packet Ask the security class how much header and trailer space to allow for when allocating a packet, given how much data is remaining. This will allow the rxgk security class to stick both a trailer in as well as a header as appropriate in the future. Signed-off-by: David Howells <dhowells@redhat.com>	2020-11-23 19:53:11 +00:00
David Howells	ceff522db2	rxrpc: rxkad: Don't use pskb_pull() to advance through the response packet In the rxkad security class, don't use pskb_pull() to advance through the contents of the response packet. There's no point, especially as the next and last access to the skbuff still has to allow for the wire header in the offset (which we didn't advance over). Better to just add the displacement to the next offset. Signed-off-by: David Howells <dhowells@redhat.com>	2020-11-23 18:09:30 +00:00
David Howells	521bb3049c	rxrpc: Organise connection security to use a union Organise the security information in the rxrpc_connection struct to use a union to allow for different data for different security classes. Signed-off-by: David Howells <dhowells@redhat.com>	2020-11-23 18:09:30 +00:00
David Howells	f4bdf3d683	rxrpc: Don't reserve security header in Tx DATA skbuff Insert the security header into the skbuff representing a DATA packet to be transmitted rather than using skb_reserve() when the packet is allocated. This makes it easier to apply crypto that spans the security header and the data, particularly in the upcoming RxGK class where we have a common encrypt-and-checksum function that is used in a number of circumstances. Signed-off-by: David Howells <dhowells@redhat.com>	2020-11-23 18:09:30 +00:00

... 3 4 5 6 7 ...

63534 Commits (zero-sugar-mainline-defconfig)