redonkable/alistair23-linux

Author	SHA1	Message	Date
Yunsheng Lin	56cf68c730	net: hns3: Cleanup indentation for Kconfig in the the hisilicon folder This patch fixes a few indentation for Kconfig file in the hisilicon folder. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-09 09:46:54 -07:00
Yunsheng Lin	9780cb97af	net: hns3: Add hns3_get_handle macro in hns3 driver There are many places that will need to get the handle of netdev, so add a macro to get the handle of netdev. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-09 09:46:54 -07:00
Yunsheng Lin	5bca3b94df	net: hns3: Cleanup for shifting true in hns3 driver This patch fixes a shifting true in hclge_main module. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-09 09:46:53 -07:00
David S. Miller	93b03193c6	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2017-10-09 1) Fix some error paths of the IPsec offloading API. 2) Fix a NULL pointer dereference when IPsec is used with vti. From Alexey Kodanev. 3) Don't call xfrm_policy_cache_flush under xfrm_state_lock, it triggers several locking warnings. From Artem Savkov. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-09 09:43:34 -07:00
Emil Tantilov	761c2a48c7	ixgbe: split Tx/Rx ring clearing for ethtool loopback test Commit: fed21bcee7a5 ("ixgbe: Don't bother clearing buffer memory for descriptor rings) exposed some issues with the logic in the current implementation of ixgbe_clean_test_rings() that are being addressed in this patch: - Split the clearing of the Tx and Rx rings in separate loops. Previously both Tx and Rx rings were cleared in a rx_desc->wb.upper.length based loop which could lead to issues if for w/e reason packets were received outside of the frames transmitted for the loopback test. - Add check for IXGBE_TXD_STAT_DD to avoid clearing the rings if the transmits have not comlpeted by the time we enter ixgbe_clean_test_rings() - Exit early on ixgbe_check_lbtest_frame() failure. This change fixes a crash during ethtool diagnostic (ethtool -t). Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2017-10-09 09:41:25 -07:00
Steffen Klassert	6c0e7284d8	ipv4: Fix traffic triggered IPsec connections. A recent patch removed the dst_free() on the allocated dst_entry in ipv4_blackhole_route(). The dst_free() marked the dst_entry as dead and added it to the gc list. I.e. it was setup for a one time usage. As a result we may now have a blackhole route cached at a socket on some IPsec scenarios. This makes the connection unusable. Fix this by marking the dst_entry directly at allocation time as 'dead', so it is used only once. Fixes: `b838d5e1c5` ("ipv4: mark DST_NOGC and remove the operation of dst_free()") Reported-by: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-09 09:39:50 -07:00
Steffen Klassert	62cf27e52b	ipv6: Fix traffic triggered IPsec connections. A recent patch removed the dst_free() on the allocated dst_entry in ipv6_blackhole_route(). The dst_free() marked the dst_entry as dead and added it to the gc list. I.e. it was setup for a one time usage. As a result we may now have a blackhole route cached at a socket on some IPsec scenarios. This makes the connection unusable. Fix this by marking the dst_entry directly at allocation time as 'dead', so it is used only once. Fixes: `587fea7411` ("ipv6: mark DST_NOGC and remove the operation of dst_free()") Reported-by: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-09 09:39:26 -07:00
Emil Tantilov	c69be946d6	ixgbe: add error checks when initializing the PHY Ignoring errors when attempting to identify the PHY can lead to a crash. Specifically in the case of FW controlled PHYs where the PHY read/write operations are set to NULL. Removed redundant comment. Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2017-10-09 09:38:11 -07:00
Shannon Nelson	f5a71caa17	ixgbe: restore normal RSS after last macvlan offload is removed Just like when the last VF is removed, we need to restore normal operations after the last macvlan offload is removed, else we get stuck in single queue operations. To test: ethtool -l eth1 # note the number of queues in use, ~= cpus ethtool -K eth1 l2-fwd-offload on ip link add mv1 link eth1 type macvlan mode bridge ip link set dev mv1 up ip link del mv1 ethtool -l eth1 # are we back to the same # of queues, or stuck on 1? Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2017-10-09 09:36:31 -07:00
Bhumika Goyal	2e033eace7	ixgbe: declare ixgbe_mac_operations structures as const Declare ixgbe_mac_operations structures as const as they are only stored in the mac_ops field of ixgbe_info structure. This field is of type const and therefore ixgbe_mac_operations structure can be made const too. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2017-10-09 09:34:27 -07:00
Emil Tantilov	2e22a75c55	ixgbe: Clear SWFW_SYNC register during init Added clearing of SW resource bits in the SW/FW synchronization register to ixgbe_init_swfw_sync_X540(). Updated ixgbe_acquire_swfw_sync_X540 SW Manageability host interface resource bit error case to match the error handling of the other SW resource bits. Which is to release the SW resource bits if SW times out while attempting to acquire the resource. This allows the driver to load in cases where the semaphore bits could be stuck after a reset or a crash. Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2017-10-09 09:28:23 -07:00
John Fastabend	8e679021c5	ixgbe: incorrect XDP ring accounting in ethtool tx_frame param Changing the TX ring parameters with an XDP program attached may cause the XDP queues to be cleared and the TX rings to be incorrectly configured. Fix by doing correct ring accounting in setup call. Fixes: `33fdc82f08` ("ixgbe: add support for XDP_TX action") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2017-10-09 08:02:47 -07:00
Ding Tianhong	5e0fac63a6	net: ixgbe: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag The ixgbe driver use the compile check to determine if it can send TLPs to Root Port with the Relaxed Ordering Attribute set, this is too inconvenient, now the new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added to the kernel and we could check the bit4 in the PCIe Device Control register to determine whether we should use the Relaxed Ordering Attributes or not, so use this new way in the ixgbe driver. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Acked-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2017-10-09 07:43:06 -07:00
Ding Tianhong	f4986d250a	Revert commit `1a8b6d76dc` ("net:add one common config...") The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added to indicate that Relaxed Ordering Attributes (RO) should not be used for Transaction Layer Packets (TLP) targeted toward these affected Root Port, it will clear the bit4 in the PCIe Device Control register, so the PCIe device drivers could query PCIe configuration space to determine if it can send TLPs to Root Port with the Relaxed Ordering Attributes set. With this new flag we don't need the config ARCH_WANT_RELAX_ORDER to control the Relaxed Ordering Attributes for the ixgbe drivers just like the commit `1a8b6d76dc` ("net:add one common config...") did, so revert this commit. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2017-10-09 07:43:06 -07:00
Sabrina Dubroca	a39221ce96	ixgbe: fix masking of bits read from IXGBE_VXLANCTRL register In ixgbe_clear_udp_tunnel_port(), we read the IXGBE_VXLANCTRL register and then try to mask some bits out of the value, using the logical instead of bitwise and operator. Fixes: `a21d0822ff` ("ixgbe: add support for geneve Rx offload") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2017-10-09 07:43:06 -07:00
Mark D Rustad	e0f06bba96	ixgbe: Return error when getting PHY address if PHY access is not supported In cases where PHY register access is not supported, don't mislead a caller into thinking that it is supported by returning a PHY address. Instead, return -EOPNOTSUPP when PHY access is not supported. Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2017-10-09 07:43:06 -07:00
Shmulik Ladkani	98589a0998	netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' Commit `2c16d60332` ("netfilter: xt_bpf: support ebpf") introduced support for attaching an eBPF object by an fd, with the 'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each IPT_SO_SET_REPLACE call. However this breaks subsequent iptables calls: # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT # iptables -A INPUT -s 5.6.7.8 -j ACCEPT iptables: Invalid argument. Run `dmesg' for more information. That's because iptables works by loading existing rules using IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with the replacement set. However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number (from the initial "iptables -m bpf" invocation) - so when 2nd invocation occurs, userspace passes a bogus fd number, which leads to 'bpf_mt_check_v1' to fail. One suggested solution [1] was to hack iptables userspace, to perform a "entries fixup" immediatley after IPT_SO_GET_ENTRIES, by opening a new, process-local fd per every 'xt_bpf_info_v1' entry seen. However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested to depricate the xt_bpf_info_v1 ABI dealing with pinned ebpf objects. This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given '.fd' and instead perform an in-kernel lookup for the bpf object given the provided '.path'. It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is expected to provide the path of the pinned object. Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved. References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2 [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2 Reported-by: Rafael Buchbinder <rafi@rbk.ms> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2017-10-09 15:18:04 +02:00
Lin Zhang	49f817d793	netfilter: SYNPROXY: skip non-tcp packet in {ipv4, ipv6}_synproxy_hook In function {ipv4,ipv6}_synproxy_hook we expect a normal tcp packet, but the real server maybe reply an icmp error packet related to the exist tcp conntrack, so we will access wrong tcp data. Fix it by checking for the protocol field and only process tcp traffic. Signed-off-by: Lin Zhang <xiaolou4617@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2017-10-09 13:08:39 +02:00
Christos Gkekas	c49c777f9c	qed: Delete redundant check on dcb_app priority dcb_app priority is unsigned thus checking whether it is less than zero is redundant. Signed-off-by: Christos Gkekas <chris.gekas@gmail.com> Acked-By: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 21:21:02 -07:00
Christos Gkekas	c778c32118	net: ethernet: stmmac: Clean up dead code Many macros in dwmac-ipq806x are unused and should be removed. Moreover gmac->id is an unsigned variable and therefore checking whether it is less than zero is redundant. Signed-off-by: Christos Gkekas <chris.gekas@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 21:19:07 -07:00
David S. Miller	bf6a119eea	Merge branch 'ipv6_dev_get_saddr-rcu' Eric Dumazet says: ==================== ipv6: ipv6_dev_get_saddr() rcu works Sending IPv6 udp packets on non connected sockets is quite slow, because ipv6_dev_get_saddr() is still using an rwlock and silly references games on ifa. Tested: $ ./super_netperf 16 -H 4444::555:0786 -l 2000 -t UDP_STREAM -- -m 100 & [1] 12527 Performance is boosted from 2.02 Mpps to 4.28 Mpps Kernel profile before patches : 22.62% [kernel] [k] _raw_read_lock_bh 7.04% [kernel] [k] refcount_sub_and_test 6.56% [kernel] [k] ipv6_get_saddr_eval 5.67% [kernel] [k] _raw_read_unlock_bh 5.34% [kernel] [k] __ipv6_dev_get_saddr 4.95% [kernel] [k] refcount_inc_not_zero 4.03% [kernel] [k] __ip6addrlbl_match 3.70% [kernel] [k] _raw_spin_lock 3.44% [kernel] [k] ipv6_dev_get_saddr 3.24% [kernel] [k] ip6_pol_route 3.06% [kernel] [k] refcount_add_not_zero 2.30% [kernel] [k] __local_bh_enable_ip 1.81% [kernel] [k] mlx4_en_xmit 1.20% [kernel] [k] __ip6_append_data 1.12% [kernel] [k] __ip6_make_skb 1.11% [kernel] [k] __dev_queue_xmit 1.06% [kernel] [k] l3mdev_master_ifindex_rcu Kernel profile after patches : 11.36% [kernel] [k] ip6_pol_route 7.65% [kernel] [k] _raw_spin_lock 7.16% [kernel] [k] __ipv6_dev_get_saddr 6.49% [kernel] [k] ipv6_get_saddr_eval 6.04% [kernel] [k] refcount_add_not_zero 3.34% [kernel] [k] __ip6addrlbl_match 2.62% [kernel] [k] __dev_queue_xmit 2.37% [kernel] [k] mlx4_en_xmit 2.26% [kernel] [k] dst_release 1.89% [kernel] [k] __ip6_make_skb 1.87% [kernel] [k] __ip6_append_data 1.86% [kernel] [k] udpv6_sendmsg 1.86% [kernel] [k] ip6t_do_table 1.64% [kernel] [k] ipv6_dev_get_saddr 1.64% [kernel] [k] find_match 1.51% [kernel] [k] l3mdev_master_ifindex_rcu 1.24% [kernel] [k] ipv6_addr_label ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 21:16:31 -07:00
Eric Dumazet	cc429c8f6f	ipv6: avoid cache line dirtying in ipv6_dev_get_saddr() By extending the rcu section a bit, we can avoid these very expensive in6_ifa_put()/in6_ifa_hold() calls done in __ipv6_dev_get_saddr() and ipv6_dev_get_saddr() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 21:16:31 -07:00
Eric Dumazet	f59c031e91	ipv6: __ipv6_dev_get_saddr() rcu conversion Callers hold rcu_read_lock(), so we do not need the rcu_read_lock()/rcu_read_unlock() pair. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 21:16:30 -07:00
Eric Dumazet	24ba333b2c	ipv6: ipv6_chk_prefix() rcu conversion Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 21:16:30 -07:00
Eric Dumazet	47e26941f7	ipv6: ipv6_chk_custom_prefix() rcu conversion Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 21:16:30 -07:00
Eric Dumazet	d9bf82c2f6	ipv6: ipv6_count_addresses() rcu conversion Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 21:16:30 -07:00
Eric Dumazet	8ef802aa8e	ipv6: prepare RCU lookups for idev->addr_list inet6_ifa_finish_destroy() already uses kfree_rcu() to free inet6_ifaddr structs. We need to use proper list additions/deletions in order to allow readers to use RCU instead of idev->lock rwlock. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 21:16:30 -07:00
Jon Maloy	a9e2971b8c	tipc: Unclone message at secondary destination lookup When a bundling message is received, the function tipc_link_input() calls function tipc_msg_extract() to unbundle all inner messages of the bundling message before adding them to input queue. The function tipc_msg_extract() just clones all inner skb for all inner messagges from the bundling skb. This means that the skb headroom of an inner message overlaps with the data part of the preceding message in the bundle. If the message in question is a name addressed message, it may be subject to a secondary destination lookup, and eventually be sent out on one of the interfaces again. But, since what is perceived as headroom by the device driver in reality is the last bytes of the preceding message in the bundle, the latter will be overwritten by the MAC addresses of the L2 header. If the preceding message has not yet been consumed by the user, it will evenually be delivered with corrupted contents. This commit fixes this by uncloning all messages passing through the function tipc_msg_lookup_dest(), hence ensuring that the headroom is always valid when the message is passed on. Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 21:13:23 -07:00
Jon Maloy	3382605fd8	tipc: correct initialization of skb list We change the initialization of the skb transmit buffer queues in the functions tipc_bcast_xmit() and tipc_rcast_xmit() to also initialize their spinlocks. This is needed because we may, during error conditions, need to call skb_queue_purge() on those queues further down the stack. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 21:13:23 -07:00
David S. Miller	a42317785c	Merge branch 'bridge-neigh-msg-proxy-and-flood-suppression-support' Roopa Prabhu says: ==================== bridge: neigh msg proxy and flood suppression support This series implements arp and nd suppression in the bridge driver for ethernet vpns. It implements rfc7432, section 10 https://tools.ietf.org/html/rfc7432#section-10 for ethernet VPN deployments. It is similar to the existing BR_PROXYARP* flags but has a few semantic differences to conform to EVPN standard. Unlike the existing flags, this new flag suppresses flood of all neigh discovery packets (arp and nd) to tunnel ports. Supports both vlan filtering and non-vlan filtering bridges. In case of EVPN, it is mainly used to avoid flooding of arp and nd packets to tunnel ports like vxlan. v2 : rebase to latest + address some optimization feedback from Nikolay. v3 : fix kbuild reported build errors with CONFIG_INET off v4 : simplify port flag mask as suggested by stephen v5 : address some feedback from Toshiaki v6 : some v5 cleanups in nd suppress (keep it consistent with arp suppress) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 21:12:04 -07:00
Roopa Prabhu	ed842faeb2	bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports This patch avoids flooding and proxies ndisc packets for BR_NEIGH_SUPPRESS ports. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 21:12:04 -07:00
Roopa Prabhu	057658cb33	bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports This patch avoids flooding and proxies arp packets for BR_NEIGH_SUPPRESS ports. Moves existing br_do_proxy_arp to br_do_proxy_suppress_arp to support both proxy arp and neigh suppress. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 21:12:04 -07:00
Roopa Prabhu	821f1b21ca	bridge: add new BR_NEIGH_SUPPRESS port flag to suppress arp and nd flood This patch adds a new bridge port flag BR_NEIGH_SUPPRESS to suppress arp and nd flood on bridge ports. It implements rfc7432, section 10. https://tools.ietf.org/html/rfc7432#section-10 for ethernet VPN deployments. It is similar to the existing BR_PROXYARP* flags but has a few semantic differences to conform to EVPN standard. Unlike the existing flags, this new flag suppresses flood of all neigh discovery packets (arp and nd) to tunnel ports. Supports both vlan filtering and non-vlan filtering bridges. In case of EVPN, it is mainly used to avoid flooding of arp and nd packets to tunnel ports like vxlan. This patch adds netlink and sysfs support to set this bridge port flag. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 21:12:04 -07:00
Eric Dumazet	951f788a80	ipv6: fix a BUG in rt6_get_pcpu_route() Ido reported following splat and provided a patch. [ 122.221814] BUG: using smp_processor_id() in preemptible [00000000] code: sshd/2672 [ 122.221845] caller is debug_smp_processor_id+0x17/0x20 [ 122.221866] CPU: 0 PID: 2672 Comm: sshd Not tainted 4.14.0-rc3-idosch-next-custom #639 [ 122.221880] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016 [ 122.221893] Call Trace: [ 122.221919] dump_stack+0xb1/0x10c [ 122.221946] ? _atomic_dec_and_lock+0x124/0x124 [ 122.221974] ? ___ratelimit+0xfe/0x240 [ 122.222020] check_preemption_disabled+0x173/0x1b0 [ 122.222060] debug_smp_processor_id+0x17/0x20 [ 122.222083] ip6_pol_route+0x1482/0x24a0 ... I believe we can simplify this code path a bit, since we no longer hold a read_lock and need to release it to avoid a dead lock. By disabling BH, we make sure we'll prevent code re-entry and rt6_get_pcpu_route()/rt6_make_pcpu_route() run on the same cpu. Fixes: `66f5d6ce53` ("ipv6: replace rwlock with rcu and spinlock in fib6_table") Reported-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 21:09:00 -07:00
David S. Miller	51a0c00c6b	Merge tag 'mlx5-updates-2017-10-06' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== Mellanox, mlx5 updates 2017-10-06 This series includes some shared code updates for kernel 4.15 to both net-next and rdma-next trees. The series includes mlx5 low level flow steering updates and optimizations to support firmware command parallelism for flow steering requests from Maor Gottlieb and two other small fixes from Matan and Maor. One fix from Matan adds error handling for when the destination list of the flow steering rule is full. Maor introduced a patch to avoid NULL pointer dereference on steering cleanup. Then Some refactoring patches needed by the series for code sharing purposes. and split the Flow Table Entry (FTE) and Flow Group (FG) creation code to two parts: 1) Object allocation - allocate the steering node and initialize its resources. 2) The firmware command execution. This change will give us the ability to take write lock on the parent node (e.g. FG for FTE creating) only on the software data struct allocation and creation part of the procedure where the synchronization is really required, and will allow us to execute multiple firmware commands simultaneously and overcome the firmware bottleneck. Refactor the locking scheme of the mlx5 core flow steering as follows: 1) Replace the mutex lock with readers-writers semaphore and take the write lock only when necessary (e.g. allocating a new flow table entry index or adding a node to the parent's children list). When we try to find a suitable child in the parent's children list (e.g. search for flow group with the same match_criteria of the rule) then we only take the read lock. 2) Add versioning mechanism - each steering entity (FT, FG, FTE, DST) will have an incremental version. The version is increased when the entity is changed (e.g. when a new FTE was added to FG - the FG's version is increased). Versioning is used in order to determine if the last traverse of an entity's children is valid or a rescan under write lock is required. Last patch adds FGs and FTEs memory pool, It is useful because these objects are not small and could be allocated/deallocated many times. This support improves the insertion rate of steering rules from ~5k/sec to ~40k/sec. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 21:07:11 -07:00
Linus Torvalds	8a5776a5f4	Linux 4.14-rc4	2017-10-08 20:53:29 -07:00
Alexey Kodanev	3d0241d57c	gso: fix payload length when gso_size is zero When gso_size reset to zero for the tail segment in skb_segment(), later in ipv6_gso_segment(), __skb_udp_tunnel_segment() and gre_gso_segment() we will get incorrect results (payload length, pcsum) for that segment. inet_gso_segment() already has a check for gso_size before calculating payload. The issue was found with LTP vxlan & gre tests over ixgbe NIC. Fixes: `07b26c9454` ("gso: Support partial splitting at the frag_list pointer") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 10:12:15 -07:00
David S. Miller	28f50eb209	Merge branch 'hv_netvsc-TCP-hash-level' Haiyang Zhang says: ==================== hv_netvsc: support changing TCP hash level The patch set simplifies the existing hash level switching code for UDP. It also adds the support for changing TCP hash level. So users can switch between L3 an L4 hash levels for TCP and UDP. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 10:11:01 -07:00
Haiyang Zhang	78005d91c1	hv_netvsc: Update netvsc Document for TCP hash level setting Update Documentation/networking/netvsc.txt for TCP hash level setting and related info. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 10:11:01 -07:00
Haiyang Zhang	0518ec4f9d	hv_netvsc: Add ethtool handler to set and get TCP hash levels The patch supports the options to switch TCP hash level between L3 and L4 by ethtool command. TCP over IPv4 and v6 can be set differently. The default hash level is L4. We currently only allow switching TX hash level from within the guests. For example, for TCP over IPv4 on eth0: To include TCP port numbers in hashing: ethtool -N eth0 rx-flow-hash tcp4 sdfn To exclude TCP port numbers in hashing: ethtool -N eth0 rx-flow-hash tcp4 sd To show TCP hash level: ethtool -n eth0 rx-flow-hash tcp4 Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 10:11:01 -07:00
Haiyang Zhang	486e398105	hv_netvsc: Change the hash level variable to bit flags This simplifies the logic and make it easier to add more options. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 10:11:01 -07:00
David S. Miller	c1b85a193a	Merge branch 'mlxsw-more-extack' Jiri Pirko says: ==================== mlxsw: Add more extack error reporting Ido says: Add error messages to VLAN and bridge enslavements to help users understand why the enslavement failed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 10:07:21 -07:00
Ido Schimmel	9b63ef88d3	mlxsw: spectrum: Propagate extack further for bridge enslavements The code that actually takes care of bridge offload introduces a few more non-trivial constraints with regards to bridge enslavements. Propagate extack there to indicate the reason. $ ip link add link enp1s0np1 name enp1s0np1.10 type vlan id 10 $ ip link add link enp1s0np1 name enp1s0np1.20 type vlan id 20 $ ip link add name br0 type bridge $ ip link set dev enp1s0np1.10 master br0 $ ip link set dev enp1s0np1.20 master br0 Error: spectrum: Can not bridge VLAN uppers of the same port. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 10:07:21 -07:00
Ido Schimmel	c1f2c6d025	mlxsw: spectrum: Add extack for VLAN enslavements Similar to physical ports, enslavement of VLAN devices can also fail. Use extack to indicate why the enslavement failed. $ ip link add link enp1s0np1 name enp1s0np1.10 type vlan id 10 $ ip link add name bond0 type bond mode 802.3ad $ ip link set dev enp1s0np1.10 master bond0 Error: spectrum: VLAN devices only support bridge and VRF uppers. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 10:07:21 -07:00
Ido Schimmel	a69518cf0b	mlxsw: spectrum_router: Avoid expensive lookup during route removal In commit `fc922bb0dd` ("mlxsw: spectrum_router: Use one LPM tree for all virtual routers") I increased the scale of supported VRFs by having all of them share the same LPM tree. In order to avoid look-ups for prefix lengths that don't exist, each route removal would trigger an aggregation across all the active virtual routers to see which prefix lengths are in use and which aren't and structure the tree accordingly. With the way the data structures are currently laid out, this is a very expensive operation. When preformed repeatedly - due to the invocation of the abort mechanism - and with enough VRFs, this can result in a hung task. For now, avoid this optimization until it can be properly re-added in net-next. Fixes: `fc922bb0dd` ("mlxsw: spectrum_router: Use one LPM tree for all virtual routers") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-08 10:05:27 -07:00
David S. Miller	c9f766bc6e	Merge branch 'bpf-obj-name-misc' Martin KaFai Lau says: ==================== bpf: Misc improvements and a new usage on bpf obj name The first two patches make improvements on the bpf obj name. The last patch adds the prog name to kallsyms. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-07 23:29:40 +01:00
Martin KaFai Lau	368211fb92	bpf: Append prog->aux->name in bpf_get_prog_name() This patch makes the bpf_prog's name available in kallsyms. The new format is bpf_prog_tag[_name]. Sample kallsyms from running selftests/bpf/test_progs: [root@arch-fb-vm1 ~]# egrep ' bpf_prog_[0-9a-fA-F]{16}' /proc/kallsyms ffffffffa0048000 t bpf_prog_dabf0207d1992486_test_obj_id ffffffffa0038000 t bpf_prog_a04f5eef06a7f555__123456789ABCDE ffffffffa0050000 t bpf_prog_a04f5eef06a7f555 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-07 23:29:39 +01:00
Martin KaFai Lau	067cae4777	bpf: Use char in prog and map name Instead of u8, use char for prog and map name. It can avoid the userspace tool getting compiler's signess warning. The bpf_prog_aux, bpf_map, bpf_attr, bpf_prog_info and bpf_map_info are changed. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-07 23:29:39 +01:00
Martin KaFai Lau	473d97343f	bpf: Change bpf_obj_name_cpy() to better ensure map's name is init by 0 During get_info_by_fd, the prog/map name is memcpy-ed. It depends on the prog->aux->name and map->name to be zero initialized. bpf_prog_aux is easy to guarantee that aux->name is zero init. The name in bpf_map may be harder to be guaranteed in the future when new map type is added. Hence, this patch makes bpf_obj_name_cpy() to always zero init the prog/map name. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-07 23:29:39 +01:00
Alexei Starovoitov	8fe2d6ccd5	bpf: fix liveness marking while processing Rx = Ry instruction the verifier does regs[insn->dst_reg] = regs[insn->src_reg] which often clears write mark (when Ry doesn't have it) that was just set by check_reg_arg(Rx) prior to the assignment. That causes mark_reg_read() to keep marking Rx in this block as REG_LIVE_READ (since the logic incorrectly misses that it's screened by the write) and in many of its parents (until lucky write into the same Rx or beginning of the program). That causes is_state_visited() logic to miss many pruning opportunities. Furthermore mark_reg_read() logic propagates the read mark for BPF_REG_FP as well (though it's readonly) which causes harmless but unnecssary work during is_state_visited(). Note that do_propagate_liveness() skips FP correctly, so do the same in mark_reg_read() as well. It saves 0.2 seconds for the test below program before after bpf_lb-DLB_L3.o 2604 2304 bpf_lb-DLB_L4.o 11159 3723 bpf_lb-DUNKNOWN.o 1116 1110 bpf_lxc-DDROP_ALL.o 34566 28004 bpf_lxc-DUNKNOWN.o 53267 39026 bpf_netdev.o 17843 16943 bpf_overlay.o 8672 7929 time ~11 sec ~4 sec Fixes: `dc503a8ad9` ("bpf/verifier: track liveness for pruning") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Edward Cree <ecree@solarflare.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-07 23:25:17 +01:00

... 5 6 7 8 9 ...

708086 commits