redonkable/alistair23-linux

Author	SHA1	Message	Date
Fuyun Liang	5bad95a1e5	net: hns3: fix for changing MTU When changing the MTU, the new MTU must also be set on the netdevice. Fixes: `a8e8b7ff35` ("net: hns3: Add support to change MTU in HNS3 hardware") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 14:06:19 -05:00
Fuyun Liang	2866ccb2b8	net: hns3: fix for setting MTU When setting the MTU, what we actually do is configure the max frame size for the hardware. ETH_HLEN, ETH_FCS_LEN and VLAN_HLEN must be taken into account. A frame size smaller than the default value should not be set in the hardware, because in the hardware the max frame size controls not only the RX packet size but also the TX packet size. RX packets larger than the configured value will be dropped. This patch fixes the bug of setting a wrong max frame size in the hardware. Fixes: `46a3df9f97` ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 14:06:19 -05:00
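The MTU-to-frame-size arithmetic described in the commit above can be sketched as follows. This is an illustrative model, not the driver's code: `mtu_to_max_frame()` and `DEFAULT_MAX_FRAME` are hypothetical names, and the default is assumed to be the frame size that corresponds to a 1500-byte MTU. The header-length constants carry the same values as the kernel's `ETH_HLEN`, `ETH_FCS_LEN`, `VLAN_HLEN` and `ETH_DATA_LEN`.

```c
#include <stdint.h>

#define ETH_HLEN      14   /* Ethernet header length */
#define ETH_FCS_LEN    4   /* frame check sequence */
#define VLAN_HLEN      4   /* one VLAN tag */
#define ETH_DATA_LEN 1500  /* standard Ethernet payload size */

/* Assumption: the hardware default is the frame size for a 1500-byte MTU. */
#define DEFAULT_MAX_FRAME (ETH_DATA_LEN + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN)

/* Hypothetical helper: convert an MTU into the max frame size to program. */
uint32_t mtu_to_max_frame(uint32_t mtu)
{
    uint32_t frame = mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;

    /* Never program less than the default, since the same limit also
     * applies to the other traffic direction. */
    return frame < DEFAULT_MAX_FRAME ? DEFAULT_MAX_FRAME : frame;
}
```

With these assumptions a small MTU leaves the hardware limit at the default, while a jumbo MTU raises it by the combined header overhead of 22 bytes.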
Fuyun Liang	40173a2ec7	net: hns3: fix for updating fc_mode_last_time commit a9c782822166 ("net: hns3: add support for set_pauseparam") adds set_pauseparam support for ethtool cmd, but forgets to update fc_mode_last_time when PFC mode is disabled in hclge_cfg_pauseparam(). The wrong fc_mode_last_time will be used to update flow control mode when lldpad has been running. As a result, when using the ethtool command "-a", user will get a wrong pause parameter. This patch adds the fc_mode_last_time update when PFC mode is disabled. Fixes: a9c782822166 ("net: hns3: add support for set_pauseparam") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 14:06:18 -05:00
Jian Shen	cf72fa6316	net: hns3: Fix a response data read error of tqp statistics query The result of the tqp statistics query was read from the wrong position; fix it according to the user manual. Fixes: `46a3df9f97` ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 14:06:18 -05:00
Jian Shen	8491000754	net: hns3: Add packet statistics of netdev Add packet statistics of netdev for ethtool -S, in order to show the statistics data for current net device. Remove update_stats() calling because it has been completed in hns3_get_netdev_stats(). Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 14:06:18 -05:00
Jian Shen	b59f558c6a	net: hns3: Remove a useless member of struct hns3_stats The member "stats_size" of struct hns3_stats is useless; remove it and fix the macro definition which uses this struct. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 14:06:18 -05:00
Jian Shen	57ffee737b	net: hns3: Fix an error macro definition of HNS3_TQP_STAT The member "stats_offset" was designed to indicate the offset of each member of struct ring_stats in struct hns3_enet_ring, but forgot to add the offset of the member in struct ring_stats. Fixes: `496d03e960` ("net: hns3: Add Ethtool support to HNS3 driver") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 14:06:18 -05:00
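The offset problem described in the commit above can be illustrated with a small sketch. The struct layouts and the names `enet_ring`, `TQP_STAT_OFFSET` and `read_stat` are hypothetical stand-ins, not the driver's definitions; the point is only that a counter nested in an inner struct needs the inner struct's offset plus the member's offset within it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified layouts for illustration only. */
struct ring_stats {
    uint64_t tx_pkts;
    uint64_t tx_bytes;
};

struct enet_ring {
    void *desc;
    int queue_index;
    struct ring_stats stats;
};

/* Offset of a counter relative to the start of the ring:
 * offset of 'stats' in the ring plus offset of the member in 'stats'. */
#define TQP_STAT_OFFSET(member) \
    (offsetof(struct enet_ring, stats) + offsetof(struct ring_stats, member))

/* Read a 64-bit counter from a ring at the given byte offset. */
uint64_t read_stat(const struct enet_ring *ring, size_t offset)
{
    uint64_t value;

    memcpy(&value, (const char *)ring + offset, sizeof(value));
    return value;
}
```

Using only `offsetof(struct enet_ring, stats)` would read the first counter for every entry, and using only the inner offset would read unrelated ring fields.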
Jian Shen	94bfaafac9	net: hns3: Fix a loop index error of tqp statistics query A wrong loop index was used while querying the statistics data of tqps, which may cause a call trace. Fixes: `496d03e960` ("net: hns3: Add Ethtool support to HNS3 driver") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 14:06:18 -05:00
Jian Shen	d2a5dca840	net: hns3: Fix an error of total drop packet statistics The dropped tx/rx packets number of each tqp should also be counted into the total drop tx/rx packets numbers. Fixes: `76ad4f0ee7` ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 14:06:18 -05:00
Jian Shen	b875cc379d	net: hns3: Mask the packet statistics query when NIC is down Update the HNS3_NIC_STATE_DOWN bit when NIC state changes. When NIC is down, mask the packet statistics for querying with ifconfig command. It's a common practice. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 14:06:18 -05:00
Jian Shen	c5f654805c	net: hns3: Modify the update period of packet statistics It takes more than 200 query response messages between driver and IMP, while updating the packet statistics. It's too heavy for IMP to update it per second. Extend the update period of packet statistics data from 1 second to 300 seconds(if too long, the statistics may overflow). As a result, we need to update it while querying with ifconfig tool to keep the statistics data fresh. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 14:06:17 -05:00
Jian Shen	7ea5cbdc66	net: hns3: Remove repeat statistic of rx_errors The igu_rx_err_pkt indicates the same error with mac_rx_fcs_err_pkt_num, so remove it. Fixes: `46a3df9f97` ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 14:06:17 -05:00
Jian Shen	200a88c69d	net: hns3: Fix spelling errors Fix spelling error "overrsize" --> "oversize". Fixes: `46a3df9f97` ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 14:06:17 -05:00
Jian Shen	a6c51c2608	net: hns3: Unify the strings display of packet statistics Some members of packet statistics are named in different styles. This patch unifies them with new internal naming rules. The main modifications are: trans --> tx; rcv --> rx; rcb_q%d_tx --> txq#%d; rcb_q%d_rx --> rxq#%d; sw_err_cnt (tx side) --> tx_dropped; sw_err_cnt (rx side) --> rx_dropped; pkts --> packets; tx_err_cnt --> errors; rx_err_cnt --> errors. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 14:06:17 -05:00
Jian Shen	30ba2ab940	net: hns3: Disable VFs change rxvlan offload status Rxvlan offload status can only be changed by the PF. Initialize the NETIF_F_HW_VLAN_CTAG_RX bit of hw_features for VFs to false, so that the user cannot change it. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 14:06:17 -05:00
Jian Shen	391b5e9356	net: hns3: Add ethtool interface for vlan filter This patch adds vlan filter enable switch to support ethtool -K ethX rx-vlan-filter on/off. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 14:06:17 -05:00
David S. Miller	b60f699ee8	Merge branch 'net-qualcomm-rmnet-Enable-csum-offloads' Subash Abhinov Kasiviswanathan says: ==================== net: qualcomm: rmnet: Enable csum offloads This series introduces the MAPv4 packet format for checksum offload plus some other minor changes. Patches 1-3 are cleanups. Patch 4 renames the ingress format to data format so that all data formats can be configured using this going forward. Patch 5 uses the pacing helper to improve TCP transmit performance. Patches 6-9 define the MAPv4 format for checksum offload for RX and TX. A new header and trailer format are used as part of MAPv4. For RX checksum offload, only the 1's complement of the IP payload portion is computed by hardware. The metadata from the RX header is used to verify the checksum field in the packet. Note that the IP packet and its fields are not modified by hardware. This gives metadata to help with the RX checksum. For TX, the required metadata is filled in so hardware can compute the checksum. Patch 10 enables GSO on rmnet devices. v1->v2: Fix sparse errors reported by kbuild test robot. v2->v3: Update the commit message for Patch 5 based on Eric's comments. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 13:58:50 -05:00
Subash Abhinov Kasiviswanathan	0c9214d5ed	net: qualcomm: rmnet: Add support for GSO Real devices may support scatter gather (SG), so enable SG on rmnet devices to use GSO. GSO reduces CPU cycles by 20% at a rate of 146 Mbps for a single-stream TCP connection. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 13:58:50 -05:00
Subash Abhinov Kasiviswanathan	5eb5f8608e	net: qualcomm: rmnet: Add support for TX checksum offload TX checksum offload applies to TCP / UDP packets which are not fragmented using the MAPv4 checksum trailer. The following needs to be done to have the checksum computed in hardware - 1. Set the checksum start offset and insert offset. 2. Set the csum_enabled bit. 3. Compute and set the 1's complement of the partial checksum field in the transport header. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 13:58:49 -05:00
Subash Abhinov Kasiviswanathan	23c76eb740	net: qualcomm: rmnet: Handle command packets with checksum trailer When using the MAPv4 packet format in conjunction with MAP commands, a dummy DL checksum trailer will be appended to the packet. Before this packet is sent out as an ACK, the DL checksum trailer needs to be removed. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 13:58:49 -05:00
Subash Abhinov Kasiviswanathan	bbd21b247c	net: qualcomm: rmnet: Add support for RX checksum offload When using the MAPv4 packet format, receive checksum offload can be enabled in hardware. The checksum computation over the pseudo header is not offloaded, but the rest of the checksum computation over the payload is offloaded. This applies only to TCP / UDP packets which are not fragmented. rmnet validates the TCP/UDP checksum for the packet using the checksum from the checksum trailer added to the packet by hardware. The validation performed is as follows - 1. Perform 1's complement over the checksum value from the trailer. 2. Compute the 1's complement checksum over the IPv4 / IPv6 header and subtract it from the value from step 1. 3. Compute the 1's complement checksum over the IPv4 / IPv6 pseudo header and add it to the value from step 2. 4. Subtract the checksum value from the TCP / UDP header from the value from step 3. 5. Compare the value from step 4 to the checksum value from the TCP / UDP header. 6. If the comparison in step 5 succeeds, CHECKSUM_UNNECESSARY is set and the packet is passed on to the network stack. If there is a failure, then the packet is passed on as such without modifying the ip_summed field. The checksum field is also checked for a UDP checksum of 0 as per RFC 768 and for an unexpected TCP checksum of 0. If checksum offload is disabled when using the MAPv4 packet format in the receive path, the packet is queued as is to the network stack without the validations above. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 13:58:49 -05:00
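The idea behind the validation steps above can be sketched with plain one's-complement arithmetic. This is a simplified model, not the rmnet code: it assumes the hardware-provided value is the raw one's-complement sum over the IP header and transport segment, it folds the driver's final subtract-and-compare steps into a single test against all ones, and it skips the zero-checksum special cases. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement addition of two 16-bit values (end-around carry). */
static uint16_t csum16_add(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + b;

    return (uint16_t)((sum & 0xffff) + (sum >> 16));
}

/* One's-complement subtraction: add the complement of b. */
static uint16_t csum16_sub(uint16_t a, uint16_t b)
{
    return csum16_add(a, (uint16_t)~b);
}

/* One's-complement sum over a byte range, read as big-endian 16-bit
 * words; an odd trailing byte is padded with zero. */
uint16_t csum16_range(const uint8_t *p, size_t len)
{
    uint16_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum = csum16_add(sum, (uint16_t)((p[i] << 8) | p[i + 1]));
    if (len & 1)
        sum = csum16_add(sum, (uint16_t)(p[len - 1] << 8));
    return sum;
}

/* hw_sum:     modelled hardware sum over IP header + transport segment
 * ip_hdr_sum: software sum over the IP header only
 * pseudo_sum: software sum over the pseudo header
 * Returns nonzero when the transport checksum is consistent. */
int rx_l4_csum_ok(uint16_t hw_sum, uint16_t ip_hdr_sum, uint16_t pseudo_sum)
{
    /* Remove the IP header contribution, leaving the segment sum. */
    uint16_t segment_sum = csum16_sub(hw_sum, ip_hdr_sum);
    /* Add the part that the hardware did not cover. */
    uint16_t total = csum16_add(segment_sum, pseudo_sum);

    /* A segment with a correct checksum field sums to all ones. */
    return total == 0xffff;
}
```

Because one's-complement addition is associative, subtracting the header sum and adding the pseudo-header sum yields the same result as summing the pseudo header and segment directly, which is why software only has to touch the headers.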
Subash Abhinov Kasiviswanathan	c597897b08	net: qualcomm: rmnet: Define the MAPv4 packet formats The MAPv4 packet format adds support for RX / TX checksum offload. For a bi-directional UDP stream at a rate of 570 / 146 Mbps, roughly 10% CPU cycles are saved. For receive path, there is a checksum trailer appended to the end of the MAP packet. The valid field indicates if hardware has computed the checksum. csum_start_offset indicates the offset from the start of the IP header from which hardware has computed checksum. csum_length is the number of bytes over which the checksum was computed and the resulting value is csum_value. In the transmit path, a header is appended between the end of the MAP header and the start of the IP packet. csum_start_offset is the offset in bytes from which hardware will compute the checksum if the csum_enabled bit is set. udp_ip4_ind indicates if the checksum value of 0 is valid or not. csum_insert_offset is the offset from the csum_start_offset where hardware will insert the computed checksum. The use of this additional packet format for checksum offload is explained in subsequent patches. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 13:58:49 -05:00
Subash Abhinov Kasiviswanathan	4e8683a95c	net: qualcomm: rmnet: Set pacing shift The real device over which the rmnet devices are installed also aggregates multiple IP packets and sends them as a single large aggregate frame to the hardware. This causes degraded throughput for TCP TX due to bufferbloat. To overcome this problem, a pacing shift value of 8 is set using the sk_pacing_shift_update() helper. This value was determined based on experiments with a single-stream TCP TX using iperf for a duration of 30s. Pacing shift | Observed data rate (Mbps): 10 | 9; 9 | 140; 8 | 146 (max link rate). Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 13:58:49 -05:00
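To see why a smaller shift helps an aggregating device, the following sketch computes how much data a socket may keep queued for a given pacing shift. It assumes the stack's budget is roughly the pacing rate shifted right by the pacing shift, and it ignores the other floors and caps the kernel applies; `tsq_budget_bytes()` is a hypothetical name.

```c
#include <stdint.h>

/* Assumed model: queue budget in bytes = pacing rate >> pacing shift.
 * A shift of 10 is about 1 ms of data, a shift of 8 about 4 ms. */
uint64_t tsq_budget_bytes(uint64_t pacing_rate_bytes_per_sec,
                          unsigned int pacing_shift)
{
    return pacing_rate_bytes_per_sec >> pacing_shift;
}
```

At the 146 Mbps link rate from the table (18,250,000 bytes per second), this model gives about 17 KB of budget with a shift of 10 and about 70 KB with a shift of 8, leaving the lower device more packets to aggregate.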
Subash Abhinov Kasiviswanathan	b23e722ed6	net: qualcomm: rmnet: Rename ingress data format to data format This is done so that we can use this field for both ingress and egress flags. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 13:58:49 -05:00
Subash Abhinov Kasiviswanathan	76e08955d5	net: qualcomm: rmnet: Remove unused function declaration rmnet_map_demultiplex() is only declared but not defined anywhere, so remove it. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 13:58:48 -05:00
Subash Abhinov Kasiviswanathan	0b59a2340e	net: qualcomm: rmnet: Remove invalid condition while stamping mux id rmnet devices cannot have a mux id of 255. This is validated when assigning the mux id to the rmnet devices. As a result, checking for mux id 255 does not apply in egress path. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 13:58:48 -05:00
Subash Abhinov Kasiviswanathan	4b5ba67745	net: qualcomm: rmnet: Remove redundant check when stamping map header We already check the headroom once in rmnet_map_egress_handler(), so this is not needed. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-08 13:58:48 -05:00
David S. Miller	f66faae2f8	Merge branch 'ipv6-ipv4-nexthop-align' Ido Schimmel says: ==================== ipv6: Align nexthop behaviour with IPv4 This set tries to eliminate some differences between IPv4's and IPv6's treatment of nexthops. These differences are most likely a side effect of IPv6's data structures (specifically 'rt6_info') that incorporate both the route and the nexthop and the late addition of ECMP support in commit `51ebd31815` ("ipv6: add support of equal cost multipath (ECMP)"). IPv4 and IPv6 do not react the same to certain netdev events. For example, upon carrier change affected IPv4 nexthops are marked using the RTNH_F_LINKDOWN flag and the nexthop group is rebalanced accordingly. IPv6 on the other hand, does nothing which forces us to perform a carrier check during route lookup and dump. This makes it difficult to introduce features such as non-equal-cost multipath that are built on top of this set [1]. In addition, when a netdev is put administratively down IPv4 nexthops are marked using the RTNH_F_DEAD flag, whereas IPv6 simply flushes all the routes using these nexthops. To be consistent with IPv4, multipath routes should only be flushed when all nexthops in the group are considered dead. The first 12 patches introduce non-functional changes that store the RTNH_F_DEAD and RTNH_F_LINKDOWN flags in IPv6 routes based on netdev events, in a similar fashion to IPv4. This allows us to remove the carrier check performed during route lookup and dump. The next three patches make sure we only flush a multipath route when all of its nexthops are dead. Last three patches add test cases for IPv4/IPv6 FIB. These verify that both address families react similarly to netdev events. Finally, this series also serves as a good first step towards David Ahern's goal of treating nexthops as standalone objects [2], as it makes the code more in line with IPv4 where the nexthop and the nexthop group are separate objects from the route itself. 1. https://github.com/idosch/linux/tree/ipv6-nexthops 2. http://vger.kernel.org/netconf2017_files/nexthop-objects.pdf Changes since RFC (feedback from David Ahern): * Remove redundant declaration of rt6_ifdown() in patch 4 and adjust comment referencing it accordingly * Drop patch to flush multipath routes upon NETDEV_UNREGISTER. Reword cover letter accordingly * Use a temporary variable to make code more readable in patch 15 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-07 21:29:41 -05:00
Ido Schimmel	82e45b6fd2	selftests: fib_tests: Add test cases for netdev carrier change Check that IPv4 and IPv6 react the same when the carrier of a netdev is toggled. Local routes should not be affected by this, whereas unicast routes should. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-07 21:29:41 -05:00
Ido Schimmel	5adb7683b4	selftests: fib_tests: Add test cases for netdev down Check that IPv4 and IPv6 react the same when a netdev is being put administratively down. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-07 21:29:41 -05:00
Ido Schimmel	607bd2e502	selftests: fib_tests: Add test cases for IPv4/IPv6 FIB Add test cases to check that IPv4 and IPv6 react to a netdev being unregistered as expected. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-07 21:29:41 -05:00
Ido Schimmel	1de178edc7	ipv6: Flush multipath routes when all siblings are dead By default, IPv6 deletes nexthops from a multipath route when the nexthop device is put administratively down. This differs from IPv4 where the nexthops are kept, but marked with the RTNH_F_DEAD flag. A multipath route is flushed when all of its nexthops become dead. Align IPv6 with IPv4 and have it conform to the same guidelines. In case the multipath route needs to be flushed, its siblings are flushed one by one. Otherwise, the nexthops are marked with the appropriate flags and the tree walker is instructed to skip all the siblings. As explained in previous patches, care is taken to update the sernum of the affected tree nodes, so as to prevent the use of wrong dst entries. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-07 21:29:41 -05:00
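The flush rule in the commit above (keep a multipath route while any sibling is alive, flush it once all are dead) can be sketched as follows. The `struct nexthop` layout and `multipath_dev_down()` are hypothetical simplifications, not the kernel's `rt6_info` handling; the two flag values match `RTNH_F_DEAD` and `RTNH_F_LINKDOWN` from the rtnetlink UAPI header.

```c
#include <stdbool.h>
#include <stddef.h>

#define RTNH_F_DEAD      1   /* nexthop is dead */
#define RTNH_F_LINKDOWN 16   /* carrier is down on the nexthop device */

/* Hypothetical, minimal nexthop: just a device index and flags. */
struct nexthop {
    int ifindex;
    unsigned int flags;
};

/* Mark every nexthop that uses the downed device as dead, then report
 * whether the whole multipath route should be flushed. */
bool multipath_dev_down(struct nexthop *nh, size_t count, int down_ifindex)
{
    size_t i, dead = 0;

    for (i = 0; i < count; i++) {
        if (nh[i].ifindex == down_ifindex)
            nh[i].flags |= RTNH_F_DEAD | RTNH_F_LINKDOWN;
        if (nh[i].flags & RTNH_F_DEAD)
            dead++;
    }

    /* Flush only when no sibling is left alive. */
    return dead == count;
}
```

In this model a route with nexthops on two devices survives the first device going down with one nexthop marked dead, and is flushed only when the second device goes down as well.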
Ido Schimmel	922c2ac82e	ipv6: Take table lock outside of sernum update function The next patch is going to allow dead routes to remain in the FIB tree in certain situations. When this happens we need to be sure to bump the sernum of the nodes where these are stored so that potential copies cached in sockets are invalidated. The function that performs this update assumes the table lock is not taken when it is invoked, but that will not be the case when it is invoked by the tree walker. Have the function assume the lock is taken and make the single caller take the lock itself. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-07 21:29:41 -05:00
Ido Schimmel	4a8e56ee2c	ipv6: Export sernum update function We are going to allow dead routes to stay in the FIB tree (e.g., when they are part of a multipath route, directly connected route with no carrier) and revive them when their nexthop device gains carrier or when it is put administratively up. This is equivalent to the addition of the route to the FIB tree and we should therefore take care of updating the sernum of all the parent nodes of the node where the route is stored. Otherwise, we risk sockets caching and using sub-optimal dst entries. Export the function that performs the above, so that it could be invoked from fib6_ifup() later on. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-07 21:29:40 -05:00
Ido Schimmel	b5cb5a755b	ipv6: Teach tree walker to skip multipath routes As explained in previous patch, fib6_ifdown() needs to consider the state of all the sibling routes when a multipath route is traversed. This is done by evaluating all the siblings when the first sibling in a multipath route is traversed. If the multipath route does not need to be flushed (e.g., not all siblings are dead), then we should just skip the multipath route as our work is done. Have the tree walker jump to the last sibling when it is determined that the multipath route needs to be skipped. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-07 21:29:40 -05:00
Ido Schimmel	a2c554d3f8	ipv6: Add explicit flush indication to routes When routes that are a part of a multipath route are evaluated by fib6_ifdown() in response to NETDEV_DOWN and NETDEV_UNREGISTER events the state of their sibling routes is not considered. This will change in subsequent patches in order to align IPv6 with IPv4's behavior. For example, when the last sibling in a multipath route becomes dead, the entire multipath route needs to be removed. To prevent the tree walker from re-evaluating all the sibling routes each time, we can simply evaluate them once - when the first sibling is traversed. If we determine the entire multipath route needs to be removed, then the 'should_flush' bit is set in all the siblings, which will cause the walker to flush them when it traverses them. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-07 21:29:40 -05:00
Ido Schimmel	f9d882ea57	ipv6: Report dead flag during route dump Up until now the RTNH_F_DEAD flag was only reported in route dump when the 'ignore_routes_with_linkdown' sysctl was set. This is expected as dead routes were flushed otherwise. The reliance on this sysctl is going to be removed, so we need to report the flag regardless of the sysctl's value. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-07 21:29:40 -05:00
Ido Schimmel	8067bb8c1d	ipv6: Ignore dead routes during lookup Currently, dead routes are only present in the routing tables in case the 'ignore_routes_with_linkdown' sysctl is set. Otherwise, they are flushed. Subsequent patches are going to remove the reliance on this sysctl and make IPv6 more consistent with IPv4. Before this is done, we need to make sure dead routes are skipped during route lookup, so as to not cause packet loss. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-07 21:29:40 -05:00
Ido Schimmel	44c9f2f206	ipv6: Check nexthop flags in route dump instead of carrier Similar to previous patch, there is no need to check for the carrier of the nexthop device when dumping the route and we can instead check for the presence of the RTNH_F_LINKDOWN flag. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-07 21:29:40 -05:00
Ido Schimmel	14c5206c2d	ipv6: Check nexthop flags during route lookup instead of carrier Now that the RTNH_F_LINKDOWN flag is set in nexthops, we can avoid the need to dereference the nexthop device and check its carrier and instead check for the presence of the flag. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-07 21:29:40 -05:00
Ido Schimmel	5609b80a37	ipv6: Set nexthop flags during route creation It is valid to install routes with a nexthop device that does not have a carrier, so we need to make sure they're marked accordingly. As explained in the previous patch, host and anycast routes are never marked with the 'linkdown' flag. Note that reject routes are unaffected, as these use the loopback device which always has a carrier. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-07 21:29:40 -05:00
Ido Schimmel	27c6fa73f9	ipv6: Set nexthop flags upon carrier change Similar to IPv4, when the carrier of a netdev changes we should toggle the 'linkdown' flag on all the nexthops using it as their nexthop device. This will later allow us to test for the presence of this flag during route lookup and dump. Up until commit `4832c30d54` ("net: ipv6: put host and anycast routes on device with address") host and anycast routes used the loopback netdev as their nexthop device and thus were not marked with the 'linkdown' flag. The patch preserves this behavior and allows one to ping the local address even when the nexthop device does not have a carrier and the 'ignore_routes_with_linkdown' sysctl is set. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-07 21:29:40 -05:00
Ido Schimmel	4c981e28d3	ipv6: Prepare to handle multiple netdev events To make IPv6 more in line with IPv4 we need to be able to respond differently to different netdev events. For example, when a netdev is unregistered all the routes using it as their nexthop device should be flushed, whereas when the netdev's carrier changes only the 'linkdown' flag should be toggled. Currently, this is not possible, as the function that traverses the routing tables is not aware of the triggering event. Propagate the triggering event down, so that it could be used in later patches. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-07 21:29:40 -05:00
Ido Schimmel	2127d95aef	ipv6: Clear nexthop flags upon netdev up Previous patch marked nexthops with the 'dead' and 'linkdown' flags. Clear these flags when the netdev comes back up. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-07 21:29:39 -05:00
Ido Schimmel	2b2413610e	ipv6: Mark dead nexthops with appropriate flags When a netdev is put administratively down or unregistered all the nexthops using it as their nexthop device should be marked with the 'dead' and 'linkdown' flags. Currently, when a route is dumped its nexthop device is tested and the flags are set accordingly. A similar check is performed during route lookup. Instead, we can simply mark the nexthops based on netdev events and avoid checking the netdev's state during route dump and lookup. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-07 21:29:39 -05:00
Ido Schimmel	9fcb0714dc	ipv6: Remove redundant route flushing during namespace dismantle By the time fib6_net_exit() is executed all the netdevs in the namespace have been either unregistered or pushed back to the default namespace. That is because pernet subsys operations are always ordered before pernet device operations and therefore invoked after them during namespace dismantle. Thus, all the routing tables in the namespace are empty by the time fib6_net_exit() is invoked and the call to rt6_ifdown() can be removed. This allows us to simplify the condition in fib6_ifdown() as it's only ever called with an actual netdev. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-07 21:29:39 -05:00
David S. Miller	7f0b800048	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2018-01-07 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) Add a start of a framework for extending struct xdp_buff without having the overhead of populating every data at runtime. Idea is to have a new per-queue struct xdp_rxq_info that holds read mostly data (currently that is, queue number and a pointer to the corresponding netdev) which is set up during rxqueue config time. When an XDP program is invoked, struct xdp_buff holds a pointer to struct xdp_rxq_info that the BPF program can then walk. The user facing BPF program that uses struct xdp_md for context can use these members directly, and the verifier rewrites context access transparently by walking the xdp_rxq_info and net_device pointers to load the data, from Jesper. 2) Redo the reporting of offload device information to user space such that it works in combination with network namespaces. The latter is reported through a device/inode tuple as similarly done in other subsystems as well (e.g. perf) in order to identify the namespace. For this to work, ns_get_path() has been generalized such that the namespace can be retrieved not only from a specific task (perf case), but also from a callback where we deduce the netns (ns_common) from a netdevice. bpftool support using the new uapi info and extensive test cases for test_offload.py in BPF selftests have been added as well, from Jakub. 3) Add two bpftool improvements: i) properly report the bpftool version such that it corresponds to the version from the kernel source tree. So pick the right linux/version.h from the source tree instead of the installed one. ii) fix bpftool and also bpf_jit_disasm build with binutils >= 2.9. The reason for the build breakage is that binutils library changed the function signature to select the disassembler. Given this is needed in multiple tools, add a proper feature detection to the tools/build/features infrastructure, from Roman. 4) Implement the BPF syscall command BPF_MAP_GET_NEXT_KEY for the stacktrace map. It is currently unimplemented, but there are use cases where user space needs to walk all stacktrace map entries e.g. for dumping or deleting map entries w/o having to close and recreate the map. Add BPF selftests along with it, from Yonghong. 5) Few follow-up cleanups for the bpftool cgroup code: i) rename the cgroup 'list' command into 'show' as we have it for other subcommands as well, ii) then alias the 'show' command such that 'list' is accepted which is also common practice in iproute2, and iii) remove couple of newlines from error messages using p_err(), from Jakub. 6) Two follow-up cleanups to sockmap code: i) remove the unused bpf_compute_data_end_sk_skb() function and ii) only build the sockmap infrastructure when CONFIG_INET is enabled since it's only aware of TCP sockets at this time, from John. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-07 21:26:31 -05:00
Daniel Borkmann	9be99badee	Merge branch 'bpf-stacktrace-map-next-key-support' Yonghong Song says: ==================== The patch set implements bpf syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map. Patch #1 is the core implementation and Patch #2 implements a bpf test at tools/testing/selftests/bpf directory. Please see individual patch comments for details. Changelog: v1 -> v2: - For invalid key (key pointer is non-NULL), sets next_key to be the first valid key. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-01-06 23:52:23 +01:00
Yonghong Song	3ced9b6002	tools/bpf: add a bpf selftest for stacktrace Added a bpf selftest in test_progs at tools directory for stacktrace. The test will populate a hashtable map and a stacktrace map at the same time with the same key, stackid. The user space will compare both maps, using BPF_MAP_LOOKUP_ELEM command and BPF_MAP_GET_NEXT_KEY command, to ensure that both have the same set of keys. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-01-06 23:52:23 +01:00
Yonghong Song	16f07c551e	bpf: implement syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map Currently, bpf syscall command BPF_MAP_GET_NEXT_KEY is not supported for stacktrace map. However, there are use cases where user space wants to enumerate all stacktrace map entries where BPF_MAP_GET_NEXT_KEY command will be really helpful. In addition, if user space wants to delete all map entries in order to save memory and does not want to close the map file descriptor, BPF_MAP_GET_NEXT_KEY may help improve performance if map entries are sparsely populated. The implementation has similar behavior for BPF_MAP_GET_NEXT_KEY implementation in hashtab. If user provides a NULL key pointer or an invalid key, the first key is returned. Otherwise, the first valid key after the input parameter "key" is returned, or -ENOENT if no valid key can be found. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-01-06 23:52:22 +01:00
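The lookup semantics described in the commit above can be sketched over a plain array of slots. This is a user-space model, not the kernel implementation: `stack_map_next_key()` is a hypothetical name, the map is reduced to an array of occupancy flags indexed by key, and an "invalid key" is assumed to mean one that is out of range or refers to an empty slot.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* used[i] tells whether slot i holds an entry; keys are slot indices.
 * Returns 0 and stores the next key, or -ENOENT if there is none. */
int stack_map_next_key(const bool *used, uint32_t n_slots,
                       const uint32_t *key, uint32_t *next_key)
{
    uint32_t id = 0;

    /* A NULL or invalid key restarts from the first slot; a valid key
     * continues with the slot right after it. */
    if (key && *key < n_slots && used[*key])
        id = *key + 1;

    /* Skip empty slots until an occupied one is found. */
    while (id < n_slots && !used[id])
        id++;

    if (id >= n_slots)
        return -ENOENT;

    *next_key = id;
    return 0;
}
```

A caller can walk every entry by starting with a NULL key and feeding each returned key back in until `-ENOENT` comes back, which is the enumeration pattern the commit message has in mind for dumping or deleting entries.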


724526 commits