redonkable/alistair23-linux

Author	SHA1	Message	Date
Alexey Dobriyan	f31cc7e815	flowcache: make flow_cache_hash_size() return "unsigned int" Hash size can't negative so "unsigned int" is logically correct. Propagate "unsigned int" to loop counters. Space savings: add/remove: 0/0 grow/shrink: 2/2 up/down: 6/-18 (-12) function old new delta flow_cache_flush_tasklet 362 365 +3 __flow_cache_shrink 333 336 +3 flow_cache_cpu_up_prep 178 171 -7 flow_cache_lookup 1159 1148 -11 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 19:04:48 -07:00
Alexey Dobriyan	5a17d9ed9a	flowcache: make flow_key_size() return "unsigned int" Flow keys aren't 4GB+ numbers so 64-bit arithmetic is excessive. Space savings (I'm not sure what CSWTCH is): add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-48 (-48) function old new delta flow_cache_lookup 1163 1159 -4 CSWTCH 75997 75953 -44 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 19:04:48 -07:00
Andrew Lunn	d39004ab13	net/faraday: Add missing include of of.h Breaking the include loop netdevice.h, dsa.h, devlink.h broke this driver, it depends on includes brought in by these headers. Adding linux/of.h fixes it. Fixes: ed0e39e97d34 ("net: break include loop netdevice.h, dsa.h, devlink.h") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 18:58:08 -07:00
Vincent Bernat	f1fb08f633	vxlan: fix ND proxy when skb doesn't have transport header offset When an incoming frame is tagged or when GRO is disabled, the skb handled to vxlan_xmit() doesn't contain a valid transport header offset. This makes ND proxying fail. We combine two changes: replace use of skb_transport_offset() and ensure the necessary amount of skb is linear just before using it: - In vxlan_xmit(), when determining if we have an ICMPv6 neighbor discovery packet, just check if it is an ICMPv6 packet and rely on neigh_reduce() to do more checks if this is the case. The use of pskb_may_pull() is replaced by skb_header_pointer() for just the IPv6 header. - In neigh_reduce(), add pskb_may_pull() for IPv6 header and neighbor discovery message since this was removed from vxlan_xmit(). Replace skb_transport_header() with ipv6_hdr() + 1. - In vxlan_na_create(), replace first skb_transport_offset() with ipv6_hdr() + 1 and second with skb_network_offset() + sizeof(struct ipv6hdr). Additionally, ensure we pskb_may_pull() the whole skb as we need it to iterate over the options. Signed-off-by: Vincent Bernat <vincent@bernat.im> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 18:50:42 -07:00
Xin Long	d229d48d18	sctp: add SCTP_PR_STREAM_STATUS sockopt for prsctp Before when implementing sctp prsctp, SCTP_PR_STREAM_STATUS wasn't added, as it needs to save abandoned_(un)sent for every stream. After sctp stream reconf is added in sctp, assoc has structure sctp_stream_out to save per stream info. This patch is to add SCTP_PR_STREAM_STATUS by putting the prsctp per stream statistics into sctp_stream_out. v1->v2: fix an indent issue. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 14:52:35 -07:00
David S. Miller	cdccf74b95	Merge branch 'hns-misc-fixes' Salil Mehta says: ==================== net: hns: Misc. HNS Bug Fixes & Code Improvements This patch set introduces various HNS bug fixes, optimizations and code improvements. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 14:48:45 -07:00
Salil	b4957ab082	net: hns: Some checkpatch.pl script & warning fixes This patch fixes some checkpatch.pl script caught errors and warnings during the compilation time. Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 14:48:43 -07:00
lipeng	820c90cb3e	net: hns: Avoid Hip06 chip TX packet line bug There is a bug on Hip06 that tx ring interrupts packets count will be clear when drivers send data to tx ring, so that the tx packets count will never upgrade to packets line, and cause the interrupts engendered was delayed. Sometimes, it will cause sending performance lower than expected. To fix this bug, we set tx ring interrupts packets line to 1 forever, to avoid count clear. And set the gap time to 20us, to solve the problem that too many interrupts engendered when packets line is 1. This patch could advance the send performance on ARM from 6.6G to 9.37G when an iperf send thread on ARM and an iperf send thread on X86 for XGE. Signed-off-by: lipeng <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 14:48:43 -07:00
Kejian Yan	76b588bc52	net: hns: Adjust the SBM module buffer threshold HNS needs SMB Buffers to store at least two packets after sending pause frame because of the link delay. The MTU of HNS is 9728. As the processor user manual described, the SBM buffer threshold should be modified. Reported-by: Ping Zhang <zhangping5@huawei.com> Signed-off-by: Kejian Yan <yankejian@huawei.com> Reviewed-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 14:48:43 -07:00
Kejian Yan	a2185587ad	net: hns: Simplify the exception sequence in hns_ppe_init() We need to free all ppe submodule if it fails to initialize ppe by any fault, so this patch will free all ppe resource before hns_ppe_init() returns exception situation Reported-by: JinchuanTian <tianjinchuan1@huawei.com> Signed-off-by: Kejian Yan <yankejian@huawei.com> Reviewed-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 14:48:43 -07:00
Kejian Yan	d592a4a4b9	net: hns: Optimise the code in hns_mdio_wait_ready() This patch fixes the code to clear pclint warning/info. Reported-by: Ping Zhang <zhangping5@huawei.com> Signed-off-by: Kejian Yan <yankejian@huawei.com> Reviewed-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 14:48:43 -07:00
Kejian Yan	6961acfa5c	net: hns: Clean redundant code from hns_mdio.c file This patch cleans the redundant code from hns_mdio.c. Reported-by: Ping Zhang <zhangping5@huawei.com> Signed-off-by: Kejian Yan <yankejian@huawei.com> Reviewed-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 14:48:43 -07:00
Kejian Yan	9f1607b8b5	net: hns: Remove redundant mac table operations This patch removes redundant functions used only for debugging purposes. Reported-by: Weiwei Deng <dengweiwei@huawei.com> Signed-off-by: Kejian Yan <yankejian@huawei.com> Reviewed-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 14:48:43 -07:00
Kejian Yan	20f0d4f736	net: hns: Remove redundant mac_get_id() There is a mac_id in mac control block structure, so the callback function mac_get_id() is useless. Here we remove this function. Reported-by: Weiwei Deng <dengweiwei@huawei.com> Signed-off-by: Kejian Yan <yankejian@huawei.com> Reviewed-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 14:48:43 -07:00
Kejian Yan	040a3800aa	net: hns: Remove the redundant adding and deleting mac function The functions (hns_dsaf_set_mac_mc_entry() and hns_mac_del_mac()) are not called by any functions. They are dead code in hns. And the same features are implemented by the patch (the id is `66355f5`). Reported-by: Weiwei Deng <dengweiwei@huawei.com> Signed-off-by: Kejian Yan <yankejian@huawei.com> Reviewed-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 14:48:43 -07:00
lipeng	64ec10dc2a	net: hns: Correct HNS RSS key set function This patch fixes below ethtool configuration error: localhost:~ # ethtool -X eth0 hkey XX:XX:XX... Cannot set Rx flow hash configuration: Operation not supported Signed-off-by: lipeng <lipeng321@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 14:48:43 -07:00
lipeng	f2aaed557e	net: hns: Replace netif_tx_lock to ring spin lock netif_tx_lock is a global spin lock, it will take affect in all rings in the netdevice. In tx_poll_one process, it can only lock the current ring, in this case, we define a spin lock in hnae_ring struct for it. Signed-off-by: lipeng <lipeng321@huawei.com> reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 14:48:43 -07:00
lipeng	b29bd41259	net: hns: Fix to adjust buf_size of ring according to mtu Because buf_size of ring set to 2048, the process of rx_poll_one can reuse the page, therefore the performance of XGE can improve. But the chip only supports three bds in one package, so the max mtu is 6K when it sets to 2048. For better performane in litter mtu, we need change buf_size according to mtu. When user change mtu, hns is only change the desc in memory. There are some desc has been fetched by the chip, these desc can not be changed by the code. So it needs set the port loopback and send some packages to let the chip consumes the wrong desc and fetch new desc. Because the Pv660 do not support rss indirection, we need add version check in mtu change process. Signed-off-by: lipeng <lipeng321@huawei.com> reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 14:48:43 -07:00
lipeng	36eedfde1a	net: hns: Optimize hns_nic_common_poll for better performance After polling less than buget packages, we need check again. If there are still some packages, we call napi_schedule add softirq queue, this is not better way. So we return buget value instead of napi_schedule. Signed-off-by: lipeng <lipeng321@huawei.com> reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 14:48:43 -07:00
Daode Huang	4b7cdecaa4	net: hns: bug fix of ethtool show the speed When run ethtool ethX on hns driver, the speed will show as "Unknown". The base.speed is not correct assigned, this patch fix this bug. Signed-off-by: Daode Huang <huangdaode@hisilicon.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 14:48:43 -07:00
lipeng	fb0672d116	net: hns: Remove redundant memset during buffer release Because all members of desc_cb is assigned when xmit one package, so it can delete in hnae_free_buffer, as follows: - "dma, priv, length, type" are assigned in fill_v2_desc. - "page_offset, reuse_flag, buf" are not used in tx direction. Signed-off-by: lipeng <lipeng321@huawei.com> Signed-off-by: Weiwei Deng <dengweiwei@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 14:48:43 -07:00
lipeng	de99208cc7	net: hns: Optimize the code for GMAC pad and crc Config This patch optimises the init configuration code leg for gmac pad and crc set interface. Signed-off-by: lipeng <lipeng321@huawei.com> Signed-off-by: JinchuanTian <tianjinchuan1@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 14:48:43 -07:00
lipeng	87ff7e1f46	net: hns: Modify GMAC init TX threshold value This patch reduces GMAC TX threshold value to avoid gmac hang-up with speed 100M/duplex half. Signed-off-by: lipeng <lipeng321@huawei.com> Signed-off-by: JinchuanTian <tianjinchuan1@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 14:48:43 -07:00
lipeng	ba2d079131	net: hns: Fix the implementation of irq affinity function This patch fixes the implementation of the IRQ affinity function. This function is used to create the cpu mask which eventually is used to initialize the cpu<->queue association for XPS(Transmit Packet Steering). Signed-off-by: lipeng <lipeng321@huawei.com> Signed-off-by: Kejian Yan <yankejian@huawei.com> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-03 14:48:43 -07:00
David S. Miller	d4f4b91582	Merge branch 'rds-minor-bug-fixes' Sowmini Varadhan says: ==================== rds: tcp: couple of minor bug fixes A couple of minor bugfixes that showed up during testing ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-02 19:41:01 -07:00
Sowmini Varadhan	087d975353	rds: tcp: canonical connection order for all paths with index > 0 The rds_connect_worker() has a bug in the check that enforces the canonical connection order described in the comments of rds_tcp_state_change(). The intention is to make sure that all the multipath connections are always initiated by the smaller IP address via rds_start_mprds. To achieve this, rds_connection_worker should check that cp_index > 0. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-02 19:41:00 -07:00
Sowmini Varadhan	e97656d03c	rds: tcp: allow progress of rds_conn_shutdown if the rds_connection is marked ERROR by an intervening FIN rds_conn_shutdown() runs in workq context, and marks the rds_connection as DISCONNECTING before quiescing Tx/Rx paths. However, after all I/O has quiesced, we may still find the rds_connection state to be RDS_CONN_ERROR if an intervening FIN was processed in softirq context. This is not a fatal error: rds_conn_shutdown() should continue the shutdown, and there is no need to log noisy messages about this event. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-02 19:41:00 -07:00
Eric Dumazet	d3fbff306c	sock: correctly test SOCK_TIMESTAMP in sock_recv_ts_and_drops() It seems the code does not match the intent. This broke packetdrill, and probably other programs. Fixes: `6c7c98bad4` ("sock: avoid dirtying sk_stamp, if possible") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-02 19:34:55 -07:00
Andrew Morton	e270e96686	drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c: fix build with gcc-4.4.4 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c: In function 'mlx5e_set_rxfh': drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1067: error: unknown field 'rss' specified in initializer drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1067: warning: missing braces around initializer drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1067: warning: (near initialization for 'rrp.<anonymous>') drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1068: error: unknown field 'rss' specified in initializer drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1069: warning: excess elements in struct initializer drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1069: warning: (near initialization for 'rrp') gcc-4.4.4 has issues with anonymous union initializers. Work around this. Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-02 19:32:57 -07:00
Andrew Morton	956327913c	drivers/net/ethernet/mellanox/mlx5/core/en_main.c: fix build with gcc-4.4.4 drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 'mlx5e_redirect_rqts': drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2210: error: unknown field 'rqn' specified in initializer drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2211: warning: missing braces around initializer drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2211: warning: (near initialization for 'direct_rrp.<anonymous>') drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 'mlx5e_redirect_rqts_to_channels': drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2227: error: unknown field 'rss' specified in initializer drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2227: warning: missing braces around initializer drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2227: warning: (near initialization for 'rrp.<anonymous>') drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2227: warning: initialization makes integer from pointer without a cast drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2228: error: unknown field 'rss' specified in initializer drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2229: warning: excess elements in struct initializer drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2229: warning: (near initialization for 'rrp') drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 'mlx5e_redirect_rqts_to_drop': drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2238: error: unknown field 'rqn' specified in initializer drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2239: warning: missing braces around initializer drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2239: warning: (near initialization for 'drop_rrp.<anonymous>') gcc-4.4.4 has issues with anonymous union initializers. Work around this. Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-02 19:32:57 -07:00
Joao Pinto	44781fef13	net: stmmac: fix cbs configuration Sending again, because forgot to include net-dev. The QoS IP does not accept AVB capabilities to default/queue 0, this way we guarantee 75% bandwidth for AVB. This patch assures that only queues >= 1 gets CBS confgured. Additional info was also added to stmmac.txt. Reported-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-02 19:30:06 -07:00
David S. Miller	a6fc09dff2	Merge branch 'mpls-more-labels' David Ahern says: ==================== net: mpls: Allow users to configure more labels per route Increase the maximum number of new labels for MPLS routes from 2 to 30. To keep memory consumption in check, the labels array is moved to the end of mpls_nh and mpls_iptunnel_encap structs as a 0-sized array. Allocations use the maximum number of labels across all nexthops in a route for LSR and the number of labels configured for LWT. The mpls_route layout is changed to: +----------------------+ \| mpls_route \| +----------------------+ \| mpls_nh 0 \| +----------------------+ \| alignment padding \| 4 bytes for odd number of labels; 0 for even +----------------------+ \| via[rt_max_alen] 0 \| +----------------------+ \| alignment padding \| via's aligned on sizeof(unsigned long) +----------------------+ \| ... \| Meaning the via follows its mpls_nh providing better locality as the number of labels increases. UDP_RR tests with namespaces shows no impact to a modest performance increase with this layout for 1 or 2 labels and 1 or 2 nexthops. mpls_route allocation size is limited to 4096 bytes allowing on the order of 30 nexthops with 30 labels (or more nexthops with fewer labels). LWT encap shares same maximum number of labels as mpls routing. v3 - initialize n_labels to 0 in case RTA_NEWDST is not defined; detected by the kbuild test robot v2 - updates per Eric's comments + added patch to ensure all reads of rt_nhn_alive and nh_flags in the packet path use READ_ONCE and all writes via event handlers use WRITE_ONCE + limit mpls_route size to 4096 (PAGE_SIZE for most arch) + mostly killed use of MAX_NEW_LABELS; it exists only for common limit between lwt and routing paths ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:21:45 -07:00
David Ahern	1511009cd6	net: mpls: Increase max number of labels for lwt encap Alow users to push down more labels per MPLS encap. Similar to LSR case, move label array to the end of mpls_iptunnel_encap and allocate based on the number of labels for the route. For consistency with the LSR case, re-use the same maximum number of labels. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:21:44 -07:00
David Ahern	a4ac8c986d	net: mpls: bump maximum number of labels Allow users to push down more labels per MPLS route. With the previous patches, no memory allocations are based on MAX_NEW_LABELS; the limit is only used to keep userspace in check. At this point MAX_NEW_LABELS is only used for mpls_route_config (copying route data from userspace) and processing nexthops looking for the max number of labels across the route spec. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:21:44 -07:00
David Ahern	df1c631648	net: mpls: Limit memory allocation for mpls_route Limit memory allocation size for mpls_route to 4096. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:21:44 -07:00
David Ahern	59b209667a	net: mpls: change mpls_route layout Move labels to the end of mpls_nh as a 0-sized array and within mpls_route move the via for a nexthop after the mpls_nh. The new layout becomes: +----------------------+ \| mpls_route \| +----------------------+ \| mpls_nh 0 \| +----------------------+ \| alignment padding \| 4 bytes for odd number of labels; 0 for even +----------------------+ \| via[rt_max_alen] 0 \| +----------------------+ \| alignment padding \| via's aligned on sizeof(unsigned long) +----------------------+ \| ... \| +----------------------+ \| mpls_nh n-1 \| +----------------------+ \| via[rt_max_alen] n-1 \| +----------------------+ Memory allocated for nexthop + via is constant across all nexthops and their via. It is based on the maximum number of labels across all nexthops and the maximum via length. The size is saved in the mpls_route as rt_nh_size. Accessing a nexthop becomes rt->rt_nh + index * rt->rt_nh_size. The offset of the via address from a nexthop is saved as rt_via_offset so that given an mpls_nh pointer the via for that hop is simply nh + rt->rt_via_offset. With prior code, memory allocated per mpls_route with 1 nexthop: via is an ethernet address - 64 bytes via is an ipv4 address - 64 via is an ipv6 address - 72 With this patch set, memory allocated per mpls_route with 1 nexthop and 1 or 2 labels: via is an ethernet address - 56 bytes via is an ipv4 address - 56 via is an ipv6 address - 64 The 8-byte reduction is due to the previous patch; the change introduced by this patch has no impact on the size of allocations for 1 or 2 labels. Performance impact of this change was examined using network namespaces with veth pairs connecting namespaces. ns0 inserts the packet to the label-switched path using an lwt route with encap mpls. ns1 adds 1 or 2 labels depending on test, ns2 (and ns3 for 2-label test) pops the label and forwards. ns3 (or ns4) for a 2-label is the destination. Similar series of namespaces used for 2-nexthop test. Intent is to measure changes to latency (overhead in manipulating the packet) in the forwarding path. Tests used netperf with UDP_RR. IPv4: current patches 1 label, 1 nexthop 29908 30115 2 label, 1 nexthop 29071 29612 1 label, 2 nexthop 29582 29776 2 label, 2 nexthop 29086 29149 IPv6: current patches 1 label, 1 nexthop 24502 24960 2 label, 1 nexthop 24041 24407 1 label, 2 nexthop 23795 23899 2 label, 2 nexthop 23074 22959 In short, the change has no effect to a modest increase in performance. This is expected since this patch does not really have an impact on routes with 1 or 2 labels (the current limit) and 1 or 2 nexthops. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:21:44 -07:00
David Ahern	77ef013aad	net: mpls: Convert number of nexthops to u8 Number of nexthops and number of alive nexthops are tracked using an unsigned int. A route should never have more than 255 nexthops so convert both to u8. Update all references and intermediate variables to consistently use u8 as well. Shrinks the size of mpls_route from 32 bytes to 24 bytes with a 2-byte hole before the nexthops. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:21:44 -07:00
David Ahern	39eb8cd175	net: mpls: rt_nhn_alive and nh_flags should be accessed using READ_ONCE The number of alive nexthops for a route (rt->rt_nhn_alive) and the flags for a next hop (nh->nh_flags) are modified by netdev event handlers. The event handlers run with rtnl_lock held so updates are always done with the lock held. The packet path accesses the fields under the rcu lock. Since those fields can change at any moment in the packet path, both fields should be accessed using READ_ONCE. Updates to both fields should use WRITE_ONCE. Update mpls_select_multipath (packet path) and mpls_ifdown and mpls_ifup (event handlers) accordingly. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:21:44 -07:00
Paolo Abeni	3d8417d79e	udp: use sk_protocol instead of pcflag to detect udplite sockets In the udp_sock struct, the 'forward_deficit' and 'pcflag' fields share the same cacheline. While the first is dirtied by udp_recvmsg, the latter is read, possibly several times, by the bottom half processing to discriminate between udp and udplite sockets. With this patch, sk->sk_protocol is used to check is the socket is really an udplite one, avoiding some cache misses per packet and improving the performance under udp_flood with small packet up to 10%. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:11:36 -07:00
Tobias Regnery	768bfa2a06	net: dsa: fix build error with devlink build as module After commit `96567d5dac` ("net: dsa: dsa2: Add basic support of devlink") I see the following link error with CONFIG_NET_DSA=y and CONFIG_NET_DEVLINK=m: net/built-in.o: In function 'dsa_register_switch': (.text+0xe226b): undefined reference to `devlink_alloc' net/built-in.o: In function 'dsa_register_switch': (.text+0xe2284): undefined reference to `devlink_register' net/built-in.o: In function 'dsa_register_switch': (.text+0xe243e): undefined reference to `devlink_port_register' net/built-in.o: In function 'dsa_register_switch': (.text+0xe24e1): undefined reference to `devlink_port_register' net/built-in.o: In function 'dsa_register_switch': (.text+0xe24fa): undefined reference to `devlink_port_type_eth_set' net/built-in.o: In function 'dsa_dst_unapply.part.8': dsa2.c:(.text.unlikely+0x345): undefined reference to 'devlink_port_unregister' dsa2.c:(.text.unlikely+0x36c): undefined reference to 'devlink_port_unregister' dsa2.c:(.text.unlikely+0x38e): undefined reference to 'devlink_port_unregister' dsa2.c:(.text.unlikely+0x3f2): undefined reference to 'devlink_unregister' dsa2.c:(.text.unlikely+0x3fb): undefined reference to 'devlink_free' Fix this by adding a dependency on MAY_USE_DEVLINK so that CONFIG_NET_DSA get switched to be build as module when CONFIG_NET_DEVLINK=m. Fixes: `96567d5dac` ("net: dsa: dsa2: Add basic support of devlink") Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:10:20 -07:00
David S. Miller	88f913f5ee	Merge branch 'phylib-EEE-updates' Russell King says: ==================== phylib EEE updates This series of patches depends on the previous set of changes, and is therefore net-next material. While testing the EEE code, I discovered a number of issues: 1. It is possible to enable advertisment of EEE modes which are not supported by the hardware. We omit to check the supported modes and mask off those modes that are not supported before writing the EEE advertisment register. 2. We need to restart autonegotiation after a change of the EEE advertisment, otherwise the link partner does not see the updated EEE modes. 3. SGMII connected PHYs are also capable of supporting EEE. Through discussion with Florian, it has been decided to remove the check for the PHY interface mode in patch (3). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:04:04 -07:00
Russell King	32d751412b	net: phy: allow EEE with any interface mode EEE is able to work in any PHY interface mode, there is nothing which fundamentally restricts it to only a few modes. For example, EEE works in SGMII mode with the Marvell 88E1512. Rather than just adding SGMII mode to the list, Florian suggests removing the list of interface modes entirely: It actually sounds like we should just kill the check entirely, it does not appear that any of the interface mode would not fundamentally be able to support EEE, because the "lowest" mode we support is MII, and even there it's quite possible to support EEE. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:04:03 -07:00
Russell King	f75abeb833	net: phy: restart phy autonegotiation after EEE advertisment change When the EEE advertisment is changed, we should restart autonegotiation to update the link partner with the new EEE settings. Add this trigger but only if the advertisment has changed. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:04:03 -07:00
Russell King	83ea067fe2	net: phy: avoid setting unsupported EEE advertisments We currently allow userspace to set any EEE advertisments it desires, whether or not the PHY supports them. For example: # ethtool --set-eee eth1 advertise 0xffffffff # ethtool --show-eee eth1 EEE Settings for eth1: EEE status: disabled Tx LPI: disabled Supported EEE link modes: 100baseT/Full 1000baseT/Full 10000baseT/Full Advertised EEE link modes: 100baseT/Full 1000baseT/Full 1000baseKX/Full 10000baseT/Full 10000baseKX4/Full 10000baseKR/Full Clearly, this is not sane, we should only allow link modes that are supported to be advertised (as we do elsewhere.) Ensure that we mask the MDIO_AN_EEE_ADV value with the capabilities retrieved from the MDIO_PCS_EEE_ABLE register. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 20:04:03 -07:00
David S. Miller	eefe06e8ce	Merge branch 'bpf-prog-testing-framework' Alexei Starovoitov says: ==================== bpf: program testing framework Development and testing of networking bpf programs is quite cumbersome. Especially tricky are XDP programs that attach to real netdevices and program development feels like working on the car engine while the car is in motion. Another problem is ongoing changes to upstream llvm core that can introduce an optimization that verifier will not recognize. llvm bpf backend tests have no ability to run the programs. To improve this situation introduce BPF_PROG_TEST_RUN command to test and performance benchmark bpf programs. It achieves several goals: - development of xdp and skb based bpf programs can be done in a canned environment with unit tests - program performance optimizations can be benchmarked outside of networking core (without driver and skb costs) - continuous testing of upstream changes is finally practical Patches 4,5,6 add C based test cases of various complexity to cover some sched_cls and xdp features. More tests will be added in the future. The tests were run on centos7 only. For now the framework supports only skb and xdp programs. In the future it can be extended to socket_filter and tracing program types. More details are in individual patches. v1->v2: - rename bpf_program_test_run->bpf_prog_test_run - add missing #include <linux/bpf.h> since libbpf.h shouldn't depend on prior includes - reordered patches 3 and 4 to keep bisect clean ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:45:58 -07:00
Alexei Starovoitov	3782161362	selftests/bpf: add l4 load balancer test based on sched_cls this l4lb demo is a comprehensive test case for LLVM codegen and kernel verifier. It's using fully inlined jhash(), complex packet parsing and multiple map lookups of different types to stress llvm and verifier. The map sizes, map population and test vectors are artificial to exercise different paths through the bpf program. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:45:57 -07:00
Alexei Starovoitov	8d48f5e427	selftests/bpf: add a test for basic XDP functionality add C test for xdp_adjust_head(), packet rewrite and map lookups Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:45:57 -07:00
Alexei Starovoitov	6882804c91	selftests/bpf: add a test for overlapping packet range checks add simple C test case for llvm and verifier range check fix from commit `b1977682a3` ("bpf: improve verifier packet range checks") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:45:57 -07:00
Alexei Starovoitov	dd26b7f54a	tools/lib/bpf: expose bpf_program__set_type() expose bpf_program__set_type() to set program type Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:45:57 -07:00
Alexei Starovoitov	3084887378	tools/lib/bpf: add support for BPF_PROG_TEST_RUN command add support for BPF_PROG_TEST_RUN command to libbpf.a Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Wang Nan <wangnan0@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:45:57 -07:00

1 2 3 4 5 ...

663245 commits