redonkable/alistair23-linux

Author	SHA1	Message	Date
David Ahern	a65120bae4	ipv6: Use result arg in fib_lookup_arg consistently arg.result is sometimes used as fib6_result and sometimes used to hold the rt6_info. Add rt6_info to fib6_result and make the use of arg.result consistent through ipv6 rules. The rt6 entry is filled in for lookups returning a dst_entry, but not for direct fib_lookups that just want a fib6_info. Fixes: `effda4dd97` ("ipv6: Pass fib6_result to fib lookups") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 21:53:11 -07:00
David Ahern	b2f97f7de2	ipv6: fib6_rule_action_alt needs to return -EAGAIN fib rule actions should return -EAGAIN for the rules to continue to the next one. A recent change overwrote err with the lookup always returning 0 (future change will make it more like IPv4) which means the rules stopped at the first (e.g., local table lookup only). Catch and reset err to -EAGAIN. Fixes: `effda4dd97` ("ipv6: Pass fib6_result to fib lookups") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 21:52:33 -07:00
David Ahern	7973d9e767	mlxsw: spectrum_router: Prevent ipv6 gateway with v4 route via replace and append mlxsw currently does not support v6 gateways with v4 routes. Commit `19a9d136f1` ("ipv4: Flag fib_info with a fib_nh using IPv6 gateway") prevents a route from being added, but nothing stops the replace or append. Add a catch for them too. $ ip ro add 172.16.2.0/24 via 10.99.1.2 $ ip ro replace 172.16.2.0/24 via inet6 fe80::202:ff:fe00:b dev swp1s0 Error: mlxsw_spectrum: IPv6 gateway with IPv4 route is not supported. $ ip ro append 172.16.2.0/24 via inet6 fe80::202:ff:fe00:b dev swp1s0 Error: mlxsw_spectrum: IPv6 gateway with IPv4 route is not supported. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 19:54:26 -07:00
David S. Miller	08308f149b	Merge branch 'Taprio-qdisc-fixes' Andre Guedes says: ==================== Taprio qdisc fixes I'm re-sending this series, now with the "net-next" prefix in the subject. The only change from the previous version is in patch 3. As suggested by Cong Wang, it removes the !entry check within should_restart_cycle() since it is already checked by the caller. As a side effect, that function becomes a dummy wrapper on list_is_last() so we simply remove it and call list_is_last() instead. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 19:52:33 -07:00
Andre Guedes	6e734c82be	net: sched: taprio: Fix taprio_dequeue() In case we don't have 'guard' or 'budget' to transmit the skb, we should continue traversing the qdisc list since the remaining guard/budget might be enough to transmit a skb from other children qdiscs. Fixes: `5a781ccbd1` (“tc: Add support for configuring the taprio scheduler”) Signed-off-by: Andre Guedes <andre.guedes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 19:52:32 -07:00
Andre Guedes	2684d1b75f	net: sched: taprio: Fix taprio_peek() While traversing taprio's children qdisc list, if the gate is closed for a given traffic class, we should continue traversing the list since the remaining qdiscs may have skb ready for transmission. This patch also takes this opportunity and changes the function to use the TAPRIO_ALL_GATES_OPEN macro instead of the magic number '-1'. Fixes: `5a781ccbd1` (“tc: Add support for configuring the taprio scheduler”) Signed-off-by: Andre Guedes <andre.guedes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 19:52:32 -07:00
Andre Guedes	5175aafe71	net: sched: taprio: Remove should_restart_cycle() The 'entry' argument from should_restart_cycle() cannot be NULL since it is already checked by the caller so the WARN_ON() within should_ restart_cycle() could be removed. By doing that, that function becomes a dummy wrapper on list_is_last() so this patch simply gets rid of it and call list_is_last() within advance_sched() instead. Signed-off-by: Andre Guedes <andre.guedes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 19:52:32 -07:00
Andre Guedes	8599099f0c	net: sched: taprio: Refactor taprio_get_start_time() This patch does a code refactoring to taprio_get_start_time() function to improve readability and report error properly. If 'base' time is later than 'now', the start time is equal to 'base' and taprio_get_start_time() is done. That's the natural case so we move that code to the beginning of the function. Also, if 'cycle' calculation is zero, something went really wrong with taprio and we should log that internal error properly. Signed-off-by: Andre Guedes <andre.guedes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 19:52:32 -07:00
Andre Guedes	59ab87f6eb	net: sched: taprio: Remove pointless variable assigment This patch removes a pointless variable assigment in taprio_change(). The 'err' variable is not used from this assignment to the next one so this patch removes it. Signed-off-by: Andre Guedes <andre.guedes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 19:52:32 -07:00
David Ahern	ecc5663cce	net: Change nhc_flags to unsigned char nhc_flags holds the RTNH_F flags for a given nexthop (fib{6}_nh). All of the RTNH_F_ flags fit in an unsigned char, and since the API to userspace (rtnh_flags and lower byte of rtm_flags) is 1 byte it can not grow. Make nhc_flags in fib_nh_common an unsigned char and shrink the size of the struct by 8, from 56 to 48 bytes. Update the flags arguments for up netdevice events and fib_nexthop_info which determines the RTNH_F flags to return on a dump/event. The RTNH_F flags are passed in the lower byte of rtm_flags which is an unsigned int so use a temp variable for the flags to fib_nexthop_info and combine with rtm_flags in the caller. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 19:44:18 -07:00
David Ahern	ffa8ce54be	lwtunnel: Pass encap and encap type attributes to lwtunnel_fill_encap Currently, lwtunnel_fill_encap hardcodes the encap and encap type attributes as RTA_ENCAP and RTA_ENCAP_TYPE, respectively. The nexthop objects want to re-use this code but the encap attributes passed to userspace as NHA_ENCAP and NHA_ENCAP_TYPE. Since that is the only difference, change lwtunnel_fill_encap to take the attribute type as an input. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 19:42:29 -07:00
Simon Horman	0a5d329ffd	ravb: Avoid unsupported internal delay mode for R-Car E3/D3 According to the R-Car Gen3 Hardware Manual Rev 1.50 of Nov 30, 2018, the TX clock internal delay mode isn't supported on R-Car E3 (r8a77990) or D3 (r8a77995). And by extension it is also not supported by RZ/G2E (r9a774c0). This matches all ES versions of the affected SoCs as it is not clear if this problem will be resolved in newer chips. This can be revisited, as necessary. This patch does not error-out if PHY_INTERFACE_MODE_RGMII_ID or PHY_INTERFACE_MODE_RGMII_TXID are used on SoCs where TX clock delay mode is not supported as there is a risk of introducing a regression when used in conjunction with older DT blobs present in the field. Rather, a warning is logged in such cases. Based on work by Kazuya Mizuguchi. Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 18:43:49 -07:00
Stephen Rothwell	c98f4822ed	net: fix sparc64 compilation of sock_gettstamp net/core/sock.c: In function 'sock_gettstamp': net/core/sock.c:3007:23: error: expected '}' before ';' token .tv_sec = ts.tv_sec; ^ net/core/sock.c:3011:4: error: expected ')' before 'return' return -EFAULT; ^~~~~~ net/core/sock.c:3013:2: error: expected expression before '}' token } ^ Fixes: `c7cbdbf29f` ("net: rework SIOCGSTAMP ioctl handling") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 18:35:17 -07:00
Fuqian Huang	0fa4122b2d	isdn:mISDN: fix misuse of %x in hfcpci.c Pointers should be printed with %p or %px rather than cast to (u_long) and printed with %lx. Change %lx to %p to print the pointer. Change %lx to %pad to print the dma_addr_t. Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 18:33:30 -07:00
Fuqian Huang	6f9fd97e3a	isdn: hisax: Fix misuse of %x in config.c Pointers should be printed with %p or %px rather than cast to (u_long) type and printed with %lX. As the function seems to be for debug purpose. Change %lX to %px to print the pointer value. Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 18:33:30 -07:00
David S. Miller	6b18bdfdba	Merge branch 'ipv6-fib6_ref-conversion-to-refcount_t' Eric Dumazet says: ==================== ipv6: fib6_ref conversion to refcount_t We are chasing use-after-free in IPv6 that could have their origin in fib6_ref 0 -> 1 transitions. This patch series should help finding the root causes if these illegal transitions ever happen. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 17:19:48 -07:00
Eric Dumazet	f05713e091	ipv6: convert fib6_ref to refcount_t We suspect some issues involving fib6_ref 0 -> 1 transitions might cause strange syzbot reports. Lets convert fib6_ref to refcount_t to catch them earlier. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Acked-by: Wei Wang <weiwan@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 17:19:48 -07:00
Eric Dumazet	5ea715289a	ipv6: broadly use fib6_info_hold() helper Instead of using atomic_inc(), prefer fib6_info_hold() so that upcoming refcount_t conversion is simpler. Only fib6_info_alloc() is using atomic_set() since we just allocated a new object. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Acked-by: Wei Wang <weiwan@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 17:19:48 -07:00
Eric Dumazet	b027055022	ipv6: fib6_info_destroy_rcu() cleanup We do not need to clear f6i->rt6i_exception_bucket right before freeing f6i. Note that f6i->rt6i_exception_bucket is properly protected by f6i->exception_bucket_flushed being set to one in rt6_flush_exceptions() under the protection of rt6_exception_lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Acked-by: Wei Wang <weiwan@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 17:19:48 -07:00
David S. Miller	20eb08b2b0	mlx5-updates-2019-04-22 This series includes updates to mlx5e driver RX data path and some significant XDP RX/TX improvements to overcome/mitigate HW and PCIE bottlenecks. From Tariq: 1) Some Enhancements in rq->flags 2) Stabilize RX packet rate (on Striding RQ) with multiple outstanding UMR posts In this patch, we add support for multiple outstanding UMR posts, to allow faster gap closure between consuming MPWQEs and reposting them back into the WQ. Performance test: As expected, huge improvement in large-scale (48 cores). xdp_redirect_map, 64B UDP multi-stream. Redirect from ConnectX-5 100Gbps to ConnectX-6 100Gbps. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz. Before: Unstable, 7 to 30 Mpps After: Stable, at 70.5 Mpps From Shay: 3) XDP, Inline small packets into the TX MPWQE in XDP xmit flow Upon high packet rate with multiple CPUs TX workloads, much of the HCA's resources are spent on prefetching TX descriptors, thus affecting transmission rates. This patch comes to mitigate this problem by moving some workload to the CPU and reducing the HW data prefetch overhead for small packets (<= 256B). When forwarding packets with XDP, a packet that is smaller than a certain size (set to ~256 bytes) would be sent inline within its WQE TX descrptor (mem-copied), when the hardware tx queue is congested beyond a pre-defined water-mark. Performance: Tested packet rate for UDP 64Byte multi-stream over two dual port ConnectX-5 100Gbps NICs. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz * Tested with hyper-threading disabled XDP_TX: \| \| before \| after \| \| \| 24 rings \| 51Mpps \| 116Mpps \| +126% \| \| 1 ring \| 12Mpps \| 12Mpps \| same \| XDP_REDIRECT: ** Below is the transmit rate, not the redirection rate which might be larger, and is not affected by this patch. \| \| before \| after \| \| \| 32 rings \| 64Mpps \| 92Mpps \| +43% \| \| 1 ring \| 6.4Mpps \| 6.4Mpps \| same \| As we can see, feature significantly improves scaling, without hurting single ring performance. From Maxim: 4) Some trivial refactoring and code improvements prior to a larger series to support AF_XDP. -Saeed. -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJcv2LjAAoJEEg/ir3gV/o+90gIAI8+4lwkXZAVk4mxf9PMjxuB bQiKd80e++26sgrNHCyuWZnIzTQqYAnUJ3WRC+Kk1pFTo1O23A+fvweT8m1dqAvP Z/5ktfbAeF3fwOVu7aGu9vh4zJEWJj8oO+I+G+OaOe2iV7FVTTFnWHxiiCfungAW oUnXozq4vERSQLechqqgz6nACxOPgEOCJrp4T9lDYSbqZizHgFttmInMQguq/7KS LvITcNu3EF5l4y2LxwCFiKRgGc2y/belU63AK+2pQUXhH46kQPEHdncdLg5d9QYA xJwthn697qxS0PIP5oHPHNVN+qJXfuUHVonXqVOAJebGQnV82of6+sPweRxwh1s= =MfAR -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2019-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2019-04-22 This series includes updates to mlx5e driver RX data path and some significant XDP RX/TX improvements to overcome/mitigate HW and PCIE bottlenecks. From Tariq: 1) Some Enhancements in rq->flags 2) Stabilize RX packet rate (on Striding RQ) with multiple outstanding UMR posts In this patch, we add support for multiple outstanding UMR posts, to allow faster gap closure between consuming MPWQEs and reposting them back into the WQ. Performance test: As expected, huge improvement in large-scale (48 cores). xdp_redirect_map, 64B UDP multi-stream. Redirect from ConnectX-5 100Gbps to ConnectX-6 100Gbps. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz. Before: Unstable, 7 to 30 Mpps After: Stable, at 70.5 Mpps From Shay: 3) XDP, Inline small packets into the TX MPWQE in XDP xmit flow Upon high packet rate with multiple CPUs TX workloads, much of the HCA's resources are spent on prefetching TX descriptors, thus affecting transmission rates. This patch comes to mitigate this problem by moving some workload to the CPU and reducing the HW data prefetch overhead for small packets (<= 256B). When forwarding packets with XDP, a packet that is smaller than a certain size (set to ~256 bytes) would be sent inline within its WQE TX descrptor (mem-copied), when the hardware tx queue is congested beyond a pre-defined water-mark. Performance: Tested packet rate for UDP 64Byte multi-stream over two dual port ConnectX-5 100Gbps NICs. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz * Tested with hyper-threading disabled XDP_TX: \| \| before \| after \| \| \| 24 rings \| 51Mpps \| 116Mpps \| +126% \| \| 1 ring \| 12Mpps \| 12Mpps \| same \| XDP_REDIRECT: ** Below is the transmit rate, not the redirection rate which might be larger, and is not affected by this patch. \| \| before \| after \| \| \| 32 rings \| 64Mpps \| 92Mpps \| +43% \| \| 1 ring \| 6.4Mpps \| 6.4Mpps \| same \| As we can see, feature significantly improves scaling, without hurting single ring performance. From Maxim: 4) Some trivial refactoring and code improvements prior to a larger series to support AF_XDP. ==================== Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 17:03:40 -07:00
Maxim Mikityanskiy	f8ebecf2e3	net/mlx5e: Use #define for the WQE wait timeout constant Create a #define for the timeout of mlx5e_wait_for_min_rx_wqes to clarify the meaning of a magic number. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-23 12:09:22 -07:00
Maxim Mikityanskiy	03ceda6fe1	net/mlx5e: Remove unused rx_page_reuse stat Remove the no longer used page_reuse stat of RQs. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-23 12:09:22 -07:00
Maxim Mikityanskiy	63d26b490b	net/mlx5e: Take HW interrupt trigger into a function mlx5e_trigger_irq posts a NOP to the ICO SQ just to trigger an IRQ and enter the NAPI poll on the right CPU according to the affinity. Use it in mlx5e_activate_rq. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-23 12:09:22 -07:00
Maxim Mikityanskiy	10961c5606	net/mlx5e: Remove unused parameter mdev is unused in mlx5e_rx_is_linear_skb. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-23 12:09:22 -07:00
Maxim Mikityanskiy	b1b187e102	net/mlx5e: Add an underflow warning comment mlx5e_mpwqe_get_log_rq_size calculates the number of WQEs (N) based on the requested number of frames in the RQ (F) and the number of packets per WQE (P). It ensures that N is not less than the minimum number of WQEs in an RQ (N_min). Arithmetically, it means that F / P >= N_min should be true. This function deals with logarithms, so it should check that log(F) - log(P) >= log(N_min). However, if F < P, this expression will cause an unsigned underflow. Check log(F) >= log(P) + log(N_min) instead. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-23 12:09:21 -07:00
Maxim Mikityanskiy	9a22d5d839	net/mlx5e: Move parameter calculation functions to en/params.c This commit moves the parameter calculation functions to a separate file for better modularity and code sharing with future features. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-23 12:09:21 -07:00
Maxim Mikityanskiy	74bbaebf3c	net/mlx5e: Report mlx5e_xdp_set errors If the channels fail to reopen after setting an XDP program, return the error code instead of 0. A proper fix is still needed, as now any error while reopening the channels brings the interface down. This patch only adds error reporting. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-23 12:09:21 -07:00
Maxim Mikityanskiy	83b2fd64ba	net/mlx5e: Remove unused parameter params is unused in mlx5e_init_di_list. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-23 12:09:20 -07:00
Shay Agroskin	c2273219ba	net/mlx5e: XDP, Inline small packets into the TX MPWQE in XDP xmit flow Upon high packet rate with multiple CPUs TX workloads, much of the HCA's resources are spent on prefetching TX descriptors, thus affecting transmission rates. This patch comes to mitigate this problem by moving some workload to the CPU and reducing the HW data prefetch overhead for small packets (<= 256B). When forwarding packets with XDP, a packet that is smaller than a certain size (set to ~256 bytes) would be sent inline within its WQE TX descrptor (mem-copied), when the hardware tx queue is congested beyond a pre-defined water-mark. This is added to better utilize the HW resources (which now makes one less packet data prefetch) and allow better scalability, on the account of CPU usage (which now 'memcpy's the packet into the WQE). To load balance between HW and CPU and get max packet rate, we use watermarks to detect how much the HW is congested and move the work loads back and forth between HW and CPU. Performance: Tested packet rate for UDP 64Byte multi-stream over two dual port ConnectX-5 100Gbps NICs. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz * Tested with hyper-threading disabled XDP_TX: \| \| before \| after \| \| \| 24 rings \| 51Mpps \| 116Mpps \| +126% \| \| 1 ring \| 12Mpps \| 12Mpps \| same \| XDP_REDIRECT: ** Below is the transmit rate, not the redirection rate which might be larger, and is not affected by this patch. \| \| before \| after \| \| \| 32 rings \| 64Mpps \| 92Mpps \| +43% \| \| 1 ring \| 6.4Mpps \| 6.4Mpps \| same \| As we can see, feature significantly improves scaling, without hurting single ring performance. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-23 12:09:20 -07:00
Shay Agroskin	73cab880e7	net/mlx5e: XDP, Add TX MPWQE session counter This counter tracks how many TX MPWQE sessions are started in XDP SQ in XDP TX/REDIRECT flow. It counts per-channel and global stats. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-23 12:09:20 -07:00
Tariq Toukan	15143bf51c	net/mlx5e: XDP, Enhance RQ indication for XDP redirect flush The XDP redirect flush indication belongs to the receive queue, not to its XDP send queue. For this, use a new bit on rq->flags. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-23 12:09:19 -07:00
Tariq Toukan	f03590f74c	net/mlx5e: XDP, Fix shifted flag index in RQ bitmap Values in enum mlx5e_rq_flag are used as bit indixes. Intention was to use them with no BIT(i) wrapping. No functional bug fix here, as the same (shifted)flag bit is used for all set, test, and clear operations. Fixes: `121e892754` ("net/mlx5e: Refactor RQ XDP_TX indication") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-23 12:09:19 -07:00
Tariq Toukan	fd9b4be800	net/mlx5e: RX, Support multiple outstanding UMR posts The buffers mapping of the Multi-Packet WQEs (of Striding RQ) is done via UMR posts, one UMR WQE per an RX MPWQE. A single MPWQE is capable of serving many incoming packets, usually larger than the budget of a single napi cycle. Hence, posting a single UMR WQE per napi cycle (and handling its completion in the next cycle) works fine in many common cases, but not always. When an XDP program is loaded, every MPWQE is capable of serving less packets, to satisfy the packet-per-page requirement. Thus, for the same number of packets more MPWQEs (and UMR posts) are needed (twice as much for the default MTU), giving less latency room for the UMR completions. In this patch, we add support for multiple outstanding UMR posts, to allow faster gap closure between consuming MPWQEs and reposting them back into the WQ. For better SW and HW locality, we combine the UMR posts in bulks of (at least) two. This is expected to improve packet rate in high CPU scale. Performance test: As expected, huge improvement in large-scale (48 cores). xdp_redirect_map, 64B UDP multi-stream. Redirect from ConnectX-5 100Gbps to ConnectX-6 100Gbps. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz. Before: Unstable, 7 to 30 Mpps After: Stable, at 70.5 Mpps No degradation in other tested scenarios. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-23 12:09:19 -07:00
Saeed Mahameed	3839f99d21	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux	2019-04-23 11:57:33 -07:00
David S. Miller	539b593d39	Merge branch 'net-phy-mscc-Improvements-to-VSC8514-PHY-driver' Kavya Sree Kotagiri says: ==================== net: phy: mscc: Improvements to VSC8514 PHY driver. The VSC8514 PHY is a 4-ports PHY that is 10/100/1000BASE-T, 100BASE-FX, 1000BASE-X, can communicate with the MAC via QSGMII. The MAC interface protocol for each port within QSGMII can be either 1000BASE-X or SGMII, if the QSGMII MAC that the VSC8514 is connecting to supports this functionality. VSC8514 also supports SGMII MAC-side autonegotiation on each individual port, downshifting, can set the blinking pattern of each of its 4 LEDs, SyncE, 1000BASE-T Ring Resiliency as well as HP Auto-MDIX detection. This patch series adds support for 10BASE-T, 100BASE-TX, and 1000BASE-T, QSGMII link with the MAC, downshifting, HP Auto-MDIX detection and blinking pattern for its 4 LEDs. The GPIO register bank is a set of registers that are common to all PHYs in the package. So any modification in any register of this bank affects all PHYs of the package. If the PHYs haven't been reset before booting the Linux kernel and were configured to use interrupts for e.g. link status updates, it is required to clear the interrupts mask register of all PHYs before being able to use interrupts with any PHY. The first PHY of the package that will be init will take care of clearing all PHYs interrupts mask registers. Thus, we need to keep track of the init sequence in the package, if it's already been done or if it's to be done. Most of the init sequence of a PHY of the package is common to all PHYs in the package, thus we use the SMI broadcast feature which enables us to propagate a write in one register of one PHY to all PHYs in the same package. This patch series adds support for VSC8514 in Microsemi driver(mscc.c) and removes support from Vitesse driver(vitesse.c). v8 - mscc: Added appropriate code using phy_modify() in vsc8514_config_init(). v7 - mscc: Handled return values in vsc8514_config_init(). v6 - mscc: Added proper return value in vsc85xx_csr_ctrl_phy_read(). - mscc: Replaced __mdiobus_write and__mdiobus_read with __phy_write and __phy_read resp. - mscc: Replaced register addresses in 8514_config_init() with proper constants. v5 - mscc: Added return error statements for few function calls. - mscc: Added comments in vsc85xx_csr_ctrl_phy_read() and vsc85xx_csr_ctrl_phy_write() v4 - mscc: Removed features settings - mscc: Removed aneg_done settings. v3 - mscc: Used BIT(x) for PHY_MCB_S6G_WRITE and PHY_MCB_S6G_READ instead of hex. - mscc: Replaced magic numbers with proper constants. - mscc: Handled delays and timeouts at appropriate points. - mscc: Added comments/explanation where requested. v2 - mscc: Sorted variable declarations in reverse christmas tree order. v1 - Added 0/2 file. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 10:47:58 -07:00
Kavya Sree Kotagiri	edeb207b8a	net: phy: vitesse: Remove support for VSC8514. Add support for VSC8514 in Microsemi driver (mscc.c) with more features. Signed-off-by: Kavya Sree Kotagiri <kavyasree.kotagiri@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 10:47:58 -07:00
Kavya Sree Kotagiri	e4f9ba642f	net: phy: mscc: add support for VSC8514 PHY. The VSC8514 PHY is a 4-ports PHY that is 10/100/1000BASE-T, 100BASE-FX, 1000BASE-X, can communicate with the MAC via QSGMII. The MAC interface protocol for each port within QSGMII can be either 1000BASE-X or SGMII, if the QSGMII MAC that the VSC8514 is connecting to supports this functionality. VSC8514 also supports SGMII MAC-side autonegotiation on each individual port, downshifting, can set the blinking pattern of each of its 4 LEDs, SyncE, 1000BASE-T Ring Resiliency as well as HP Auto-MDIX detection. This adds support for 10BASE-T, 100BASE-TX, and 1000BASE-T, QSGMII link with the MAC, downshifting, HP Auto-MDIX detection and blinking pattern for its 4 LEDs. The GPIO register bank is a set of registers that are common to all PHYs in the package. So any modification in any register of this bank affects all PHYs of the package. If the PHYs haven't been reset before booting the Linux kernel and were configured to use interrupts for e.g. link status updates, it is required to clear the interrupts mask register of all PHYs before being able to use interrupts with any PHY. The first PHY of the package that will be init will take care of clearing all PHYs interrupts mask registers. Thus, we need to keep track of the init sequence in the package, if it's already been done or if it's to be done. Most of the init sequence of a PHY of the package is common to all PHYs in the package, thus we use the SMI broadcast feature which enables us to propagate a write in one register of one PHY to all PHYs in the same package. Signed-off-by: Kavya Sree Kotagiri <kavyasree.kotagiri@microchip.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Co-developed-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 10:47:58 -07:00
Jian Shen	a93f7fe134	net: phy: marvell: add new default led configure for m88e151x The default m88e151x LED configuration is 0x1177, used LED[0] for 1000M link, LED[1] for 100M link, and LED[2] for active. But for some boards, which use LED[0] for link, and LED[1] for active, prefer to be 0x1040. To be compatible with this case, this patch defines a new dev_flag, and set it before connect phy in HNS3 driver. When phy initializing, using the new LED configuration if this dev_flag is set. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-23 10:40:32 -07:00
Florian Fainelli	7e6e185c74	net: systemport: Remove need for DMA descriptor All we do is write the length/status and address bits to a DMA descriptor only to write its contents into on-chip registers right after, eliminate this unnecessary step. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-22 22:20:15 -07:00
Ido Schimmel	697cd36cda	bridge: Fix possible use-after-free when deleting bridge port When a bridge port is being deleted, do not dereference it later in br_vlan_port_event() as it can result in a use-after-free [1] if the RCU callback was executed before invoking the function. [1] [ 129.638551] ================================================================== [ 129.646904] BUG: KASAN: use-after-free in br_vlan_port_event+0x53c/0x5fd [ 129.654406] Read of size 8 at addr ffff8881e4aa1ae8 by task ip/483 [ 129.663008] CPU: 0 PID: 483 Comm: ip Not tainted 5.1.0-rc5-custom-02265-ga946bd73daac #1383 [ 129.672359] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016 [ 129.682484] Call Trace: [ 129.685242] dump_stack+0xa9/0x10e [ 129.689068] print_address_description.cold.2+0x9/0x25e [ 129.694930] kasan_report.cold.3+0x78/0x9d [ 129.704420] br_vlan_port_event+0x53c/0x5fd [ 129.728300] br_device_event+0x2c7/0x7a0 [ 129.741505] notifier_call_chain+0xb5/0x1c0 [ 129.746202] rollback_registered_many+0x895/0xe90 [ 129.793119] unregister_netdevice_many+0x48/0x210 [ 129.803384] rtnl_delete_link+0xe1/0x140 [ 129.815906] rtnl_dellink+0x2a3/0x820 [ 129.844166] rtnetlink_rcv_msg+0x397/0x910 [ 129.868517] netlink_rcv_skb+0x137/0x3a0 [ 129.882013] netlink_unicast+0x49b/0x660 [ 129.900019] netlink_sendmsg+0x755/0xc90 [ 129.915758] ___sys_sendmsg+0x761/0x8e0 [ 129.966315] __sys_sendmsg+0xf0/0x1c0 [ 129.988918] do_syscall_64+0xa4/0x470 [ 129.993032] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 129.998696] RIP: 0033:0x7ff578104b58 ... [ 130.073811] Allocated by task 479: [ 130.077633] __kasan_kmalloc.constprop.5+0xc1/0xd0 [ 130.083008] kmem_cache_alloc_trace+0x152/0x320 [ 130.088090] br_add_if+0x39c/0x1580 [ 130.092005] do_set_master+0x1aa/0x210 [ 130.096211] do_setlink+0x985/0x3100 [ 130.100224] __rtnl_newlink+0xc52/0x1380 [ 130.104625] rtnl_newlink+0x6b/0xa0 [ 130.108541] rtnetlink_rcv_msg+0x397/0x910 [ 130.113136] netlink_rcv_skb+0x137/0x3a0 [ 130.117538] netlink_unicast+0x49b/0x660 [ 130.121939] netlink_sendmsg+0x755/0xc90 [ 130.126340] ___sys_sendmsg+0x761/0x8e0 [ 130.130645] __sys_sendmsg+0xf0/0x1c0 [ 130.134753] do_syscall_64+0xa4/0x470 [ 130.138864] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 130.146195] Freed by task 0: [ 130.149421] __kasan_slab_free+0x125/0x170 [ 130.154016] kfree+0xf3/0x310 [ 130.157349] kobject_put+0x1a8/0x4c0 [ 130.161363] rcu_core+0x859/0x19b0 [ 130.165175] __do_softirq+0x250/0xa26 [ 130.170956] The buggy address belongs to the object at ffff8881e4aa1ae8 which belongs to the cache kmalloc-1k of size 1024 [ 130.184972] The buggy address is located 0 bytes inside of 1024-byte region [ffff8881e4aa1ae8, ffff8881e4aa1ee8) Fixes: `9c0ec2e718` ("bridge: support binding vlan dev link state to vlan member bridge ports") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Cc: Mike Manning <mmanning@vyatta.att-mail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Mike Manning <mmanning@vyatta.att-mail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-22 22:17:47 -07:00
Crag.Wang	a6cbcb7793	r8152: sync sa_family with the media type of network device Without this patch the socket address family sporadically gets wrong value ends up the dev_set_mac_address() fails to set the desired MAC address. Fixes: `25766271e4` ("r8152: Refresh MAC address during USBDEVFS_RESET") Signed-off-by: Crag.Wang <crag.wang@dell.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-By: Mario Limonciello <mario.limonciello@dell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-22 22:14:43 -07:00
David S. Miller	6f97955fd2	Merge branch 'mlxsw-Shared-buffer-improvements' Ido Schimmel says: ==================== mlxsw: Shared buffer improvements This patchset includes two improvements with regards to shared buffer configuration in mlxsw. The first part of this patchset forbids the user from performing illegal shared buffer configuration that can result in unnecessary packet loss. In order to better communicate these configuration failures to the user, extack is propagated from devlink towards drivers. This is done in patches #1-#8. The second part of the patchset deals with the shared buffer configuration of the CPU port. When a packet is trapped by the device, it is sent across the PCI bus to the attached host CPU. From the device's perspective, it is as if the packet is transmitted through the CPU port. While testing traffic directed at the CPU it became apparent that for certain packet sizes and certain burst sizes, the current shared buffer configuration of the CPU port is inadequate and results in packet drops. The configuration is adjusted by patches #9-#14 that create two new pools - ingress & egress - which are dedicated for CPU traffic. ==================== Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-22 22:09:33 -07:00
Ido Schimmel	7a1ff9f45b	mlxsw: spectrum_buffers: Adjust CPU port shared buffer egress quotas Switch the CPU port to use the new dedicated egress pool instead the previously used egress pool which was shared with normal front panel ports. Add per-port quotas for the amount of traffic that can be buffered for the CPU port and also adjust the per-{port, TC} quotas. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-22 22:09:33 -07:00
Ido Schimmel	6d28725c4d	mlxsw: spectrum_buffers: Allow skipping ingress port quota configuration The CPU port is used to transmit traffic that is trapped to the host CPU. It is therefore irrelevant to define ingress quota for it. Add a 'skip_ingress' argument to the function tasked with configuring per-port quotas, so that ingress quotas could be skipped in case the passed local port is the CPU port. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-22 22:09:33 -07:00
Ido Schimmel	24a7cc1ef6	mlxsw: spectrum_buffers: Split business logic from mlxsw_sp_port_sb_pms_init() The function is used to set the per-port shared buffer quotas. Currently, these quotas are only set for front panel ports, but a subsequent patch will configure these quotas for the CPU port as well. The configuration required for the CPU port is a bit different than that of the front panel ports, so split the business logic into a separate function which will be called with different parameters for the CPU port. No functional changes intended. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-22 22:09:33 -07:00
Ido Schimmel	50b5b90514	mlxsw: spectrum_buffers: Use new CPU ingress pool for control packets Use the new ingress pool that was added in the previous patch for control packets (e.g., STP, LACP) that are trapped to the CPU. The previous management pool is no longer necessary and therefore its size is set to 0. The maximum quota for traffic towards the CPU is increased to 50% of the free space in the new ingress pool and therefore the reserved space is reduced by half, to 10KB - in both the shared and headroom buffer. This allows for more efficient utilization of the shared buffer as reserved space cannot be used for other purposes. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-22 22:09:33 -07:00
Ido Schimmel	265c49b4b9	mlxsw: spectrum_buffers: Add pools for CPU traffic Packets that are trapped to the CPU are transmitted through the CPU port to the attached host. The CPU port is therefore like any other port and needs to have shared buffer configuration. The maximum quotas configured for the CPU are provided using dynamic threshold and cannot be changed by the user. In order to make sure that these thresholds are always valid, the configuration of the threshold type of these pools is forbidden. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-22 22:09:32 -07:00
Ido Schimmel	857f138f04	mlxsw: spectrum_buffers: Remove assumption about pool order The code currently assumes that ingress pools have lower indices than egress pools. This makes it impossible to add more ingress pools without breaking user configuration that relies on a certain pool index to correspond to an egress pool. Remove such assumptions from the code, so that more ingress pools could be added by subsequent patches. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-22 22:09:32 -07:00
Ido Schimmel	f1aaeacdae	mlxsw: spectrum_buffers: Forbid changing multicast TCs' attributes Commit `e83c045e53` ("mlxsw: spectrum_buffers: Configure MC pool") configured the threshold of the multicast TCs as infinite so that the admission of multicast packets is only depended on per-switch priority threshold. Forbid the user from changing the thresholds of these multicast TCs and their binding to a different pool. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-22 22:09:32 -07:00
Ido Schimmel	51e15a4978	mlxsw: spectrum_buffers: Forbid changing threshold type of first egress pool Multicast packets have three egress quotas: * Per egress port * Per egress port and traffic class * Per switch priority The limits on the switch priority are not exposed to the user and specified as dynamic threshold on the first egress pool. Forbid changing the threshold type of the first egress pool so that these limits are always valid. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-22 22:09:32 -07:00

1 2 3 4 5 ...

827384 commits