redonkable/alistair23-linux

Author	SHA1	Message	Date
R. Parameswaran	113c307593	New kernel function to get IP overhead on a socket. A new function, kernel_sock_ip_overhead(), is provided to calculate the cumulative overhead imposed by the IP Header and IP options, if any, on a socket's payload. The new function returns an overhead of zero for sockets that do not belong to the IPv4 or IPv6 address families. This is used in the L2TP code path to compute the total outer IP overhead on the L2TP tunnel socket when calculating the default MTU for Ethernet pseudowires. Signed-off-by: R. Parameswaran <rparames@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 13:43:31 -07:00
Kees Cook	129858fa0b	net: ethernet: wiznet: avoid format string exposure While unlikely, this makes sure any format strings in the device name can't exposure information via the resulting workqueue name. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 13:38:11 -07:00
Kees Cook	df656bf6fb	qlge: avoid format string exposure in workqueue While unlikely, this makes sure the workqueue name won't be processed as a format string. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 13:37:21 -07:00
Mintz, Yuval	2f78227874	qed: Correct MSI-x for storage When qedr is enabled, qed would try dividing the msi-x vectors between L2 and RoCE, starting with L2 and providing it with sufficient vectors for its queues. Problem is qed would also do that for storage partitions, and as those don't need queues it would lead qed to award those partitions with 0 msi-x vectors, causing them to believe theye're using INTa and preventing them from operating. Fixes: `51ff17251c` ("qed: Add support for RoCE hw init") Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 13:35:27 -07:00
David S. Miller	9e9623abda	Merge branch 'dsa-loop-fixes' Florian Fainelli says: ==================== net: dsa: Mock-up driver couple fixes Thanks to Dan's static checker, a bunch of small issues were found in the code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 13:33:17 -07:00
Florian Fainelli	d1db799e96	net: dsa: loop: Initialize err in dsa_loop_vlan_dump Dan's static checker reported the following: drivers/net/dsa/dsa_loop.c:223 dsa_loop_port_vlan_dump() error: uninitialized symbol 'err'. which could happen if we do hit the continue statement for each iteration of the loop. Initialize err to 0 here. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: `98cd1552ea` ("net: dsa: Mock-up driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 13:33:17 -07:00
Florian Fainelli	5865ccce7e	net: dsa: loop: Fix uninitialized pvid variable Dan's static analyzer reported the following: drivers/net/dsa/dsa_loop.c:181 dsa_loop_port_vlan_del() error: XXX uninitialized symbol 'pvid'. we were missing the assignment of pvid to ps->vid, so add that. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: `98cd1552ea` ("net: dsa: Mock-up driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 13:33:17 -07:00
Or Gerlitz	91e91beff6	net/sched: Removed unused vlan actions definition Commit `c7e2b9689e` "sched: introduce vlan action" added both the UAPI values for the vlan actions (TCA_VLAN_ACT_) and these two in-kernel ones which are not used, remove them. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 13:28:35 -07:00
Eric Dumazet	75d04aa368	mlx4: trust shinfo->gso_segs mlx4 is the only driver in the tree making a point to recompute shinfo->gso_segs. Lets remove superfluous code. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 13:27:49 -07:00
Colin Ian King	827d240a23	qed: fix missing break in OOO_LB_TC case There seems to be a missing break on the OOO_LB_TC case, pq_id is being assigned and then re-assigned on the fall through default case and that seems suspect. Detected by CoverityScan, CID#1424402 ("Missing break in switch") Fixes: `b5a9ee7cf3` ("qed: Revise QM cofiguration") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 13:20:26 -07:00
Tobias Regnery	053ee0a703	net/mlx5e: fix build error without CONFIG_SYSFS Commit `9008ae0748` ("net/mlx5e: Minimize mlx5e_{open/close}_locked") copied the calls to netif_set_real_num_{tx,rx}_queues from mlx5e_open_locked to mlx5e_activate_priv_channels and wraps them in an if condition to test for netdev->real_num_{tx,rx}_queues. But netdev->real_num_rx_queues is conditionally compiled in if CONFIG_SYSFS is set. Without CONFIG_SYSFS the build fails: drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 'mlx5e_activate_priv_channels': drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2515:12: error: 'struct net_device' has no member named 'real_num_rx_queues'; did you mean 'real_num_tx_queues'? Fix this by unconditionally call netif_set_real_num{tx,rx}_queues like before commit `9008ae0748`. Fixes: `9008ae0748` ("net/mlx5e: Minimize mlx5e_{open/close}_locked") Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 12:55:36 -07:00
Kees Cook	82fe0d2b44	af_unix: Use designated initializers Prepare to mark sensitive kernel structures for randomization by making sure they're using designated initializers. These were identified during allyesconfig builds of x86, arm, and arm64, and the initializer fixes were extracted from grsecurity. In this case, NULL initialize with { } instead of undesignated NULLs. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 12:43:04 -07:00
David S. Miller	540c86f3ed	Merge branch 'ftgmac100-rework-batch-1-Link-and-Interrupts' Benjamin Herrenschmidt says: ==================== ftgmac100: Rework batch 1 - Link & Interrupts This is version 2 of the first batch of updates to the ftgmac100 driver. Essentially: - A few misc cleanups - Fixing link speed & duplex handling (including dealing with an Aspeed requirement to double reset the controller when the speed changes) - And addition of a reset task workqueue which will be used for delaying the re-initialization of the controller - Fixing a number of issues with how interrupts and NAPI are dealt with. Subsequent batches will rework and improve the rx path, the tx path, and add a bunch of features and fixes. Version 2 addresses some review comments to patches 5 and 10 (see version history in the respective emails). ==================== Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 12:39:07 -07:00
Benjamin Herrenschmidt	10cbd64076	ftgmac100: Rework NAPI & interrupts handling First, don't look at the interrupt status in the poll loop to decide what to poll. It's wrong. If we have run out of budget, we may still have RX packets to unqueue but no more RX interrupt pending. So instead move the code looking at the interrupt status into the interrupt handler where it belongs. That avoids a slow MMIO read in the NAPI fast path. We keep the abnormal interrupts enabled while NAPI is scheduled. While at it, actually do something useful in the "error" cases: On AHB bus error, trigger the new reset task, that's about all we can do. On RX packet fifo or descriptor overflows, we need to restart the MAC after having freed things up. So set a flag that NAPI will see and use to perform that restart after harvesting the RX ring. Finally, we shouldn't complete NAPI if there are still outgoing packets that will need harvesting. Waiting for more interrupts is less efficient than letting NAPI run a while longer while the queue drains. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 12:38:04 -07:00
Benjamin Herrenschmidt	9b86785d1e	ftgmac100: Remove useless tests in interrupt handler The interrupt is neither enabled nor registered when the interface isn't running (regardless of whether we use nc-si or not) so the test isn't useful. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 12:38:04 -07:00
Benjamin Herrenschmidt	874b55bf62	ftgmac100: Rework MAC reset and init The HW requires a full MAC reset when changing the speed. Additionally the Aspeed documentation spells out that the MAC needs to be reset twice with a 10us interval. We thus move the speed setting and top level reset code into a new ftgmac100_reset_and_config_mac() function which handles both. Move the ring pointers initialization there too in order to reflect the HW change. Also reduce the timeout for the MAC reset as it shouldn't take more than 300 clock cycles according to the doc. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 12:38:04 -07:00
Benjamin Herrenschmidt	855944ce1c	ftgmac100: Add a reset task and use it for link changes Link speed changes require a full HW reset. This isn't done properly at the moment. It will involve delays and thus isn't suitable to do from the link poll callback. So let's create a reset_task that we can queue up when the link changes. It will be useful for various cases of error handling as well. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 12:38:04 -07:00
Benjamin Herrenschmidt	da40d9d4b5	ftgmac100: Move the bulk of inits to a separate function The link monitoring and error handling code will have to redo the ring inits and HW setup so move the code out of ftgmac100_open() into a dedicated function. This forces a bit of re-ordering of ftgmac100_open() but nothing dramatic. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 12:38:04 -07:00
Benjamin Herrenschmidt	81f1eca663	ftgmac100: Request the interrupt only after HW is reset The interrupt isn't shared, so this will keep it masked until we have the HW in a known sane state. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 12:38:04 -07:00
Benjamin Herrenschmidt	b8dbecff9b	ftgmac100: Move napi_add/del to open/close Rather than probe/remove Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 12:38:04 -07:00
Benjamin Herrenschmidt	87d18757ec	ftgmac100: Split ring alloc, init and rx buffer alloc Currently, a single function is used to allocate the rings themselves, initialize them, populate the rx ring, and allocate the rx buffers. The same happens on free. This splits them into separate functions. This will be useful when properly implementing re-initialization on link changes and error handling when the rings will be repopulated but not freed. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 12:38:04 -07:00
Benjamin Herrenschmidt	5176477735	ftgmac100: Cleanup speed/duplex tracking and fix duplex config Keep track of both the current speed and duplex settings instead of only speed and properly apply the duplex setting to the HW. This reworks the adjust_link() function to also avoid trying to reconfigure the HW when there is no link and to display the link state to the user. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 12:38:04 -07:00
Benjamin Herrenschmidt	8396e1cb0b	ftgmac100: Remove "enabled" flags It's not used in any meaningful way Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 12:38:04 -07:00
Benjamin Herrenschmidt	831fb3388c	ftgmac100: Reorder struct fields and comment Reorder the fields in struct ftgmac in slightly more logical groups. Will make more sense as I add/remove some. No code change. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 12:38:04 -07:00
Benjamin Herrenschmidt	3f6af0eede	ftgmac100: Remove "banner" comments The divisions they represent are not particularily meaningful and things are going to be moving around with upcoming changes making these comments more a burden than anything else. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 12:38:04 -07:00
Benjamin Herrenschmidt	60b28a1167	ftgmac100: Use netdev->irq instead of private copy There's a placeholder already for the irq, use it Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 12:38:04 -07:00
Felix Manlunas	bb54be589c	liquidio: fix Octeon core watchdog timeout false alarm Detection of watchdog timeout of Octeon cores is flawed and susceptible to false alarms. Refactor by removing the detection code, and in its place, leverage existing code that monitors for an indication from the NIC firmware that an Octeon core crashed; expand the meaning of the indication to "an Octeon core crashed or its watchdog timer expired". Detection of watchdog timeout is now delegated to an exception handler in the NIC firmware; this is free of false alarms. Also if there's an Octeon core crash or watchdog timeout: (1) Disable VF Ethernet links. (2) Decrement the module refcount by an amount equal to the number of active VFs of the NIC whose Octeon core crashed or had a watchdog timeout. The refcount will continue to reflect the active VFs of other liquidio NIC(s) (if present) whose Octeon cores are faultless. Item (2) is needed to avoid the case of not being able to unload the driver because the module refcount is stuck at some non-zero number. There is code that, in normal cases, decrements the refcount upon receiving a message from the firmware that a VF driver was unloaded. But in exceptional cases like an Octeon core crash or watchdog timeout, arrival of that particular message from the firmware might be unreliable. That normal case code is changed to not touch the refcount in the exceptional case to avoid contention (over the refcount) with the liquidio_watchdog kernel thread who will carry out item (2). Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 12:31:56 -07:00
Florian Fainelli	b73b3cde0e	net: usbnet: Remove unused driver_name variable With GCC 6.3, we can get the following warning: drivers/net/usb/usbnet.c:85:19: warning: 'driver_name' defined but not used [-Wunused-const-variable=] static const char driver_name [] = "usbnet"; ^~~~~~~~~~~ Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 12:25:23 -07:00
Alexei Starovoitov	89c0a36130	selftests/bpf: fix merge conflict fix artifact of merge resolution Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 12:21:59 -07:00
David S. Miller	6f14f443d3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Mostly simple cases of overlapping changes (adding code nearby, a function whose name changes, for example). Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-06 08:24:51 -07:00
David Howells	89ca694806	rxrpc: Trace client call connection Add a tracepoint (rxrpc_connect_call) to log the combination of rxrpc_call pointer, afs_call pointer/user data and wire call parameters to make it easier to match the tracebuffer contents to captured network packets. Signed-off-by: David Howells <dhowells@redhat.com>	2017-04-06 11:10:41 +01:00
David Howells	740586d290	rxrpc: Trace changes in a call's receive window size Add a tracepoint (rxrpc_rx_rwind_change) to log changes in a call's receive window size as imposed by the peer through an ACK packet. Signed-off-by: David Howells <dhowells@redhat.com>	2017-04-06 11:10:41 +01:00
David Howells	005ede286f	rxrpc: Trace received aborts Add a tracepoint (rxrpc_rx_abort) to record received aborts. Signed-off-by: David Howells <dhowells@redhat.com>	2017-04-06 11:10:41 +01:00
David Howells	fb46f6ee10	rxrpc: Trace protocol errors in received packets Add a tracepoint (rxrpc_rx_proto) to record protocol errors in received packets. The following changes are made: (1) Add a function, __rxrpc_abort_eproto(), to note a protocol error on a call and mark the call aborted. This is wrapped by rxrpc_abort_eproto() that makes the why string usable in trace. (2) Add trace_rxrpc_rx_proto() or rxrpc_abort_eproto() to protocol error generation points, replacing rxrpc_abort_call() with the latter. (3) Only send an abort packet in rxkad_verify_packet*() if we actually managed to abort the call. Note that a trace event is also emitted if a kernel user (e.g. afs) tries to send data through a call when it's not in the transmission phase, though it's not technically a receive event. Signed-off-by: David Howells <dhowells@redhat.com>	2017-04-06 11:09:39 +01:00
David Howells	ef68622da9	rxrpc: Handle temporary errors better in rxkad security In the rxkad security module, when we encounter a temporary error (such as ENOMEM) from which we could conceivably recover, don't abort the connection, but rather permit retransmission of the relevant packets to induce a retry. Note that I'm leaving some places that could be merged together to insert tracing in the next patch. Signed-off-by; David Howells <dhowells@redhat.com>	2017-04-06 10:11:59 +01:00
David Howells	84a4c09c38	rxrpc: Note a successfully aborted kernel operation Make rxrpc_kernel_abort_call() return an indication as to whether it actually aborted the operation or not so that kafs can trace the failure of the operation. Note that 'success' in this context means changing the state of the call, not necessarily successfully transmitting an ABORT packet. Signed-off-by: David Howells <dhowells@redhat.com>	2017-04-06 10:11:59 +01:00
David Howells	3a92789af0	rxrpc: Use negative error codes in rxrpc_call struct Use negative error codes in struct rxrpc_call::error because that's what the kernel normally deals with and to make the code consistent. We only turn them positive when transcribing into a cmsg for userspace recvmsg. Signed-off-by: David Howells <dhowells@redhat.com>	2017-04-06 10:11:56 +01:00
Ngai-Mint Kwan	7d4fe0d123	fm10k: do not enqueue mailbox when host not ready Interfaces will reset whenever the TX mailbox FIFO has become full. This occurs more frequently whenever the IES API application is not running to process and clear the messages in the FIFO. Thus, this could lead to situations where the interface would enter an infinite reset loop. That is: if the interface is trying to synchronize a huge number of unicast and multicast entries with the IES API application, the TX mailbox FIFO will become full and the interface resets. Once the interface exits reset, it'll try to synchronize the unicast and multicast entries again. Ergo, this creates an infinite loop. Other actions such as multiple mulitcast mode or up/down transitions will fill the TX mailbox FIFO and induce the interface to reset. To correct these situations, check if the interface's "host_ready" flag is enabled before enqueuing any messages to the TX mailbox FIFO. This check will be conducted by a function call. Lastly, this issue mainly affects the PF and, thus, the VF is exempt. Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2017-04-05 22:47:31 -07:00
Ngai-Mint Kwan	16b1889f8b	fm10k: disable receive queue when configuring ring Write to RXQCTL register to disable the receive queue when configuring the RX ring. Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2017-04-05 22:47:31 -07:00
Jacob Keller	02957703ca	fm10k: update function header comment for fm10k_get_stats64 Re-word the comment to avoid stating that we return a value for this void function. Additionally, there is no need to mention older kernels, since this is the upstream kernel. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2017-04-05 22:47:31 -07:00
Jacob Keller	b4fd8ffc11	fm10k: allow service task to reschedule itself If some code path executes fm10k_service_event_schedule(), it is guaranteed that we only queue the service task once, since we use __FM10K_SERVICE_SCHED flag. Unfortunately this has a side effect that if a service request occurs while we are currently running the watchdog, it is possible that we will fail to notice the request and ignore it until the next time the request occurs. This can cause problems with pf/vf mailbox communication and other service event tasks. To avoid this, introduce a FM10K_SERVICE_REQUEST bit. When we successfully schedule (and set the _SCHED bit) the service task, we will clear this bit. However, if we are unable to currently schedule the service event, we just set the new SERVICE_REQUEST bit. Finally, after the service event completes, we will re-schedule if the request bit has been set. This should ensure that we do not miss any service event schedules, since we will re-schedule it once the currently running task finishes. This means that for each request, we will always schedule the service task to run at least once in full after the request came in. This will avoid timing issues that can occur with the service event scheduling. We do pay a cost in re-running many tasks, but all the service event tasks use either flags to avoid duplicate work, or are tolerant of being run multiple times. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2017-04-05 22:47:30 -07:00
Jacob Keller	4692955787	fm10k: future-proof state bitmaps using DECLARE_BITMAP This ensures that future programmers do not have to remember to re-size the bitmaps due to adding new values. Although this is unlikely for this driver, it may happen and it's best to prevent it from ever being an issue. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2017-04-05 22:47:30 -07:00
Jacob Keller	3ee7b3a3b9	fm10k: use a BITMAP for flags to avoid race conditions Replace bitwise operators and #defines with a BITMAP and enumeration values. This is similar to how we handle the "state" values as well. This has two distinct advantages over the old method. First, we ensure correctness of operations which are currently problematic due to race conditions. Suppose that two kernel threads are running, such as the watchdog and an ethtool ioctl, and both modify flags. We'll say that the watchdog is CPU A, and the ethtool ioctl is CPU B. CPU A sets FLAG_1, which can be seen as CPU A read FLAGS CPU A write FLAGS \| FLAG_1 CPU B sets FLAG_2, which can be seen as CPU B read FLAGS CPU A write FLAGS \| FLAG_2 However, "\|=" and "&=" operators are not actually atomic. So this could be ordered like the following: CPU A read FLAGS -> variable CPU B read FLAGS -> variable CPU A write FLAGS (variable \| FLAG_1) CPU B write FLAGS (variable \| FLAG_2) Notice how the 2nd write from CPU B could actually undo the write from CPU A because it isn't guaranteed that the \|= operation is atomic. In practice the race windows for most flag writes is incredibly narrow so it is not easy to isolate issues. However, the more flags we have, the more likely they will cause problems. Additionally, if such a problem were to arise, it would be incredibly difficult to track down. Second, there is an additional advantage beyond code correctness. We can now automatically size the BITMAP if more flags were added, so that we do not need to remember that flags is u32 and thus if we added too many flags we would over-run the variable. This is not a likely occurrence for fm10k driver, but this patch can serve as an example for other drivers which have many more flags. This particular change does have a bit of trouble converting some of the idioms previously used with the #defines for flags. Specifically, when converting FM10K_FLAG_RSS_FIELD_IPV[46]_UDP flags. This whole operation was actually quite problematic, because we actually stored flags separately. This could more easily show the problem of the above re-ordering issue. This is really difficult to test whether atomics make a difference in practical scenarios, but you can ensure that basic functionality remains the same. This patch has a lot of code coverage, but most of it is relatively simple. While we are modifying these files, update their copyright year. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2017-04-05 22:47:30 -07:00
Phil Turnbull	540fca35e3	fm10k: correctly check if interface is removed FM10K_REMOVED expects a hardware address, not a 'struct fm10k_hw'. Fixes: `5cb8db4a4c` ("fm10k: Add support for VF") Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2017-04-05 22:47:30 -07:00
Linus Torvalds	ea6b1720ce	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Reject invalid updates to netfilter expectation policies, from Pablo Neira Ayuso. 2) Fix memory leak in nfnl_cthelper, from Jeffy Chen. 3) Don't do stupid things if we get a neigh_probe() on a neigh entry whose ops lack a solicit method. From Eric Dumazet. 4) Don't transmit packets in r8152 driver when the carrier is off, from Hayes Wang. 5) Fix ipv6 packet type detection in aquantia driver, from Pavel Belous. 6) Don't write uninitialized data into hw registers in bna driver, from Arnd Bergmann. 7) Fix locking in ping_unhash(), from Eric Dumazet. 8) Make BPF verifier range checks able to understand certain sequences emitted by LLVM, from Alexei Starovoitov. 9) Fix use after free in ipconfig, from Mark Rutland. 10) Fix refcount leak on force commit in openvswitch, from Jarno Rajahalme. 11) Fix various overflow checks in AF_PACKET, from Andrey Konovalov. 12) Fix endianness bug in be2net driver, from Suresh Reddy. 13) Don't forget to wake TX queues when processing a timeout, from Grygorii Strashko. 14) ARP header on-stack storage is wrong in flow dissector, from Simon Horman. 15) Lost retransmit and reordering SNMP stats in TCP can be underreported. From Yuchung Cheng. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (82 commits) nfp: fix potential use after free on xdp prog tcp: fix reordering SNMP under-counting tcp: fix lost retransmit SNMP under-counting sctp: get sock from transport in sctp_transport_update_pmtu net: ethernet: ti: cpsw: fix race condition during open() l2tp: fix PPP pseudo-wire auto-loading bnx2x: fix spelling mistake in macros HW_INTERRUT_ASSERT_SET_* l2tp: take reference on sessions being dumped tcp: minimize false-positives on TCP/GRO check sctp: check for dst and pathmtu update in sctp_packet_config flow dissector: correct size of storage for ARP net: ethernet: ti: cpsw: wake tx queues on ndo_tx_timeout l2tp: take a reference on sessions used in genetlink handlers l2tp: hold session while sending creation notifications l2tp: fix duplicate session creation l2tp: ensure session can't get removed during pppol2tp_session_ioctl() l2tp: fix race in l2tp_recv_common() sctp: use right in and out stream cnt bpf: add various verifier test cases for self-tests bpf, verifier: fix rejection of unaligned access checks for map_value_adj ...	2017-04-05 20:17:38 -07:00
Jakub Kicinski	c383bdd14f	nfp: fix potential use after free on xdp prog We should unregister the net_device first, before we give back our reference on xdp_prog. Otherwise xdp_prog may be freed before .ndo_stop() disabled the datapath. Found by code inspection. Fixes: `ecd63a0217` ("nfp: add XDP support in the driver") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-05 18:46:40 -07:00
Jarod Wilson	faeeb317a5	bonding: attempt to better support longer hw addresses People are using bonding over Infiniband IPoIB connections, and who knows what else. Infiniband has a hardware address length of 20 octets (INFINIBAND_ALEN), and the network core defines a MAX_ADDR_LEN of 32. Various places in the bonding code are currently hard-wired to 6 octets (ETH_ALEN), such as the 3ad code, which I've left untouched here. Besides, only alb is currently possible on Infiniband links right now anyway, due to commit `1533e77315`, so the alb code is where most of the changes are. One major component of this change is the addition of a bond_hw_addr_copy function that takes a length argument, instead of using ether_addr_copy everywhere that hardware addresses need to be copied about. The other major component of this change is converting the bonding code from using struct sockaddr for address storage to struct sockaddr_storage, as the former has an address storage space of only 14, while the latter is 128 minus a few, which is necessary to support bonding over device with up to MAX_ADDR_LEN octet hardware addresses. Additionally, this probably fixes up some memory corruption issues with the current code, where it's possible to write an infiniband hardware address into a sockaddr declared on the stack. Lightly tested on a dual mlx4 IPoIB setup, which properly shows a 20-octet hardware address now: $ cat /proc/net/bonding/bond0 Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011) Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active) Primary Slave: mlx4_ib0 (primary_reselect always) Currently Active Slave: mlx4_ib0 MII Status: up MII Polling Interval (ms): 100 Up Delay (ms): 100 Down Delay (ms): 100 Slave Interface: mlx4_ib0 MII Status: up Speed: Unknown Duplex: Unknown Link Failure Count: 0 Permanent HW addr: 80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:1d:67:01 Slave queue ID: 0 Slave Interface: mlx4_ib1 MII Status: up Speed: Unknown Duplex: Unknown Link Failure Count: 0 Permanent HW addr: 80:00:02:09:fe:80:00:00:00:00:00:01:e4:1d:2d:03:00:1d:67:02 Slave queue ID: 0 Also tested with a standard 1Gbps NIC bonding setup (with a mix of e1000 and e1000e cards), running LNST's bonding tests. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-05 18:44:54 -07:00
Yuchung Cheng	2d2517ee31	tcp: fix reordering SNMP under-counting Currently the reordering SNMP counters only increase if a connection sees a higher degree then it has previously seen. It ignores if the reordering degree is not greater than the default system threshold. This significantly under-counts the number of reordering events and falsely convey that reordering is rare on the network. This patch properly and faithfully records the number of reordering events detected by the TCP stack, just like the comment says "this exciting event is worth to be remembered". Note that even so TCP still under-estimate the actual reordering events because TCP requires TS options or certain packet sequences to detect reordering (i.e. ACKing never-retransmitted sequence in recovery or disordered state). Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-05 18:41:27 -07:00
Yuchung Cheng	ecde8f36f8	tcp: fix lost retransmit SNMP under-counting The lost retransmit SNMP stat is under-counting retransmission that uses segment offloading. This patch fixes that so all retransmission related SNMP counters are consistent. Fixes: `10d3be5692` ("tcp-tso: do not split TSO packets at retransmit time") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-05 18:41:27 -07:00
Edward Cree	148cbab6cf	sfc: don't insert mc_list on low-latency firmware if it's too long If the mc_list is longer than 256 addresses, we enter mc_promisc mode. If we're in mc_promisc mode and the firmware doesn't support cascaded multicast, normally we also insert our mc_list, to prevent stealing by another VI. However, if the mc_list was too long, this isn't really helpful - the MC groups that didn't fit in the list can still get stolen, and having only some of them stealable will probably cause more confusing behaviour than having them all stealable. Since inserting 256 multicast filters takes a long time and can lead to MCDI state machine timeouts, just skip the mc_list insert in this overflow condition. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-05 18:35:21 -07:00

... 3 4 5 6 7 ...

664305 commits