redonkable/alistair23-linux

Author	SHA1	Message	Date
Cong Wang	6578759447	net_sched: fix a compile warning in act_ife Apparently ife_meta_id2name() is only called when CONFIG_MODULES is defined. This fixes: net/sched/act_ife.c:251:20: warning: ‘ife_meta_id2name’ defined but not used [-Wunused-function] static const char *ife_meta_id2name(u32 metaid) ^~~~~~~~~~~~~~~~ Fixes: `d3f24ba895` ("net sched actions: fix module auto-loading") Cc: Roman Mashak <mrv@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:43:45 -07:00
David S. Miller	c116004d5b	Merge branch 'hv_netvsc-Add-init-of-send-table-and-var-renames' Haiyang Zhang says: ==================== hv_netvsc: Add init of send table and var renames Add initialization of send indirection table. Otherwise it may contain old info of previous device with different number of channels. Also, did some variable renaming for easier reading. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:42:56 -07:00
Haiyang Zhang	6b0cbe3158	hv_netvsc: Add initialization of tx_table in netvsc_device_add() tx_table is part of the private data of kernel net_device. It is only zero-ed out when allocating net_device. We may recreate netvsc_device w/o recreating net_device, so the private netdev data, including tx_table, are not zeroed. It may contain channel numbers for the older netvsc_device. This patch adds initialization of tx_table each time we recreate netvsc_device. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:42:55 -07:00
Haiyang Zhang	39e91cfbf6	hv_netvsc: Rename tx_send_table to tx_table Simplify the variable name: tx_send_table Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:42:55 -07:00
Haiyang Zhang	47371300df	hv_netvsc: Rename ind_table to rx_table Rename this variable because it is the Receive indirection table. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:42:55 -07:00
Colin Ian King	5dc874252f	cxgb4: fix missing break in switch and indent return statements The break statement for the Macronix case is missing and will fall through to the Winbond case and re-assign the size setting. Fix this by adding the missing break statement. Also correctly indent the return statements. Detected by CoverityScan, CID#1458020 ("Missing break in switch") Fixes: `96ac18f14a` ("cxgb4: Add support for new flash parts") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:37:57 -07:00
David S. Miller	ae96d3331e	Three fixes for the recently added new code: * make "make -s" silent for the certs file (Arnd) * fix missing CONFIG_ in extra certs symbol (Arnd) * use crypto_aead_authsize() to use the proper API and two other changes: * remove a set-but-unused variable * don't track HT capability changes, capabilities are supposed to be constant (HT operation changes) -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEExu3sM/nZ1eRSfR9Ha3t4Rpy0AB0FAlngxMEACgkQa3t4Rpy0 AB35pg//YCsOvIMhgTXqx4U32APeDfMsn3o5nyFwoGCccQKOhCyuC5hqEQQ5vpBY Jkz3DKwsqYtu0sCN/azlu/HcQe3B4JdyCXKIxQUQx8Bn/dJH2kQ5SnhX+SpH+8Uw EsHzjTGjJski84vMe9V7QYO5SXQyXdx7tHHHjEXgw4xlIissMjnelYghQ5lev7UN wgqqk6o/MucuVoQjmASX6UOh+yEHNW9PfVSpMhMnQ7n3IPjQ3MvlyjrXglByrAH9 kkQIYwonoKVzOfXjH8hzPFpiLCEFyDz7457uXXfSl8Zlv8dsdNKLoDc57NzTOw/k KcK9ZXOim1v/Vg/7bO4JkUWxT+oelOMBG7R5BYtvo7GRrIzI9HX66Ns7K3c7t8SL yGNRRw5ezOm1sVBIMbyuMbbLycbeGg+QcRlm84IkPCJpRed0LpgtCCKSQLtTTeLw gVhgOI5VB3rk7wWCnSopiJJRQvFYfhnB5WRIdTsX8JQl+bc/TWH70+TjY2+rTWZx uCjs3FeOowQzySOxIQndCa3Z+FDydZJWbB+E9iqsKrTWR6HKUBotbwmzWPr2iIJm h7Kjbyx7yS15vurfQV2Mw+QNLHwMZrc3OliWawjnj6uk7BGtGujxHAQ0m8qG6q2A FkPeQYJlFao0RTEJPHCeOG46kVApnD08r184N9S+QX7xYI+zppo= =KTQB -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2017-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Three fixes for the recently added new code: * make "make -s" silent for the certs file (Arnd) * fix missing CONFIG_ in extra certs symbol (Arnd) * use crypto_aead_authsize() to use the proper API and two other changes: * remove a set-but-unused variable * don't track HT capability changes, capabilities are supposed to be constant (HT operation changes) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:36:46 -07:00
David S. Miller	f013820f07	Merge branch 'cxgb4-hw-debug-logs' Rahul Lakkireddy says: ==================== cxgb4: add support to get hardware debug logs via ethtool This series of patches add support to collect hardware debug logs via ethtool --get-dump facility. Currently supports: Memory dumps - Collects on-chip EDC0 and EDC1 dumps. Hardware dumps - Collects firmware and hardware dumps. Patch 1 adds ethtool set/get dump data. It also adds template header that precedes dump data. This template header gives additional information needed for extracting and decoding the collected dump data. Patch 2 adds base to collect dumps. Also collects regdump. Patch 3 collects on-chip EDC0 and EDC1 memory dumps. Patch 4 collects firmware mbox log and device log. Patch 5 updates base API for accessing TP indirect registers. Patch 6 collects hardware TP module dump. Patch 7 collects hardware SGE, PCIE, PM, UP CIM, MA, and HMA module dumps. Patch 8 collects hardware IBQ and OBQ dump. Thanks, Rahul --- v2: - Prefix symbols that pollute global namespace in files starting with cxgb4_* with "cxgb4_" - Prefix symbols that pollute global namespace in files starting with cudbg_* with "cudbg_" - Make cudbg_collect_mem_info() static. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:35:15 -07:00
Rahul Lakkireddy	7c075ce221	cxgb4: collect IBQ and OBQ dumps Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:35:14 -07:00
Rahul Lakkireddy	270d39bf32	cxgb4: collect hardware module dumps Collect SGE, PCIE, PM, UP CIM, MA and HMA dumps. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:35:14 -07:00
Rahul Lakkireddy	4359cf3368	cxgb4: collect TP dump Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:35:14 -07:00
Rahul Lakkireddy	5ccf9d0496	cxgb4: update API for TP indirect register access Try to access TP indirect registers via firmware first. If this fails, fallback and access them directly. This ensures that driver and firmware do not conflict each other while accessing the TP indirect registers. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:35:14 -07:00
Rahul Lakkireddy	844d1b6f0e	cxgb4: collect firmware mbox and device log dump Collect firmware mbox and device logs before collecting the rest of the hardware dumps to snap the firmware state before the mailbox logs are updated by other hardware dumps. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:35:14 -07:00
Rahul Lakkireddy	b33af022e5	cxgb4: collect on-chip memory dump Collect EDC0 and EDC1 memory dump. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:35:14 -07:00
Rahul Lakkireddy	a7975a2f9a	cxgb4: collect register dump Add base to collect dump entities. Collect register dump and update template header accordingly. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:35:14 -07:00
Rahul Lakkireddy	ad75b7d32f	cxgb4: implement ethtool dump data operations Implement operations to set/get dump data via ethtool. Also add template header that precedes dump data, which helps in decoding and extracting the dump data. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:35:14 -07:00
Mark Brown	4c7787ba3a	nfp: Explicitly include linux/bug.h Today's -next build encountered an error due to a missing definition of WARN_ON(), caused by some header reorganization removing an implicit inclusion of linux/bug.h. Fix this with an explicit inclusion. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:31:41 -07:00
David S. Miller	7f4e568dd1	Merge branch 'dsa-remove-set_addr' Vivien Didelot says: ==================== net: dsa: remove .set_addr An Ethernet switch may support having a MAC address, which can be used as the switch's source address in transmitted full-duplex Pause frames. If a DSA switch supports the related .set_addr operation, the DSA core sets the master's MAC address on the switch. This won't make sense anymore in a multi-CPU ports system, because there won't be a unique master device assigned to a switch tree. Moreover this operation is confusing because it makes the user think that it could be used to program the switch with the MAC address of the CPU/management port such that MAC address learning can be disabled on said port, but in fact, that's not how it is currently used. To fix this, assign a random MAC address at setup time in the mv88e6060 and mv88e6xxx drivers before removing .set_addr completely from DSA. Changes in v3: - include fix for mv88e6060 switch MAC address setter. Changes in v2: - remove .set_addr implementation from drivers and use a random MAC. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:30:07 -07:00
Vivien Didelot	841f4f2405	net: dsa: remove .set_addr Now that there is no user for the .set_addr function, remove it from DSA. If a switch supports this feature (like mv88e6xxx), the implementation can be done in the driver setup. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:30:06 -07:00
Vivien Didelot	93004a934b	net: dsa: dsa_loop: remove .set_addr The .set_addr function does nothing, remove the dsa_loop implementation before getting rid of it completely in DSA. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:30:06 -07:00
Vivien Didelot	56c3ff9bf2	net: dsa: mv88e6060: setup random mac address As for mv88e6xxx, setup the switch from within the mv88e6060 driver with a random MAC address, and remove the .set_addr implementation. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:30:06 -07:00
Vivien Didelot	1723ab4f5e	net: dsa: mv88e6060: fix switch MAC address The 88E6060 Ethernet switch always transmits the multicast bit of the switch MAC address as a zero. It re-uses the corresponding bit 8 of the register "Switch MAC Address Register Bytes 0 & 1" for "DiffAddr". If the "DiffAddr" bit is 0, then all ports transmit the same source address. If it is set to 1, then bit 2:0 are used for the port number. The mv88e6060 driver is currently wrongly shifting the MAC address byte 0 by 9. To fix this, shift it by 8 as usual and clear its bit 0. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:30:06 -07:00
Vivien Didelot	04a69a1759	net: dsa: mv88e6xxx: setup random mac address An Ethernet switch may support having a MAC address, which can be used as the switch's source address in transmitted full-duplex Pause frames. If a DSA switch supports the related .set_addr operation, the DSA core sets the master's MAC address on the switch. This won't make sense anymore in a multi-CPU ports system, because there won't be a unique master device assigned to a switch tree. Instead, setup the switch from within the Marvell driver with a random MAC address, and remove the .set_addr implementation. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:30:06 -07:00
Gustavo A. R. Silva	ec0d0987f0	atm: fore200e: mark expected switch fall-throughs In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 18:24:44 -07:00
David S. Miller	8bc4654877	Merge branch 'nfp-bpf-support-direct-packet-access' Jakub Kicinski says: ==================== nfp: bpf: support direct packet access The core of this series is direct packet access support. With a small change to the verifier, the offloaded code can now make use of DPA. We need to be careful to use kernel (after initial translation) offsets in our JIT. Direct packet access also brings us to the problem of eBPF endianness. After considering the changes necessary we decided to not support translation on both BE and LE hosts, for now. This series contains two fixes - one for compare instructions and one for ineffective jne optimization. I chose to include fixes in this set because the code in -net works only with unreleased PoC FW (ABI version 1) and therefore nobody outside of Netronome can exercise it anyway. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 11:13:29 -07:00
Jakub Kicinski	bfddbc8adc	nfp: bpf: support direct packet access in TC Add support for direct packet access in TC, note that because writing the packet will cause the verifier to generate a csum fixup prologue we won't be able to offload packet writes from TC, just yet, only the reads will work. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 11:13:28 -07:00
Jakub Kicinski	e663fe3863	nfp: bpf: direct packet access - write This patch adds ability to write packet contents using pre-validated packet pointers (direct packet access). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 11:13:28 -07:00
Jakub Kicinski	2ca71441f5	nfp: bpf: add support for direct packet access - read In direct packet access bound checks are already done, we can simply dereference the packet pointer. Verifier/parser logic needs to record pointer type. Note that although verifier does protect us from CTX vs other pointer changes we will also want to differentiate between PACKET vs MAP_VALUE or STACK, so we can add the check already. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 11:13:28 -07:00
Jakub Kicinski	0a7939775f	nfp: bpf: separate I/O from checks for legacy data load Move data load into a separate function and separate it from packet length checks of legacy I/O. This makes the code more readable and easier to reuse. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 11:13:28 -07:00
Jakub Kicinski	943c57b97c	nfp: bpf: fix context accesses Sizes of fields in struct xdp_md/xdp_buff and some in sk_buff depend on target architecture. Take that into account and use struct xdp_buff, not struct xdp_md. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 11:13:28 -07:00
Jakub Kicinski	0f6cf4ddf6	nfp: bpf: support BPF offload only on little endian eBPF is host-endian specific. Translating both BE and LE eBPF to the NFP is feasible, but would require quite a bit of indirection. The fact that I don't have access to any BE hosts that would fit a 25G/40G/100G NIC is also limiting my ability to test big endian. For now restrict the offload to little endian hosts only. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 11:13:28 -07:00
Jakub Kicinski	3119d1fd46	nfp: bpf: implement byte swap instruction Implement byte swaps with rotations, shifts and byte loads. Remember to clear upper parts of the 64 bit registers. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 11:13:28 -07:00
Jakub Kicinski	c000dfb5e2	nfp: bpf: add mov helper Register move operation is encoded as alu no op. This means that one has to specify number of unused/none parameters to the emit_alu(). Add a helper. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 11:13:28 -07:00
Jakub Kicinski	26fa818dc0	nfp: bpf: fix compare instructions Now that we have BPF assemebler support in LLVM 6 we can easily test all compare instructions (LLVM 4 didn't generate most of them from C). Fix the compare to immediates and refactor the order of compare to regs to make sure they both follow the same pattern. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 11:13:28 -07:00
Jakub Kicinski	8283737065	nfp: bpf: add missing return in jne_imm optimization We optimize comparisons to immediate 0 as if (reg.lo \| reg.hi). The early return statement was missing, however, which means we would generate two comparisons - optimized one followed by a normal 2x 32 bit compare. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 11:13:28 -07:00
Jakub Kicinski	bc8c80a8c9	nfp: bpf: reorder arguments to emit_ld_field_any() ld_field instruction has the following format in NFP assembler: ld_field[dst, 1000, src, <<24] reoder parameters to emit_ld_field_any() to make it closer to the familiar assembler order. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 11:13:27 -07:00
Jakub Kicinski	1bdec44955	bpf: verifier: set reg_type on context accesses in second pass Use a simplified is_valid_access() callback when verifier is used for program analysis by non-host JITs. This allows us to teach the verifier about packet start and packet end offsets for direct packet access. We can extend the callback as needed but for most packet processing needs there isn't much more the offloads may require. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 11:13:27 -07:00
David S. Miller	40d0af5635	Merge branch 'stmmac-Improvements-for-multi-queuing-and-for-AVB' Jose Abreu says: ==================== net: stmmac: Improvements for multi-queuing and for AVB Two improvements for stmmac: First one corrects the available fifo size per queue, second one corrects enabling of AVB queues. More info in commit log. Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Changes from v1: - Fix typo in second patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>	2017-10-14 11:12:08 -07:00
Jose Abreu	a0daae1377	net: stmmac: Disable flow ctrl for RX AVB queues and really enable TX AVB queues Flow control must be disabled for AVB enabled queues and TX AVB queues must be enabled by setting BIT(2) of TXQEN. Correct this by passing the queue mode to DMA callbacks and by checking in these functions wether we are in AVB performing the necessary adjustments. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 11:12:08 -07:00
Jose Abreu	52a76235d0	net: stmmac: Use correct values in TQS/RQS fields Currently we are using all the available fifo size in RQS and TQS fields. This will not work correctly in multi-queues IP's because total fifo size must be splitted to the enabled queues. Correct this by computing the available fifo size per queue and setting the right value in TQS and RQS fields. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 11:12:07 -07:00
Matteo Croce	258bbb1b0e	icmp: don't fail on fragment reassembly time exceeded The ICMP implementation currently replies to an ICMP time exceeded message (type 11) with an ICMP host unreachable message (type 3, code 1). However, time exceeded messages can either represent "time to live exceeded in transit" (code 0) or "fragment reassembly time exceeded" (code 1). Unconditionally replying to "fragment reassembly time exceeded" with host unreachable messages might cause unjustified connection resets which are now easily triggered as UFO has been removed, because, in turn, sending large buffers triggers IP fragmentation. The issue can be easily reproduced by running a lot of UDP streams which is likely to trigger IP fragmentation: # start netserver in the test namespace ip netns add test ip netns exec test netserver # create a VETH pair ip link add name veth0 type veth peer name veth0 netns test ip link set veth0 up ip -n test link set veth0 up for i in $(seq 20 29); do # assign addresses to both ends ip addr add dev veth0 192.168.$i.1/24 ip -n test addr add dev veth0 192.168.$i.2/24 # start the traffic netperf -L 192.168.$i.1 -H 192.168.$i.2 -t UDP_STREAM -l 0 & done # wait send_data: data send error: No route to host (errno 113) netperf: send_omni: send_data failed: No route to host We need to differentiate instead: if fragment reassembly time exceeded is reported, we need to silently drop the packet, if time to live exceeded is reported, maintain the current behaviour. In both cases increment the related error count "icmpInTimeExcds". While at it, fix a typo in a comment, and convert the if statement into a switch to mate it more readable. Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-14 11:05:26 -07:00
David S. Miller	a00344bd1b	Merge branch 'tipc-comm-groups' Jon Maloy says: ==================== tipc: Introduce Communcation Group feature With this commit series we introduce a 'Group Communication' feature in order to resolve the datagram and multicast flow control problem. This new feature makes it possible for a user to instantiate multiple private virtual brokerless message buses by just creating and joining member sockets. The main features are as follows: --------------------------------- - Sockets can join a group via a new setsockopt() call TIPC_GROUP_JOIN. If it is the first socket of the group this implies creation of the group. This call takes four parameters: 'type' serves as group identifier, 'instance' serves as member identifier, and 'scope' indicates the visibility of the group (node/cluster/zone). Finally, 'flags' indicates different options for the socket joining the group. For the time being, there are only two such flags: 1) 'LOOPBACK' indicates if the creator of the socket wants to receive a copy of broadcast or multicast messages it sends to the group, 2) EVENTS indicates if it wants to receive membership (JOINED/LEFT) events for the other members of the group. - Groups are closed, i.e., sockets which have not joined a group will not be able to send messages to or receive messages from members of the group, and vice versa. A socket can only be member of one group at a time. - There are four transmission modes. 1: Unicast. The sender transmits a message using the port identity (node:port tuple) of the receiving socket. 2: Anycast. The sender transmits a message using a port name (type: instance:scope) of one of the receiving sockets. If more than one member socket matches the given address a destination is selected according to a round-robin algorithm, but also considering the destination load (advertised window size) as an additional criteria. 3: Multicast. The sender transmits a message using a port name (type:instance:scope) of one or more of the receiving sockets. All sockets in the group matching the given address will receive a copy of the message. 4: Broadcast. The sender transmits a message using the primtive send(). All members of the group, irrespective of their member identity (instance) number receive a copy of the message. - TIPC broadcast is used for carrying messages in mode 3 or 4 when this is deemed more efficient, i.e., depending on number of actual destinations. - All transmission modes are flow controlled, so that messages never are dropped or rejected, just like we are used to from connection oriented communication. A special algorithm guarantees that this is true even for multipoint-to-point communication, i.e., at occasions where many source sockets may decide to send simultaneously towards the same destination socket. - Sequence order is always guaranteed, even between the different transmission modes. - Member join/leave events are received in all other member sockets in guaranteed order. I.e., a 'JOINED' (an empty message with the OOB bit set) will always be received before the first data message from a new member, and a 'LEAVE' (like 'JOINED', but with EOR bit set) will always arrive after the last data message from a leaving member. ----- v2: Reordered variable declarations in descending length order, as per feedback from David Miller. This was done as far as permitted by the the initialization order. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-13 08:46:01 -07:00
Jon Maloy	04d7b574b2	tipc: add multipoint-to-point flow control We already have point-to-multipoint flow control within a group. But we even need the opposite; -a scheme which can handle that potentially hundreds of sources may try to send messages to the same destination simultaneously without causing buffer overflow at the recipient. This commit adds such a mechanism. The algorithm works as follows: - When a member detects a new, joining member, it initially set its state to JOINED and advertises a minimum window to the new member. This window is chosen so that the new member can send exactly one maximum sized message, or several smaller ones, to the recipient before it must stop and wait for an additional advertisement. This minimum window ADV_IDLE is set to 65 1kB blocks. - When a member receives the first data message from a JOINED member, it changes the state of the latter to ACTIVE, and advertises a larger window ADV_ACTIVE = 12 x ADV_IDLE blocks to the sender, so it can continue sending with minimal disturbances to the data flow. - The active members are kept in a dedicated linked list. Each time a message is received from an active member, it will be moved to the tail of that list. This way, we keep a record of which members have been most (tail) and least (head) recently active. - There is a maximum number (16) of permitted simultaneous active senders per receiver. When this limit is reached, the receiver will not advertise anything immediately to a new sender, but instead put it in a PENDING state, and add it to a corresponding queue. At the same time, it will pick the least recently active member, send it an advertisement RECLAIM message, and set this member to state RECLAIMING. - The reclaimee member has to respond with a REMIT message, meaning that it goes back to a send window of ADV_IDLE, and returns its unused advertised blocks beyond that value to the reclaiming member. - When the reclaiming member receives the REMIT message, it unlinks the reclaimee from its active list, resets its state to JOINED, and notes that it is now back at ADV_IDLE advertised blocks to that member. If there are still unread data messages sent out by reclaimee before the REMIT, the member goes into an intermediate state REMITTED, where it stays until the said messages have been consumed. - The returned advertised blocks can now be re-advertised to the pending member, which is now set to state ACTIVE and added to the active member list. - To be proactive, i.e., to minimize the risk that any member will end up in the pending queue, we start reclaiming resources already when the number of active members exceeds 3/4 of the permitted maximum. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-13 08:46:01 -07:00
Jon Maloy	a3bada7066	tipc: guarantee delivery of last broadcast before DOWN event The following scenario is possible: - A user sends a broadcast message, and thereafter immediately leaves the group. - The LEAVE message, following a different path than the broadcast, arrives ahead of the broadcast, and the sending member is removed from the receiver's list. - The broadcast message arrives, but is dropped because the sender now is unknown to the receipient. We fix this by sequence numbering membership events, just like ordinary unicast messages. Currently, when a JOIN is sent to a peer, it contains a synchronization point, - the sequence number of the next sent broadcast, in order to give the receiver a start synchronization point. We now let even LEAVE messages contain such an "end synchronization" point, so that the recipient can delay the removal of the sending member until it knows that all messages have been received. The received synchronization points are added as sequence numbers to the generated membership events, making it possible to handle them almost the same way as regular unicasts in the receiving filter function. In particular, a DOWN event with a too high sequence number will be kept in the reordering queue until the missing broadcast(s) arrive and have been delivered. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-13 08:46:01 -07:00
Jon Maloy	399574d419	tipc: guarantee delivery of UP event before first broadcast The following scenario is possible: - A user joins a group, and immediately sends out a broadcast message to its members. - The broadcast message, following a different data path than the initial JOIN message sent out during the joining procedure, arrives to a receiver before the latter.. - The receiver drops the message, since it is not ready to accept any messages until the JOIN has arrived. We avoid this by treating group protocol JOIN messages like unicast messages. - We let them pass through the recipient's multicast input queue, just like ordinary unicasts. - We force the first following broadacst to be sent as replicated unicast and being acknowledged by the recipient before accepting any more broadcast transmissions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-13 08:46:01 -07:00
Jon Maloy	2f487712b8	tipc: guarantee that group broadcast doesn't bypass group unicast We need a mechanism guaranteeing that group unicasts sent out from a socket are not bypassed by later sent broadcasts from the same socket. We do this as follows: - Each time a unicast is sent, we set a the broadcast method for the socket to "replicast" and "mandatory". This forces the first subsequent broadcast message to follow the same network and data path as the preceding unicast to a destination, hence preventing it from overtaking the latter. - In order to make the 'same data path' statement above true, we let group unicasts pass through the multicast link input queue, instead of as previously through the unicast link input queue. - In the first broadcast following a unicast, we set a new header flag, requiring all recipients to immediately acknowledge its reception. - During the period before all the expected acknowledges are received, the socket refuses to accept any more broadcast attempts, i.e., by blocking or returning EAGAIN. This period should typically not be longer than a few microseconds. - When all acknowledges have been received, the sending socket will open up for subsequent broadcasts, this time giving the link layer freedom to itself select the best transmission method. - The forced and/or abrupt transmission method changes described above may lead to broadcasts arriving out of order to the recipients. We remedy this by introducing code that checks and if necessary re-orders such messages at the receiving end. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-13 08:46:01 -07:00
Jon Maloy	b87a5ea31c	tipc: guarantee group unicast doesn't bypass group broadcast Group unicast messages don't follow the same path as broadcast messages, and there is a high risk that unicasts sent from a socket might bypass previously sent broadcasts from the same socket. We fix this by letting all unicast messages carry the sequence number of the next sent broadcast from the same node, but without updating this number at the receiver. This way, a receiver can check and if necessary re-order such messages before they are added to the socket receive buffer. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-13 08:46:01 -07:00
Jon Maloy	5b8dddb637	tipc: introduce group multicast messaging The previously introduced message transport to all group members is based on the tipc multicast service, but is logically a broadcast service within the group, and that is what we call it. We now add functionality for sending messages to all group members having a certain identity. Correspondingly, we call this feature 'group multicast'. The service is using unicast when only one destination is found, otherwise it will use the bearer broadcast service to transfer the messages. In the latter case, the receiving members filter arriving messages by looking at the intended destination instance. If there is no match, the message will be dropped, while still being considered received and read as seen by the flow control mechanism. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-13 08:46:01 -07:00
Jon Maloy	ee106d7f94	tipc: introduce group anycast messaging In this commit, we make it possible to send connectionless unicast messages to any member corresponding to the given member identity, when there is more than one such member. The sender must use a TIPC_ADDR_NAME address to achieve this effect. We also perform load balancing between the destinations, i.e., we primarily select one which has advertised sufficient send window to not cause a block/EAGAIN delay, if any. This mechanism is overlayed on the always present round-robin selection. Anycast messages are subject to the same start synchronization and flow control mechanism as group broadcast messages. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-13 08:46:00 -07:00
Jon Maloy	27bd9ec027	tipc: introduce group unicast messaging We now make it possible to send connectionless unicast messages within a communication group. To send a message, the sender can use either a direct port address, aka port identity, or an indirect port name to be looked up. This type of messages are subject to the same start synchronization and flow control mechanism as group broadcast messages. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-13 08:46:00 -07:00

1 2 3 4 5 ...

708053 commits