redonkable/remarkable-linux

Author	SHA1	Message	Date
Yuchung Cheng	e33099f96d	tcp: implement RFC5682 F-RTO This patch implements F-RTO (foward RTO recovery): When the first retransmission after timeout is acknowledged, F-RTO sends new data instead of old data. If the next ACK acknowledges some never-retransmitted data, then the timeout was spurious and the congestion state is reverted. Otherwise if the next ACK selectively acknowledges the new data, then the timeout was genuine and the loss recovery continues. This idea applies to recurring timeouts as well. While F-RTO sends different data during timeout recovery, it does not (and should not) change the congestion control. The implementaion follows the three steps of SACK enhanced algorithm (section 3) in RFC5682. Step 1 is in tcp_enter_loss(). Step 2 and 3 are in tcp_process_loss(). The basic version is not supported because SACK enhanced version also works for non-SACK connections. The new implementation is functionally in parity with the old F-RTO implementation except the one case where it increases undo events: In addition to the RFC algorithm, a spurious timeout may be detected without sending data in step 2, as long as the SACK confirms not all the original data are dropped. When this happens, the sender will undo the cwnd and perhaps enter fast recovery instead. This additional check increases the F-RTO undo events by 5x compared to the prior implementation on Google Web servers, since the sender often does not have new data to send for HTTP. Note F-RTO may detect spurious timeout before Eifel with timestamps does so. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-21 11:47:51 -04:00
Yuchung Cheng	ab42d9ee3d	tcp: refactor CA_Loss state processing Consolidate all of TCP CA_Loss state processing in tcp_fastretrans_alert() into a new function called tcp_process_loss(). This is to prepare the new F-RTO implementation in the next patch. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-21 11:47:51 -04:00
Yuchung Cheng	9b44190dc1	tcp: refactor F-RTO The patch series refactor the F-RTO feature (RFC4138/5682). This is to simplify the loss recovery processing. Existing F-RTO was developed during the experimental stage (RFC4138) and has many experimental features. It takes a separate code path from the traditional timeout processing by overloading CA_Disorder instead of using CA_Loss state. This complicates CA_Disorder state handling because it's also used for handling dubious ACKs and undos. While the algorithm in the RFC does not change the congestion control, the implementation intercepts congestion control in various places (e.g., frto_cwnd in tcp_ack()). The new code implements newer F-RTO RFC5682 using CA_Loss processing path. F-RTO becomes a small extension in the timeout processing and interfaces with congestion control and Eifel undo modules. It lets congestion control (module) determines how many to send independently. F-RTO only chooses what to send in order to detect spurious retranmission. If timeout is found spurious it invokes existing Eifel undo algorithms like DSACK or TCP timestamp based detection. The first patch removes all F-RTO code except the sysctl_tcp_frto is left for the new implementation. Since CA_EVENT_FRTO is removed, TCP westwood now computes ssthresh on regular timeout CA_EVENT_LOSS event. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-21 11:47:50 -04:00
Daniel Borkmann	e306e2c13b	filter: add minimal BPF JIT image disassembler This is a minimal stand-alone user space helper, that allows for debugging or verification of emitted BPF JIT images. This is in particular useful for emitted opcode debugging, since minor bugs in the JIT compiler can be fatal. The disassembler is architecture generic and uses libopcodes and libbfd. How to get to the disassembly, example: 1) `echo 2 > /proc/sys/net/core/bpf_jit_enable` 2) Load a BPF filter (e.g. `tcpdump -p -n -s 0 -i eth1 host 192.168.20.0/24`) 3) Run e.g. `bpf_jit_disasm -o` to disassemble the most recent JIT code output `bpf_jit_disasm -o` will display the related opcodes to a particular instruction as well. Example for x86_64: $ ./bpf_jit_disasm 94 bytes emitted from JIT compiler (pass:3, flen:9) ffffffffa0356000 + <x>: 0: push %rbp 1: mov %rsp,%rbp 4: sub $0x60,%rsp 8: mov %rbx,-0x8(%rbp) c: mov 0x68(%rdi),%r9d 10: sub 0x6c(%rdi),%r9d 14: mov 0xe0(%rdi),%r8 1b: mov $0xc,%esi 20: callq 0xffffffffe0d01b71 25: cmp $0x86dd,%eax 2a: jne 0x000000000000003d 2c: mov $0x14,%esi 31: callq 0xffffffffe0d01b8d 36: cmp $0x6,%eax [...] 5c: leaveq 5d: retq $ ./bpf_jit_disasm -o 94 bytes emitted from JIT compiler (pass:3, flen:9) ffffffffa0356000 + <x>: 0: push %rbp 55 1: mov %rsp,%rbp 48 89 e5 4: sub $0x60,%rsp 48 83 ec 60 8: mov %rbx,-0x8(%rbp) 48 89 5d f8 c: mov 0x68(%rdi),%r9d 44 8b 4f 68 10: sub 0x6c(%rdi),%r9d 44 2b 4f 6c [...] 5c: leaveq c9 5d: retq c3 Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-21 11:35:41 -04:00
David S. Miller	b34870fc9f	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into wireless John W. Linville says: ==================== This is a big pull request for new features intended for the 3.10 stream... Regarding mac80211, Johannes says: "First, I merged mac80211/master to avoid some conflicts. This brings in a bunch of fixes you're already familiar with. For real -next material, I have a whole bunch of minstrel work, minstrel_ht from Felix and legacy minstrel from Thomas (Huehn). The other Thomas (Pedersen) did a number of changes in mesh to allow userspace peering management even when the mesh isn't secured. Stanislaw changes suspend/resume to always disconnect the networks. This is typically already done by network-manager so won't make a huge difference for most users, but fixes a number problems, particularly with USB drivers that can easily disconnect while suspended. Ilan has a small change to allow mac80211 drivers to differentiate remain-on-channel reasons, and Jouni extends nl80211 to allow fast roaming with full-MAC devices. I have a fairly large number of patches as well, many of them fairly simple cleanups, but also allowing split wiphy dumps and adding back the full wiphy information in nl80211, station entry change checking and more VHT work including VHT capability overrides (mostly for testing purposes)." And for iwlwifi, Johannes says: "Here, I also merged iwlwifi-fixes to avoid conflicts, and otherwise have various cleanups and improvements on the MVM driver, along with a few throughout the driver. Other than Bluetooth Coexistence from Emmanuel there's no over-arching theme, so listing them would pretty much reproduce the shortlog." Regarding NFC, Samuel says: "The 2 features we have with this one are: - An LLCP Service Name Lookup (SNL) netlink interface for querying LLCP service availability from user space. Along the way, Thierry also improved the existing SNL interface for aggregating SNL responses. - An initial LLCP socket options implementation, for setting the Receive Window (RW) and the Maximum Information Unit Extension (MIUX) per socket. This is need for the LLCP validation tests. We also have a microread MEI build failure here: I am not sending this one to 3.9 because the MEI bus code is not there yet, so it won't break for anyone else than me." And for ath6kl, Kalle says: "I added tracing support to ath6kl, along with a new Kconfig option. Now there's also a workaround to reset USB devices when the firmware upload fails, this happened when host was warm rebooted. There are also quite a few small fixes or cleanup." On top of all that, there is the usual bundle of driver updates with new features, new hardware support and the like mixed-in. The ath9k, b43, brcmfmac, mwifiex, rt2800, and wil6210 drivers are all well-represented, and a few other drivers are hit as well. I also pulled-in the wireless fixes tree in order to resolve some pending merge conflicts. Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-21 11:16:35 -04:00
John W. Linville	5470b462c3	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem	2013-03-20 15:24:57 -04:00
stephen hemminger	e76d120b68	chelsio: use netdev_alloc_skb_ip_align Use netdev_alloc_sk_ip_align in the case where packet is copied. This handles case where NET_IP_ALIGN == 0 as well as adding required header padding. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 15:24:40 -04:00
David S. Miller	a6f68034de	net: Move selftests to common net/ subdirectory. Suggested-by: Daniel Baluta <daniel.baluta@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 15:07:56 -04:00
Daniel Baluta	4c1d8d0617	net: fix psock_fanout selftest bind error message Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:42:41 -04:00
Eric Dumazet	70386d40e1	chelsio: add headroom in RX path Drivers should reserve some headroom in skb used in receive path, to avoid future head reallocation. One possible way to do that is to use dev_alloc_skb() instead of alloc_skb(), so that NET_SKB_PAD bytes are reserved. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:29:34 -04:00
Chris Metcalf	8fdc929f57	dynticks: avoid flow_cache_flush() interrupting every core Previously, if you did an "ifconfig down" or similar on one core, and the kernel had CONFIG_XFRM enabled, every core would be interrupted to check its percpu flow list for items that could be garbage collected. With this change, we generate a mask of cores that actually have any percpu items, and only interrupt those cores. When we are trying to isolate a set of cpus from interrupts, this is important to do. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:28:39 -04:00
Yuval Mintz	7fa6f34081	bnx2x: AER revised Revised bnx2x implementation of PCI Express Advanced Error Recovery - stop and free driver resources according to the AER flow (instead of the currently implemented `hope-for-the-best' release approach), and do not make any assumptions on the HW state after slot reset. Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:27:28 -04:00
Wei Yongjun	47a5247fdd	net: fec: make local function fec_poll_controller() static fec_poll_controller() was not declared. It should be static. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:25:37 -04:00
Wei Yongjun	e052a5893b	net: ethernet: davinci_emac: make local function emac_poll_controller() static emac_poll_controller() was not declared. It should be static. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:25:37 -04:00
Sachin Kamat	9fad0c941a	net: mdio-octeon: Use module_platform_driver() module_platform_driver macro removes some boilerplate and simplifies the code. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Cc: David Daney <ddaney@caviumnetworks.com> Acked-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:25:37 -04:00
Sachin Kamat	f8e5fc8c20	net: mdio-gpio: Use module_platform_driver() module_platform_driver macro removes some boilerplate and simplifies the code. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:25:37 -04:00
Sachin Kamat	95d158df44	net: au1k_ir: Use module_platform_driver() module_platform_driver macro removes some boilerplate and simplifies the code. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Cc: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:25:36 -04:00
Sachin Kamat	6d7496836d	net: s6gmac: Use module_platform_driver() module_platform_driver macro removes some boilerplate and simplifies the code. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:25:36 -04:00
Sachin Kamat	18e4a7374c	net: ks8695net: Use module_platform_driver() module_platform_driver macro removes some boilerplate and simplifies the code. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:25:36 -04:00
Jesper Derehag	2b5faa4c55	connector: Added coredumping event to the process connector Process connector can now also detect coredumping events. Main aim of patch is get notified at start of coredumping, instead of having to wait for it to finish and then being notified through EXIT event. Could be used for instance by process-managers that want to get notified as soon as possible about process failures, and not necessarily beeing notified after coredump, which could be in the order of minutes depending on size of coredump, piping and so on. Signed-off-by: Jesper Derehag <jderehag@hotmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:23:21 -04:00
Claudiu Manoil	800c644bcd	gianfar: Refactor config coalescing calls for all queues The only place where gfar_configure_coalescing is called with an actual bitmask (other than 0xff) is in gfar_poll (on the hot path). So make gfar_configure_coalescing() static for the buffer processing path, and export gfar_configure_coalescing_all() for the remaining cases that require to set coalescing for all the queues at once (on the slow path). Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:21:53 -04:00
Claudiu Manoil	5d9657d83a	gianfar: Remove redundant programming of [rt]xic registers For Multi Q Multi Group (MQ_MG_MODE) mode, the Rx/Tx colescing registers [rt]xic are aliased with the [rt]xic0 registers (coalescing setting regs for Q0). This avoids programming twice in a row the coalescing registers for the Rx/Tx hw Q0. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:21:52 -04:00
Claudiu Manoil	6be5ed3fef	gianfar: Poll only active Rx queues Split the napi budget fairly among the active queues only, instead of dividing it by the total number of Rx queues assigned to the given interrupt group. Use the h/w indication field RXFi in rstat (receive status register) to identify the active rx queues from the current interrupt group (i.e. receive event occured on ring i, if ring i is part of the current interrupt group). This indication field in rstat, RXFi i=0..7, allows us to find out on which queues of the same interrupt group do we have incomming traffic once we entered the polling routine for the given interrupt group. After servicing the ring i, the corresponding bit RXFi will be written with 1 to clear the active queue indication for that ring. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:21:52 -04:00
Claudiu Manoil	c233cf4074	gianfar: Fix tx napi polling There are 2 issues with the current napi poll routine, with regards to tx ring cleanup: 1) for multi-queue devices (MQ_MG_MODE), should tx_bit_map != rx_bit_map, which is possible (and supported in h/w) if the DT property "fsl,tx-bit-map" holds a different value than rx_bit_map, the current polling routine will service the wrong Tx queues in this case (i.e. the interrupt group will receive interrupts from tx queues that it will not service) 2) Tx cleanup completion consumes napi budget, whereas the napi budget should be reserved for Rx work only. The patch fixes these issues and provides a clean napi polling routine. Napi poll completion is reached when all the Rx queues have been serviced and there is no Tx work to do. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:21:52 -04:00
Daniel Borkmann	3e5289d5e3	filter: add ANC_PAY_OFFSET instruction for loading payload start offset It is very useful to do dynamic truncation of packets. In particular, we're interested to push the necessary header bytes to the user space and cut off user payload that should probably not be transferred for some reasons (e.g. privacy, speed, or others). With the ancillary extension PAY_OFFSET, we can load it into the accumulator, and return it. E.g. in bpfc syntax ... ld #poff ; { 0x20, 0, 0, 0xfffff034 }, ret a ; { 0x16, 0, 0, 0x00000000 }, ... as a filter will accomplish this without having to do a big hackery in a BPF filter itself. Follow-up JIT implementations are welcome. Thanks to Eric Dumazet for suggesting and discussing this during the Netfilter Workshop in Copenhagen. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:15:45 -04:00
Daniel Borkmann	f77668dc25	net: flow_dissector: add __skb_get_poff to get a start offset to payload __skb_get_poff() returns the offset to the payload as far as it could be dissected. The main user is currently BPF, so that we can dynamically truncate packets without needing to push actual payload to the user space and instead can analyze headers only. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:15:45 -04:00
David S. Miller	61816596d1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull in the 'net' tree to get Daniel Borkmann's flow dissector infrastructure change. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:46:26 -04:00
Willem de Bruijn	23a9072e3a	net: fix psock_fanout selftest hash collision Fix flaky results with PACKET_FANOUT_HASH depending on whether the two flows hash into the same packet socket or not. Also adds tests for PACKET_FANOUT_LB and PACKET_FANOUT_CPU and replaces the counting method with a packet ring. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:33:18 -04:00
Fabio Estevam	da2191e314	net: fec: Define indexes as 'unsigned int' Fix the following warnings that happen when building with W=1 option: drivers/net/ethernet/freescale/fec.c: In function 'fec_enet_free_buffers': drivers/net/ethernet/freescale/fec.c:1337:16: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] drivers/net/ethernet/freescale/fec.c: In function 'fec_enet_alloc_buffers': drivers/net/ethernet/freescale/fec.c:1361:16: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] drivers/net/ethernet/freescale/fec.c: In function 'fec_enet_init': drivers/net/ethernet/freescale/fec.c:1631:16: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:28:59 -04:00
Kees Cook	896ee0eee6	net/irda: add missing error path release_sock call This makes sure that release_sock is called for all error conditions in irda_getsockopt. Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Brad Spengler <spender@grsecurity.net> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:23:13 -04:00
Wei Yongjun	fa90b077d7	lpc_eth: fix error return code in lpc_eth_drv_probe() Fix to return a negative error code from the error handling case instead of 0, as returned elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:19:15 -04:00
Sergei Shtylyov	fc0c090040	sh_eth: check TSU registers ioremap() error One must check the result of ioremap() -- in this case it prevents potential kernel oops when initializing TSU registers further on... Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:17:59 -04:00
Sergei Shtylyov	0582b7d15f	sh_eth: fix bitbang memory leak sh_mdio_init() allocates pointer to 'struct bb_info' but only stores it locally, so that sh_mdio_release() can't free it on driver unload. Add the pointer to 'struct bb_info' to 'struct sh_eth_private', so that sh_mdio_init() can save 'bitbang' variable for sh_mdio_release() to be able to free it later... Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:17:59 -04:00
Martin Fuzzey	283951f95b	ipconfig: Fix newline handling in log message. When using ipconfig the logs currently look like: Single name server: [ 3.467270] IP-Config: Complete: [ 3.470613] device=eth0, hwaddr=ac🇩🇪48:00:00:01, ipaddr=172.16.42.2, mask=255.255.255.0, gw=172.16.42.1 [ 3.480670] host=infigo-1, domain=, nis-domain=(none) [ 3.486166] bootserver=172.16.42.1, rootserver=172.16.42.1, rootpath= [ 3.492910] nameserver0=172.16.42.1[ 3.496853] ALSA device list: Three name servers: [ 3.496949] IP-Config: Complete: [ 3.500293] device=eth0, hwaddr=ac🇩🇪48:00:00:01, ipaddr=172.16.42.2, mask=255.255.255.0, gw=172.16.42.1 [ 3.510367] host=infigo-1, domain=, nis-domain=(none) [ 3.515864] bootserver=172.16.42.1, rootserver=172.16.42.1, rootpath= [ 3.522635] nameserver0=172.16.42.1, nameserver1=172.16.42.100 [ 3.529149] , nameserver2=172.16.42.200 Fix newline handling for these cases Signed-off-by: Martin Fuzzey <mfuzzey@parkeon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:15:58 -04:00
Daniel Borkmann	8ed781668d	flow_keys: include thoff into flow_keys for later usage In skb_flow_dissect(), we perform a dissection of a skbuff. Since we're doing the work here anyway, also store thoff for a later usage, e.g. in the BPF filter. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:14:36 -04:00
David S. Miller	e0ebcb80cb	Merge branch 'l2tp' Tom Parkin says: ==================== This l2tp bugfix patchset addresses a number of issues. The first five patches in the series prevent l2tp sessions pinning an l2tp tunnel open. This occurs because the l2tp tunnel is torn down in the tunnel socket destructor, but each session holds a tunnel socket reference which prevents tunnels with sessions being deleted. The solution I've implemented here involves adding a .destroy hook to udp code, as discussed previously on netdev[1]. The subsequent seven patches address futher bugs exposed by fixing the problem above, or exposed through stress testing the implementation above. Patch 11 (avoid deadlock in l2tp stats update) isn't directly related to tunnel/session lifetimes, but it does prevent deadlocks on i386 kernels running on 64 bit hardware. This patchset has been tested on 32 and 64 bit preempt/non-preempt kernels, using iproute2, openl2tp, and custom-made stress test code. [1] http://comments.gmane.org/gmane.linux.network/259169 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:10:46 -04:00
Tom Parkin	f6e16b299b	l2tp: unhash l2tp sessions on delete, not on free If we postpone unhashing of l2tp sessions until the structure is freed, we risk: 1. further packets arriving and getting queued while the pseudowire is being closed down 2. the recv path hitting "scheduling while atomic" errors in the case that recv drops the last reference to a session and calls l2tp_session_free while in atomic context As such, l2tp sessions should be unhashed from l2tp_core data structures early in the teardown process prior to calling pseudowire close. For pseudowires like l2tp_ppp which have multiple shutdown codepaths, provide an unhash hook. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:10:39 -04:00
Tom Parkin	7b7c0719cd	l2tp: avoid deadlock in l2tp stats update l2tp's u64_stats writers were incorrectly synchronised, making it possible to deadlock a 64bit machine running a 32bit kernel simply by sending the l2tp code netlink commands while passing data through l2tp sessions. Previous discussion on netdev determined that alternative solutions such as spinlock writer synchronisation or per-cpu data would bring unjustified overhead, given that most users interested in high volume traffic will likely be running 64bit kernels on 64bit hardware. As such, this patch replaces l2tp's use of u64_stats with atomic_long_t, thereby avoiding the deadlock. Ref: http://marc.info/?l=linux-netdev&m=134029167910731&w=2 http://marc.info/?l=linux-netdev&m=134079868111131&w=2 Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:10:39 -04:00
Tom Parkin	cf2f5c886a	l2tp: push all ppp pseudowire shutdown through .release handler If userspace deletes a ppp pseudowire using the netlink API, either by directly deleting the session or by deleting the tunnel that contains the session, we need to tear down the corresponding pppox channel. Rather than trying to manage two pppox unbind codepaths, switch the netlink and l2tp_core session_close handlers to close via. the l2tp_ppp socket .release handler. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:10:39 -04:00
Tom Parkin	4c6e2fd354	l2tp: purge session reorder queue on delete Add calls to l2tp_session_queue_purge as a part of l2tp_tunnel_closeall and l2tp_session_delete. Pseudowire implementations which are deleted only via. l2tp_core l2tp_session_delete calls can dispense with their own code for flushing the reorder queue. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:10:39 -04:00
Tom Parkin	48f72f92b3	l2tp: add session reorder queue purge function to core If an l2tp session is deleted, it is necessary to delete skbs in-flight on the session's reorder queue before taking it down. Rather than having each pseudowire implementation reaching into the l2tp_session struct to handle this itself, provide a function in l2tp_core to purge the session queue. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:10:39 -04:00
Tom Parkin	02d13ed5f9	l2tp: don't BUG_ON sk_socket being NULL It is valid for an existing struct sock object to have a NULL sk_socket pointer, so don't BUG_ON in l2tp_tunnel_del_work if that should occur. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:10:39 -04:00
Tom Parkin	8abbbe8ff5	l2tp: take a reference for kernel sockets in l2tp_tunnel_sock_lookup When looking up the tunnel socket in struct l2tp_tunnel, hold a reference whether the socket was created by the kernel or by userspace. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:10:38 -04:00
Tom Parkin	2b551c6e7d	l2tp: close sessions before initiating tunnel delete When a user deletes a tunnel using netlink, all the sessions in the tunnel should also be deleted. Since running sessions will pin the tunnel socket with the references they hold, have the l2tp_tunnel_delete close all sessions in a tunnel before finally closing the tunnel socket. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:10:38 -04:00
Tom Parkin	936063175a	l2tp: close sessions in ip socket destroy callback l2tp_core hooks UDP's .destroy handler to gain advance warning of a tunnel socket being closed from userspace. We need to do the same thing for IP-encapsulation sockets. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:10:38 -04:00
Tom Parkin	e34f4c7050	l2tp: export l2tp_tunnel_closeall l2tp_core internally uses l2tp_tunnel_closeall to close all sessions in a tunnel when a UDP-encapsulation socket is destroyed. We need to do something similar for IP-encapsulation sockets. Export l2tp_tunnel_closeall as a GPL symbol to enable l2tp_ip and l2tp_ip6 to call it from their .destroy handlers. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:10:38 -04:00
Tom Parkin	9980d001ce	l2tp: add udp encap socket destroy handler L2TP sessions hold a reference to the tunnel socket to prevent it going away while sessions are still active. However, since tunnel destruction is handled by the sock sk_destruct callback there is a catch-22: a tunnel with sessions cannot be deleted since each session holds a reference to the tunnel socket. If userspace closes a managed tunnel socket, or dies, the tunnel will persist and it will be neccessary to individually delete the sessions using netlink commands. This is ugly. To prevent this occuring, this patch leverages the udp encapsulation socket destroy callback to gain early notification when the tunnel socket is closed. This allows us to safely close the sessions running in the tunnel, dropping the tunnel socket references in the process. The tunnel socket is then destroyed as normal, and the tunnel resources deallocated in sk_destruct. While we're at it, ensure that l2tp_tunnel_closeall correctly drops session references to allow the sessions to be deleted rather than leaking. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:10:38 -04:00
Tom Parkin	44046a593e	udp: add encap_destroy callback Users of udp encapsulation currently have an encap_rcv callback which they can use to hook into the udp receive path. In situations where a encapsulation user allocates resources associated with a udp encap socket, it may be convenient to be able to also hook the proto .destroy operation. For example, if an encap user holds a reference to the udp socket, the destroy hook might be used to relinquish this reference. This patch adds a socket destroy hook into udp, which is set and enabled in the same way as the existing encap_rcv hook. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:10:38 -04:00
Masatake YAMATO	f1e79e2080	genetlink: trigger BUG_ON if a group name is too long Trigger BUG_ON if a group name is longer than GENL_NAMSIZ. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:05:51 -04:00
David S. Miller	90b2621fd4	Merge branch 'master' of git://1984.lsi.us.es/nf Pablo Neira Ayuso says: ==================== The following patchset contains 7 Netfilter/IPVS fixes for 3.9-rc, they are: * Restrict IPv6 stateless NPT targets to the mangle table. Many users are complaining that this target does not work in the nat table, which is the wrong table for it, from Florian Westphal. * Fix possible use before initialization in the netns init path of several conntrack protocol trackers (introduced recently while improving conntrack netns support), from Gao Feng. * Fix incorrect initialization of copy_range in nfnetlink_queue, spotted by Eric Dumazet during the NFWS2013, patch from myself. * Fix wrong calculation of next SCTP chunk in IPVS, from Julian Anastasov. * Remove rcu_read_lock section in IPVS while calling ipv4_update_pmtu not required anymore after change introduced in 3.7, again from Julian. * Fix SYN looping in IPVS state sync if the backup is used a real server in DR/TUN modes, this required a new /proc entry to disable the director function when acting as backup, also from Julian. * Remove leftover IP_NF_QUEUE Kconfig after ip_queue removal, noted by Paul Bolle. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 10:23:52 -04:00

1 2 3 4 5 ...

362011 commits