alistair23-linux/net/ipv4
Daniel Borkmann 492135557d tcp: add rfc3168, section 6.1.1.1. fallback
This work is a follow-up to commit f7b3bec6f5 ("net: allow setting ecn
via routing table") and adds the RFC3168 section 6.1.1.1. fallback for
outgoing ECN connections. In other words, on the first timeout this work
retries with a non-ECN setup SYN packet, as suggested by the RFC:

  [...] A host that receives no reply to an ECN-setup SYN within the
  normal SYN retransmission timeout interval MAY resend the SYN and
  any subsequent SYN retransmissions with CWR and ECE cleared. [...]
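The following is a minimal userspace sketch of the flag handling described above. It assumes the standard TCP header bit values for SYN, ECE and CWR. The names F_SYN, is_ecn_setup_syn() and syn_rtx_flags() are hypothetical and are not the kernel's API. The sketch only models which flags a SYN retransmission carries:

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard TCP header flag bits (RFC 793 / RFC 3168 positions). */
#define F_SYN 0x02
#define F_ECE 0x40
#define F_CWR 0x80

/* An ECN-setup SYN carries SYN, ECE and CWR together (RFC 3168, 6.1.1). */
static bool is_ecn_setup_syn(uint8_t flags)
{
	const uint8_t ecn_setup = F_SYN | F_ECE | F_CWR;

	return (flags & ecn_setup) == ecn_setup;
}

/* Hypothetical helper: flags to use for a SYN retransmission. With the
 * fallback enabled, an ECN-setup SYN is resent with ECE and CWR cleared;
 * otherwise the SYN is resent unchanged. */
static uint8_t syn_rtx_flags(uint8_t flags, bool fallback)
{
	if (fallback && is_ecn_setup_syn(flags))
		flags &= (uint8_t)~(F_ECE | F_CWR);
	return flags;
}
```

For example, syn_rtx_flags(F_SYN | F_ECE | F_CWR, true) yields a plain F_SYN. Passing false keeps all three flags set.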

Schematic client-side view, assuming the server is in tcp_ecn=2 mode,
that is, the Linux default since 2009 via commit 255cac91c3 ("tcp: extend
ECN sysctl to allow server-side only ECN"):

 1) Normal ECN-capable path:

    SYN ECE CWR ----->
                <----- SYN ACK ECE
            ACK ----->

 2) Path with broken middlebox, when client has fallback:

    SYN ECE CWR ----X crappy middlebox drops packet
                      (timeout, rtx)
            SYN ----->
                <----- SYN ACK
            ACK ----->

Without the fallback, the middlebox drop point would basically end up
as:

    SYN ECE CWR ----X crappy middlebox drops packet
                      (timeout, rtx)
    SYN ECE CWR ----X crappy middlebox drops packet
                      (timeout, rtx)
    SYN ECE CWR ----X crappy middlebox drops packet
                      (timeout, rtx)
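The difference between the two drop scenarios can be reproduced with a toy simulation. The middlebox model (drop any SYN with ECE or CWR set) and the function names are hypothetical. Each loop iteration stands for one SYN transmission, followed by a retransmission timeout if that SYN is dropped:

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard TCP header flag bits. */
#define F_SYN 0x02
#define F_ECE 0x40
#define F_CWR 0x80

/* Hypothetical "crappy middlebox": drops any SYN carrying ECE or CWR. */
static bool middlebox_passes(uint8_t flags)
{
	return (flags & (F_ECE | F_CWR)) == 0;
}

/* Return how many SYNs were sent until one got through, or -1 if all
 * max_syns transmissions were dropped. */
static int syns_until_connected(int max_syns, bool fallback)
{
	uint8_t flags = F_SYN | F_ECE | F_CWR; /* initial ECN-setup SYN */

	for (int n = 1; n <= max_syns; n++) {
		if (middlebox_passes(flags))
			return n;
		/* timeout, rtx: clear ECE/CWR only if the fallback is enabled */
		if (fallback)
			flags &= (uint8_t)~(F_ECE | F_CWR);
	}
	return -1;
}
```

With the fallback, syns_until_connected(6, true) returns 2: the original ECN-setup SYN plus one plain retransmission. By contrast, syns_until_connected(6, false) returns -1 because every retransmission is dropped. In the kernel, net.ipv4.tcp_syn_retries bounds the number of SYN retransmissions.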

In any case, such additional setup latency occurs only for a rather small
percentage of sites. At the end of 2014, ~56% of IPv4 and 65% of IPv6
servers in the Alexa top 1 million list were found to negotiate ECN
(i.e. the tcp_ecn=2 default), and 0.42% of these web servers fail to
connect when the client tries to negotiate ECN (tcp_ecn=1) due to
timeouts. The fallback mitigates this at the cost of slightly higher setup
latency. A recent paper on this topic:

  Brian Trammell, Mirja Kühlewind, Damiano Boppart, Iain Learmonth,
  Gorry Fairhurst, and Richard Scheffenegger:
    "Enabling Internet-Wide Deployment of Explicit Congestion Notification."
    Proc. PAM 2015, New York.
  http://ecn.ethz.ch/ecn-pam15.pdf

Thus, when net.ipv4.tcp_ecn=1 is set, the patch performs the RFC3168,
section 6.1.1.1. fallback on timeout. Some users explicitly do not want
this, as can be the case in data center (DC) deployments. For them, we
add a net.ipv4.tcp_ecn_fallback knob that allows disabling the fallback.
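Here is a sketch of how a userspace tool might inspect the two knobs. It assumes the usual /proc/sys mapping of net.ipv4.tcp_ecn and net.ipv4.tcp_ecn_fallback. The helpers are hypothetical, and their policy is a simplification: the per-route ECN feature from commit f7b3bec6f5 is ignored.

```c
#include <stdbool.h>
#include <stdio.h>

/* Read one integer from a sysctl file such as /proc/sys/net/ipv4/tcp_ecn.
 * Return def if the file is missing or does not start with an integer. */
static int read_sysctl_int(const char *path, int def)
{
	FILE *f = fopen(path, "r");
	int val;

	if (!f)
		return def;
	if (fscanf(f, "%d", &val) != 1)
		val = def;
	fclose(f);
	return val;
}

/* Simplified policy: only tcp_ecn=1 requests ECN on outgoing connections
 * (tcp_ecn=2 merely answers incoming requests). The fallback applies to
 * those connections unless tcp_ecn_fallback is 0. */
static bool client_uses_fallback(int tcp_ecn, int tcp_ecn_fallback)
{
	return tcp_ecn == 1 && tcp_ecn_fallback != 0;
}
```

For example, read_sysctl_int("/proc/sys/net/ipv4/tcp_ecn_fallback", -1) returns the current setting on kernels that have this knob, and -1 where the file does not exist. Disabling the fallback itself is done with `sysctl -w net.ipv4.tcp_ecn_fallback=0`.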

tp->ecn_flags are not cleared in tcp_ecn_clear_syn() on output. Rather,
we let tcp_ecn_rcv_synack() take care of that on the input path, in case
a SYN ACK ECE was merely delayed. Thus, a spurious SYN retransmission will
not prevent ECN from eventually being negotiated in that case.
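The input-path decision can be modeled as follows. This is a simplified sketch that assumes the RFC 3168 rule that an ECN-setup SYN ACK has ECE set and CWR clear. struct conn_state and rcv_synack() are hypothetical stand-ins for tp->ecn_flags and tcp_ecn_rcv_synack(), not the kernel's data structures:

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard TCP header flag bits. */
#define F_ECE 0x40
#define F_CWR 0x80

/* Hypothetical per-connection state standing in for tp->ecn_flags. */
struct conn_state {
	bool ecn_ok; /* set when the ECN-setup SYN was first sent */
};

/* Input path on SYN ACK: keep ECN only if the SYN ACK has ECE set and
 * CWR clear; otherwise disable ECN for this connection. The output path
 * never touches ecn_ok, even after a fallback retransmission. */
static void rcv_synack(struct conn_state *c, uint8_t synack_flags)
{
	bool ece = synack_flags & F_ECE;
	bool cwr = synack_flags & F_CWR;

	if (c->ecn_ok && (!ece || cwr))
		c->ecn_ok = false;
}
```

The retransmitted SYN only had its header flags cleared, so ecn_ok is still set when a delayed SYN ACK ECE arrives, and ECN is negotiated. A plain SYN ACK clears it.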

Reference: https://www.ietf.org/proceedings/92/slides/slides-92-iccrg-1.pdf
Reference: https://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
Signed-off-by: Brian Trammell <trammell@tik.ee.ethz.ch>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-19 16:53:37 -04:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2015-05-18 14:47:36 -04:00
af_inet.c net: Modify sk_alloc to not reference count the netns of kernel sockets. 2015-05-11 10:50:18 -04:00
ah4.c
arp.c netfilter: Pass socket pointer down through okfn(). 2015-04-07 15:25:55 -04:00
cipso_ipv4.c ipv4: coding style: comparison for inequality with NULL 2015-04-03 12:11:15 -04:00
datagram.c
devinet.c ipv4: coding style: comparison for inequality with NULL 2015-04-03 12:11:15 -04:00
esp4.c ipv4: coding style: comparison for equality with NULL 2015-04-03 12:11:15 -04:00
fib_frontend.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-04-06 22:34:15 -04:00
fib_lookup.h
fib_rules.c ipv4: coding style: comparison for equality with NULL 2015-04-03 12:11:15 -04:00
fib_semantics.c ipv4: remove the unnecessary codes in fib_info_hash_move 2015-05-02 22:17:44 -04:00
fib_trie.c switchdev: s/netdev_switch_/switchdev_/ and s/NETDEV_SWITCH_/SWITCHDEV_/ 2015-05-12 18:43:52 -04:00
fou.c fou: avoid missing unlock in failure path 2015-04-16 12:11:19 -04:00
geneve_core.c geneve_core: identify as driver library in modules description 2015-05-13 15:59:13 -04:00
gre_demux.c
gre_offload.c ipv4: coding style: comparison for inequality with NULL 2015-04-03 12:11:15 -04:00
icmp.c ipv4: coding style: comparison for equality with NULL 2015-04-03 12:11:15 -04:00
igmp.c net: Export IGMP/MLD message validation code 2015-05-04 14:49:23 -04:00
inet_connection_sock.c inet: fix possible panic in reqsk_queue_unlink() 2015-04-24 11:39:15 -04:00
inet_diag.c tcp: prepare CC get_info() access from getsockopt() 2015-04-29 17:10:38 -04:00
inet_fragment.c ipv4: coding style: comparison for equality with NULL 2015-04-03 12:11:15 -04:00
inet_hashtables.c tcp/dccp: get rid of central timewait timer 2015-04-13 16:40:05 -04:00
inet_lro.c
inet_timewait_sock.c tcp/dccp: tw_timer_handler() is static 2015-05-13 15:21:33 -04:00
inetpeer.c
ip_forward.c ip_forward: Drop frames with attached skb->sk 2015-04-20 14:07:33 -04:00
ip_fragment.c IPv4: skip ICMP for bridge contrack users when defrag expires 2015-05-19 00:15:27 -04:00
ip_gre.c ipv4: coding style: comparison for equality with NULL 2015-04-03 12:11:15 -04:00
ip_input.c netfilter: Pass socket pointer down through okfn(). 2015-04-07 15:25:55 -04:00
ip_options.c ipv4: coding style: comparison for inequality with NULL 2015-04-03 12:11:15 -04:00
ip_output.c bridge_netfilter: No ICMP packet on IPv4 fragmentation error 2015-05-19 00:15:39 -04:00
ip_sockglue.c ipv4: coding style: comparison for inequality with NULL 2015-04-03 12:11:15 -04:00
ip_tunnel.c udp_tunnel: Pass UDP socket down through udp_tunnel{, 6}_xmit_skb(). 2015-04-07 15:29:08 -04:00
ip_tunnel_core.c ip_tunnel: Report Rx dropped in ip_tunnel_get_stats64 2015-05-14 22:30:54 -04:00
ip_vti.c ipv4: coding style: comparison for inequality with NULL 2015-04-03 12:11:15 -04:00
ipcomp.c ipv4: coding style: comparison for equality with NULL 2015-04-03 12:11:15 -04:00
ipconfig.c ipv4: coding style: comparison for equality with NULL 2015-04-03 12:11:15 -04:00
ipip.c ipip: fix one sparse error 2015-05-17 13:08:29 -04:00
ipmr.c netfilter: Pass socket pointer down through okfn(). 2015-04-07 15:25:55 -04:00
Kconfig geneve: Rename support library as geneve_core 2015-05-13 15:59:13 -04:00
Makefile geneve: Rename support library as geneve_core 2015-05-13 15:59:13 -04:00
netfilter.c netfilter: Use nf_hook_state in nf_queue_entry. 2015-04-04 12:25:22 -04:00
ping.c ipv4: Missing sk_nulls_node_init() in ping_unhash(). 2015-05-01 22:02:47 -04:00
proc.c tcp: add TCPWinProbe and TCPKeepAlive SNMP counters 2015-05-09 16:42:32 -04:00
protocol.c
raw.c Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-04-13 18:18:05 -04:00
route.c net: ipv4: route: Fix sending IGMP messages with link address 2015-05-04 00:04:08 -04:00
syncookies.c tcp: fix ipv4 mapped request socks 2015-03-25 00:57:48 -04:00
sysctl_net_ipv4.c tcp: add rfc3168, section 6.1.1.1. fallback 2015-05-19 16:53:37 -04:00
tcp.c tcp: Return error instead of partial read for saved syn headers 2015-05-19 16:33:34 -04:00
tcp_bic.c
tcp_cong.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-03-20 18:51:09 -04:00
tcp_cubic.c
tcp_dctcp.c tcp: prepare CC get_info() access from getsockopt() 2015-04-29 17:10:38 -04:00
tcp_diag.c ipv4: coding style: comparison for inequality with NULL 2015-04-03 12:11:15 -04:00
tcp_fastopen.c tcp: add tcpi_bytes_received to tcp_info 2015-04-29 17:10:37 -04:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c tcp: prepare CC get_info() access from getsockopt() 2015-04-29 17:10:38 -04:00
tcp_input.c tcp: allow one skb to be received per socket under memory pressure 2015-05-17 22:45:49 -04:00
tcp_ipv4.c tcp: add rfc3168, section 6.1.1.1. fallback 2015-05-19 16:53:37 -04:00
tcp_lp.c
tcp_memcontrol.c
tcp_metrics.c tcp: RFC7413 option support for Fast Open client 2015-04-07 18:36:39 -04:00
tcp_minisocks.c tcp: provide SYN headers for passive connections 2015-05-05 16:02:34 -04:00
tcp_offload.c
tcp_output.c tcp: add rfc3168, section 6.1.1.1. fallback 2015-05-19 16:53:37 -04:00
tcp_probe.c
tcp_scalable.c
tcp_timer.c tcp: introduce tcp_under_memory_pressure() 2015-05-17 22:45:48 -04:00
tcp_vegas.c tcp: prepare CC get_info() access from getsockopt() 2015-04-29 17:10:38 -04:00
tcp_vegas.h tcp: prepare CC get_info() access from getsockopt() 2015-04-29 17:10:38 -04:00
tcp_veno.c
tcp_westwood.c tcp_westwood: fix tcp_westwood_info() 2015-04-30 00:27:44 -04:00
tcp_yeah.c
tunnel4.c
udp.c net: remove extra newlines 2015-04-07 22:24:37 -04:00
udp_diag.c ipv4: coding style: comparison for equality with NULL 2015-04-03 12:11:15 -04:00
udp_impl.h
udp_offload.c ipv4: coding style: comparison for inequality with NULL 2015-04-03 12:11:15 -04:00
udp_tunnel.c net: Modify sk_alloc to not reference count the netns of kernel sockets. 2015-05-11 10:50:18 -04:00
udplite.c
xfrm4_input.c netfilter: Pass socket pointer down through okfn(). 2015-04-07 15:25:55 -04:00
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c ipv4: hash net ptr into fragmentation bucket selection 2015-03-25 14:07:04 -04:00
xfrm4_output.c netfilter: Pass socket pointer down through okfn(). 2015-04-07 15:25:55 -04:00
xfrm4_policy.c ipv4: coding style: comparison for equality with NULL 2015-04-03 12:11:15 -04:00
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c