remarkable-linux/net/ipv4
Neal Cardwell 5ae344c949 tcp: reduce spurious retransmits due to transient SACK reneging
This commit reduces spurious retransmits due to apparent SACK reneging
by only reacting to SACK reneging that persists for a short delay.

When a sequence space hole at snd_una is filled, some TCP receivers
send a series of ACKs as they apparently scan their out-of-order queue
and cumulatively ACK all the packets that have now been consecutiveyly
received. This is essentially misbehavior B in "Misbehaviors in TCP
SACK generation" ACM SIGCOMM Computer Communication Review, April
2011, so we suspect that this is from several common OSes (Windows
2000, Windows Server 2003, Windows XP). However, this issue has also
been seen in other cases, e.g. the netdev thread "TCP being hoodwinked
into spurious retransmissions by lack of timestamps?" from March 2014,
where the receiver was thought to be a BSD box.

Since snd_una would temporarily be adjacent to a previously SACKed
range in these scenarios, this receiver behavior triggered the Linux
SACK reneging code path in the sender. This led the sender to clear
the SACK scoreboard, enter CA_Loss, and spuriously retransmit
(potentially) every packet from the entire write queue at line rate
just a few milliseconds before the ACK for each packet arrives at the
sender.

To avoid such situations, now when a sender sees apparent reneging it
does not yet retransmit, but rather adjusts the RTO timer to give the
receiver a little time (max(RTT/2, 10ms)) to send us some more ACKs
that will restore sanity to the SACK scoreboard. If the reneging
persists until this RTO then, as before, we clear the SACK scoreboard
and enter CA_Loss.

A 10ms delay tolerates a receiver sending such a stream of ACKs at
56Kbit/sec. And to allow for receivers with slower or more congested
paths, we wait for at least RTT/2.

We validated the resulting max(RTT/2, 10ms) delay formula with a mix
of North American and South American Google web server traffic, and
found that for ACKs displaying transient reneging:

 (1) 90% of inter-ACK delays were less than 10ms
 (2) 99% of inter-ACK delays were less than RTT/2

In tests on Google web servers this commit reduced reneging events by
75%-90% (as measured by the TcpExtTCPSACKReneging counter), without
any measurable impact on latency for user HTTP and SPDY requests.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05 16:29:33 -07:00
..
netfilter netfilter: kill remnants of ulog targets 2014-07-25 14:55:44 +02:00
af_inet.c net-gre-gro: Fix a bug that breaks the forwarding path 2014-07-16 14:45:26 -07:00
ah4.c ah4: Use the IPsec protocol multiplexer API 2014-02-25 07:04:17 +01:00
arp.c ipv4: arp: update neighbour address when a gratuitous arp is received and arp_accept is set 2014-01-02 00:08:38 -05:00
cipso_ipv4.c ipv4: ERROR: code indent should use tabs where possible 2013-12-26 13:43:21 -05:00
datagram.c net: Save TX flow hash in sock and set in skbuf on xmit 2014-07-07 21:14:21 -07:00
devinet.c ipv4: fail early when creating netdev named all or default 2014-07-29 11:43:50 -07:00
esp4.c esp4: Use the IPsec protocol multiplexer API 2014-02-25 07:04:17 +01:00
fib_frontend.c ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif 2014-04-16 15:05:11 -04:00
fib_lookup.h ipv4: make fib_detect_death static 2013-12-28 17:01:46 -05:00
fib_rules.c inet: fix NULL pointer Oops in fib(6)_rule_suppress 2013-12-10 17:54:23 -05:00
fib_semantics.c ipv4: fib_semantics: increment fib_info_cnt after fib_info allocation 2014-05-07 17:14:32 -04:00
fib_trie.c net: Revert "fib_trie: use seq_file_net rather than seq->private" 2014-06-04 15:11:41 -07:00
gre_demux.c GRE: enable offloads for GRE 2014-07-11 13:53:39 -07:00
gre_offload.c net/ipv4: Use IS_ERR_OR_NULL 2014-07-29 15:31:56 -07:00
icmp.c ipv4: remove nested rcu_read_lock/unlock 2014-08-02 15:27:35 -07:00
igmp.c igmp: remove exceptional & on function name 2014-07-24 23:23:31 -07:00
inet_connection_sock.c ipv4: make ip_local_reserved_ports per netns 2014-05-14 15:31:45 -04:00
inet_diag.c inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait sockets 2014-01-13 22:35:46 -08:00
inet_fragment.c inet: frags: use kmem_cache for inet_frag_queue 2014-08-02 15:31:31 -07:00
inet_hashtables.c net: Use a more standard macro for INET_ADDR_COOKIE 2014-05-14 16:07:23 -04:00
inet_lro.c lro: remove dead code 2013-12-29 16:34:25 -05:00
inet_timewait_sock.c tcp/dccp: remove twchain 2013-10-08 23:19:24 -04:00
inetpeer.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-06-12 14:27:40 -07:00
ip_forward.c net: rename local_df to ignore_df 2014-05-12 14:03:41 -04:00
ip_fragment.c inet: frags: use kmem_cache for inet_frag_queue 2014-08-02 15:31:31 -07:00
ip_gre.c gre: allow changing mac address when device is up 2014-06-10 22:46:42 -07:00
ip_input.c net: Fix memory leak if TPROXY used with TCP early demux 2014-01-27 16:22:11 -08:00
ip_options.c ipv4: fix buffer overflow in ip_options_compile() 2014-07-21 20:16:26 -07:00
ip_output.c net-timestamp: SOCK_RAW and PING timestamping 2014-07-15 16:32:45 -07:00
ip_sockglue.c ipv4: clean up cast warning in do_ip_getsockopt 2014-07-29 16:31:16 -07:00
ip_tunnel.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-07-16 14:09:34 -07:00
ip_tunnel_core.c net: Support for multiple checksums with gso 2014-06-04 22:46:38 -07:00
ip_vti.c vti: Simplify error handling in module init and exit 2014-06-26 08:21:57 +02:00
ipcomp.c ipcomp4: Use the IPsec protocol multiplexer API 2014-02-25 07:04:17 +01:00
ipconfig.c ipconfig: Only bootp paths should reference ic_dev_xid. 2014-07-09 22:25:18 -07:00
ipip.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-06-11 16:02:55 -07:00
ipmr.c net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
Kconfig udp: Add udp_sock_create for UDP tunnels to open listener socket 2014-07-14 16:12:15 -07:00
Makefile udp: Add udp_sock_create for UDP tunnels to open listener socket 2014-07-14 16:12:15 -07:00
netfilter.c netfilter: remove double colon 2014-02-19 11:41:25 +01:00
ping.c net: use inet6_iif instead of IP6CB()->iif 2014-07-31 22:37:06 -07:00
proc.c inet: frag: don't account number of fragment queues 2014-07-27 22:34:36 -07:00
protocol.c net: remove outdated comment for ipv4 and ipv6 protocol handler 2013-11-28 18:47:51 -05:00
raw.c ipv4: Make IP_MULTICAST_ALL and IP_MSFILTER work on raw sockets 2014-07-23 15:13:26 -07:00
route.c ip: make IP identifiers less predictable 2014-07-28 18:46:34 -07:00
syncookies.c tcp: cookie_v4_init_sequence: skb should be const 2014-06-27 15:53:35 -07:00
sysctl_net_ipv4.c ipv4: make ip_local_reserved_ports per netns 2014-05-14 15:31:45 -04:00
tcp.c tcp: Fix divide by zero when pushing during tcp-repair 2014-07-02 18:21:03 -07:00
tcp_bic.c tcp: remove in_flight parameter from cong_avoid() methods 2014-05-03 19:23:07 -04:00
tcp_cong.c tcp: remove in_flight parameter from cong_avoid() methods 2014-05-03 19:23:07 -04:00
tcp_cubic.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-05-12 13:19:14 -04:00
tcp_diag.c inet_diag: Rename inet_diag_req into inet_diag_req_v2 2012-01-11 12:56:06 -08:00
tcp_fastopen.c tcp: remove unnecessary tcp_sk assignment. 2014-06-16 21:35:00 -07:00
tcp_highspeed.c tcp: remove in_flight parameter from cong_avoid() methods 2014-05-03 19:23:07 -04:00
tcp_htcp.c tcp: remove in_flight parameter from cong_avoid() methods 2014-05-03 19:23:07 -04:00
tcp_hybla.c tcp: remove in_flight parameter from cong_avoid() methods 2014-05-03 19:23:07 -04:00
tcp_illinois.c tcp: remove in_flight parameter from cong_avoid() methods 2014-05-03 19:23:07 -04:00
tcp_input.c tcp: reduce spurious retransmits due to transient SACK reneging 2014-08-05 16:29:33 -07:00
tcp_ipv4.c tcp: md5: remove unneeded check in tcp_v4_parse_md5_keys 2014-08-04 15:09:45 -07:00
tcp_lp.c tcp: remove in_flight parameter from cong_avoid() methods 2014-05-03 19:23:07 -04:00
tcp_memcontrol.c cgroup: replace cftype->trigger() with cftype->write() 2014-05-13 12:16:21 -04:00
tcp_metrics.c tcp: don't require root to read tcp_metrics 2014-07-31 14:07:37 -07:00
tcp_minisocks.c inet: move ipv6only in sock_common 2014-07-01 23:46:21 -07:00
tcp_offload.c net-gre-gro: Fix a bug that breaks the forwarding path 2014-07-16 14:45:26 -07:00
tcp_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-07-16 14:09:34 -07:00
tcp_probe.c tcp: switch rtt estimations to usec resolution 2014-02-26 17:08:40 -05:00
tcp_scalable.c tcp: remove in_flight parameter from cong_avoid() methods 2014-05-03 19:23:07 -04:00
tcp_timer.c tcp: reduce spurious retransmits due to transient SACK reneging 2014-08-05 16:29:33 -07:00
tcp_vegas.c tcp: remove in_flight parameter from cong_avoid() methods 2014-05-03 19:23:07 -04:00
tcp_vegas.h net: ipv4/ipv6: Remove extern from function prototypes 2013-10-19 19:12:11 -04:00
tcp_veno.c tcp: remove in_flight parameter from cong_avoid() methods 2014-05-03 19:23:07 -04:00
tcp_westwood.c tcp: remove unused min_cwnd member of tcp_congestion_ops 2014-02-13 18:22:34 -05:00
tcp_yeah.c tcp: remove in_flight parameter from cong_avoid() methods 2014-05-03 19:23:07 -04:00
tunnel4.c net: Convert printks to pr_<level> 2012-03-11 23:42:51 -07:00
udp.c sock: remove skb argument from sk_rcvqueues_full 2014-07-23 13:23:06 -07:00
udp_diag.c netlink: rename ssk to sk in struct netlink_skb_params 2013-04-19 14:57:56 -04:00
udp_impl.h net: ipv4/ipv6: Remove extern from function prototypes 2013-10-19 19:12:11 -04:00
udp_offload.c net/udp_offload: Use IS_ERR_OR_NULL 2014-07-29 15:31:56 -07:00
udp_tunnel.c udp: Add udp_sock_create for UDP tunnels to open listener socket 2014-07-14 16:12:15 -07:00
udplite.c net: Eliminate no_check from protosw 2014-05-23 16:28:53 -04:00
xfrm4_input.c xfrm4: Add IPsec protocol multiplexer 2014-02-25 07:04:16 +01:00
xfrm4_mode_beet.c ipv4: ERROR: code indent should use tabs where possible 2013-12-26 13:43:21 -05:00
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c inetpeer: get rid of ip_id_count 2014-06-02 11:00:41 -07:00
xfrm4_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-05-24 00:32:30 -04:00
xfrm4_policy.c xfrm: Introduce xfrm_input_afinfo to access the the callbacks properly 2014-03-14 07:28:07 +01:00
xfrm4_protocol.c xfrm4: Remove duplicate semicolon 2014-06-30 07:49:47 +02:00
xfrm4_state.c inet: make no_pmtu_disc per namespace and kill ipv4_config 2013-12-18 16:58:20 -05:00
xfrm4_tunnel.c sit: add IPv4 over IPv4 support 2013-05-31 17:19:05 -07:00