alistair23-linux/net/ipv6
Eric Dumazet 05255b823a tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive
When adding tcp mmap() implementation, I forgot that socket lock
had to be taken before current->mm->mmap_sem. syzbot eventually caught
the bug.

Since we can not lock the socket in tcp mmap() handler we have to
split the operation in two phases.

1) mmap() on a tcp socket simply reserves VMA space, and nothing else.
  This operation does not involve any TCP locking.

2) getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements
 the transfert of pages from skbs to one VMA.
  This operation only uses down_read(&current->mm->mmap_sem) after
  holding TCP lock, thus solving the lockdep issue.

This new implementation was suggested by Andy Lutomirski with great details.

Benefits are :

- Better scalability, in case multiple threads reuse VMAS
   (without mmap()/munmap() calls) since mmap_sem wont be write locked.

- Better error recovery.
   The previous mmap() model had to provide the expected size of the
   mapping. If for some reason one part could not be mapped (partial MSS),
   the whole operation had to be aborted.
   With the tcp_zerocopy_receive struct, kernel can report how
   many bytes were successfuly mapped, and how many bytes should
   be read to skip the problematic sequence.

- No more memory allocation to hold an array of page pointers.
  16 MB mappings needed 32 KB for this array, potentially using vmalloc() :/

- skbs are freed while mmap_sem has been released

Following patch makes the change in tcp_mmap tool to demonstrate
one possible use of mmap() and setsockopt(... TCP_ZEROCOPY_RECEIVE ...)

Note that memcg might require additional changes.

Fixes: 93ab6cc691 ("tcp: implement mmap() for zero copy receive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Cc: linux-mm@kvack.org
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 21:29:55 -04:00
..
ila net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
netfilter netfilter: nf_tables: NAT chain and extensions require NF_TABLES 2018-04-19 12:31:34 +02:00
addrconf.c ipv6: addrconf: don't evaluate keep_addr_on_down twice 2018-04-25 13:03:37 -04:00
addrconf_core.c net: ipv6: Make inet6addr_validator a blocking notifier 2017-10-20 13:15:07 +01:00
addrlabel.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
af_inet6.c tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive 2018-04-29 21:29:55 -04:00
ah6.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-11-15 11:56:19 -08:00
anycast.c net/ipv6: Remove aca_idev 2018-04-19 15:40:13 -04:00
calipso.c net, calipso: convert calipso_doi.refcount from atomic_t to refcount_t 2017-07-04 22:35:16 +01:00
datagram.c ipv6: add a wrapper for ip6_dst_store() with flowi6 checks 2018-04-04 11:31:57 -04:00
esp6.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-01-17 00:10:42 -05:00
esp6_offload.c esp: check the NETIF_F_HW_ESP_TX_CSUM bit before segmenting 2018-02-27 10:46:01 +01:00
exthdrs.c ipv6: Count interface receive statistics on the ingress netdev 2018-04-17 13:39:51 -04:00
exthdrs_core.c inet: whitespace cleanup 2018-02-28 11:43:28 -05:00
exthdrs_offload.c
fib6_notifier.c net: Add module reference to FIB notifiers 2017-09-01 20:33:42 -07:00
fib6_rules.c net: fib_rules: add extack support 2018-04-23 10:21:24 -04:00
fou6.c
icmp.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
inet6_connection_sock.c
inet6_hashtables.c inet: Add a 2nd listener hashtable (port+addr) 2017-12-03 10:18:28 -05:00
ip6_checksum.c udplite: fix partial checksum initialization 2018-02-16 15:57:42 -05:00
ip6_fib.c net/ipv6: Make from in rt6_info rcu protected 2018-04-21 16:06:14 -04:00
ip6_flowlabel.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
ip6_gre.c ip6_gre: better validate user provided tunnel names 2018-04-05 15:16:15 -04:00
ip6_icmp.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ip6_input.c ipv6: Count interface receive statistics on the ingress netdev 2018-04-17 13:39:51 -04:00
ip6_offload.c udp: add udp gso 2018-04-26 15:07:42 -04:00
ip6_offload.h
ip6_output.c udp: paged allocation with gso 2018-04-26 15:08:15 -04:00
ip6_tunnel.c ip6_tunnel: better validate user provided tunnel names 2018-04-05 15:16:15 -04:00
ip6_udp_tunnel.c
ip6_vti.c vti6: better validate user provided tunnel names 2018-04-05 15:16:15 -04:00
ip6mr.c net: fib_rules: add extack support 2018-04-23 10:21:24 -04:00
ipcomp6.c
ipv6_sockglue.c inet: whitespace cleanup 2018-02-28 11:43:28 -05:00
Kconfig ipmr,ipmr6: Define a uniform vif_device 2018-03-01 13:13:23 -05:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mcast.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
mcast_snoop.c
mip6.c
ndisc.c net/ipv6: Rename fib6_info struct elements 2018-04-19 15:40:12 -04:00
netfilter.c netfilter: use skb_to_full_sk in ip6_route_me_harder 2018-02-25 20:51:13 +01:00
output_core.c net: accept UFO datagrams from tuntap and packet 2017-11-24 01:37:35 +09:00
ping.c ipv6: allow to cache dst for a connected sk in ip6_sk_dst_lookup_flow() 2018-04-04 11:31:57 -04:00
proc.c inet: frags: break the 2GB limit for frags storage 2018-03-31 23:25:39 -04:00
protocol.c
raw.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
reassembly.c ipv6: frags: fix a lockdep false positive 2018-04-18 23:19:39 -04:00
route.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-24 23:59:11 -04:00
seg6.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
seg6_hmac.c ipv6: sr: Use ARRAY_SIZE macro 2017-09-01 18:35:23 -07:00
seg6_iptunnel.c ipv6: sr: Compute flowlabel for outer IPv6 header of seg6 encap mode 2018-04-25 13:02:15 -04:00
seg6_local.c net/ipv6: Pass skb to route lookup 2018-03-04 13:04:22 -05:00
sit.c ipv6: sit: better validate user provided tunnel names 2018-04-05 15:16:15 -04:00
syncookies.c net/ipv4: disable SMC TCP option with SYN Cookies 2018-03-25 20:53:54 -04:00
sysctl_net_ipv6.c ipv6: sr: Compute flowlabel for outer IPv6 header of seg6 encap mode 2018-04-25 13:02:15 -04:00
tcp_ipv6.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2018-03-31 23:33:04 -04:00
tcpv6_offload.c gso: validate gso_type in GSO handlers 2018-01-22 16:01:30 -05:00
tunnel6.c
udp.c udp: add gso segment cmsg 2018-04-26 15:08:51 -04:00
udp_impl.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
udp_offload.c udp: add udp gso 2018-04-26 15:07:42 -04:00
udplite.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
xfrm6_input.c xfrm: Reinject transport-mode packets through tasklet 2017-12-19 08:23:21 +01:00
xfrm6_mode_beet.c networking: make skb_pull & friends return void pointers 2017-06-16 11:48:39 -04:00
xfrm6_mode_ro.c ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt() 2017-06-02 13:57:27 -04:00
xfrm6_mode_transport.c ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt() 2017-06-02 13:57:27 -04:00
xfrm6_mode_tunnel.c xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto 2018-03-07 10:54:29 +01:00
xfrm6_output.c net: xfrm: use skb_gso_validate_network_len() to check gso sizes 2018-03-04 17:49:17 -05:00
xfrm6_policy.c net/ipv6: Remove unused code and variables for rt6_info 2018-04-17 23:41:18 -04:00
xfrm6_protocol.c
xfrm6_state.c inet: whitespace cleanup 2018-02-28 11:43:28 -05:00
xfrm6_tunnel.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00