alistair23-linux/net/ipv4
Andrey Ignatov d74bad4e74 bpf: Hooks for sys_connect
== The problem ==

See description of the problem in the initial patch of this patch set.

== The solution ==

The patch provides much more reliable in-kernel solution for the 2nd
part of the problem: making outgoing connecttion from desired IP.

It adds new attach types `BPF_CGROUP_INET4_CONNECT` and
`BPF_CGROUP_INET6_CONNECT` for program type
`BPF_PROG_TYPE_CGROUP_SOCK_ADDR` that can be used to override both
source and destination of a connection at connect(2) time.

Local end of connection can be bound to desired IP using newly
introduced BPF-helper `bpf_bind()`. It allows to bind to only IP though,
and doesn't support binding to port, i.e. leverages
`IP_BIND_ADDRESS_NO_PORT` socket option. There are two reasons for this:
* looking for a free port is expensive and can affect performance
  significantly;
* there is no use-case for port.

As for remote end (`struct sockaddr *` passed by user), both parts of it
can be overridden, remote IP and remote port. It's useful if an
application inside cgroup wants to connect to another application inside
same cgroup or to itself, but knows nothing about IP assigned to the
cgroup.

Support is added for IPv4 and IPv6, for TCP and UDP.

IPv4 and IPv6 have separate attach types for same reason as sys_bind
hooks, i.e. to prevent reading from / writing to e.g. user_ip6 fields
when user passes sockaddr_in since it'd be out-of-bound.

== Implementation notes ==

The patch introduces new field in `struct proto`: `pre_connect` that is
a pointer to a function with same signature as `connect` but is called
before it. The reason is in some cases BPF hooks should be called way
before control is passed to `sk->sk_prot->connect`. Specifically
`inet_dgram_connect` autobinds socket before calling
`sk->sk_prot->connect` and there is no way to call `bpf_bind()` from
hooks from e.g. `ip4_datagram_connect` or `ip6_datagram_connect` since
it'd cause double-bind. On the other hand `proto.pre_connect` provides a
flexible way to add BPF hooks for connect only for necessary `proto` and
call them at desired time before `connect`. Since `bpf_bind()` is
allowed to bind only to IP and autobind in `inet_dgram_connect` binds
only port there is no chance of double-bind.

bpf_bind() sets `force_bind_address_no_port` to bind to only IP despite
of value of `bind_address_no_port` socket field.

bpf_bind() sets `with_lock` to `false` when calling to __inet_bind()
and __inet6_bind() since all call-sites, where bpf_bind() is called,
already hold socket lock.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-31 02:15:54 +02:00
..
netfilter net: Convert ipv4_net_ops 2018-03-08 12:36:45 -05:00
af_inet.c bpf: Hooks for sys_connect 2018-03-31 02:15:54 +02:00
ah4.c
arp.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
cipso_ipv4.c
datagram.c
devinet.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
esp4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-01-17 00:10:42 -05:00
esp4_offload.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-01-23 13:51:56 -05:00
fib_frontend.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
fib_lookup.h
fib_notifier.c
fib_rules.c ipv6: route: dissect flow in input path if fib rules need it 2018-02-28 22:44:44 -05:00
fib_semantics.c net/ipv4: Pass net to fib_multipath_hash instead of fib_info 2018-03-04 13:04:21 -05:00
fib_trie.c net: make kmem caches as __ro_after_init 2018-02-26 15:11:48 -05:00
fou.c net: Convert fou_net_ops 2018-03-05 10:48:28 -05:00
gre_demux.c
gre_offload.c
icmp.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
igmp.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
inet_connection_sock.c Revert "defer call to mem_cgroup_sk_alloc()" 2018-02-02 19:49:31 -05:00
inet_diag.c sock_diag: request _diag module only when the family or proto has been registered 2018-03-12 11:03:42 -04:00
inet_fragment.c net: Fix hlist corruptions in inet_evict_bucket() 2018-03-07 13:29:28 -05:00
inet_hashtables.c inet: Avoid unitialized variable warning in inet_unhash() 2018-02-01 09:48:42 -05:00
inet_timewait_sock.c net: Convert atomic_t net::count to refcount_t 2018-01-15 14:23:42 -05:00
inetpeer.c net: make kmem caches as __ro_after_init 2018-02-26 15:11:48 -05:00
ip_forward.c net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len 2018-03-04 17:49:17 -05:00
ip_fragment.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
ip_gre.c gre: fix TUNNEL_SEQ bit check on sequence numbering 2018-03-22 14:52:43 -04:00
ip_input.c net: Make ip_ra_chain per struct net 2018-03-22 15:12:56 -04:00
ip_options.c
ip_output.c net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len 2018-03-04 17:49:17 -05:00
ip_sockglue.c net: Replace ip_ra_lock with per-net mutex 2018-03-22 15:12:56 -04:00
ip_tunnel.c net: do not create fallback tunnels for non-default namespaces 2018-03-09 11:23:11 -05:00
ip_tunnel_core.c
ip_vti.c net: Convert ipgre_net_ops, ipgre_tap_net_ops, erspan_net_ops, vti_net_ops and ipip_net_ops 2018-02-27 11:01:37 -05:00
ipcomp.c
ipconfig.c ipconfig: use dev_set_mtu() 2018-01-24 19:13:45 -05:00
ipip.c net: Convert ipgre_net_ops, ipgre_tap_net_ops, erspan_net_ops, vti_net_ops and ipip_net_ops 2018-02-27 11:01:37 -05:00
ipmr.c net: Revert "ipv4: fix a deadlock in ip_ra_control" 2018-03-22 15:12:56 -04:00
ipmr_base.c ipmr, ip6mr: Unite dumproute flows 2018-03-01 13:13:23 -05:00
Kconfig ipmr,ipmr6: Define a uniform vif_device 2018-03-01 13:13:23 -05:00
Makefile ipmr,ipmr6: Define a uniform vif_device 2018-03-01 13:13:23 -05:00
netfilter.c netfilter: remove struct nf_afinfo and its helper functions 2018-01-08 18:11:02 +01:00
ping.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
proc.c inet: whitespace cleanup 2018-02-28 11:43:28 -05:00
protocol.c
raw.c net: Revert "ipv4: fix a deadlock in ip_ra_control" 2018-03-22 15:12:56 -04:00
raw_diag.c
route.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-23 11:31:58 -04:00
syncookies.c
sysctl_net_ipv4.c udp: Move the udp sysctl to namespace. 2018-03-16 12:03:30 -04:00
tcp.c bpf: sockmap redirect ingress support 2018-03-30 00:09:43 +02:00
tcp_bbr.c net-tcp_bbr: set tp->snd_ssthresh to BDP upon STARTUP exit 2018-03-16 15:07:48 -04:00
tcp_bic.c
tcp_cdg.c
tcp_cong.c
tcp_cubic.c
tcp_dctcp.c
tcp_diag.c
tcp_fastopen.c
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c net/tcp/illinois: replace broken algorithm reference link 2018-02-28 12:03:47 -05:00
tcp_input.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-06 01:20:46 -05:00
tcp_ipv4.c bpf: Hooks for sys_connect 2018-03-31 02:15:54 +02:00
tcp_lp.c
tcp_metrics.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
tcp_minisocks.c tcp: try to keep packet if SYN_RCV race is lost 2018-02-14 14:21:45 -05:00
tcp_nv.c tcp_nv: fix potential integer overflow in tcpnv_acked 2018-01-31 10:26:30 -05:00
tcp_offload.c gso: validate gso_type in GSO handlers 2018-01-22 16:01:30 -05:00
tcp_output.c tcp_bbr: better deal with suboptimal GSO (II) 2018-03-01 21:44:28 -05:00
tcp_rate.c
tcp_recovery.c
tcp_scalable.c
tcp_timer.c tcp: purge write queue upon aborting the connection 2018-03-07 15:01:03 -05:00
tcp_ulp.c net: add a UID to use for ULP socket assignment 2018-02-06 11:39:31 +01:00
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tunnel4.c inet: whitespace cleanup 2018-02-28 11:43:28 -05:00
udp.c bpf: Hooks for sys_connect 2018-03-31 02:15:54 +02:00
udp_diag.c
udp_impl.h
udp_offload.c gso: validate gso_type in GSO handlers 2018-01-22 16:01:30 -05:00
udp_tunnel.c
udplite.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
xfrm4_input.c
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto 2018-03-07 10:54:29 +01:00
xfrm4_output.c net: xfrm: use skb_gso_validate_network_len() to check gso sizes 2018-03-04 17:49:17 -05:00
xfrm4_policy.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-23 11:31:58 -04:00
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c