1
0
Fork 0
alistair23-linux/net/core
John Fastabend 95fafa1cb7 bpf, sockmap: Avoid returning unneeded EAGAIN when redirecting to self
[ Upstream commit 6fa9201a89 ]

If a socket redirects to itself and it is under memory pressure it is
possible to get a socket stuck so that recv() returns EAGAIN and the
socket can not advance for some time. This happens because when
redirecting a skb to the same socket we received the skb on we first
check if it is OK to enqueue the skb on the receiving socket by checking
memory limits. But, if the skb is itself the object holding the memory
needed to enqueue the skb we will keep retrying from kernel side
and always fail with EAGAIN. Then userspace will get a recv() EAGAIN
error if there are no skbs in the psock ingress queue. This will continue
until either some skbs get kfree'd causing the memory pressure to
reduce far enough that we can enqueue the pending packet or the
socket is destroyed. In some cases its possible to get a socket
stuck for a noticeable amount of time if the socket is only receiving
skbs from sk_skb verdict programs. To reproduce I make the socket
memory limits ridiculously low so sockets are always under memory
pressure. More often though if under memory pressure it looks like
a spurious EAGAIN error on user space side causing userspace to retry
and typically enough has moved on the memory side that it works.

To fix skip memory checks and skb_orphan if receiving on the same
sock as already assigned.

For SK_PASS cases this is easy, its always the same socket so we
can just omit the orphan/set_owner pair.

For backlog cases we need to check skb->sk and decide if the orphan
and set_owner pair are needed.

Fixes: 51199405f9 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/160556572660.73229.12566203819812939627.stgit@john-XPS-13-9370
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-24 13:29:19 +01:00
..
Makefile bpf: Introduce bpf sk local storage 2019-04-27 09:07:04 -07:00
bpf_sk_storage.c bpf: Improve bucket_log calculation logic 2020-02-14 16:34:10 -05:00
datagram.c net: use indirect call wrappers for skb_copy_datagram_iter() 2020-05-02 08:49:00 +02:00
datagram.h net/core: Allow the compiler to verify declaration and definition consistency 2019-03-27 13:49:44 -07:00
dev.c net: Fix bridge enslavement failure 2020-09-26 18:03:13 +02:00
dev_addr_lists.c net: remove unnecessary variables and callback 2019-10-24 14:53:49 -07:00
dev_ioctl.c net/core: Document all dev_ioctl() arguments 2019-03-27 13:49:43 -07:00
devlink.c devlink: Add missing genlmsg_cancel() in devlink_nl_sb_port_pool_fill() 2020-11-24 13:28:55 +01:00
drop_monitor.c drop_monitor: work around gcc-10 stringop-overflow warning 2020-05-20 08:20:06 +02:00
dst.c net: print proper warning on dst underflow 2019-09-26 09:05:56 +02:00
dst_cache.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ethtool.c net: Zeroing the structure ethtool_wolinfo in ethtool_get_wol() 2019-10-26 11:20:10 -07:00
failover.c failover: allow name change on IFF_UP slave interfaces 2019-04-10 22:12:26 -07:00
fib_notifier.c net: fib_notifier: move fib_notifier_ops from struct net into per-net struct 2019-09-07 17:28:22 +02:00
fib_rules.c net: fib_rules: Correctly set table field when table number exceeds 8 bits 2020-03-05 16:43:31 +01:00
filter.c net: Properly typecast int values to set sk_max_pacing_rate 2020-10-29 09:57:26 +01:00
flow_dissector.c flow_dissector: Drop BPF flow dissector prog ref on netns cleanup 2020-05-27 17:46:49 +02:00
flow_offload.c net: core: rename indirect block ingress cb function 2019-12-18 16:08:47 +01:00
gen_estimator.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
gen_stats.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
gro_cells.c gro_cells: make sure device is up in gro_cells_receive() 2019-03-10 11:07:14 -07:00
hwbm.c net: hwbm: Make the hwbm_pool lock a mutex 2019-06-09 19:40:10 -07:00
link_watch.c net: link_watch: prevent starvation when processing linkwatch wq 2019-07-01 19:02:47 -07:00
lwt_bpf.c net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup 2019-12-18 16:08:42 +01:00
lwtunnel.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
neighbour.c Exempt multicast addresses from five-second neighbor lifetime 2020-11-24 13:28:56 +01:00
net-procfs.c treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively 2019-04-09 14:19:06 +02:00
net-sysfs.c net-sysfs: add a newline when printing 'tx_timeout' by sysfs 2020-07-31 18:39:30 +02:00
net-sysfs.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
net-traces.c page_pool: add tracepoints for page_pool with details need by XDP 2019-06-19 11:23:13 -04:00
net_namespace.c netns: fix GFP flags in rtnl_net_notifyid() 2019-10-25 20:14:42 -07:00
netclassid_cgroup.c cgroup, netclassid: remove double cond_resched 2020-05-10 10:31:32 +02:00
netevent.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
netpoll.c net: Have netpoll bring-up DSA management interface 2020-11-24 13:28:57 +01:00
netprio_cgroup.c netprio_cgroup: Fix unlimited memory leak of v2 cgroups 2020-05-20 08:20:12 +02:00
page_pool.c page_pool: do not release pool until inflight == 0. 2019-12-18 16:09:07 +01:00
pktgen.c net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build 2020-04-01 11:02:18 +02:00
ptp_classifier.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295 2019-06-05 17:36:38 +02:00
request_sock.c tcp: add rcu protection around tp->fastopen_rsk 2019-10-13 10:13:08 -07:00
rtnetlink.c rtnetlink: Fix memory(net_device) leak when ->newlink fails 2020-07-31 18:39:30 +02:00
scm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
secure_seq.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
skbuff.c net/core: check length before updating Ethertype in skb_mpls_{push,pop} 2020-10-14 10:33:06 +02:00
skmsg.c bpf, sockmap: Avoid returning unneeded EAGAIN when redirecting to self 2020-11-24 13:29:19 +01:00
sock.c socket: don't clear SOCK_TSTAMP_NEW when SO_TIMESTAMPNS is disabled 2020-11-01 12:01:01 +01:00
sock_diag.c sock: make cookie generation global instead of per netns 2019-08-09 13:14:46 -07:00
sock_map.c bpf: sockmap: Require attach_bpf_fd when detaching a program 2020-08-07 09:34:02 +02:00
sock_reuseport.c udp: Copy has_conns in reuseport_grow(). 2020-07-31 18:39:31 +02:00
stream.c tcp: make sure EPOLLOUT wont be missed 2019-08-19 13:07:43 -07:00
sysctl_net_core.c bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok() 2020-07-16 08:16:45 +02:00
timestamping.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 61 2019-05-24 17:36:45 +02:00
tso.c net: Use skb accessors in network core 2019-07-22 20:47:56 -07:00
utils.c net: Fix skb->csum update in inet_proto_csum_replace16(). 2020-02-05 21:22:52 +00:00
xdp.c xdp: obtain the mem_id mutex before trying to remove an entry. 2019-12-18 16:09:10 +01:00