1
0
Fork 0
alistair23-linux/net
Eric Dumazet 3226b158e6 net: avoid 32 x truesize under-estimation for tiny skbs
Both virtio net and napi_get_frags() allocate skbs
with a very small skb->head

While using page fragments instead of a kmalloc backed skb->head might give
a small performance improvement in some cases, there is a huge risk of
under estimating memory usage.

For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32 allocations
per page (order-3 page in x86), or even 64 on PowerPC

We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
but consuming far more memory for TCP buffers than instructed in tcp_mem[2]

Even if we force napi_alloc_skb() to only use order-0 pages, the issue
would still be there on arches with PAGE_SIZE >= 32768

This patch makes sure that small skb head are kmalloc backed, so that
other objects in the slab page can be reused instead of being held as long
as skbs are sitting in socket queues.

Note that we might in the future use the sk_buff napi cache,
instead of going through a more expensive __alloc_skb()

Another idea would be to use separate page sizes depending
on the allocated length (to never have more than 4 frags per page)

I would like to thank Greg Thelen for his precious help on this matter,
analysing crash dumps is always a time consuming task.

Fixes: fd11a83dd3 ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20210113161819.1155526-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14 10:51:52 -08:00
..
6lowpan
9p 9p for 5.11-rc1 2020-12-21 10:28:02 -08:00
802
8021q net: make free_netdev() more lenient with unregistering devices 2021-01-08 19:27:41 -08:00
appletalk net: appletalk: fix kerneldoc warnings 2020-10-30 11:48:17 -07:00
atm atm: nicstar: Replace in_interrupt() usage 2020-11-18 16:43:55 -08:00
ax25 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-07-25 17:49:04 -07:00
batman-adv batman-adv: Drop unused soft-interface.h include in fragmentation.c 2020-12-04 08:41:16 +01:00
bluetooth Bluetooth: Increment management interface revision 2020-12-07 17:02:00 +02:00
bpf bpf: fix raw_tp test run in preempt kernel 2020-09-30 08:34:08 -07:00
bpfilter Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH" 2020-10-15 12:33:24 -07:00
bridge net: bridge: Fix a warning when del bridge sysfs 2020-12-14 18:27:49 -08:00
caif caif: Remove duplicate macro SRVL_CTRL_PKT_SIZE 2020-09-05 15:57:05 -07:00
can can: isotp: isotp_getname(): fix kernel information leak 2021-01-13 22:15:13 +01:00
ceph libceph: align session_key and con_secret to 16 bytes 2020-12-28 20:34:33 +01:00
core net: avoid 32 x truesize under-estimation for tiny skbs 2021-01-14 10:51:52 -08:00
dcb net: dcb: Accept RTM_GETDCB messages carrying set-like DCB commands 2021-01-12 15:55:21 -08:00
dccp selinux/stable-5.11 PR 20201214 2020-12-16 11:01:04 -08:00
decnet treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
dns_resolver
dsa net: dsa: clear devlink port type before unregistering slave netdevs 2021-01-12 18:48:50 -08:00
ethernet net: datagram: fix some kernel-doc markups 2020-11-17 14:15:03 -08:00
ethtool ethtool: fix error paths in ethnl_set_channels() 2020-12-16 13:27:17 -08:00
hsr genetlink: move to smaller ops wherever possible 2020-10-02 19:11:11 -07:00
ieee802154 treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
ife
ipv4 esp: avoid unneeded kmap_atomic call 2021-01-11 18:20:09 -08:00
ipv6 net: sit: unregister_netdevice on newlink's error path 2021-01-14 10:26:46 -08:00
iucv net/af_iucv: use DECLARE_SOCKADDR to cast from sockaddr 2020-12-08 15:56:53 -08:00
kcm net: pass a sockptr_t into ->setsockopt 2020-07-24 15:41:54 -07:00
key Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-08-02 01:02:12 -07:00
l2tp lsm,selinux: pass flowi_common instead of flowi to the LSM hooks 2020-11-23 18:36:21 -05:00
l3mdev net: l3mdev: Fix kerneldoc warning 2020-10-30 11:43:42 -07:00
lapb net: lapb: Decrease the refcount of "struct lapb_cb" in lapb_device_event 2021-01-04 13:42:41 -08:00
llc net: llc: Fix kerneldoc warnings 2020-10-30 11:34:09 -07:00
mac80211 A new set of wireless changes: 2020-12-12 10:07:56 -08:00
mac802154 net: mac802154: convert tasklets to use new tasklet_setup() API 2020-11-07 10:40:56 -08:00
mpls mpls: drop skb's dst in mpls_forward() 2020-11-03 12:55:53 -08:00
mptcp mptcp: better msk-level shutdown. 2021-01-12 20:09:19 -08:00
ncsi net/ncsi: Use real net-device for response handler 2020-12-23 12:22:23 -08:00
netfilter netfilter: nf_nat: Fix memleak in nf_nat_init 2021-01-11 00:34:11 +01:00
netlabel Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-11-19 19:08:46 -08:00
netlink netlink: export policy in extended ACK 2020-10-09 20:22:32 -07:00
netrom treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
nfc net: sched: fix spelling mistake in Kconfig "trys" -> "tries" 2020-12-08 16:01:56 -08:00
nsh
openvswitch net: openvswitch: fix TTL decrement exception action execution 2020-12-14 17:18:25 -08:00
packet net: af_packet: fix procfs header for 64-bit pointers 2020-12-18 12:17:23 -08:00
phonet treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
psample genetlink: move to smaller ops wherever possible 2020-10-02 19:11:11 -07:00
qrtr net: qrtr: fix null-ptr-deref in qrtr_ns_remove 2021-01-05 16:50:09 -08:00
rds rds: stop using dmapool 2020-11-17 15:22:06 -04:00
rfkill rfkill: add a reason to the HW rfkill state 2020-12-11 12:47:17 +01:00
rose rose: Fix Null pointer dereference in rose_send_frame() 2020-11-20 10:04:58 -08:00
rxrpc rxrpc: Call state should be read with READ_ONCE() under some circumstances 2021-01-13 10:38:20 -08:00
sched net: sched: prevent invalid Scell_log shift count 2020-12-28 14:52:54 -08:00
sctp sctp: Fix some typo 2020-11-23 17:44:11 -08:00
smc net/smc: use memcpy instead of snprintf to avoid out of bounds read 2021-01-12 20:22:01 -08:00
strparser
sunrpc NFS client updates for Linux 5.11 2020-12-17 12:15:03 -08:00
switchdev net: switchdev: Fixed kerneldoc warning 2020-09-23 17:46:31 -07:00
tipc net: tip: fix a couple kernel-doc markups 2021-01-14 10:30:24 -08:00
tls net: fix proc_fs init handling in af_packet and tls 2020-12-14 19:39:30 -08:00
unix networking changes for the 5.10 merge window 2020-10-15 18:42:13 -07:00
vmw_vsock af_vsock: Assign the vsock transport considering the vsock address flags 2020-12-14 19:33:39 -08:00
wireless cfg80211: select CONFIG_CRC32 2021-01-05 15:50:36 -08:00
x25 net: x25: Remove unimplemented X.25-over-LLC code stubs 2020-12-12 17:15:33 -08:00
xdp xsk: Rollback reservation at NETDEV_TX_BUSY 2020-12-18 16:10:21 +01:00
xfrm selinux/stable-5.11 PR 20201214 2020-12-16 11:01:04 -08:00
Kconfig wimax: move out to staging 2020-10-29 19:27:45 +01:00
Makefile wimax: move out to staging 2020-10-29 19:27:45 +01:00
compat.c iov_iter: transparently handle compat iovecs in import_iovec 2020-10-03 00:02:13 -04:00
devres.c net: devres: rename the release callback of devm_register_netdev() 2020-06-30 15:57:34 -07:00
socket.c for-5.11/io_uring-2020-12-14 2020-12-16 12:44:05 -08:00
sysctl_net.c