1
0
Fork 0
alistair23-linux/net
Jesper Dangaard Brouer fb452a1aa3 Revert "net: use lib/percpu_counter API for fragmentation mem accounting"
This reverts commit 6d7b857d54.

There is a bug in fragmentation codes use of the percpu_counter API,
that can cause issues on systems with many CPUs.

The frag_mem_limit() just reads the global counter (fbc->count),
without considering other CPUs can have upto batch size (130K) that
haven't been subtracted yet.  Due to the 3MBytes lower thresh limit,
this become dangerous at >=24 CPUs (3*1024*1024/130000=24).

The correct API usage would be to use __percpu_counter_compare() which
does the right thing, and takes into account the number of (online)
CPUs and batch size, to account for this and call __percpu_counter_sum()
when needed.

We choose to revert the use of the lib/percpu_counter API for frag
memory accounting for several reasons:

1) On systems with CPUs > 24, the heavier fully locked
   __percpu_counter_sum() is always invoked, which will be more
   expensive than the atomic_t that is reverted to.

Given systems with more than 24 CPUs are becoming common this doesn't
seem like a good option.  To mitigate this, the batch size could be
decreased and thresh be increased.

2) The add_frag_mem_limit+sub_frag_mem_limit pairs happen on the RX
   CPU, before SKBs are pushed into sockets on remote CPUs.  Given
   NICs can only hash on L2 part of the IP-header, the NIC-RXq's will
   likely be limited.  Thus, a fair chance that atomic add+dec happen
   on the same CPU.

Revert note that commit 1d6119baf0 ("net: fix percpu memory leaks")
removed init_frag_mem_limit() and instead use inet_frags_init_net().
After this revert, inet_frags_uninit_net() becomes empty.

Fixes: 6d7b857d54 ("net: use lib/percpu_counter API for fragmentation mem accounting")
Fixes: 1d6119baf0 ("net: fix percpu memory leaks")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 11:01:05 -07:00
..
6lowpan 6lowpan: Don't set IFF_NO_QUEUE 2017-04-12 22:02:40 +02:00
9p Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
802 net: introduce __skb_put_[zero, data, u8] 2017-06-20 13:30:14 -04:00
8021q net: add netlink_ext_ack argument to rtnl_link_ops.validate 2017-06-26 23:13:22 -04:00
appletalk networking: make skb_push & __skb_push return void pointers 2017-06-16 11:48:40 -04:00
atm net, atm: convert eg_cache_entry.use from atomic_t to refcount_t 2017-07-04 22:35:16 +01:00
ax25 net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t 2017-07-04 22:35:19 +01:00
batman-adv batman-adv: fix TT sync flag inconsistencies 2017-07-31 11:17:38 +02:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-07-05 12:31:59 -07:00
bpf bpf: Align packet data properly in program testing framework. 2017-05-02 11:46:28 -04:00
bridge bridge: switchdev: Clear forward mark when transmitting packet 2017-09-01 10:11:28 -07:00
caif net: convert sock.sk_wmem_alloc from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
can networking: introduce and use skb_put_data() 2017-06-16 11:48:37 -04:00
ceph libceph: make RECOVERY_DELETES feature create a new interval 2017-08-01 16:46:45 +02:00
core mlxsw: spectrum: Forbid linking to devices that have uppers 2017-09-01 09:59:41 -07:00
dcb dcb: enforce minimum length on IEEE_APPS attribute 2017-05-21 13:42:33 -04:00
dccp dccp: defer ccid_hc_tx_delete() at dismantle time 2017-08-16 14:26:26 -07:00
decnet net, decnet: convert dn_fib_info.fib_clntref from atomic_t to refcount_t 2017-07-04 22:35:15 +01:00
dns_resolver Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-03-03 10:16:38 -08:00
dsa net: dsa: Don't dereference dst->cpu_dp->netdev 2017-08-28 21:19:43 -07:00
ethernet networking: make skb_push & __skb_push return void pointers 2017-06-16 11:48:40 -04:00
hsr net/hsr: Check skb_put_padto() return value 2017-08-22 13:40:23 -07:00
ieee802154 net: add netlink_ext_ack argument to rtnl_link_ops.validate 2017-06-26 23:13:22 -04:00
ife net: Introduce ife encapsulation module 2017-02-03 15:16:45 -05:00
ipv4 Revert "net: use lib/percpu_counter API for fragmentation mem accounting" 2017-09-03 11:01:05 -07:00
ipv6 ipv6: do not set sk_destruct in IPV6_ADDRFORM sockopt 2017-08-29 10:54:40 -07:00
ipx net, ipx: convert ipx_route.refcnt from atomic_t to refcount_t 2017-07-04 22:35:17 +01:00
irda irda: do not leak initialized list.dev to userspace 2017-08-18 16:21:51 -07:00
iucv iucv: Convert sk_wmem_alloc accesses to refcount_t. 2017-07-03 02:31:22 -07:00
kcm kcm: do not attach PF_KCM sockets to avoid deadlock 2017-08-30 15:55:10 -07:00
key af_key: do not use GFP_KERNEL in atomic contexts 2017-08-14 22:18:12 -07:00
l2tp l2tp: hold tunnel used while creating sessions with netlink 2017-08-28 11:34:58 -07:00
l3mdev
lapb net, lapb: convert lapb_cb.refcnt from atomic_t to refcount_t 2017-07-04 22:35:16 +01:00
llc net, llc: convert llc_sap.refcnt from atomic_t to refcount_t 2017-07-04 22:35:15 +01:00
mac80211 mac80211: add api to start ba session timer expired flow 2017-08-09 09:49:42 +03:00
mac802154 net: Fix inconsistent teardown and release of private netdev state. 2017-06-07 15:53:24 -04:00
mpls mpls: fix uninitialized in_label var warning in mpls_getroute 2017-07-08 11:26:41 +01:00
ncsi networking: make skb_push & __skb_push return void pointers 2017-06-16 11:48:40 -04:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2017-08-24 11:49:19 -07:00
netlabel netlink: pass extended ACK struct to parsing functions 2017-04-13 13:58:22 -04:00
netlink net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
netrom net, netrom: convert nr_node.refcount from atomic_t to refcount_t 2017-07-04 22:35:17 +01:00
nfc NFC: Add sockaddr length checks before accessing sa_family in bind handlers 2017-06-23 00:38:31 +02:00
openvswitch openvswitch: fix skb_panic due to the incorrect actions attrlen 2017-08-16 14:12:37 -07:00
packet packet: Don't write vnet header beyond end of buffer 2017-08-29 15:09:13 -07:00
phonet net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
psample networking: make skb_put & friends return void pointers 2017-06-16 11:48:39 -04:00
qrtr networking: make skb_put & friends return void pointers 2017-06-16 11:48:39 -04:00
rds rds: Reintroduce statistics counting 2017-08-08 21:03:47 -07:00
rfkill net: rfkill: gpio: Switch to devm_acpi_dev_add_driver_gpios() 2017-06-13 11:07:51 +02:00
rose net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
rxrpc rxrpc: Fix oops when discarding a preallocated service call 2017-08-18 16:23:23 -07:00
sched sch_tbf: fix two null pointer dereferences on init failure 2017-08-30 15:26:12 -07:00
sctp sctp: Avoid out-of-bounds reads from address storage 2017-08-23 22:35:15 -07:00
smc net/smc: Add warning about remote memory exposure 2017-05-16 14:49:43 -04:00
strparser strparser: destroy workqueue on module exit 2017-03-03 20:43:26 -08:00
sunrpc Two nfsd bugfixes, neither 4.13 regressions, but both potentially 2017-08-25 17:27:26 -07:00
switchdev net: switchdev: Change notifier chain to be atomic 2017-06-08 14:16:24 -04:00
tipc tipc: permit bond slave as bearer 2017-08-29 15:07:33 -07:00
tls TLS: Fix length check in do_tls_getsockopt_tx() 2017-07-06 10:58:19 +01:00
unix datagram: When peeking datagrams with offset < 0 don't skip empty skbs 2017-08-18 15:12:54 -07:00
vmw_vsock net: manual clean code which call skb_put_[data:zero] 2017-06-20 13:30:15 -04:00
wimax genetlink: mark families as __ro_after_init 2016-10-27 16:16:09 -04:00
wireless netlink validation fixes for nl80211 2017-07-07 11:35:55 +01:00
x25 net, x25: convert x25_neigh.refcnt from atomic_t to refcount_t 2017-07-04 22:35:18 +01:00
xfrm xfrm_user: fix info leak in build_aevent() 2017-08-28 10:58:02 +02:00
Kconfig tls: kernel TLS support 2017-06-15 12:12:40 -04:00
Makefile tls: kernel TLS support 2017-06-15 12:12:40 -04:00
compat.c get_compat_bpf_fprog(): don't copyin field-by-field 2017-07-04 13:14:34 -04:00
socket.c net/socket: fix type in assignment and trim long line 2017-07-24 14:17:01 -07:00
sysctl_net.c sysctl: Remove dead register_sysctl_root 2017-04-16 23:42:49 -05:00