alistair23-linux/net/ipv4
Jesper Dangaard Brouer fb452a1aa3 Revert "net: use lib/percpu_counter API for fragmentation mem accounting"
This reverts commit 6d7b857d54.

There is a bug in fragmentation codes use of the percpu_counter API,
that can cause issues on systems with many CPUs.

The frag_mem_limit() just reads the global counter (fbc->count),
without considering other CPUs can have upto batch size (130K) that
haven't been subtracted yet.  Due to the 3MBytes lower thresh limit,
this become dangerous at >=24 CPUs (3*1024*1024/130000=24).

The correct API usage would be to use __percpu_counter_compare() which
does the right thing, and takes into account the number of (online)
CPUs and batch size, to account for this and call __percpu_counter_sum()
when needed.

We choose to revert the use of the lib/percpu_counter API for frag
memory accounting for several reasons:

1) On systems with CPUs > 24, the heavier fully locked
   __percpu_counter_sum() is always invoked, which will be more
   expensive than the atomic_t that is reverted to.

Given systems with more than 24 CPUs are becoming common this doesn't
seem like a good option.  To mitigate this, the batch size could be
decreased and thresh be increased.

2) The add_frag_mem_limit+sub_frag_mem_limit pairs happen on the RX
   CPU, before SKBs are pushed into sockets on remote CPUs.  Given
   NICs can only hash on L2 part of the IP-header, the NIC-RXq's will
   likely be limited.  Thus, a fair chance that atomic add+dec happen
   on the same CPU.

Revert note that commit 1d6119baf0 ("net: fix percpu memory leaks")
removed init_frag_mem_limit() and instead use inet_frags_init_net().
After this revert, inet_frags_uninit_net() becomes empty.

Fixes: 6d7b857d54 ("net: use lib/percpu_counter API for fragmentation mem accounting")
Fixes: 1d6119baf0 ("net: fix percpu memory leaks")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 11:01:05 -07:00
..
netfilter netfilter: x_tables: Fix use-after-free in ipt_do_table. 2017-07-31 20:24:09 +02:00
af_inet.c igmp: Fix regression caused by igmp sysctl namespace code. 2017-08-09 22:46:44 -07:00
ah4.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2017-06-23 14:17:31 -04:00
arp.c networking: make skb_put & friends return void pointers 2017-06-16 11:48:39 -04:00
cipso_ipv4.c Cipso: cipso_v4_optptr enter infinite loop 2017-08-01 15:31:23 -07:00
datagram.c net: Set sk_txhash from a random number 2015-07-29 22:44:04 -07:00
devinet.c net: convert in_device.refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
esp4.c esp: Fix skb tailroom calculation 2017-08-25 09:26:24 +02:00
esp4_offload.c esp: Fix error handling on layer 2 xmit. 2017-08-07 08:31:07 +02:00
fib_frontend.c ipv4: initialize fib_trie prior to register_netdev_notifier call. 2017-07-20 15:24:45 -07:00
fib_lookup.h net: add extack arg to lwtunnel build state 2017-05-30 11:55:32 -04:00
fib_notifier.c ipv4: fib: Remove redundant argument 2017-03-10 09:45:09 -08:00
fib_rules.c ipv4: fib_rules: Dump FIB rules when registering FIB notifier 2017-03-16 10:18:34 -07:00
fib_semantics.c ipv4: fix NULL dereference in free_fib_info_rcu() 2017-08-15 17:07:52 -07:00
fib_trie.c net, ipv4: convert fib_info.fib_clntref from atomic_t to refcount_t 2017-07-04 01:29:04 -07:00
fou.c gue: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP 2017-08-01 16:09:14 -07:00
gre_demux.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-06-30 05:03:36 -04:00
gre_offload.c net: add recursion limit to GRO 2016-10-20 14:32:22 -04:00
icmp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-06-15 11:59:32 -04:00
igmp.c net: igmp: Use ingress interface rather than vrf device 2017-08-16 11:08:55 -07:00
inet_connection_sock.c net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
inet_diag.c tcp: remove early retransmit 2017-01-13 22:37:16 -05:00
inet_fragment.c Revert "net: use lib/percpu_counter API for fragmentation mem accounting" 2017-09-03 11:01:05 -07:00
inet_hashtables.c net: make sk_ehashfn() static 2017-07-03 03:29:14 -07:00
inet_timewait_sock.c net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
inetpeer.c net: convert inet_peer.refcnt from atomic_t to refcount_t 2017-07-01 07:39:07 -07:00
ip_forward.c ipv4: allow local fragmentation in ip_finish_output_gso() 2016-11-03 16:10:26 -04:00
ip_fragment.c net: convert inet_frag_queue.refcnt from atomic_t to refcount_t 2017-07-01 07:39:09 -07:00
ip_gre.c net: add netlink_ext_ack argument to rtnl_link_ops.validate 2017-06-26 23:13:22 -04:00
ip_input.c net: Add sysctl to toggle early demux for tcp and udp 2017-03-24 13:17:07 -07:00
ip_options.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
ip_output.c udp: consistently apply ufo or fragmentation 2017-08-10 09:52:12 -07:00
ip_sockglue.c do_ip_setsockopt(): don't open-code memdup_user() 2017-06-30 02:04:09 -04:00
ip_tunnel.c ip_tunnel: fix potential issue in ip_tunnel_rcv 2017-06-16 12:01:29 -04:00
ip_tunnel_core.c net: store port/representator id in metadata_dst 2017-06-25 11:42:01 -04:00
ip_vti.c net: add netlink_ext_ack argument to rtnl_link_ops.validate 2017-06-26 23:13:22 -04:00
ipcomp.c ipv4: coding style: comparison for equality with NULL 2015-04-03 12:11:15 -04:00
ipconfig.c networking: convert many more places to skb_put_zero() 2017-06-16 11:48:35 -04:00
ipip.c net: add netlink_ext_ack argument to rtnl_link_ops.validate 2017-06-26 23:13:22 -04:00
ipmr.c net: ipmr: ipmr_get_table() returns NULL 2017-07-12 08:18:46 -07:00
Kconfig Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2017-02-16 21:25:49 -05:00
Makefile tcp: ULP infrastructure 2017-06-15 12:12:40 -04:00
netfilter.c netfilter: use skb_to_full_sk in ip_route_me_harder 2017-02-28 12:49:36 +01:00
ping.c net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
proc.c tcp: add TCPMemoryPressuresChrono counter 2017-06-08 11:26:19 -04:00
protocol.c net: Add sysctl to toggle early demux for tcp and udp 2017-03-24 13:17:07 -07:00
raw.c net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
raw_diag.c net: ip, raw_diag -- Use jump for exiting from nested loop 2016-11-03 15:25:26 -04:00
route.c net: check and errout if res->fi is NULL when RTM_F_FIB_MATCH is set 2017-08-18 16:05:46 -07:00
syncookies.c ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check() 2017-07-18 11:22:51 -07:00
sysctl_net_ipv4.c tcp: ULP infrastructure 2017-06-15 12:12:40 -04:00
tcp.c tcp: fix refcnt leak with ebpf congestion control 2017-08-25 17:16:27 -07:00
tcp_bbr.c tcp_bbr: init pacing rate on first RTT sample 2017-07-15 14:43:29 -07:00
tcp_bic.c tcp: bic, cubic: use tcp_jiffies32 instead of tcp_time_stamp 2017-05-17 16:06:01 -04:00
tcp_cdg.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> 2017-03-02 08:42:27 +01:00
tcp_cong.c tcp: fix refcnt leak with ebpf congestion control 2017-08-25 17:16:27 -07:00
tcp_cubic.c tcp: bic, cubic: use tcp_jiffies32 instead of tcp_time_stamp 2017-05-17 16:06:01 -04:00
tcp_dctcp.c Revert "dctcp: update cwnd on congestion event" 2016-12-06 11:34:24 -05:00
tcp_diag.c net: diag: Fix refcnt leak in error path destroying socket 2016-08-23 23:11:36 -07:00
tcp_fastopen.c bpf: Add TCP connection BPF callbacks 2017-07-01 16:15:14 -07:00
tcp_highspeed.c tcp: add cwnd_undo functions to various tcp cc algorithms 2016-11-21 13:20:17 -05:00
tcp_htcp.c tcp: replace misc tcp_time_stamp to tcp_jiffies32 2017-05-17 16:06:01 -04:00
tcp_hybla.c tcp: make undo_cwnd mandatory for congestion modules 2016-11-21 13:20:17 -05:00
tcp_illinois.c tcp: add cwnd_undo functions to various tcp cc algorithms 2016-11-21 13:20:17 -05:00
tcp_input.c tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP 2017-08-18 16:07:43 -07:00
tcp_ipv4.c tcp: fix possible deadlock in TCP stack vs BPF filter 2017-08-14 22:31:27 -07:00
tcp_lp.c tcp: switch TCP TS option (RFC 7323) to 1ms clock 2017-05-17 16:06:01 -04:00
tcp_metrics.c tcp: use tcp_jiffies32 to feed tp->snd_cwnd_stamp 2017-05-17 16:06:01 -04:00
tcp_minisocks.c bpf: Support for setting initial receive window 2017-07-01 16:15:13 -07:00
tcp_nv.c tcpnv: do not export local function 2017-05-21 13:42:36 -04:00
tcp_offload.c net: convert sock.sk_wmem_alloc from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
tcp_output.c tcp: fastopen: tcp_connect() must refresh the route 2017-08-08 20:39:52 -07:00
tcp_probe.c tcp: Revert "tcp: tcp_probe: use spin_lock_bh()" 2017-02-21 13:26:03 -05:00
tcp_rate.c tcp: export do_tcp_sendpages and tcp_rate_check_app_limited functions 2017-06-15 12:12:40 -04:00
tcp_recovery.c tcp: switch TCP TS option (RFC 7323) to 1ms clock 2017-05-17 16:06:01 -04:00
tcp_scalable.c tcp: add cwnd_undo functions to various tcp cc algorithms 2016-11-21 13:20:17 -05:00
tcp_timer.c net: fix keepalive code vs TCP_FASTOPEN_CONNECT 2017-08-03 09:34:51 -07:00
tcp_ulp.c tcp: ulp: avoid module refcnt leak in tcp_set_ulp 2017-08-14 22:17:05 -07:00
tcp_vegas.c tcp: make undo_cwnd mandatory for congestion modules 2016-11-21 13:20:17 -05:00
tcp_vegas.h tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_veno.c tcp: add cwnd_undo functions to various tcp cc algorithms 2016-11-21 13:20:17 -05:00
tcp_westwood.c tcp_westwood: use tcp_jiffies32 instead of tcp_time_stamp 2017-05-17 16:06:01 -04:00
tcp_yeah.c tcp: add cwnd_undo functions to various tcp cc algorithms 2016-11-21 13:20:17 -05:00
tunnel4.c tunnels: correct conditional build of MPLS and IPv6 2016-07-11 13:27:06 -07:00
udp.c udp: fix secpath leak 2017-09-01 10:29:34 -07:00
udp_diag.c net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
udp_impl.h udp: make *udp*_queue_rcv_skb() functions static 2017-05-18 10:23:33 -04:00
udp_offload.c net: avoid skb_warn_bad_offload false positives on UFO 2017-08-08 21:39:01 -07:00
udp_tunnel.c net: Remove deprecated tunnel specific UDP offload functions 2016-06-17 20:23:32 -07:00
udplite.c udplite: call proper backlog handlers 2016-11-24 15:32:14 -05:00
xfrm4_input.c esp: Add a software GRO codepath 2017-02-15 11:04:11 +01:00
xfrm4_mode_beet.c networking: make skb_pull & friends return void pointers 2017-06-16 11:48:39 -04:00
xfrm4_mode_transport.c xfrm: Add encapsulation header offsets while SKB is not encrypted 2017-04-14 10:07:39 +02:00
xfrm4_mode_tunnel.c xfrm: Add encapsulation header offsets while SKB is not encrypted 2017-04-14 10:07:39 +02:00
xfrm4_output.c xfrm: Add an IPsec hardware offloading API 2017-04-14 10:06:10 +02:00
xfrm4_policy.c xfrm: policy: make policy backend const 2017-02-09 10:22:19 +01:00
xfrm4_protocol.c xfrm: input: constify xfrm_input_afinfo 2017-02-09 10:22:17 +01:00
xfrm4_state.c xfrm: remove unused function 2017-01-10 10:57:12 +01:00
xfrm4_tunnel.c