alistair23-linux/net
Eric Dumazet edb09eb17e net: sched: do not acquire qdisc spinlock in qdisc/class stats dump
Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
agent [1] are problematic at scale:

For each qdisc/class found in the dump, we currently lock the root qdisc
spinlock in order to get stats. Sampling stats every 5 seconds from
thousands of HTB classes is a challenge when the root qdisc spinlock is
under high pressure. Not only do the dumps take time, they also slow
down the fast path (queue/dequeue packets) by 10% to 20% in some cases.

An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
that might need the qdisc lock, in fq_codel_dump_stats() and
fq_codel_dump_class_stats().

In v2 of this patch, I now use the Qdisc running seqcount to provide
consistent reads of packets/bytes counters, regardless of 32/64 bit arches.
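
As an illustration of the seqcount idea only, here is a minimal userspace
sketch in C11. It is not the kernel code from this patch: struct demo_stats,
demo_stats_update() and demo_stats_read() are hypothetical names, and C11
atomics stand in for the kernel's seqcount primitives. The writer makes the
sequence odd while it updates the counters, and the reader retries until it
sees the same even sequence before and after its reads, so it never takes a
lock and never returns a half-updated bytes/packets pair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a qdisc's byte/packet counters (not kernel code). */
struct demo_stats {
	atomic_uint seq;          /* even: stable, odd: update in progress */
	_Atomic uint64_t bytes;
	_Atomic uint32_t packets;
};

/* Writer side (fast path). Assumes only one writer runs at a time. */
static void demo_stats_update(struct demo_stats *s, uint32_t len)
{
	unsigned int seq = atomic_load(&s->seq);

	assert((seq & 1u) == 0);  /* no nested or concurrent writer */
	atomic_store(&s->seq, seq + 1);         /* mark update in progress */
	atomic_fetch_add(&s->bytes, len);
	atomic_fetch_add(&s->packets, 1);
	atomic_store(&s->seq, seq + 2);         /* publish the new snapshot */
}

/* Reader side (stats dump). Lock-free: retries if a writer interfered. */
static void demo_stats_read(struct demo_stats *s,
			    uint64_t *bytes, uint32_t *packets)
{
	unsigned int start;

	do {
		start = atomic_load(&s->seq);
		*bytes = atomic_load(&s->bytes);
		*packets = atomic_load(&s->packets);
		/* Retry if an update was in progress or completed meanwhile. */
	} while ((start & 1u) || start != atomic_load(&s->seq));
}
```

The sketch uses sequentially consistent atomics for simplicity. The point is
only that the dump path pays with an occasional retry instead of contending
with the fast path for a spinlock.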

I also changed the rate estimators to use the same infrastructure
so that they no longer need to take the root qdisc lock.

[1] http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Kevin Athey <kda@google.com>
Cc: Xiaotian Pei <xiaotian@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:37:14 -07:00
..
6lowpan 6lowpan: move mac802154 header 2016-04-13 10:41:10 +02:00
9p remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
802
8021q vlan: Propagate MAC address to VLANs 2016-05-31 11:56:48 -07:00
appletalk appletalk: fix erroneous return value 2016-02-18 14:59:34 -05:00
atm net/atm: sk_err_soft must be positive 2016-05-23 13:51:10 -07:00
ax25 ax25: add link layer header validation function 2016-03-09 22:13:01 -05:00
batman-adv batman-adv: initialize ELP orig address on secondary interfaces 2016-05-18 11:49:44 +08:00
bluetooth net_sched: transform qdisc running bit into a seqcount 2016-06-07 16:37:13 -07:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-05-09 15:59:24 -04:00
caif net: caif: fix misleading indentation 2016-03-14 13:09:50 -04:00
can sock: enable timestamping using control messages 2016-04-04 15:50:30 -04:00
ceph libceph: make ceph_osdc_wait_request() uninterruptible 2016-05-26 01:15:40 +02:00
core net: sched: do not acquire qdisc spinlock in qdisc/class stats dump 2016-06-07 16:37:14 -07:00
dcb
dccp dccp: do not assume DCCP code is non preemptible 2016-05-02 17:02:25 -04:00
decnet decnet: Do not build routes to devices without decnet private data. 2016-04-10 23:01:30 -04:00
dns_resolver KEYS: Add a facility to restrict new links into a keyring 2016-04-11 22:37:37 +01:00
dsa net: dsa: Add new binding implementation 2016-06-04 14:29:55 -07:00
ethernet eth: Pull header from first fragment via eth_get_headlen 2016-02-24 13:58:05 -05:00
hsr net/hsr: Use setup_timer and mod_timer. 2016-05-16 14:00:43 -04:00
ieee802154 net_sched: transform qdisc running bit into a seqcount 2016-06-07 16:37:13 -07:00
ipv4 net: disable fragment reassembly if high_thresh is zero 2016-06-05 22:56:42 -04:00
ipv6 skbuff: introduce skb_gso_validate_mtu 2016-06-03 19:37:21 -04:00
ipx
irda TTY and Serial driver update for 4.7-rc1 2016-05-20 20:57:27 -07:00
iucv af_iucv: Validate socket address length in iucv_sock_bind() 2016-01-19 14:21:08 -05:00
kcm kcm: fix a signedness in kcm_splice_read() 2016-05-19 11:26:51 -07:00
key
l2tp net_sched: transform qdisc running bit into a seqcount 2016-06-07 16:37:13 -07:00
l3mdev net: l3mdev: Allow send on enslaved interface 2016-05-09 22:33:52 -04:00
lapb net/lapb: tuse %*ph to dump buffers 2016-05-29 22:33:25 -07:00
llc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-05-09 15:59:24 -04:00
mac80211 Some more work for 4.7, notably: 2016-05-12 11:46:58 -04:00
mac802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
mpls skbuff: introduce skb_gso_validate_mtu 2016-06-03 19:37:21 -04:00
netfilter net: sched: do not acquire qdisc spinlock in qdisc/class stats dump 2016-06-07 16:37:14 -07:00
netlabel netlabel: fix a problem with netlbl_secattr_catmap_setrng() 2016-04-05 16:10:47 -04:00
netlink netlink: Fix dump skb leak/double free 2016-05-16 22:05:15 -04:00
netrom
nfc nfc: nci: Add nci_nfcc_loopback to the nci core 2016-05-04 01:48:16 +02:00
openvswitch ovs: set name assign type of internal port 2016-06-02 18:05:47 -04:00
packet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-23 18:51:33 -04:00
phonet sock: struct proto hash function may error 2016-02-11 03:54:14 -05:00
qrtr Merge tag 'qcom-soc-for-4.7-2' into net-next 2016-05-17 14:11:19 -04:00
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-05-20 20:01:26 -07:00
rfkill rfkill: Use switch to demux userspace operations 2016-04-05 10:48:53 +02:00
rose
rxrpc rxrpc: Use pr_<level> and pr_fmt, reduce object size a few KB 2016-06-03 19:41:31 -04:00
sched net: sched: do not acquire qdisc spinlock in qdisc/class stats dump 2016-06-07 16:37:14 -07:00
sctp sctp: Fix warning in sctp_packet_transmit_chunk() 2016-06-03 22:53:26 -07:00
sunrpc NFS client updates for Linux 4.7 2016-05-26 10:33:33 -07:00
switchdev switchdev: pass pointer to fib_info instead of copy 2016-05-17 13:58:49 -04:00
tipc tipc: fix potential null pointer dereferences in some compat functions 2016-05-25 12:33:52 -07:00
unix constify security_path_{mkdir,mknod,symlink} 2016-03-28 00:47:27 -04:00
vmw_vsock Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-05-09 15:59:24 -04:00
wimax
wireless mm/page_ref: use page_ref helper instead of direct modification of _count 2016-05-19 19:12:14 -07:00
x25 net: fix a kernel infoleak in x25 module 2016-05-09 22:45:33 -04:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-05-09 15:59:24 -04:00
Kconfig bpf: add generic constant blinding for use in jits 2016-05-16 13:49:32 -04:00
Makefile net: Add Qualcomm IPC router 2016-05-08 23:46:14 -04:00
compat.c
socket.c fs: poll/select/recvmmsg: use timespec64 for timeout events 2016-05-19 19:12:14 -07:00
sysctl_net.c