alistair23-linux/net
Tuong Lien e95584a889 tipc: fix unlimited bundling of small messages
We have identified a problem with the "oversubscription" policy in the
link transmission code.

When small messages are transmitted, and the sending link has reached
the transmit window limit, those messages will be bundled and put into
the link backlog queue. However, bundles of data messages are counted
at the 'CRITICAL' level, so that the counter for that level, instead of
the counter for the real, bundled message's level is the one being
increased.
Subsequent, to-be-bundled data messages at non-CRITICAL levels continue
to be tested against the unchanged counter for their own level, while
contributing to an unrestrained increase at the CRITICAL backlog level.

This leaves a gap in congestion control algorithm for small messages
that can result in starvation for other users or a "real" CRITICAL
user. Even that eventually can lead to buffer exhaustion & link reset.

We fix this by keeping a 'target_bskb' buffer pointer at each levels,
then when bundling, we only bundle messages at the same importance
level only. This way, we know exactly how many slots a certain level
have occupied in the queue, so can manage level congestion accurately.

By bundling messages at the same level, we even have more benefits. Let
consider this:
- One socket sends 64-byte messages at the 'CRITICAL' level;
- Another sends 4096-byte messages at the 'LOW' level;

When a 64-byte message comes and is bundled the first time, we put the
overhead of message bundle to it (+ 40-byte header, data copy, etc.)
for later use, but the next message can be a 4096-byte one that cannot
be bundled to the previous one. This means the last bundle carries only
one payload message which is totally inefficient, as for the receiver
also! Later on, another 64-byte message comes, now we make a new bundle
and the same story repeats...

With the new bundling algorithm, this will not happen, the 64-byte
messages will be bundled together even when the 4096-byte message(s)
comes in between. However, if the 4096-byte messages are sent at the
same level i.e. 'CRITICAL', the bundling algorithm will again cause the
same overhead.

Also, the same will happen even with only one socket sending small
messages at a rate close to the link transmit's one, so that, when one
message is bundled, it's transmitted shortly. Then, another message
comes, a new bundle is created and so on...

We will solve this issue radically by another patch.

Fixes: 365ad353c2 ("tipc: reduce risk of user starvation during link congestion")
Reported-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-02 11:02:05 -04:00
..
6lowpan 6lowpan: no need to check return value of debugfs_create functions 2019-07-06 12:50:01 +02:00
9p 9p pull request for inclusion in 5.4 2019-09-27 15:10:34 -07:00
802 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
8021q Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-22 08:59:24 -04:00
appletalk appletalk: enforce CAP_NET_RAW for raw sockets 2019-09-24 16:37:18 +02:00
atm pppoatm: use %*ph to print small buffer 2019-09-05 12:33:28 +02:00
ax25 ax25: enforce CAP_NET_RAW for raw sockets 2019-09-24 16:37:18 +02:00
batman-adv net: Fix Kconfig indentation 2019-09-26 08:56:17 +02:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2019-09-18 12:34:53 -07:00
bpf bpf/flow_dissector: support flags in BPF_PROG_TEST_RUN 2019-07-25 18:00:41 -07:00
bpfilter Kbuild updates for v5.3 2019-07-12 16:03:16 -07:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-09-15 14:17:27 +02:00
caif treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 194 2019-05-30 11:29:22 -07:00
can can: add support of SAE J1939 protocol 2019-09-04 14:22:33 +02:00
ceph libceph: use ceph_kvmalloc() for osdmap arrays 2019-09-16 12:06:25 +02:00
core devlink: Fix error handling in param and info_get dumpit cb 2019-10-01 10:08:30 -07:00
dcb treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 201 2019-05-30 11:29:52 -07:00
dccp ipv6: add priority parameter to ip6_xmit() 2019-09-27 12:05:02 +02:00
decnet treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 53 2019-05-24 17:36:42 +02:00
dns_resolver Revert "Merge tag 'keys-acl-20190703' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs" 2019-07-10 18:43:43 -07:00
dsa Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net 2019-09-17 23:51:10 +02:00
ethernet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-07 11:00:14 -07:00
hsr hsr: switch ->dellink() to ->ndo_uninit() 2019-07-11 14:37:45 -07:00
ieee802154 ieee802154: enforce CAP_NET_RAW for raw sockets 2019-09-24 16:37:18 +02:00
ife net: Fix Kconfig indentation 2019-09-26 08:56:17 +02:00
ipv4 tcp: adjust rto_base in retransmits_timed_out() 2019-10-01 21:40:49 -04:00
ipv6 ipv6: Handle race in addrconf_dad_work 2019-10-01 21:43:41 -04:00
iucv net/af_iucv: mark expected switch fall-throughs 2019-07-29 10:26:14 -07:00
kcm kcm: disable preemption in kcm_parse_func_strparser() 2019-09-27 10:27:14 +02:00
key Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-07-08 19:48:57 -07:00
l2tp compat_ioctl: pppoe: fix PPPOEIOCSFWD handling 2019-07-30 14:42:13 -07:00
l3mdev ipv6: convert major tx path to use RT6_LOOKUP_F_DST_NOREF 2019-06-23 13:24:17 -07:00
lapb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-17 20:20:36 -07:00
llc treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 281 2019-06-05 17:36:36 +02:00
mac80211 mac80211: keep BHs disabled while calling drv_tx_wake_queue() 2019-10-01 17:56:19 +02:00
mac802154 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 174 2019-05-30 11:26:41 -07:00
mpls ipv4: mpls: fix mpls_xmit for iptunnel 2019-08-25 14:34:08 -07:00
ncsi net/ncsi: Disable global multicast filter 2019-09-19 18:04:40 -07:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2019-09-27 20:15:00 +02:00
netlabel netlabel: remove redundant assignment to pointer iter 2019-09-01 11:45:02 -07:00
netlink net: remove empty netlink_tap_exit_net 2019-06-14 19:50:33 -07:00
netrom netrom: hold sock when setting skb->destructor 2019-07-24 15:49:05 -07:00
nfc nfc: enforce CAP_NET_RAW for raw sockets 2019-09-24 16:37:18 +02:00
nsh treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
openvswitch openvswitch: change type of UPCALL_PID attribute to NLA_UNSPEC 2019-09-26 09:32:33 +02:00
packet net/packet: fix race in tpacket_snd() 2019-08-15 13:59:48 -07:00
phonet treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 336 2019-06-05 17:37:07 +02:00
psample net: sched: take reference to psample group in flow_action infra 2019-09-16 09:18:03 +02:00
qrtr net: qrtr: Stop rx_worker before freeing node 2019-09-21 18:45:46 -07:00
rds net/rds: Check laddr_check before calling it 2019-09-27 12:10:55 +02:00
rfkill treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
rose treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
rxrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-09-15 14:17:27 +02:00
sched net: sched: cbs: Avoid division by zero when calculating the port rate 2019-10-01 09:51:39 -07:00
sctp ipv6: add priority parameter to ip6_xmit() 2019-09-27 12:05:02 +02:00
smc net/smc: make sure EPOLLOUT is raised 2019-08-20 12:25:14 -07:00
strparser Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-22 08:59:24 -04:00
sunrpc Highlights: 2019-09-27 17:00:27 -07:00
switchdev treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tipc tipc: fix unlimited bundling of small messages 2019-10-02 11:02:05 -04:00
tls net/tls: align non temporal copy to cache lines 2019-09-07 18:10:34 +02:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-07 11:00:14 -07:00
vmw_vsock vsock: Fix a lockdep warning in __vsock_release() 2019-10-01 21:23:35 -04:00
wimax wimax: no need to check return value of debugfs_create functions 2019-08-10 15:25:47 -07:00
wireless nl80211: fix null pointer dereference 2019-10-01 17:56:19 +02:00
x25 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 41 2019-05-24 17:27:12 +02:00
xdp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-09-28 17:47:33 -07:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-09-15 14:17:27 +02:00
compat.c uio: make import_iovec()/compat_import_iovec() return bytes on success 2019-05-31 15:30:03 -06:00
Kconfig devlink: Add packet trap infrastructure 2019-08-17 12:40:08 -07:00
Makefile
socket.c Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
sysctl_net.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00