Fork of reMarkable kernel https://github.com/reMarkable/linux
Go to file
Eric Dumazet 39e6c8208d net: solve a NAPI race
While playing with mlx4 hardware timestamping of RX packets, I found
that some packets were received by TCP stack with a ~200 ms delay...

Since the timestamp was provided by the NIC, and my probe was added
in tcp_v4_rcv() while in BH handler, I was confident it was not
a sender issue, or a drop in the network.

This would happen with a very low probability, but hurting RPC
workloads.

A NAPI driver normally arms the IRQ after the napi_complete_done(),
after NAPI_STATE_SCHED is cleared, so that the hard irq handler can grab
it.

Problem is that if another point in the stack grabs NAPI_STATE_SCHED bit
while IRQ are not disabled, we might have later an IRQ firing and
finding this bit set, right before napi_complete_done() clears it.

This can happen with busy polling users, or if gro_flush_timeout is
used. But some other uses of napi_schedule() in drivers can cause this
as well.

thread 1                                 thread 2 (could be on same cpu, or not)

// busy polling or napi_watchdog()
napi_schedule();
...
napi->poll()

device polling:
read 2 packets from ring buffer
                                          Additional 3rd packet is
available.
                                          device hard irq

                                          // does nothing because
NAPI_STATE_SCHED bit is owned by thread 1
                                          napi_schedule();

napi_complete_done(napi, 2);
rearm_irq();

Note that rearm_irq() will not force the device to send an additional
IRQ for the packet it already signaled (3rd packet in my example)

This patch adds a new NAPI_STATE_MISSED bit, that napi_schedule_prep()
can set if it could not grab NAPI_STATE_SCHED

Then napi_complete_done() properly reschedules the napi to make sure
we do not miss something.

Since we manipulate multiple bits at once, use cmpxchg() like in
sk_busy_loop() to provide proper transactions.

In v2, I changed napi_watchdog() to use a relaxed variant of
napi_schedule_prep() : No need to set NAPI_STATE_MISSED from this point.

In v3, I added more details in the changelog and clears
NAPI_STATE_MISSED in busy_poll_stop()

In v4, I added the ideas given by Alexander Duyck in v3 review

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-01 09:50:58 -08:00
arch Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm 2017-02-28 11:50:53 -08:00
block lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
certs certs: Add a secondary system keyring that can be added to dynamically 2016-04-11 22:48:09 +01:00
crypto crypto: change LZ4 modules to work with new LZ4 module version 2017-02-24 17:46:57 -08:00
Documentation This is a few small fixes to the main IPMI driver, make some 2017-02-28 21:06:30 -08:00
drivers net: usb: asix_devices: fix missing return code check on call to asix_write_medium_mode 2017-03-01 09:50:58 -08:00
firmware WHENCE: use https://linuxtv.org for LinuxTV URLs 2015-12-04 10:35:11 -02:00
fs The nfsd update this round is mainly a lot of miscellaneous cleanups and 2017-02-28 15:39:09 -08:00
include net: solve a NAPI race 2017-03-01 09:50:58 -08:00
init Merge branch 'idr-4.11' of git://git.infradead.org/users/willy/linux-dax 2017-02-28 20:29:41 -08:00
ipc ipc/shm: Fix shmat mmap nil-page protection 2017-02-27 18:43:46 -08:00
kernel Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-02-28 11:44:01 -08:00
lib Merge branch 'idr-4.11' of git://git.infradead.org/users/willy/linux-dax 2017-02-28 20:29:41 -08:00
mm Merge branch 'idr-4.11' of git://git.infradead.org/users/willy/linux-dax 2017-02-28 20:29:41 -08:00
net net: solve a NAPI race 2017-03-01 09:50:58 -08:00
samples Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-02-22 10:15:09 -08:00
scripts checkpatch: warn when formats use %Z and suggest %z 2017-02-27 18:43:48 -08:00
security lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
sound lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
tools Merge branch 'idr-4.11' of git://git.infradead.org/users/willy/linux-dax 2017-02-28 20:29:41 -08:00
usr kbuild: initramfs cleanup, set target from Kconfig 2017-01-05 09:40:16 -08:00
virt mm: add new mmget() helper 2017-02-27 18:43:48 -08:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Add hch to .get_maintainer.ignore 2015-08-21 14:30:10 -07:00
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild 2016-08-02 16:48:52 -04:00
.mailmap mailmap: add codeaurora.org names for nameless email commits 2017-01-10 18:31:55 -08:00
COPYING
CREDITS MAINTAINERS: Remove old e-mail address 2017-02-13 12:24:56 -05:00
Kbuild scripts/gdb: provide linux constants 2016-05-23 17:04:14 -07:00
Kconfig kbuild: migrate all arch to the kconfig mainmenu upgrade 2010-09-19 22:54:11 -04:00
MAINTAINERS MAINTAINERS: Orphan usb/net/hso driver 2017-03-01 09:50:58 -08:00
Makefile Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-02-28 10:15:59 -08:00
README README: add a new README file, pointing to the Documentation/ 2016-10-24 08:12:35 -02:00

Linux kernel
============

This file was moved to Documentation/admin-guide/README.rst

Please notice that there are several guides for kernel developers and users.
These guides can be rendered in a number of formats, like HTML and PDF.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.