Commit graph

738464 commits

Author SHA1 Message Date
Davide Caratti 5bf7f8185f net/sched: fix idr leak in the error path of tcf_act_police_init()
tcf_act_police_init() can fail after the idr has been successfully
reserved (e.g., qdisc_get_rtab() may return NULL). When this happens,
subsequent attempts to configure a police rule using the same idr value
systematiclly fail with -ENOSPC:

 # tc action add action police rate 1000 burst 1000 drop index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action add action police rate 1000 burst 1000 drop index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # tc action add action police rate 1000 burst 1000 drop index 100
 RTNETLINK answers: No space left on device
 ...

Fix this in the error path of tcf_act_police_init(), calling
tcf_idr_release() in place of tcf_idr_cleanup().

Fixes: 65a206c01e ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-21 18:12:00 -04:00
Davide Caratti 60e10b3adc net/sched: fix idr leak in the error path of tcf_simp_init()
if the kernel fails to duplicate 'sdata', creation of a new action fails
with -ENOMEM. However, subsequent attempts to install the same action
using the same value of 'index' systematically fail with -ENOSPC, and
that value of 'index' will no more be usable by act_simple, until rmmod /
insmod of act_simple.ko is done:

 # tc actions add action simple sdata hello index 100
 # tc actions list action simple

        action order 0: Simple <hello>
         index 100 ref 1 bind 0
 # tc actions flush action simple
 # tc actions add action simple sdata hello index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc actions flush action simple
 # tc actions add action simple sdata hello index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # tc actions add action simple sdata hello index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 ...

Fix this in the error path of tcf_simp_init(), calling tcf_idr_release()
in place of tcf_idr_cleanup().

Fixes: 65a206c01e ("net/sched: Change act_api and act_xxx modules to use IDR")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-21 18:11:53 -04:00
Davide Caratti bbc09e7842 net/sched: fix idr leak on the error path of tcf_bpf_init()
when the following command sequence is entered

 # tc action add action bpf bytecode '4,40 0 0 12,31 0 1 2048,6 0 0 262144,6 0 0 0' index 100
 RTNETLINK answers: Invalid argument
 We have an error talking to the kernel
 # tc action add action bpf bytecode '4,40 0 0 12,21 0 1 2048,6 0 0 262144,6 0 0 0' index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel

act_bpf correctly refuses to install the first TC rule, because 31 is not
a valid instruction. However, it refuses to install the second TC rule,
even if the BPF code is correct. Furthermore, it's no more possible to
install any other rule having the same value of 'index' until act_bpf
module is unloaded/inserted again. After the idr has been reserved, call
tcf_idr_release() instead of tcf_idr_cleanup(), to fix this issue.

Fixes: 65a206c01e ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-21 18:11:46 -04:00
Colin Ian King 3f2176dd7f qede: fix spelling mistake: "registeration" -> "registration"
Trivial fix to spelling mistakes in DP_ERR error message text and
comments

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-21 18:09:46 -04:00
Colin Ian King 924613d3a8 bnx2x: fix spelling mistake: "registeration" -> "registration"
Trivial fix to spelling mistake in BNX2X_ERR error message text

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Sudarsana Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-21 18:09:03 -04:00
David S. Miller 3d27484eba Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-03-21

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Follow-up fix to the fault injection framework to prevent jump
   optimization on the kprobe by installing a dummy post-handler,
   from Masami.

2) Drop bpf_perf_prog_read_value helper from tracepoint type programs
   which was mistakenly added there and would otherwise crash due to
   wrong input context, from Yonghong.

3) Fix a crash in BPF fs when compiled with clang. Code appears to
   be fine just that clang tries to overly aggressive optimize in
   non C conform ways, therefore fix the kernel's Makefile to
   generally prevent such issues, from Daniel.

4) Skip unnecessary capability checks in bpf syscall, which is otherwise
   triggering unnecessary security hooks on capability checking and
   causing false alarms on unprivileged processes trying to access
   CAP_SYS_ADMIN restricted infra, from Chenbo.

5) Fix the test_bpf.ko module when CONFIG_BPF_JIT_ALWAYS_ON is set
   with regards to a test case that is really just supposed to fail
   on x8_64 JIT but not others, from Thadeu.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-21 12:09:04 -04:00
Daniel Borkmann 87e0d4f0f3 kbuild: disable clang's default use of -fmerge-all-constants
Prasad reported that he has seen crashes in BPF subsystem with netd
on Android with arm64 in the form of (note, the taint is unrelated):

  [ 4134.721483] Unable to handle kernel paging request at virtual address 800000001
  [ 4134.820925] Mem abort info:
  [ 4134.901283]   Exception class = DABT (current EL), IL = 32 bits
  [ 4135.016736]   SET = 0, FnV = 0
  [ 4135.119820]   EA = 0, S1PTW = 0
  [ 4135.201431] Data abort info:
  [ 4135.301388]   ISV = 0, ISS = 0x00000021
  [ 4135.359599]   CM = 0, WnR = 0
  [ 4135.470873] user pgtable: 4k pages, 39-bit VAs, pgd = ffffffe39b946000
  [ 4135.499757] [0000000800000001] *pgd=0000000000000000, *pud=0000000000000000
  [ 4135.660725] Internal error: Oops: 96000021 [#1] PREEMPT SMP
  [ 4135.674610] Modules linked in:
  [ 4135.682883] CPU: 5 PID: 1260 Comm: netd Tainted: G S      W       4.14.19+ #1
  [ 4135.716188] task: ffffffe39f4aa380 task.stack: ffffff801d4e0000
  [ 4135.731599] PC is at bpf_prog_add+0x20/0x68
  [ 4135.741746] LR is at bpf_prog_inc+0x20/0x2c
  [ 4135.751788] pc : [<ffffff94ab7ad584>] lr : [<ffffff94ab7ad638>] pstate: 60400145
  [ 4135.769062] sp : ffffff801d4e3ce0
  [...]
  [ 4136.258315] Process netd (pid: 1260, stack limit = 0xffffff801d4e0000)
  [ 4136.273746] Call trace:
  [...]
  [ 4136.442494] 3ca0: ffffff94ab7ad584 0000000060400145 ffffffe3a01bf8f8 0000000000000006
  [ 4136.460936] 3cc0: 0000008000000000 ffffff94ab844204 ffffff801d4e3cf0 ffffff94ab7ad584
  [ 4136.479241] [<ffffff94ab7ad584>] bpf_prog_add+0x20/0x68
  [ 4136.491767] [<ffffff94ab7ad638>] bpf_prog_inc+0x20/0x2c
  [ 4136.504536] [<ffffff94ab7b5d08>] bpf_obj_get_user+0x204/0x22c
  [ 4136.518746] [<ffffff94ab7ade68>] SyS_bpf+0x5a8/0x1a88

Android's netd was basically pinning the uid cookie BPF map in BPF
fs (/sys/fs/bpf/traffic_cookie_uid_map) and later on retrieving it
again resulting in above panic. Issue is that the map was wrongly
identified as a prog! Above kernel was compiled with clang 4.0,
and it turns out that clang decided to merge the bpf_prog_iops and
bpf_map_iops into a single memory location, such that the two i_ops
could then not be distinguished anymore.

Reason for this miscompilation is that clang has the more aggressive
-fmerge-all-constants enabled by default. In fact, clang source code
has a comment about it in lib/AST/ExprConstant.cpp on why it is okay
to do so:

  Pointers with different bases cannot represent the same object.
  (Note that clang defaults to -fmerge-all-constants, which can
  lead to inconsistent results for comparisons involving the address
  of a constant; this generally doesn't matter in practice.)

The issue never appeared with gcc however, since gcc does not enable
-fmerge-all-constants by default and even *explicitly* states in
it's option description that using this flag results in non-conforming
behavior, quote from man gcc:

  Languages like C or C++ require each variable, including multiple
  instances of the same variable in recursive calls, to have distinct
  locations, so using this option results in non-conforming behavior.

There are also various clang bug reports open on that matter [1],
where clang developers acknowledge the non-conforming behavior,
and refer to disabling it with -fno-merge-all-constants. But even
if this gets fixed in clang today, there are already users out there
that triggered this. Thus, fix this issue by explicitly adding
-fno-merge-all-constants to the kernel's Makefile to generically
disable this optimization, since potentially other places in the
kernel could subtly break as well.

Note, there is also a flag called -fmerge-constants (not supported
by clang), which is more conservative and only applies to strings
and it's enabled in gcc's -O/-O2/-O3/-Os optimization levels. In
gcc's code, the two flags -fmerge-{all-,}constants share the same
variable internally, so when disabling it via -fno-merge-all-constants,
then we really don't merge any const data (e.g. strings), and text
size increases with gcc (14,927,214 -> 14,942,646 for vmlinux.o).

  $ gcc -fverbose-asm -O2 foo.c -S -o foo.S
    -> foo.S lists -fmerge-constants under options enabled
  $ gcc -fverbose-asm -O2 -fno-merge-all-constants foo.c -S -o foo.S
    -> foo.S doesn't list -fmerge-constants under options enabled
  $ gcc -fverbose-asm -O2 -fno-merge-all-constants -fmerge-constants foo.c -S -o foo.S
    -> foo.S lists -fmerge-constants under options enabled

Thus, as a workaround we need to set both -fno-merge-all-constants
*and* -fmerge-constants in the Makefile in order for text size to
stay as is.

  [1] https://bugs.llvm.org/show_bug.cgi?id=18538

Reported-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chenbo Feng <fengc@google.com>
Cc: Richard Smith <richard-llvm@metafoo.co.uk>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: linux-kernel@vger.kernel.org
Tested-by: Prasad Sodagudi <psodagud@codeaurora.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-03-20 17:43:15 -07:00
Chenbo Feng 0fa4fe85f4 bpf: skip unnecessary capability check
The current check statement in BPF syscall will do a capability check
for CAP_SYS_ADMIN before checking sysctl_unprivileged_bpf_disabled. This
code path will trigger unnecessary security hooks on capability checking
and cause false alarms on unprivileged process trying to get CAP_SYS_ADMIN
access. This can be resolved by simply switch the order of the statement
and CAP_SYS_ADMIN is not required anyway if unprivileged bpf syscall is
allowed.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-20 23:50:39 +01:00
Yonghong Song f005afede9 trace/bpf: remove helper bpf_perf_prog_read_value from tracepoint type programs
Commit 4bebdc7a85 ("bpf: add helper bpf_perf_prog_read_value")
added helper bpf_perf_prog_read_value so that perf_event type program
can read event counter and enabled/running time.
This commit, however, introduced a bug which allows this helper
for tracepoint type programs. This is incorrect as bpf_perf_prog_read_value
needs to access perf_event through its bpf_perf_event_data_kern type context,
which is not available for tracepoint type program.

This patch fixed the issue by separating bpf_func_proto between tracepoint
and perf_event type programs and removed bpf_perf_prog_read_value
from tracepoint func prototype.

Fixes: 4bebdc7a85 ("bpf: add helper bpf_perf_prog_read_value")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-20 23:08:52 +01:00
Thadeu Lima de Souza Cascardo 52fda36d63 test_bpf: Fix testing with CONFIG_BPF_JIT_ALWAYS_ON=y on other arches
Function bpf_fill_maxinsns11 is designed to not be able to be JITed on
x86_64. So, it fails when CONFIG_BPF_JIT_ALWAYS_ON=y, and
commit 09584b4067 ("bpf: fix selftests/bpf test_kmod.sh failure when
CONFIG_BPF_JIT_ALWAYS_ON=y") makes sure that failure is detected on that
case.

However, it does not fail on other architectures, which have a different
JIT compiler design. So, test_bpf has started to fail to load on those.

After this fix, test_bpf loads fine on both x86_64 and ppc64el.

Fixes: 09584b4067 ("bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-20 23:04:30 +01:00
Stefano Brivio 5f2fb802ee ipv6: old_dport should be a __be16 in __ip6_datagram_connect()
Fixes: 2f987a76a9 ("net: ipv6: keep sk status consistent after datagram connect failure")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20 12:43:43 -04:00
David S. Miller 8220ce6d1b linux-can-fixes-for-4.16-20180319
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEENrCndlB/VnAEWuH5k9IU1zQoZfEFAlqvjK0THG1rbEBwZW5n
 dXRyb25peC5kZQAKCRCT0hTXNChl8ZMKB/4mf5fq7ed4EUKpeu2a6LTxoVl1bMWR
 SDwsjVM35x83riG2mctP0TsnZMq16w4PzzDsBERkjBFz2GuiZzfNkpy3dOSR/Y6G
 uNl2lJAKJftqJnx7r7qQIfrn0ItAIjGfanGsDI/ZwhiHYM7DGEXbDYUo8BMJAjvW
 t9tnqWa6vk7uy2UXxvnLNtqt40EWXL9kKnNjUvL5Ik5g1bKPn9hqcyhKJi48X9Jt
 +29MuuLai8dZlpInn9FR/QKZK2Z2ug3Y5oasB9ArNOGDuG7sNAW2gVjpVaZYi3Py
 wDUIaQ793FWPBvRm82nj67TomKE06Mm8SDFH1S7TOLEPePfXQwaYzG+1
 =kYfz
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-4.16-20180319' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2018-03-19

this is a pull reqeust of one patch for net/master.

The patch is by Andri Yngvason and fixes a potential use-after-free bug
in the cc770 driver introduced in the previous pull-request.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20 12:42:36 -04:00
Igor Pylypiv 44caebd368 net: gemini: fix memory leak
cppcheck report:
[drivers/net/ethernet/cortina/gemini.c:543]: (error) Memory leak: skb_tab

Signed-off-by: Igor Pylypiv <igor.pylypiv@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20 12:08:45 -04:00
Christophe JAILLET 00777fac28 net: ethernet: arc: Fix a potential memory leak if an optional regulator is deferred
If the optional regulator is deferred, we must release some resources.
They will be re-allocated when the probe function will be called again.

Fixes: 6eacf31139 ("ethernet: arc: Add support for Rockchip SoC layer device tree bindings")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20 12:06:23 -04:00
Arkadi Sharshevsky 7fe4d6dcbc devlink: Remove redundant free on error path
The current code performs unneeded free. Remove the redundant skb freeing
during the error path.

Fixes: 1555d204e7 ("devlink: Support for pipeline debug (dpipe)")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20 10:59:29 -04:00
Igor Pylypiv 8137a8e219 vmxnet3: remove unused flag "rxcsum" from struct vmxnet3_adapter
Signed-off-by: Igor Pylypiv <ipylypiv@silver-peak.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20 10:56:25 -04:00
David S. Miller b5dd0e792f Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:

====================
Here are a few more important Bluetooth driver fixes for the 4.16
kernel.

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20 10:28:09 -04:00
David S. Miller 36fe095606 Merge branch 'phy-relax-error-checking'
Grygorii Strashko says:

====================
net: phy: relax error checking when creating sysfs link netdev->phydev

Some ethernet drivers (like TI CPSW) may connect and manage >1 Net PHYs per
one netdevice, as result such drivers will produce warning during system
boot and fail to connect second phy to netdevice when PHYLIB framework
will try to create sysfs link netdev->phydev for second PHY
in phy_attach_direct(), because sysfs link with the same name has been
created already for the first PHY.
As result, second CPSW external port will became unusable.
This regression was introduced by commits:
5568363f0c ("net: phy: Create sysfs reciprocal links for attached_dev/phydev"
a399546049 ("net: phy: Relax error checking on sysfs_create_link()"

Patch 1: exports sysfs_create_link_nowarn() function as preparation for Patch 2.
Patch 2: relaxes error checking when PHYLIB framework is creating sysfs
link netdev->phydev in phy_attach_direct(), suppresses warning by using
sysfs_create_link_nowarn() and adds error message instead, so links creation
failure is not fatal any more and system can continue working,
which fixes TI CPSW issue and makes boot logs accessible
in case of NFS boot, for example.

This can be stable material 4.13+.

Changes in v2:
- commit messages updated.

v1:
 https://patchwork.ozlabs.org/cover/886058/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-19 21:14:27 -04:00
Grygorii Strashko 4414b3ed74 net: phy: relax error checking when creating sysfs link netdev->phydev
Some ethernet drivers (like TI CPSW) may connect and manage >1 Net PHYs per
one netdevice, as result such drivers will produce warning during system
boot and fail to connect second phy to netdevice when PHYLIB framework
will try to create sysfs link netdev->phydev for second PHY
in phy_attach_direct(), because sysfs link with the same name has been
created already for the first PHY. As result, second CPSW external
port will became unusable.

Fix it by relaxing error checking when PHYLIB framework is creating sysfs
link netdev->phydev in phy_attach_direct(), suppressing warning by using
sysfs_create_link_nowarn() and adding error message instead.
After this change links (phy->netdev and netdev->phy) creation failure is not
fatal any more and system can continue working, which fixes TI CPSW issue.

Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Fixes: a399546049 ("net: phy: Relax error checking on sysfs_create_link()")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-19 21:14:27 -04:00
Grygorii Strashko 2399ac42e7 sysfs: symlink: export sysfs_create_link_nowarn()
The sysfs_create_link_nowarn() is going to be used in phylib framework in
subsequent patch which can be built as module. Hence, export
sysfs_create_link_nowarn() to avoid build errors.

Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Fixes: a399546049 ("net: phy: Relax error checking on sysfs_create_link()")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-19 21:14:26 -04:00
Andri Yngvason 9ffd750394 can: cc770: Fix use after free in cc770_tx_interrupt()
This fixes use after free introduced by the last cc770 patch.

Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
Fixes: 746201235b ("can: cc770: Fix queue stall & dropped RTR reply")
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2018-03-19 10:57:29 +01:00
Florian Fainelli a069215cf5 net: fec: Fix unbalanced PM runtime calls
When unbinding/removing the driver, we will run into the following warnings:

[  259.655198] fec 400d1000.ethernet: 400d1000.ethernet supply phy not found, using dummy regulator
[  259.665065] fec 400d1000.ethernet: Unbalanced pm_runtime_enable!
[  259.672770] fec 400d1000.ethernet (unnamed net_device) (uninitialized): Invalid MAC address: 00:00:00:00:00:00
[  259.683062] fec 400d1000.ethernet (unnamed net_device) (uninitialized): Using random MAC address: f2:3e:93:b7:29:c1
[  259.696239] libphy: fec_enet_mii_bus: probed

Avoid these warnings by balancing the runtime PM calls during fec_drv_remove().

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-18 16:32:47 -04:00
Ronak Doshi 034f405793 vmxnet3: use correct flag to indicate LRO feature
'Commit 45dac1d6ea ("vmxnet3: Changes for vmxnet3 adapter version 2
(fwd)")' introduced a flag "lro" in structure vmxnet3_adapter which is
used to indicate whether LRO is enabled or not. However, the patch
did not set the flag and hence it was never exercised.

So, when LRO is enabled, it resulted in poor TCP performance due to
delayed acks. This issue is seen with packets which are larger than
the mss getting a delayed ack rather than an immediate ack, thus
resulting in high latency.

This patch removes the lro flag and directly uses device features
against NETIF_F_LRO to check if lro is enabled.

Fixes: 45dac1d6ea ("vmxnet3: Changes for vmxnet3 adapter version 2 (fwd)")
Reported-by: Rachel Lunnon <rachel_lunnon@stormagic.com>
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-17 20:03:53 -04:00
Ronak Doshi 7a4c003d69 vmxnet3: avoid xmit reset due to a race in vmxnet3
The field txNumDeferred is used by the driver to keep track of the number
of packets it has pushed to the emulation. The driver increments it on
pushing the packet to the emulation and the emulation resets it to 0 at
the end of the transmit.

There is a possibility of a race either when (a) ESX is under heavy load or
(b) workload inside VM is of low packet rate.

This race results in xmit hangs when network coalescing is disabled. This
change creates a local copy of txNumDeferred and uses it to perform ring
arithmetic.

Reported-by: Noriho Tanaka <ntanaka@vmware.com>
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-17 20:03:53 -04:00
David S. Miller 78f1b04fa2 Merge branch 'tcf_foo_init-NULL-deref'
Davide Caratti says:

====================
net/sched: fix NULL dereference in the error path of .init()

with several TC actions it's possible to see NULL pointer dereference,
when the .init() function calls tcf_idr_alloc(), fails at some point and
then calls tcf_idr_release(): this series fixes all them introducing
non-NULL tests in the .cleanup() function.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-17 19:53:29 -04:00
Davide Caratti 2d43361017 net/sched: fix NULL dereference on the error path of tcf_skbmod_init()
when the following command

 # tc action replace action skbmod swap mac index 100

is run for the first time, and tcf_skbmod_init() fails to allocate struct
tcf_skbmod_params, tcf_skbmod_cleanup() calls kfree_rcu(NULL), thus
causing the following error:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
 IP: __call_rcu+0x23/0x2b0
 PGD 8000000034057067 P4D 8000000034057067 PUD 74937067 PMD 0
 Oops: 0002 [#1] SMP PTI
 Modules linked in: act_skbmod(E) psample ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 snd_hda_codec_generic snd_hda_intel snd_hda_codec crct10dif_pclmul mbcache jbd2 crc32_pclmul snd_hda_core ghash_clmulni_intel snd_hwdep pcbc snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd glue_helper snd cryptd virtio_balloon joydev soundcore pcspkr i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_console virtio_net virtio_blk ata_piix libata crc32c_intel virtio_pci serio_raw virtio_ring virtio i2c_core floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_skbmod]
 CPU: 3 PID: 3144 Comm: tc Tainted: G            E    4.16.0-rc4.act_vlan.orig+ #403
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:__call_rcu+0x23/0x2b0
 RSP: 0018:ffffbd2e403e7798 EFLAGS: 00010246
 RAX: ffffffffc0872080 RBX: ffff981d34bff780 RCX: 00000000ffffffff
 RDX: ffffffff922a5f00 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000021f
 R10: 000000003d003000 R11: 0000000000aaaaaa R12: 0000000000000000
 R13: ffffffff922a5f00 R14: 0000000000000001 R15: ffff981d3b698c2c
 FS:  00007f3678292740(0000) GS:ffff981d3fd80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000008 CR3: 000000007c57a006 CR4: 00000000001606e0
 Call Trace:
  __tcf_idr_release+0x79/0xf0
  tcf_skbmod_init+0x1d1/0x210 [act_skbmod]
  tcf_action_init_1+0x2cc/0x430
  tcf_action_init+0xd3/0x1b0
  tc_ctl_action+0x18b/0x240
  rtnetlink_rcv_msg+0x29c/0x310
  ? _cond_resched+0x15/0x30
  ? __kmalloc_node_track_caller+0x1b9/0x270
  ? rtnl_calcit.isra.28+0x100/0x100
  netlink_rcv_skb+0xd2/0x110
  netlink_unicast+0x17c/0x230
  netlink_sendmsg+0x2cd/0x3c0
  sock_sendmsg+0x30/0x40
  ___sys_sendmsg+0x27a/0x290
  ? filemap_map_pages+0x34a/0x3a0
  ? __handle_mm_fault+0xbfd/0xe20
  __sys_sendmsg+0x51/0x90
  do_syscall_64+0x6e/0x1a0
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
 RIP: 0033:0x7f36776a3ba0
 RSP: 002b:00007fff4703b618 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 00007fff4703b740 RCX: 00007f36776a3ba0
 RDX: 0000000000000000 RSI: 00007fff4703b690 RDI: 0000000000000003
 RBP: 000000005aaaba36 R08: 0000000000000002 R09: 0000000000000000
 R10: 00007fff4703b0a0 R11: 0000000000000246 R12: 0000000000000000
 R13: 00007fff4703b754 R14: 0000000000000001 R15: 0000000000669f60
 Code: 5d e9 42 da ff ff 66 90 0f 1f 44 00 00 41 57 41 56 41 55 49 89 d5 41 54 55 48 89 fd 53 48 83 ec 08 40 f6 c7 07 0f 85 19 02 00 00 <48> 89 75 08 48 c7 45 00 00 00 00 00 9c 58 0f 1f 44 00 00 49 89
 RIP: __call_rcu+0x23/0x2b0 RSP: ffffbd2e403e7798
 CR2: 0000000000000008

Fix it in tcf_skbmod_cleanup(), ensuring that kfree_rcu(p, ...) is called
only when p is not NULL.

Fixes: 86da71b573 ("net_sched: Introduce skbmod action")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-17 19:53:29 -04:00
Davide Caratti 1f110e7cae net/sched: fix NULL dereference in the error path of tcf_sample_init()
when the following command

 # tc action add action sample rate 100 group 100 index 100

is run for the first time, and psample_group_get(100) fails to create a
new group, tcf_sample_cleanup() calls psample_group_put(NULL), thus
causing the following error:

 BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
 IP: psample_group_put+0x15/0x71 [psample]
 PGD 8000000075775067 P4D 8000000075775067 PUD 7453c067 PMD 0
 Oops: 0002 [#1] SMP PTI
 Modules linked in: act_sample(E) psample ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core mbcache jbd2 crct10dif_pclmul snd_hwdep crc32_pclmul snd_seq ghash_clmulni_intel pcbc snd_seq_device snd_pcm aesni_intel crypto_simd snd_timer glue_helper snd cryptd joydev pcspkr i2c_piix4 soundcore virtio_balloon nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_net ata_piix virtio_console virtio_blk libata serio_raw crc32c_intel virtio_pci i2c_core virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_tunnel_key]
 CPU: 2 PID: 5740 Comm: tc Tainted: G            E    4.16.0-rc4.act_vlan.orig+ #403
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:psample_group_put+0x15/0x71 [psample]
 RSP: 0018:ffffb8a80032f7d0 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000024
 RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffffc06d93c0
 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044
 R10: 00000000bd003000 R11: ffff979fba04aa59 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: ffff979fbba3f22c
 FS:  00007f7638112740(0000) GS:ffff979fbfd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000000000000001c CR3: 00000000734ea001 CR4: 00000000001606e0
 Call Trace:
  __tcf_idr_release+0x79/0xf0
  tcf_sample_init+0x125/0x1d0 [act_sample]
  tcf_action_init_1+0x2cc/0x430
  tcf_action_init+0xd3/0x1b0
  tc_ctl_action+0x18b/0x240
  rtnetlink_rcv_msg+0x29c/0x310
  ? _cond_resched+0x15/0x30
  ? __kmalloc_node_track_caller+0x1b9/0x270
  ? rtnl_calcit.isra.28+0x100/0x100
  netlink_rcv_skb+0xd2/0x110
  netlink_unicast+0x17c/0x230
  netlink_sendmsg+0x2cd/0x3c0
  sock_sendmsg+0x30/0x40
  ___sys_sendmsg+0x27a/0x290
  ? filemap_map_pages+0x34a/0x3a0
  ? __handle_mm_fault+0xbfd/0xe20
  __sys_sendmsg+0x51/0x90
  do_syscall_64+0x6e/0x1a0
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
 RIP: 0033:0x7f7637523ba0
 RSP: 002b:00007fff0473ef58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 00007fff0473f080 RCX: 00007f7637523ba0
 RDX: 0000000000000000 RSI: 00007fff0473efd0 RDI: 0000000000000003
 RBP: 000000005aaaac80 R08: 0000000000000002 R09: 0000000000000000
 R10: 00007fff0473e9e0 R11: 0000000000000246 R12: 0000000000000000
 R13: 00007fff0473f094 R14: 0000000000000001 R15: 0000000000669f60
 Code: be 02 00 00 00 48 89 df e8 a9 fe ff ff e9 7c ff ff ff 0f 1f 40 00 0f 1f 44 00 00 53 48 89 fb 48 c7 c7 c0 93 6d c0 e8 db 20 8c ef <83> 6b 1c 01 74 10 48 c7 c7 c0 93 6d c0 ff 14 25 e8 83 83 b0 5b
 RIP: psample_group_put+0x15/0x71 [psample] RSP: ffffb8a80032f7d0
 CR2: 000000000000001c

Fix it in tcf_sample_cleanup(), ensuring that calls to psample_group_put(p)
are done only when p is not NULL.

Fixes: cadb9c9fdb ("net/sched: act_sample: Fix error path in init")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-17 19:53:28 -04:00
Davide Caratti abdadd3cfd net/sched: fix NULL dereference in the error path of tunnel_key_init()
when the following command

 # tc action add action tunnel_key unset index 100

is run for the first time, and tunnel_key_init() fails to allocate struct
tcf_tunnel_key_params, tunnel_key_release() dereferences NULL pointers.
This causes the following error:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
 IP: tunnel_key_release+0xd/0x40 [act_tunnel_key]
 PGD 8000000033787067 P4D 8000000033787067 PUD 74646067 PMD 0
 Oops: 0000 [#1] SMP PTI
 Modules linked in: act_tunnel_key(E) act_csum ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 mbcache jbd2 crct10dif_pclmul crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel pcbc snd_hda_codec snd_hda_core snd_hwdep snd_seq aesni_intel snd_seq_device crypto_simd glue_helper snd_pcm cryptd joydev snd_timer pcspkr virtio_balloon snd i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_net virtio_blk drm virtio_console crc32c_intel ata_piix serio_raw i2c_core virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CPU: 2 PID: 3101 Comm: tc Tainted: G            E    4.16.0-rc4.act_vlan.orig+ #403
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tunnel_key_release+0xd/0x40 [act_tunnel_key]
 RSP: 0018:ffffba46803b7768 EFLAGS: 00010286
 RAX: ffffffffc09010a0 RBX: 0000000000000000 RCX: 0000000000000024
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff99ee336d7480
 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044
 R10: 0000000000000220 R11: ffff99ee79d73131 R12: 0000000000000000
 R13: ffff99ee32d67610 R14: ffff99ee7671dc38 R15: 00000000fffffff4
 FS:  00007febcb2cd740(0000) GS:ffff99ee7fd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000010 CR3: 000000007c8e4005 CR4: 00000000001606e0
 Call Trace:
  __tcf_idr_release+0x79/0xf0
  tunnel_key_init+0xd9/0x460 [act_tunnel_key]
  tcf_action_init_1+0x2cc/0x430
  tcf_action_init+0xd3/0x1b0
  tc_ctl_action+0x18b/0x240
  rtnetlink_rcv_msg+0x29c/0x310
  ? _cond_resched+0x15/0x30
  ? __kmalloc_node_track_caller+0x1b9/0x270
  ? rtnl_calcit.isra.28+0x100/0x100
  netlink_rcv_skb+0xd2/0x110
  netlink_unicast+0x17c/0x230
  netlink_sendmsg+0x2cd/0x3c0
  sock_sendmsg+0x30/0x40
  ___sys_sendmsg+0x27a/0x290
  __sys_sendmsg+0x51/0x90
  do_syscall_64+0x6e/0x1a0
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
 RIP: 0033:0x7febca6deba0
 RSP: 002b:00007ffe7b0dd128 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 00007ffe7b0dd250 RCX: 00007febca6deba0
 RDX: 0000000000000000 RSI: 00007ffe7b0dd1a0 RDI: 0000000000000003
 RBP: 000000005aaa90cb R08: 0000000000000002 R09: 0000000000000000
 R10: 00007ffe7b0dcba0 R11: 0000000000000246 R12: 0000000000000000
 R13: 00007ffe7b0dd264 R14: 0000000000000001 R15: 0000000000669f60
 Code: 44 00 00 8b 0d b5 23 00 00 48 8b 87 48 10 00 00 48 8b 3c c8 e9 a5 e5 d8 c3 0f 1f 44 00 00 0f 1f 44 00 00 53 48 8b 9f b0 00 00 00 <83> 7b 10 01 74 0b 48 89 df 31 f6 5b e9 f2 fa 7f c3 48 8b 7b 18
 RIP: tunnel_key_release+0xd/0x40 [act_tunnel_key] RSP: ffffba46803b7768
 CR2: 0000000000000010

Fix this in tunnel_key_release(), ensuring 'param' is not NULL before
dereferencing it.

Fixes: d0f6dd8a91 ("net/sched: Introduce act_tunnel_key")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-17 19:53:28 -04:00
Davide Caratti aab378a7ed net/sched: fix NULL dereference in the error path of tcf_csum_init()
when the following command

 # tc action add action csum udp continue index 100

is run for the first time, and tcf_csum_init() fails allocating struct
tcf_csum, tcf_csum_cleanup() calls kfree_rcu(NULL,...). This causes the
following error:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
 IP: __call_rcu+0x23/0x2b0
 PGD 80000000740b4067 P4D 80000000740b4067 PUD 32e7f067 PMD 0
 Oops: 0002 [#1] SMP PTI
 Modules linked in: act_csum(E) act_vlan ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 mbcache jbd2 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic pcbc snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer aesni_intel crypto_simd glue_helper cryptd snd joydev pcspkr virtio_balloon i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_blk drm virtio_net virtio_console ata_piix crc32c_intel libata virtio_pci serio_raw i2c_core virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_vlan]
 CPU: 2 PID: 5763 Comm: tc Tainted: G            E    4.16.0-rc4.act_vlan.orig+ #403
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:__call_rcu+0x23/0x2b0
 RSP: 0018:ffffb275803e77c0 EFLAGS: 00010246
 RAX: ffffffffc057b080 RBX: ffff9674bc6f5240 RCX: 00000000ffffffff
 RDX: ffffffff928a5f00 RSI: 0000000000000008 RDI: 0000000000000008
 RBP: 0000000000000008 R08: 0000000000000001 R09: 0000000000000044
 R10: 0000000000000220 R11: ffff9674b9ab4821 R12: 0000000000000000
 R13: ffffffff928a5f00 R14: 0000000000000000 R15: 0000000000000001
 FS:  00007fa6368d8740(0000) GS:ffff9674bfd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000010 CR3: 0000000073dec001 CR4: 00000000001606e0
 Call Trace:
  __tcf_idr_release+0x79/0xf0
  tcf_csum_init+0xfb/0x180 [act_csum]
  tcf_action_init_1+0x2cc/0x430
  tcf_action_init+0xd3/0x1b0
  tc_ctl_action+0x18b/0x240
  rtnetlink_rcv_msg+0x29c/0x310
  ? _cond_resched+0x15/0x30
  ? __kmalloc_node_track_caller+0x1b9/0x270
  ? rtnl_calcit.isra.28+0x100/0x100
  netlink_rcv_skb+0xd2/0x110
  netlink_unicast+0x17c/0x230
  netlink_sendmsg+0x2cd/0x3c0
  sock_sendmsg+0x30/0x40
  ___sys_sendmsg+0x27a/0x290
  ? filemap_map_pages+0x34a/0x3a0
  ? __handle_mm_fault+0xbfd/0xe20
  __sys_sendmsg+0x51/0x90
  do_syscall_64+0x6e/0x1a0
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
 RIP: 0033:0x7fa635ce9ba0
 RSP: 002b:00007ffc185b0fc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 00007ffc185b10f0 RCX: 00007fa635ce9ba0
 RDX: 0000000000000000 RSI: 00007ffc185b1040 RDI: 0000000000000003
 RBP: 000000005aaa85e0 R08: 0000000000000002 R09: 0000000000000000
 R10: 00007ffc185b0a20 R11: 0000000000000246 R12: 0000000000000000
 R13: 00007ffc185b1104 R14: 0000000000000001 R15: 0000000000669f60
 Code: 5d e9 42 da ff ff 66 90 0f 1f 44 00 00 41 57 41 56 41 55 49 89 d5 41 54 55 48 89 fd 53 48 83 ec 08 40 f6 c7 07 0f 85 19 02 00 00 <48> 89 75 08 48 c7 45 00 00 00 00 00 9c 58 0f 1f 44 00 00 49 89
 RIP: __call_rcu+0x23/0x2b0 RSP: ffffb275803e77c0
 CR2: 0000000000000010

fix this in tcf_csum_cleanup(), ensuring that kfree_rcu(param, ...) is
called only when param is not NULL.

Fixes: 9c5f69bbd7 ("net/sched: act_csum: don't use spinlock in the fast path")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-17 19:53:28 -04:00
Davide Caratti 1edf8abe04 net/sched: fix NULL dereference in the error path of tcf_vlan_init()
when the following command

 # tc actions replace action vlan pop index 100

is run for the first time, and tcf_vlan_init() fails allocating struct
tcf_vlan_params, tcf_vlan_cleanup() calls kfree_rcu(NULL, ...). This causes
the following error:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
 IP: __call_rcu+0x23/0x2b0
 PGD 80000000760a2067 P4D 80000000760a2067 PUD 742c1067 PMD 0
 Oops: 0002 [#1] SMP PTI
 Modules linked in: act_vlan(E) ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 snd_hda_codec_generic snd_hda_intel mbcache snd_hda_codec jbd2 snd_hda_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd snd_timer glue_helper snd cryptd joydev soundcore virtio_balloon pcspkr i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_console virtio_blk virtio_net ata_piix crc32c_intel libata virtio_pci i2c_core virtio_ring serio_raw virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_vlan]
 CPU: 3 PID: 3119 Comm: tc Tainted: G            E    4.16.0-rc4.act_vlan.orig+ #403
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:__call_rcu+0x23/0x2b0
 RSP: 0018:ffffaac3005fb798 EFLAGS: 00010246
 RAX: ffffffffc0704080 RBX: ffff97f2b4bbe900 RCX: 00000000ffffffff
 RDX: ffffffffabca5f00 RSI: 0000000000000010 RDI: 0000000000000010
 RBP: 0000000000000010 R08: 0000000000000001 R09: 0000000000000044
 R10: 00000000fd003000 R11: ffff97f2faab5b91 R12: 0000000000000000
 R13: ffffffffabca5f00 R14: ffff97f2fb80202c R15: 00000000fffffff4
 FS:  00007f68f75b4740(0000) GS:ffff97f2ffd80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000018 CR3: 0000000072b52001 CR4: 00000000001606e0
 Call Trace:
  __tcf_idr_release+0x79/0xf0
  tcf_vlan_init+0x168/0x270 [act_vlan]
  tcf_action_init_1+0x2cc/0x430
  tcf_action_init+0xd3/0x1b0
  tc_ctl_action+0x18b/0x240
  rtnetlink_rcv_msg+0x29c/0x310
  ? _cond_resched+0x15/0x30
  ? __kmalloc_node_track_caller+0x1b9/0x270
  ? rtnl_calcit.isra.28+0x100/0x100
  netlink_rcv_skb+0xd2/0x110
  netlink_unicast+0x17c/0x230
  netlink_sendmsg+0x2cd/0x3c0
  sock_sendmsg+0x30/0x40
  ___sys_sendmsg+0x27a/0x290
  ? filemap_map_pages+0x34a/0x3a0
  ? __handle_mm_fault+0xbfd/0xe20
  __sys_sendmsg+0x51/0x90
  do_syscall_64+0x6e/0x1a0
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
 RIP: 0033:0x7f68f69c5ba0
 RSP: 002b:00007fffd79c1118 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 00007fffd79c1240 RCX: 00007f68f69c5ba0
 RDX: 0000000000000000 RSI: 00007fffd79c1190 RDI: 0000000000000003
 RBP: 000000005aaa708e R08: 0000000000000002 R09: 0000000000000000
 R10: 00007fffd79c0ba0 R11: 0000000000000246 R12: 0000000000000000
 R13: 00007fffd79c1254 R14: 0000000000000001 R15: 0000000000669f60
 Code: 5d e9 42 da ff ff 66 90 0f 1f 44 00 00 41 57 41 56 41 55 49 89 d5 41 54 55 48 89 fd 53 48 83 ec 08 40 f6 c7 07 0f 85 19 02 00 00 <48> 89 75 08 48 c7 45 00 00 00 00 00 9c 58 0f 1f 44 00 00 49 89
 RIP: __call_rcu+0x23/0x2b0 RSP: ffffaac3005fb798
 CR2: 0000000000000018

fix this in tcf_vlan_cleanup(), ensuring that kfree_rcu(p, ...) is called
only when p is not NULL.

Fixes: 4c5b9d9642 ("act_vlan: VLAN action rewrite to use RCU lock/unlock and update")
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Manish Kurup <manish.kurup@verizon.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-17 19:53:28 -04:00
SZ Lin (林上智) f9db50691d net: ethernet: ti: cpsw: add check for in-band mode setting with RGMII PHY interface
According to AM335x TRM[1] 14.3.6.2, AM437x TRM[2] 15.3.6.2 and
DRA7 TRM[3] 24.11.4.8.7.3.3, in-band mode in EXT_EN(bit18) register is only
available when PHY is configured in RGMII mode with 10Mbps speed. It will
cause some networking issues without RGMII mode, such as carrier sense
errors and low throughput. TI also mentioned this issue in their forum[4].

This patch adds the check mechanism for PHY interface with RGMII interface
type, the in-band mode can only be set in RGMII mode with 10Mbps speed.

References:
[1]: https://www.ti.com/lit/ug/spruh73p/spruh73p.pdf
[2]: http://www.ti.com/lit/ug/spruhl7h/spruhl7h.pdf
[3]: http://www.ti.com/lit/ug/spruic2b/spruic2b.pdf
[4]: https://e2e.ti.com/support/arm/sitara_arm/f/791/p/640765/2392155

Suggested-by: Holsety Chen (陳憲輝) <Holsety.Chen@moxa.com>
Signed-off-by: SZ Lin (林上智) <sz.lin@moxa.com>
Signed-off-by: Schuyler Patton <spatton@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-17 19:49:34 -04:00
Matthias Brugger d61d263c8d net: hns: Fix ethtool private flags
The driver implementation returns support for private flags, while
no private flags are present. When asked for the number of private
flags it returns the number of statistic flag names.

Fix this by returning EOPNOTSUPP for not implemented ethtool flags.

Signed-off-by: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-17 19:48:26 -04:00
Ido Schimmel bcdd5de80a mlxsw: spectrum_buffers: Set a minimum quota for CPU port traffic
In commit 9ffcc3725f ("mlxsw: spectrum: Allow packets to be trapped
from any PG") I fixed a problem where packets could not be trapped to
the CPU due to exceeded shared buffer quotas. The mentioned commit
explains the problem in detail.

The problem was fixed by assigning a minimum quota for the CPU port and
the traffic class used for scheduling traffic to the CPU.

However, commit 117b0dad2d ("mlxsw: Create a different trap group list
for each device") assigned different traffic classes to different
packet types and rendered the fix useless.

Fix the problem by assigning a minimum quota for the CPU port and all
the traffic classes that are currently in use.

Fixes: 117b0dad2d ("mlxsw: Create a different trap group list for each device")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Eddie Shklaer <eddies@mellanox.com>
Tested-by: Eddie Shklaer <eddies@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-17 17:34:54 -04:00
Eric Dumazet cce6294cc2 net: sched: fix uses after free
syzbot reported one use-after-free in pfifo_fast_enqueue() [1]

Issue here is that we can not reuse skb after a successful skb_array_produce()
since another cpu might have consumed it already.

I believe a similar problem exists in try_bulk_dequeue_skb_slow()
in case we put an skb into qdisc_enqueue_skb_bad_txq() for lockless qdisc.

[1]
BUG: KASAN: use-after-free in qdisc_pkt_len include/net/sch_generic.h:610 [inline]
BUG: KASAN: use-after-free in qdisc_qstats_cpu_backlog_inc include/net/sch_generic.h:712 [inline]
BUG: KASAN: use-after-free in pfifo_fast_enqueue+0x4bc/0x5e0 net/sched/sch_generic.c:639
Read of size 4 at addr ffff8801cede37e8 by task syzkaller717588/5543

CPU: 1 PID: 5543 Comm: syzkaller717588 Not tainted 4.16.0-rc4+ #265
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x24d lib/dump_stack.c:53
 print_address_description+0x73/0x250 mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report+0x23c/0x360 mm/kasan/report.c:412
 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
 qdisc_pkt_len include/net/sch_generic.h:610 [inline]
 qdisc_qstats_cpu_backlog_inc include/net/sch_generic.h:712 [inline]
 pfifo_fast_enqueue+0x4bc/0x5e0 net/sched/sch_generic.c:639
 __dev_xmit_skb net/core/dev.c:3216 [inline]

Fixes: c5ad119fb6 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+ed43b6903ab968b16f54@syzkaller.appspotmail.com
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc:	Cong Wang <xiyou.wangcong@gmail.com>
Cc:	Jiri Pirko <jiri@resnulli.us>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-17 17:03:45 -04:00
Andrew Lunn 7b7ef57c0d net: dsa: mv88e6xxx: Fix binding documentation for MDIO busses
The MDIO busses are switch properties and so should be inside the
switch node. Fix the examples in the binding document.

Reported-by: 尤晓杰 <yxj790222@163.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: a3c53be55c ("net: dsa: mv88e6xxx: Support multiple MDIO busses")
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-16 15:01:34 -04:00
Vinicius Costa Gomes 6e5d58fdc9 skbuff: Fix not waking applications when errors are enqueued
When errors are enqueued to the error queue via sock_queue_err_skb()
function, it is possible that the waiting application is not notified.

Calling 'sk->sk_data_ready()' would not notify applications that
selected only POLLERR events in poll() (for example).

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Randy E. Witt <randy.e.witt@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-16 12:35:33 -04:00
Nicolas Dichtel 02a2385f37 netlink: avoid a double skb free in genlmsg_mcast()
nlmsg_multicast() consumes always the skb, thus the original skb must be
freed only when this function is called with a clone.

Fixes: cb9f7a9a5c ("netlink: ensure to loop over all netns in genlmsg_multicast_allns()")
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-16 12:34:48 -04:00
Michal Kalderon 4609adc271 qede: Fix qedr link update
Link updates were not reported to qedr correctly.
Leading to cases where a link could be down, but qedr
would see it as up.
In addition, once qede was loaded, link state would be up,
regardless of the actual link state.

Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-16 12:15:24 -04:00
David S. Miller b225598d62 Merge branch 'qed-iWARP-related-fixes'
Michal Kalderon says:

====================
qed: iWARP related fixes

This series contains two fixes related to iWARP flow.
====================

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
2018-03-16 12:13:51 -04:00
Michal Kalderon 16da09047d qed: Fix non TCP packets should be dropped on iWARP ll2 connection
FW workaround. The iWARP LL2 connection did not expect TCP packets
to arrive on it's connection. The fix drops any non-tcp packets

Fixes b5c29ca ("qed: iWARP CM - setup a ll2 connection for handling
SYN packets")

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-16 12:13:50 -04:00
Michal Kalderon 933e8c91b9 qed: Fix MPA unalign flow in case header is split across two packets.
There is a corner case in the MPA unalign flow where a FPDU header is
split over two tcp segments. The length of the first fragment in this
case was not initialized properly and should be '1'

Fixes: c7d1d839 ("qed: Add support for MPA header being split over two tcp packets")

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-16 12:13:50 -04:00
Arvind Yadav fa6a91e9b9 net/iucv: Free memory obtained by kzalloc
Free memory by calling put_device(), if afiucv_iucv_init is not
successful.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-16 11:42:12 -04:00
Florian Fainelli 484d802d0f net: systemport: Rewrite __bcm_sysport_tx_reclaim()
There is no need for complex checking between the last consumed index
and current consumed index, a simple subtraction will do.

This also eliminates the possibility of a permanent transmit queue stall
under the following conditions:

- one CPU bursts ring->size worth of traffic (up to 256 buffers), to the
  point where we run out of free descriptors, so we stop the transmit
  queue at the end of bcm_sysport_xmit()

- because of our locking, we have the transmit process disable
  interrupts which means we can be blocking the TX reclamation process

- when TX reclamation finally runs, we will be computing the difference
  between ring->c_index (last consumed index by SW) and what the HW
  reports through its register

- this register is masked with (ring->size - 1) = 0xff, which will lead
  to stripping the upper bits of the index (register is 16-bits wide)

- we will be computing last_tx_cn as 0, which means there is no work to
  be done, and we never wake-up the transmit queue, leaving it
  permanently disabled

A practical example is e.g: ring->c_index aka last_c_index = 12, we
pushed 256 entries, HW consumer index = 268, we mask it with 0xff = 12,
so last_tx_cn == 0, nothing happens.

Fixes: 80105befdb ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-16 11:16:55 -04:00
Tom Herbert 2cc683e88c kcm: lock lower socket in kcm_attach
Need to lock lower socket in order to provide mutual exclusion
with kcm_unattach.

v2: Add Reported-by for syzbot

Fixes: ab7ac4eb98 ("kcm: Kernel Connection Multiplexor module")
Reported-by: syzbot+ea75c0ffcd353d32515f064aaebefc5279e6161e@syzkaller.appspotmail.com
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-16 11:12:16 -04:00
David S. Miller e693be293f Merge branch 'vlan-untag-and-insert-fixes'
Toshiaki Makita says:

====================
Fix vlan untag and insertion for bridge and vlan with reorder_hdr off

As Brandon Carpenter reported[1], sending non-vlan-offloaded packets from
bridge devices ends up with corrupted packets. He narrowed down this problem
and found that the root cause is in skb_reorder_vlan_header().

While I was working on fixing this problem, I found that the function does
not work properly for double tagged packets with reorder_hdr off as well.

Patch 1 fixes these 2 problems in skb_reorder_vlan_header().

And it turned out that fixing skb_reorder_vlan_header() is not sufficient
to receive double tagged packets with reorder_hdr off while I was testing the
fix. Vlan tags got out of order when vlan devices with reorder_hdr disabled
were stacked. Patch 2 fixes this problem.

[1] https://www.spinics.net/lists/linux-ethernet-bridging/msg07039.html
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-16 10:03:48 -04:00
Toshiaki Makita cbe7128c4b vlan: Fix out of order vlan headers with reorder header off
With reorder header off, received packets are untagged in skb_vlan_untag()
called from within __netif_receive_skb_core(), and later the tag will be
inserted back in vlan_do_receive().

This caused out of order vlan headers when we create a vlan device on top
of another vlan device, because vlan_do_receive() inserts a tag as the
outermost vlan tag. E.g. the outer tag is first removed in skb_vlan_untag()
and inserted back in vlan_do_receive(), then the inner tag is next removed
and inserted back as the outermost tag.

This patch fixes the behaviour by inserting the inner tag at the right
position.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-16 10:03:47 -04:00
Toshiaki Makita 4bbb3e0e82 net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off
When we have a bridge with vlan_filtering on and a vlan device on top of
it, packets would be corrupted in skb_vlan_untag() called from
br_dev_xmit().

The problem sits in skb_reorder_vlan_header() used in skb_vlan_untag(),
which makes use of skb->mac_len. In this function mac_len is meant for
handling rx path with vlan devices with reorder_header disabled, but in
tx path mac_len is typically 0 and cannot be used, which is the problem
in this case.

The current code even does not properly handle rx path (skb_vlan_untag()
called from __netif_receive_skb_core()) with reorder_header off actually.

In rx path single tag case, it works as follows:

- Before skb_reorder_vlan_header()

 mac_header                                data
   v                                        v
   +-------------------+-------------+------+----
   |        ETH        |    VLAN     | ETH  |
   |       ADDRS       | TPID | TCI  | TYPE |
   +-------------------+-------------+------+----
   <-------- mac_len --------->
                       <------------->
                        to be removed

- After skb_reorder_vlan_header()

            mac_header                     data
                 v                          v
                 +-------------------+------+----
                 |        ETH        | ETH  |
                 |       ADDRS       | TYPE |
                 +-------------------+------+----
                 <-------- mac_len --------->

This is ok, but in rx double tag case, it corrupts packets:

- Before skb_reorder_vlan_header()

 mac_header                                              data
   v                                                      v
   +-------------------+-------------+-------------+------+----
   |        ETH        |    VLAN     |    VLAN     | ETH  |
   |       ADDRS       | TPID | TCI  | TPID | TCI  | TYPE |
   +-------------------+-------------+-------------+------+----
   <--------------- mac_len ---------------->
                                     <------------->
                                    should be removed
                       <--------------------------->
                         actually will be removed

- After skb_reorder_vlan_header()

            mac_header                                   data
                 v                                        v
                               +-------------------+------+----
                               |        ETH        | ETH  |
                               |       ADDRS       | TYPE |
                               +-------------------+------+----
                 <--------------- mac_len ---------------->

So, two of vlan tags are both removed while only inner one should be
removed and mac_header (and mac_len) is broken.

skb_vlan_untag() is meant for removing the vlan header at (skb->data - 2),
so use skb->data and skb->mac_header to calculate the right offset.

Reported-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Fixes: a6e18ff111 ("vlan: Fix untag operations of stacked vlans with REORDER_HEADER off")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-16 10:03:47 -04:00
Roman Mashak 51d4740f88 net sched actions: return explicit error when tunnel_key mode is not specified
If set/unset mode of the tunnel_key action is not provided, ->init() still
returns 0, and the caller proceeds with bogus 'struct tc_action *' object,
this results in crash:

% tc actions add action tunnel_key src_ip 1.1.1.1 dst_ip 2.2.2.1 id 7 index 1

[   35.805515] general protection fault: 0000 [#1] SMP PTI
[   35.806161] Modules linked in: act_tunnel_key kvm_intel kvm irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64
crypto_simd glue_helper cryptd serio_raw
[   35.808233] CPU: 1 PID: 428 Comm: tc Not tainted 4.16.0-rc4+ #286
[   35.808929] RIP: 0010:tcf_action_init+0x90/0x190
[   35.809457] RSP: 0018:ffffb8edc068b9a0 EFLAGS: 00010206
[   35.810053] RAX: 1320c000000a0003 RBX: 0000000000000001 RCX: 0000000000000000
[   35.810866] RDX: 0000000000000070 RSI: 0000000000007965 RDI: ffffb8edc068b910
[   35.811660] RBP: ffffb8edc068b9d0 R08: 0000000000000000 R09: ffffb8edc068b808
[   35.812463] R10: ffffffffc02bf040 R11: 0000000000000040 R12: ffffb8edc068bb38
[   35.813235] R13: 0000000000000000 R14: 0000000000000000 R15: ffffb8edc068b910
[   35.814006] FS:  00007f3d0d8556c0(0000) GS:ffff91d1dbc40000(0000)
knlGS:0000000000000000
[   35.814881] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   35.815540] CR2: 000000000043f720 CR3: 0000000019248001 CR4: 00000000001606a0
[   35.816457] Call Trace:
[   35.817158]  tc_ctl_action+0x11a/0x220
[   35.817795]  rtnetlink_rcv_msg+0x23d/0x2e0
[   35.818457]  ? __slab_alloc+0x1c/0x30
[   35.819079]  ? __kmalloc_node_track_caller+0xb1/0x2b0
[   35.819544]  ? rtnl_calcit.isra.30+0xe0/0xe0
[   35.820231]  netlink_rcv_skb+0xce/0x100
[   35.820744]  netlink_unicast+0x164/0x220
[   35.821500]  netlink_sendmsg+0x293/0x370
[   35.822040]  sock_sendmsg+0x30/0x40
[   35.822508]  ___sys_sendmsg+0x2c5/0x2e0
[   35.823149]  ? pagecache_get_page+0x27/0x220
[   35.823714]  ? filemap_fault+0xa2/0x640
[   35.824423]  ? page_add_file_rmap+0x108/0x200
[   35.825065]  ? alloc_set_pte+0x2aa/0x530
[   35.825585]  ? finish_fault+0x4e/0x70
[   35.826140]  ? __handle_mm_fault+0xbc1/0x10d0
[   35.826723]  ? __sys_sendmsg+0x41/0x70
[   35.827230]  __sys_sendmsg+0x41/0x70
[   35.827710]  do_syscall_64+0x68/0x120
[   35.828195]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[   35.828859] RIP: 0033:0x7f3d0ca4da67
[   35.829331] RSP: 002b:00007ffc9f284338 EFLAGS: 00000246 ORIG_RAX:
000000000000002e
[   35.830304] RAX: ffffffffffffffda RBX: 00007ffc9f284460 RCX: 00007f3d0ca4da67
[   35.831247] RDX: 0000000000000000 RSI: 00007ffc9f2843b0 RDI: 0000000000000003
[   35.832167] RBP: 000000005aa6a7a9 R08: 0000000000000001 R09: 0000000000000000
[   35.833075] R10: 00000000000005f1 R11: 0000000000000246 R12: 0000000000000000
[   35.833997] R13: 00007ffc9f2884c0 R14: 0000000000000001 R15: 0000000000674640
[   35.834923] Code: 24 30 bb 01 00 00 00 45 31 f6 eb 5e 8b 50 08 83 c2 07 83 e2
fc 83 c2 70 49 8b 07 48 8b 40 70 48 85 c0 74 10 48 89 14 24 4c 89 ff <ff> d0 48
8b 14 24 48 01 c2 49 01 d6 45 85 ed 74 05 41 83 47 2c
[   35.837442] RIP: tcf_action_init+0x90/0x190 RSP: ffffb8edc068b9a0
[   35.838291] ---[ end trace a095c06ee4b97a26 ]---

Fixes: d0f6dd8a91 ("net/sched: Introduce act_tunnel_key")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-15 14:43:41 -04:00
Hans de Goede e07c99b07a Bluetooth: hci_bcm: Set pulsed_host_wake flag in sleep parameters
The IRQ output of the bcm bt-device is really a level IRQ signal, which
signals a logical high as long as the device's buffer contains data. Since
the draining in the buffer is done in the tty driver, we cannot (easily)
wait in a threaded interrupt handler for the draining, after which the
IRQ should go low again.

So instead we treat the IRQ as an edge interrupt. This opens the window
for a theoretical race where we wakeup, read some data and then autosuspend
*before* the IRQ has gone (logical) low, followed by the device just at
that moment receiving more data, causing the IRQ to stay high and we never
see an edge.

Since we call pm_runtime_mark_last_busy() on every received byte, there
should be plenty time for the IRQ to go (logical) low before we ever
suspend, so this should never happen, but after commit 43fff76834
("Bluetooth: hci_bcm: Streamline runtime PM code"), which has been reverted
since, this was actually happening causing the device to get stuck in
runtime suspend.

The bcm bt-device actually has a workaround for this, if we set the
pulsed_host_wake flag in the sleep parameters, then the device monitors
if the host is draining the buffer and if not then after a timeout the
device will pulse the IRQ line, causing us to see an edge, fixing the
stuck in suspend condition.

This commit sets the pulsed_host_wake flag to fix the (mostly theoretical)
race caused by us treating the IRQ as an edge IRQ.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-03-15 19:39:37 +01:00
Hans de Goede b09c61522c Revert "Bluetooth: hci_bcm: Streamline runtime PM code"
This reverts commit 43fff76834 ("Bluetooth: hci_bcm: Streamline runtime
PM code"). The commit msg for this commit states "No functional change
intended.", but replacing:

 pm_runtime_get();
 pm_runtime_mark_last_busy();
 pm_runtime_put_autosuspend();

with:

 pm_request_resume();

Does result in a functional change, pm_request_resume() only calls
pm_runtime_mark_last_busy() if the device was suspended before the call.

This results in the following happening:

1) Device is runtime suspended
2) Device drives host_wake IRQ logically high as it starts receiving data
3) bcm_host_wake() gets called, causes the device to runtime-resume,
   current time gets marked as last_busy time
4) After 5 seconds the autosuspend timer expires and the dev autosuspends
   as no one has been calling pm_runtime_mark_last_busy(), the device was
   resumed during those 5 seconds, so all the pm_request_resume() calls
   while receiving data and/or bcm_host_wake() calls were nops
5) If 4) happens while the device has (just received) data in its buffer to
   be read by the host the IRQ line is *already* / still logically high
   when we autosuspend and since we use an edge triggered IRQ, the IRQ
   will never trigger, causing the device to get stuck in suspend

Therefor this commit has to be reverted, so that we avoid the device
getting stuck in suspend.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-03-15 19:39:37 +01:00