Commit graph

593813 commits

Author SHA1 Message Date
Linus Torvalds 230e51f211 Merge branch 'core-signals-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core signal updates from Ingo Molnar:
 "These updates from Stas Sergeev and Andy Lutomirski, improve the
  sigaltstack interface by extending its ABI with the SS_AUTODISARM
  feature, which makes it possible to use swapcontext() in a sighandler
  that works on sigaltstack.  Without this flag, the subsequent signal
  will corrupt the state of the switched-away sighandler.

  The inspiration is more robust dosemu signal handling"

* 'core-signals-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  signals/sigaltstack: Change SS_AUTODISARM to (1U << 31)
  signals/sigaltstack: Report current flag bits in sigaltstack()
  selftests/sigaltstack: Fix the sigaltstack test on old kernels
  signals/sigaltstack: If SS_AUTODISARM, bypass on_sig_stack()
  selftests/sigaltstack: Add new testcase for sigaltstack(SS_ONSTACK|SS_AUTODISARM)
  signals/sigaltstack: Implement SS_AUTODISARM flag
  signals/sigaltstack: Prepare to add new SS_xxx flags
  signals/sigaltstack, x86/signals: Unify the x86 sigaltstack check with other architectures
2016-05-16 12:25:25 -07:00
Linus Torvalds a3871bd434 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main changes are:

   - Documentation updates, including fixes to the design-level
     requirements documentation and a fixed version of the design-level
     data-structure documentation.  These fixes include removing
     cartoons and getting rid of the html/htmlx duplication.

   - Further improvements to the new-age expedited grace periods.

   - Miscellaneous fixes.

   - Torture-test changes, including a new rcuperf module for measuring
     RCU grace-period performance and scalability, which is useful for
     the expedited-grace-period changes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
  rcutorture: Add boot-time adjustment of leaf fanout
  rcutorture: Add irqs-disabled test for call_rcu()
  rcutorture: Dump trace buffer upon shutdown
  rcutorture: Don't rebuild identical kernel
  rcutorture: Add OS-jitter capability
  documentation: Add documentation for RCU's major data structures
  rcutorture: Convert test duration to seconds early
  torture: Kill qemu, not parent process
  torture: Clarify refusal to run more than one torture test
  rcutorture: Consider FROZEN hotplug notifier transitions
  rcutorture: Remove redundant initialization to zero
  rcuperf: Do not wake up shutdown wait queue if "shutdown" is false.
  rcutorture: Add largish-system rcuperf scenario
  rcutorture: Avoid RCU CPU stall warning and RT throttling
  rcutorture: Add rcuperf holdoff boot parameter to reduce interference
  rcutorture: Make scripts analyze rcuperf trace data, if present
  rcutorture: Make rcuperf collect expedited event-trace data
  rcutorture: Print measure of batching efficiency
  rcutorture: Set rcuperf writer kthreads to real-time priority
  rcutorture: Bind rcuperf reader/writer kthreads to CPUs
  ...
2016-05-16 12:02:08 -07:00
Linus Torvalds 0052af4411 Merge branch 'core-lib-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core/lib update from Ingo Molnar:
 "This contains a single commit that removes an unused facility that the
  scheduler used to make use of"

* 'core-lib-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lib/proportions: Remove unused code
2016-05-16 11:36:02 -07:00
George Spelvin 0fed3ac866 namei: Improve hash mixing if CONFIG_DCACHE_WORD_ACCESS
The hash mixing between adding the next 64 bits of name
was just a bit weak.

Replaced with a still very fast but slightly more effective
mixing function.

Signed-off-by: George Spelvin <linux@horizon.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-16 11:35:08 -07:00
Daniel Borkmann 7e2c3aea43 net: also make sch_handle_egress() drop monitor ready
Follow-up for 8a3a4c6e7b ("net: make sch_handle_ingress() drop
monitor ready") to also make the egress side drop monitor ready.

Also here only TC_ACT_SHOT is a clear indication that something
went wrong. Hence don't provide false positives to drop monitors
such as 'perf record -e skb:kfree_skb ...'.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 14:02:44 -04:00
Muhammad Falak R Wani 15db6e0dc7 net/hsr: Use setup_timer and mod_timer.
The function setup_timer combines the initialization of a timer with the
initialization of the timer's function and data fields. The mulitiline
code for timer initialization is now replaced with function setup_timer.

Also, quoting the mod_timer() function comment:
-> mod_timer() is a more efficient way to update the expire field of an
   active timer (if the timer is inactive it will be activated).

Use setup_timer() and mod_timer() to setup and arm a timer, making the
code compact and aid readablity.

Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 14:00:43 -04:00
David S. Miller a2a658a303 Merge branch 'qed-next'
Yuval Mintz says:

====================
qed: IOV enhncements and fixups

This is a follow-up on the recent patch series that adds SR-IOV support
to qed. All content here is iov-related fixups [nothing terminal] and
enhancements.

Please consider applying this series to `net-next'.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:59:20 -04:00
Yuval Mintz 416cdf0635 qed: VFs gracefully accept lack of PM
VF's probe might log that it has no PM capability in its PCI configuration
space. As this is a valid configuration, silence such prints.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:59:19 -04:00
Yuval Mintz 83f34bd436 qed: Allow more than 16 VFs
In multi-function modes, PFs are currently limited to using 16 VFs -
But that limitation would also currently apply in case there's a single
PCI function exposed, where no such restriction should have existed.

This lifts the restriction for the default mode; User should be able
to start the maximum number of VFs as appear in the PCI config space.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:59:19 -04:00
Manish Chopra 079d20a673 qed: Reset link on IOV disable
PF updates its VFs' bulletin boards with link configurations whenever
the physical carrier changes or whenever hyper-user explicitly requires
some setting of the VFs link via the hypervisor's PF.

Since the bulletin board is getting cleaned as part of the IOV disable
flow on the PF side, re-enabling sriov would lead to a VF that sees the
carrier as 'down', until an event causing the PF to re-fill the bulletin
with the link configuration would occur.

To fix this we simply refelect the link state during the flows, giving
the later VFs a default reflecting the PFs link state.

Signed-off-by: Manish Chopra <Manish.Chopra@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:59:19 -04:00
Yuval Mintz b2b897eba6 qed: Improve VF interrupt reset
During FLR flow, need to make sure HW is no longer capable of writing to
host memory as part of its interrupt mechanisms.
While we're at it, unify the logic cleaning the driver's status-blocks
into using a single API function for both PFs and VFs.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:59:18 -04:00
Yuval Mintz b0409fa094 qed: Correct PF-sanity check
Seems like something broke in commit 1408cc1fa4 ("qed: Introduce VFs")
and the function no longer verifies that the vf is indeed a valid one.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:59:18 -04:00
Tariq Toukan 2bb07e155b net/mlx4_core: Fix access to uninitialized index
Prevent using uninitialized or negative index when handling
steering entries.

Fixes: b12d93d63c ('mlx4: Add support for promiscuous mode in the new steering model.')
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:58:01 -04:00
David S. Miller 2ffd7e0356 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2016-05-14

Here are two more Bluetooth patches for the 4.7 kernel which we wanted
to get into net-next before the merge window opens. Please let me know
if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:56:37 -04:00
David S. Miller 14d7e48751 Merge branch 'w5100-small-changes'
Akinobu Mita says:

====================
net: w5100: collection of small changes

This patch series is the collection of relatively small changes for
w5100 driver which includes a cleanup with no functional change,
two fixes, and adding a functionality.

* Changes from v1
- Remove the watchdong_timeo assignment to set default tx timeout,
  suggested by David Miller.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:55:49 -04:00
Akinobu Mita c3875ca7d9 net: w5100-spi: add support to specify MAC address by device tree
This adds support to specify the MAC address by 'mac-address' or
'local-mac-address' properties in the device tree.  These are common
properties for the Ethernet controller.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Mike Sinkovsky <msink@permonline.ru>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:55:49 -04:00
Akinobu Mita 7d6da453ef net: w5100: increase TX timeout period
This increases TX timeout period from one second to 5 seconds which is
the default value if the driver doesn't explicitly set
net_device->watchdog_timeo.

The one second timeout is too short for W5100 with SPI interface mode
which doesn't support burst READ/WRITE processing in the SPI transfer.
If the packet is transmitted while RX packets are being received at a
very high rate, the TX transmittion work in the workqueue is delayed
and the watchdog timer is expired.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Mike Sinkovsky <msink@permonline.ru>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:55:49 -04:00
Akinobu Mita d41cd5f7e2 net: w5100: fix MAC filtering for W5500
W5500 has different bit position for MAC filter in Socket n mode
register from W5100 and W5200.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Mike Sinkovsky <msink@permonline.ru>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:55:48 -04:00
Akinobu Mita e9f0cd94c1 net: w5100: remove unused is_w5200()
The is_w5200() function is not used anymore by the commit which adds
the W5500 support.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Mike Sinkovsky <msink@permonline.ru>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:55:48 -04:00
David S. Miller b0456b24ff Merge branch 'lxt-cleanups'
Sergei Shtylyov says:

====================
   Here's the set of 2 patches against DaveM's 'net-next.git' repo. We save
several LoCs on the unneeded local variables....

[1/2] lxt: simplify lxt97[01]_config_intr()
[2/2] lxt: simplify lxt970_config_init()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:53:20 -04:00
Sergei Shtylyov c8396d84c7 lxt: simplify lxt970_config_init()
This function declares the 'err' local variable for no good reason, get rid
of it.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:53:20 -04:00
Sergei Shtylyov 04e6233d57 lxt: simplify lxt97[01]_config_intr()
Both these functions declare the 'err' local variables for no good reason,
get rid of them.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:53:19 -04:00
David S. Miller 485b777855 Merge branch 'bpf-blinding'
Daniel Borkmann says:

====================
BPF updates

This set implements constant blinding for BPF, first couple of
patches are some preparatory cleanups, followed by the blinding.
Please see individual patches for details.

Thanks a lot!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:49:33 -04:00
Daniel Borkmann d93a47f735 bpf, s390: add support for constant blinding
This patch adds recently added constant blinding helpers into the
s390 eBPF JIT. In the bpf_int_jit_compile() path, requirements are
to utilize bpf_jit_blind_constants()/bpf_jit_prog_release_other()
pair for rewriting the program into a blinded one, and to map the
BPF_REG_AX register to a CPU register. The mapping of BPF_REG_AX
is at r12 and similarly like in x86 case performs reloading when
ld_abs/ind is used. When blinding is not used, there's no additional
overhead in the generated image.

When BPF_REG_AX is used, we don't need to emit skb->data reload when
helper function changed skb->data, as this will be reloaded later
on anyway from stack on ld_abs/ind, where skb->data is needed. s390
allows for this w/o much additional complexity unlike f.e. x86.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:49:33 -04:00
Daniel Borkmann 26eb042ee4 bpf, arm64: add support for constant blinding
This patch adds recently added constant blinding helpers into the
arm64 eBPF JIT. In the bpf_int_jit_compile() path, requirements are
to utilize bpf_jit_blind_constants()/bpf_jit_prog_release_other()
pair for rewriting the program into a blinded one, and to map the
BPF_REG_AX register to a CPU register. The mapping is on x9.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Zi Shen Lim <zlim.lnx@gmail.com>
Acked-by: Yang Shi <yang.shi@linaro.org>
Tested-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:49:33 -04:00
Daniel Borkmann 959a757916 bpf, x86: add support for constant blinding
This patch adds recently added constant blinding helpers into the
x86 eBPF JIT. In the bpf_int_jit_compile() path, requirements are
to utilize bpf_jit_blind_constants()/bpf_jit_prog_release_other()
pair for rewriting the program into a blinded one, and to map the
BPF_REG_AX register to a CPU register. The mapping of BPF_REG_AX
is at non-callee saved register r10, and thus shared with cached
skb->data used for ld_abs/ind and not in every program type needed.
When blinding is not used, there's zero additional overhead in the
generated image.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:49:32 -04:00
Daniel Borkmann 4f3446bb80 bpf: add generic constant blinding for use in jits
This work adds a generic facility for use from eBPF JIT compilers
that allows for further hardening of JIT generated images through
blinding constants. In response to the original work on BPF JIT
spraying published by Keegan McAllister [1], most BPF JITs were
changed to make images read-only and start at a randomized offset
in the page, where the rest was filled with trap instructions. We
have this nowadays in x86, arm, arm64 and s390 JIT compilers.
Additionally, later work also made eBPF interpreter images read
only for kernels supporting DEBUG_SET_MODULE_RONX, that is, x86,
arm, arm64 and s390 archs as well currently. This is done by
default for mentioned JITs when JITing is enabled. Furthermore,
we had a generic and configurable constant blinding facility on our
todo for quite some time now to further make spraying harder, and
first implementation since around netconf 2016.

We found that for systems where untrusted users can load cBPF/eBPF
code where JIT is enabled, start offset randomization helps a bit
to make jumps into crafted payload harder, but in case where larger
programs that cross page boundary are injected, we again have some
part of the program opcodes at a page start offset. With improved
guessing and more reliable payload injection, chances can increase
to jump into such payload. Elena Reshetova recently wrote a test
case for it [2, 3]. Moreover, eBPF comes with 64 bit constants, which
can leave some more room for payloads. Note that for all this,
additional bugs in the kernel are still required to make the jump
(and of course to guess right, to not jump into a trap) and naturally
the JIT must be enabled, which is disabled by default.

For helping mitigation, the general idea is to provide an option
bpf_jit_harden that admins can tweak along with bpf_jit_enable, so
that for cases where JIT should be enabled for performance reasons,
the generated image can be further hardened with blinding constants
for unpriviledged users (bpf_jit_harden == 1), with trading off
performance for these, but not for privileged ones. We also added
the option of blinding for all users (bpf_jit_harden == 2), which
is quite helpful for testing f.e. with test_bpf.ko. There are no
further e.g. hardening levels of bpf_jit_harden switch intended,
rationale is to have it dead simple to use as on/off. Since this
functionality would need to be duplicated over and over for JIT
compilers to use, which are already complex enough, we provide a
generic eBPF byte-code level based blinding implementation, which is
then just transparently JITed. JIT compilers need to make only a few
changes to integrate this facility and can be migrated one by one.

This option is for eBPF JITs and will be used in x86, arm64, s390
without too much effort, and soon ppc64 JITs, thus that native eBPF
can be blinded as well as cBPF to eBPF migrations, so that both can
be covered with a single implementation. The rule for JITs is that
bpf_jit_blind_constants() must be called from bpf_int_jit_compile(),
and in case blinding is disabled, we follow normally with JITing the
passed program. In case blinding is enabled and we fail during the
process of blinding itself, we must return with the interpreter.
Similarly, in case the JITing process after the blinding failed, we
return normally to the interpreter with the non-blinded code. Meaning,
interpreter doesn't change in any way and operates on eBPF code as
usual. For doing this pre-JIT blinding step, we need to make use of
a helper/auxiliary register, here BPF_REG_AX. This is strictly internal
to the JIT and not in any way part of the eBPF architecture. Just like
in the same way as JITs internally make use of some helper registers
when emitting code, only that here the helper register is one
abstraction level higher in eBPF bytecode, but nevertheless in JIT
phase. That helper register is needed since f.e. manually written
program can issue loads to all registers of eBPF architecture.

The core concept with the additional register is: blind out all 32
and 64 bit constants by converting BPF_K based instructions into a
small sequence from K_VAL into ((RND ^ K_VAL) ^ RND). Therefore, this
is transformed into: BPF_REG_AX := (RND ^ K_VAL), BPF_REG_AX ^= RND,
and REG <OP> BPF_REG_AX, so actual operation on the target register
is translated from BPF_K into BPF_X one that is operating on
BPF_REG_AX's content. During rewriting phase when blinding, RND is
newly generated via prandom_u32() for each processed instruction.
64 bit loads are split into two 32 bit loads to make translation and
patching not too complex. Only basic thing required by JITs is to
call the helper bpf_jit_blind_constants()/bpf_jit_prog_release_other()
pair, and to map BPF_REG_AX into an unused register.

Small bpf_jit_disasm extract from [2] when applied to x86 JIT:

echo 0 > /proc/sys/net/core/bpf_jit_harden

  ffffffffa034f5e9 + <x>:
  [...]
  39:   mov    $0xa8909090,%eax
  3e:   mov    $0xa8909090,%eax
  43:   mov    $0xa8ff3148,%eax
  48:   mov    $0xa89081b4,%eax
  4d:   mov    $0xa8900bb0,%eax
  52:   mov    $0xa810e0c1,%eax
  57:   mov    $0xa8908eb4,%eax
  5c:   mov    $0xa89020b0,%eax
  [...]

echo 1 > /proc/sys/net/core/bpf_jit_harden

  ffffffffa034f1e5 + <x>:
  [...]
  39:   mov    $0xe1192563,%r10d
  3f:   xor    $0x4989b5f3,%r10d
  46:   mov    %r10d,%eax
  49:   mov    $0xb8296d93,%r10d
  4f:   xor    $0x10b9fd03,%r10d
  56:   mov    %r10d,%eax
  59:   mov    $0x8c381146,%r10d
  5f:   xor    $0x24c7200e,%r10d
  66:   mov    %r10d,%eax
  69:   mov    $0xeb2a830e,%r10d
  6f:   xor    $0x43ba02ba,%r10d
  76:   mov    %r10d,%eax
  79:   mov    $0xd9730af,%r10d
  7f:   xor    $0xa5073b1f,%r10d
  86:   mov    %r10d,%eax
  89:   mov    $0x9a45662b,%r10d
  8f:   xor    $0x325586ea,%r10d
  96:   mov    %r10d,%eax
  [...]

As can be seen, original constants that carry payload are hidden
when enabled, actual operations are transformed from constant-based
to register-based ones, making jumps into constants ineffective.
Above extract/example uses single BPF load instruction over and
over, but of course all instructions with constants are blinded.

Performance wise, JIT with blinding performs a bit slower than just
JIT and faster than interpreter case. This is expected, since we
still get all the performance benefits from JITing and in normal
use-cases not every single instruction needs to be blinded. Summing
up all 296 test cases averaged over multiple runs from test_bpf.ko
suite, interpreter was 55% slower than JIT only and JIT with blinding
was 8% slower than JIT only. Since there are also some extremes in
the test suite, I expect for ordinary workloads that the performance
for the JIT with blinding case is even closer to JIT only case,
f.e. nmap test case from suite has averaged timings in ns 29 (JIT),
35 (+ blinding), and 151 (interpreter).

BPF test suite, seccomp test suite, eBPF sample code and various
bigger networking eBPF programs have been tested with this and were
running fine. For testing purposes, I also adapted interpreter and
redirected blinded eBPF image to interpreter and also here all tests
pass.

  [1] http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html
  [2] https://github.com/01org/jit-spray-poc-for-ksp/
  [3] http://www.openwall.com/lists/kernel-hardening/2016/05/03/5

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Elena Reshetova <elena.reshetova@intel.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:49:32 -04:00
Daniel Borkmann d1c55ab5e4 bpf: prepare bpf_int_jit_compile/bpf_prog_select_runtime apis
Since the blinding is strictly only called from inside eBPF JITs,
we need to change signatures for bpf_int_jit_compile() and
bpf_prog_select_runtime() first in order to prepare that the
eBPF program we're dealing with can change underneath. Hence,
for call sites, we need to return the latest prog. No functional
change in this patch.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:49:32 -04:00
Daniel Borkmann c237ee5eb3 bpf: add bpf_patch_insn_single helper
Move the functionality to patch instructions out of the verifier
code and into the core as the new bpf_patch_insn_single() helper
will be needed later on for blinding as well. No changes in
functionality.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:49:32 -04:00
Daniel Borkmann 93a73d442d bpf, x86/arm64: remove useless checks on prog
There is never such a situation, where bpf_int_jit_compile() is
called with either prog as NULL or len as 0, so the tests are
unnecessary and confusing as people would just copy them. s390
doesn't have them, so no change is needed there.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:49:32 -04:00
Daniel Borkmann 6077776b59 bpf: split HAVE_BPF_JIT into cBPF and eBPF variant
Split the HAVE_BPF_JIT into two for distinguishing cBPF and eBPF JITs.

Current cBPF ones:

  # git grep -n HAVE_CBPF_JIT arch/
  arch/arm/Kconfig:44:    select HAVE_CBPF_JIT
  arch/mips/Kconfig:18:   select HAVE_CBPF_JIT if !CPU_MICROMIPS
  arch/powerpc/Kconfig:129:       select HAVE_CBPF_JIT
  arch/sparc/Kconfig:35:  select HAVE_CBPF_JIT

Current eBPF ones:

  # git grep -n HAVE_EBPF_JIT arch/
  arch/arm64/Kconfig:61:  select HAVE_EBPF_JIT
  arch/s390/Kconfig:126:  select HAVE_EBPF_JIT if PACK_STACK && HAVE_MARCH_Z196_FEATURES
  arch/x86/Kconfig:94:    select HAVE_EBPF_JIT                    if X86_64

Later code also needs this facility to check for eBPF JITs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:49:31 -04:00
Daniel Borkmann c94987e40e bpf: move bpf_jit_enable declaration
Move the bpf_jit_enable declaration to the filter.h file where
most other core code is declared, also since we're going to add
a second knob there.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:49:31 -04:00
Daniel Borkmann 4936e3528e bpf: minor cleanups in ebpf code
Besides others, remove redundant comments where the code is self
documenting enough, and properly indent various bpf_verifier_ops
and bpf_prog_type_list declarations. Moreover, remove two exports
that actually have no module user.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:49:31 -04:00
Vivien Didelot 553eb54444 net: dsa: mv88e6xxx: remove bridge work
Now that the bridge code defers the switchdev port state setting, there
is no need to defer the port STP state change within the mv88e6xxx code.
Thus get rid of the driver's bridge work code.

This also fixes a race condition where the DSA layer assumes that the
bridge code already set the unbridged port's STP state to Disabled
before restoring the Forwarding state.

As a consequence, this also fixes the FDB flush for the unbridged port
which now correctly occurs during the Forwarding to Disabled transition.

Fixes: 0bc05d585d ("switchdev: allow caller to explicitly request attr_set as deferred")
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:46:24 -04:00
David Ahern b0e95ccdd7 net: vrf: protect changes to private data with rcu
One cpu can be processing packets which includes using the cached route
entries in the vrf device's private data and on another cpu the device
gets deleted which releases the routes and sets the pointers in net_vrf
to NULL. This results in datapath dereferencing a NULL pointer.

Fix by protecting access to dst's with rcu.

Fixes: 193125dbd8 ("net: Introduce VRF device driver")
Fixes: 35402e3136 ("net: Add IPv6 support to VRF device")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:46:24 -04:00
Eric Dumazet ea1627c20c tcp: minor optimizations around tcp_hdr() usage
tcp_hdr() is slightly more expensive than using skb->data in contexts
where we know they point to the same byte.

In receive path, tcp_v4_rcv() and tcp_v6_rcv() are in this situation,
as tcp header has not been pulled yet.

In output path, the same can be said when we just pushed the tcp header
in the skb, in tcp_transmit_skb() and tcp_make_synack()

Also factorize the two checks for tcb->tcp_flags & TCPHDR_SYN in
tcp_transmit_skb() and pass tcp header pointer to tcp_ecn_send(),
so that compiler can further optimize and avoid a reload.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:46:23 -04:00
Nicolas Dichtel 5022524308 netlink: kill nla_put_u64()
This function is not used anymore. nla_put_u64_64bit() should be used
instead.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:46:23 -04:00
Eric Dumazet 2632616bc4 sock: propagate __sock_cmsg_send() error
__sock_cmsg_send() might return different error codes, not only -EINVAL.

Fixes: 24025c465f ("ipv4: process socket-level control messages in IPv4")
Fixes: ad1e46a837 ("ipv6: process socket-level control messages in IPv6")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:46:23 -04:00
Arnd Bergmann a986a05de9 net: qrtr: fix build problems
Having multiple loadable modules with the same name cannot work
with modprobe, and having both net/qrtr/smd.ko and drivers/soc/qcom/smd.ko
results in a (somewhat cryptic) build error:

ERROR: "qcom_smd_driver_unregister" [net/qrtr/smd.ko] undefined!
ERROR: "qcom_smd_driver_register" [net/qrtr/smd.ko] undefined!
ERROR: "qcom_smd_set_drvdata" [net/qrtr/smd.ko] undefined!
ERROR: "qcom_smd_send" [net/qrtr/smd.ko] undefined!
ERROR: "qcom_smd_get_drvdata" [net/qrtr/smd.ko] undefined!
ERROR: "qcom_smd_driver_unregister" [drivers/soc/qcom/wcnss_ctrl.ko] undefined!
ERROR: "qcom_smd_driver_register" [drivers/soc/qcom/wcnss_ctrl.ko] undefined!
ERROR: "qcom_smd_set_drvdata" [drivers/soc/qcom/wcnss_ctrl.ko] undefined!
ERROR: "qcom_smd_send" [drivers/soc/qcom/wcnss_ctrl.ko] undefined!
ERROR: "qcom_smd_get_drvdata" [drivers/soc/qcom/wcnss_ctrl.ko] undefined!

Also, the qrtr driver uses the SMD interface and has a Kconfig dependency,
but also allows for compile-testing when SMD is disabled. However, if
with QCOM_SMD=m and COMPILE_TEST=y we can end up with QRTR_SMD=y and
that fails with a related link error.

The changes the dependency so we can still compile-test the driver but
not have it built-in if SMD is a module, to avoid running in the broken
configuration, and changes the Makefile to provide the driver under
a different module name.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: bdabad3e36 ("net: Add Qualcomm IPC router")
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:44:58 -04:00
David S. Miller 148bd3a34b Merge branch 'tc_flower_offload'
Amir Vadai says:

====================
sched,mlx5: Offloaded TC flower filter statistics

This patchset introduces counters support for offloaded cls_flower filters.
When the user calls 'tc show -s ..', fl_dump is called.
Before fl_dump() returns the statistics, it calls the NIC driver (using a new
ndo_setup_tc() command - TC_CLSFLOWER_STATS) to read the hardware counters and
update the statistics accordingly. A new TC action op was added (stats_update())
to be used by the NIC driver to update the statistics.

Patchset was applied and tested over commit ed7cbbc ("udp: Resolve NULL pointer
dereference over flow-based vxlan device")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:43:52 -04:00
Amir Vadai aad7e08d39 net/mlx5e: Hardware offloaded flower filter statistics support
Introduce support in updating statistics of offloaded TC flower
classifiers. Currently only the DROP action is supported.

Signed-off-by: Amir Vadai <amirva@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:43:51 -04:00
Amir Vadai 43a335e055 net/mlx5_core: Flow counters infrastructure
If a counter has the aging flag set when created, it is added to a list
of counters that will be queried periodically from a workqueue.  query
result and last use timestamp are cached.
add/del counter must be very efficient since thousands of such
operations might be issued in a second.
There is only a single reference to counters without aging, therefore
no need for locks.
But, counters with aging enabled are stored in a list. In order to make
code as lockless as possible, all the list manipulation and access to
hardware is done from a single context - the periodic counters query
thread.

The hardware supports multiple counters per FTE, however currently we
are using one counter for each FTE.

Signed-off-by: Amir Vadai <amirva@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:43:51 -04:00
Amir Vadai bd5251dbf1 net/mlx5_core: Introduce flow steering destination of type counter
When adding a flow steering rule with a counter, need to supply a
destination of type MLX5_FLOW_DESTINATION_TYPE_COUNTER, with a pointer
to a struct mlx5_fc.
Also, MLX5_FLOW_CONTEXT_ACTION_COUNT bit should be set in the action.

Signed-off-by: Amir Vadai <amirva@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:43:51 -04:00
Amir Vadai 9dc0b289c4 net/mlx5_core: Firmware commands to support flow counters
Getting packet/byte statistics on flows is done through flow counters.
Implement the firmware commands to alloc, free and query flow counters.

Signed-off-by: Amir Vadai <amirva@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:43:51 -04:00
Amir Vadai 42ca502e17 net/mlx5_core: Use a macro in mlx5_command_str()
Use a macro instead of copying the OP name.

Signed-off-by: Amir Vadai <amirva@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:43:51 -04:00
Amir Vadai 10cbc68434 net/sched: cls_flower: Hardware offloaded filters statistics support
Introduce a new command in ndo_setup_tc() for hardware offloaded
filters, to call the NIC driver, and make it update the statistics.
This will be done before dumping the filter and its statistics.

Signed-off-by: Amir Vadai <amirva@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:43:50 -04:00
Amir Vadai 9fea47d93b net/sched: act_gact: Update statistics when offloaded to hardware
Implement the stats_update callback that will be called by NIC drivers
for hardware offloaded filters.

Signed-off-by: Amir Vadai <amirva@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:43:50 -04:00
Amir Vadai 3804070235 net/sched: Enable netdev drivers to update statistics of offloaded actions
Introduce stats_update callback. netdev driver could call it for offloaded
actions to update the basic statistics (packets, bytes and last use).
Since bstats_update() and bstats_cpu_update() use skb as an argument to
get the counters, _bstats_update() and _bstats_cpu_update(), that get
bytes and packets as arguments, were added.

Signed-off-by: Amir Vadai <amirva@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:43:50 -04:00
David S. Miller 388665a9be Merge branch 'pxa168_eth-perf'
Jisheng Zhang says:

====================
net: pxa168_eth: improve performance

This series is to improve the pxa168_eth driver performance by using
{readl|writel}_relaxed or appropriate memory barriers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:39:50 -04:00
Jisheng Zhang b17d15592d net: pxa168_eth: Use dma_wmb/rmb where appropriate
Update the pxa168_eth driver to use the dma_rmb/wmb calls instead of the
full barriers in order to improve performance: reduced 97ns/39ns on
average in tx/rx path on Marvell BG4CT platform.

Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:39:49 -04:00