Commit graph

637281 commits

Author SHA1 Message Date
stephen hemminger a50af86dd4 netvsc: reduce maximum GSO size
Hyper-V (and Azure) support using NVGRE which requires some extra space
for encapsulation headers. Because of this the largest allowed TSO
packet is reduced.

For older releases, hard code a fixed reduced value.  For next release,
there is a better solution which uses result of host offload
negotiation.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 13:13:41 -05:00
Alex 74685b08fb drivers: net: cpsw-phy-sel: Clear RGMII_IDMODE on "rgmii" links
Support for setting the RGMII_IDMODE bit was added in the commit
referenced below. However, that commit did not add the symmetrical
clearing of the bit by way of setting it in "mask". Add it here.

Note that the documentation marks clearing this bit as "reserved",
however, according to TI, support for delaying the clock does exist in
the MAC, although it is not officially supported.
We tested this on a board with an RGMII to RGMII link that will not
work unless this bit is cleared.

Fixes: 0fb26c3063 ("drivers: net: cpsw-phy-sel: add support to configure rgmii internal delay")
Signed-off-by: Alexandru Gagniuc <alex.g@adaptrum.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 13:12:17 -05:00
Niklas Cassel 90364feaf7 net: stmmac: do not call phy_ethtool_ksettings_set from atomic context
>From what I can tell, spin_lock(&priv->lock) is not needed, since the
phy_ethtool_ksettings_set call is not given the priv struct.

phy_start_aneg takes the phydev->lock. Calls to phy_adjust_link
from phy_state_machine also takes the phydev->lock.

[   13.718319] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:97
[   13.726717] in_atomic(): 1, irqs_disabled(): 0, pid: 1307, name: ethtool
[   13.742115] Hardware name: Axis ARTPEC-6 Platform
[   13.746829] [<80110568>] (unwind_backtrace) from [<8010c2bc>] (show_stack+0x18/0x1c)
[   13.754575] [<8010c2bc>] (show_stack) from [<80433484>] (dump_stack+0x80/0xa0)
[   13.761801] [<80433484>] (dump_stack) from [<80145428>] (___might_sleep+0x108/0x170)
[   13.769554] [<80145428>] (___might_sleep) from [<806c9b50>] (mutex_lock+0x24/0x44)
[   13.777128] [<806c9b50>] (mutex_lock) from [<8050cbc0>] (phy_start_aneg+0x1c/0x13c)
[   13.784783] [<8050cbc0>] (phy_start_aneg) from [<8050d338>] (phy_ethtool_ksettings_set+0x98/0xd0)
[   13.793656] [<8050d338>] (phy_ethtool_ksettings_set) from [<80517adc>] (stmmac_ethtool_set_link_ksettings+0xa0/0xb4)
[   13.804184] [<80517adc>] (stmmac_ethtool_set_link_ksettings) from [<805c5138>] (ethtool_set_settings+0xd4/0x13c)
[   13.814358] [<805c5138>] (ethtool_set_settings) from [<805c9718>] (dev_ethtool+0x13c4/0x211c)
[   13.822882] [<805c9718>] (dev_ethtool) from [<805dc7c0>] (dev_ioctl+0x480/0x8e0)
[   13.830291] [<805dc7c0>] (dev_ioctl) from [<80260e34>] (do_vfs_ioctl+0x94/0xa00)
[   13.837699] [<80260e34>] (do_vfs_ioctl) from [<802617dc>] (SyS_ioctl+0x3c/0x60)
[   13.845011] [<802617dc>] (SyS_ioctl) from [<801088bc>] (__sys_trace_return+0x0/0x10)

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 13:08:23 -05:00
David S. Miller 233900d885 linux-can-fixes-for-4.9-20161207
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEES2FAuYbJvAGobdVQPTuqJaypJWoFAlhH2rkTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRA9O6olrKklasABCAClCaAGYAOuMLWXJ9mL1K7o2u+s6M93
 bSZQfke8ffFv0vNa35HLH8vXCh3QBl8BEZkN0XuXQwRSiewsWQz/W5hMZ9UcfgdR
 lml6v/jaQ0nai5UrdSHJqXwm21e3M/ELFn5dE0esp1p7lSITRAbTPSyN2kNhiGyv
 HLMXFPopCGEbInyo+5QA17xTPGUdMNE6J8rnRPKnMzbLgyRgZ3Jwy0furKAbzTY7
 4eFiERuBErRB/tD3+oxFICcB5zyvqawnXZp+gsg5o8vm7DGZU4/WXbxOQOgwx1fK
 Nw4/9FXPxUxXB925JMIn/bxjNi0OK9M96Q7bMpoa8Ik/CCkV/zKdKxIt
 =0RcO
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-4.9-20161207' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2016-12-07

Andrey Konovalov triggered a warning in the CAN RAW layer, which is
fixed by a patch by me.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 12:07:09 -05:00
Linus Torvalds ce779d6b5b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fix from Miklos Szeredi:
 "Fix a regression spotted by Jeff Layton"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: fix clearing suid, sgid for chown()
2016-12-07 08:45:23 -08:00
Linus Torvalds f27c2f69cc Revert "default exported asm symbols to zero"
This reverts commit 8ab2ae655b.

I loved that commit because of how it explained what the problem with
newer versions of binutils were, but the actual patch itself turns out
to not work very well.

It has two problems:

 - a zero CRC value isn't actually right.  It happens to work for the
   case where both sides of the equation fail at giving the symbol a
   crc, but there are cases where the users of the exported symbol get
   the right crc (due to seeing the C declarations), but the actual
   exporting itself does not (due to the whole weak asm symbol issue).

   So then the module load fails after all - we did have a crc for the
   symbol, but we couldn't match it with the loaded module.

 - it seems that the alpha assembler has special semantics for the
   '.set' directive, and on alpha it doesn't actually set the value of
   the specified symbol at all, it is instead used to set various
   assembly modes (eg ".set noat" and ".set noreorder").

   So using ".set" to set the symbol value would just cause build
   failures on alpha.

I'm sure we'll find some other workaround for these issues (hopefully
that involves getting rid of modversions entirely some day, but people
are also talking about just using smarter tools).  But for now we'll
just fall back on commit faaae2a581 ("Re-enable CONFIG_MODVERSIONS in
a slightly weaker form") that just let's a missing crc through.

Reported-by: Jan Stancek <jstancek@redhat.com>
Reported-by: Philip Müller <philm@manjaro.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-07 08:39:00 -08:00
Linus Torvalds a0ac402cfc Don't feed anything but regular iovec's to blk_rq_map_user_iov
In theory we could map other things, but there's a reason that function
is called "user_iov".  Using anything else (like splice can do) just
confuses it.

Reported-and-tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-07 08:23:35 -08:00
Alex Deucher faefba95c9 drm/amdgpu: just suspend the hw on pci shutdown
We can't just reuse pci_remove as there may be userspace still
doing things.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98638
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97980
Reviewed-by: Christian König <christian.koenig@amd.com>
Reported-and-tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2016-12-07 11:17:21 -05:00
David S. Miller 64e8de58c5 Merge branch 'ti-cpts-update-and-fixes'
Grygorii Strashko says:

====================
net: ethernet: ti: cpts: update and fixes

It is preparation series intended to clean up and optimize TI CPTS driver to
facilitate further integration with other TI's SoCs like Keystone 2.

Changes in v5:
- fixed copy paste error in cpts_release
- reworked cc.mult/shift and cc_mult initialization

Changes in v4:
- fixed build error in patch
  "net: ethernet: ti: cpts: clean up event list if event pool is empty"
- rebased on top of net-next

Changes in v3:
- patches reordered: fixes and small updates moved first
- added comments in code about cpts->cc_mult
- conversation range (maxsec) limited to 10sec

Changes in v2:
- patch "net: ethernet: ti: cpts: rework initialization/deinitialization"
  was split on 4 patches
- applied comments from Richard Cochran
- dropped patch
  "net: ethernet: ti: cpts: add return value to tx and rx timestamp funcitons"
- new patches added:
  "net: ethernet: ti: cpts: drop excessive writes to CTRL and INT_EN regs"
  and "clocksource: export the clocks_calc_mult_shift to use by timestamp code"

Links on prev versions:
v4: https://lkml.org/lkml/2016/12/6/496
v3: https://www.spinics.net/lists/devicetree/msg153474.html
v2: http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1282034.html
v1: http://www.spinics.net/lists/linux-omap/msg131925.html
====================

Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 11:13:49 -05:00
Grygorii Strashko 20138cf9ef net: ethernet: ti: cpts: fix overflow check period
The CPTS drivers uses 8sec period for overflow checking with
assumption that CPTS retclk will not exceed 500MHz. But that's not
true on some TI platforms (Kesytone 2). As result, it is possible that
CPTS counter will overflow more than once between two readings.

Hence, fix it by selecting overflow check period dynamically as
max_sec_before_overflow/2, where
 max_sec_before_overflow = max_counter_val / rftclk_freq.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 11:13:48 -05:00
Grygorii Strashko 88f0f0b0be net: ethernet: ti: cpts: calc mult and shift from refclk freq
The cyclecounter mult and shift values can be calculated based on the
CPTS rfclk frequency and timekeepnig framework provides required algos
and API's.

Hence, calc mult and shift basing on CPTS rfclk frequency if both
cpts_clock_shift and cpts_clock_mult properties are not provided in DT (the
basis of calculation algorithm is borrowed from
__clocksource_update_freq_scale() commit 7d2f944a2b ("clocksource:
Provide a generic mult/shift factor calculation")). After this change
cpts_clock_shift and cpts_clock_mult DT properties will become optional.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 11:13:48 -05:00
Murali Karicheri 5304121ada clocksource: export the clocks_calc_mult_shift to use by timestamp code
The CPSW CPTS driver is capable of doing timestamping on tx/rx packets and
requires to know mult and shift factors for timestamp conversion from raw
value to nanoseconds (ptp clock). Now these mult and shift factors are
calculated manually and provided through DT, which makes very hard to
support of a lot number of platforms, especially if CPTS refclk is not the
same for some kind of boards and depends on efuse settings (Keystone 2
platforms). Hence, export clocks_calc_mult_shift() to allow drivers like
CPSW CPTS (and other ptp drivesr) to benefit from automaitc calculation of
mult and shift factors.

Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 11:13:48 -05:00
Grygorii Strashko 4a88fb9565 net: ethernet: ti: cpts: move dt props parsing to cpts driver
Move DT properties parsing into CPTS driver to simplify CPSW
code and CPTS driver porting on other SoC in the future
(like Keystone 2) - with this change it will not be required
to add the same DT parsing code in Keystone 2 NETCP driver.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 11:13:48 -05:00
Grygorii Strashko 8a2c9a5ab4 net: ethernet: ti: cpts: rework initialization/deinitialization
The current implementation CPTS initialization and deinitialization
(represented by cpts_register/unregister()) does too many static
initialization from .ndo_open(), which is reasonable to do once at probe
time instead, and also require caller to allocate memory for struct cpts,
which is internal for CPTS driver in general.

This patch splits CPTS initialization and deinitialization on two parts:

- static initializtion cpts_create()/cpts_release() which expected to be
executed when parent driver is probed/removed;

- dynamic part cpts_register/unregister() which expected to be executed
when network device is opened/closed.

As result, current code of CPTS parent driver - CPSW - will be simplified
(and it also will allow simplify adding support for Keystone 2 devices in
the future), plus more initialization errors will be catched earlier. In
addition, this change allows to clean up cpts.h for the case when CPTS is
disabled.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 11:13:47 -05:00
Grygorii Strashko 2a79df3ee9 net: ethernet: ti: cpts: drop excessive writes to CTRL and INT_EN regs
CPTS module and IRQs are always enabled when CPTS is registered,
before starting overflow check work, and disabled during
deregistration, when overflow check work has been canceled already.
So, It doesn't require to (re)enable CPTS module and IRQs in
cpts_overflow_check().

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 11:13:47 -05:00
WingMan Kwok e4439fa838 net: ethernet: ti: cpts: clean up event list if event pool is empty
When a CPTS user does not exit gracefully by disabling cpts
timestamping and leaving a joined multicast group, the system
continues to receive and timestamps the ptp packets which eventually
occupy all the event list entries.  When this happns, the added code
tries to remove some list entries which are expired.

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 11:13:47 -05:00
Grygorii Strashko 8fcd68914e net: ethernet: ti: cpts: disable cpts when unregistered
The cpts now is left enabled after unregistration.
Hence, disable it in cpts_unregister().

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 11:13:47 -05:00
Grygorii Strashko 6c691405bc net: ethernet: ti: cpts: fix registration order
The ptp clock registered before spinlock, which is protecting it, and
before timecounter and cyclecounter initialization in cpts_register().

So, ensure that ptp clock is registered the last, after everything
else is done.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 11:13:46 -05:00
Grygorii Strashko fd123a9414 net: ethernet: ti: cpts: fix unbalanced clk api usage in cpts_register/unregister
There are two issues with TI CPTS code which are reproducible when TI
CPSW ethX device passes few up/down iterations:
- cpts refclk prepare counter continuously incremented after each
up/down iteration;
- devm_clk_get(dev, "cpts") is called many times.

Hence, fix these issues by using clk_disable_unprepare() in
cpts_clk_release() and skipping devm_clk_get() if cpts refclk has been
acquired already.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 11:13:46 -05:00
Grygorii Strashko b63ba58ee9 net: ethernet: ti: cpsw: minimize direct access to struct cpts
This will provide more flexibility in changing CPTS internals and also
required for further changes.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 11:13:46 -05:00
Grygorii Strashko c8395d4e1d net: ethernet: ti: allow cpts to be built separately
TI CPTS IP is used as part of TI OMAP CPSW driver, but it's also
present as part of NETCP on TI Keystone 2 SoCs. So, It's required
to enable build of CPTS for both this drivers and this can be
achieved by allowing CPTS to be built separately.

Hence, allow cpts to be built separately and convert it to be
a module as both CPSW and NETCP drives can be built as modules.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 11:13:46 -05:00
Grygorii Strashko 391fd6caf5 net: ethernet: ti: cpts: switch to readl/writel_relaxed()
Switch to readl/writel_relaxed() APIs, because this is recommended
API and the CPTS IP is reused on Keystone 2 SoCs
where LE/BE modes are supported.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 11:13:45 -05:00
David S. Miller d3243aef98 Merge branch 'bnxt_en-RDMA'
Michael Chan says:

====================
bnxt_en: Add interface to support RDMA driver.

This series adds an interface to support a brand new RDMA driver bnxt_re.
The first step is to re-arrange some code so that pci_enable_msix() can
be called during pci probe.  The purpose is to allow the RDMA driver to
initialize and stay initialized whether the netdev is up or down.

Then we make some changes to VF resource allocation so that there is
enough resources to support RDMA.

Finally the last patch adds a simple interface to allow the RDMA driver to
probe and register itself with any bnxt_en devices that support RDMA.
Once registered, the RDMA driver can request MSIX, send fw messages, and
receive some notifications.

v2: Fixed kbuild test robot warnings.

David, please consider this series for net-next.  Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 10:59:27 -05:00
Michael Chan a588e4580a bnxt_en: Add interface to support RDMA driver.
Since the network driver and RDMA driver operate on the same PCI function,
we need to create an interface to allow the RDMA driver to share resources
with the network driver.

1. Create a new bnxt_en_dev struct which will be returned by
bnxt_ulp_probe() upon success.  After that, all calls from the RDMA driver
to bnxt_en will pass a pointer to this struct.

2. This struct contains additional function pointers to register, request
msix, send fw messages, register for async events.

3. If the RDMA driver wants to enable RDMA on the function, it needs to
call the function pointer bnxt_register_device().  A ulp_ops structure
is passed for RCU protected upcalls from bnxt_en to the RDMA driver.

4. The RDMA driver can call firmware APIs using the bnxt_send_fw_msg()
function pointer.

5. 1 stats context is reserved when the RDMA driver registers.  MSIX
and completion rings are reserved when the RDMA driver calls
bnxt_request_msix() function pointer.

6. When the RDMA driver calls bnxt_unregister_device(), all RDMA resources
will be cleaned up.

v2: Fixed 2 uninitialized variable warnings.

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 10:59:26 -05:00
Michael Chan a1653b13f1 bnxt_en: Refactor the driver registration function with firmware.
The driver register function with firmware consists of passing version
information and registering for async events.  To support the RDMA driver,
the async events that we need to register may change.  Separate the
driver register function into 2 parts so that we can just update the
async events for the RDMA driver.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 10:59:26 -05:00
Michael Chan e4060d306b bnxt_en: Reserve RDMA resources by default.
If the device supports RDMA, we'll setup network default rings so that
there are enough minimum resources for RDMA, if possible.  However, the
user can still increase network rings to the max if he wants.  The actual
RDMA resources won't be reserved until the RDMA driver registers.

v2: Fix compile warning when BNXT_CONFIG_SRIOV is not set.

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 10:59:26 -05:00
Michael Chan 7b08f661ab bnxt_en: Improve completion ring allocation for VFs.
All available remaining completion rings not used by the PF should be
made available for the VFs so that there are enough rings in the VF to
support RDMA.  The earlier workaround code of capping the rings by the
statistics context is removed.

When SRIOV is disabled, call a new function bnxt_restore_pf_fw_resources()
to restore FW resources.  Later on we need to add some logic to account
for RDMA resources.

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 10:59:26 -05:00
Michael Chan aa8ed021ab bnxt_en: Move function reset to bnxt_init_one().
Now that MSIX is enabled in bnxt_init_one(), resources may be allocated by
the RDMA driver before the network device is opened.  So we cannot do
function reset in bnxt_open() which will clear all the resources.

The proper place to do function reset now is in bnxt_init_one().
If we get AER, we'll do function reset as well.

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 10:59:25 -05:00
Michael Chan 7809592d3e bnxt_en: Enable MSIX early in bnxt_init_one().
To better support the new RDMA driver, we need to move pci_enable_msix()
from bnxt_open() to bnxt_init_one().  This way, MSIX vectors are available
to the RDMA driver whether the network device is up or down.

Part of the existing bnxt_setup_int_mode() function is now refactored into
a new bnxt_init_int_mode().  bnxt_init_int_mode() is called during
bnxt_init_one() to enable MSIX.  The remaining logic in
bnxt_setup_int_mode() to map the IRQs to the completion rings is called
during bnxt_open().

v2: Fixed compile warning when CONFIG_BNXT_SRIOV is not set.

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 10:59:25 -05:00
Michael Chan 33c2657eb6 bnxt_en: Add bnxt_set_max_func_irqs().
By refactoring existing code into this new function.  The new function
will be used in subsequent patches.

v2: Fixed compile warning when CONFIG_BNXT_SRIOV is not set.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 10:59:25 -05:00
Eric Dumazet 5b8e2f61b9 net: sock_rps_record_flow() is for connected sockets
Paolo noticed a cache line miss in UDP recvmsg() to access
sk_rxhash, sharing a cache line with sk_drops.

sk_drops might be heavily incremented by cpus handling a flood targeting
this socket.

We might place sk_drops on a separate cache line, but lets try
to avoid wasting 64 bytes per socket just for this, since we have
other bottlenecks to take care of.

sock_rps_record_flow() should only access sk_rxhash for connected
flows.

Testing sk_state for TCP_ESTABLISHED covers most of the cases for
connected sockets, for a zero cost, since system calls using
sock_rps_record_flow() also access sk->sk_prot which is on the
same cache line.

A follow up patch will provide a static_key (Jump Label) since most
hosts do not even use RFS.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 10:47:35 -05:00
Pablo Neira Ayuso 73c25fb139 netfilter: nft_quota: allow to restore consumed quota
Allow to restore consumed quota, this is useful to restore the quota
state across reboots.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 14:40:53 +01:00
Willem de Bruijn 2c16d60332 netfilter: xt_bpf: support ebpf
Add support for attaching an eBPF object by file descriptor.

The iptables binary can be called with a path to an elf object or a
pinned bpf object. Also pass the mode and path to the kernel to be
able to return it later for iptables dump and save.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 13:32:35 +01:00
Marcelo Ricardo Leitner 5bad87348c netfilter: x_tables: avoid warn and OOM killer on vmalloc call
Andrey Konovalov reported that this vmalloc call is based on an
userspace request and that it's spewing traces, which may flood the logs
and cause DoS if abused.

Florian Westphal also mentioned that this call should not trigger OOM
killer.

This patch brings the vmalloc call in sync to kmalloc and disables the
warn trace on allocation failure and also disable OOM killer invocation.

Note, however, that under such stress situation, other places may
trigger OOM killer invocation.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 13:31:41 +01:00
Pablo Neira Ayuso 8411b6442e netfilter: nf_tables: support for set flushing
This patch adds support for set flushing, that consists of walking over
the set elements if the NFTA_SET_ELEM_LIST_ELEMENTS attribute is set.
This patch requires the following changes:

1) Add set->ops->deactivate_one() operation: This allows us to
   deactivate an element from the set element walk path, given we can
   skip the lookup that happens in ->deactivate().

2) Add a new nft_trans_alloc_gfp() function since we need to allocate
   transactions using GFP_ATOMIC given the set walk path happens with
   held rcu_read_lock.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 13:31:40 +01:00
Pablo Neira Ayuso 37df5301a3 netfilter: nft_set: introduce nft_{hash, rbtree}_deactivate_one()
This new function allows us to deactivate one single element, this is
required by the set flush command that comes in a follow up patch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 13:31:02 +01:00
Pablo Neira Ayuso 1a37ef769d netfilter: nf_tables: constify struct nft_ctx * parameter in nft_trans_alloc()
Context is not modified by nft_trans_alloc(), so constify it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 13:22:51 +01:00
Davide Caratti 3189a290f9 netfilter: nat: skip checksum on offload SCTP packets
SCTP GSO and hardware can do CRC32c computation after netfilter processing,
so we can avoid calling sctp_compute_checksum() on skb if skb->ip_summed
is equal to CHECKSUM_PARTIAL. Moreover, set skb->ip_summed to CHECKSUM_NONE
when the NAT code computes the CRC, to prevent offloaders from computing
it again (on ixgbe this resulted in a transmission with wrong L4 checksum).

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 13:22:50 +01:00
Liping Zhang 3b760dcb0f netfilter: rpfilter: bypass ipv4 lbcast packets with zeronet source
Otherwise, DHCP Discover packets(0.0.0.0->255.255.255.255) may be
dropped incorrectly.

Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 13:22:50 +01:00
Pablo Neira Ayuso a9fea2a3c3 netfilter: nf_tables: allow to filter stateful object dumps by type
This patch adds the netlink code to filter out dump of stateful objects,
through the NFTA_OBJ_TYPE netlink attribute.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 13:22:49 +01:00
Pablo Neira Ayuso 63aea29060 netfilter: nft_objref: support for stateful object maps
This patch allows us to refer to stateful object dictionaries, the
source register indicates the key data to be used to look up for the
corresponding state object. We can refer to these maps through names or,
alternatively, the map transaction id. This allows us to refer to both
anonymous and named maps.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 13:22:48 +01:00
Pablo Neira Ayuso 8aeff920dc netfilter: nf_tables: add stateful object reference to set elements
This patch allows you to refer to stateful objects from set elements.
This provides the infrastructure to create maps where the right hand
side of the mapping is a stateful object.

This allows us to build dictionaries of stateful objects, that you can
use to perform fast lookups using any arbitrary key combination.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 13:22:47 +01:00
Pablo Neira Ayuso 1896531710 netfilter: nft_quota: add depleted flag for objects
Notify on depleted quota objects. The NFT_QUOTA_F_DEPLETED flag
indicates we have reached overquota.

Add pointer to table from nft_object, so we can use it when sending the
depletion notification to userspace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 13:22:12 +01:00
Pablo Neira Ayuso 2599e98934 netfilter: nf_tables: notify internal updates of stateful objects
Introduce nf_tables_obj_notify() to notify internal state changes in
stateful objects. This is used by the quota object to report depletion
in a follow up patch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 12:57:20 +01:00
Pablo Neira Ayuso 43da04a593 netfilter: nf_tables: atomic dump and reset for stateful objects
This patch adds a new NFT_MSG_GETOBJ_RESET command perform an atomic
dump-and-reset of the stateful object. This also comes with add support
for atomic dump and reset for counter and quota objects.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 12:56:57 +01:00
tim 48a992727d crypto: mcryptd - Check mcryptd algorithm compatibility
Algorithms not compatible with mcryptd could be spawned by mcryptd
with a direct crypto_alloc_tfm invocation using a "mcryptd(alg)" name
construct.  This causes mcryptd to crash the kernel if an arbitrary
"alg" is incompatible and not intended to be used with mcryptd.  It is
an issue if AF_ALG tries to spawn mcryptd(alg) to expose it externally.
But such algorithms must be used internally and not be exposed.

We added a check to enforce that only internal algorithms are allowed
with mcryptd at the time mcryptd is spawning an algorithm.

Link: http://marc.info/?l=linux-crypto-vger&m=148063683310477&w=2
Cc: stable@vger.kernel.org
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-07 19:55:37 +08:00
Stephan Mueller 0c1e16cd1e crypto: algif_aead - fix AEAD tag memory handling
For encryption, the AEAD ciphers require AAD || PT as input and generate
AAD || CT || Tag as output and vice versa for decryption. Prior to this
patch, the AF_ALG interface for AEAD ciphers requires the buffer to be
present as input for encryption. Similarly, the output buffer for
decryption required the presence of the tag buffer too. This implies
that the kernel reads / writes data buffers from/to kernel space
even though this operation is not required.

This patch changes the AF_ALG AEAD interface to be consistent with the
in-kernel AEAD cipher requirements.

Due to this handling, he changes are transparent to user space with one
exception: the return code of recv indicates the mount of output buffer.
That output buffer has a different size compared to before the patch
which implies that the return code of recv will also be different.
For example, a decryption operation uses 16 bytes AAD, 16 bytes CT and
16 bytes tag, the AF_ALG AEAD interface before showed a recv return
code of 48 (bytes) whereas after this patch, the return code is 32
since the tag is not returned any more.

Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-07 19:55:36 +08:00
Horia Geantă 39eaf75946 crypto: caam - fix pointer size for AArch64 boot loader, AArch32 kernel
Start with a clean slate before dealing with bit 16 (pointer size)
of Master Configuration Register.
This fixes the case of AArch64 boot loader + AArch32 kernel, when
the boot loader might set MCFGR[PS] and kernel would fail to clear it.

Cc: <stable@vger.kernel.org>
Reported-by: Alison Wang <alison.wang@nxp.com>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Reviewed-By: Alison Wang <Alison.wang@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-07 19:55:17 +08:00
Romain Perier 9e5f7a149e crypto: marvell - Don't corrupt state of an STD req for re-stepped ahash
mv_cesa_hash_std_step() copies the creq->state into the SRAM at each
step, but this is only required on the first one. By doing that, we
overwrite the engine state, and get erroneous results when the crypto
request is split in several chunks to fit in the internal SRAM.

This commit changes the function to copy the state only on the first
step.

Fixes: commit 2786cee8e5 ("crypto: marvell - Move SRAM I/O op...")
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-07 19:55:17 +08:00
Romain Perier 68c7f8c1c4 crypto: marvell - Don't copy hash operation twice into the SRAM
No need to copy the template of an hash operation twice into the SRAM
from the step function.

Fixes: commit 85030c5168 ("crypto: marvell - Add support for chai...")
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-07 19:55:16 +08:00