1
0
Fork 0
Commit Graph

87642 Commits (f93bd17b916959fc20fbb7dc578e1f2657a8c343)

Author SHA1 Message Date
Bartosz Folta 83a77e9ec4 net: macb: Added PCI wrapper for Platform Driver.
There are hardware PCI implementations of Cadence GEM network
controller. This patch will allow to use such hardware with reuse of
existing Platform Driver.

Signed-off-by: Bartosz Folta <bfolta@cadence.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 10:24:33 -05:00
Alexey Dobriyan 3e1ed981b7 netlink: revert broken, broken "2-clause nla_ok()"
Commit 4f7df337fe
"netlink: 2-clause nla_ok()" is BROKEN.

First clause tests if "->nla_len" could even be accessed at all,
it can not possibly be omitted.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-13 14:54:44 -05:00
Linus Torvalds 067d14f0dd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc updates from David Miller:
 "Just a bunch of small cleanups and fixes here, and support for user
  probes from Allen Pais"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc: fix a building error reported by kbuild
  sparc64: fix typo in pgd_clear()
  sparc64: restore irq in error paths in iommu
  sparc: leon: Fix a retry loop in leon_init_timers()
  sparc64: make string buffers large enough
  sparc64: move dereference after check for NULL
  sparc: kernel: use builtin_platform_driver
  sparc64:Support User Probes for sparc
2016-12-12 08:18:41 -08:00
Allen Pais e8f4aa6087 sparc64:Support User Probes for sparc
Signed-off-by: Eric Saint Etienne <eric.saint.etienne@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-11 18:01:51 -08:00
Asbjørn Sloth Tønnesen 47c3e7783b net: l2tp: deprecate PPPOL2TP_MSG_* in favour of L2TP_MSG_*
PPPOL2TP_MSG_* and L2TP_MSG_* are duplicates, and are being used
interchangeably in the kernel, so let's standardize on L2TP_MSG_*
internally, and keep PPPOL2TP_MSG_* defined in UAPI for compatibility.

Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10 23:29:11 -05:00
Asbjørn Sloth Tønnesen 41c43fbee6 net: l2tp: export debug flags to UAPI
Move the L2TP_MSG_* definitions to UAPI, as it is part of
the netlink API.

Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10 23:29:11 -05:00
David S. Miller 821781a9f4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-12-10 16:21:55 -05:00
Linus Torvalds cd6628953e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Limit the number of can filters to avoid > MAX_ORDER allocations.
    Fix from Marc Kleine-Budde.

 2) Limit GSO max size in netvsc driver to avoid problems with NVGRE
    configurations. From Stephen Hemminger.

 3) Return proper error when memory allocation fails in
    ser_gigaset_init(), from Dan Carpenter.

 4) Missing linkage undo in error paths of ipvlan_link_new(), from Gao
    Feng.

 5) Missing necessayr SET_NETDEV_DEV in lantiq and cpmac drivers, from
    Florian Fainelli.

 6) Handle probe deferral properly in smsc911x driver.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: mlx5: Fix Kconfig help text
  net: smsc911x: back out silently on probe deferrals
  ibmveth: set correct gso_size and gso_type
  net: ethernet: cpmac: Call SET_NETDEV_DEV()
  net: ethernet: lantiq_etop: Call SET_NETDEV_DEV()
  vhost-vsock: fix orphan connection reset
  cxgb4/cxgb4vf: Assign netdev->dev_port with port ID
  driver: ipvlan: Unlink the upper dev when ipvlan_link_new failed
  ser_gigaset: return -ENOMEM on error instead of success
  NET: usb: cdc_mbim: add quirk for supporting Telit LE922A
  can: peak: fix bad memory access and free sequence
  phy: Don't increment MDIO bus refcount unless it's a different owner
  netvsc: reduce maximum GSO size
  drivers: net: cpsw-phy-sel: Clear RGMII_IDMODE on "rgmii" links
  can: raw: raw_setsockopt: limit number of can_filter that can be set
2016-12-10 09:23:19 -08:00
David S. Miller 5ac9efbe1c Three fixes:
* fix a logic bug introduced by a previous cleanup
  * fix nl80211 attribute confusing (trying to use
    a single attribute for two purposes)
  * fix a long-standing BSS leak that happens when an
    association attempt is abandoned
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJYSpxnAAoJEGt7eEactAAd3hEP/0RzU5BLTe3FD39i2ESo4fQo
 q2Wnaa+ES1Ul473rCuSmPLGzlSjh0GciltHXRu7UEf5zXAjwuQtilrKsI9DizVR8
 hgTV4Jp0TDLuDudgxEPlpLxcFWALDaK0AlKuL1dY/FSI1BnNnToEeX8Bum6/otqe
 2wLQ11+70HrdNHJjvBEHP/kE/2D55easydmkCS30WYlFrd0BEFtGZ6Leb8deIAzL
 qQpanf26jBYVTm7ls+j0bt4mYbb0RLcsLrOS8EgyIYhCsbJHbaC2OpYGTbGxR6ob
 KKx01PGVnzytaKXCx/m70923V2mwWZWwa7IgDfoj2IzvsTnfmCgekGdSCiY+DJjE
 1jiDYWVK3KgTJQqXRnE1BCbF/FPK6ABKoPgmJBAAiLC48VpmrQwG0OLLQmYVTdp9
 KLrQztvZAVV1adA32fGpJHecDyQMMZ2xp7TZn9YY3qAiP4APU8IUscKuSXALmKN9
 kMBUBhwkk7QuHZXkry0QFBpFXpOgYjX3vt/gBh8EAmGfyRIklTKtGsmftkuQbWR9
 9BN4TbPznEJECqVy/BCL8llHNkfsJgcz3noFOePUjwa4FCAxJst/NFya+IkkqOQ5
 eAOj5cjsDfxsrdJFGxIsxXrtGZI1MjwKZf3w6jmu/VVL6BMryxYwtWnwrwcBsit7
 nXjitThBO0V2l3Iaf09m
 =HvKt
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2016-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Three fixes:
 * fix a logic bug introduced by a previous cleanup
 * fix nl80211 attribute confusing (trying to use
   a single attribute for two purposes)
 * fix a long-standing BSS leak that happens when an
   association attempt is abandoned
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-09 22:59:05 -05:00
Eric Dumazet 6b229cf77d udp: add batching to udp_rmem_release()
If udp_recvmsg() constantly releases sk_rmem_alloc
for every read packet, it gives opportunity for
producers to immediately grab spinlocks and desperatly
try adding another packet, causing false sharing.

We can add a simple heuristic to give the signal
by batches of ~25 % of the queue capacity.

This patch considerably increases performance under
flood by about 50 %, since the thread draining the queue
is no longer slowed by false sharing.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-09 22:12:21 -05:00
Eric Dumazet c84d949057 udp: copy skb->truesize in the first cache line
In UDP RX handler, we currently clear skb->dev before skb
is added to receive queue, because device pointer is no longer
available once we exit from RCU section.

Since this first cache line is always hot, lets reuse this space
to store skb->truesize and thus avoid a cache line miss at
udp_recvmsg()/udp_skb_destructor time while receive queue
spinlock is held.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-09 22:12:21 -05:00
Linus Torvalds 810ac7b755 Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
 "Several fixes to the DSM (ACPI device specific method) marshaling
  implementation.

  I consider these urgent enough to send for 4.9 consideration since
  they fix the kernel's handling of ARS (Address Range Scrub) commands.
  Especially for platforms without machine-check-recovery capabilities,
  successful execution of ARS commands enables the platform to
  potentially break out of an infinite reboot problem if a media error
  is present in the boot path. There is also a one line fix for a
  device-dax read-only mapping regression.

  Commits 9a901f5495 ("acpi, nfit: fix extended status translations
  for ACPI DSMs") and 325896ffdf ("device-dax: fix private mapping
  restriction, permit read-only") are true regression fixes for changes
  introduced this cycle.

  Commit efda1b5d87 ("acpi, nfit, libnvdimm: fix / harden ars_status
  output length handling") fixes the kernel's handling of zero-length
  results, this never would have worked in the past, but we only just
  recently discovered a BIOS implementation that emits this arguably
  spec non-compliant result.

  The remaining two commits are additional fall out from thinking
  through the implications of a zero / truncated length result of the
  ARS Status command.

  In order to mitigate the risk that these changes introduce yet more
  regressions they are backstopped by a new unit test in commit
  a7de92dac9 ("tools/testing/nvdimm: unit test acpi_nfit_ctl()") that
  mocks up inputs to acpi_nfit_ctl()"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  device-dax: fix private mapping restriction, permit read-only
  tools/testing/nvdimm: unit test acpi_nfit_ctl()
  acpi, nfit: fix bus vs dimm confusion in xlat_status
  acpi, nfit: validate ars_status output buffer size
  acpi, nfit, libnvdimm: fix / harden ars_status output length handling
  acpi, nfit: fix extended status translations for ACPI DSMs
2016-12-09 11:27:22 -08:00
Johannes Berg e6f462df9a cfg80211/mac80211: fix BSS leaks when abandoning assoc attempts
When mac80211 abandons an association attempt, it may free
all the data structures, but inform cfg80211 and userspace
about it only by sending the deauth frame it received, in
which case cfg80211 has no link to the BSS struct that was
used and will not cfg80211_unhold_bss() it.

Fix this by providing a way to inform cfg80211 of this with
the BSS entry passed, so that it can clean up properly, and
use this ability in the appropriate places in mac80211.

This isn't ideal: some code is more or less duplicated and
tracing is missing. However, it's a fairly small change and
it's thus easier to backport - cleanups can come later.

Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-09 12:57:49 +01:00
Vamsi Krishna 2fa436b3a2 nl80211: Use different attrs for BSSID and random MAC addr in scan req
NL80211_ATTR_MAC was used to set both the specific BSSID to be scanned
and the random MAC address to be used when privacy is enabled. When both
the features are enabled, both the BSSID and the local MAC address were
getting same value causing Probe Request frames to go with unintended
DA. Hence, this has been fixed by using a different NL80211_ATTR_BSSID
attribute to set the specific BSSID (which was the more recent addition
in cfg80211) for a scan.

Backwards compatibility with old userspace software is maintained to
some extent by allowing NL80211_ATTR_MAC to be used to set the specific
BSSID when scanning without enabling random MAC address use.

Scanning with random source MAC address was introduced by commit
ad2b26abc1 ("cfg80211: allow drivers to support random MAC addresses
for scan") and the issue was introduced with the addition of the second
user for the same attribute in commit 818965d391 ("cfg80211: Allow a
scan request for a specific BSSID").

Fixes: 818965d391 ("cfg80211: Allow a scan request for a specific BSSID")
Signed-off-by: Vamsi Krishna <vamsin@qti.qualcomm.com>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-09 12:47:19 +01:00
Martin KaFai Lau 17bedab272 bpf: xdp: Allow head adjustment in XDP prog
This patch allows XDP prog to extend/remove the packet
data at the head (like adding or removing header).  It is
done by adding a new XDP helper bpf_xdp_adjust_head().

It also renames bpf_helper_changes_skb_data() to
bpf_helper_changes_pkt_data() to better reflect
that XDP prog does not work on skb.

This patch adds one "xdp_adjust_head" bit to bpf_prog for the
XDP-capable driver to check if the XDP prog requires
bpf_xdp_adjust_head() support.  The driver can then decide
to error out during XDP_SETUP_PROG.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08 14:25:13 -05:00
Woojung.Huh@microchip.com f38e7a32ee phy: add phy fixup unregister functions
>From : Woojung Huh <woojung.huh@microchip.com>

Add functions to unregister phy fixup for modules.

int phy_unregister_fixup(const char *bus_id, u32 phy_uid, u32 phy_uid_mask)
	Unregister phy fixup from phy_fixup_list per bus_id, phy_uid &
	phy_uid_mask

int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask)
	Unregister phy fixup from phy_fixup_list.
	Use it for fixup registered by phy_register_fixup_for_uid()

int phy_unregister_fixup_for_id(const char *bus_id)
	Unregister phy fixup from phy_fixup_list.
	Use it for fixup registered by phy_register_fixup_for_id()

Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08 14:21:47 -05:00
Alexei Starovoitov d2a4dd37f6 bpf: fix state equivalence
Commmits 57a09bf0a4 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
and 484611357c ("bpf: allow access into map value arrays") by themselves
are correct, but in combination they make state equivalence ignore 'id' field
of the register state which can lead to accepting invalid program.

Fixes: 57a09bf0a4 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
Fixes: 484611357c ("bpf: allow access into map value arrays")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08 13:31:11 -05:00
Eric Dumazet 3665f3817c net: do not read sk_drops if application does not care
sk_drops can be an often written field, do not read it unless
application showed interest.

Note that sk_drops can be read via inet_diag, so applications
can avoid getting this info from every received packet.

In the future, 'reading' sk_drops might require folding per node or per
cpu fields, and thus become even more expensive than today.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08 13:30:22 -05:00
Eric Dumazet c8c8b12709 udp: under rx pressure, try to condense skbs
Under UDP flood, many softirq producers try to add packets to
UDP receive queue, and one user thread is burning one cpu trying
to dequeue packets as fast as possible.

Two parts of the per packet cost are :
- copying payload from kernel space to user space,
- freeing memory pieces associated with skb.

If socket is under pressure, softirq handler(s) can try to pull in
skb->head the payload of the packet if it fits.

Meaning the softirq handler(s) can free/reuse the page fragment
immediately, instead of letting udp_recvmsg() do this hundreds of usec
later, possibly from another node.

Additional gains :
- We reduce skb->truesize and thus can store more packets per SO_RCVBUF
- We avoid cache line misses at copyout() time and consume_skb() time,
and avoid one put_page() with potential alien freeing on NUMA hosts.

This comes at the cost of a copy, bounded to available tail room, which
is usually small. (We might have to fix GRO_MAX_HEAD which looks bigger
than necessary)

This patch gave me about 5 % increase in throughput in my tests.

skb_condense() helper could probably used in other contexts.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08 13:25:07 -05:00
Eric Dumazet 13bfff25c0 net: rfs: add a jump label
RFS is not commonly used, so add a jump label to avoid some conditionals
in fast path.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08 13:18:35 -05:00
Niklas Cassel 4022d039a3 net: smmac: allow configuring lower pbl values
The driver currently always sets the PBLx8/PBLx4 bit, which means that
the pbl values configured via the pbl/txpbl/rxpbl DT properties are
always multiplied by 8/4 in the hardware.

In order to allow the DT to configure lower pbl values, while at the
same time not changing behavior of any existing device trees using the
pbl/txpbl/rxpbl settings, add a property to disable the multiplication
of the pbl by 8/4 in the hardware.

Suggested-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Acked-by: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08 13:07:10 -05:00
Niklas Cassel 89caaa2d80 net: stmmac: add support for independent DMA pbl for tx/rx
GMAC and newer supports independent programmable burst lengths for
DMA tx/rx. Add new optional devicetree properties representing this.

To be backwards compatible, snps,pbl will still be valid, but
snps,txpbl/snps,rxpbl will override the value in snps,pbl if set.

If the IP is synthesized to use the AXI interface, there is a register
and a matching DT property inside the optional stmmac-axi-config DT node
for controlling burst lengths, named snps,blen.
However, using this register, it is not possible to control tx and rx
independently. Also, this register is not available if the IP was
synthesized with, e.g., the AHB interface.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Acked-by: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08 13:07:10 -05:00
Daniele Palmas 7b8076ce8a NET: usb: cdc_mbim: add quirk for supporting Telit LE922A
Telit LE922A MBIM based composition does not work properly
with altsetting toggle done in cdc_ncm_bind_common.

This patch adds CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE quirk
to avoid this procedure that, instead, is mandatory for
other modems.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Reviewed-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08 13:02:25 -05:00
Simon Horman 7b684884fb net/sched: cls_flower: Support matching on ICMP type and code
Support matching on ICMP type and code.

Example usage:

tc qdisc add dev eth0 ingress

tc filter add dev eth0 protocol ip parent ffff: flower \
	indev eth0 ip_proto icmp type 8 code 0 action drop

tc filter add dev eth0 protocol ipv6 parent ffff: flower \
	indev eth0 ip_proto icmpv6 type 128 code 0 action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08 11:47:08 -05:00
Simon Horman 972d3876fa flow dissector: ICMP support
Allow dissection of ICMP(V6) type and code. This should only occur
if a packet is ICMP(V6) and the dissector has FLOW_DISSECTOR_KEY_ICMP set.

There are currently no users of FLOW_DISSECTOR_KEY_ICMP.
A follow-up patch will allow FLOW_DISSECTOR_KEY_ICMP to be used by
the flower classifier.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08 11:45:21 -05:00
Or Gerlitz faa3ffce78 net/sched: cls_flower: Add support for matching on flags
Add UAPI to provide set of flags for matching, where the flags
provided from user-space are mapped to flow-dissector flags.

The 1st flag allows to match on whether the packet is an
IP fragment and corresponds to the FLOW_DIS_IS_FRAGMENT flag.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08 11:32:50 -05:00
David S. Miller 5fccd64aa4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains a large Netfilter update for net-next,
to summarise:

1) Add support for stateful objects. This series provides a nf_tables
   native alternative to the extended accounting infrastructure for
   nf_tables. Two initial stateful objects are supported: counters and
   quotas. Objects are identified by a user-defined name, you can fetch
   and reset them anytime. You can also use a maps to allow fast lookups
   using any arbitrary key combination. More info at:

   http://marc.info/?l=netfilter-devel&m=148029128323837&w=2

2) On-demand registration of nf_conntrack and defrag hooks per netns.
   Register nf_conntrack hooks if we have a stateful ruleset, ie.
   state-based filtering or NAT. The new nf_conntrack_default_on sysctl
   enables this from newly created netnamespaces. Default behaviour is not
   modified. Patches from Florian Westphal.

3) Allocate 4k chunks and then use these for x_tables counter allocation
   requests, this improves ruleset load time and also datapath ruleset
   evaluation, patches from Florian Westphal.

4) Add support for ebpf to the existing x_tables bpf extension.
   From Willem de Bruijn.

5) Update layer 4 checksum if any of the pseudoheader fields is updated.
   This provides a limited form of 1:1 stateless NAT that make sense in
   specific scenario, eg. load balancing.

6) Add support to flush sets in nf_tables. This series comes with a new
   set->ops->deactivate_one() indirection given that we have to walk
   over the list of set elements, then deactivate them one by one.
   The existing set->ops->deactivate() performs an element lookup that
   we don't need.

7) Two patches to avoid cloning packets, thus speed up packet forwarding
   via nft_fwd from ingress. From Florian Westphal.

8) Two IPVS patches via Simon Horman: Decrement ttl in all modes to
   prevent infinite loops, patch from Dwip Banerjee. And one minor
   refactoring from Gao feng.

9) Revisit recent log support for nf_tables netdev families: One patch
   to ensure that we correctly handle non-ethernet packets. Another
   patch to add missing logger definition for netdev. Patches from
   Liping Zhang.

10) Three patches for nft_fib, one to address insufficient register
    initialization and another to solve incorrect (although harmless)
    byteswap operation. Moreover update xt_rpfilter and nft_fib to match
    lbcast packets with zeronet as source, eg. DHCP Discover packets
    (0.0.0.0 -> 255.255.255.255). Also from Liping Zhang.

11) Built-in DCCP, SCTP and UDPlite conntrack and NAT support, from
    Davide Caratti. While DCCP is rather hopeless lately, and UDPlite has
    been broken in many-cast mode for some little time, let's give them a
    chance by placing them at the same level as other existing protocols.
    Thus, users don't explicitly have to modprobe support for this and
    NAT rules work for them. Some people point to the lack of support in
    SOHO Linux-based routers that make deployment of new protocols harder.
    I guess other middleboxes outthere on the Internet are also to blame.
    Anyway, let's see if this has any impact in the midrun.

12) Skip software SCTP software checksum calculation if the NIC comes
    with SCTP checksum offload support. From Davide Caratti.

13) Initial core factoring to prepare conversion to hook array. Three
    patches from Aaron Conole.

14) Gao Feng made a wrong conversion to switch in the xt_multiport
    extension in a patch coming in the previous batch. Fix it in this
    batch.

15) Get vmalloc call in sync with kmalloc flags to avoid a warning
    and likely OOM killer intervention from x_tables. From Marcelo
    Ricardo Leitner.

16) Update Arturo Borrero's email address in all source code headers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 19:16:46 -05:00
Linus Torvalds f27c2f69cc Revert "default exported asm symbols to zero"
This reverts commit 8ab2ae655b.

I loved that commit because of how it explained what the problem with
newer versions of binutils were, but the actual patch itself turns out
to not work very well.

It has two problems:

 - a zero CRC value isn't actually right.  It happens to work for the
   case where both sides of the equation fail at giving the symbol a
   crc, but there are cases where the users of the exported symbol get
   the right crc (due to seeing the C declarations), but the actual
   exporting itself does not (due to the whole weak asm symbol issue).

   So then the module load fails after all - we did have a crc for the
   symbol, but we couldn't match it with the loaded module.

 - it seems that the alpha assembler has special semantics for the
   '.set' directive, and on alpha it doesn't actually set the value of
   the specified symbol at all, it is instead used to set various
   assembly modes (eg ".set noat" and ".set noreorder").

   So using ".set" to set the symbol value would just cause build
   failures on alpha.

I'm sure we'll find some other workaround for these issues (hopefully
that involves getting rid of modversions entirely some day, but people
are also talking about just using smarter tools).  But for now we'll
just fall back on commit faaae2a581 ("Re-enable CONFIG_MODVERSIONS in
a slightly weaker form") that just let's a missing crc through.

Reported-by: Jan Stancek <jstancek@redhat.com>
Reported-by: Philip Müller <philm@manjaro.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-07 08:39:00 -08:00
Eric Dumazet 5b8e2f61b9 net: sock_rps_record_flow() is for connected sockets
Paolo noticed a cache line miss in UDP recvmsg() to access
sk_rxhash, sharing a cache line with sk_drops.

sk_drops might be heavily incremented by cpus handling a flood targeting
this socket.

We might place sk_drops on a separate cache line, but lets try
to avoid wasting 64 bytes per socket just for this, since we have
other bottlenecks to take care of.

sock_rps_record_flow() should only access sk_rxhash for connected
flows.

Testing sk_state for TCP_ESTABLISHED covers most of the cases for
connected sockets, for a zero cost, since system calls using
sock_rps_record_flow() also access sk->sk_prot which is on the
same cache line.

A follow up patch will provide a static_key (Jump Label) since most
hosts do not even use RFS.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 10:47:35 -05:00
Willem de Bruijn 2c16d60332 netfilter: xt_bpf: support ebpf
Add support for attaching an eBPF object by file descriptor.

The iptables binary can be called with a path to an elf object or a
pinned bpf object. Also pass the mode and path to the kernel to be
able to return it later for iptables dump and save.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 13:32:35 +01:00
Pablo Neira Ayuso 8411b6442e netfilter: nf_tables: support for set flushing
This patch adds support for set flushing, that consists of walking over
the set elements if the NFTA_SET_ELEM_LIST_ELEMENTS attribute is set.
This patch requires the following changes:

1) Add set->ops->deactivate_one() operation: This allows us to
   deactivate an element from the set element walk path, given we can
   skip the lookup that happens in ->deactivate().

2) Add a new nft_trans_alloc_gfp() function since we need to allocate
   transactions using GFP_ATOMIC given the set walk path happens with
   held rcu_read_lock.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 13:31:40 +01:00
Pablo Neira Ayuso 63aea29060 netfilter: nft_objref: support for stateful object maps
This patch allows us to refer to stateful object dictionaries, the
source register indicates the key data to be used to look up for the
corresponding state object. We can refer to these maps through names or,
alternatively, the map transaction id. This allows us to refer to both
anonymous and named maps.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 13:22:48 +01:00
Pablo Neira Ayuso 8aeff920dc netfilter: nf_tables: add stateful object reference to set elements
This patch allows you to refer to stateful objects from set elements.
This provides the infrastructure to create maps where the right hand
side of the mapping is a stateful object.

This allows us to build dictionaries of stateful objects, that you can
use to perform fast lookups using any arbitrary key combination.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 13:22:47 +01:00
Pablo Neira Ayuso 1896531710 netfilter: nft_quota: add depleted flag for objects
Notify on depleted quota objects. The NFT_QUOTA_F_DEPLETED flag
indicates we have reached overquota.

Add pointer to table from nft_object, so we can use it when sending the
depletion notification to userspace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 13:22:12 +01:00
Pablo Neira Ayuso 2599e98934 netfilter: nf_tables: notify internal updates of stateful objects
Introduce nf_tables_obj_notify() to notify internal state changes in
stateful objects. This is used by the quota object to report depletion
in a follow up patch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 12:57:20 +01:00
Pablo Neira Ayuso 43da04a593 netfilter: nf_tables: atomic dump and reset for stateful objects
This patch adds a new NFT_MSG_GETOBJ_RESET command perform an atomic
dump-and-reset of the stateful object. This also comes with add support
for atomic dump and reset for counter and quota objects.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 12:56:57 +01:00
Pablo Neira Ayuso 795595f68d netfilter: nft_quota: dump consumed quota
Add a new attribute NFTA_QUOTA_CONSUMED that displays the amount of
quota that has been already consumed. This allows us to restore the
internal state of the quota object between reboots as well as to monitor
how wasted it is.

This patch changes the logic to account for the consumed bytes, instead
of the bytes that remain to be consumed.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07 12:54:22 +01:00
Marc Kleine-Budde 332b05ca7a can: raw: raw_setsockopt: limit number of can_filter that can be set
This patch adds a check to limit the number of can_filters that can be
set via setsockopt on CAN_RAW sockets. Otherwise allocations > MAX_ORDER
are not prevented resulting in a warning.

Reference: https://lkml.org/lkml/2016/12/2/230

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2016-12-07 10:45:57 +01:00
David S. Miller c63d352f05 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-12-06 21:33:19 -05:00
Dan Williams efda1b5d87 acpi, nfit, libnvdimm: fix / harden ars_status output length handling
Given ambiguities in the ACPI 6.1 definition of the "Output (Size)"
field of the ARS (Address Range Scrub) Status command, a firmware
implementation may in practice return 0, 4, or 8 to indicate that there
is no output payload to process.

The specification states "Size of Output Buffer in bytes, including this
field.". However, 'Output Buffer' is also the name of the entire
payload, and earlier in the specification it states "Max Query ARS
Status Output Buffer Size: Maximum size of buffer (including the Status
and Extended Status fields)".

Without this fix if the BIOS happens to return 0 it causes memory
corruption as evidenced by this result from the acpi_nfit_ctl() unit
test.

 ars_status00000000: 00020000 00000000                    ........
 BUG: stack guard page was hit at ffffc90001750000 (stack is ffffc9000174c000..ffffc9000174ffff)
 kernel stack overflow (page fault): 0000 [#1] SMP DEBUG_PAGEALLOC
 task: ffff8803332d2ec0 task.stack: ffffc9000174c000
 RIP: 0010:[<ffffffff814cfe72>]  [<ffffffff814cfe72>] __memcpy+0x12/0x20
 RSP: 0018:ffffc9000174f9a8  EFLAGS: 00010246
 RAX: ffffc9000174fab8 RBX: 0000000000000000 RCX: 000000001fffff56
 RDX: 0000000000000000 RSI: ffff8803231f5a08 RDI: ffffc90001750000
 RBP: ffffc9000174fa88 R08: ffffc9000174fab0 R09: ffff8803231f54b8
 R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000003 R15: ffff8803231f54a0
 FS:  00007f3a611af640(0000) GS:ffff88033ed00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffc90001750000 CR3: 0000000325b20000 CR4: 00000000000406e0
 Stack:
  ffffffffa00bc60d 0000000000000008 ffffc90000000001 ffffc9000174faac
  0000000000000292 ffffffffa00c24e4 ffffffffa00c2914 0000000000000000
  0000000000000000 ffffffff00000003 ffff880331ae8ad0 0000000800000246
 Call Trace:
  [<ffffffffa00bc60d>] ? acpi_nfit_ctl+0x49d/0x750 [nfit]
  [<ffffffffa01f4fe0>] nfit_test_probe+0x670/0xb1b [nfit_test]

Cc: <stable@vger.kernel.org>
Fixes: 747ffe11b4 ("libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-06 16:08:10 -08:00
Pablo Neira Ayuso c97d22e68b netfilter: nf_tables: add stateful object reference expression
This new expression allows us to refer to existing stateful objects from
rules.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06 21:48:25 +01:00
Pablo Neira Ayuso 173705d9a2 netfilter: nft_quota: add stateful object type
Register a new quota stateful object type into the new stateful object
infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06 21:48:24 +01:00
Pablo Neira Ayuso b1ce0ced10 netfilter: nft_counter: add stateful object type
Register a new percpu counter stateful object type into the stateful
object infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06 21:48:23 +01:00
Pablo Neira Ayuso e50092404c netfilter: nf_tables: add stateful objects
This patch augments nf_tables to support stateful objects. This new
infrastructure allows you to create, dump and delete stateful objects,
that are identified by a user-defined name.

This patch adds the generic infrastructure, follow up patches add
support for two stateful objects: counters and quotas.

This patch provides a native infrastructure for nf_tables to replace
nfacct, the extended accounting infrastructure for iptables.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06 21:48:22 +01:00
Florian Westphal 3bf3276119 netfilter: add and use nf_fwd_netdev_egress
... so we can use current skb instead of working with a clone.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06 21:48:22 +01:00
Florian Westphal df122f58b8 netfilter: ingress: translate 0 nf_hook_slow retval to -1
The caller assumes that < 0 means that skb was stolen (or free'd).

All other return values continue skb processing.

nf_hook_slow returns 3 different return value types:

A) a (negative) errno value: the skb was dropped (NF_DROP, e.g.
by iptables '-j DROP' rule).

B) 0. The skb was stolen by the hook or queued to userspace.

C) 1. all hooks returned NF_ACCEPT so the caller should invoke
   the okfn so packet processing can continue.

nft ingress facility currently doesn't have the 'okfn' that
the NF_HOOK() macros use; there is no nfqueue support either.

So 1 means that nf_hook_ingress() caller should go on processing the skb.

In order to allow use of NF_STOLEN from ingress we need to translate
this to an errno number, else we'd crash because we continue with
already-free'd (or about to be free-d) skb.

The errno value isn't checked, its just important that its less than 0,
so return -1.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06 21:48:21 +01:00
Pablo Neira Ayuso 1814096980 netfilter: nft_payload: layer 4 checksum adjustment for pseudoheader fields
This patch adds a new flag that signals the kernel to update layer 4
checksum if the packet field belongs to the layer 4 pseudoheader. This
implicitly provides stateless NAT 1:1 that is useful under very specific
usecases.

Since rules mangling layer 3 fields that are part of the pseudoheader
may potentially convey any layer 4 packet, we have to deal with the
layer 4 checksum adjustment using protocol specific code.

This patch adds support for TCP, UDP and ICMPv6, since they include the
pseudoheader in the layer 4 checksum calculation. ICMP doesn't, so we
can skip it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06 21:47:54 +01:00
Florian Westphal ae0ac0ed6f netfilter: x_tables: pack percpu counter allocations
instead of allocating each xt_counter individually, allocate 4k chunks
and then use these for counter allocation requests.

This should speed up rule evaluation by increasing data locality,
also speeds up ruleset loading because we reduce calls to the percpu
allocator.

As Eric points out we can't use PAGE_SIZE, page_allocator would fail on
arches with 64k page size.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06 21:42:19 +01:00
Florian Westphal f28e15bace netfilter: x_tables: pass xt_counters struct to counter allocator
Keeps some noise away from a followup patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06 21:42:18 +01:00
Florian Westphal 4d31eef517 netfilter: x_tables: pass xt_counters struct instead of packet counter
On SMP we overload the packet counter (unsigned long) to contain
percpu offset.  Hide this from callers and pass xt_counters address
instead.

Preparation patch to allocate the percpu counters in page-sized batch
chunks.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06 21:42:17 +01:00