Commit graph

565804 commits

Author SHA1 Message Date
Xing Zheng af72261f33 net: ethernet: arc: Add support emac for RK3036
The RK3036's GRFs offset are different with RK3066/RK3188, and need to set
mac TX/RX clock before probe emac.

Signed-off-by: Xing Zheng <zhengxing@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 22:21:31 -05:00
Xing Zheng f4c9d3ee03 net: ethernet: arc: Keep emac compatibility for more Rockchip SoCs
On the RK3066/RK3188, there was fixed GRF offset configuration to set emac
and fixed DIV2 mac TX/RX clock. So, we need to easily set and fit to other
SoCs (RK3036) which maybe have different GRF offset, and need adjust mac
TX/RX clock.

Signed-off-by: Xing Zheng <zhengxing@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 22:21:31 -05:00
Xing Zheng c9bca2fe3c net: ethernet: arc: Probe emac after set RMII clock
After enter arc_emac_probe, emac will get_phy_id, phy_poll_reset and
other connecting PHY via mdiobus_read, so we need to set correct
ref clock rate for emac before probe emac.

Signed-off-by: Xing Zheng <zhengxing@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 22:21:31 -05:00
David S. Miller 0652cb5b8b Merge branch 'bnxt_en-zeropad-fw-and-reset'
Michael Chan says:

====================
bnxt_en: Zero pad fw messages and add fw reset.

2 patches related to firmware for net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 22:19:19 -05:00
Rob Swindell d2d6318cb9 bnxt_en: Reset embedded processor after applying firmware upgrade
Use HWRM_FW_RESET command to request a self-reset of the embedded
processor(s) after successfully applying a firmware update. For boot
processor, the self-reset is currently deferred until the next PCIe reset.

Signed-off-by: Rob Swindell <swindell@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 22:19:19 -05:00
Michael Chan d79979a103 bnxt_en: Zero pad firmware messages to 128 bytes.
For future compatibility, zero pad all messages that the driver sends
to the firmware to 128 bytes.  If these messages are extended in the
future with new byte enables, zero padding these messages now will
guarantee future compatibility.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 22:19:19 -05:00
Daniel Borkmann 1f211a1b92 net, sched: add clsact qdisc
This work adds a generalization of the ingress qdisc as a qdisc holding
only classifiers. The clsact qdisc works on ingress, but also on egress.
In both cases, it's execution happens without taking the qdisc lock, and
the main difference for the egress part compared to prior version of [1]
is that this can be applied with _any_ underlying real egress qdisc (also
classless ones).

Besides solving the use-case of [1], that is, allowing for more programmability
on assigning skb->priority for the mqprio case that is supported by most
popular 10G+ NICs, it also opens up a lot more flexibility for other tc
applications. The main work on classification can already be done at clsact
egress time if the use-case allows and state stored for later retrieval
f.e. again in skb->priority with major/minors (which is checked by most
classful qdiscs before consulting tc_classify()) and/or in other skb fields
like skb->tc_index for some light-weight post-processing to get to the
eventual classid in case of a classful qdisc. Another use case is that
the clsact egress part allows to have a central egress counterpart to
the ingress classifiers, so that classifiers can easily share state (e.g.
in cls_bpf via eBPF maps) for ingress and egress.

Currently, default setups like mq + pfifo_fast would require for this to
use, for example, prio qdisc instead (to get a tc_classify() run) and to
duplicate the egress classifier for each queue. With clsact, it allows
for leaving the setup as is, it can additionally assign skb->priority to
put the skb in one of pfifo_fast's bands and it can share state with maps.
Moreover, we can access the skb's dst entry (f.e. to retrieve tclassid)
w/o the need to perform a skb_dst_force() to hold on to it any longer. In
lwt case, we can also use this facility to setup dst metadata via cls_bpf
(bpf_skb_set_tunnel_key()) without needing a real egress qdisc just for
that (case of IFF_NO_QUEUE devices, for example).

The realization can be done without any changes to the scheduler core
framework. All it takes is that we have two a-priori defined minors/child
classes, where we can mux between ingress and egress classifier list
(dev->ingress_cl_list and dev->egress_cl_list, latter stored close to
dev->_tx to avoid extra cacheline miss for moderate loads). The egress
part is a bit similar modelled to handle_ing() and patched to a noop in
case the functionality is not used. Both handlers are now called
sch_handle_ingress() and sch_handle_egress(), code sharing among the two
doesn't seem practical as there are various minor differences in both
paths, so that making them conditional in a single handler would rather
slow things down.

Full compatibility to ingress qdisc is provided as well. Since both
piggyback on TC_H_CLSACT, only one of them (ingress/clsact) can exist
per netdevice, and thus ingress qdisc specific behaviour can be retained
for user space. This means, either a user does 'tc qdisc add dev foo ingress'
and configures ingress qdisc as usual, or the 'tc qdisc add dev foo clsact'
alternative, where both, ingress and egress classifier can be configured
as in the below example. ingress qdisc supports attaching classifier to any
minor number whereas clsact has two fixed minors for muxing between the
lists, therefore to not break user space setups, they are better done as
two separate qdiscs.

I decided to extend the sch_ingress module with clsact functionality so
that commonly used code can be reused, the module is being aliased with
sch_clsact so that it can be auto-loaded properly. Alternative would have been
to add a flag when initializing ingress to alter its behaviour plus aliasing
to a different name (as it's more than just ingress). However, the first would
end up, based on the flag, choosing the new/old behaviour by calling different
function implementations to handle each anyway, the latter would require to
register ingress qdisc once again under different alias. So, this really begs
to provide a minimal, cleaner approach to have Qdisc_ops and Qdisc_class_ops
by its own that share callbacks used by both.

Example, adding qdisc:

   # tc qdisc add dev foo clsact
   # tc qdisc show dev foo
   qdisc mq 0: root
   qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
   qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
   qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
   qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
   qdisc clsact ffff: parent ffff:fff1

Adding filters (deleting, etc works analogous by specifying ingress/egress):

   # tc filter add dev foo ingress bpf da obj bar.o sec ingress
   # tc filter add dev foo egress  bpf da obj bar.o sec egress
   # tc filter show dev foo ingress
   filter protocol all pref 49152 bpf
   filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action
   # tc filter show dev foo egress
   filter protocol all pref 49152 bpf
   filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action

A 'tc filter show dev foo' or 'tc filter show dev foo parent ffff:' will
show an empty list for clsact. Either using the parent names (ingress/egress)
or specifying the full major/minor will then show the related filter lists.

Prior work on a mqprio prequeue() facility [1] was done mainly by John Fastabend.

  [1] http://patchwork.ozlabs.org/patch/512949/

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 22:13:15 -05:00
Andrew Lunn ede5599753 ethernet: amd: au1000: Remove pointless warning
The warning about being able to read any MDIO device, not just the
attached ethernet devices PHY applies to all MDIO drivers. So remove
it. This also removes a reference to a member in phy_device which has
moved.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 22:06:59 -05:00
Andrew Lunn 3fe01e2406 staging: netlogic: Fix build error due to missed API change
Fix a number of build errors due to moving the phy_map and centralizing
interrupt allocation.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 22:06:58 -05:00
Guenter Roeck e574f39816 net: ethernet: faraday: Use phy_find_first() instead of open coding it
Use phy_find_first() to find the first phy device instead of
open coding it.

Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 22:05:30 -05:00
Guenter Roeck ee64f08ea9 net: ethernet: broadcom: Fix build errors
Commit 7f854420fb ("phy: Add API for {un}registering an mdio device to
a bus") introduces an API to access mii_bus structures, but missed to
update the sb1250 driver. This results in the following build error.

drivers/net/ethernet/broadcom/sb1250-mac.c: In function 'sbmac_mii_probe':
drivers/net/ethernet/broadcom/sb1250-mac.c:2360:24: error:
	'struct mii_bus' has no member named 'phy_map'

Use phy_find_first() instead of open coding it.

Commit 2220943a21 ("phy: Centralise print about attached phy") introduces
the following build error.

drivers/net/ethernet/broadcom/sb1250-mac.c: In function 'sbmac_mii_probe':
drivers/net/ethernet/broadcom/sb1250-mac.c:2383:20: error: 'phydev' undeclared

Fixes: 7f854420fb ("phy: Add API for {un}registering an mdio device to a bus")
Fixes: 2220943a21 ("phy: Centralise print about attached phy")
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 22:05:30 -05:00
David S. Miller 5c721d561d Merge branch 'mdio-device-fixes'
Andrew Lunn says:

====================
Fix breakage from mdio device

These two patches fix MIPS platforms which got broken by
the recent mdio device patchset.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 18:03:47 -05:00
Andrew Lunn 0c129bf756 net: ethernet-rgmii.c: Fix breakage from moving phdev bus
The mdio device patches moved the bus member in phy_device into a
substructure. This driver got missed. Fix it.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 18:03:47 -05:00
Andrew Lunn 2a4fc4ea29 net: lantiq_etop.c: Use helper to find first phy
Make use of the helper to find the first phy device.
This also fixes the compile breakage.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 18:03:47 -05:00
Romain Perier 6c672c9bf4 stmmac: Don't exit mdio registration when mdio subnode is not found in the DTS
Originally, most of the platforms using this driver did not define an mdio subnode
in the devicetree. Commit e34d65 ("stmmac: create of compatible mdio bus for stmmac driver")
introduced a backward compatibily issue by using of_mdiobus_register explicitly
with an mdio subnode. This patch fixes the issue by calling the function
mdiobus_register, when mdio subnode is not found. The driver is now compatible
with both modes.

Fixes: e34d65696d ("stmmac: create of compatible mdio bus for stmmac driver")
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Tested-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 18:02:33 -05:00
Linus Torvalds afd2ff9b7e Linux 4.4 2016-01-10 15:01:32 -08:00
Sasha Levin 320f1a4a17 net: sctp: prevent writes to cookie_hmac_alg from accessing invalid memory
proc_dostring() needs an initialized destination string, while the one
provided in proc_sctp_do_hmac_alg() contains stack garbage.

Thus, writing to cookie_hmac_alg would strlen() that garbage and end up
accessing invalid memory.

Fixes: 3c68198e7 ("sctp: Make hmac algorithm selection for cookie generation dynamic")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 18:01:01 -05:00
Kristian Evensen 18715b2615 net: qmi_wwan: Add SIMCom 7230E
SIMCom 7230E is a QMI LTE module with support for most "normal" bands.
Manual testing has showed that only interface five works.

Cc: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 17:55:34 -05:00
David S. Miller 749f7df186 Merge branch 'bpf-next'
Daniel Borkmann says:

====================
BPF update

Fixes a csum issue on ingress. As mentioned previously, net-next
seems just fine imho. Later on, will follow up with couple of
replacements like ovs_skb_postpush_rcsum() etc.

Thanks!

v1 -> v2:
  - Added patch 1 with helper
  - Implemented Hannes' idea to just use csum_partial, thanks!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 17:54:28 -05:00
Daniel Borkmann f8ffad69c9 bpf: add skb_postpush_rcsum and fix dev_forward_skb occasions
Add a small helper skb_postpush_rcsum() and fix up redirect locations
that need CHECKSUM_COMPLETE fixups on ingress. dev_forward_skb() expects
a proper csum that covers also Ethernet header, f.e. since 2c26d34bbc
("net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding"), we
also do skb_postpull_rcsum() after pulling Ethernet header off via
eth_type_trans().

When using eBPF in a netns setup f.e. with vxlan in collect metadata mode,
I can trigger the following csum issue with an IPv6 setup:

  [  505.144065] dummy1: hw csum failure
  [...]
  [  505.144108] Call Trace:
  [  505.144112]  <IRQ>  [<ffffffff81372f08>] dump_stack+0x44/0x5c
  [  505.144134]  [<ffffffff81607cea>] netdev_rx_csum_fault+0x3a/0x40
  [  505.144142]  [<ffffffff815fee3f>] __skb_checksum_complete+0xcf/0xe0
  [  505.144149]  [<ffffffff816f0902>] nf_ip6_checksum+0xb2/0x120
  [  505.144161]  [<ffffffffa08c0e0e>] icmpv6_error+0x17e/0x328 [nf_conntrack_ipv6]
  [  505.144170]  [<ffffffffa0898eca>] ? ip6t_do_table+0x2fa/0x645 [ip6_tables]
  [  505.144177]  [<ffffffffa08c0725>] ? ipv6_get_l4proto+0x65/0xd0 [nf_conntrack_ipv6]
  [  505.144189]  [<ffffffffa06c9a12>] nf_conntrack_in+0xc2/0x5a0 [nf_conntrack]
  [  505.144196]  [<ffffffffa08c039c>] ipv6_conntrack_in+0x1c/0x20 [nf_conntrack_ipv6]
  [  505.144204]  [<ffffffff8164385d>] nf_iterate+0x5d/0x70
  [  505.144210]  [<ffffffff816438d6>] nf_hook_slow+0x66/0xc0
  [  505.144218]  [<ffffffff816bd302>] ipv6_rcv+0x3f2/0x4f0
  [  505.144225]  [<ffffffff816bca40>] ? ip6_make_skb+0x1b0/0x1b0
  [  505.144232]  [<ffffffff8160b77b>] __netif_receive_skb_core+0x36b/0x9a0
  [  505.144239]  [<ffffffff8160bdc8>] ? __netif_receive_skb+0x18/0x60
  [  505.144245]  [<ffffffff8160bdc8>] __netif_receive_skb+0x18/0x60
  [  505.144252]  [<ffffffff8160ccff>] process_backlog+0x9f/0x140
  [  505.144259]  [<ffffffff8160c4a5>] net_rx_action+0x145/0x320
  [...]

What happens is that on ingress, we push Ethernet header back in, either
from cls_bpf or right before skb_do_redirect(), but without updating csum.
The "hw csum failure" can be fixed by using the new skb_postpush_rcsum()
helper for the dev_forward_skb() case to correct the csum diff again.

Thanks to Hannes Frederic Sowa for the csum_partial() idea!

Fixes: 3896d655f4 ("bpf: introduce bpf_clone_redirect() helper")
Fixes: 27b29f6305 ("bpf: add bpf_redirect() helper")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 17:54:28 -05:00
Daniel Borkmann fdc5432a7b net, sched: add skb_at_tc_ingress helper
Add a skb_at_tc_ingress() as this will be needed elsewhere as well and
can hide the ugly ifdef.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 17:54:28 -05:00
David S. Miller 4156afafcc Merge branch 'tcp-keepalive-namespaceify'
Nikolay Borisov says:

====================
Namespaceify tcp keepalive machinery

The following patch series enables the tcp keepalive mechanism
to be configured per net namespace. This is especially useful
if you have multiple containers hosted on one node and one of
them is under DoS-  in such situations one thing which could
be done is to configure the tcp keepalive settings such that
connections for that particular container are being reset
faster.

Another scenario where not being able to control those knob
comes per container is problematic is occurs the value of
net.netfilter.nf_conntrack_tcp_timeout_established is set
below the keepalive interval, in such situations the server won't
send an RST packet resulting in applications not trying to
reconnect and stale connection waiting. Changing the global
keepalive value is a possible solution but it might interfere
with other containers.

The three patches gradually convert each of the affected knobs
to be per netns. I thought it would be easier for review than
put everything in one patch. If people deem it more appropriate
to squash everything in one patch (maybe after review) I'd
be more than happy to do it.

The patches have been compile-tested on 4.4 and functionally
tested on 3.12 and they work as expected.

These are based off 4.4-rc8
====================

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 17:32:09 -05:00
Nikolay Borisov b840d15d39 ipv4: Namespecify the tcp_keepalive_intvl sysctl knob
This is the final part required to namespaceify the tcp
keep alive mechanism.

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 17:32:09 -05:00
Nikolay Borisov 9bd6861bd4 ipv4: Namespecify tcp_keepalive_probes sysctl knob
This is required to have full tcp keepalive mechanism namespace
support.

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 17:32:09 -05:00
Nikolay Borisov 13b287e8d1 ipv4: Namespaceify tcp_keepalive_time sysctl knob
Different net namespaces might have different requirements as to
the keepalive time of tcp sockets. This might be required in cases
where different firewall rules are in place which require tcp
timeout sockets to be increased/decreased independently of the host.

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 17:32:09 -05:00
Hannes Frederic Sowa 787d7ac308 udp: restrict offloads to one namespace
udp tunnel offloads tend to aggregate datagrams based on inner
headers. gro engine gets notified by tunnel implementations about
possible offloads. The match is solely based on the port number.

Imagine a tunnel bound to port 53, the offloading will look into all
DNS packets and tries to aggregate them based on the inner data found
within. This could lead to data corruption and malformed DNS packets.

While this patch minimizes the problem and helps an administrator to find
the issue by querying ip tunnel/fou, a better way would be to match on
the specific destination ip address so if a user space socket is bound
to the same address it will conflict.

Cc: Tom Herbert <tom@herbertland.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 17:28:24 -05:00
David S. Miller d3517f19f2 Merge branch 'mlxsw-layer2-multicast'
Jiri Pirko says:

====================
mlxsw: Adding layer 2 multicast

Elad says:

This patchset add Linux hardware reflection for L2 multicast offload and add
MC support in mlxsw. For every bridge MDB entry insertion, either by IGMP
snooping or by static insertion/removal, a switchdev ops is been called.
In mlxsw, a new multicast group (MID) is been created and ports are assigned.
When all ports are removed, the multicast group is been deleted.

---
v1->v2:
- GFP_ATOMIC->GFP_KERNEL change in patch 7/8
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 16:50:21 -05:00
Elad Raz 4f5590f8cd switchdev: Adding IGMP snooping documentation
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 16:50:21 -05:00
Elad Raz 3a49b4fde2 mlxsw: Adding layer 2 multicast support
Add SWITCHDEV_OBJ_ID_PORT_MDB switchdev ops support. On first MDB insertion
creates a new multicast group (MID) and add members port to the MID. Also
add new MDB entry for the flooding-domain (fid-vid) and link the MDB entry
to the newly constructed MC group.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 16:50:21 -05:00
Elad Raz e4b6f6931c mlxsw: Adding VID to FID translatation
Adding a generic function that translate VID to FID.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 16:50:21 -05:00
Elad Raz 53ae628316 mlxsw: Changing the maximum number of multicast group to a define
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 16:50:21 -05:00
Elad Raz fabe548322 mlxsw: reg: Adding SMID register
Adding back SMID register definition and packing. For each MC group a new
SMID entry will be generated.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 16:50:21 -05:00
Elad Raz 5230b25f06 mlxsw: reg: Add definition of multicast record for SFD register
Multicast-related records have specific format in SFD register.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 16:50:21 -05:00
Elad Raz f1fecb1d10 bridge: Reflect MDB entries to hardware
Offload MDB changes per port to hardware

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 16:50:21 -05:00
Elad Raz 4d41e12593 switchdev: Adding MDB entry offload
Define HW multicast entry: MAC and VID.
Using a MAC address simplifies support for both IPV4 and IPv6.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 16:50:20 -05:00
Mickaël Salaün 3e46b25376 um: Use race-free temporary file creation
Open the memory mapped file with the O_TMPFILE flag when available.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Acked-by: Tristan Schmelcher <tschmelcher@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10 21:49:50 +01:00
Mickaël Salaün 571d2f0c34 um: Do not set unsecure permission for temporary file
Remove the insecure 0777 mode for temporary file to prohibit other users
to change the executable mapped code.

An attacker could gain access to the mapped file descriptor from the
temporary file (before it is unlinked) in a read-only mode but it should
not be accessible in write mode to avoid arbitrary code execution.

To not change the hostfs behavior, the temporary file creation
permission now depends on the current umask(2) and the implementation of
mkstemp(3).

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Acked-by: Tristan Schmelcher <tschmelcher@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10 21:49:50 +01:00
Mickaël Salaün 42d91f612c um: Fix build error and kconfig for i386
Fix build error by generating elfcore.o only when ELF_CORE (depending on
COREDUMP) is selected:

arch/x86/um/built-in.o: In function `elf_core_write_extra_phdrs':
(.text+0x3e62): undefined reference to `dump_emit'
arch/x86/um/built-in.o: In function `elf_core_write_extra_data':
(.text+0x3eef): undefined reference to `dump_emit'

Fixes: 5d2acfc7b9 ("kconfig: make allnoconfig disable options behind EMBEDDED and EXPERT")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2016-01-10 21:49:49 +01:00
Mickaël Salaün c50b4659e4 um: Add seccomp support
This brings SECCOMP_MODE_STRICT and SECCOMP_MODE_FILTER support through
prctl(2) and seccomp(2) to User-mode Linux for i386 and x86_64
subarchitectures.

secure_computing() is called first in handle_syscall() so that the
syscall emulation will be aborted quickly if matching a seccomp rule.

This is inspired from Meredydd Luff's patch
(https://gerrit.chromium.org/gerrit/21425).

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Meredydd Luff <meredydd@senatehouse.org>
Cc: David Drysdale <drysdale@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Kees Cook <keescook@chromium.org>
2016-01-10 21:49:49 +01:00
Mickaël Salaün d8f8b84456 um: Add full asm/syscall.h support
Add subarchitecture-independent implementation of asm-generic/syscall.h
allowing access to user system call parameters and results:
* syscall_get_nr()
* syscall_rollback()
* syscall_get_error()
* syscall_get_return_value()
* syscall_set_return_value()
* syscall_get_arguments()
* syscall_set_arguments()
* syscall_get_arch() provided by arch/x86/um/asm/syscall.h

This provides the necessary syscall helpers needed by
HAVE_ARCH_SECCOMP_FILTER plus syscall_get_error().

This is inspired from Meredydd Luff's patch
(https://gerrit.chromium.org/gerrit/21425).

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Meredydd Luff <meredydd@senatehouse.org>
Cc: David Drysdale <drysdale@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Kees Cook <keescook@chromium.org>
2016-01-10 21:49:49 +01:00
Mickaël Salaün 4a0b880704 selftests/seccomp: Remove the need for HAVE_ARCH_TRACEHOOK
Some architectures do not implement PTRACE_GETREGSET nor
PTRACE_SETREGSET (required by HAVE_ARCH_TRACEHOOK) but only implement
PTRACE_GETREGS and PTRACE_SETREGS (e.g. User-mode Linux).

This improve seccomp selftest portability for architectures without
HAVE_ARCH_TRACEHOOK support by defining a new trigger HAVE_GETREGS. For
now, this is only enabled for i386 and x86_64 architectures. This is
required to be able to run this tests on User-mode Linux.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Meredydd Luff <meredydd@senatehouse.org>
Cc: David Drysdale <drysdale@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Kees Cook <keescook@chromium.org>
2016-01-10 21:49:49 +01:00
Mickaël Salaün e04c989eb7 um: Fix ptrace GETREGS/SETREGS bugs
This fix two related bugs:
* PTRACE_GETREGS doesn't get the right orig_ax (syscall) value
* PTRACE_SETREGS can't set the orig_ax value (erased by initial value)

Get rid of the now useless and error-prone get_syscall().

Fix inconsistent behavior in the ptrace implementation for i386 when
updating orig_eax automatically update the syscall number as well. This
is now updated in handle_syscall().

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Thomas Meyer <thomas@m3y3r.de>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Anton Ivanov <aivanov@brocade.com>
Cc: Meredydd Luff <meredydd@senatehouse.org>
Cc: David Drysdale <drysdale@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Kees Cook <keescook@chromium.org>
2016-01-10 21:49:48 +01:00
Vegard Nossum a7df4716d1 um: link with -lpthread
Similarly to commit fb1770aa78, with gcc 5
on Ubuntu and CONFIG_STATIC_LINK=y I was seeing these linker errors:

/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/librt.a(timer_create.o): In function `__timer_create_new':
(.text+0xcd): undefined reference to `pthread_once'
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/librt.a(timer_create.o): In function `__timer_create_new':
(.text+0x126): undefined reference to `pthread_attr_init'
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/librt.a(timer_create.o): In function `__timer_create_new':
(.text+0x168): undefined reference to `pthread_attr_setdetachstate'
[...]

Obviously we also need -lpthread for librt.a.

Cc: stable@vger.kernel.org # 4.4
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10 21:49:48 +01:00
Anton Ivanov 8c6157b6b3 um: Update UBD to use pread/pwrite family of functions
This decreases the number of syscalls per read/write by half.

Signed-off-by: Anton Ivanov <aivanov@brocade.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10 21:49:48 +01:00
Anton Ivanov 470a166e8c um: Do not change hard IRQ flags in soft IRQ processing
Software IRQ processing in generic architectures assumes that the
exit out of hard IRQ may have re-enabled interrupts (some
architectures may have an implicit EOI). It presumes them enabled
and toggles the flags once more just in case unless this is turned
off in the architecture specific hardirq.h by setting
__ARCH_IRQ_EXIT_IRQS_DISABLED

This patch adds this to UML where due to the way IRQs are handled
it is an optimization (it works fine without it too).

Signed-off-by: Anton Ivanov <aivanov@brocade.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10 21:49:48 +01:00
Anton Ivanov d5e3f5cbe5 um: Prevent IRQ handler reentrancy
The existing IRQ handler design in UML does not prevent reentrancy

This is mitigated by fd-enable/fd-disable semantics for the IO
portion of the UML subsystem. The timer, however, can and is
re-entered resulting in very deep stack usage and occasional
stack exhaustion.

This patch prevents this by checking if there is a timer
interrupt in-flight before processing any pending timer interrupts.

Signed-off-by: Anton Ivanov <aivanov@brocade.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10 21:49:47 +01:00
Vegard Nossum 0754fb298f uml: flush stdout before forking
I was seeing some really weird behaviour where piping UML's output
somewhere would cause output to get duplicated:

  $ ./vmlinux | head -n 40
  Checking that ptrace can change system call numbers...Core dump limits :
          soft - 0
          hard - NONE
  OK
  Checking syscall emulation patch for ptrace...Core dump limits :
          soft - 0
          hard - NONE
  OK
  Checking advanced syscall emulation patch for ptrace...Core dump limits :
          soft - 0
          hard - NONE
  OK
  Core dump limits :
          soft - 0
          hard - NONE

This is because these tests do a fork() which duplicates the non-empty
stdout buffer, then glibc flushes the duplicated buffer as each child
exits.

A simple workaround is to flush before forking.

Cc: stable@vger.kernel.org
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10 21:49:47 +01:00
Vegard Nossum 9f2dfda2f2 uml: fix hostfs mknod()
An inverted return value check in hostfs_mknod() caused the function
to return success after handling it as an error (and cleaning up).

It resulted in the following segfault when trying to bind() a named
unix socket:

  Pid: 198, comm: a.out Not tainted 4.4.0-rc4
  RIP: 0033:[<0000000061077df6>]
  RSP: 00000000daae5d60  EFLAGS: 00010202
  RAX: 0000000000000000 RBX: 000000006092a460 RCX: 00000000dfc54208
  RDX: 0000000061073ef1 RSI: 0000000000000070 RDI: 00000000e027d600
  RBP: 00000000daae5de0 R08: 00000000da980ac0 R09: 0000000000000000
  R10: 0000000000000003 R11: 00007fb1ae08f72a R12: 0000000000000000
  R13: 000000006092a460 R14: 00000000daaa97c0 R15: 00000000daaa9a88
  Kernel panic - not syncing: Kernel mode fault at addr 0x40, ip 0x61077df6
  CPU: 0 PID: 198 Comm: a.out Not tainted 4.4.0-rc4 #1
  Stack:
   e027d620 dfc54208 0000006f da981398
   61bee000 0000c1ed daae5de0 0000006e
   e027d620 dfcd4208 00000005 6092a460
  Call Trace:
   [<60dedc67>] SyS_bind+0xf7/0x110
   [<600587be>] handle_syscall+0x7e/0x80
   [<60066ad7>] userspace+0x3e7/0x4e0
   [<6006321f>] ? save_registers+0x1f/0x40
   [<6006c88e>] ? arch_prctl+0x1be/0x1f0
   [<60054985>] fork_handler+0x85/0x90

Let's also get rid of the "cosmic ray protection" while we're at it.

Fixes: e9193059b1 "hostfs: fix races in dentry_name() and inode_name()"
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10 21:49:47 +01:00
Rami Rosen 2cfa2b191e cgroup: fix a typo.
This patch fixes a typo in a comment in cgroup.c.

Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2016-01-10 12:26:45 -05:00
Sudip Mukherjee eb37bc3f85 m68k: Provide __phys_to_pfn() and __pfn_to_phys()
The defconfig build of m68k was failing with the error:
implicit declaration of function '__pfn_to_phys'

Other architectures have added <asm/memory.h>, but if we do so here then
we will also get redeclaration of some other functions. So it is better
to copy these macros into page.h.

Fixes: 0a3c3bf11240 ("x86, mm: introduce vmem_altmap to augment vmemmap_populate()")
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Reported-by: Guenter Roeck <linux@roeck-us.net> (m68knommu)
[geert: Apply to page.h instead of page_mm.h to cover nommu, reword]
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2016-01-10 13:56:40 +01:00