Commit graph

483661 commits

Author SHA1 Message Date
Eli Cohen f66f049fb7 net/mlx5_core: Request the mlx5 IB module on driver load
Call request module on mlx5_ib so it will be available for applications
requiring it, such as installers that require boot over IB.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:45:54 -05:00
David S. Miller 4945c1d644 Merge branch 'r8169-next'
Chunhao Lin says:

====================
r8169:change hardware setting

This patch series contains two hardware setting modification to prevent
hardware become abnormal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:43:31 -05:00
Chun-Hao Lin 003609da5e r8169:disable rtl8168ep cmac engine
Cmac engine is the bridge between driver and dash firmware.
Other os may not disable cmac when leave. And r8169 did not allocate any
resources for cmac engine. Disable it to prevent abnormal system behavior.

Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:43:26 -05:00
Chun-Hao Lin d6e572911a r8169:prevent enable hardware tx/rx too early
For RTL8168G/GU/H/EP and RTL8411B remove enable tx/rx from its own hw_start
function. This will prevent enable tx/rx before complete hardware tx/rx
setting.

Tx/Rx will be enabled in the end of function rtl_hw_start_8168.

Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:43:25 -05:00
David S. Miller 66813d4d04 Merge branch 'tipc-next'
Ying Xue says:

====================
tipc: convert name table read-write lock to RCU

Now TIPC name table is statically allocated and is protected with a
Read-Write lock. To enhance the performance of TIPC name table lookup,
we are going to involve RCU lock to protect the name table. As a
consequence, it becomes lockless to concurrently look up name table on
read side. However, before the conversion can be successfully made,
the following two things must be first done:

- change allocation way of name table from static to dynamic
- fix several incorrect locking policy issues
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:40:38 -05:00
Ying Xue 97ede29e80 tipc: convert name table read-write lock to RCU
Convert tipc name table read-write lock to RCU. After this change,
a new spin lock is used to protect name table on write side while
RCU is applied on read side.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:39:57 -05:00
Ying Xue 834caafa3e tipc: remove unnecessary INIT_LIST_HEAD
When a list_head variable is seen as a new entry to be added to a
list head, it's unnecessary to be initialized with INIT_LIST_HEAD().

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:39:57 -05:00
Ying Xue 5492390a94 tipc: simplify relationship between name table lock and node lock
When tipc name sequence is published, name table lock is released
before name sequence buffer is delivered to remote nodes through its
underlying unicast links. However, when name sequence is withdrawn,
the name table lock is held until the transmission of the removal
message of name sequence is finished. During the process, node lock
is nested in name table lock. To prevent node lock from being nested
in name table lock, while withdrawing name, we should adopt the same
locking policy of publishing name sequence: name table lock should
be released before message is sent.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:39:57 -05:00
Ying Xue 3493d25cfb tipc: any name table member must be protected under name table lock
As tipc_nametbl_lock is used to protect name_table structure, the lock
must be held while all members of name_table structure are accessed.
However, the lock is not obtained while a member of name_table
structure - local_publ_count is read in tipc_nametbl_publish(), as
a consequence, an inconsistent value of local_publ_count might be got.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:39:57 -05:00
Ying Xue fb9962f3ce tipc: ensure all name sequences are properly protected with its lock
TIPC internally created a name table which is used to store name
sequences. Now there is a read-write lock - tipc_nametbl_lock to
protect the table, and each name sequence saved in the table is
protected with its private lock. When a name sequence is inserted
or removed to or from the table, its members might need to change.
Therefore, in normal case, the two locks must be held while TIPC
operates the table. However, there are still several places where
we only hold tipc_nametbl_lock without proprerly obtaining name
sequence lock, which might cause the corruption of name sequence.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:39:56 -05:00
Ying Xue 38622f4195 tipc: ensure all name sequences are released when name table is stopped
As TIPC subscriber server is terminated before name table, no user
depends on subscription list of name sequence when name table is
stopped. Therefore, all name sequences stored in name table should
be released whatever their subscriptions lists are empty or not,
otherwise, memory leak might happen.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:39:56 -05:00
Ying Xue 993bfe5daf tipc: make name table allocated dynamically
Name table locking policy is going to be adjusted from read-write
lock protection to RCU lock protection in the future commits. But
its essential precondition is to convert the allocation way of name
table from static to dynamic mode.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:39:56 -05:00
Ying Xue 1b61e70ad1 tipc: remove size variable from publ_list struct
The size variable is introduced in publ_list struct to help us exactly
calculate SKB buffer sizes needed by publications when all publications
in name table are delivered in bulk in named_distribute(). But if
publication SKB buffer size is assumed to MTU, the size variable in
publ_list struct can be completely eliminated at the cost of wasting
a bit memory space for last SKB.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Tero Aho <tero.aho@coriant.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:39:55 -05:00
Joe Perches 60c04aecd8 udp: Neaten and reduce size of compute_score functions
The compute_score functions are a bit difficult to read.

Neaten them a bit to reduce object sizes and make them a
bit more intelligible.

Return early to avoid indentation and avoid unnecessary
initializations.

(allyesconfig, but w/ -O2 and no profiling)

$ size net/ipv[46]/udp.o.*
   text    data     bss     dec     hex filename
  28680    1184      25   29889    74c1 net/ipv4/udp.o.new
  28756    1184      25   29965    750d net/ipv4/udp.o.old
  17600    1010       2   18612    48b4 net/ipv6/udp.o.new
  17632    1010       2   18644    48d4 net/ipv6/udp.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:28:47 -05:00
Petri Gynther b0ba512e25 net: bcmgenet: enable driver to work without a device tree
Modify bcmgenet driver so that it can be used on Broadcom 7xxx
MIPS-based STB platforms without a device tree.

Signed-off-by: Petri Gynther <pgynther@google.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:26:59 -05:00
Haiyang Zhang c3582a2c4d hyperv: Add support for vNIC hot removal
This patch adds proper handling of the vNIC hot removal event, which includes
a rescind-channel-offer message from the host side that triggers vNIC close and
removal. In this case, the notices to the host during close and removal is not
necessary because the channel is rescinded. This patch blocks these unnecessary
messages, and lets vNIC removal process complete normally.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:24:11 -05:00
Denis Kirjanov 6867b17b26 test: bpf: expand DIV_KX to DIV_MOD_KX
Expand DIV_KX to use BPF_MOD operation in the
DIV_KX bpf 'classic' test.

CC: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:23:22 -05:00
David S. Miller aae68bc6f6 Merge branch 'tstamp-next'
Willem de Bruijn says:

====================
timestamping updates

The main goal for this patchset is to allow correlating timestamps
with the egress interface. Also introduce a warning, as discussed
previously, and update the tests to verify the new feature.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:20:55 -05:00
Willem de Bruijn cbd3aad5ce net-timestamp: expand documentation and test
Documentation:
  expand explanation of timestamp counter

Test:
  new: flag -I requests and prints PKTINFO
  new: flag -x prints payload (possibly truncated)
  fix: remove pretty print that breaks common flag '-l 1'

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:20:48 -05:00
Willem de Bruijn 829ae9d611 net-timestamp: allow reading recv cmsg on errqueue with origin tstamp
Allow reading of timestamps and cmsg at the same time on all relevant
socket families. One use is to correlate timestamps with egress
device, by asking for cmsg IP_PKTINFO.

on AF_INET sockets, call the relevant function (ip_cmsg_recv). To
avoid changing legacy expectations, only do so if the caller sets a
new timestamping flag SOF_TIMESTAMPING_OPT_CMSG.

on AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already
returned for all origins. only change is to set ifindex, which is
not initialized for all error origins.

In both cases, only generate the pktinfo message if an ifindex is
known. This is not the case for ACK timestamps.

The difference between the protocol families is probably a historical
accident as a result of the different conditions for generating cmsg
in the relevant ip(v6)_recv_error function:

ipv4:        if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) {
ipv6:        if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) {

At one time, this was the same test bar for the ICMP/ICMP6
distinction. This is no longer true.

Signed-off-by: Willem de Bruijn <willemb@google.com>

----

Changes
  v1 -> v2
    large rewrite
    - integrate with existing pktinfo cmsg generation code
    - on ipv4: only send with new flag, to maintain legacy behavior
    - on ipv6: send at most a single pktinfo cmsg
    - on ipv6: initialize fields if not yet initialized

The recv cmsg interfaces are also relevant to the discussion of
whether looping packet headers is problematic. For v6, cmsgs that
identify many headers are already returned. This patch expands
that to v4. If it sounds reasonable, I will follow with patches

1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY
   (http://patchwork.ozlabs.org/patch/366967/)
2. sysctl to conditionally drop all timestamps that have payload or
   cmsg from users without CAP_NET_RAW.
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:20:48 -05:00
Willem de Bruijn 7ce875e5ec ipv4: warn once on passing AF_INET6 socket to ip_recv_error
One line change, in response to catching an occurrence of this bug.
See also fix f4713a3dfa ("net-timestamp: make tcp_recvmsg call ...")

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:20:48 -05:00
Joe Perches e6c97234d1 i40e: Reduce stack in i40e_dbg_dump_desc
Reduce stack use by using kmemdup and not using a very
large struct on stack.

In function ‘i40e_dbg_dump_desc’:
warning: the frame size of 8192 bytes is larger than 2048 bytes [-Wframe-larger-than=]

Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-06 05:00:01 -08:00
Catherine Sullivan a36fdd8e3e i40e: Bump i40e version to 1.2.2 and i40evf version to 1.0.6
Bump version.

Change-ID: I4264e81dcfb57ec46a3ede54b0a6cb25b497d3cb
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-06 04:53:03 -08:00
Shannon Nelson 5fb11d7610 i40e: get pf_id from HW rather than PCI function
Getting the pf_id from the function number was a good place to start,
but when the PF was setup in passthru mode, the PCI bus/device/function
was virtualized and the number in the VM is different from the number in
the bare metal.  This caused HW configuration issues when the wrong pf_id
was used to set up the HMC and other structures.  The PF_FUNC_RID register
has the real bus/device/function information as configured by the BIOS,
so use that for a better number.  This works in NPAR mode as well.

Change-ID: I65e3dd6c97594890c2bad566b83cc670b1dae534
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Acked-by: Kevin Scott <kevin.c.scott@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-06 04:46:45 -08:00
Mitch Williams baf7327735 i40e: increase ARQ size
The ARQ needs to have at least as many entries as VFs, or the VFs will
get errors from the FW when they send messages to the PF. Since we don't
know how many VFs we'll end up with, just set up 128 descriptors.

Change-ID: I04ae3d1c7faf09110eb782214e9c05aeb62a6c59
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-06 04:39:30 -08:00
Anjali Singhai Jain b64ba08481 i40e: Re enable Main VSI loopback setting in the reset path
There is an order in which this should happen. It turns out that FW will
not let you change the Loopback setting of the VSI with update VSI prior
to the VEB creation.

Change-ID: I7614ddff8b4c37702930c02f16f8c346aaa64bd1
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-06 04:33:04 -08:00
Anjali Singhai Jain 79c21a827e i40e: Add new update VSI flow to accommodate FW fix with VSI Loopback mode
All VSIs on a VEB should either have loopback enabled or disabled, a
mixed mode is not supported for a VEB. Since our driver supports multiple
VSIs per PF that need to talk to each other make sure to enable Loopback
for the PF and FDIR VSI as well.

Also, we now have to explicitly enable Loopback mode otherwise we fail
VSI creation for VMDq and VF VSIs.

Change-ID: Ib68c3ea4aeb730ac9468f930610de456efbe5b20
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-06 04:26:33 -08:00
Kevin Scott b9a81b2b73 i40e: Increase reset delay
Increase reset delay to ensure all internal caches are properly flushed
in worst case scenario.

Change-ID: I6f059a9e024fbf9ef1debd32497eed21369957fc
Signed-off-by: Kevin Scott <kevin.c.scott@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-06 04:20:08 -08:00
Mitch Williams 906a6937d8 i40evf: make early init sequence even more robust
When multiple VFs attempt to initialize simultaneously, the firmware may
delay or drop messages. Make the init code more adept at handling these
situations by a) reinitializing the admin queue if the firmware fails to
process a request, and b) resending a request if the PF doesn't answer.

Once the request has been sent again, the PF might end up getting both
requests and send the configuration information to the driver twice.
This will cause the VF to complain about receiving an unexpected message
from the PF. Since this is not fatal, reduce the warning level of the
log messages that are generated in response to this event.

Change-ID: I9370a1a2fde2ad3934fa25ccfd0545edfbbb4805
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-06 04:13:40 -08:00
Shannon Nelson fad177dc8b i40e: fix netdev_stat macro definition
The old xxx_NETDEV_STAT() macro was defined long before the newer
rtnl_link_stats64 came into being, and just never got updated.  Since we're
using rtnl_link_stats64 in other parts of the driver, we should use it
here as well.  We've just been lucky that the field definitions are the
same sizes.

Change-ID: I19fc71619905700235dcdf0d3c8153aec81d36de
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-06 04:07:02 -08:00
Anjali Singhai Jain e7f2e4b94c i40e: Define and use i40e_is_vf macro
This patch is useful for future expansion when new VF MAC types get
added. It helps with cleaning up VF driver flow.

Change-ID: Ibe1eeb71262a3a40f24a1c5409436bdc3411da7f
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-06 04:00:24 -08:00
Anjali Singhai Jain 09f7efabd5 i40e: Add a virtual channel op to config RSS
Add the Virtual Channel OP event opcode for CONFIG_RSS, so that the
Virtual Channel state machine can properly decipher status change events.

Change-ID: I09939c7aa380147f60c49fd01ef2e27d0dc1c299
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Acked-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-06 03:53:48 -08:00
Jacob Keller fe88bda9e6 i40e: don't enable PTP support on more than one PF per port
Resolve an issue related to images with multiple PFs per physical
port. We cannot fully support 1588 PTP features, since only one port
should control (ie: write) the registers at a time. Doing so can cause
interference of functionality.

It may be possible to partially implement the API for only those
features without side effects. However, this at minimum means non
controlling PFs lose Tx timestamps, frequency atunement, and possibly
SYSTIME adjustment. There may be further impact I did not discover.
Since the API in the kernel expects these features to work, it is
simpler and less dangerous to just disable PTP features on all PFs not
identified as the controlling PF in PRTTSYN_CTL0.PF_ID.

This change also removes the warning printed when hwtstaml IOCTL is
called on the wrong PF. This is actually meaningless now, since only one
PF per port will support it. In addition, the ethtool get_ts_info IOCTL
was updated so that only the controlling port will even indicate support
(so as not to confuse users).

The overall downside is complete loss of functionality on non
controlling PF, vs the possible gain of partial support. The biggest
factor for choosing this approach is simplicity and ensuring that the
main PF will work. There could easily be other portions of the 1588
logic with side effects I am not aware, and the reduced functionality
that might be made available is significantly less useful. In addition,
the API does not allow for proper indication of why particular features
are not supported. These reasons are enough to decide for the simpler
approach to resolving this issue.

Change-ID: If4696bae686fc18aef6552b67dd417213d987c16
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-06 03:47:15 -08:00
Carolyn Wyborny b2008cbf8a i40e: Add description to misc and fd interrupts
This patch adds additional text description for base pf0 and flow director
generated interrupts.  Without this patch, these interrupts are difficult
to distinguish per port on a multi-function device.

Change-ID: I4662e1b38840757765a3fe63d90219d28e76bfab
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-06 03:40:40 -08:00
Shannon Nelson fbe8210100 i40e: allow various base numbers in debugfs aq commands
Use the 'i' rather than the more restrictive 'x' or 'd' in the aq_cmd
arguments.  This makes the user interface much more forgiving and user
friendly.

Change-ID: I5dcd57b9befc047e06b74cf1152a25a3fa9e1309
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-06 03:33:59 -08:00
Shannon Nelson 038861b21b i40e: remove useless debug noise
This message really doesn't give any useful information and ends up
getting printed every service_task loop in the Linux driver, filling the
logfile with noise when AQ tracing is enabled.  This patch simply removes
the noise.

Change-ID: I30ad51e6b03c7ad12a7d9c102def0087db622df3
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-06 03:27:16 -08:00
Shannon Nelson 2352b849a4 i40e: Remove unneeded break statement
This case statement is empty and the fall through just breaks out
so remove the break and let it fall through to break out.

Change-ID: I1b5ba9870d5245ca80bfca6e7f5f089e2eb8ccb0
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-12-06 03:20:43 -08:00
David S. Miller 8d0c469753 Merge branch 'ebpf-next'
Alexei Starovoitov says:

====================
allow eBPF programs to be attached to sockets

V1->V2:

fixed comments in sample code to state clearly that packet data is accessed
with LD_ABS instructions and not internal skb fields.
Also replaced constants in:
BPF_LD_ABS(BPF_B, 14 + 9 /* R0 = ip->proto */),
with:
BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol) /* R0 = ip->proto */),

V1 cover:

Introduce BPF_PROG_TYPE_SOCKET_FILTER type of eBPF programs that can be
attached to sockets with setsockopt().
Allow such programs to access maps via lookup/update/delete helpers.

This feature was previewed by bpf manpage in commit b4fc1a460f30("Merge branch 'bpf-next'")
Now it can actually run.

1st patch adds LD_ABS/LD_IND instruction verification and
2nd patch adds new setsockopt() flag.
Patches 3-6 are examples in assembler and in C.

Though native eBPF programs are way more powerful than classic filters
(attachable through similar setsockopt() call), they don't have skb field
accessors yet. Like skb->pkt_type, skb->dev->ifindex are not accessible.
There are sevaral ways to achieve that. That will be in the next set of patches.
So in this set native eBPF programs can only read data from packet and
access maps.

The most powerful example is sockex2_kern.c from patch 6 where ~200 lines of C
are compiled into ~300 of eBPF instructions.
It shows how quite complex packet parsing can be done.

LLVM used to build examples is at https://github.com/iovisor/llvm
which is fork of llvm trunk that I'm cleaning up for upstreaming.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:47:48 -08:00
Alexei Starovoitov fbe3310840 samples: bpf: large eBPF program in C
sockex2_kern.c is purposefully large eBPF program in C.
llvm compiles ~200 lines of C code into ~300 eBPF instructions.

It's similar to __skb_flow_dissect() to demonstrate that complex packet parsing
can be done by eBPF.
Then it uses (struct flow_keys)->dst IP address (or hash of ipv6 dst) to keep
stats of number of packets per IP.
User space loads eBPF program, attaches it to loopback interface and prints
dest_ip->#packets stats every second.

Usage:
$sudo samples/bpf/sockex2
ip 127.0.0.1 count 19
ip 127.0.0.1 count 178115
ip 127.0.0.1 count 369437
ip 127.0.0.1 count 559841
ip 127.0.0.1 count 750539

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:47:34 -08:00
Alexei Starovoitov a80857822b samples: bpf: trivial eBPF program in C
this example does the same task as previous socket example
in assembler, but this one does it in C.

eBPF program in kernel does:
    /* assume that packet is IPv4, load one byte of IP->proto */
    int index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
    long *value;

    value = bpf_map_lookup_elem(&my_map, &index);
    if (value)
        __sync_fetch_and_add(value, 1);

Corresponding user space reads map[tcp], map[udp], map[icmp]
and prints protocol stats every second

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:47:33 -08:00
Alexei Starovoitov 249b812d80 samples: bpf: elf_bpf file loader
simple .o parser and loader using BPF syscall.
.o is a standard ELF generated by LLVM backend

It parses elf file compiled by llvm .c->.o
- parses 'maps' section and creates maps via BPF syscall
- parses 'license' section and passes it to syscall
- parses elf relocations for BPF maps and adjusts BPF_LD_IMM64 insns
  by storing map_fd into insn->imm and marking such insns as BPF_PSEUDO_MAP_FD
- loads eBPF programs via BPF syscall

One ELF file can contain multiple BPF programs.

int load_bpf_file(char *path);
populates prog_fd[] and map_fd[] with FDs received from bpf syscall

bpf_helpers.h - helper functions available to eBPF programs written in C

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:47:33 -08:00
Alexei Starovoitov 03f4723ed7 samples: bpf: example of stateful socket filtering
this socket filter example does:
- creates arraymap in kernel with key 4 bytes and value 8 bytes

- loads eBPF program which assumes that packet is IPv4 and loads one byte of
  IP->proto from the packet and uses it as a key in a map

  r0 = skb->data[ETH_HLEN + offsetof(struct iphdr, protocol)];
  *(u32*)(fp - 4) = r0;
  value = bpf_map_lookup_elem(map_fd, fp - 4);
  if (value)
       (*(u64*)value) += 1;

- attaches this program to raw socket

- every second user space reads map[IPPROTO_TCP], map[IPPROTO_UDP], map[IPPROTO_ICMP]
  to see how many packets of given protocol were seen on loopback interface

Usage:
$sudo samples/bpf/sock_example
TCP 0 UDP 0 ICMP 0 packets
TCP 187600 UDP 0 ICMP 4 packets
TCP 376504 UDP 0 ICMP 8 packets
TCP 563116 UDP 0 ICMP 12 packets
TCP 753144 UDP 0 ICMP 16 packets

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:47:32 -08:00
Alexei Starovoitov 89aa075832 net: sock: allow eBPF programs to be attached to sockets
introduce new setsockopt() command:

setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd))

where prog_fd was received from syscall bpf(BPF_PROG_LOAD, attr, ...)
and attr->prog_type == BPF_PROG_TYPE_SOCKET_FILTER

setsockopt() calls bpf_prog_get() which increments refcnt of the program,
so it doesn't get unloaded while socket is using the program.

The same eBPF program can be attached to multiple sockets.

User task exit automatically closes socket which calls sk_filter_uncharge()
which decrements refcnt of eBPF program

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:47:32 -08:00
Alexei Starovoitov ddd872bc30 bpf: verifier: add checks for BPF_ABS | BPF_IND instructions
introduce program type BPF_PROG_TYPE_SOCKET_FILTER that is used
for attaching programs to sockets where ctx == skb.

add verifier checks for ABS/IND instructions which can only be seen
in socket filters, therefore the check:
  if (env->prog->aux->prog_type != BPF_PROG_TYPE_SOCKET_FILTER)
    verbose("BPF_LD_ABS|IND instructions are only allowed in socket filters\n");

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:47:32 -08:00
Jason Wang f51a5e82ea tun/macvtap: use consume_skb() instead of kfree_skb() when needed
To be more friendly with drop monitor, we should only call kfree_skb() when
the packets were dropped and use consume_skb() in other cases.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:45:09 -08:00
Markus Elfring 6db16718c9 net-PA Semi: Deletion of unnecessary checks before the function call "pci_dev_put"
The pci_dev_put() function tests whether its argument is NULL
and then returns immediately. Thus the test around the call
is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:14:20 -08:00
Markus Elfring 04901cea21 net-ipvlan: Deletion of an unnecessary check before the function call "free_percpu"
The free_percpu() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:14:20 -08:00
Markus Elfring 39af455daf net: cassini: Deletion of an unnecessary check before the function call "vfree"
The vfree() function performs also input parameter validation.
Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:14:19 -08:00
Andy Shevchenko c4b2b9a849 stmmac: pci: allocate memory resources dynamically
Instead of using global variables we are going to use dynamically allocated
memory. It allows to append a support of more than one ethernet adapter which
might have different settings simultaniously.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:03:48 -08:00
David S. Miller 244ebd9f8f Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains netfilter updates for net-next. Basically,
enhancements for xt_recent, skip zeroing of timer in conntrack, fix
linking problem with recent redirect support for nf_tables, ipset
updates and a couple of cleanups. More specifically, they are:

1) Rise maximum number per IP address to be remembered in xt_recent
   while retaining backward compatibility, from Florian Westphal.

2) Skip zeroing timer area in nf_conn objects, also from Florian.

3) Inspect IPv4 and IPv6 traffic from the bridge to allow filtering using
   using meta l4proto and transport layer header, from Alvaro Neira.

4) Fix linking problems in the new redirect support when CONFIG_IPV6=n
   and IP6_NF_IPTABLES=n.

And ipset updates from Jozsef Kadlecsik:

5) Support updating element extensions when the set is full (fixes
   netfilter bugzilla id 880).

6) Fix set match with 32-bits userspace / 64-bits kernel.

7) Indicate explicitly when /0 networks are supported in ipset.

8) Simplify cidr handling for hash:*net* types.

9) Allocate the proper size of memory when /0 networks are supported.

10) Explicitly add padding elements to hash:net,net and hash:net,port,
    because the elements must be u32 sized for the used hash function.

Jozsef is also cooking ipset RCU conversion which should land soon if
they reach the merge window in time.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 20:56:46 -08:00