Commit graph

694587 commits

Author SHA1 Message Date
Florian Fainelli 43cee2d246 net: mv643xx_eth: Be drop monitor friendly
txq_reclaim() does the normal transmit queue reclamation and
rxq_deinit() does the RX ring cleanup, none of these are packet drops,
so use dev_consume_skb() for both locations.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 21:27:09 -07:00
Florian Fainelli 1e9d8e7ad3 tg3: Be drop monitor friendly
tg3_tx() does the normal packet TX completion,
tigon3_dma_hwbug_workaround() and tg3_tso_bug() both need to allocate a
new SKB that is suitable to workaround HW bugs, and finally
tg3_free_rings() is doing ring cleanup. Use dev_consume_skb_any() for
these 3 locations to be SKB drop monitor friendly.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 18:52:32 -07:00
David S. Miller 45c7ec9dd3 Merge branch 'ipv6-Route-ICMPv6-errors-with-the-flow-when-ECMP-in-use'
Jakub Sitnicki says:

====================
ipv6: Route ICMPv6 errors with the flow when ECMP in use

This patch set is another take at making Path MTU Discovery work when
server nodes are behind a router employing multipath routing in a
load-balance or anycast setup (that is, when not every end-node can be
reached by every path). The problem has been well described in RFC 7690
[1], but in short - in such setups ICMPv6 PTB errors are not guaranteed
to be routed back to the server node that sent a reply that exceeds path
MTU.

The proposed solution is two-fold:

 (1) on the server side - reflect the Flow Label [2]. This can be done
     without modifying the application using a new per-netns sysctl knob
     that has been proposed independently of this patchset in the patch
     entitled "ipv6: Add sysctl for per namespace flow label
     reflection" [3].

 (2) on the ECMP router - make the ipv6 routing subsystem look into the
     ICMPv6 error packets and compute the flow-hash from its payload,
     i.e. the offending packet that triggered the error. This is the
     same behavior as ipv4 stack has already.

With both parts in place Path MTU Discovery can work past the ECMP
router when using IPv6.

[1] https://tools.ietf.org/html/rfc7690
[2] https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
[3] http://patchwork.ozlabs.org/patch/804870/

v1 -> v2:
 - don't use "extern" in external function declaration in header file
 - style change, put as many arguments as possible on the first line of
   a function call, and align consecutive lines to the first argument
 - expand the cover letter based on the feedback

v2 -> v3:
 - switch to computing flow-hash using flow dissector to align with
   recent changes to multipath routing in ipv4 stack
 - add a sysctl knob for enabling flow label reflection per netns

---

Testing has covered multipath routing of ICMPv6 PTB errors in forward
and local output path in a simple use-case of an HTTP server sending a
reply which is over the path MTU size [3]. I have also checked if the
flows get evenly spread over multiple paths (i.e. if there are no
regressions) [4].

[3] https://github.com/jsitnicki/tools/tree/master/net/tests/ecmp/pmtud
[4] https://github.com/jsitnicki/tools/tree/master/net/tests/ecmp/load-balance
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 18:21:17 -07:00
Jakub Sitnicki b673d6ccea ipv6: Use multipath hash from flow info if available
Allow our callers to influence the choice of ECMP link by honoring the
hash passed together with the flow info. This allows for special
treatment of ICMP errors which we would like to route over the same path
as the IPv6 datagram that triggered the error.

Also go through rt6_multipath_hash(), in the usual case when we aren't
dealing with an ICMP error, so that there is one central place where
multipath hash is computed.

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 18:21:17 -07:00
Jakub Sitnicki 956b45318a ipv6: Fold rt6_info_hash_nhsfn() into its only caller
Commit 644d0e6569 ("ipv6 Use get_hash_from_flowi6 for rt6 hash") has
turned rt6_info_hash_nhsfn() into a one-liner, so it no longer makes
sense to keep it around. Also remove the accompanying comment that has
become outdated.

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 18:21:17 -07:00
Jakub Sitnicki 23aebdacb0 ipv6: Compute multipath hash for ICMP errors from offending packet
When forwarding or sending out an ICMPv6 error, look at the embedded
packet that triggered the error and compute a flow hash over its
headers.

This let's us route the ICMP error together with the flow it belongs to
when multipath (ECMP) routing is in use, which in turn makes Path MTU
Discovery work in ECMP load-balanced or anycast setups (RFC 7690).

Granted, end-hosts behind the ECMP router (aka servers) need to reflect
the IPv6 Flow Label for PMTUD to work.

The code is organized to be in parallel with ipv4 stack:

  ip_multipath_l3_keys -> ip6_multipath_l3_keys
  fib_multipath_hash   -> rt6_multipath_hash

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 18:21:17 -07:00
Jakub Sitnicki 2982571712 net: Extend struct flowi6 with multipath hash
Allow for functions that fill out the IPv6 flow info to also pass a hash
computed over the skb contents. The hash value will drive the multipath
routing decisions.

This is intended for special treatment of ICMPv6 errors, where we would
like to make a routing decision based on the flow identifying the
offending IPv6 datagram that triggered the error, rather than the flow
of the ICMP error itself.

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 18:21:17 -07:00
David S. Miller 790c605668 devlink: Fix devlink_dpipe_table_register() stub signature.
One too many arguments compared to the non-stub version.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: ffd3cdccf2 ("devlink: Add support for dynamic table size")
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 18:10:46 -07:00
Jakub Sitnicki 22b6722bfa ipv6: Add sysctl for per namespace flow label reflection
Reflecting IPv6 Flow Label at server nodes is useful in environments
that employ multipath routing to load balance the requests. As "IPv6
Flow Label Reflection" standard draft [1] points out - ICMPv6 PTB error
messages generated in response to a downstream packets from the server
can be routed by a load balancer back to the original server without
looking at transport headers, if the server applies the flow label
reflection. This enables the Path MTU Discovery past the ECMP router in
load-balance or anycast environments where each server node is reachable
by only one path.

Introduce a sysctl to enable flow label reflection per net namespace for
all newly created sockets. Same could be earlier achieved only per
socket by setting the IPV6_FL_F_REFLECT flag for the IPV6_FLOWLABEL_MGR
socket option.

[1] https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 18:05:43 -07:00
Bhumika Goyal 39a7e58924 net/mlx5e: make mlx5e_profile const
Make this const as it is only passed as an argument to the function
mlx5e_create_netdev and the corresponding argument is of type const.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 12:33:31 -07:00
Bhumika Goyal 3f2c5fb2d8 net/mlx4_core: make mlx4_profile const
Make these const as they are only used in a copy operation.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 12:33:31 -07:00
David S. Miller e7d12ce121 Merge branch 'xdp-more-work-on-xdp-tracepoints'
Jesper Dangaard Brouer says:

====================
xdp: more work on xdp tracepoints

More work on streamlining and performance optimizing the tracepoints
for XDP.

I've created a simple xdp_monitor application that uses this
tracepoint, and prints statistics. Available at github:

https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_monitor_kern.c
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_monitor_user.c

The improvement over tracepoint with strcpy: 9810372 - 8428762 = +1381610 pps faster
 - (1/9810372 - 1/8428762)*10^9 = -16.7 nanosec
 - 100-(8428762/9810372*100) = strcpy-trace is 14.08% slower
 - 981037/8428762*100 = removing strcpy made it 11.64% faster

V3: Fix merge conflict with commit e4a8e817d3 ("bpf: misc xdp redirect cleanups")
V2: Change trace_xdp_redirect() to align with args of trace_xdp_exception()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 11:59:37 -07:00
Jesper Dangaard Brouer 315ec3990e xdp: get tracepoints xdp_exception and xdp_redirect in sync
Remove the net_device string name from the xdp_exception tracepoint,
like the xdp_redirect tracepoint.

Align the TP_STRUCT to have common entries between these two
tracepoint.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 11:59:37 -07:00
Jesper Dangaard Brouer a873585587 xdp: remove net_device names from xdp_redirect tracepoint
There is too much overhead in the current trace_xdp_redirect
tracepoint as it does strcpy and strlen on the net_device names.

Besides, exposing the ifindex/index is actually the information that
is needed in the tracepoint to diagnose issues.  When a lookup fails
(either ifindex or devmap index) then there is a need for saying which
to_index that have issues.

V2: Adjust args to be aligned with trace_xdp_exception.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 11:59:37 -07:00
Jesper Dangaard Brouer 2886447dc5 ixgbe: use return codes from ndo_xdp_xmit that are distinguishable
For XDP_REDIRECT the use of return code -EINVAL is confusing, as it is
used in three different cases.  (1) When the index or ifindex lookup
fails, and in the ixgbe driver (2) when link is down and (3) when XDP
have not been enabled.

The return code can be picked up by the tracepoint xdp:xdp_redirect
for diagnosing why XDP_REDIRECT isn't working.  Thus, there is a need
different return codes to tell the issues apart.

I'm considering using a specific err-code scheme for XDP_REDIRECT
instead of using these errno codes.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 11:59:37 -07:00
Jesper Dangaard Brouer 2facaad600 xdp: make generic xdp redirect use tracepoint trace_xdp_redirect
If the xdp_do_generic_redirect() call fails, it trigger the
trace_xdp_exception tracepoint.  It seems better to use the same
tracepoint trace_xdp_redirect, as the native xdp_do_redirect{,_map} does.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 11:59:36 -07:00
Jesper Dangaard Brouer d08adb82fd xdp: remove bpf_warn_invalid_xdp_redirect
Given there is a tracepoint that can track the error code
of xdp_do_redirect calls, the WARN_ONCE in bpf_warn_invalid_xdp_redirect
doesn't seem relevant any longer.  Simply remove the function.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 11:59:36 -07:00
David S. Miller fb3bbbda5f Merge branch 'mlxsw-ipv4-host-dpipe-table'
Jiri Pirko says:

====================
mlxsw: Add IPv4 host dpipe table

Arkadi says:

This patchset adds IPv4 host dpipe table support. This will provide the
ability to observe the hardware offloaded IPv4 neighbors.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:17 -07:00
Arkadi Sharshevsky a481d71323 mlxsw: spectrum_dpipe: Add support for controlling neighbor counters
Add support for controlling neighbor counters via dpipe.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Arkadi Sharshevsky a86f030915 mlxsw: spectrum_dpipe: Add support for IPv4 host table dump
Add support for IPv4 host table dump.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Arkadi Sharshevsky 7cfcbc7591 mlxsw: spectrum_router: Add support for setting counters on neighbors
Add support for setting counters on neighbors based on dpipe's host table
counter status. This patch also adds the ability for getting the counter
value, which will be used by the dpipe host table implementation in the
next patches.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Arkadi Sharshevsky 6bba7e20da mlxsw: reg: Make flow counter set type enum to be shared
This is done as a preparation before introducing support for neighbor
counters. The flow counter's type enum is used by many registers, yet,
until now it was used only by mgpc and thus it was private. This patch
updates the namespace for more generic usage.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Arkadi Sharshevsky 6aecb36bc0 mlxsw: spectrum_dpipe: Add IPv4 host table initial support
Add IPv4 host table initial support.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Arkadi Sharshevsky 7e57ae9fc5 mlxsw: spectrum_dpipe: Fix label name
Change label name for case of erif table init failure.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Arkadi Sharshevsky f17cc84d1c mlxsw: spectrum_router: Add helpers for neighbor access
This is done as a preparation before introducing the ability to dump the
host table via dpipe, and to count the table size. The mlxsw's neighbor
representative struct stays private to the router module.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Arkadi Sharshevsky 3580732448 devlink: Move dpipe entry clear function into devlink
The entry clear routine can be shared between the drivers, thus it is
moved inside devlink.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Arkadi Sharshevsky ffd3cdccf2 devlink: Add support for dynamic table size
Up until now the dpipe table's size was static and known at registration
time. The host table does not have constant size and it is resized in
dynamic manner. In order to support this behavior the size is changed
to be obtained dynamically via an op.

This patch also adjust the current dpipe table for the new API.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Arkadi Sharshevsky 23ca5ec3af mlxsw: spectrum_dpipe: Fix erif table op name space
Fix ERIF's table operations name space.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Arkadi Sharshevsky 3fb886ecea devlink: Add IPv4 header for dpipe
This will be used by the IPv4 host table which will be introduced in the
following patches. This header is global and can be reused by many
drivers.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Arkadi Sharshevsky 1177009131 devlink: Add Ethernet header for dpipe
This will be used by the IPv4 host table which will be introduced in the
following patches. This header is global and can be reused by many
drivers.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:15 -07:00
Matan Barak 4c03e69ab1 net/mlx5: Add tracepoints
Add a tracepoint infrastructure for mlx5_core driver.
Implemented flow steering tracepoints:
1. Add flow group
2. Remove flow group
3. Add flow table entry
4. Remove flow table entry
5. Add flow table rule
6. Remove flow table rule

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-24 16:02:58 +03:00
Matan Barak 693c6883bb net/mlx5: Add hash table for flow groups in flow table
When adding a flow table entry (fte) to a flow table (ft), we first
need to find its flow group (fg). Currently, this is done by
traversing a linear list of all flow groups in the flow table.
Furthermore, since multiple flow groups which correspond to the same
fte mask may exist in the same ft, we can't just stop at the first
match. Converting the linear list to rhltable in order to speed things
up.

The last four patches increases the steering rules update rate by a
factor of more than 7 (for insertion of 50K steering rules).

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-24 16:02:58 +03:00
Matan Barak 0d235c3fab net/mlx5: Add hash table to search FTEs in a flow-group
When adding a flow table entry (fte) to a flow group (fg), we first
need to check whether this fte exist. In such a case we just merge
the destinations (if possible). Currently, this is done by traversing
the fte list available in a fg. This could take a lot of time when
using large flow groups. Speeding this up by using rhashtable, which
is much faster.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-24 16:02:58 +03:00
Matan Barak 667cb65ae5 net/mlx5: Don't store reserved part in FTEs and FGs
The current code stores fte_match_param in the software representation
of FTEs and FGs. fte_match_param contains a large reserved area at the
bottom of the struct. Since downstream patches are going to hash this
part, we would like to avoid doing so on a reserved part.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-24 16:02:58 +03:00
Matan Barak 8ebabaa02f net/mlx5: Convert linear search for free index to ida
When allocating a flow table entry, we need to allocate a free index
in the flow group. Currently, this is done by traversing the existing
flow table entries in the flow group, until a free index is found.
Replacing this by using a ida, which allows us to find a free index
much faster.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-24 16:02:58 +03:00
Gal Pressman 0980030550 net/mlx5e: Fix wrong code indentation in conditional statement
Fix the following checkpatch warning in en_ethtool.c:
WARNING: suspect code indent for conditional statements (8, 9)
+	for (i = 0; i < NUM_PCIE_PERF_STALL_COUNTERS(priv); i++)
+	 strcpy(data + (idx++) * ETH_GSTRING_LEN,

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-24 16:02:58 +03:00
Gal Pressman 07533c6765 net/mlx5: Remove a leftover unused variable
mlx5_core_wq is no longer being used and should be removed
from the code.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-24 16:02:58 +03:00
Saeed Mahameed 3c0045837d net/mlx5: Add a blank line after declarations V2
The blank line should be after u32 val = ...
and not after __be32 __iomem *addr = ...

Fixes: ad5b39a95c ("net/mlx5: Add a blank line after declarations")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reported-by: Joe Perches <joe@perches.com>
2017-08-24 16:02:57 +03:00
Daniel Borkmann a5e2da6e97 bpf: netdev is never null in __dev_map_flush
No need to test for it in fast-path, every dev in bpf_dtab_netdev
is guaranteed to be non-NULL, otherwise dev_map_update_elem() will
fail in the first place.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:43:40 -07:00
Shubham Bansal d2aaa3dc41 bpf, doc: Add arm32 as arch supporting eBPF JIT
As eBPF JIT support for arm32 was added recently with
commit 39c13c204b, it seems appropriate to
add arm32 as arch with support for eBPF JIT in bpf and sysctl docs as well.

Signed-off-by: Shubham Bansal <illusionist.neo@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:40:12 -07:00
David S. Miller 81152518e9 Merge branch 'bpf-verifier-fixes'
Edward Cree says:

====================
bpf: verifier fixes

Fix a couple of bugs introduced in my recent verifier patches.
Patch #2 does slightly increase the insn count on bpf_lxc.o, but only by
 about a hundred insns (i.e. 0.2%).

v2: added test for write-marks bug (patch #1); reworded comment on
 propagate_liveness() for clarity.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:38:08 -07:00
Edward Cree 8e9cd9ce90 bpf/verifier: document liveness analysis
The liveness tracking algorithm is quite subtle; add comments to explain it.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:38:08 -07:00
Edward Cree 1b688a19a9 bpf/verifier: remove varlen_map_value_access flag
The optimisation it does is broken when the 'new' register value has a
 variable offset and the 'old' was constant.  I broke it with my pointer
 types unification (see Fixes tag below), before which the 'new' value
 would have type PTR_TO_MAP_VALUE_ADJ and would thus not compare equal;
 other changes in that patch mean that its original behaviour (ignore
 min/max values) cannot be restored.
Tests on a sample set of cilium programs show no change in count of
 processed instructions.

Fixes: f1174f77b5 ("bpf/verifier: rework value tracking")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:38:08 -07:00
Alexei Starovoitov df20cb7ec1 selftests/bpf: add a test for a pruning bug in the verifier
The test makes a read through a map value pointer, then considers pruning
 a branch where the register holds an adjusted map value pointer.  It
 should not prune, but currently it does.

Signed-off-by: Alexei Starovoitov <ast@fb.com>
[ecree@solarflare.com: added test-name and patch description]
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:38:07 -07:00
Edward Cree 63f45f8406 bpf/verifier: when pruning a branch, ignore its write marks
The fact that writes occurred in reaching the continuation state does
 not screen off its reads from us, because we're not really its parent.
So detect 'not really the parent' in do_propagate_liveness, and ignore
 write marks in that case.

Fixes: dc503a8ad9 ("bpf/verifier: track liveness for pruning")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:38:07 -07:00
Edward Cree d893dc26e3 selftests/bpf: add a test for a bug in liveness-based pruning
Writes in straight-line code should not prevent reads from propagating
 along jumps.  With current verifier code, the jump from 3 to 5 does not
 add a read mark on 3:R0 (because 5:R0 has a write mark), meaning that
 the jump from 1 to 3 gets pruned as safe even though R0 is NOT_INIT.

Verifier output:
0: (61) r2 = *(u32 *)(r1 +0)
1: (35) if r2 >= 0x0 goto pc+1
 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
2: (b7) r0 = 0
3: (35) if r2 >= 0x0 goto pc+1
 R0=inv0 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
4: (b7) r0 = 0
5: (95) exit

from 3 to 5: safe

from 1 to 3: safe
processed 8 insns, stack depth 0

Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:38:07 -07:00
Colin Ian King 60890e0460 gre: remove duplicated assignment of iph
iph is being assigned the same value twice; remove the redundant
first assignment. (Thanks to Nikolay Aleksandrov for pointing out
that the first asssignment should be removed and not the second)

Fixes warning:
net/ipv4/ip_gre.c:265:2: warning: Value stored to 'iph' is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:36:03 -07:00
Arvind Yadav 042a90106b net: tipc: constify genl_ops
genl_ops are not supposed to change at runtime. All functions
working with genl_ops provided by <net/genetlink.h> work with
const genl_ops. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:31:38 -07:00
Colin Ian King 5719e5eb31 net: hinic: make functions set_ctrl0 and set_ctrl1 static
The functions set_ctrl0 and set_ctrl1 are local to the source and do
not need to be in global scope, so make them static.

Cleans up sparse warnings:
symbol 'set_ctrl0' was not declared. Should it be static?
symbol 'set_ctrl1' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:20:28 -07:00
Paolo Abeni 257a73031d net/sock: allow the user to set negative peek offset
This is necessary to allow the user to disable peeking with
offset once it's enabled.
Unix sockets already allow the above, with this patch we
permit it for udp[6] sockets, too.

Fixes: 627d2d6b55 ("udp: enable MSG_PEEK at non-zero offset")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:18:44 -07:00