1
0
Fork 0
Commit Graph

62 Commits (c593642c8be046915ca3a4a300243a68077cd207)

Author SHA1 Message Date
Pankaj Bharadiya c593642c8b treewide: Use sizeof_field() macro
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().

This patch is generated using following script:

EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do

	if [[ "$file" =~ $EXCLUDE_FILES ]]; then
		continue
	fi
	sed -i  -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net
2019-12-09 10:36:44 -08:00
Taehee Yoo ab818362c9 net: use rhashtable_lookup() instead of rhashtable_lookup_fast()
rhashtable_lookup_fast() internally calls rcu_read_lock() then,
calls rhashtable_lookup(). So if rcu_read_lock() is already held,
rhashtable_lookup() is enough.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-23 12:15:01 -08:00
Andrii Nakryiko 1e0bd5a091 bpf: Switch bpf_map ref counter to atomic64_t so bpf_map_inc() never fails
92117d8443 ("bpf: fix refcnt overflow") turned refcounting of bpf_map into
potentially failing operation, when refcount reaches BPF_MAX_REFCNT limit
(32k). Due to using 32-bit counter, it's possible in practice to overflow
refcounter and make it wrap around to 0, causing erroneous map free, while
there are still references to it, causing use-after-free problems.

But having a failing refcounting operations are problematic in some cases. One
example is mmap() interface. After establishing initial memory-mapping, user
is allowed to arbitrarily map/remap/unmap parts of mapped memory, arbitrarily
splitting it into multiple non-contiguous regions. All this happening without
any control from the users of mmap subsystem. Rather mmap subsystem sends
notifications to original creator of memory mapping through open/close
callbacks, which are optionally specified during initial memory mapping
creation. These callbacks are used to maintain accurate refcount for bpf_map
(see next patch in this series). The problem is that open() callback is not
supposed to fail, because memory-mapped resource is set up and properly
referenced. This is posing a problem for using memory-mapping with BPF maps.

One solution to this is to maintain separate refcount for just memory-mappings
and do single bpf_map_inc/bpf_map_put when it goes from/to zero, respectively.
There are similar use cases in current work on tcp-bpf, necessitating extra
counter as well. This seems like a rather unfortunate and ugly solution that
doesn't scale well to various new use cases.

Another approach to solve this is to use non-failing refcount_t type, which
uses 32-bit counter internally, but, once reaching overflow state at UINT_MAX,
stays there. This utlimately causes memory leak, but prevents use after free.

But given refcounting is not the most performance-critical operation with BPF
maps (it's not used from running BPF program code), we can also just switch to
64-bit counter that can't overflow in practice, potentially disadvantaging
32-bit platforms a tiny bit. This simplifies semantics and allows above
described scenarios to not worry about failing refcount increment operation.

In terms of struct bpf_map size, we are still good and use the same amount of
space:

BEFORE (3 cache lines, 8 bytes of padding at the end):
struct bpf_map {
	const struct bpf_map_ops  * ops __attribute__((__aligned__(64))); /*     0     8 */
	struct bpf_map *           inner_map_meta;       /*     8     8 */
	void *                     security;             /*    16     8 */
	enum bpf_map_type  map_type;                     /*    24     4 */
	u32                        key_size;             /*    28     4 */
	u32                        value_size;           /*    32     4 */
	u32                        max_entries;          /*    36     4 */
	u32                        map_flags;            /*    40     4 */
	int                        spin_lock_off;        /*    44     4 */
	u32                        id;                   /*    48     4 */
	int                        numa_node;            /*    52     4 */
	u32                        btf_key_type_id;      /*    56     4 */
	u32                        btf_value_type_id;    /*    60     4 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	struct btf *               btf;                  /*    64     8 */
	struct bpf_map_memory memory;                    /*    72    16 */
	bool                       unpriv_array;         /*    88     1 */
	bool                       frozen;               /*    89     1 */

	/* XXX 38 bytes hole, try to pack */

	/* --- cacheline 2 boundary (128 bytes) --- */
	atomic_t                   refcnt __attribute__((__aligned__(64))); /*   128     4 */
	atomic_t                   usercnt;              /*   132     4 */
	struct work_struct work;                         /*   136    32 */
	char                       name[16];             /*   168    16 */

	/* size: 192, cachelines: 3, members: 21 */
	/* sum members: 146, holes: 1, sum holes: 38 */
	/* padding: 8 */
	/* forced alignments: 2, forced holes: 1, sum forced holes: 38 */
} __attribute__((__aligned__(64)));

AFTER (same 3 cache lines, no extra padding now):
struct bpf_map {
	const struct bpf_map_ops  * ops __attribute__((__aligned__(64))); /*     0     8 */
	struct bpf_map *           inner_map_meta;       /*     8     8 */
	void *                     security;             /*    16     8 */
	enum bpf_map_type  map_type;                     /*    24     4 */
	u32                        key_size;             /*    28     4 */
	u32                        value_size;           /*    32     4 */
	u32                        max_entries;          /*    36     4 */
	u32                        map_flags;            /*    40     4 */
	int                        spin_lock_off;        /*    44     4 */
	u32                        id;                   /*    48     4 */
	int                        numa_node;            /*    52     4 */
	u32                        btf_key_type_id;      /*    56     4 */
	u32                        btf_value_type_id;    /*    60     4 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	struct btf *               btf;                  /*    64     8 */
	struct bpf_map_memory memory;                    /*    72    16 */
	bool                       unpriv_array;         /*    88     1 */
	bool                       frozen;               /*    89     1 */

	/* XXX 38 bytes hole, try to pack */

	/* --- cacheline 2 boundary (128 bytes) --- */
	atomic64_t                 refcnt __attribute__((__aligned__(64))); /*   128     8 */
	atomic64_t                 usercnt;              /*   136     8 */
	struct work_struct work;                         /*   144    32 */
	char                       name[16];             /*   176    16 */

	/* size: 192, cachelines: 3, members: 21 */
	/* sum members: 154, holes: 1, sum holes: 38 */
	/* forced alignments: 2, forced holes: 1, sum forced holes: 38 */
} __attribute__((__aligned__(64)));

This patch, while modifying all users of bpf_map_inc, also cleans up its
interface to match bpf_map_put with separate operations for bpf_map_inc and
bpf_map_inc_with_uref (to match bpf_map_put and bpf_map_put_with_uref,
respectively). Also, given there are no users of bpf_map_inc_not_zero
specifying uref=true, remove uref flag and default to uref=false internally.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20191117172806.2195367-2-andriin@fb.com
2019-11-18 11:41:59 +01:00
Jakub Kicinski f24e29099f nfp: bpf: add simple map op cache
Each get_next and lookup call requires a round trip to the device.
However, the device is capable of giving us a few entries back,
instead of just one.

In this patch we ask for a small yet reasonable number of entries
(4) on every get_next call, and on subsequent get_next/lookup calls
check this little cache for a hit. The cache is only kept for 250us,
and is invalidated on every operation which may modify the map
(e.g. delete or update call). Note that operations may be performed
simultaneously, so we have to keep track of operations in flight.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-31 00:49:05 +02:00
Jakub Kicinski bcf0cafab4 nfp: split out common control message handling code
BPF's control message handler seems like a good base to built
on for request-reply control messages.  Split it out to allow
for reuse.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 17:29:15 -07:00
Jakub Kicinski dd27c2e3d0 bpf: offload: add priv field for drivers
Currently bpf_offload_dev does not have any priv pointer, forcing
the drivers to work backwards from the netdev in program metadata.
This is not great given programs are conceptually associated with
the offload device, and it means one or two unnecessary deferences.
Add a priv pointer to bpf_offload_dev.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-12 17:07:09 +01:00
Jakub Kicinski 9a06927e77 nfp: bpf: support removing dead code
Add a verifier callback to the nfp JIT to remove the instructions
the verifier deemed to be dead.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-23 17:35:32 -08:00
Jakub Kicinski a32014b351 nfp: bpf: support optimizing dead branches
Verifier will now optimize out branches to dead code, implement
the replace_insn callback to take advantage of that optimization.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-23 17:35:32 -08:00
Jakub Kicinski e2fc61146a nfp: bpf: save original program length
Instead of passing env->prog->len around, and trying to adjust
for optimized out instructions just save the initial number
of instructions in struct nfp_prog.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-23 17:35:32 -08:00
Quentin Monnet 16a8cb5cff bpf: do not pass netdev to translate() and prepare() offload callbacks
The kernel functions to prepare verifier and translate for offloaded
program retrieve "offload" from "prog", and "netdev" from "offload".
Then both "prog" and "netdev" are passed to the callbacks.

Simplify this by letting the drivers retrieve the net device themselves
from the offload object attached to prog - if they need it at all. There
is currently no need to pass the netdev as an argument to those
functions.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:39:54 -08:00
Quentin Monnet a40a26322a bpf: pass prog instead of env to bpf_prog_offload_verifier_prep()
Function bpf_prog_offload_verifier_prep(), called from the kernel BPF
verifier to run a driver-specific callback for preparing for the
verification step for offloaded programs, takes a pointer to a struct
bpf_verifier_env object. However, no driver callback needs the whole
structure at this time: the two drivers supporting this, nfp and
netdevsim, only need a pointer to the struct bpf_prog instance held by
env.

Update the callback accordingly, on kernel side and in these two
drivers.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:39:54 -08:00
Quentin Monnet eb9119471e bpf: pass destroy() as a callback and remove its ndo_bpf subcommand
As part of the transition from ndo_bpf() to callbacks attached to struct
bpf_offload_dev for some of the eBPF offload operations, move the
functions related to program destruction to the struct and remove the
subcommand that was used to call them through the NDO.

Remove function __bpf_offload_ndo(), which is no longer used.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:39:54 -08:00
Quentin Monnet b07ade27e9 bpf: pass translate() as a callback and remove its ndo_bpf subcommand
As part of the transition from ndo_bpf() to callbacks attached to struct
bpf_offload_dev for some of the eBPF offload operations, move the
functions related to code translation to the struct and remove the
subcommand that was used to call them through the NDO.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:39:54 -08:00
Quentin Monnet 00db12c3d1 bpf: call verifier_prep from its callback in struct bpf_offload_dev
In a way similar to the change previously brought to the verify_insn
hook and to the finalize callback, switch to the newly added ops in
struct bpf_prog_offload for calling the functions used to prepare driver
verifiers.

Since the dev_ops pointer in struct bpf_prog_offload is no longer used
by any callback, we can now remove it from struct bpf_prog_offload.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:39:54 -08:00
Quentin Monnet 1385d755cf bpf: pass a struct with offload callbacks to bpf_offload_dev_create()
For passing device functions for offloaded eBPF programs, there used to
be no place where to store the pointer without making the non-offloaded
programs pay a memory price.

As a consequence, three functions were called with ndo_bpf() through
specific commands. Now that we have struct bpf_offload_dev, and since
none of those operations rely on RTNL, we can turn these three commands
into hooks inside the struct bpf_prog_offload_ops, and pass them as part
of bpf_offload_dev_create().

This commit effectively passes a pointer to the struct to
bpf_offload_dev_create(). We temporarily have two struct
bpf_prog_offload_ops instances, one under offdev->ops and one under
offload->dev_ops. The next patches will make the transition towards the
former, so that offload->dev_ops can be removed, and callbacks relying
on ndo_bpf() added to offdev->ops as well.

While at it, rename "nfp_bpf_analyzer_ops" as "nfp_bpf_dev_ops" (and
similarly for netdevsim).

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:39:53 -08:00
Quentin Monnet 1da6f57338 nfp: bpf: move nfp_bpf_analyzer_ops from verifier.c to offload.c
We are about to add several new callbacks to the struct, all of them
defined in offload.c. Move the struct bpf_prog_offload_ops object in
that file. As a consequence, nfp_verify_insn() and nfp_finalize() can no
longer be static.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:39:53 -08:00
Jiong Wang cf599f5031 nfp: bpf: relax prog rejection through max_pkt_offset
NFP is refusing to offload programs whenever the MTU is set to a value
larger than the max packet bytes that fits in NFP Cluster Target Memory
(CTM). However, a eBPF program doesn't always need to access the whole
packet data.

Verifier has always calculated maximum direct packet access (DPA) offset,
and kept it in max_pkt_offset inside prog auxiliar information. This patch
relax prog rejection based on max_pkt_offset.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-09 09:16:32 +01:00
Jakub Kicinski 44b6fed0c1 nfp: bpf: double check vNIC capabilities after object sharing
Program translation stage checks that program can be offloaded to
the netdev which was passed during the load (bpf_attr->prog_ifindex).
After program sharing was introduced, however, the netdev on which
program is loaded can theoretically be different, and therefore
we should recheck the program size and max stack size at load time.

This was found by code inspection, AFAIK today all vNICs have
identical caps.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-16 15:23:58 -07:00
Jakub Kicinski 527db74b71 nfp: bpf: protect against mis-initializing atomic counters
Atomic operations on the NFP are currently always in big endian.
The driver keeps track of regions of memory storing atomic values
and byte swaps them accordingly.  There are corner cases where
the map values may be initialized before the driver knows they
are used as atomic counters.  This can happen either when the
datapath is performing the update and the stack contents are
unknown or when map is updated before the program which will
use it for atomic values is loaded.

To avoid situation where user initializes the value to 0 1 2 3
and then after loading a program which uses the word as an atomic
counter starts reading 3 2 1 0 - only allow atomic counters to be
initialized to endian-neutral values.

For updates from the datapath the stack information may not be
as precise, so just allow initializing such values to 0.

Example code which would break:
struct bpf_map_def SEC("maps") rxcnt = {
       .type = BPF_MAP_TYPE_HASH,
       .key_size = sizeof(__u32),
       .value_size = sizeof(__u64),
       .max_entries = 1,
};

int xdp_prog1()
{
      	__u64 nonzeroval = 3;
	__u32 key = 0;
	__u64 *value;

	value = bpf_map_lookup_elem(&rxcnt, &key);
	if (!value)
		bpf_map_update_elem(&rxcnt, &key, &nonzeroval, BPF_ANY);
	else
		__sync_fetch_and_add(value, 1);

	return XDP_PASS;
}

$ offload bpftool map dump
key: 00 00 00 00 value: 00 00 00 03 00 00 00 00

should be:

$ offload bpftool map dump
key: 00 00 00 00 value: 03 00 00 00 00 00 00 00

Reported-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-16 15:23:58 -07:00
Jakub Kicinski 96de25060d nfp: replace long license headers with SPDX
Replace the repeated license text with SDPX identifiers.
While at it bump the Copyright dates for files we touched
this year.

Signed-off-by: Edwin Peer <edwin.peer@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Nic Viljoen <nick.viljoen@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-11 12:16:21 -07:00
Quentin Monnet fb19816541 nfp: bpf: account for additional stack usage when checking stack limit
Offloaded programs using BPF-to-BPF calls use the stack to store the
return address when calling into a subprogram. Callees also need some
space to save eBPF registers R6 to R9. And contrarily to kernel
verifier, we align stack frames on 64 bytes (and not 32). Account for
all this when checking the stack size limit before JIT-ing the program.
This means we have to recompute maximum stack usage for the program, we
cannot get the value from the kernel.

In addition to adapting the checks on stack usage, move them to the
finalize() callback, now that we have it and because such checks are
part of the verification step rather than translation.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08 10:24:13 +02:00
Quentin Monnet 389f263b60 nfp: bpf: add main logics for BPF-to-BPF calls support in nfp driver
This is the main patch for the logics of BPF-to-BPF calls in the nfp
driver.

The functions called on BPF_JUMP | BPF_CALL and BPF_JUMP | BPF_EXIT were
used to call helpers and exit from the program, respectively; make them
usable for calling into, or returning from, a BPF subprogram as well.

For all calls, push the return address as well as the callee-saved
registers (R6 to R9) to the stack, and pop them upon returning from the
calls. In order to limit the overhead in terms of instruction number,
this is done through dedicated subroutines. Jumping to the callee
actually consists in jumping to the subroutine, that "returns" to the
callee: this will require some fixup for passing the address in a later
patch. Similarly, returning consists in jumping to the subroutine, which
pops registers and then return directly to the caller (but no fixup is
needed here).

Return to the caller is performed with the RTN instruction newly added
to the JIT.

For the few steps where we need to know what subprogram an instruction
belongs to, the struct nfp_insn_meta is extended with a new subprog_idx
field.

Note that checks on the available stack size, to take into account the
additional requirements associated to BPF-to-BPF calls (storing R6-R9
and return addresses), are added in a later patch.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08 10:24:13 +02:00
Quentin Monnet c5da54d93e nfp: bpf: copy eBPF subprograms information from kernel verifier
In order to support BPF-to-BPF calls in offloaded programs, the nfp
driver must collect information about the distinct subprograms: namely,
the number of subprograms composing the complete program and the stack
depth of those subprograms. The latter in particular is non-trivial to
collect, so we copy those elements from the kernel verifier via the
newly added post-verification hook. The struct nfp_prog is extended to
store this information. Stack depths are stored in an array of dedicated
structs.

Subprogram start indexes are not collected. Instead, meta instructions
associated to the start of a subprogram will be marked with a flag in a
later patch.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08 10:24:12 +02:00
Quentin Monnet 1a7e62e632 nfp: bpf: rename nfp_prog->stack_depth as nfp_prog->stack_frame_depth
In preparation for support for BPF to BPF calls in offloaded programs,
rename the "stack_depth" field of the struct nfp_prog as
"stack_frame_depth". This is to make it clear that the field refers to
the maximum size of the current stack frame (as opposed to the maximum
size of the whole stack memory).

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08 10:24:12 +02:00
Jakub Kicinski 17082566a9 nfp: bpf: improve map offload info messages
FW can put constraints on map element size to maximize resource
use and efficiency.  When user attempts offload of a map which
does not fit into those constraints an informational message is
printed to kernel logs to inform user about the reason offload
failed.  Map offload does not have access to any advanced error
reporting like verifier log or extack.  There is also currently
no way for us to nicely expose the FW capabilities to user
space.  Given all those constraints we should make sure log
messages are as informative as possible.  Improve them.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-27 07:14:35 +02:00
Jakub Kicinski ab01f4ac5f nfp: bpf: remember maps by ID
Record perf maps by map ID, not raw kernel pointer.  This helps
with debug messages, because printing pointers to logs is frowned
upon, and makes debug easier for the users, as map ID is something
they should be more familiar with.  Note that perf maps are offload
neutral, therefore IDs won't be orphaned.

While at it use a rate limited print helper for the error message.

Reported-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-27 07:14:35 +02:00
Jakub Kicinski 20c5420421 nfp: bpf: pass raw data buffer to nfp_bpf_event_output()
In preparation for SKB-less perf event handling make
nfp_bpf_event_output() take buffer address and length,
not SKB as parameters.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-27 07:14:35 +02:00
Jakub Kicinski b5faa20d6f nfp: bpf: allow program sharing within ASIC
Allow program sharing between netdevs of the same NFP ASIC.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-18 15:10:34 +02:00
Jiong Wang 33b9431058 nfp: bpf: copy range info for all operands of all ALU operations
NFP verifier hook is coping range information of the shift amount for
indirect shift operation so optimized shift sequences could be generated.

We want to use range info to do more things. For example, to decide whether
multiplication and divide are supported on the given range.

This patch simply let NFP verifier hook to copy range info for all operands
of all ALU operands.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-07 01:45:31 +02:00
Jiong Wang 662c54721d nfp: bpf: rename umin/umax to umin_src/umax_src
The two fields are a copy of umin and umax info of bpf_insn->src_reg
generated by verifier.

Rename to make their meaning clear.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-07 01:45:31 +02:00
Jiong Wang 991f5b3651 nfp: bpf: support logic indirect shifts (BPF_[L|R]SH | BPF_X)
For indirect shifts, shift amount is not specified as constant, NFP needs
to get the shift amount through the low 5 bits of source A operand in
PREV_ALU, therefore extra instructions are needed compared with shifts by
constants.

Because NFP is 32-bit, so we are using register pair for 64-bit shifts and
therefore would need different instruction sequences depending on whether
shift amount is less than 32 or not.

NFP branch-on-bit-test instruction emitter is added by this patch and is
used for efficient runtime check on shift amount. We'd think the shift
amount is less than 32 if bit 5 is clear and greater or equal than 32
otherwise. Shift amount is greater than or equal to 64 will result in
undefined behavior.

This patch also use range info to avoid generating unnecessary runtime code
if we are certain shift amount is less than 32 or not.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-18 21:35:54 +02:00
Jakub Kicinski 9816dd35ec nfp: bpf: perf event output helpers support
Add support for the perf_event_output family of helpers.

The implementation on the NFP will not match the host code exactly.
The state of the host map and rings is unknown to the device, hence
device can't return errors when rings are not installed.  The device
simply packs the data into a firmware notification message and sends
it over to the host, returning success to the program.

There is no notion of a host CPU on the device when packets are being
processed.  Device will only offload programs which set BPF_F_CURRENT_CPU.
Still, if map index doesn't match CPU no error will be returned (see
above).

Dropped/lost firmware notification messages will not cause "lost
events" event on the perf ring, they are only visible via device
error counters.

Firmware notification messages may also get reordered in respect
to the packets which caused their generation.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-04 23:41:03 +02:00
Jakub Kicinski 630a4d3874 nfp: bpf: record offload neutral maps in the driver
For asynchronous events originating from the device, like perf event
output, we need to be able to make sure that objects being referred
to by the FW message are valid on the host.  FW events can get queued
and reordered.  Even if we had a FW message "barrier" we should still
protect ourselves from bogus FW output.

Add a reverse-mapping hash table and record in it all raw map pointers
FW may refer to.  Only record neutral maps, i.e. perf event arrays.
These are currently the only objects FW can refer to.  Use RCU protection
on the read side, update side is under RTNL.

Since program vs map destruction order is slightly painful for offload
simply take an extra reference on all the recorded maps to make sure
they don't disappear.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-04 23:41:03 +02:00
Jakub Kicinski dcb0c27f3c nfp: bpf: add basic support for atomic adds
Implement atomic add operation for 32 and 64 bit values.  Depend
on the verifier to ensure alignment.  Values have to be kept in
big endian and swapped upon read/write.  For now only support
atomic add of a constant.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-03-28 19:36:13 -07:00
Quentin Monnet 52be9a7cde nfp: bpf: use extack support to improve debugging
Use the recently added extack support for eBPF offload in the driver.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-22 16:28:32 -05:00
Jakub Kicinski 7a0ef69395 bpf: offload: allow array map offload
The special handling of different map types is left to the driver.
Allow offload of array maps by simply adding it to accepted types.
For nfp we have to make sure array elements are not deleted.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-18 22:54:25 +01:00
Jiong Wang eb1d7db927 nfp: bpf: set new jit info fields
This patch set those new jit info fields introduced in this patch set.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-18 01:26:15 +01:00
Jakub Kicinski 1bba4c413a nfp: bpf: implement bpf map offload
Plug in to the stack's map offload callbacks for BPF map offload.
Get next call needs some special handling on the FW side, since
we can't send a NULL pointer to the FW there is a get first entry
FW command.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-14 23:36:31 +01:00
Jakub Kicinski 0a9c1991f2 bpf: rename bpf_dev_offload -> bpf_prog_offload
With map offload coming, we need to call program offload structure
something less ambiguous.  Pure rename, no functional changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-14 23:36:29 +01:00
Jakub Kicinski af93d15ac6 nfp: hand over to BPF offload app at coarser granularity
Instead of having an app callback per message type hand off
all offload-related handling to apps with one "rest of ndo_bpf"
callback.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-10 13:49:36 +01:00
Jakub Kicinski 44a12ecc1c nfp: bpf: don't depend on high order allocations for program image
The translator pre-allocates a buffer of maximal program size.
Due to HW/FW limitations the program buffer can't currently be
longer than 128Kb, so we used to kmalloc() it, and then map for
DMA directly.

Now that the late branch resolution is copying the program image
anyway, we can just kvmalloc() the buffer.  While at it, after
translation reallocate the buffer to save space.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-10 13:49:35 +01:00
Jakub Kicinski 2314fe9ed0 nfp: bpf: relocate jump targets just before the load
Don't translate the program assuming it will be loaded at a given
address.  This will be required for sharing programs between ports
of the same NIC, tail calls and subprograms.  It will also make the
jump targets easier to understand when dumping the program to user
space.

Translate the program as if it was going to be loaded at address
zero.  When load happens add the load offset in and set addresses
of special branches.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-10 13:49:35 +01:00
Jakub Kicinski 1549921da3 nfp: bpf: move jump resolution to jit.c
Jump target resolution should be in jit.c not offload.c.
No functional changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-10 13:49:35 +01:00
Jakub Kicinski c4f7730be5 nfp: bpf: round up the size of the stack
Kernel enforces the alignment of the bottom of the stack, NFP
deals with positive offsets better so we should align the top
of the stack.  Round the stack size to NFP word size (4B).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-10 13:49:35 +01:00
Jakub Kicinski 77a844ee65 nfp: bpf: prepare for parsing BPF FW capabilities
BPF FW creates a run time symbol called bpf_capabilities which
contains TLV-formatted capability information.  Allocate app
private structure to store parsed capabilities and add a skeleton
of parsing logic.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-15 14:18:18 +01:00
Jiong Wang a09d5c52c4 nfp: bpf: flag jump destination to guide insn combine optimizations
NFP eBPF offload JIT engine is doing some instruction combine based
optimizations which however must not be safe if the combined sequences
are across basic block boarders.

Currently, there are post checks during fixing jump destinations. If the
jump destination is found to be eBPF insn that has been combined into
another one, then JIT engine will raise error and abort.

This is not optimal. The JIT engine ought to disable the optimization on
such cross-bb-border sequences instead of abort.

As there is no control flow information in eBPF infrastructure that we
can't do basic block based optimizations, this patch extends the existing
jump destination record pass to also flag the jump destination, then in
instruction combine passes we could skip the optimizations if insns in the
sequence are jump targets.

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-01 20:59:19 +01:00
Jiong Wang 5b674140ad nfp: bpf: record jump destination to simplify jump fixup
eBPF insns are internally organized as dual-list inside NFP offload JIT.
Random access to an insn needs to be done by either forward or backward
traversal along the list.

One place we need to do such traversal is at nfp_fixup_branches where one
traversal is needed for each jump insn to find the destination. Such
traversals could be avoided if jump destinations are collected through a
single travesal in a pre-scan pass, and such information could also be
useful in other places where jump destination info are needed.

This patch adds such jump destination collection in nfp_prog_prepare.

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-01 20:59:19 +01:00
Jakub Kicinski 288b3de55a bpf: offload: move offload device validation out to the drivers
With TC shared block changes we can't depend on correct netdev
pointer being available in cls_bpf.  Move the device validation
to the driver.  Core will only make sure that offloaded programs
are always attached in the driver (or in HW by the driver).  We
trust that drivers which implement offload callbacks will perform
necessary checks.

Moving the checks to the driver is generally a useful thing,
in practice the check should be against a switchdev instance,
not a netdev, given that most ASICs will probably allow using
the same program on many ports.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-21 00:37:35 +01:00
Jakub Kicinski c6c580d7bc nfp: bpf: move to new BPF program offload infrastructure
Following steps are taken in the driver to offload an XDP program:

XDP_SETUP_PROG:
 * prepare:
   - allocate program state;
   - run verifier (bpf_analyzer());
   - run translation;
 * load:
   - stop old program if needed;
   - load program;
   - enable BPF if not enabled;
 * clean up:
   - free program image.

With new infrastructure the flow will look like this:

BPF_OFFLOAD_VERIFIER_PREP:
  - allocate program state;
BPF_OFFLOAD_TRANSLATE:
   - run translation;
XDP_SETUP_PROG:
   - stop old program if needed;
   - load program;
   - enable BPF if not enabled;
BPF_OFFLOAD_DESTROY:
   - free program image.

Take advantage of the new infrastructure.  Allocation of driver
metadata has to be moved from jit.c to offload.c since it's now
done at a different stage.  Since there is no separate driver
private data for verification step, move temporary nfp_meta
pointer into nfp_prog.  We will now use user space context
offsets.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05 22:26:19 +09:00
Jakub Kicinski 9314c442d7 nfp: bpf: move translation prepare to offload.c
struct nfp_prog is currently only used internally by the translator.
This means there is a lot of parameter passing going on, between
the translator and different stages of offload.  Simplify things
by allocating nfp_prog in offload.c already.

We will now use kmalloc() to allocate the program area and only
DMA map it for the time of loading (instead of allocating DMA
coherent memory upfront).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05 22:26:19 +09:00