1
0
Fork 0
Commit Graph

202 Commits (redonkable)

Author SHA1 Message Date
Jesper Dangaard Brouer 5010e94842 samples/bpf: bpf_load.c detect and abort if ELF maps section size is wrong
The struct bpf_map_def was extended in commit fb30d4b712 ("bpf: Add tests
for map-in-map") with member unsigned int inner_map_idx.  This changed the size
of the maps section in the generated ELF _kern.o files.

Unfortunately the loader in bpf_load.c does not detect or handle this.  Thus,
older _kern.o files became incompatible, and caused hard-to-debug errors
where the syscall validation rejected BPF_MAP_CREATE request.

This patch only detect the situation and aborts load_bpf_file(). It also
add code comments warning people that read this loader for inspiration
for these pitfalls.

Fixes: fb30d4b712 ("bpf: Add tests for map-in-map")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30 22:41:59 -04:00
David Ahern 3993f2cb98 samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel
Add option to xdp1 and xdp_tx_iptunnel to insert xdp program in
SKB_MODE:
 - update set_link_xdp_fd to take a flags argument that is added to the
   RTM_SETLINK message

 - Add -S option to xdp1 and xdp_tx_iptunnel user code. When passed in
   XDP_FLAGS_SKB_MODE is set in the flags arg passed to set_link_xdp_fd

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-27 12:49:26 -04:00
Alexander Alemayhu dfc5be0dc0 samples/bpf: check before defining offsetof
Fixes the following warning

samples/bpf/test_lru_dist.c:28:0: warning: "offsetof" redefined
 #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)

In file included from ./tools/lib/bpf/bpf.h:25:0,
                 from samples/bpf/libbpf.h:5,
                 from samples/bpf/test_lru_dist.c:24:
/usr/lib/gcc/x86_64-redhat-linux/6.3.1/include/stddef.h:417:0: note: this is the location of the previous definition
 #define offsetof(TYPE, MEMBER) __builtin_offsetof (TYPE, MEMBER)

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24 16:20:19 -04:00
Alexander Alemayhu 4784726f69 samples/bpf: add static to function with no prototype
Fixes the following warning

samples/bpf/cookie_uid_helper_example.c: At top level:
samples/bpf/cookie_uid_helper_example.c:276:6: warning: no previous prototype for ‘finish’ [-Wmissing-prototypes]
 void finish(int ret)
      ^~~~~~
  HOSTLD  samples/bpf/per_socket_stats_example

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24 16:20:19 -04:00
Alexander Alemayhu 69b6a7f743 samples/bpf: add -Wno-unknown-warning-option to clang
I was initially going to remove '-Wno-address-of-packed-member' because I
thought it was not supposed to be there but Daniel suggested using
'-Wno-unknown-warning-option'.

This silences several warnings similiar to the one below

warning: unknown warning option '-Wno-address-of-packed-member' [-Wunknown-warning-option]
1 warning generated.
clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include -I./arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated  -I./include
 -I./arch/x86/include/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h  \
        -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
        -Wno-compare-distinct-pointer-types \
        -Wno-gnu-variable-sized-type-not-at-end \
        -Wno-address-of-packed-member -Wno-tautological-compare \
        -O2 -emit-llvm -c samples/bpf/xdp_tx_iptunnel_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/xdp_tx_iptunnel_kern.o

$ clang --version

 clang version 3.9.1 (tags/RELEASE_391/final)
 Target: x86_64-unknown-linux-gnu
 Thread model: posix
 InstalledDir: /usr/bin

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24 16:20:19 -04:00
David S. Miller b0c47807d3 bpf: Add sparc support to tools and samples.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-04-22 13:01:52 -07:00
Martin KaFai Lau 3a5795b83d bpf: lru: Add map-in-map LRU example
This patch adds a map-in-map LRU example.
If we know only a subset of cores will use the
LRU, we can allocate a common LRU list per targeting core
and store it into an array-of-hashs.

It allows using the common LRU map with map-update performance
comparable to the BPF_F_NO_COMMON_LRU map but without wasting memory
on the unused cores that we know they will never access the LRU map.

BPF_F_NO_COMMON_LRU:
> map_perf_test 32 8 10000000 10000000 | awk '{sum += $3}END{print sum}'
9234314 (9.23M/s)

map-in-map LRU:
> map_perf_test 512 8 1260000 80000000 | awk '{sum += $3}END{print sum}'
9962743 (9.96M/s)

Notes that the max_entries for the map-in-map LRU test is 1260000 which
is the max_entries for each inner LRU map.  8 processes have been
started, so 8 * 1260000 = 10080000 (~10M) which is close to what is
used in the BPF_F_NO_COMMON_LRU test.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 13:55:52 -04:00
Martin KaFai Lau 9fd63d05f3 bpf: Allow bpf sample programs (*_user.c) to change bpf_map_def
The current bpf_map_def is statically defined during compile
time.  This patch allows the *_user.c program to change it during
runtime.  It is done by adding load_bpf_file_fixup_map() which
takes a callback.  The callback will be called before creating
each map so that it has a chance to modify the bpf_map_def.

The current usecase is to change max_entries in map_perf_test.
It is interesting to test with a much bigger map size in
some cases (e.g. the following patch on bpf_lru_map.c).
However,  it is hard to find one size to fit all testing
environment.  Hence, it is handy to take the max_entries
as a cmdline arg and then configure the bpf_map_def during
runtime.

This patch adds two cmdline args.  One is to configure
the map's max_entries.  Another is to configure the max_cnt
which controls how many times a syscall is called.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 13:55:52 -04:00
Martin KaFai Lau bf8db5d243 bpf: lru: Refactor LRU map tests in map_perf_test
One more LRU test will be added later in this patch series.
In this patch, we first move all existing LRU map tests into
a single syscall (connect) first so that the future new
LRU test can be added without hunting another syscall.

One of the map name is also changed from percpu_lru_hash_map
to nocommon_lru_hash_map to avoid the confusion with percpu_hash_map.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 13:55:52 -04:00
Chenbo Feng 00f660eaf3 Sample program using SO_COOKIE
Added a per socket traffic monitoring option to illustrate the usage
of new getsockopt SO_COOKIE. The program is based on the socket traffic
monitoring program using xt_eBPF and in the new option the data entry
can be directly accessed using socket cookie. The cookie retrieved
allow us to lookup an element in the eBPF for a specific socket.

Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-08 08:07:01 -07:00
Chenbo Feng 51570a5ab2 A Sample of using socket cookie and uid for traffic monitoring
Add a sample program to demostrate the possible usage of
get_socket_cookie and get_socket_uid helper function. The program will
store bytes and packets counting of in/out traffic monitored by iptables
and store the stats in a bpf map in per socket base. The owner uid of
the socket will be stored as part of the data entry. A shell script for
running the program is also included.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-23 17:01:57 -07:00
Martin KaFai Lau fb30d4b712 bpf: Add tests for map-in-map
Test cases for array of maps and hash of maps.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 15:45:45 -07:00
Alexei Starovoitov 95ff141e52 samples/bpf: add map_lookup microbenchmark
$ map_perf_test 128
speed of HASH bpf_map_lookup_elem() in lookups per second
	w/o JIT		w/JIT
before	46M		58M
after	42M		74M

perf report
before:
    54.23%  map_perf_test  [kernel.kallsyms]  [k] __htab_map_lookup_elem
    14.24%  map_perf_test  [kernel.kallsyms]  [k] lookup_elem_raw
     8.84%  map_perf_test  [kernel.kallsyms]  [k] htab_map_lookup_elem
     5.93%  map_perf_test  [kernel.kallsyms]  [k] bpf_map_lookup_elem
     2.30%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     1.49%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

after:
    60.03%  map_perf_test  [kernel.kallsyms]  [k] __htab_map_lookup_elem
    18.07%  map_perf_test  [kernel.kallsyms]  [k] lookup_elem_raw
     2.91%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     1.94%  map_perf_test  [kernel.kallsyms]  [k] _einittext
     1.90%  map_perf_test  [kernel.kallsyms]  [k] __audit_syscall_exit
     1.72%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

Notice that bpf_map_lookup_elem() and htab_map_lookup_elem() are trivial
functions, yet they take sizeable amount of cpu time.
htab_map_gen_lookup() removes bpf_map_lookup_elem() and converts
htab_map_lookup_elem() into three BPF insns which causing cpu time
for bpf_prog_da4fc6a3f41761a2() slightly increase.

$ map_perf_test 256
speed of ARRAY bpf_map_lookup_elem() in lookups per second
	w/o JIT		w/JIT
before	97M		174M
after	64M		280M

before:
    37.33%  map_perf_test  [kernel.kallsyms]  [k] array_map_lookup_elem
    13.95%  map_perf_test  [kernel.kallsyms]  [k] bpf_map_lookup_elem
     6.54%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     4.57%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

after:
    32.86%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     6.54%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

array_map_gen_lookup() removes calls to array_map_lookup_elem()
and bpf_map_lookup_elem() and replaces them with 7 bpf insns.

The performance without JIT is slower, since executing extra insns
in the interpreter is slower than running native C code,
but with JIT the performance gains are obvious,
since native C->x86 code is replaced with fewer bpf->x86 instructions.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 20:44:12 -07:00
Linus Torvalds 3051bf36c2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini
      Varadhan.

   2) Simplify classifier state on sk_buff in order to shrink it a bit.
      From Willem de Bruijn.

   3) Introduce SIPHASH and it's usage for secure sequence numbers and
      syncookies. From Jason A. Donenfeld.

   4) Reduce CPU usage for ICMP replies we are going to limit or
      suppress, from Jesper Dangaard Brouer.

   5) Introduce Shared Memory Communications socket layer, from Ursula
      Braun.

   6) Add RACK loss detection and allow it to actually trigger fast
      recovery instead of just assisting after other algorithms have
      triggered it. From Yuchung Cheng.

   7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot.

   8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert.

   9) Export MPLS packet stats via netlink, from Robert Shearman.

  10) Significantly improve inet port bind conflict handling, especially
      when an application is restarted and changes it's setting of
      reuseport. From Josef Bacik.

  11) Implement TX batching in vhost_net, from Jason Wang.

  12) Extend the dummy device so that VF (virtual function) features,
      such as configuration, can be more easily tested. From Phil
      Sutter.

  13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric
      Dumazet.

  14) Add new bpf MAP, implementing a longest prefix match trie. From
      Daniel Mack.

  15) Packet sample offloading support in mlxsw driver, from Yotam Gigi.

  16) Add new aquantia driver, from David VomLehn.

  17) Add bpf tracepoints, from Daniel Borkmann.

  18) Add support for port mirroring to b53 and bcm_sf2 drivers, from
      Florian Fainelli.

  19) Remove custom busy polling in many drivers, it is done in the core
      networking since 4.5 times. From Eric Dumazet.

  20) Support XDP adjust_head in virtio_net, from John Fastabend.

  21) Fix several major holes in neighbour entry confirmation, from
      Julian Anastasov.

  22) Add XDP support to bnxt_en driver, from Michael Chan.

  23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan.

  24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi.

  25) Support GRO in IPSEC protocols, from Steffen Klassert"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits)
  Revert "ath10k: Search SMBIOS for OEM board file extension"
  net: socket: fix recvmmsg not returning error from sock_error
  bnxt_en: use eth_hw_addr_random()
  bpf: fix unlocking of jited image when module ronx not set
  arch: add ARCH_HAS_SET_MEMORY config
  net: napi_watchdog() can use napi_schedule_irqoff()
  tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"
  net/hsr: use eth_hw_addr_random()
  net: mvpp2: enable building on 64-bit platforms
  net: mvpp2: switch to build_skb() in the RX path
  net: mvpp2: simplify MVPP2_PRS_RI_* definitions
  net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT
  net: mvpp2: remove unused register definitions
  net: mvpp2: simplify mvpp2_bm_bufs_add()
  net: mvpp2: drop useless fields in mvpp2_bm_pool and related code
  net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue'
  net: mvpp2: release reference to txq_cpu[] entry after unmapping
  net: mvpp2: handle too large value in mvpp2_rx_time_coal_set()
  net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set()
  net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set
  ...
2017-02-22 10:15:09 -08:00
Linus Torvalds 7f4eb0a6d5 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "On the kernel side the main changes in this cycle were:

   - Add Intel Kaby Lake CPU support (Srinivas Pandruvada)

   - AMD uncore driver updates for fam17 (Janakarajan Natarajan)

   - Intel/PT updates and core events optimizations and cleanups
     (Alexander Shishkin)

   - cgroups events fixes (David Carrillo-Cisneros)

   - kprobes improvements (Masami Hiramatsu)

   - ... plus misc fixes and updates.

  On the tooling side the main changes were:

   - Support clang build in tools/{perf,lib/{bpf,traceevent,api}} with
     CC=clang, to, for instance, take advantage of better warnings
     (Arnaldo Carvalho de Melo):

   - Introduce the 'delta-abs' 'perf diff' compute method, that orders
     the histogram entries by the absolute value of the percentage delta
     for a function in two perf.data files, i.e. the functions that
     changed the most (increase or decrease in samples) comes first
     (Namhyung Kim)

   - Add support for parsing Intel uncore vendor event files and add
     uncore vendor events for the Intel server processors (Haswell,
     Broadwell, IvyBridge), Xeon Phi (Knights Landing) and Broadwell DE
     (Andi Kleen)

   - Introduce 'perf ftrace' a perf front end to the kernel's ftrace
     function and function_graph tracer, defaulting to the
     "function_graph" tracer, more work will be done in reviving this
     effort, forward porting it from its initial patch submission
     (Namhyung Kim)

   - Add 'e' and 'c' hotkeys to expand/collapse call chains for a single
     hist entry in the 'perf report' and 'perf top' TUI (Jiri Olsa)

   - Account thread wait time (off CPU time) separately: sleep, iowait
     and preempt, based on the prev_state of the last event, show the
     breakdown when using "perf sched timehist --state" (Namhyumg Kim)

   - Add more triggers to switch the output file (perf.data.TIMESTAMP).

     Now, in addition to switching to a different output file when
     receiving a SIGUSR2, one can also specify file size and time based
     triggers:

           perf record -a --switch-output=signal

     is equivalent to what we had before:

           perf record -a --switch-output

     While we can also ask for the file to be "sliced" by size, taking
     into account that that will happen only when we get woken up by the
     kernel, i.e. one has to take into account the --mmap-pages (the
     size of the perf mmap ring buffer):

           perf record -a --switch-output=2G

     will break the perf.data output into multiple files limited to 2GB
     of samples, right when generating the output.

     For time based samples, alert() will be used, so to have 1 minute
     limited perf.data output files:

          perf record -a --switch-output=1m

     (Jiri Olsa)

   - Improve 'perf trace' (Arnaldo Carvalho de Melo)

   - 'perf kallsyms' toy tool to look for extended symbol information on
     the running kernel and demonstrate the machine/thread/symbol APIs
     for use in other tools, such as 'perf probe' (Arnaldo Carvalho de
     Melo)

   - ... plus tons of other changes, see the shortlog and Git log for
     details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (131 commits)
  perf tools: Add missing parse_events_error() prototype
  perf pmu: Fix check for unset alias->unit array
  perf tools: Be consistent on the type of map->symbols[] interator
  perf intel pt decoder: clang has no -Wno-override-init
  perf evsel: Do not put a variable sized type not at the end of a struct
  perf probe: Avoid accessing uninitialized 'map' variable
  perf tools: Do not put a variable sized type not at the end of a struct
  perf record: Do not put a variable sized type not at the end of a struct
  perf tests: Synthesize struct instead of using field after variable sized type
  perf bench numa: Make sure dprintf() is not defined
  Revert "perf bench futex: Sanitize numeric parameters"
  tools lib subcmd: Make it an error to pass a signed value to OPTION_UINTEGER
  tools: Set the maximum optimization level according to the compiler being used
  tools: Suppress request for warning options not existent in clang
  samples/bpf: Reset global variables
  samples/bpf: Ignore already processed ELF sections
  samples/bpf: Add missing header
  perf symbols: dso->name is an array, no need to check it against NULL
  perf tests record: No need to test an array against NULL
  perf symbols: No need to check if sym->name is NULL
  ...
2017-02-20 12:21:13 -08:00
David S. Miller 3f64116a83 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-16 19:34:01 -05:00
Mickaël Salaün a734fb5d60 samples/bpf: Reset global variables
Before loading a new ELF, clean previous kernel version, license and
processed sections.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170208202744.16274-3-mic@digikod.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-02-13 17:22:53 -03:00
Mickaël Salaün 16ad132900 samples/bpf: Ignore already processed ELF sections
Add a missing check for the map fixup loop.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170208202744.16274-2-mic@digikod.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-02-13 17:22:47 -03:00
Mickaël Salaün af392a8f53 samples/bpf: Add missing header
Include unistd.h to define __NR_getuid and __NR_getsid.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170208202744.16274-4-mic@digikod.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-02-13 17:22:35 -03:00
Alexei Starovoitov 7f67763337 bpf: introduce BPF_F_ALLOW_OVERRIDE flag
If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
to the given cgroup the descendent cgroup will be able to override
effective bpf program that was inherited from this cgroup.
By default it's not passed, therefore override is disallowed.

Examples:
1.
prog X attached to /A with default
prog Y fails to attach to /A/B and /A/B/C
Everything under /A runs prog X

2.
prog X attached to /A with allow_override.
prog Y fails to attach to /A/B with default (non-override)
prog M attached to /A/B with allow_override.
Everything under /A/B runs prog M only.

3.
prog X attached to /A with allow_override.
prog Y fails to attach to /A with default.
The user has to detach first to switch the mode.

In the future this behavior may be extended with a chain of
non-overridable programs.

Also fix the bug where detach from cgroup where nothing is attached
was not throwing error. Return ENOENT in such case.

Add several testcases and adjust libbpf.

Fixes: 3007098494 ("cgroup: add support for eBPF programs")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-12 21:52:19 -05:00
David S. Miller 4e8f2fc1a5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two trivial overlapping changes conflicts in MPLS and mlx5.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-28 10:33:06 -05:00
David Herrmann b8a943e294 samples/bpf: add lpm-trie benchmark
Extend the map_perf_test_{user,kern}.c infrastructure to stress test
lpm-trie lookups. We hook into the kprobe on sys_gettid() and measure
the latency depending on trie size and lookup count.

On my Intel Haswell i7-6400U, a single gettid() syscall with an empty
bpf program takes roughly 6.5us on my system. Lookups in empty tries
take ~1.8us on first try, ~0.9us on retries. Lookups in tries with 8192
entries take ~7.1us (on the first _and_ any subsequent try).

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-23 16:10:38 -05:00
Jesper Dangaard Brouer cdb749cef1 bpf: fix samples xdp_tx_iptunnel and tc_l2_redirect with fake KBUILD_MODNAME
Fix build errors for samples/bpf xdp_tx_iptunnel and tc_l2_redirect,
when dynamic debugging is enabled (CONFIG_DYNAMIC_DEBUG) by defining a
fake KBUILD_MODNAME.

Just like Daniel Borkmann fixed other samples/bpf in commit
96a8eb1eee ("bpf: fix samples to add fake KBUILD_MODNAME").

Fixes: 12d8bb64e3 ("bpf: xdp: Add XDP example for head adjustment")
Fixes: 90e02896f1 ("bpf: Add test for bpf_redirect to ipip/ip6tnl")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20 12:04:07 -05:00
Ingo Molnar 4e06d4f083 perf/urgent fixes and one improvement:
Fixes:
 
 - Fix prev/next_prio formatting for deadline tasks in libtraceevent (Daniel Bristot de Oliveira)
 
 - Robustify reading of build-ids from /sys/kernel/note (Arnaldo Carvalho de Melo)
 
 - Fix building some sample/bpf in Alpine Linux 3.4 (Arnaldo Carvalho de Melo)
 
 - Fix 'make install-bin' to install libtraceevent plugins (Arnaldo Carvalho de Melo)
 
 - Fix 'perf record --switch-output' documentation and comment (Jiri Olsa)
 
 - 'perf probe' fixes for cross arch probing (Masami Hiramatsu)
 
 Improvement:
 
 - Show total scheduling time in 'perf sched timehist' (Namhyumg Kim)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJYbS6KAAoJENZQFvNTUqpAWqkP/1xdvbDqeTA2YjcWG+ffQJye
 LULSNQpRShvBGpB3zsOzOebOtw0R5oWb7YAp4JFZNyMIk5qcnS9gVwdw9Qx1lc/W
 Gq8OuhI28Pr9AHE37eTJ1+12HI2aAOylwKFqUpEWMzoaUqBDRh5JV2MeVGVQXBRe
 YBJ4Qwk1a7Ng74uzN/pdiC6V7pjddPEoLZEDG5p35vrgtNAY1JGlX5rvNR7CO6nN
 72QHRnB4r1d9fgNGS5oO5zEU1q5+JN5phe+gDANdk/8pCqDegRjZpHnj5gcm71jr
 mEvbstmlV1XPZSA9aFVu2L5LZuv6asS02HjrbGydqO2DUTpDgrynHsjAuqqaTnwk
 kLGwA2I+Kyh8g155H+HrQANONe2TTOXRkBVWIoFBWkhd1JUS/p3GE2qvqx5N36sr
 U26d9i/Bk/pbOYOdRUFMc4lckZY2HrrRWj8dVr06qLr2rlbYoLv8hpqvdyHr9MJS
 Id2Yqqd1JVzBDU3/+GfdFDBjltVtpSG1mdc6LWy9IpXFMJoSOQNsrbjf+mgMadeV
 EbtY8MdDZ525EB8niv9E9Vfd56LiWTKCjg9Mc758+U+Ns50YlQFsDq2hBO2+jQ/k
 W3QNX+WJiLgxqQK8adveXoYgCSWPNVttIvg8Lj1KNk8Y//vUC1KSrvDfaKiROzLB
 iF0u5mLRAQ+Pfec6X5PP
 =o9KL
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo-4.10-20170104' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes and one improvement from Arnaldo Carvalho de Melo:

Fixes:

  - Fix prev/next_prio formatting for deadline tasks in libtraceevent (Daniel Bristot de Oliveira)

  - Robustify reading of build-ids from /sys/kernel/note (Arnaldo Carvalho de Melo)

  - Fix building some sample/bpf in Alpine Linux 3.4 (Arnaldo Carvalho de Melo)

  - Fix 'make install-bin' to install libtraceevent plugins (Arnaldo Carvalho de Melo)

  - Fix 'perf record --switch-output' documentation and comment (Jiri Olsa)

  - Fix 'perf probe' for cross arch probing (Masami Hiramatsu)

Improvement:

  - Show total scheduling time in 'perf sched timehist' (Namhyumg Kim)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-05 08:33:02 +01:00
Arnaldo Carvalho de Melo b6f4c66704 samples/bpf trace_output_user: Remove duplicate sys/ioctl.h include
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Stringer <joe@ovn.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-3awp0nv8tpnblatojmwjww7z@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-28 10:47:13 -03:00
Arnaldo Carvalho de Melo ee12996c9d samples/bpf sock_example: Avoid getting ethhdr from two includes
To avoid the following build failure on Alpine Linux 3.4, that has
clang-3.8 with the bpf target:

    HOSTCC  samples/bpf/sock_example.o
  In file included from /usr/include/net/ethernet.h:10:0,
                   from /git/linux/samples/bpf/sock_example.h:7,
                   from /git/linux/samples/bpf/sock_example.c:30:
  /usr/include/netinet/if_ether.h:96:8: error: redefinition of 'struct
  ethhdr'
   struct ethhdr {
          ^
  In file included from /git/linux/samples/bpf/sock_example.c:26:0:
  ./usr/include/linux/if_ether.h:144:8: note: originally defined here
   struct ethhdr {
          ^
  scripts/Makefile.host:124: recipe for target
  'samples/bpf/sock_example.o' failed
  make[2]: *** [samples/bpf/sock_example.o] Error 1
  /git/linux/Makefile:1658: recipe for target 'samples/bpf/' failed

So include net/if_ether.h for the needs of sock_example.h, using the
same include that sock_example.c uses.

Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Stringer <joe@ovn.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-m9avekl1b651qe1r1zd5tzz9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-27 21:49:17 -03:00
Linus Torvalds 00198dab3b Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "On the kernel side there's two x86 PMU driver fixes and a uprobes fix,
  plus on the tooling side there's a number of fixes and some late
  updates"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  perf sched timehist: Fix invalid period calculation
  perf sched timehist: Remove hardcoded 'comm_width' check at print_summary
  perf sched timehist: Enlarge default 'comm_width'
  perf sched timehist: Honour 'comm_width' when aligning the headers
  perf/x86: Fix overlap counter scheduling bug
  perf/x86/pebs: Fix handling of PEBS buffer overflows
  samples/bpf: Move open_raw_sock to separate header
  samples/bpf: Remove perf_event_open() declaration
  samples/bpf: Be consistent with bpf_load_program bpf_insn parameter
  tools lib bpf: Add bpf_prog_{attach,detach}
  samples/bpf: Switch over to libbpf
  perf diff: Do not overwrite valid build id
  perf annotate: Don't throw error for zero length symbols
  perf bench futex: Fix lock-pi help string
  perf trace: Check if MAP_32BIT is defined (again)
  samples/bpf: Make perf_event_read() static
  uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation
  samples/bpf: Make samples more libbpf-centric
  tools lib bpf: Add flags to bpf_create_map()
  tools lib bpf: use __u32 from linux/types.h
  ...
2016-12-23 16:49:12 -08:00
Joe Stringer 9899694a7f samples/bpf: Move open_raw_sock to separate header
This function was declared in libbpf.c and was the only remaining
function in this library, but has nothing to do with BPF. Shift it out
into a new header, sock_example.h, and include it from the relevant
samples.

Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20161209024620.31660-8-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20 12:00:40 -03:00
Joe Stringer 205c8ada31 samples/bpf: Remove perf_event_open() declaration
This declaration was made in samples/bpf/libbpf.c for convenience, but
there's already one in tools/perf/perf-sys.h. Reuse that one.

Committer notes:

Testing it:

  $ make -j4 O=../build/v4.9.0-rc8+ samples/bpf/
  make[1]: Entering directory '/home/build/v4.9.0-rc8+'
    CHK     include/config/kernel.release
    GEN     ./Makefile
    CHK     include/generated/uapi/linux/version.h
    Using /home/acme/git/linux as source for kernel
    CHK     include/generated/utsrelease.h
    CHK     include/generated/timeconst.h
    CHK     include/generated/bounds.h
    CHK     include/generated/asm-offsets.h
    CALL    /home/acme/git/linux/scripts/checksyscalls.sh
    HOSTCC  samples/bpf/test_verifier.o
    HOSTCC  samples/bpf/libbpf.o
    HOSTCC  samples/bpf/../../tools/lib/bpf/bpf.o
    HOSTCC  samples/bpf/test_maps.o
    HOSTCC  samples/bpf/sock_example.o
    HOSTCC  samples/bpf/bpf_load.o
<SNIP>
    HOSTLD  samples/bpf/trace_event
    HOSTLD  samples/bpf/sampleip
    HOSTLD  samples/bpf/tc_l2_redirect
  make[1]: Leaving directory '/home/build/v4.9.0-rc8+'
  $

Also tested the offwaketime resulting from the rebuild, seems to work as
before.

Signed-off-by: Joe Stringer <joe@ovn.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20161209024620.31660-7-joe@ovn.org
[ Use -I$(srctree)/tools/lib/ to support out of source code tree builds ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20 12:00:40 -03:00
Arnaldo Carvalho de Melo 811b4f0d78 samples/bpf: Be consistent with bpf_load_program bpf_insn parameter
Only one of the examples declare the bpf_insn bpf proggie as a const:

  $ grep 'struct bpf_insn [a-z]' samples/bpf/*.c
  samples/bpf/fds_example.c:	static const struct bpf_insn insns[] = {
  samples/bpf/sock_example.c:	struct bpf_insn prog[] = {
  samples/bpf/test_cgrp2_attach2.c:	struct bpf_insn prog[] = {
  samples/bpf/test_cgrp2_attach.c:	struct bpf_insn prog[] = {
  samples/bpf/test_cgrp2_sock.c:	struct bpf_insn prog[] = {
  $

Which causes this warning:

  [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
  <SNIP>
     HOSTCC  samples/bpf/fds_example.o
  /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
  /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
        insns, insns_cnt, "GPL", 0,
        ^~~~~
  In file included from /git/linux/samples/bpf/libbpf.h:5:0,
                   from /git/linux/samples/bpf/bpf_load.h:4,
                   from /git/linux/samples/bpf/fds_example.c:15:
  /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
   int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
       ^~~~~~~~~~~~~~~~
    HOSTCC  samples/bpf/sockex1_user.o

So just ditch that 'const' to reduce build noise, leaving changing the
bpf_load_program() bpf_insn parameter to const to a later patch, if deemed
adequate.

Cc: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-1z5xee8n3oa66jf62bpv16ed@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20 12:00:39 -03:00
Joe Stringer 5dc880de6e tools lib bpf: Add bpf_prog_{attach,detach}
Commit d8c5b17f2b ("samples: bpf: add userspace example for attaching
eBPF programs to cgroups") added these functions to samples/libbpf, but
during this merge all of the samples libbpf functionality is shifting to
tools/lib/bpf. Shift these functions there.

Committer notes:

Use bzero + attr.FIELD = value instead of 'attr = { .FIELD = value, just
like the other wrapper calls to sys_bpf with bpf_attr to make this build
in older toolchais, such as the ones in CentOS 5 and 6.

Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-au2zvtsh55vqeo3v3uw7jr4c@git.kernel.org
Link: 353e6f298c.patch
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20 12:00:39 -03:00
Joe Stringer 43371c83f3 samples/bpf: Switch over to libbpf
Now that libbpf under tools/lib/bpf/* is synced with the version from
samples/bpf, we can get rid most of the libbpf library here.

Committer notes:

Built it in a docker fedora rawhide container and ran it in the f25 host, seems
to work just like it did before this patch, i.e. the switch to tools/lib/bpf/
doesn't seem to have introduced problems and Joe said he tested it with
all the entries in samples/bpf/ and other code he found:

  [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux headers_install
  <SNIP>
  [root@f5065a7d6272 linux]# rm -rf /tmp/build/linux/samples/bpf/
  [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
  make[1]: Entering directory '/tmp/build/linux'
    CHK     include/config/kernel.release
    HOSTCC  scripts/basic/fixdep
    GEN     ./Makefile
    CHK     include/generated/uapi/linux/version.h
    Using /git/linux as source for kernel
    CHK     include/generated/utsrelease.h
    HOSTCC  scripts/basic/bin2c
    HOSTCC  arch/x86/tools/relocs_32.o
    HOSTCC  arch/x86/tools/relocs_64.o
    LD      samples/bpf/built-in.o
  <SNIP>
    HOSTCC  samples/bpf/fds_example.o
    HOSTCC  samples/bpf/sockex1_user.o
  /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
  /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
        insns, insns_cnt, "GPL", 0,
        ^~~~~
  In file included from /git/linux/samples/bpf/libbpf.h:5:0,
                   from /git/linux/samples/bpf/bpf_load.h:4,
                   from /git/linux/samples/bpf/fds_example.c:15:
  /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
   int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
       ^~~~~~~~~~~~~~~~
    HOSTCC  samples/bpf/sockex2_user.o
  <SNIP>
    HOSTCC  samples/bpf/xdp_tx_iptunnel_user.o
  clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.2.1/include -I/git/linux/arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated  -I/git/linux/include -I./include -I/git/linux/arch/x86/include/uapi -I/git/linux/include/uapi -I./include/generated/uapi -include /git/linux/include/linux/kconfig.h  \
	  -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
	  -Wno-compare-distinct-pointer-types \
	  -Wno-gnu-variable-sized-type-not-at-end \
	  -Wno-address-of-packed-member -Wno-tautological-compare \
	  -O2 -emit-llvm -c /git/linux/samples/bpf/sockex1_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/sockex1_kern.o
    HOSTLD  samples/bpf/tc_l2_redirect
  <SNIP>
    HOSTLD  samples/bpf/lwt_len_hist
    HOSTLD  samples/bpf/xdp_tx_iptunnel
  make[1]: Leaving directory '/tmp/build/linux'
  [root@f5065a7d6272 linux]#

And then, in the host:

  [root@jouet bpf]# mount | grep "docker.*devicemapper\/"
  /dev/mapper/docker-253:0-1705076-9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 on /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 type xfs (rw,relatime,context="system_u:object_r:container_file_t:s0:c73,c276",nouuid,attr2,inode64,sunit=1024,swidth=1024,noquota)
  [root@jouet bpf]# cd /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9/rootfs/tmp/build/linux/samples/bpf/
  [root@jouet bpf]# file offwaketime
  offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=f423d171e0487b2f802b6a792657f0f3c8f6d155, not stripped
  [root@jouet bpf]# readelf -SW offwaketime
  offwaketime         offwaketime_kern.o  offwaketime_user.o
  [root@jouet bpf]# readelf -SW offwaketime_kern.o
  There are 11 section headers, starting at offset 0x700:

  Section Headers:
    [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
    [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
    [ 1] .strtab           STRTAB          0000000000000000 000658 0000a8 00      0   0  1
    [ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
    [ 3] kprobe/try_to_wake_up PROGBITS        0000000000000000 000040 0000d8 00  AX  0   0  8
    [ 4] .relkprobe/try_to_wake_up REL             0000000000000000 0005a8 000020 10     10   3  8
    [ 5] tracepoint/sched/sched_switch PROGBITS        0000000000000000 000118 000318 00  AX  0   0  8
    [ 6] .reltracepoint/sched/sched_switch REL             0000000000000000 0005c8 000090 10     10   5  8
    [ 7] maps              PROGBITS        0000000000000000 000430 000050 00  WA  0   0  4
    [ 8] license           PROGBITS        0000000000000000 000480 000004 00  WA  0   0  1
    [ 9] version           PROGBITS        0000000000000000 000484 000004 00  WA  0   0  4
    [10] .symtab           SYMTAB          0000000000000000 000488 000120 18      1   4  8
  Key to Flags:
    W (write), A (alloc), X (execute), M (merge), S (strings)
    I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
    O (extra OS processing required) o (OS specific), p (processor specific)
    [root@jouet bpf]# ./offwaketime | head -3
  qemu-system-x86;entry_SYSCALL_64_fastpath;sys_ppoll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;hrtimer_wakeup;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel;start_cpu;;swapper/0 4
  firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 1
  swapper/2;start_cpu;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 61
  [root@jouet bpf]#

Signed-off-by: Joe Stringer <joe@ovn.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: 5c40f54a52.patch
Link: http://lkml.kernel.org/n/tip-xr8twtx7sjh5821g8qw47yxk@git.kernel.org
[ Use -I$(srctree)/tools/lib/ to support out of source code tree builds, as noticed by Wang Nan ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20 12:00:38 -03:00
Arnaldo Carvalho de Melo 96c2fb69b9 samples/bpf: Make perf_event_read() static
While testing Joe's conversion of samples/bpf/ to use tools/lib/bpf/ I noticed
some warnings building samples/bpf/ on a Fedora Rawhide container, with
clang/llvm 3.9 I noticed this:

  [root@1e797fdfbf4f linux]# make -j4 O=/tmp/build/linux/ samples/bpf/
  make[1]: Entering directory '/tmp/build/linux'
    CHK     include/config/kernel.release
    GEN     ./Makefile
    CHK     include/generated/uapi/linux/version.h
    Using /git/linux as source for kernel
  <SNIP>
    HOSTCC  samples/bpf/trace_output_user.o
  /git/linux/samples/bpf/trace_output_user.c:64:6: warning: no previous
  prototype for 'perf_event_read' [-Wmissing-prototypes]
   void perf_event_read(print_fn fn)
        ^~~~~~~~~~~~~~~
    HOSTLD  samples/bpf/trace_output
  make[1]: Leaving directory '/tmp/build/linux'

Shut up the compiler by making that function static.

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Joe Stringer <joe@ovn.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20161215152927.GC6866@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20 09:37:33 -03:00
Linus Torvalds 41e0e24b45 Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild updates from Michal Marek:

 - prototypes for x86 asm-exported symbols (Adam Borowski) and a warning
   about missing CRCs (Nick Piggin)

 - asm-exports fix for LTO (Nicolas Pitre)

 - thin archives improvements (Nick Piggin)

 - linker script fix for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION (Nick
   Piggin)

 - genksyms support for __builtin_va_list keyword

 - misc minor fixes

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  x86/kbuild: enable modversions for symbols exported from asm
  kbuild: fix scripts/adjust_autoksyms.sh* for the no modules case
  scripts/kallsyms: remove last remnants of --page-offset option
  make use of make variable CURDIR instead of calling pwd
  kbuild: cmd_export_list: tighten the sed script
  kbuild: minor improvement for thin archives build
  kbuild: modpost warn if export version crc is missing
  kbuild: keep data tables through dead code elimination
  kbuild: improve linker compatibility with lib-ksyms.o build
  genksyms: Regenerate parser
  kbuild/genksyms: handle va_list type
  kbuild: thin archives for multi-y targets
  kbuild: kallsyms allow 3-pass generation if symbols size has changed
2016-12-17 16:24:13 -08:00
Joe Stringer d40fc181eb samples/bpf: Make samples more libbpf-centric
Switch all of the sample code to use the function names from
tools/lib/bpf so that they're consistent with that, and to declare their
own log buffers. This allow the next commit to be purely devoted to
getting rid of the duplicate library in samples/bpf.

Committer notes:

Testing it:

On a fedora rawhide container, with clang/llvm 3.9, sharing the host
linux kernel git tree:

  # make O=/tmp/build/linux/ headers_install
  # make O=/tmp/build/linux -C samples/bpf/

Since I forgot to make it privileged, just tested it outside the
container, using what it generated:

  # uname -a
  Linux jouet 4.9.0-rc8+ #1 SMP Mon Dec 12 11:20:49 BRT 2016 x86_64 x86_64 x86_64 GNU/Linux
  # cd /var/lib/docker/devicemapper/mnt/c43e09a53ff56c86a07baf79847f00e2cc2a17a1e2220e1adbf8cbc62734feda/rootfs/tmp/build/linux/samples/bpf/
  # ls -la offwaketime
  -rwxr-xr-x. 1 root root 24200 Dec 15 12:19 offwaketime
  # file offwaketime
  offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=c940d3f127d5e66cdd680e42d885cb0b64f8a0e4, not stripped
  # readelf -SW offwaketime_kern.o  | grep PROGBITS
  [ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
  [ 3] kprobe/try_to_wake_up PROGBITS        0000000000000000 000040 0000d8 00  AX  0   0  8
  [ 5] tracepoint/sched/sched_switch PROGBITS        0000000000000000 000118 000318 00  AX  0   0  8
  [ 7] maps              PROGBITS        0000000000000000 000430 000050 00  WA  0   0  4
  [ 8] license           PROGBITS        0000000000000000 000480 000004 00  WA  0   0  1
  [ 9] version           PROGBITS        0000000000000000 000484 000004 00  WA  0   0  4
  # ./offwaketime | head -5
  swapper/1;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 106
  CPU 0/KVM;entry_SYSCALL_64_fastpath;sys_ioctl;do_vfs_ioctl;kvm_vcpu_ioctl;kvm_arch_vcpu_ioctl_run;kvm_vcpu_block;schedule;__schedule;-;try_to_wake_up;swake_up_locked;swake_up;apic_timer_expired;apic_timer_fn;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary;;swapper/3 2
  Compositor;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;futex_requeue;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;SoftwareVsyncTh 5
  firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 13
  JS Helper;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;firefox 2
  #

Signed-off-by: Joe Stringer <joe@ovn.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20161214224342.12858-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15 16:25:47 -03:00
Uwe Kleine-König e19b7cee02 make use of make variable CURDIR instead of calling pwd
make already provides the current working directory in a variable, so make
use of it instead of forking a shell. Also replace usage of PWD by
CURDIR. PWD is provided by most shells, but not all, so this makes the
build system more robust.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Michal Marek <mmarek@suse.com>
2016-12-11 12:12:56 +01:00
Martin KaFai Lau 12d8bb64e3 bpf: xdp: Add XDP example for head adjustment
The XDP prog checks if the incoming packet matches any VIP:PORT
combination in the BPF hashmap.  If it is, it will encapsulate
the packet with a IPv4/v6 header as instructed by the value of
the BPF hashmap and then XDP_TX it out.

The VIP:PORT -> IP-Encap-Info can be specified by the cmd args
of the user prog.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08 14:25:13 -05:00
Sargun Dhillon 9b474ecee5 samples, bpf: Add automated test for cgroup filter attachments
This patch adds the sample program test_cgrp2_attach2. This program is
similar to test_cgrp2_attach, but it performs automated testing of the
cgroupv2 BPF attached filters. It runs the following checks:
* Simple filter attachment
* Application of filters to child cgroups
* Overriding filters on child cgroups
	* Checking that this still works when the parent filter is removed

The filters that are used here are simply allow all / deny all filters, so
it isn't checking the actual functionality of the filters, but rather
the behaviour  around detachment / attachment. If net_cls is enabled,
this test will fail.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 16:07:52 -05:00
Sargun Dhillon 1a922fee66 samples, bpf: Refactor test_current_task_under_cgroup - separate out helpers
This patch modifies test_current_task_under_cgroup_user. The test has
several helpers around creating a temporary environment for cgroup
testing, and moving the current task around cgroups. This set of
helpers can then be used in other tests.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 16:07:11 -05:00
Alexei Starovoitov 69a9d09b22 samples/bpf: silence compiler warnings
silence some of the clang compiler warnings like:
include/linux/fs.h:2693:9: warning: comparison of unsigned enum expression < 0 is always false
arch/x86/include/asm/processor.h:491:30: warning: taking address of packed member 'sp0' of class or structure 'x86_hw_tss' may result in an unaligned pointer value
include/linux/cgroup-defs.h:326:16: warning: field 'cgrp' with variable sized type 'struct cgroup' not at the end of a struct or class is a GNU extension
since they add too much noise to samples/bpf/ build.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 16:01:14 -05:00
David S. Miller 2745529ac7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Couple conflicts resolved here:

1) In the MACB driver, a bug fix to properly initialize the
   RX tail pointer properly overlapped with some changes
   to support variable sized rings.

2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix
   overlapping with a reorganization of the driver to support
   ACPI, OF, as well as PCI variants of the chip.

3) In 'net' we had several probe error path bug fixes to the
   stmmac driver, meanwhile a lot of this code was cleaned up
   and reorganized in 'net-next'.

4) The cls_flower classifier obtained a helper function in
   'net-next' called __fl_delete() and this overlapped with
   Daniel Borkamann's bug fix to use RCU for object destruction
   in 'net'.  It also overlapped with Jiri's change to guard
   the rhashtable_remove_fast() call with a check against
   tc_skip_sw().

5) In mlx4, a revert bug fix in 'net' overlapped with some
   unrelated changes in 'net-next'.

6) In geneve, a stale header pointer after pskb_expand_head()
   bug fix in 'net' overlapped with a large reorganization of
   the same code in 'net-next'.  Since the 'net-next' code no
   longer had the bug in question, there was nothing to do
   other than to simply take the 'net-next' hunks.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 12:29:53 -05:00
David Ahern 554ae6e792 samples/bpf: add userspace example for prohibiting sockets
Add examples preventing a process in a cgroup from opening a socket
based family, protocol and type.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:46:09 -05:00
David Ahern 4f2e7ae56e samples/bpf: Update bpf loader for cgroup section names
Add support for section names starting with cgroup/skb and cgroup/sock.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:46:09 -05:00
David Ahern ad2805dc79 samples: bpf: add userspace example for modifying sk_bound_dev_if
Add a simple program to demonstrate the ability to attach a bpf program
to a cgroup that sets sk_bound_dev_if for AF_INET{6} sockets when they
are created.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:46:08 -05:00
Thomas Graf f74599f7c5 bpf: Add tests and samples for LWT-BPF
Adds a series of tests to verify the functionality of attaching
BPF programs at LWT hooks.

Also adds a sample which collects a histogram of packet sizes which
pass through an LWT hook.

$ ./lwt_len_hist.sh
Starting netserver with host 'IN(6)ADDR_ANY' port '12865' and family AF_UNSPEC
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.253.2 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    39857.69
       1 -> 1        : 0        |                                      |
       2 -> 3        : 0        |                                      |
       4 -> 7        : 0        |                                      |
       8 -> 15       : 0        |                                      |
      16 -> 31       : 0        |                                      |
      32 -> 63       : 22       |                                      |
      64 -> 127      : 98       |                                      |
     128 -> 255      : 213      |                                      |
     256 -> 511      : 1444251  |********                              |
     512 -> 1023     : 660610   |***                                   |
    1024 -> 2047     : 535241   |**                                    |
    2048 -> 4095     : 19       |                                      |
    4096 -> 8191     : 180      |                                      |
    8192 -> 16383    : 5578023  |************************************* |
   16384 -> 32767    : 632099   |***                                   |
   32768 -> 65535    : 6575     |                                      |

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 10:52:00 -05:00
Alexei Starovoitov 9bee294f53 samples/bpf: fix include path
Fix the following build error:
HOSTCC  samples/bpf/test_lru_dist.o
../samples/bpf/test_lru_dist.c:25:22: fatal error: bpf_util.h: No such file or directory

This is due to objtree != srctree.
Use srctree, since that's where bpf_util.h is located.

Fixes: e00c7b216f ("bpf: fix multiple issues in selftest suite and samples")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30 12:42:28 -05:00
Sargun Dhillon 1379fd3c42 samples: bpf: Refactor test_cgrp2_attach -- use getopt, and add mode
This patch modifies test_cgrp2_attach to use getopt so we can use standard
command line parsing.

It also adds an option to run the program in detach only mode. This does
not attach a new filter at the cgroup, but only runs the detach command.

Lastly, it changes the attach code to not detach and then attach. It relies
on the 'hotswap' behaviour of CGroup BPF programs to be able to change
in-place. If detach-then-attach behaviour needs to be tested, the example
can be run in detach only mode prior to attachment.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30 10:29:48 -05:00
Michael Holzheu 2dbb4c05d0 bpf/samples: Fix PT_REGS_IP on s390x and use it
The files "sampleip_kern.c" and "trace_event_kern.c" directly access
"ctx->regs.ip" which is not available on s390x. Fix this and use the
PT_REGS_IP() macro instead.

Also fix the macro for s390x and use "psw.addr" from "pt_regs".

Reported-by: Zvonko Kosic <zvonko.kosic@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 16:26:46 -05:00
Daniel Borkmann e00c7b216f bpf: fix multiple issues in selftest suite and samples
1) The test_lru_map and test_lru_dist fails building on my machine since
   the sys/resource.h header is not included.

2) test_verifier fails in one test case where we try to call an invalid
   function, since the verifier log output changed wrt printing function
   names.

3) Current selftest suite code relies on sysconf(_SC_NPROCESSORS_CONF) for
   retrieving the number of possible CPUs. This is broken at least in our
   scenario and really just doesn't work.

   glibc tries a number of things for retrieving _SC_NPROCESSORS_CONF.
   First it tries equivalent of /sys/devices/system/cpu/cpu[0-9]* | wc -l,
   if that fails, depending on the config, it either tries to count CPUs
   in /proc/cpuinfo, or returns the _SC_NPROCESSORS_ONLN value instead.
   If /proc/cpuinfo has some issue, it returns just 1 worst case. This
   oddity is nothing new [1], but semantics/behaviour seems to be settled.
   _SC_NPROCESSORS_ONLN will parse /sys/devices/system/cpu/online, if
   that fails it looks into /proc/stat for cpuX entries, and if also that
   fails for some reason, /proc/cpuinfo is consulted (and returning 1 if
   unlikely all breaks down).

   While that might match num_possible_cpus() from the kernel in some
   cases, it's really not guaranteed with CPU hotplugging, and can result
   in a buffer overflow since the array in user space could have too few
   number of slots, and on perpcu map lookup, the kernel will write beyond
   that memory of the value buffer.

   William Tu reported such mismatches:

     [...] The fact that sysconf(_SC_NPROCESSORS_CONF) != num_possible_cpu()
     happens when CPU hotadd is enabled. For example, in Fusion when
     setting vcpu.hotadd = "TRUE" or in KVM, setting ./qemu-system-x86_64
     -smp 2, maxcpus=4 ... the num_possible_cpu() will be 4 and sysconf()
     will be 2 [2]. [...]

   Documentation/cputopology.txt says /sys/devices/system/cpu/possible
   outputs cpu_possible_mask. That is the same as in num_possible_cpus(),
   so first step would be to fix the _SC_NPROCESSORS_CONF calls with our
   own implementation. Later, we could add support to bpf(2) for passing
   a mask via CPU_SET(3), for example, to just select a subset of CPUs.

   BPF samples code needs this fix as well (at least so that people stop
   copying this). Thus, define bpf_num_possible_cpus() once in selftests
   and import it from there for the sample code to avoid duplicating it.
   The remaining sysconf(_SC_NPROCESSORS_CONF) in samples are unrelated.

After all three issues are fixed, the test suite runs fine again:

  # make run_tests | grep self
  selftests: test_verifier [PASS]
  selftests: test_maps [PASS]
  selftests: test_lru_map [PASS]
  selftests: test_kmod.sh [PASS]

  [1] https://www.sourceware.org/ml/libc-alpha/2011-06/msg00079.html
  [2] https://www.mail-archive.com/netdev@vger.kernel.org/msg121183.html

Fixes: 3059303f59 ("samples/bpf: update tracex[23] examples to use per-cpu maps")
Fixes: 86af8b4191 ("Add sample for adding simple drop program to link")
Fixes: df570f5772 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
Fixes: e155967179 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_HASH")
Fixes: ebb676daa1 ("bpf: Print function name in addition to function id")
Fixes: 5db58faf98 ("bpf: Add tests for the LRU bpf_htab")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: William Tu <u9012063@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-27 20:38:47 -05:00
Daniel Mack d8c5b17f2b samples: bpf: add userspace example for attaching eBPF programs to cgroups
Add a simple userpace program to demonstrate the new API to attach eBPF
programs to cgroups. This is what it does:

 * Create arraymap in kernel with 4 byte keys and 8 byte values

 * Load eBPF program

   The eBPF program accesses the map passed in to store two pieces of
   information. The number of invocations of the program, which maps
   to the number of packets received, is stored to key 0. Key 1 is
   incremented on each iteration by the number of bytes stored in
   the skb.

 * Detach any eBPF program previously attached to the cgroup

 * Attach the new program to the cgroup using BPF_PROG_ATTACH

 * Once a second, read map[0] and map[1] to see how many bytes and
   packets were seen on any socket of tasks in the given cgroup.

The program takes a cgroup path as 1st argument, and either "ingress"
or "egress" as 2nd. Optionally, "drop" can be passed as 3rd argument,
which will make the generated eBPF program return 0 instead of 1, so
the kernel will drop the packet.

libbpf gained two new wrappers for the new syscall commands.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 16:26:04 -05:00
Alexei Starovoitov db6a71dd9a samples/bpf: fix bpf loader
llvm can emit relocations into sections other than program code
(like debug info sections). Ignore them during parsing of elf file

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24 16:04:52 -05:00
Alexei Starovoitov d2b024d32d samples/bpf: fix sockex2 example
since llvm commit "Do not expand UNDEF SDNode during insn selection lowering"
llvm will generate code that uses uninitialized registers for cases
where C code is actually uses uninitialized data.
So this sockex2 example is technically broken.
Fix it by initializing on the stack variable fully.
Also increase verifier buffer limit, since verifier output
may not fit in 64k for this sockex2 code depending on llvm version.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24 16:04:52 -05:00
Martin KaFai Lau 5db58faf98 bpf: Add tests for the LRU bpf_htab
This patch has some unit tests and a test_lru_dist.

The test_lru_dist reads in the numeric keys from a file.
The files used here are generated by a modified fio-genzipf tool
originated from the fio test suit.  The sample data file can be
found here: https://github.com/iamkafai/bpf-lru

The zipf.* data files have 100k numeric keys and the key is also
ranged from 1 to 100k.

The test_lru_dist outputs the number of unique keys (nr_unique).
F.e. The following means, 61239 of them is unique out of 100k keys.
nr_misses means it cannot be found in the LRU map, so nr_misses
must be >= nr_unique. test_lru_dist also simulates a perfect LRU
map as a comparison:

[root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \
/root/zipf.100k.a1_01.out 4000 1
...
test_parallel_lru_dist (map_type:9 map_flags:0x0):
    task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31603(/100000)
    task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000)
....
test_parallel_lru_dist (map_type:9 map_flags:0x2):
    task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31710(/100000)
    task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000)

[root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \
/root/zipf.100k.a0_01.out 40000 1
...
test_parallel_lru_dist (map_type:9 map_flags:0x0):
    task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67054(/100000)
    task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000)
...
test_parallel_lru_dist (map_type:9 map_flags:0x2):
    task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67068(/100000)
    task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000)

LRU map has also been added to map_perf_test:
/* Global LRU */
[root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
./map_perf_test 16 $i | awk '{r += $3}END{print r " updates"}'; done
 1 cpus: 2934082 updates
 4 cpus: 7391434 updates
 8 cpus: 6500576 updates

/* Percpu LRU */
[root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
./map_perf_test 32 $i | awk '{r += $3}END{print r " updates"}'; done
  1 cpus: 2896553 updates
  4 cpus: 9766395 updates
  8 cpus: 17460553 updates

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15 11:50:43 -05:00
David S. Miller bb598c1b8c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several cases of bug fixes in 'net' overlapping other changes in
'net-next-.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15 10:54:36 -05:00
Martin KaFai Lau 90e02896f1 bpf: Add test for bpf_redirect to ipip/ip6tnl
The test creates two netns, ns1 and ns2.  The host (the default netns)
has an ipip or ip6tnl dev configured for tunneling traffic to the ns2.

    ping VIPS from ns1 <----> host <--tunnel--> ns2 (VIPs at loopback)

The test is to have ns1 pinging VIPs configured at the loopback
interface in ns2.

The VIPs are 10.10.1.102 and 2401:face::66 (which are configured
at lo@ns2). [Note: 0x66 => 102].

At ns1, the VIPs are routed _via_ the host.

At the host, bpf programs are installed at the veth to redirect packets
from a veth to the ipip/ip6tnl.  The test is configured in a way so
that both ingress and egress can be tested.

At ns2, the ipip/ip6tnl dev is configured with the local and remote address
specified.  The return path is routed to the dev ipip/ip6tnl.

During egress test, the host also locally tests pinging the VIPs to ensure
that bpf_redirect at egress also works for the direct egress (i.e. not
forwarding from dev ve1 to ve2).

Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-12 23:38:07 -05:00
David S. Miller 27058af401 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Mostly simple overlapping changes.

For example, David Ahern's adjacency list revamp in 'net-next'
conflicted with an adjacency list traversal bug fix in 'net'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-30 12:42:58 -04:00
Daniel Borkmann 96a8eb1eee bpf: fix samples to add fake KBUILD_MODNAME
Some of the sample files are causing issues when they are loaded with tc
and cls_bpf, meaning tc bails out while trying to parse the resulting ELF
file as program/map/etc sections are not present, which can be easily
spotted with readelf(1).

Currently, BPF samples are including some of the kernel headers and mid
term we should change them to refrain from this, really. When dynamic
debugging is enabled, we bail out due to undeclared KBUILD_MODNAME, which
is easily overlooked in the build as clang spills this along with other
noisy warnings from various header includes, and llc still generates an
ELF file with mentioned characteristics. For just playing around with BPF
examples, this can be a bit of a hurdle to take.

Just add a fake KBUILD_MODNAME as a band-aid to fix the issue, same is
done in xdp*_kern samples already.

Fixes: 65d472fb00 ("samples/bpf: add 'pointer to packet' tests")
Fixes: 6afb1e28b8 ("samples/bpf: Add tunnel set/get tests.")
Fixes: a3f7461734 ("cgroup: bpf: Add an example to do cgroup checking in BPF")
Reported-by: Chandrasekar Kannan <ckannan@console.to>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 14:46:12 -04:00
Daniel Borkmann 5aa5bd14c5 bpf: add initial suite for selftests
Add a start of a test suite for kernel selftests. This moves test_verifier
and test_maps over to tools/testing/selftests/bpf/ along with various
code improvements and also adds a script for invoking test_bpf module.
The test suite can simply be run via selftest framework, f.e.:

  # cd tools/testing/selftests/bpf/
  # make
  # make run_tests

Both test_verifier and test_maps were kind of misplaced in samples/bpf/
directory and we were looking into adding them to selftests for a while
now, so it can be picked up by kbuild bot et al and hopefully also get
more exposure and thus new test case additions.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18 11:35:55 -04:00
Daniel Borkmann 1a776b9ce8 bpf: add various tests around spill/fill of regs
Add several spill/fill tests. Besides others, one that performs xadd
on the spilled register, one ldx/stx test where different types are
spilled from two branches and read out from common path. Verfier does
handle all correctly.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18 11:35:54 -04:00
Josef Bacik 484611357c bpf: allow access into map value arrays
Suppose you have a map array value that is something like this

struct foo {
	unsigned iter;
	int array[SOME_CONSTANT];
};

You can easily insert this into an array, but you cannot modify the contents of
foo->array[] after the fact.  This is because we have no way to verify we won't
go off the end of the array at verification time.  This patch provides a start
for this work.  We accomplish this by keeping track of a minimum and maximum
value a register could be while we're checking the code.  Then at the time we
try to do an access into a MAP_VALUE we verify that the maximum offset into that
region is a valid access into that memory region.  So in practice, code such as
this

unsigned index = 0;

if (foo->iter >= SOME_CONSTANT)
	foo->iter = index;
else
	index = foo->iter++;
foo->array[index] = bar;

would be allowed, as we can verify that index will always be between 0 and
SOME_CONSTANT-1.  If you wish to use signed values you'll have to have an extra
check to make sure the index isn't less than 0, or do something like index %=
SOME_CONSTANT.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-29 01:35:35 -04:00
Naveen N. Rao 973d94d8a8 bpf samples: update tracex5 sample to use __seccomp_filter
seccomp_phase1() does not exist anymore. Instead, update sample to use
__seccomp_filter(). While at it, set max locked memory to unlimited.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27 03:48:58 -04:00
Naveen N. Rao 2b064fff85 bpf samples: fix compiler errors with sockex2 and sockex3
These samples fail to compile as 'struct flow_keys' conflicts with
definition in net/flow_dissector.h. Fix the same by renaming the
structure used in the sample.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27 03:48:58 -04:00
Daniel Borkmann 7d95b0ab5b bpf: add test cases for direct packet access
Add couple of test cases for direct write and the negative size issue, and
also adjust the direct packet access test4 since it asserts that writes are
not possible, but since we've just added support for writes, we need to
invert the verdict to ACCEPT, of course. Summary: 133 PASSED, 0 FAILED.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-20 23:32:11 -04:00
Alexei Starovoitov 173ca26e9b samples/bpf: add comprehensive ipip, ipip6, ip6ip6 test
the test creates 3 namespaces with veth connected via bridge.
First two namespaces simulate two different hosts with the same
IPv4 and IPv6 addresses configured on the tunnel interface and they
communicate with outside world via standard tunnels.
Third namespace creates collect_md tunnel that is driven by BPF
program which selects different remote host (either first or
second namespace) based on tcp dest port number while tcp dst
ip is the same.
This scenario is rough approximation of load balancer use case.
The tests check both traditional tunnel configuration and collect_md mode.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17 10:13:07 -04:00
Alexei Starovoitov a1c82704d1 samples/bpf: extend test_tunnel_bpf.sh with IPIP test
extend existing tests for vxlan, geneve, gre to include IPIP tunnel.
It tests both traditional tunnel configuration and
dynamic via bpf helpers.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17 10:13:07 -04:00
Daniel Borkmann 2d2be8cab2 bpf: fix range propagation on direct packet access
LLVM can generate code that tests for direct packet access via
skb->data/data_end in a way that currently gets rejected by the
verifier, example:

  [...]
   7: (61) r3 = *(u32 *)(r6 +80)
   8: (61) r9 = *(u32 *)(r6 +76)
   9: (bf) r2 = r9
  10: (07) r2 += 54
  11: (3d) if r3 >= r2 goto pc+12
   R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
   R9=pkt(id=0,off=0,r=0) R10=fp
  12: (18) r4 = 0xffffff7a
  14: (05) goto pc+430
  [...]

  from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv
                 R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp
  24: (7b) *(u64 *)(r10 -40) = r1
  25: (b7) r1 = 0
  26: (63) *(u32 *)(r6 +56) = r1
  27: (b7) r2 = 40
  28: (71) r8 = *(u8 *)(r9 +20)
  invalid access to packet, off=20 size=1, R9(id=0,off=0,r=0)

The reason why this gets rejected despite a proper test is that we
currently call find_good_pkt_pointers() only in case where we detect
tests like rX > pkt_end, where rX is of type pkt(id=Y,off=Z,r=0) and
derived, for example, from a register of type pkt(id=Y,off=0,r=0)
pointing to skb->data. find_good_pkt_pointers() then fills the range
in the current branch to pkt(id=Y,off=0,r=Z) on success.

For above case, we need to extend that to recognize pkt_end >= rX
pattern and mark the other branch that is taken on success with the
appropriate pkt(id=Y,off=0,r=Z) type via find_good_pkt_pointers().
Since eBPF operates on BPF_JGT (>) and BPF_JGE (>=), these are the
only two practical options to test for from what LLVM could have
generated, since there's no such thing as BPF_JLT (<) or BPF_JLE (<=)
that we would need to take into account as well.

After the fix:

  [...]
   7: (61) r3 = *(u32 *)(r6 +80)
   8: (61) r9 = *(u32 *)(r6 +76)
   9: (bf) r2 = r9
  10: (07) r2 += 54
  11: (3d) if r3 >= r2 goto pc+12
   R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
   R9=pkt(id=0,off=0,r=0) R10=fp
  12: (18) r4 = 0xffffff7a
  14: (05) goto pc+430
  [...]

  from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=54) R3=pkt_end R4=inv
                 R6=ctx R9=pkt(id=0,off=0,r=54) R10=fp
  24: (7b) *(u64 *)(r10 -40) = r1
  25: (b7) r1 = 0
  26: (63) *(u32 *)(r6 +56) = r1
  27: (b7) r2 = 40
  28: (71) r8 = *(u8 *)(r9 +20)
  29: (bf) r1 = r8
  30: (25) if r8 > 0x3c goto pc+47
   R1=inv56 R2=imm40 R3=pkt_end R4=inv R6=ctx R8=inv56
   R9=pkt(id=0,off=0,r=54) R10=fp
  31: (b7) r1 = 1
  [...]

Verifier test cases are also added in this work, one that demonstrates
the mentioned example here and one that tries a bad packet access for
the current/fall-through branch (the one with types pkt(id=X,off=Y,r=0),
pkt(id=X,off=0,r=0)), then a case with good and bad accesses, and two
with both test variants (>, >=).

Fixes: 969bf05eb3 ("bpf: direct packet access")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:28:37 -07:00
Brendan Gregg 72874418e4 samples/bpf: add sampleip example
sample instruction pointer and frequency count in a BPF map

Signed-off-by: Brendan Gregg <bgregg@netflix.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02 10:46:45 -07:00
Alexei Starovoitov 1c47910ef8 samples/bpf: add perf_event+bpf example
The bpf program is called 50 times a second and does hashmap[kern&user_stackid]++
It's primary purpose to check that key bpf helpers like map lookup, update,
get_stackid, trace_printk and ctx access are all working.
It checks:
- PERF_COUNT_HW_CPU_CYCLES on all cpus
- PERF_COUNT_HW_CPU_CYCLES for current process and inherited perf_events to children
- PERF_COUNT_SW_CPU_CLOCK on all cpus
- PERF_COUNT_SW_CPU_CLOCK for current process

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02 10:46:45 -07:00
William Tu 6afb1e28b8 samples/bpf: Add tunnel set/get tests.
The patch creates sample code exercising bpf_skb_{set,get}_tunnel_key,
and bpf_skb_{set,get}_tunnel_opt for GRE, VXLAN, and GENEVE.  A native
tunnel device is created in a namespace to interact with a lwtunnel
device out of the namespace, with metadata enabled.  The bpf_skb_set_*
program is attached to tc egress and bpf_skb_get_* is attached to egress
qdisc.  A ping between two tunnels is used to verify correctness and
the result of bpf_skb_get_* printed by bpf_trace_printk.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 22:42:44 -07:00
David S. Miller 60747ef4d1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor overlapping changes for both merge conflicts.

Resolution work done by Stephen Rothwell was used
as a reference.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 01:17:32 -04:00
Aaron Yue 1633ac0a2e samples/bpf: add verifier tests for the helper access to the packet
test various corner cases of the helper function access to the packet
via crafted XDP programs.

Signed-off-by: Aaron Yue <haoxuany@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-12 21:56:18 -07:00
Daniel Borkmann 747ea55e4f bpf: fix bpf_skb_in_cgroup helper naming
While hashing out BPF's current_task_under_cgroup helper bits, it came
to discussion that the skb_in_cgroup helper name was suboptimally chosen.

Tejun says:

  So, I think in_cgroup should mean that the object is in that
  particular cgroup while under_cgroup in the subhierarchy of that
  cgroup. Let's rename the other subhierarchy test to under too. I
  think that'd be a lot less confusing going forward.

  [...]

  It's more intuitive and gives us the room to implement the real
  "in" test if ever necessary in the future.

Since this touches uapi bits, we need to change this as long as v4.8
is not yet officially released. Thus, change the helper enum and rename
related bits.

Fixes: 4a482f34af ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
Reference: http://patchwork.ozlabs.org/patch/658500/
Suggested-by: Sargun Dhillon <sargun@sargun.me>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2016-08-12 21:53:33 -07:00
Sargun Dhillon 9e6e60ecbd samples/bpf: Add test_current_task_under_cgroup test
This test has a BPF program which writes the last known pid to call the
sync syscall within a given cgroup to a map.

The user mode program creates its own mount namespace, and mounts the
cgroupsv2  hierarchy in there, as on all current test systems
(Ubuntu 16.04, Debian), the cgroupsv2 vfs is unmounted by default.
Once it does this, it proceeds to test.

The test checks for positive and negative condition. It ensures that
when it's part of a given cgroup, its pid is captured in the map,
and that when it leaves the cgroup, this doesn't happen.

It populate a cgroups arraymap prior to execution in userspace. This means
that the program must be run in the same cgroups namespace as the programs
that are being traced.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-12 21:49:42 -07:00
Adam Barth 05b8ad25bc samples/bpf: fix bpf_perf_event_output prototype
The commit 555c8a8623 ("bpf: avoid stack copy and use skb ctx for event output")
started using 20 of initially reserved upper 32-bits of 'flags' argument
in bpf_perf_event_output(). Adjust corresponding prototype in samples/bpf/bpf_helpers.h

Signed-off-by: Adam Barth <arb@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-10 23:12:31 -07:00
Alexei Starovoitov ba0cc3c153 samples/bpf: add bpf_map_update_elem() tests
increase test coverage to check previously missing 'update when full'

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-06 20:49:19 -04:00
Sargun Dhillon cf9b1199de samples/bpf: Add test/example of using bpf_probe_write_user bpf helper
This example shows using a kprobe to act as a dnat mechanism to divert
traffic for arbitrary endpoints. It rewrite the arguments to a syscall
while they're still in userspace, and before the syscall has a chance
to copy the argument into kernel space.

Although this is an example, it also acts as a test because the mapped
address is 255.255.255.255:555 -> real address, and that's not a legal
address to connect to. If the helper is broken, the example will fail
on the intermediate steps, as well as the final step to verify the
rewrite of userspace memory succeeded.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25 18:07:48 -07:00
Sargun Dhillon 96ae522795 bpf: Add bpf_probe_write_user BPF helper to be called in tracers
This allows user memory to be written to during the course of a kprobe.
It shouldn't be used to implement any kind of security mechanism
because of TOC-TOU attacks, but rather to debug, divert, and
manipulate execution of semi-cooperative processes.

Although it uses probe_kernel_write, we limit the address space
the probe can write into by checking the space with access_ok.
We do this as opposed to calling copy_to_user directly, in order
to avoid sleeping. In addition we ensure the threads's current fs
/ segment is USER_DS and the thread isn't exiting nor a kernel thread.

Given this feature is meant for experiments, and it has a risk of
crashing the system, and running programs, we print a warning on
when a proglet that attempts to use this helper is installed,
along with the pid and process name.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25 18:07:48 -07:00
Brenden Blanco d9094bda5c bpf: make xdp sample variable names more meaningful
The naming choice of index is not terribly descriptive, and dropcnt is
in fact incorrect for xdp2. Pick better names for these: ipproto and
rxcnt.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20 22:07:24 -07:00
Brenden Blanco 764cbccef8 bpf: add sample for xdp forwarding and rewrite
Add a sample that rewrites and forwards packets out on the same
interface. Observed single core forwarding performance of ~10Mpps.

Since the mlx4 driver under test recycles every single packet page, the
perf output shows almost exclusively just the ring management and bpf
program work. Slowdowns are likely occurring due to cache misses.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:33 -07:00
Brenden Blanco 86af8b4191 Add sample for adding simple drop program to link
Add a sample program that only drops packets at the BPF_PROG_TYPE_XDP_RX
hook of a link. With the drop-only program, observed single core rate is
~20Mpps.

Other tests were run, for instance without the dropcnt increment or
without reading from the packet header, the packet rate was mostly
unchanged.

$ perf record -a samples/bpf/xdp1 $(</sys/class/net/eth0/ifindex)
proto 17:   20403027 drops/s

./pktgen_sample03_burst_single_flow.sh -i $DEV -d $IP -m $MAC -t 4
Running... ctrl^C to stop
Device: eth4@0
Result: OK: 11791017(c11788327+d2689) usec, 59622913 (60byte,0frags)
  5056638pps 2427Mb/sec (2427186240bps) errors: 0
Device: eth4@1
Result: OK: 11791012(c11787906+d3106) usec, 60526944 (60byte,0frags)
  5133311pps 2463Mb/sec (2463989280bps) errors: 0
Device: eth4@2
Result: OK: 11791019(c11788249+d2769) usec, 59868091 (60byte,0frags)
  5077431pps 2437Mb/sec (2437166880bps) errors: 0
Device: eth4@3
Result: OK: 11795039(c11792403+d2636) usec, 59483181 (60byte,0frags)
  5043067pps 2420Mb/sec (2420672160bps) errors: 0

perf report --no-children:
 26.05%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_process_rx_cq
 17.84%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_alloc_frags
  5.52%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_free_frag
  4.90%  swapper      [kernel.vmlinux]  [k] poll_idle
  4.14%  ksoftirqd/0  [kernel.vmlinux]  [k] get_page_from_freelist
  2.78%  ksoftirqd/0  [kernel.vmlinux]  [k] __free_pages_ok
  2.57%  ksoftirqd/0  [kernel.vmlinux]  [k] bpf_map_lookup_elem
  2.51%  swapper      [mlx4_en]         [k] mlx4_en_process_rx_cq
  1.94%  ksoftirqd/0  [kernel.vmlinux]  [k] percpu_array_map_lookup_elem
  1.45%  swapper      [mlx4_en]         [k] mlx4_en_alloc_frags
  1.35%  ksoftirqd/0  [kernel.vmlinux]  [k] free_one_page
  1.33%  swapper      [kernel.vmlinux]  [k] intel_idle
  1.04%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c5c5
  0.96%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c58d
  0.93%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c6ee
  0.92%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c6b9
  0.89%  ksoftirqd/0  [kernel.vmlinux]  [k] __alloc_pages_nodemask
  0.83%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c686
  0.83%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c5d5
  0.78%  ksoftirqd/0  [mlx4_en]         [k] mlx4_alloc_pages.isra.23
  0.77%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c5b4
  0.77%  ksoftirqd/0  [kernel.vmlinux]  [k] net_rx_action

machine specs:
 receiver - Intel E5-1630 v3 @ 3.70GHz
 sender - Intel E5645 @ 2.40GHz
 Mellanox ConnectX-3 @40G

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:32 -07:00
Martin KaFai Lau a3f7461734 cgroup: bpf: Add an example to do cgroup checking in BPF
test_cgrp2_array_pin.c:
A userland program that creates a bpf_map (BPF_MAP_TYPE_GROUP_ARRAY),
pouplates/updates it with a cgroup2's backed fd and pins it to a
bpf-fs's file.  The pinned file can be loaded by tc and then used
by the bpf prog later.  This program can also update an existing pinned
array and it could be useful for debugging/testing purpose.

test_cgrp2_tc_kern.c:
A bpf prog which should be loaded by tc.  It is to demonstrate
the usage of bpf_skb_in_cgroup.

test_cgrp2_tc.sh:
A script that glues the test_cgrp2_array_pin.c and
test_cgrp2_tc_kern.c together.  The idea is like:
1. Load the test_cgrp2_tc_kern.o by tc
2. Use test_cgrp2_array_pin.c to populate a BPF_MAP_TYPE_CGROUP_ARRAY
   with a cgroup fd
3. Do a 'ping -6 ff02::1%ve' to ensure the packet has been
   dropped because of a match on the cgroup

Most of the lines in test_cgrp2_tc.sh is the boilerplate
to setup the cgroup/bpf-fs/net-devices/netns...etc.  It is
not bulletproof on errors but should work well enough and
give enough debug info if things did not go well.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01 16:32:13 -04:00
William Tu eb88d58559 samples/bpf: set max locked memory to ulimited
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-25 12:03:46 -04:00
Alexei Starovoitov 883e44e4de samples/bpf: add verifier tests
add few tests for "pointer to packet" logic of the verifier

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:54 -04:00
Alexei Starovoitov 65d472fb00 samples/bpf: add 'pointer to packet' tests
parse_simple.c - packet parser exapmle with single length check that
filters out udp packets for port 9

parse_varlen.c - variable length parser that understand multiple vlan headers,
ipip, ipip6 and ip options to filter out udp or tcp packets on port 9.
The packet is parsed layer by layer with multitple length checks.

parse_ldabs.c - classic style of packet parsing using LD_ABS instruction.
Same functionality as parse_simple.

simple = 24.1Mpps per core
varlen = 22.7Mpps
ldabs  = 21.4Mpps

Parser with LD_ABS instructions is slower than full direct access parser
which does more packet accesses and checks.

These examples demonstrate the choice bpf program authors can make between
flexibility of the parser vs speed.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:54 -04:00
David S. Miller cba6532100 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/ipv4/ip_gre.c

Minor conflicts between tunnel bug fixes in net and
ipv6 tunnel cleanups in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04 00:52:29 -04:00
Jesper Dangaard Brouer bdefbbf2ec samples/bpf: like LLC also verify and allow redefining CLANG command
Users are likely to manually compile both LLVM 'llc' and 'clang'
tools.  Thus, also allow redefining CLANG and verify command exist.

Makefile implementation wise, the target that verify the command have
been generalized.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 14:26:08 -04:00
Jesper Dangaard Brouer b62a796c10 samples/bpf: allow make to be run from samples/bpf/ directory
It is not intuitive that 'make' must be run from the top level
directory with argument "samples/bpf/" to compile these eBPF samples.

Introduce a kbuild make file trick that allow make to be run from the
"samples/bpf/" directory itself.  It basically change to the top level
directory and call "make samples/bpf/" with the "/" slash after the
directory name.

Also add a clean target that only cleans this directory, by taking
advantage of the kbuild external module setting M=$PWD.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 14:25:33 -04:00
Jesper Dangaard Brouer 1c97566d51 samples/bpf: add a README file to get users started
Getting started with using examples in samples/bpf/ is not
straightforward.  There are several dependencies, and specific
versions of these dependencies.

Just compiling the example tool is also slightly obscure, e.g. one
need to call make like:

 make samples/bpf/

Do notice the "/" slash after the directory name.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 14:25:32 -04:00
Jesper Dangaard Brouer 7b01dd5793 samples/bpf: Makefile verify LLVM compiler avail and bpf target is supported
Make compiling samples/bpf more user friendly, by detecting if LLVM
compiler tool 'llc' is available, and also detect if the 'bpf' target
is available in this version of LLVM.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 14:25:32 -04:00
Jesper Dangaard Brouer 6ccfba75d3 samples/bpf: add back functionality to redefine LLC command
It is practical to be-able-to redefine the location of the LLVM
command 'llc', because not all distros have a LLVM version with bpf
target support.  Thus, it is sometimes required to compile LLVM from
source, and sometimes it is not desired to overwrite the distros
default LLVM version.

This feature was removed with 128d1514be ("samples/bpf: Use llc in
PATH, rather than a hardcoded value").

Add this features back. Note that it is possible to redefine the LLC
on the make command like:

 make samples/bpf/ LLC=~/git/llvm/build/bin/llc

Fixes: 128d1514be ("samples/bpf: Use llc in PATH, rather than a hardcoded value")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 14:25:32 -04:00
Alexei Starovoitov 569cc39d39 samples/bpf: fix trace_output example
llvm cannot always recognize memset as builtin function and optimize
it away, so just delete it. It was a leftover from testing
of bpf_perf_event_output() with large data structures.

Fixes: 39111695b1 ("samples: bpf: add bpf_perf_event_output example")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 17:29:45 -04:00
Daniel Borkmann 3f2050e20e bpf, samples: add test cases for raw stack
This adds test cases mostly around ARG_PTR_TO_RAW_STACK to check the
verifier behaviour.

  [...]
  #84 raw_stack: no skb_load_bytes OK
  #85 raw_stack: skb_load_bytes, no init OK
  #86 raw_stack: skb_load_bytes, init OK
  #87 raw_stack: skb_load_bytes, spilled regs around bounds OK
  #88 raw_stack: skb_load_bytes, spilled regs corruption OK
  #89 raw_stack: skb_load_bytes, spilled regs corruption 2 OK
  #90 raw_stack: skb_load_bytes, spilled regs + data OK
  #91 raw_stack: skb_load_bytes, invalid access 1 OK
  #92 raw_stack: skb_load_bytes, invalid access 2 OK
  #93 raw_stack: skb_load_bytes, invalid access 3 OK
  #94 raw_stack: skb_load_bytes, invalid access 4 OK
  #95 raw_stack: skb_load_bytes, invalid access 5 OK
  #96 raw_stack: skb_load_bytes, invalid access 6 OK
  #97 raw_stack: skb_load_bytes, large access OK
  Summary: 98 PASSED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 21:40:42 -04:00
Daniel Borkmann 02413cabd6 bpf, samples: don't zero data when not needed
Remove the zero initialization in the sample programs where appropriate.
Note that this is an optimization which is now possible, old programs
still doing the zero initialization are just fine as well. Also, make
sure we don't have padding issues when we don't memset() the entire
struct anymore.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 21:40:42 -04:00
David S. Miller ae95d71261 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-09 17:41:41 -04:00
Alexei Starovoitov e3edfdec04 samples/bpf: add tracepoint vs kprobe performance tests
the first microbenchmark does
fd=open("/proc/self/comm");
for() {
  write(fd, "test");
}
and on 4 cpus in parallel:
                                      writes per sec
base (no tracepoints, no kprobes)         930k
with kprobe at __set_task_comm()          420k
with tracepoint at task:task_rename       730k

For kprobe + full bpf program manully fetches oldcomm, newcomm via bpf_probe_read.
For tracepint bpf program does nothing, since arguments are copied by tracepoint.

2nd microbenchmark does:
fd=open("/dev/urandom");
for() {
  read(fd, buf);
}
and on 4 cpus in parallel:
                                       reads per sec
base (no tracepoints, no kprobes)         300k
with kprobe at urandom_read()             279k
with tracepoint at random:urandom_read    290k

bpf progs attached to kprobe and tracepoint are noop.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07 21:04:27 -04:00
Alexei Starovoitov 3c9b16448c samples/bpf: tracepoint example
modify offwaketime to work with sched/sched_switch tracepoint
instead of kprobe into finish_task_switch

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07 21:04:27 -04:00
Alexei Starovoitov c07660409e samples/bpf: add tracepoint support to bpf loader
Recognize "tracepoint/" section name prefix and attach the program
to that tracepoint.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07 21:04:27 -04:00
Naveen N. Rao 138d6153a1 samples/bpf: Enable powerpc support
Add the necessary definitions for building bpf samples on ppc.

Since ppc doesn't store function return address on the stack, modify how
PT_REGS_RET() and PT_REGS_FP() work.

Also, introduce PT_REGS_IP() to access the instruction pointer.

Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 16:01:29 -04:00
Naveen N. Rao 128d1514be samples/bpf: Use llc in PATH, rather than a hardcoded value
While at it, remove the generation of .s files and fix some typos in the
related comment.

Cc: Alexei Starovoitov <ast@fb.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 16:01:28 -04:00
Naveen N. Rao 77e63534d6 samples/bpf: Fix build breakage with map_perf_test_user.c
Building BPF samples is failing with the below error:

samples/bpf/map_perf_test_user.c: In function ‘main’:
samples/bpf/map_perf_test_user.c:134:9: error: variable ‘r’ has
initializer but incomplete type
  struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
         ^
samples/bpf/map_perf_test_user.c:134:21: error: ‘RLIM_INFINITY’
undeclared (first use in this function)
  struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
                     ^
samples/bpf/map_perf_test_user.c:134:21: note: each undeclared
identifier is reported only once for each function it appears in
samples/bpf/map_perf_test_user.c:134:9: warning: excess elements in
struct initializer [enabled by default]
  struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
         ^
samples/bpf/map_perf_test_user.c:134:9: warning: (near initialization
for ‘r’) [enabled by default]
samples/bpf/map_perf_test_user.c:134:9: warning: excess elements in
struct initializer [enabled by default]
samples/bpf/map_perf_test_user.c:134:9: warning: (near initialization
for ‘r’) [enabled by default]
samples/bpf/map_perf_test_user.c:134:16: error: storage size of ‘r’
isn’t known
  struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
                ^
samples/bpf/map_perf_test_user.c:139:2: warning: implicit declaration of
function ‘setrlimit’ [-Wimplicit-function-declaration]
  setrlimit(RLIMIT_MEMLOCK, &r);
  ^
samples/bpf/map_perf_test_user.c:139:12: error: ‘RLIMIT_MEMLOCK’
undeclared (first use in this function)
  setrlimit(RLIMIT_MEMLOCK, &r);
            ^
samples/bpf/map_perf_test_user.c:134:16: warning: unused variable ‘r’
[-Wunused-variable]
  struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
                ^
make[2]: *** [samples/bpf/map_perf_test_user.o] Error 1

Fix this by including the necessary header file.

Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 16:01:28 -04:00