alistair23-linux/samples/bpf
Lawrence Brakmo 40304b2a15 bpf: BPF support for sock_ops
Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding
struct that allows BPF programs of this type to access some of the
socket's fields (such as IP addresses, ports, etc.). It uses the
existing bpf cgroups infrastructure so the programs can be attached per
cgroup with full inheritance support. The program will be called at
appropriate times to set relevant connection parameters such as buffer
sizes, SYN and SYN-ACK RTOs, etc., based on connection information such
as IP addresses, port numbers, etc.

Although there are already 3 mechanisms to set parameters (sysctls,
route metrics and setsockopts), this new mechanism provides some
distinct advantages. Unlike sysctls, it can set parameters per
connection. In contrast to route metrics, it can also use port numbers
and information provided by a user level program. In addition, it could
set parameters probabilistically for evaluation purposes (i.e. do
something different on 10% of the flows and compare results with the
other 90% of the flows). Also, in cases where IPv6 addresses contain
geographic information, the rules to make changes based on the distance
(or RTT) between the hosts are much easier than route metric rules and
can be global. Finally, unlike setsockopt, it does not require
application changes and it can be updated easily at any time.

Although the bpf cgroup framework already contains a sock related
program type (BPF_PROG_TYPE_CGROUP_SOCK), I created the new type
(BPF_PROG_TYPE_SOCK_OPS) because the existing type expects to be called
only once during the connection's lifetime. In contrast, the new
program type will be called multiple times from different places in the
network stack code.  For example, before sending SYN and SYN-ACKs to set
an appropriate timeout, when the connection is established to set
congestion control, etc. As a result it has an "op" field to specify the
type of operation requested.

The purpose of this new program type is to simplify setting connection
parameters, such as buffer sizes, TCP's SYN RTO, etc. For example, it is
easy to use Facebook's internal IPv6 addresses to determine if both hosts
of a connection are in the same datacenter. Therefore, it is easy to
write a BPF program to choose a small SYN RTO value when both hosts are
in the same datacenter.
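
As a rough illustration of the datacenter example, the decision logic might
look like the sketch below. It is plain userspace C rather than a loadable
BPF program, so it compiles without kernel headers. The op value
DEMO_OP_TIMEOUT_INIT, the 64-bit "same datacenter" prefix, and the returned
RTO value are assumptions made for illustration; none of them is defined by
this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>   /* AF_INET6 */

/* Userspace mirror of the user-visible struct described in this commit. */
struct bpf_sock_ops {
        uint32_t op;
        union {
                uint32_t reply;
                uint32_t replylong[4];
        };
        uint32_t family;
        uint32_t remote_ip4;     /* network byte order */
        uint32_t local_ip4;      /* network byte order */
        uint32_t remote_ip6[4];  /* network byte order */
        uint32_t local_ip6[4];   /* network byte order */
        uint32_t remote_port;    /* network byte order */
        uint32_t local_port;     /* host byte order */
};

/* Hypothetical values, chosen only for this sketch. */
enum { DEMO_OP_TIMEOUT_INIT = 1 };   /* "give me a SYN RTO" request */
#define DEMO_DC_PREFIX_WORDS 2       /* assume the first 64 bits name the datacenter */
#define DEMO_SHORT_SYN_RTO   10      /* assumed small RTO value, units unspecified */

/* Returns a small SYN RTO when both hosts share the assumed datacenter
 * prefix, or -1 to signal "not supported, use the default". */
int demo_sock_ops(const struct bpf_sock_ops *ops)
{
        assert(ops != NULL);

        if (ops->op != DEMO_OP_TIMEOUT_INIT)
                return -1;
        if (ops->family != AF_INET6)
                return -1;

        /* Both arrays are in network byte order, so comparing the leading
         * words in memory compares the leading bytes of the addresses. */
        if (memcmp(ops->local_ip6, ops->remote_ip6,
                   DEMO_DC_PREFIX_WORDS * sizeof(uint32_t)) != 0)
                return -1;

        return DEMO_SHORT_SYN_RTO;
}
```

A real program of this type would be compiled for the BPF target and attached
to a cgroup; the sketch only shows how the "op" field and the address fields
could drive the decision.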

This patch only contains the framework to support the new BPF program
type; following patches add the functionality to set various connection
parameters.

This patch defines a new BPF program type: BPF_PROG_TYPE_SOCK_OPS
and a new bpf syscall command to load a new program of this type:
BPF_PROG_LOAD_SOCKET_OPS.

Two new corresponding structs (one for the kernel, one for the user/BPF
program):

/* kernel version */
struct bpf_sock_ops_kern {
        struct sock *sk;
        __u32  op;
        union {
                __u32 reply;
                __u32 replylong[4];
        };
};

/* user version
 * Some fields are in network byte order reflecting the sock struct
 * Use the bpf_ntohl helper macro in samples/bpf/bpf_endian.h to
 * convert them to host byte order.
 */
struct bpf_sock_ops {
        __u32 op;
        union {
                __u32 reply;
                __u32 replylong[4];
        };
        __u32 family;
        __u32 remote_ip4;     /* In network byte order */
        __u32 local_ip4;      /* In network byte order */
        __u32 remote_ip6[4];  /* In network byte order */
        __u32 local_ip6[4];   /* In network byte order */
        __u32 remote_port;    /* In network byte order */
        __u32 local_port;     /* In host byte order */
};
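
The mixed byte order of these fields is easy to trip over. The sketch below
shows one way the conversions might be handled. It is userspace C and uses
ntohl() from <arpa/inet.h> as a stand-in for the bpf_ntohl macro mentioned in
the comment above; the helper names and the 10.0.0.0/8 check are hypothetical
and exist only for illustration.

```c
#include <arpa/inet.h>   /* ntohl(), htonl() */
#include <assert.h>
#include <stdint.h>

/* remote_ip4 and local_ip4 are in network byte order: convert before
 * doing arithmetic on them. Returns 1 if the address is in 10.0.0.0/8. */
int demo_ip4_in_net10(uint32_t ip4_net)
{
        uint32_t ip4_host = ntohl(ip4_net);

        return (ip4_host >> 24) == 10;
}

/* remote_port is in network byte order while local_port is already in
 * host byte order, so only the remote side needs converting. */
int demo_ports_match(uint32_t remote_port_net, uint32_t local_port_host)
{
        assert(local_port_host <= 0xFFFF);   /* ports are 16-bit values */

        return ntohl(remote_port_net) == local_port_host;
}
```

Comparing remote_port and local_port directly, without the conversion, would
give wrong results on little-endian hosts.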

Currently there are two types of ops. The first type expects the BPF
program to return a value which is then used by the caller (or a
negative value to indicate the operation is not supported). The second
type expects state changes to be done by the BPF program, for example
through a setsockopt BPF helper function, and they ignore the return
value.

The reply fields of the bpf_sock_ops struct are there in case a bpf
program needs to return a value larger than an integer.
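
The two kinds of ops, and the role of the reply fields, can be sketched in
userspace C as follows. This is a simplified model of the description above,
not the kernel's actual plumbing: the struct is trimmed to the fields that
matter here, and the op values, the counter standing in for a setsockopt-style
helper, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed mirror of the struct: only op and the reply union. */
struct demo_sock_ops {
        uint32_t op;
        union {
                uint32_t reply;
                uint32_t replylong[4];
        };
};

/* Hypothetical op values. */
enum {
        DEMO_OP_GET_VALUE = 1,  /* first type: caller uses the return value */
        DEMO_OP_NOTIFY    = 2,  /* second type: program changes state itself */
        DEMO_OP_GET_WIDE  = 3,  /* first type, answer wider than an int */
};

typedef int (*demo_prog_fn)(struct demo_sock_ops *ops);

/* Stand-in for state a program would change through a helper. */
static int demo_state_changes;

/* Example "program" handling all three ops. */
int demo_prog(struct demo_sock_ops *ops)
{
        switch (ops->op) {
        case DEMO_OP_GET_VALUE:
                return 42;
        case DEMO_OP_NOTIFY:
                demo_state_changes++;
                return 0;
        case DEMO_OP_GET_WIDE:
                /* Too large for the return value: use the reply array. */
                for (size_t i = 0; i < 4; i++)
                        ops->replylong[i] = 0x1000u + (uint32_t)i;
                return 0;
        default:
                return -1;      /* operation not supported */
        }
}

/* Caller for the first type: a negative result means "use the default". */
int demo_query(demo_prog_fn prog, uint32_t op, int fallback)
{
        struct demo_sock_ops ops = { .op = op };
        int ret;

        assert(prog != NULL);
        ret = prog(&ops);
        return ret < 0 ? fallback : ret;
}

/* Caller for the second type: the return value is ignored. */
void demo_notify(demo_prog_fn prog, uint32_t op)
{
        struct demo_sock_ops ops = { .op = op };

        assert(prog != NULL);
        (void)prog(&ops);
}
```

For example, demo_query(demo_prog, DEMO_OP_GET_VALUE, 7) yields the program's
value, while an unknown op falls back to 7.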

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:13 -07:00
bpf_helpers.h bpf: Add MIPS support to samples/bpf. 2017-06-14 15:03:22 -04:00
bpf_load.c bpf: BPF support for sock_ops 2017-07-01 16:15:13 -07:00
bpf_load.h samples/bpf: export map_data[] for more info on maps 2017-05-03 09:30:24 -04:00
cgroup_helpers.c samples, bpf: Refactor test_current_task_under_cgroup - separate out helpers 2016-12-03 16:07:11 -05:00
cgroup_helpers.h samples, bpf: Refactor test_current_task_under_cgroup - separate out helpers 2016-12-03 16:07:11 -05:00
cookie_uid_helper_example.c samples/bpf: run cleanup routines when receiving SIGTERM 2017-05-11 21:43:30 -04:00
fds_example.c samples/bpf: Move open_raw_sock to separate header 2016-12-20 12:00:40 -03:00
lathist_kern.c
lathist_user.c samples/bpf: Make samples more libbpf-centric 2016-12-15 16:25:47 -03:00
libbpf.h A Sample of using socket cookie and uid for traffic monitoring 2017-03-23 17:01:57 -07:00
lwt_len_hist.sh bpf: Add tests and samples for LWT-BPF 2016-12-02 10:52:00 -05:00
lwt_len_hist_kern.c bpf: Add tests and samples for LWT-BPF 2016-12-02 10:52:00 -05:00
lwt_len_hist_user.c samples/bpf: Make samples more libbpf-centric 2016-12-15 16:25:47 -03:00
Makefile samples/bpf: fix a build problem 2017-06-22 11:35:19 -04:00
map_perf_test_kern.c bpf: lru: Add map-in-map LRU example 2017-04-17 13:55:52 -04:00
map_perf_test_user.c samples/bpf: load_bpf.c make callback fixup more flexible 2017-05-03 09:30:24 -04:00
offwaketime_kern.c
offwaketime_user.c samples/bpf: run cleanup routines when receiving SIGTERM 2017-05-11 21:43:30 -04:00
parse_ldabs.c bpf: fix samples to add fake KBUILD_MODNAME 2016-10-29 14:46:12 -04:00
parse_simple.c bpf: fix samples to add fake KBUILD_MODNAME 2016-10-29 14:46:12 -04:00
parse_varlen.c bpf: fix samples to add fake KBUILD_MODNAME 2016-10-29 14:46:12 -04:00
README.rst samples/bpf: Switch over to libbpf 2016-12-20 12:00:38 -03:00
run_cookie_uid_helper_example.sh Sample program using SO_COOKIE 2017-04-08 08:07:01 -07:00
sampleip_kern.c bpf/samples: Fix PT_REGS_IP on s390x and use it 2016-11-28 16:26:46 -05:00
sampleip_user.c samples/bpf: run cleanup routines when receiving SIGTERM 2017-05-11 21:43:30 -04:00
sock_example.c samples/bpf: Move open_raw_sock to separate header 2016-12-20 12:00:40 -03:00
sock_example.h samples/bpf sock_example: Avoid getting ethhdr from two includes 2016-12-27 21:49:17 -03:00
sock_flags_kern.c samples/bpf: add userspace example for prohibiting sockets 2016-12-02 13:46:09 -05:00
sockex1_kern.c
sockex1_user.c samples/bpf: Move open_raw_sock to separate header 2016-12-20 12:00:40 -03:00
sockex2_kern.c samples/bpf: fix sockex2 example 2016-11-24 16:04:52 -05:00
sockex2_user.c samples/bpf: Move open_raw_sock to separate header 2016-12-20 12:00:40 -03:00
sockex3_kern.c bpf samples: fix compiler errors with sockex2 and sockex3 2016-09-27 03:48:58 -04:00
sockex3_user.c bpf: Add test for syscall on fd array/htab lookup 2017-06-29 13:13:26 -04:00
spintest_kern.c
spintest_user.c samples/bpf: Make samples more libbpf-centric 2016-12-15 16:25:47 -03:00
syscall_nrs.c samples/bpf: Fix tracex5 to work with MIPS syscalls. 2017-06-14 15:03:23 -04:00
tc_l2_redirect.sh bpf: Add test for bpf_redirect to ipip/ip6tnl 2016-11-12 23:38:07 -05:00
tc_l2_redirect_kern.c bpf: fix samples xdp_tx_iptunnel and tc_l2_redirect with fake KBUILD_MODNAME 2017-01-20 12:04:07 -05:00
tc_l2_redirect_user.c samples/bpf: Make samples more libbpf-centric 2016-12-15 16:25:47 -03:00
tcbpf1_kern.c bpf: fix samples to add fake KBUILD_MODNAME 2016-10-29 14:46:12 -04:00
tcbpf2_kern.c bpf: fix samples to add fake KBUILD_MODNAME 2016-10-29 14:46:12 -04:00
test_cgrp2_array_pin.c samples/bpf: Make samples more libbpf-centric 2016-12-15 16:25:47 -03:00
test_cgrp2_attach.c bpf: introduce BPF_F_ALLOW_OVERRIDE flag 2017-02-12 21:52:19 -05:00
test_cgrp2_attach2.c bpf: introduce BPF_F_ALLOW_OVERRIDE flag 2017-02-12 21:52:19 -05:00
test_cgrp2_sock.c bpf: introduce BPF_F_ALLOW_OVERRIDE flag 2017-02-12 21:52:19 -05:00
test_cgrp2_sock.sh samples: bpf: add userspace example for modifying sk_bound_dev_if 2016-12-02 13:46:08 -05:00
test_cgrp2_sock2.c bpf: introduce BPF_F_ALLOW_OVERRIDE flag 2017-02-12 21:52:19 -05:00
test_cgrp2_sock2.sh samples/bpf: add userspace example for prohibiting sockets 2016-12-02 13:46:09 -05:00
test_cgrp2_tc.sh
test_cgrp2_tc_kern.c bpf: fix samples to add fake KBUILD_MODNAME 2016-10-29 14:46:12 -04:00
test_cls_bpf.sh
test_current_task_under_cgroup_kern.c samples/bpf: Add test_current_task_under_cgroup test 2016-08-12 21:49:42 -07:00
test_current_task_under_cgroup_user.c samples/bpf: Make samples more libbpf-centric 2016-12-15 16:25:47 -03:00
test_ipip.sh samples/bpf: add comprehensive ipip, ipip6, ip6ip6 test 2016-09-17 10:13:07 -04:00
test_lru_dist.c samples/bpf: check before defining offsetof 2017-04-24 16:20:19 -04:00
test_lwt_bpf.c bpf: Add tests and samples for LWT-BPF 2016-12-02 10:52:00 -05:00
test_lwt_bpf.sh bpf: Add tests and samples for LWT-BPF 2016-12-02 10:52:00 -05:00
test_map_in_map_kern.c bpf: Add tests for map-in-map 2017-03-22 15:45:45 -07:00
test_map_in_map_user.c bpf: Add test for syscall on fd array/htab lookup 2017-06-29 13:13:26 -04:00
test_overhead_kprobe_kern.c
test_overhead_tp_kern.c
test_overhead_user.c
test_probe_write_user_kern.c samples/bpf: Add test/example of using bpf_probe_write_user bpf helper 2016-07-25 18:07:48 -07:00
test_probe_write_user_user.c samples/bpf: Make samples more libbpf-centric 2016-12-15 16:25:47 -03:00
test_tunnel_bpf.sh samples/bpf: extend test_tunnel_bpf.sh with IPIP test 2016-09-17 10:13:07 -04:00
trace_event_kern.c bpf/samples: Fix PT_REGS_IP on s390x and use it 2016-11-28 16:26:46 -05:00
trace_event_user.c samples/bpf: add tests for more perf event types 2017-06-04 21:58:15 -04:00
trace_output_kern.c
trace_output_user.c samples/bpf trace_output_user: Remove duplicate sys/ioctl.h include 2016-12-28 10:47:13 -03:00
tracex1_kern.c
tracex1_user.c
tracex2_kern.c
tracex2_user.c samples/bpf: run cleanup routines when receiving SIGTERM 2017-05-11 21:43:30 -04:00
tracex3_kern.c
tracex3_user.c samples/bpf: adjust rlimit RLIMIT_MEMLOCK for traceex2, tracex3 and tracex4 2017-05-03 09:30:23 -04:00
tracex4_kern.c
tracex4_user.c samples/bpf: adjust rlimit RLIMIT_MEMLOCK for traceex2, tracex3 and tracex4 2017-05-03 09:30:23 -04:00
tracex5_kern.c samples/bpf: Fix tracex5 to work with MIPS syscalls. 2017-06-14 15:03:23 -04:00
tracex5_user.c bpf samples: update tracex5 sample to use __seccomp_filter 2016-09-27 03:48:58 -04:00
tracex6_kern.c samples/bpf: add tests for more perf event types 2017-06-04 21:58:15 -04:00
tracex6_user.c samples/bpf: add tests for more perf event types 2017-06-04 21:58:15 -04:00
xdp1_kern.c bpf: make xdp sample variable names more meaningful 2016-07-20 22:07:24 -07:00
xdp1_user.c samples/bpf: run cleanup routines when receiving SIGTERM 2017-05-11 21:43:30 -04:00
xdp2_kern.c bpf: make xdp sample variable names more meaningful 2016-07-20 22:07:24 -07:00
xdp_tx_iptunnel_common.h bpf: xdp: Add XDP example for head adjustment 2016-12-08 14:25:13 -05:00
xdp_tx_iptunnel_kern.c bpf: fix samples xdp_tx_iptunnel and tc_l2_redirect with fake KBUILD_MODNAME 2017-01-20 12:04:07 -05:00
xdp_tx_iptunnel_user.c samples/bpf: run cleanup routines when receiving SIGTERM 2017-05-11 21:43:30 -04:00

eBPF sample programs
====================

This directory contains test stubs, a verifier test suite, and examples
for using eBPF. The examples use libbpf from tools/lib/bpf.

Build dependencies
==================

Compiling requires having installed:
 * clang >= version 3.4.0
 * llvm >= version 3.7.1

Note that LLVM's tool 'llc' must support target 'bpf'. List the version
and supported targets with the command ``llc --version``.

Kernel headers
--------------

There are usually dependencies on header files of the current kernel.
To avoid installing devel kernel headers system-wide, as a normal
user, simply call::

 make headers_install

This will create a local "usr/include" directory in the git/build top
level directory, which the make system automatically picks up first.

Compiling
=========

For building the BPF samples, issue the below command from the kernel
top level directory::

 make samples/bpf/

Note the trailing "/" slash after the directory name.

It is also possible to call make from this directory.  This will just
hide the invocation of make as above with the appended "/".

Manually compiling LLVM with 'bpf' support
------------------------------------------

Since version 3.7.0, LLVM has included a proper backend target for the
BPF bytecode architecture.

By default llvm will build all non-experimental backends including bpf.
To generate a smaller llc binary one can use::

 -DLLVM_TARGETS_TO_BUILD="BPF"

Quick snippet for manually compiling LLVM and clang
(build dependencies are cmake and gcc-c++)::

 $ git clone http://llvm.org/git/llvm.git
 $ cd llvm/tools
 $ git clone --depth 1 http://llvm.org/git/clang.git
 $ cd ..; mkdir build; cd build
 $ cmake .. -DLLVM_TARGETS_TO_BUILD="BPF;X86"
 $ make -j $(getconf _NPROCESSORS_ONLN)

It is also possible to point make to the newly compiled 'llc' or
'clang' command by redefining LLC or CLANG on the make command line::

 make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang