alistair23-linux

redonkable

Author	SHA1	Message	Date
Daniel T. Lee	0afe0a998c	samples: bpf: Fix lwt_len_hist reusing previous BPF map Currently, lwt_len_hist's map lwt_len_hist_map uses pinning, and the map isn't cleared on test end. This leads to reuse of that map for each test, which prevents the results of the test from being accurate. This commit fixes the problem by removing the pinned map from bpffs. Also, this commit adds the executable permission to shell script files. Fixes: `f74599f7c5` ("bpf: Add tests and samples for LWT-BPF") Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201124090310.24374-7-danieltimlee@gmail.com	2020-11-26 19:33:36 -08:00
Daniel T. Lee	c6497df0dd	samples: bpf: Refactor test_overhead program with libbpf This commit refactors the existing program with libbpf bpf loader. Since kprobe, tracepoint and raw_tracepoint bpf programs can all be attached with the single bpf_program__attach() interface, the corresponding libbpf function is used here. Rather than hard-coding the number of CPUs, this commit uses the number of available CPUs from _SC_NPROCESSORS_ONLN. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201124090310.24374-6-danieltimlee@gmail.com	2020-11-26 19:33:36 -08:00
Daniel T. Lee	763af200d6	samples: bpf: Refactor ibumad program with libbpf This commit refactors the existing ibumad program with libbpf bpf loader. Attach/detach of tracepoint bpf programs is now managed with the generic bpf_program__attach() and bpf_link__destroy() from libbpf. Also, instead of using the previous BPF map definition, this commit converts the ibumad map definition to the new BTF-defined map format. To verify that this bpf program works without an infiniband device, try loading the ib_umad kernel module and test the program as follows: # modprobe ib_umad # ./ibumad Moreover, TRACE_HELPERS has been removed from the Makefile since it is not used by this program. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201124090310.24374-5-danieltimlee@gmail.com	2020-11-26 19:33:36 -08:00
Daniel T. Lee	4fe6641526	samples: bpf: Refactor task_fd_query program with libbpf This commit refactors the existing kprobe program with libbpf bpf loader. To attach the bpf program, this uses the generic bpf_program__attach() approach rather than bpf_load's load_bpf_file(). To attach bpf to perf_event, instead of using the previous ioctl method, this commit uses bpf_program__attach_perf_event, since it manages enabling the perf_event and attaching the BPF program to it, which is a much more intuitive approach. Also, the explicit close(fd) has been removed since the event will be closed inside bpf_link__destroy() automatically. Furthermore, to prevent conflicts between identically named uprobe events, the O_TRUNC flag has been used to clear the 'uprobe_events' interface. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201124090310.24374-4-danieltimlee@gmail.com	2020-11-26 19:33:35 -08:00
Daniel T. Lee	d89af13c92	samples: bpf: Refactor test_cgrp2_sock2 program with libbpf This commit refactors the existing cgroup program with libbpf bpf loader. The original test_cgrp2_sock2 kept the bpf program attached to the cgroup hierarchy even after the user program exited. To implement the same functionality with libbpf, this commit uses BPF_LINK_PINNING to keep the link attachment pinned even after it is closed. Since this uses LINK instead of ATTACH, detaching the bpf program from the cgroup with 'test_cgrp2_sock' is no longer used. Code to mount bpffs was added to the .sh file in case bpffs is not mounted on /sys/fs/bpf. Additionally, to fix the problem that the shell script cannot find the binary in the current path, the relative path './' has been added in front of the binary. Fixes: `554ae6e792` ("samples/bpf: add userspace example for prohibiting sockets") Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201124090310.24374-3-danieltimlee@gmail.com	2020-11-26 19:33:35 -08:00
Daniel T. Lee	c5815ac7e2	samples: bpf: Refactor hbm program with libbpf This commit refactors the existing cgroup programs with libbpf bpf loader. Since bpf_program__attach doesn't support cgroup program attachment, this explicitly attaches the cgroup bpf program with bpf_program__attach_cgroup(bpf_prog, cg1). Also, to change the attach_type of the bpf program, this uses libbpf's bpf_program__set_expected_attach_type helper to switch EGRESS to INGRESS. To keep the bpf program attached to the cgroup hierarchy even after exit, this commit uses BPF_LINK_PINNING to keep the link attachment pinned even after it is closed. Besides, this program was broken due to a typo in a BPF map definition. This commit solves the problem by changing the struct of the 'queue_stats' map from hvm_queue_stats to hbm_queue_stats. Fixes: `36b5d47113` ("selftests/bpf: samples/bpf: Split off legacy stuff from bpf_helpers.h") Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201124090310.24374-2-danieltimlee@gmail.com	2020-11-26 19:33:35 -08:00
Andrei Matei	fb3558127c	bpf: Fix selftest compilation on clang 11 Before this patch, profiler.inc.h wouldn't compile with clang-11 (before the __builtin_preserve_enum_value LLVM builtin was introduced in https://reviews.llvm.org/D83242). Another test that uses this builtin (test_core_enumval) is conditionally skipped if the compiler is too old. In that spirit, this patch inhibits part of populate_cgroup_info(), which needs this CO-RE builtin. The selftests build again on clang-11. The affected test (the profiler test) doesn't pass on clang-11 because it's missing https://reviews.llvm.org/D85570, but at least the test suite as a whole compiles. The test's expected failure is already called out in the README. Signed-off-by: Andrei Matei <andreimatei1@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Florian Lehner <dev@der-flo.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20201125035255.17970-1-andreimatei1@gmail.com	2020-11-26 00:25:55 +01:00
KP Singh	34b82d3ac1	bpf: Add a selftest for bpf_ima_inode_hash The test does the following: - Mounts a loopback filesystem and appends the IMA policy to measure executions only on this file-system. Restricting the IMA policy to a particular filesystem prevents a system-wide IMA policy change. - Executes an executable copied to this loopback filesystem. - Calls the bpf_ima_inode_hash in the bprm_committed_creds hook and checks if the call succeeded and checks if a hash was calculated. The test shells out to the added ima_setup.sh script as the setup is better handled in a shell script and is more complicated to do in the test program or even shelling out individual commands from C. The list of required configs (i.e. IMA, SECURITYFS, IMA_{WRITE,READ}_POLICY) for running this test is also updated. Suggested-by: Mimi Zohar <zohar@linux.ibm.com> (limit policy rule to loopback mount) Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20201124151210.1081188-4-kpsingh@chromium.org	2020-11-26 00:25:47 +01:00
KP Singh	27672f0d28	bpf: Add a BPF helper for getting the IMA hash of an inode Provide a wrapper function to get the IMA hash of an inode. This helper is useful in fingerprinting files (e.g. executables on execution) and using these fingerprints in detections like an executable unlinking itself. Since ima_inode_hash can sleep, it's only allowed for sleepable LSM hooks. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20201124151210.1081188-3-kpsingh@chromium.org	2020-11-26 00:04:04 +01:00
KP Singh	403319be5d	ima: Implement ima_inode_hash This is in preparation to add a helper for BPF LSM programs to use IMA hashes when attached to LSM hooks. There are LSM hooks like inode_unlink which do not have a struct file * argument and cannot use the existing ima_file_hash API. An inode based API is, therefore, useful in LSM based detections like an executable trying to delete itself which rely on the inode_unlink LSM hook. Moreover, the ima_file_hash function does nothing with the struct file pointer apart from calling file_inode on it and converting it to an inode. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Mimi Zohar <zohar@linux.ibm.com> Link: https://lore.kernel.org/bpf/20201124151210.1081188-2-kpsingh@chromium.org	2020-11-26 00:04:04 +01:00
Li RongQing	db13db9f67	libbpf: Add support for canceling cached_cons advance Add a new function for returning descriptors the user received after an xsk_ring_cons__peek call. After the application has gotten a number of descriptors from a ring, it might not be able to or want to process them all for various reasons. Therefore, it would be useful to have an interface for returning or cancelling a number of them so that they are returned to the ring. This patch adds a new function called xsk_ring_cons__cancel that performs this operation on nb descriptors counted from the end of the batch of descriptors that was received through the peek call. Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> [ Magnus Karlsson: rewrote changelog ] Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/1606202474-8119-1-git-send-email-lirongqing@baidu.com	2020-11-25 13:18:47 +01:00
Wedson Almeida Filho	59e2e27d22	bpf: Refactor check_cfg to use a structured loop. The current implementation uses a number of gotos to implement a loop and different paths within the loop, which makes the code less readable than it would be with an explicit while-loop. This patch also replaces a chain of if/if-elses keyed on the same expression with a switch statement. No change in behaviour is intended. Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201121015509.3594191-1-wedsonaf@google.com	2020-11-24 20:29:26 -08:00
Andrii Nakryiko	607c543f93	bpf: Sanitize BTF data pointer after module is loaded Given the .BTF section is not allocatable, it will get trimmed after the module is loaded. The BPF subsystem handles that properly by creating an independent copy of the data. Still, prevent any accidental misuse by resetting the pointer to the BTF data. Fixes: `36e68442d1` ("bpf: Load and verify kernel module BTFs") Suggested-by: Jessica Yu <jeyu@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jessica Yu <jeyu@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/bpf/20201121070829.2612884-2-andrii@kernel.org	2020-11-25 00:05:21 +01:00
Andrii Nakryiko	e732b538f4	kbuild: Skip module BTF generation for out-of-tree external modules In some modes of operation, Kbuild allows building modules without a vmlinux image around. In such a case, generation of module BTF is impossible. This patch changes the behavior to emit a warning about the impossibility of generating kernel module BTF, instead of breaking the build. This is especially important for out-of-tree external module builds. In vmlinux-less mode: $ make clean $ make modules_prepare $ touch drivers/acpi/button.c $ make M=drivers/acpi ... CC [M] drivers/acpi/button.o MODPOST drivers/acpi/Module.symvers LD [M] drivers/acpi/button.ko BTF [M] drivers/acpi/button.ko Skipping BTF generation for drivers/acpi/button.ko due to unavailability of vmlinux ... $ readelf -S ~/linux-build/default/drivers/acpi/button.ko | grep BTF -A1 ... empty ... Now with normal build: $ make all ... LD [M] drivers/acpi/button.ko BTF [M] drivers/acpi/button.ko ... $ readelf -S ~/linux-build/default/drivers/acpi/button.ko | grep BTF -A1 [60] .BTF PROGBITS 0000000000000000 00029310 000000000000ab3f 0000000000000000 0 0 1 Fixes: `5f9ae91f7c` ("kbuild: Build kernel module BTFs if BTF is enabled and pahole supports it") Reported-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jessica Yu <jeyu@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Link: https://lore.kernel.org/bpf/20201121070829.2612884-1-andrii@kernel.org	2020-11-25 00:05:01 +01:00
Andrei Matei	1c26ac6ab3	selftest/bpf: Fix rst formatting in readme A couple of places in the readme had invalid rst formatting causing the rendering to be off. This patch fixes them with minimal edits. Signed-off-by: Andrei Matei <andreimatei1@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20201122022205.57229-2-andreimatei1@gmail.com	2020-11-24 22:59:52 +01:00
Andrei Matei	05a98d7672	selftest/bpf: Fix link in readme The link was bad because of invalid rst; it was pointing to itself and was rendering badly. Signed-off-by: Andrei Matei <andreimatei1@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20201122022205.57229-1-andreimatei1@gmail.com	2020-11-24 22:59:52 +01:00
Song Liu	91b2db27d3	bpf: Simplify task_file_seq_get_next() Simplify task_file_seq_get_next() by removing two in/out arguments: task and fstruct. Use info->task and info->files instead. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20201120002833.2481110-1-songliubraving@fb.com	2020-11-20 20:36:34 +01:00
Yonghong Song	450d060e8f	bpftool: Add {i,d}tlb_misses support for bpftool profile Commit 47c09d6a9f67 ("bpftool: Introduce "prog profile" command") introduced the "bpftool prog profile" command, which can be used to profile a bpf program with metrics like # of instructions. This patch adds support for itlb_misses and dtlb_misses. During an internal bpf program performance evaluation, I found these two metrics are also very useful. The following is an example output: $ bpftool prog profile id 324 duration 3 cycles itlb_misses 1885029 run_cnt 5134686073 cycles 306893 itlb_misses $ bpftool prog profile id 324 duration 3 cycles dtlb_misses 1827382 run_cnt 4943593648 cycles 5975636 dtlb_misses $ bpftool prog profile id 324 duration 3 cycles llc_misses 1836527 run_cnt 5019612972 cycles 4161041 llc_misses From the above, we can see quite a few dtlb misses, about 3 dtlb misses per prog run. This might be something worth further investigation. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201119073039.4060095-1-yhs@fb.com	2020-11-20 15:50:38 +01:00
Andrii Nakryiko	4e99d115d8	Merge branch 'RISC-V selftest/bpf fixes' Björn Töpel says: ==================== This series contains some fixes for selftests/bpf when building/running on a RISC-V host. Details can be found in each individual commit. v2: Makefile cosmetics. (Andrii) Simplified unpriv check and added comment. (Andrii) ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2020-11-18 17:45:39 -08:00
Björn Töpel	6007b23cc7	selftests/bpf: Mark tests that require unaligned memory access A lot of tests require unaligned memory access to work. Mark the tests as such, so that they can be avoided on unsupported architectures such as RISC-V. Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Luke Nelson <luke.r.nels@gmail.com> Link: https://lore.kernel.org/bpf/20201118071640.83773-4-bjorn.topel@gmail.com	2020-11-18 17:45:35 -08:00
Björn Töpel	c77b0589ca	selftests/bpf: Avoid running unprivileged tests with alignment requirements Some architectures have strict alignment requirements. In that case, the BPF verifier detects if a program has unaligned accesses and rejects them. A user can pass BPF_F_ANY_ALIGNMENT to a program to override this check. That, however, will only work when a privileged user loads a program. An unprivileged user loading a program with this flag will be rejected prior to entering the verifier. Hence, it does not make sense to load unprivileged programs without strict alignment when testing the verifier. This patch avoids exactly that. Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Luke Nelson <luke.r.nels@gmail.com> Link: https://lore.kernel.org/bpf/20201118071640.83773-3-bjorn.topel@gmail.com	2020-11-18 17:45:31 -08:00
Björn Töpel	6016df8fe8	selftests/bpf: Fix broken riscv build The selftests/bpf Makefile includes system include directories from the host when building BPF programs. On RISC-V, glibc requires that __riscv_xlen is defined. This is not the case for "clang -target bpf", which messes up __WORDSIZE (errno.h -> ... -> wordsize.h) and breaks the build. By explicitly defining __riscv_xlen correctly for riscv, we can work around this. Fixes: `167381f3ea` ("selftests/bpf: Makefile fix "missing" headers on build with -idirafter") Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Luke Nelson <luke.r.nels@gmail.com> Link: https://lore.kernel.org/bpf/20201118071640.83773-2-bjorn.topel@gmail.com	2020-11-18 17:44:59 -08:00
Dmitrii Banshchikov	d055126180	bpf: Add bpf_ktime_get_coarse_ns helper The helper uses CLOCK_MONOTONIC_COARSE source of time that is less accurate but more performant. We have a BPF CGROUP_SKB firewall that supports event logging through bpf_perf_event_output(). Each event has a timestamp and currently we use bpf_ktime_get_ns() for it. Use of bpf_ktime_get_coarse_ns() saves ~15-20 ns in time required for event logging. bpf_ktime_get_ns(): EgressLogByRemoteEndpoint 113.82ns 8.79M bpf_ktime_get_coarse_ns(): EgressLogByRemoteEndpoint 95.40ns 10.48M Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201117184549.257280-1-me@ubique.spb.ru	2020-11-18 23:25:32 +01:00
KP Singh	ea87ae85c9	bpf: Add tests for bpf_bprm_opts_set helper The test forks a child process and updates the local storage to set/unset the secureexec bit. The BPF program in the test attaches to bprm_creds_for_exec, which checks the local storage of the current task to set the secureexec bit on the binary parameters (bprm). The child then execs a bash command with the environment variable TMPDIR set in the envp. The bash command returns a different exit code based on its observed value of the TMPDIR variable. Since TMPDIR is one of the variables that is ignored by the dynamic loader when the secureexec bit is set, one should expect the child execution to not see this value when the secureexec bit is set. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201117232929.2156341-2-kpsingh@chromium.org	2020-11-18 01:36:27 +01:00
KP Singh	3f6719c7b6	bpf: Add bpf_bprm_opts_set helper The helper allows modification of certain bits on the linux_binprm struct starting with the secureexec bit which can be updated using the BPF_F_BPRM_SECUREEXEC flag. secureexec can be set by the LSM for privilege gaining executions to set the AT_SECURE auxv for glibc. When set, the dynamic linker disables the use of certain environment variables (like LD_PRELOAD). Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201117232929.2156341-1-kpsingh@chromium.org	2020-11-18 01:36:27 +01:00
Daniel Borkmann	cbf398d765	Merge branch 'af-xdp-tx-batch' Magnus Karlsson says: ==================== This patch set mainly improves the performance of the Tx processing of AF_XDP sockets. That said, patch 3 also improves the Rx path. All in all, this patch set improves the throughput of the l2fwd xdpsock application by around 11%. If we just look at the Tx processing part, it is improved by 35% to 40%. Hopefully the new batched Tx interfaces will be of value to other drivers implementing AF_XDP zero-copy support. Patch #3, however, is generic and will improve the performance of all drivers when using AF_XDP sockets (under the premises explained in that patch). @Daniel. In patch 3, I apply all the padding required to prevent the adjacency prefetcher from prefetching the wrong things. After this patch set, I will submit another patch set that introduces ____cacheline_padding_in_smp in include/linux/cache.h according to your suggestions. The last patch in that patch set will then convert the explicit paddings that we have now to ____cacheline_padding_in_smp. v2 -> v3: * Fixed #pragma warning with clang and defined a loop_unrolled_for macro for easier readability [lkp, Nick] * Simplified invalid descriptor handling in xskq_cons_read_desc_batch() v1 -> v2: * Removed added parameter in i40e_setup_tx_descriptors and adopted a simpler solution [Maciej] * Added test for !xs in xsk_tx_peek_release_desc_batch() [John] * Simplified return path in xsk_tx_peek_release_desc_batch() [John] * Dropped patch #1 in v1 that introduced lazy completions. Hopefully this is not needed when we get busy poll [Jakub] * Iterate over local variable in xskq_prod_reserve_addr_batch() for improved performance * Fixed the fallback path in xsk_tx_peek_release_desc_batch() so that it also produces a batch of descriptors, albeit by using the slower (but more general) older code. This improves the performance of the case when multiple sockets are sharing the same device and queue id. 
==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2020-11-17 22:08:09 +01:00
Magnus Karlsson	3106c580fb	i40e: Use batched xsk Tx interfaces to increase performance Use the new batched xsk interfaces for the Tx path in the i40e driver to improve performance. On my machine, this yields a throughput increase of 4% for the l2fwd sample app in xdpsock. If we instead just look at the Tx part, this patch set increases Tx throughput by more than 20%. Note that I had to explicitly unroll the inner loop with a pragma to reach this performance level. The pragma is honored by both clang and gcc and should be ignored by compiler versions that do not support it. Using the -funroll-loops compiler command line switch on the source file resulted in loop unrolling at a higher level, which led to a performance decrease instead of an increase. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/1605525167-14450-6-git-send-email-magnus.karlsson@gmail.com	2020-11-17 22:07:40 +01:00
Magnus Karlsson	9349eb3a9d	xsk: Introduce batched Tx descriptor interfaces Introduce batched descriptor interfaces in the xsk core code for the Tx path, to be used by drivers to write a higher-performance code path. This interface will be used by the i40e driver in the next patch. Other drivers would likely benefit from this new interface too. Note that batching is only implemented for the common case when there is only one socket bound to the same device and queue id. When this is not the case, we fall back to the old non-batched version of the function. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/1605525167-14450-5-git-send-email-magnus.karlsson@gmail.com	2020-11-17 22:07:40 +01:00
Magnus Karlsson	b8c7aece29	xsk: Introduce padding between more ring pointers Introduce one cache line worth of padding between the consumer pointer and the flags field, as well as between the flags field and the start of the descriptors, in all the lockless rings. This is so that the x86 HW adjacency prefetcher will not prefetch the adjacent pointer/field when only one pointer/field is going to be used. This improves throughput for the l2fwd sample app by 1% on my machine with HW prefetching turned on in the BIOS. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/1605525167-14450-4-git-send-email-magnus.karlsson@gmail.com	2020-11-17 22:07:40 +01:00
Magnus Karlsson	f320460b94	i40e: Remove unnecessary sw_ring access from xsk Tx Remove the unnecessary access to the software ring for the AF_XDP zero-copy driver. This was used to record the length of the packet so that the driver Tx completion code could sum this up to produce the total bytes sent. This is now performed during the transmission of the packet, so no need to record this in the software ring. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/1605525167-14450-3-git-send-email-magnus.karlsson@gmail.com	2020-11-17 22:07:40 +01:00
Magnus Karlsson	90da4b3208	samples/bpf: Increment Tx stats at sending Increment the statistics over how many Tx packets have been sent at the time of sending instead of at the time of completion. This is because a completion event means that the buffer has been sent AND returned to user space. The packet always gets sent shortly after sendto() is called. The kernel might, for performance reasons, decide not to return every single buffer to user space immediately after sending, for example returning them only after a batch of packets has been transmitted. Incrementing the number of packets sent at completion would in that case be confusing: if you send a single packet, the counter might show zero for a while even though the packet has been transmitted. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/1605525167-14450-2-git-send-email-magnus.karlsson@gmail.com	2020-11-17 22:07:40 +01:00
Alan Maguire	de91e631bd	libbpf: bpf__find_by_name[_kind] should use btf__get_nr_types() When operating on split BTF, btf__find_by_name[_kind] will not iterate over all types since they use btf->nr_types to show the number of types to iterate over. For split BTF this is the number of types _on top of base BTF_, so it will underestimate the number of types to iterate over, especially for vmlinux + module BTF, where the latter is much smaller. Use btf__get_nr_types() instead. Fixes: `ba451366bf` ("libbpf: Implement basic split BTF support") Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1605437195-2175-1-git-send-email-alan.maguire@oracle.com	2020-11-16 20:51:34 -08:00
Martin KaFai Lau	b93ef089d3	bpf: Fix the irq and nmi check in bpf_sk_storage for tracing usage The intention of the current check is to avoid using bpf_sk_storage in irq and nmi. Jakub pointed out that the current check cannot do that. For example, in_serving_softirq() returns true if the softirq handling is interrupted by hard irq. Fixes: `8e4597c627` ("bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201116200113.2868539-1-kafai@fb.com	2020-11-16 16:46:01 -08:00
Santucci Pierpaolo	024cd2cbd1	selftest/bpf: Fix IPV6FR handling in flow dissector From the second fragment on, the IPV6FR program must stop the dissection of an IPV6 fragmented packet. This is the same approach used for IPV4 fragmentation. This fixes the flow keys calculation for the upper-layer protocols. Note that according to RFC8200, the first fragment packet must include the upper-layer header. Signed-off-by: Santucci Pierpaolo <santucci@epigenesys.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/X7JUzUj34ceE2wBm@santucci.pierpaolo	2020-11-16 16:23:29 +01:00
Jakub Kicinski	2d38c5802f	Merge branch 'ionic-updates' Shannon Nelson says: ==================== ionic updates These updates are a bit of code cleaning and a minor bit of performance tweaking. v3: convert ionic_lif_quiesce() to void v2: added void cast on call to ionic_lif_quiesce() lowered batching threshold added patch to flatten calls to ionic_lif_rx_mode added patch to change from_ndo to can_sleep ==================== Link: https://lore.kernel.org/r/20201112182208.46770-1-snelson@pensando.io Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 13:23:01 -08:00
Shannon Nelson	7c8d008cc0	ionic: useful names for booleans With a few more uses of true and false in function calls, we need to give them some useful names so we can tell from the calling point what we're doing. Signed-off-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 13:22:59 -08:00
Shannon Nelson	81dbc24147	ionic: change set_rx_mode from_ndo to can_sleep Instead of having two different ways of expressing the same sleepability concept, using opposite logic, we can rework the from_ndo to can_sleep for a more consistent usage. Fixes: `1800eee166` ("net: ionic: Replace in_interrupt() usage.") Signed-off-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 13:22:59 -08:00
Shannon Nelson	e94f76bb20	ionic: flatten calls to ionic_lif_rx_mode The _ionic_lif_rx_mode() is only used once and really doesn't need to be broken out. Signed-off-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 13:22:59 -08:00
Shannon Nelson	e0243e1966	ionic: use mc sync for multicast filters We should be using the multicast sync routines for the multicast filters. Also, let's just flatten the logic a bit and pull the small unicast routine back into ionic_set_rx_mode(). Fixes: `1800eee166` ("net: ionic: Replace in_interrupt() usage.") Signed-off-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 13:22:58 -08:00
Shannon Nelson	a8205ab620	ionic: batch rx buffer refilling We don't need to refill the rx descriptors on every napi if only a few were handled. Waiting until we can batch up a few together will save us a few Rx cycles. Signed-off-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 13:22:58 -08:00
Shannon Nelson	e7e8e087ac	ionic: add lif quiesce After the queues are stopped, expressly quiesce the lif. This assures that even if the queues were in an odd state, the firmware will close up everything cleanly. Signed-off-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 13:22:58 -08:00
Shannon Nelson	f6e428b27e	ionic: check for link after netdev registration Request a link check as soon as the netdev is registered rather than waiting for the watchdog to go off in order to get the interface operational a little more quickly. Signed-off-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 13:22:58 -08:00
Shannon Nelson	8f56bc4dc1	ionic: start queues before announcing link up Change the order of operations in the link_up handling to be sure that the queues are up and ready before we announce that the link is up. Signed-off-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 13:22:58 -08:00
YueHaibing	9e6cad531c	net: macb: Fix passing zero to 'PTR_ERR' Check PTR_ERR with IS_ERR to fix this. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20201112144936.54776-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 12:35:33 -08:00
Lukas Bulwahn	2e793878ae	ipv6: remove unused function ipv6_skb_idev() Commit `bdb7cc643f` ("ipv6: Count interface receive statistics on the ingress netdev") removed all callers of ipv6_skb_idev(). Hence, since then, ipv6_skb_idev() is unused and make CC=clang W=1 warns: net/ipv6/exthdrs.c:909:33: warning: unused function 'ipv6_skb_idev' [-Wunused-function] So, remove this unused function and a -Wunused-function warning. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Link: https://lore.kernel.org/r/20201113135012.32499-1-lukas.bulwahn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 12:00:27 -08:00
Jakub Kicinski	07cbce2e46	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2020-11-14 1) Add BTF generation for kernel modules and extend BTF infra in kernel e.g. support for split BTF loading and validation, from Andrii Nakryiko. 2) Support for pointers beyond pkt_end to recognize LLVM generated patterns on inlined branch conditions, from Alexei Starovoitov. 3) Implements bpf_local_storage for task_struct for BPF LSM, from KP Singh. 4) Enable FENTRY/FEXIT/RAW_TP tracing program to use the bpf_sk_storage infra, from Martin KaFai Lau. 5) Add XDP bulk APIs that introduce a defer/flush mechanism to optimize the XDP_REDIRECT path, from Lorenzo Bianconi. 6) Fix a potential (although rather theoretical) deadlock of hashtab in NMI context, from Song Liu. 7) Fixes for cross and out-of-tree build of bpftool and runqslower allowing build for different target archs on same source tree, from Jean-Philippe Brucker. 8) Fix error path in htab_map_alloc() triggered from syzbot, from Eric Dumazet. 9) Move functionality from test_tcpbpf_user into the test_progs framework so it can run in BPF CI, from Alexander Duyck. 10) Lift hashtab key_size limit to be larger than MAX_BPF_STACK, from Florian Lehner. Note that for the fix from Song we have seen a sparse report on context imbalance which requires changes in sparse itself for proper annotation detection where this is currently being discussed on linux-sparse among developers [0]. Once we have more clarification/guidance after their fix, Song will follow-up. 
[0] https://lore.kernel.org/linux-sparse/CAHk-=wh4bx8A8dHnX612MsDO13st6uzAz1mJ1PaHHVevJx_ZCw@mail.gmail.com/T/ https://lore.kernel.org/linux-sparse/20201109221345.uklbp3lzgq6g42zb@ltop.local/T/ * git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (66 commits) net: mlx5: Add xdp tx return bulking support net: mvpp2: Add xdp tx return bulking support net: mvneta: Add xdp tx return bulking support net: page_pool: Add bulk support for ptr_ring net: xdp: Introduce bulking for xdp tx return path bpf: Expose bpf_d_path helper to sleepable LSM hooks bpf: Augment the set of sleepable LSM hooks bpf: selftest: Use bpf_sk_storage in FENTRY/FEXIT/RAW_TP bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP bpf: Rename some functions in bpf_sk_storage bpf: Folding omem_charge() into sk_storage_charge() selftests/bpf: Add asm tests for pkt vs pkt_end comparison. selftests/bpf: Add skb_pkt_end test bpf: Support for pointers beyond pkt_end. tools/bpf: Always run the *-clean recipes tools/bpf: Add bootstrap/ to .gitignore bpf: Fix NULL dereference in bpf_task_storage tools/bpftool: Fix build slowdown tools/runqslower: Build bpftool using HOSTCC tools/runqslower: Enable out-of-tree build ... ==================== Link: https://lore.kernel.org/r/20201114020819.29584-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-14 09:13:41 -08:00
Steen Hegelund	774626fa44	net: phy: mscc: Add PTP support for 2 more VSC PHYs Add VSC8572 and VSC8574 in the PTP configuration as they also support PTP. The relevant datasheets can be found here: - VSC8572: https://www.microchip.com/wwwproducts/en/VSC8572 - VSC8574: https://www.microchip.com/wwwproducts/en/VSC8574 Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Link: https://lore.kernel.org/r/20201112092250.914079-1-steen.hegelund@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-13 18:25:34 -08:00
Daniel Borkmann	c14d61fca0	Merge branch 'xdp-redirect-bulk' Lorenzo Bianconi says: ==================== XDP bulk APIs introduce a defer/flush mechanism to return pages belonging to the same xdp_mem_allocator object (identified via the mem.id field) in bulk to optimize I-cache and D-cache since xdp_return_frame is usually run inside the driver NAPI tx completion loop. Convert mvneta, mvpp2 and mlx5 drivers to xdp_return_frame_bulk APIs. More details on benchmarks run on mlx5 can be found here: https://github.com/xdp-project/xdp-project/blob/master/areas/mem/xdp_bulk_return01.org Changes since v5: - do not keep looping over ptr_ring if the cache is full but release leftover pages running page_pool_return_page Changes since v4: - fix comments - introduce xdp_frame_bulk_init utility routine - compiler annotations for I-cache code layout - move rcu_read_lock outside fast-path - mlx5 xdp bulking code optimization Changes since v3: - align DEV_MAP_BULK_SIZE to XDP_BULK_QUEUE_SIZE - refactor page_pool_put_page_bulk to avoid code duplication Changes since v2: - move mvneta changes in a dedicated patch Changes since v1: - improve comments - rework xdp_return_frame_bulk routine logic - move count and xa fields at the beginning of xdp_frame_bulk struct - invert logic in page_pool_put_page_bulk for loop ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>	2020-11-14 02:30:03 +01:00
Lorenzo Bianconi	b87c57ae12	net: mlx5: Add xdp tx return bulking support Convert mlx5 driver to xdp_return_frame_bulk APIs. XDP_REDIRECT (upstream codepath): 8.9Mpps XDP_REDIRECT (upstream codepath + bulking APIs): 10.2Mpps Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/250460319fd868b7b5668fc1deca74dd42813a90.1605267335.git.lorenzo@kernel.org	2020-11-14 02:29:00 +01:00
Lorenzo Bianconi	dbef19ccde	net: mvpp2: Add xdp tx return bulking support Convert mvpp2 driver to xdp_return_frame_bulk APIs. XDP_REDIRECT (upstream codepath): 1.79Mpps XDP_REDIRECT (upstream codepath + bulking APIs): 1.93Mpps Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Matteo Croce <mcroce@microsoft.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/0b38c295e58e8ce251ef6b4e2187a2f457f9f7a3.1605267335.git.lorenzo@kernel.org	2020-11-14 02:29:00 +01:00

967812 Commits (0afe0a998c40085a6342e1aeb4c510cccba46caf)