1
0
Fork 0
Commit Graph

964738 Commits (c8a950d0d3b926a02c7b2e713850d38217cec3d1)

Author SHA1 Message Date
Jean-Philippe Brucker c8a950d0d3 tools: Factor HOSTCC, HOSTLD, HOSTAR definitions
Several Makefiles in tools/ need to define the host toolchain variables.
Move their definition to tools/scripts/Makefile.include

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/bpf/20201110164310.2600671-2-jean-philippe@linaro.org
2020-11-11 12:18:22 -08:00
Andrii Nakryiko 7112d12798 bpf: Compile out btf_parse_module() if module BTF is not enabled
Make sure btf_parse_module() is compiled out if module BTFs are not enabled.

Fixes: 36e68442d1 ("bpf: Load and verify kernel module BTFs")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201111040645.903494-1-andrii@kernel.org
2020-11-10 20:15:07 -08:00
Alexei Starovoitov 9600d623db Merge branch 'Remove unused test_ipip.sh test and add missed'
Hangbin Liu says:

====================

In comment 173ca26e9b ("samples/bpf: add comprehensive ipip, ipip6,
ip6ip6 test") we added some bpf tunnel tests. In commit 933a741e3b
("selftests/bpf: bpf tunnel test.") when we moved it to the current
folder, we missed some points:

1. ip6ip6 test is not added
2. forgot to remove test_ipip.sh in sample folder
3. TCP test code is not removed in test_tunnel_kern.c

In this patch set I add back ip6ip6 test and remove unused code. I'm not sure
if this should be net or net-next, so just set to net.

Here is the test result:
```
Testing IP6IP6 tunnel...
PING ::11(::11) 56 data bytes

--- ::11 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 63ms
rtt min/avg/max/mdev = 0.014/1028.308/2060.906/841.361 ms, pipe 2
PING 1::11(1::11) 56 data bytes

--- 1::11 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 48ms
rtt min/avg/max/mdev = 0.026/0.029/0.036/0.006 ms
PING 1::22(1::22) 56 data bytes

--- 1::22 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 47ms
rtt min/avg/max/mdev = 0.030/0.048/0.067/0.016 ms
PASS: ip6ip6tnl
```

v3:
Add back ICMP check as Martin suggested.

v2: Keep ip6ip6 section in test_tunnel_kern.c.
====================

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-11-10 15:38:12 -08:00
Hangbin Liu e2215b0555 samples/bpf: Remove unused test_ipip.sh
The tcbpf2_kern.o and related kernel sections are moved to bpf
selftest folder since b05cd74043 ("samples/bpf: remove the bpf tunnel
testsuite."). Remove this one as well.

Fixes: b05cd74043 ("samples/bpf: remove the bpf tunnel testsuite.")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201110015013.1570716-3-liuhangbin@gmail.com
2020-11-10 15:37:45 -08:00
Hangbin Liu 58cfa49c2b selftest/bpf: Add missed ip6ip6 test back
In comment 173ca26e9b ("samples/bpf: add comprehensive ipip, ipip6,
ip6ip6 test") we added ip6ip6 test for bpf tunnel testing. But in commit
933a741e3b ("selftests/bpf: bpf tunnel test.") when we moved it to
the current folder, we didn't add it.

This patch add the ip6ip6 test back to bpf tunnel test. Update the ipip6's
topology for both IPv4 and IPv6 testing. Since iperf test is removed as
currect framework simplified it in purpose, I also removed unused tcp
checkings in test_tunnel_kern.c.

Fixes: 933a741e3b ("selftests/bpf: bpf tunnel test.")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201110015013.1570716-2-liuhangbin@gmail.com
2020-11-10 15:37:45 -08:00
Alexei Starovoitov 0e6f601eb5 Merge branch 'Integrate kernel module BTF support'
Andrii Nakryiko says:

====================

This patch set adds BTF generation for kernel modules using a compact split
BTF approach. Respective patches have all the details.

Kernel module BTFs rely on pahole's split BTF support, which is added in [0]
and will be available starting from v1.19. Support for it is detected
automatically during kernel build time.

This patch set implements in-kernel support for split BTF loading and
validation. It also extends GET_OBJ_INFO API for BTFs to return BTF's module
name and a flag whether BTF itself is in-kernel or user-provided. vmlinux BTF
is also exposed to user-space through the same BTF object iteration APIs.

Follow up patch set will utilize the fact that vmlinux and module BTFs now
have associated ID to provide ability to attach BPF fentry/fexit/etc programs
to functions defined in kernel modules.

bpftool is also extended to show module/vmlinux BTF's name.

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=378699&state=*

v3->v4:
  - copy_to_user() on ENOSPC in btf_get_info_by_fd() (Martin);
v2->v3:
  - get rid of unnecessary gotos (Song);
v2->v1:
  - drop WARNs, add fewer pr_warn()'s instead (Greg);
  - properly initialize sysfs binary attribute structure (Greg);
  - add __maybe_unused to any_section_objs, used conditionally by module BTF;
rfc->v1:
  - CONFIG_DEBUG_INFO_BTF_MODULES is derived automatically (Alexei);
  - vmlinux BTF now has explicit "vmlinux" name (Alexei);
  - added sysfs ABI documentation for /sys/kernel/btf/<module> (Greg).

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-11-10 15:25:54 -08:00
Andrii Nakryiko cecaf4a0f2 tools/bpftool: Add support for in-kernel and named BTF in `btf show`
Display vmlinux BTF name and kernel module names when listing available BTFs
on the system.

In human-readable output mode, module BTFs are reported with "name
[module-name]", while vmlinux BTF will be reported as "name [vmlinux]".
Square brackets are added by bpftool and follow kernel convention when
displaying modules in human-readable text outputs.

[vmuser@archvm bpf]$ sudo ../../../bpf/bpftool/bpftool btf s
1: name [vmlinux]  size 4082281B
6: size 2365B  prog_ids 8,6  map_ids 3
7: name [button]  size 46895B
8: name [pcspkr]  size 42328B
9: name [serio_raw]  size 39375B
10: name [floppy]  size 57185B
11: name [i2c_core]  size 76186B
12: name [crc32c_intel]  size 16036B
13: name [i2c_piix4]  size 50497B
14: name [irqbypass]  size 14124B
15: name [kvm]  size 197985B
16: name [kvm_intel]  size 123564B
17: name [cryptd]  size 42466B
18: name [crypto_simd]  size 17187B
19: name [glue_helper]  size 39205B
20: name [aesni_intel]  size 41034B
25: size 36150B
        pids bpftool(2519)

In JSON mode, two fields (boolean "kernel" and string "name") are reported for
each BTF object. vmlinux BTF is reported with name "vmlinux" (kernel itself
returns and empty name for vmlinux BTF).

[vmuser@archvm bpf]$ sudo ../../../bpf/bpftool/bpftool btf s -jp
[{
        "id": 1,
        "size": 4082281,
        "prog_ids": [],
        "map_ids": [],
        "kernel": true,
        "name": "vmlinux"
    },{
        "id": 6,
        "size": 2365,
        "prog_ids": [8,6
        ],
        "map_ids": [3
        ],
        "kernel": false
    },{
        "id": 7,
        "size": 46895,
        "prog_ids": [],
        "map_ids": [],
        "kernel": true,
        "name": "button"
    },{

...

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20201110011932.3201430-6-andrii@kernel.org
2020-11-10 15:25:53 -08:00
Andrii Nakryiko 36e68442d1 bpf: Load and verify kernel module BTFs
Add kernel module listener that will load/validate and unload module BTF.
Module BTFs gets ID generated for them, which makes it possible to iterate
them with existing BTF iteration API. They are given their respective module's
names, which will get reported through GET_OBJ_INFO API. They are also marked
as in-kernel BTFs for tooling to distinguish them from user-provided BTFs.

Also, similarly to vmlinux BTF, kernel module BTFs are exposed through
sysfs as /sys/kernel/btf/<module-name>. This is convenient for user-space
tools to inspect module BTF contents and dump their types with existing tools:

[vmuser@archvm bpf]$ ls -la /sys/kernel/btf
total 0
drwxr-xr-x  2 root root       0 Nov  4 19:46 .
drwxr-xr-x 13 root root       0 Nov  4 19:46 ..

...

-r--r--r--  1 root root     888 Nov  4 19:46 irqbypass
-r--r--r--  1 root root  100225 Nov  4 19:46 kvm
-r--r--r--  1 root root   35401 Nov  4 19:46 kvm_intel
-r--r--r--  1 root root     120 Nov  4 19:46 pcspkr
-r--r--r--  1 root root     399 Nov  4 19:46 serio_raw
-r--r--r--  1 root root 4094095 Nov  4 19:46 vmlinux

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/bpf/20201110011932.3201430-5-andrii@kernel.org
2020-11-10 15:25:53 -08:00
Andrii Nakryiko 5f9ae91f7c kbuild: Build kernel module BTFs if BTF is enabled and pahole supports it
Detect if pahole supports split BTF generation, and generate BTF for each
selected kernel module, if it does. This is exposed to Makefiles and C code as
CONFIG_DEBUG_INFO_BTF_MODULES flag.

Kernel module BTF has to be re-generated if either vmlinux's BTF changes or
module's .ko changes. To achieve that, I needed a helper similar to
if_changed, but that would allow to filter out vmlinux from the list of
updated dependencies for .ko building. I've put it next to the only place that
uses and needs it, but it might be a better idea to just add it along the
other if_changed variants into scripts/Kbuild.include.

Each kernel module's BTF deduplication is pretty fast, as it does only
incremental BTF deduplication on top of already deduplicated vmlinux BTF. To
show the added build time, I've first ran make only just built kernel (to
establish the baseline) and then forced only BTF re-generation, without
regenerating .ko files. The build was performed with -j60 parallelization on
56-core machine. The final time also includes bzImage building, so it's not
a pure BTF overhead.

$ time make -j60
...
make -j60  27.65s user 10.96s system 782% cpu 4.933 total
$ touch ~/linux-build/default/vmlinux && time make -j60
...
make -j60  123.69s user 27.85s system 1566% cpu 9.675 total

So 4.6 seconds real time, with noticeable part spent in compressed vmlinux and
bzImage building.

To show size savings, I've built my kernel configuration with about 700 kernel
modules with full BTF per each kernel module (without deduplicating against
vmlinux) and with split BTF against deduplicated vmlinux (approach in this
patch). Below are top 10 modules with biggest BTF sizes. And total size of BTF
data across all kernel modules.

It shows that split BTF "compresses" 115MB down to 5MB total. And the biggest
kernel modules get a downsize from 500-570KB down to 200-300KB.

FULL BTF
========

$ for f in $(find . -name '*.ko'); do size -A -d $f | grep BTF | awk '{print $2}'; done | awk '{ s += $1 } END { print s }'
115710691

$ for f in $(find . -name '*.ko'); do printf "%s %d\n" $f $(size -A -d $f | grep BTF | awk '{print $2}'); done | sort -nr -k2 | head -n10
./drivers/gpu/drm/i915/i915.ko 570570
./drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko 520240
./drivers/gpu/drm/radeon/radeon.ko 503849
./drivers/infiniband/hw/mlx5/mlx5_ib.ko 491777
./fs/xfs/xfs.ko 411544
./drivers/net/ethernet/intel/i40e/i40e.ko 403904
./drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko 398754
./drivers/infiniband/core/ib_core.ko 397224
./fs/cifs/cifs.ko 386249
./fs/nfsd/nfsd.ko 379738

SPLIT BTF
=========

$ for f in $(find . -name '*.ko'); do size -A -d $f | grep BTF | awk '{print $2}'; done | awk '{ s += $1 } END { print s }'
5194047

$ for f in $(find . -name '*.ko'); do printf "%s %d\n" $f $(size -A -d $f | grep BTF | awk '{print $2}'); done | sort -nr -k2 | head -n10
./drivers/gpu/drm/i915/i915.ko 293206
./drivers/gpu/drm/radeon/radeon.ko 282103
./fs/xfs/xfs.ko 222150
./drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko 198503
./drivers/infiniband/hw/mlx5/mlx5_ib.ko 198356
./drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko 113444
./fs/cifs/cifs.ko 109379
./arch/x86/kvm/kvm.ko 100225
./drivers/gpu/drm/drm.ko 94827
./drivers/infiniband/core/ib_core.ko 91188

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201110011932.3201430-4-andrii@kernel.org
2020-11-10 15:25:53 -08:00
Andrii Nakryiko 5329722057 bpf: Assign ID to vmlinux BTF and return extra info for BTF in GET_OBJ_INFO
Allocate ID for vmlinux BTF. This makes it visible when iterating over all BTF
objects in the system. To allow distinguishing vmlinux BTF (and later kernel
module BTF) from user-provided BTFs, expose extra kernel_btf flag, as well as
BTF name ("vmlinux" for vmlinux BTF, will equal to module's name for module
BTF).  We might want to later allow specifying BTF name for user-provided BTFs
as well, if that makes sense. But currently this is reserved only for
in-kernel BTFs.

Having in-kernel BTFs exposed IDs will allow to extend BPF APIs that require
in-kernel BTF type with ability to specify BTF types from kernel modules, not
just vmlinux BTF. This will be implemented in a follow up patch set for
fentry/fexit/fmod_ret/lsm/etc.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201110011932.3201430-3-andrii@kernel.org
2020-11-10 15:25:53 -08:00
Andrii Nakryiko 951bb64621 bpf: Add in-kernel split BTF support
Adjust in-kernel BTF implementation to support a split BTF mode of operation.
Changes are mostly mirroring libbpf split BTF changes, with the exception of
start_id being 0 for in-kernel implementation due to simpler read-only mode.

Otherwise, for split BTF logic, most of the logic of jumping to base BTF,
where necessary, is encapsulated in few helper functions. Type numbering and
string offset in a split BTF are logically continuing where base BTF ends, so
most of the high-level logic is kept without changes.

Type verification and size resolution is only doing an added resolution of new
split BTF types and relies on already cached size and type resolution results
in the base BTF.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201110011932.3201430-2-andrii@kernel.org
2020-11-10 15:25:53 -08:00
Martin KaFai Lau f52b8fd332 bpf: selftest: Use static globals in tcp_hdr_options and btf_skc_cls_ingress
Some globals in the tcp_hdr_options test and btf_skc_cls_ingress test
are not using static scope.  This patch fixes it.

Targeting bpf-next branch as an improvement since it currently does not
break the build.

Fixes: ad2f8eb009 ("bpf: selftests: Tcp header options")
Fixes: 9a856cae22 ("bpf: selftest: Add test_btf_skc_cls_ingress")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20201106225402.4135741-1-kafai@fb.com
2020-11-09 16:18:36 -08:00
Wang Qing 666475ccbf bpf, btf: Remove the duplicate btf_ids.h include
Remove duplicate btf_ids.h header which is included twice.

Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1604736650-11197-1-git-send-email-wangqing@vivo.com
2020-11-10 00:05:18 +01:00
Menglong Dong a10b4f9610 samples/bpf: Remove duplicate include in hbm
The 'bpf/bpf.h' include in 'samples/bpf/hbm.c' is duplicated.

Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/1604654034-52821-1-git-send-email-dong.menglong@zte.com.cn
2020-11-10 00:01:31 +01:00
Alexei Starovoitov f055f355fa selftests/bpf: Fix selftest build with old libc
pidfd_open was added in 2019. Some versions of libc library don't define it.
Define it manually if it's not available.

Reported-by: Sergei Iudin <siudin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-11-06 12:33:46 -08:00
KP Singh 4170bc6baa bpf: Exercise syscall operations for inode and sk storage
Use the check_syscall_operations added for task_local_storage to
exercise syscall operations for other local storage maps:

* Check the absence of an element for the given fd.
* Create a new element, retrieve and compare its value.
* Delete the element and check again for absence.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201106103747.2780972-10-kpsingh@chromium.org
2020-11-06 08:08:38 -08:00
KP Singh 9cde3beead bpf: Add tests for task_local_storage
The test exercises the syscall based map operations by creating a pidfd
for the current process.

For verifying kernel / LSM functionality, the test implements a simple
MAC policy which denies an executable from unlinking itself. The LSM
program bprm_committed_creds sets a task_local_storage with a pointer to
the inode. This is then used to detect if the task is trying to unlink
itself in the inode_unlink LSM hook.

The test copies /bin/rm to /tmp and executes it in a child thread with
the intention of deleting itself. A successful test should prevent the
the running executable from deleting itself.

The bpf programs are also updated to call bpf_spin_{lock, unlock} to
trigger the verfier checks for spin locks.

The temporary file is cleaned up later in the test.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201106103747.2780972-9-kpsingh@chromium.org
2020-11-06 08:08:38 -08:00
KP Singh a367efa71b bpf: Update selftests for local_storage to use vmlinux.h
With the fixing of BTF pruning of embedded types being fixed, the test
can be simplified to use vmlinux.h

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201106103747.2780972-8-kpsingh@chromium.org
2020-11-06 08:08:37 -08:00
KP Singh f0e5ba0bc4 bpf: Fix tests for local_storage
The {inode,sk}_storage_result checking if the correct value was retrieved
was being clobbered unconditionally by the return value of the
bpf_{inode,sk}_storage_delete call.

Also, consistently use the newly added BPF_LOCAL_STORAGE_GET_F_CREATE
flag.

Fixes: cd324d7abb ("bpf: Add selftests for local_storage")
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201106103747.2780972-7-kpsingh@chromium.org
2020-11-06 08:08:37 -08:00
KP Singh 3ca1032ab7 bpf: Implement get_current_task_btf and RET_PTR_TO_BTF_ID
The currently available bpf_get_current_task returns an unsigned integer
which can be used along with BPF_CORE_READ to read data from
the task_struct but still cannot be used as an input argument to a
helper that accepts an ARG_PTR_TO_BTF_ID of type task_struct.

In order to implement this helper a new return type, RET_PTR_TO_BTF_ID,
is added. This is similar to RET_PTR_TO_BTF_ID_OR_NULL but does not
require checking the nullness of returned pointer.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201106103747.2780972-6-kpsingh@chromium.org
2020-11-06 08:08:37 -08:00
KP Singh 864ab0616d bpftool: Add support for task local storage
Updates the binary to handle the BPF_MAP_TYPE_TASK_STORAGE as
"task_storage" for printing and parsing. Also updates the documentation
and bash completion

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201106103747.2780972-5-kpsingh@chromium.org
2020-11-06 08:08:37 -08:00
KP Singh 8885274d22 libbpf: Add support for task local storage
Updates the bpf_probe_map_type API to also support
BPF_MAP_TYPE_TASK_STORAGE similar to other local storage maps.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201106103747.2780972-4-kpsingh@chromium.org
2020-11-06 08:08:37 -08:00
KP Singh 4cf1bc1f10 bpf: Implement task local storage
Similar to bpf_local_storage for sockets and inodes add local storage
for task_struct.

The life-cycle of storage is managed with the life-cycle of the
task_struct.  i.e. the storage is destroyed along with the owning task
with a callback to the bpf_task_storage_free from the task_free LSM
hook.

The BPF LSM allocates an __rcu pointer to the bpf_local_storage in
the security blob which are now stackable and can co-exist with other
LSMs.

The userspace map operations can be done by using a pid fd as a key
passed to the lookup, update and delete operations.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201106103747.2780972-3-kpsingh@chromium.org
2020-11-06 08:08:37 -08:00
KP Singh 9e7a4d9831 bpf: Allow LSM programs to use bpf spin locks
Usage of spin locks was not allowed for tracing programs due to
insufficient preemption checks. The verifier does not currently prevent
LSM programs from using spin locks, but the helpers are not exposed
via bpf_lsm_func_proto.

Based on the discussion in [1], non-sleepable LSM programs should be
able to use bpf_spin_{lock, unlock}.

Sleepable LSM programs can be preempted which means that allowng spin
locks will need more work (disabling preemption and the verifier
ensuring that no sleepable helpers are called when a spin lock is held).

[1]: https://lore.kernel.org/bpf/20201103153132.2717326-1-kpsingh@chromium.org/T/#md601a053229287659071600d3483523f752cd2fb

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201106103747.2780972-2-kpsingh@chromium.org
2020-11-06 08:08:37 -08:00
Florian Lehner c6bde958a6 bpf: Lift hashtab key_size limit
Currently key_size of hashtab is limited to MAX_BPF_STACK.
As the key of hashtab can also be a value from a per cpu map it can be
larger than MAX_BPF_STACK.

The use-case for this patch originates to implement allow/disallow
lists for files and file paths. The maximum length of file paths is
defined by PATH_MAX with 4096 chars including nul.
This limit exceeds MAX_BPF_STACK.

Changelog:

v5:
 - Fix cast overflow

v4:
 - Utilize BPF skeleton in tests
 - Rebase

v3:
 - Rebase

v2:
 - Add a test for bpf side

Signed-off-by: Florian Lehner <dev@der-flo.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201029201442.596690-1-dev@der-flo.net
2020-11-05 20:04:46 -08:00
Alexei Starovoitov b6b466a81f Merge branch 'libbpf: split BTF support'
Andrii Nakryiko says:

====================
This patch set adds support for generating and deduplicating split BTF. This
is an enhancement to the BTF, which allows to designate one BTF as the "base
BTF" (e.g., vmlinux BTF), and one or more other BTFs as "split BTF" (e.g.,
kernel module BTF), which are building upon and extending base BTF with extra
types and strings.

Once loaded, split BTF appears as a single unified BTF superset of base BTF,
with continuous and transparent numbering scheme. This allows all the existing
users of BTF to work correctly and stay agnostic to the base/split BTFs
composition.  The only difference is in how to instantiate split BTF: it
requires base BTF to be alread instantiated and passed to btf__new_xxx_split()
or btf__parse_xxx_split() "constructors" explicitly.

This split approach is necessary if we are to have a reasonably-sized kernel
module BTFs. By deduping each kernel module's BTF individually, resulting
module BTFs contain copies of a lot of kernel types that are already present
in vmlinux BTF. Even those single copies result in a big BTF size bloat. On my
kernel configuration with 700 modules built, non-split BTF approach results in
115MBs of BTFs across all modules. With split BTF deduplication approach,
total size is down to 5.2MBs total, which is on part with vmlinux BTF (at
around 4MBs). This seems reasonable and practical. As to why we'd need kernel
module BTFs, that should be pretty obvious to anyone using BPF at this point,
as it allows all the BTF-powered features to be used with kernel modules:
tp_btf, fentry/fexit/fmod_ret, lsm, bpf_iter, etc.

This patch set is a pre-requisite to adding split BTF support to pahole, which
is a prerequisite to integrating split BTF into the Linux kernel build setup
to generate BTF for kernel modules. The latter will come as a follow-up patch
series once this series makes it to the libbpf and pahole makes use of it.

Patch #4 introduces necessary basic support for split BTF into libbpf APIs.
Patch #8 implements minimal changes to BTF dedup algorithm to allow
deduplicating split BTFs. Patch #11 adds extra -B flag to bpftool to allow to
specify the path to base BTF for cases when one wants to dump or inspect split
BTF. All the rest are refactorings, clean ups, bug fixes and selftests.

v1->v2:
  - addressed Song's feedback.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-11-05 18:37:37 -08:00
Andrii Nakryiko 75fa177769 tools/bpftool: Add bpftool support for split BTF
Add ability to work with split BTF by providing extra -B flag, which allows to
specify the path to the base BTF file.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-12-andrii@kernel.org
2020-11-05 18:37:31 -08:00
Andrii Nakryiko 232338fa2f selftests/bpf: Add split BTF dedup selftests
Add selftests validating BTF deduplication for split BTF case. Add a helper
macro that allows to validate entire BTF with raw BTF dump, not just
type-by-type. This saves tons of code and complexity.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-11-andrii@kernel.org
2020-11-05 18:37:31 -08:00
Andrii Nakryiko 6b6e6b1d09 libbpf: Accomodate DWARF/compiler bug with duplicated identical arrays
In some cases compiler seems to generate distinct DWARF types for identical
arrays within the same CU. That seems like a bug, but it's already out there
and breaks type graph equivalence checks, so accommodate it anyway by checking
for identical arrays, regardless of their type ID.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-10-andrii@kernel.org
2020-11-05 18:37:30 -08:00
Andrii Nakryiko f86524efcf libbpf: Support BTF dedup of split BTFs
Add support for deduplication split BTFs. When deduplicating split BTF, base
BTF is considered to be immutable and can't be modified or adjusted. 99% of
BTF deduplication logic is left intact (module some type numbering adjustments).
There are only two differences.

First, each type in base BTF gets hashed (expect VAR and DATASEC, of course,
those are always considered to be self-canonical instances) and added into
a table of canonical table candidates. Hashing is a shallow, fast operation,
so mostly eliminates the overhead of having entire base BTF to be a part of
BTF dedup.

Second difference is very critical and subtle. While deduplicating split BTF
types, it is possible to discover that one of immutable base BTF BTF_KIND_FWD
types can and should be resolved to a full STRUCT/UNION type from the split
BTF part.  This is, obviously, can't happen because we can't modify the base
BTF types anymore. So because of that, any type in split BTF that directly or
indirectly references that newly-to-be-resolved FWD type can't be considered
to be equivalent to the corresponding canonical types in base BTF, because
that would result in a loss of type resolution information. So in such case,
split BTF types will be deduplicated separately and will cause some
duplication of type information, which is unavoidable.

With those two changes, the rest of the algorithm manages to deduplicate split
BTF correctly, pointing all the duplicates to their canonical counter-parts in
base BTF, but also is deduplicating whatever unique types are present in split
BTF on their own.

Also, theoretically, split BTF after deduplication could end up with either
empty type section or empty string section. This is handled by libbpf
correctly in one of previous patches in the series.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-9-andrii@kernel.org
2020-11-05 18:37:30 -08:00
Andrii Nakryiko d812362450 libbpf: Fix BTF data layout checks and allow empty BTF
Make data section layout checks stricter, disallowing overlap of types and
strings data.

Additionally, allow BTFs with no type data. There is nothing inherently wrong
with having BTF with no types (put potentially with some strings). This could
be a situation with kernel module BTFs, if module doesn't introduce any new
type information.

Also fix invalid offset alignment check for btf->hdr->type_off.

Fixes: 8a138aed4a ("bpf: btf: Add BTF support to libbpf")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-8-andrii@kernel.org
2020-11-05 18:37:30 -08:00
Andrii Nakryiko 1306c980cf selftests/bpf: Add checking of raw type dump in BTF writer APIs selftests
Add re-usable btf_helpers.{c,h} to provide BTF-related testing routines. Start
with adding a raw BTF dumping helpers.

Raw BTF dump is the most succinct and at the same time a very human-friendly
way to validate exact contents of BTF types. Cross-validate raw BTF dump and
writable BTF in a single selftest. Raw type dump checks also serve as a good
self-documentation.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-7-andrii@kernel.org
2020-11-05 18:37:30 -08:00
Andrii Nakryiko 197389da2f selftests/bpf: Add split BTF basic test
Add selftest validating ability to programmatically generate and then dump
split BTF.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-6-andrii@kernel.org
2020-11-05 18:37:30 -08:00
Andrii Nakryiko ba451366bf libbpf: Implement basic split BTF support
Support split BTF operation, in which one BTF (base BTF) provides basic set of
types and strings, while another one (split BTF) builds on top of base's types
and strings and adds its own new types and strings. From API standpoint, the
fact that the split BTF is built on top of the base BTF is transparent.

Type numeration is transparent. If the base BTF had last type ID #N, then all
types in the split BTF start at type ID N+1. Any type in split BTF can
reference base BTF types, but not vice versa. Programmatically construction of
a split BTF on top of a base BTF is supported: one can create an empty split
BTF with btf__new_empty_split() and pass base BTF as an input, or pass raw
binary data to btf__new_split(), or use btf__parse_xxx_split() variants to get
initial set of split types/strings from the ELF file with .BTF section.

String offsets are similarly transparent and are a logical continuation of
base BTF's strings. When building BTF programmatically and adding a new string
(explicitly with btf__add_str() or implicitly through appending new
types/members), string-to-be-added would first be looked up from the base
BTF's string section and re-used if it's there. If not, it will be looked up
and/or added to the split BTF string section. Similarly to type IDs, types in
split BTF can refer to strings from base BTF absolutely transparently (but not
vice versa, of course, because base BTF doesn't "know" about existence of
split BTF).

Internal type index is slightly adjusted to be zero-indexed, ignoring a fake
[0] VOID type. This allows to handle split/base BTF type lookups transparently
by using btf->start_id type ID offset, which is always 1 for base/non-split
BTF and equals btf__get_nr_types(base_btf) + 1 for the split BTF.

BTF deduplication is not yet supported for split BTF and support for it will
be added in separate patch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-5-andrii@kernel.org
2020-11-05 18:37:30 -08:00
Andrii Nakryiko 88a82c2a9a libbpf: Unify and speed up BTF string deduplication
Revamp BTF dedup's string deduplication to match the approach of writable BTF
string management. This allows to transfer deduplicated strings index back to
BTF object after deduplication without expensive extra memory copying and hash
map re-construction. It also simplifies the code and speeds it up, because
hashmap-based string deduplication is faster than sort + unique approach.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-4-andrii@kernel.org
2020-11-05 18:37:30 -08:00
Andrii Nakryiko d9448f9496 selftest/bpf: Relax btf_dedup test checks
Remove the requirement of a strictly exact string section contents. This used
to be true when string deduplication was done through sorting, but with string
dedup done through hash table, it's no longer true. So relax test harness to
relax strings checks and, consequently, type checks, which now don't have to
have exactly the same string offsets.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-3-andrii@kernel.org
2020-11-05 18:37:30 -08:00
Andrii Nakryiko c81ed6d81e libbpf: Factor out common operations in BTF writing APIs
Factor out commiting of appended type data. Also extract fetching the very
last type in the BTF (to append members to). These two operations are common
across many APIs and will be easier to refactor with split BTF, if they are
extracted into a single place.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-2-andrii@kernel.org
2020-11-05 18:37:30 -08:00
Alexei Starovoitov d0b3d2d7e5 Merge branch 'selftests/bpf: Migrate test_tcpbpf_user to be a part of test_progs'
Alexander Duyck says:

====================
Move the test functionality from test_tcpbpf_user into the test_progs
framework so that it will be run any time the test_progs framework is run.
This will help to prevent future test escapes as the individual tests, such
as test_tcpbpf_user, are less likely to be run by developers and CI
tests.

As a part of moving it over the series goes through and updates the code to
make use of the existing APIs included in the test_progs framework. This is
meant to simplify and streamline the test code and avoid duplication of
effort.

v2: Dropped test_tcpbpf_user from .gitignore
    Replaced CHECK_FAIL calls with CHECK calls
    Minimized changes in patch 1 when moving the file
    Updated stg mail command line to display renames in submission
    Added shutdown logic to end of run_test function to guarantee close
    Added patch that replaces the two maps with use of global variables
v3: Left err at -1 while we are performing send/recv calls w/ data
    Drop extra labels from test_tcpbpf_user in favor of keeping err label
    Dropped redundant zero init for tcpbpf_globals result and key
    Dropped replacing of "printf(" with "fprintf(stderr, "
    Fixed error in use of ASSERT_OK_PTR which was skipping of run_test
    Replaced "{ 0 }" with "{}" in init of global in test_tcpbpf_kern.c
    Added "Acked-by" from Martin KaiFai Lau and Andrii Nakryiko
====================

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-11-03 17:54:03 -08:00
Alexander Duyck 21b5177e99 selftest/bpf: Use global variables instead of maps for test_tcpbpf_kern
Use global variables instead of global_map and sockopt_results_map to track
test data. Doing this greatly simplifies the code as there is not need to
take the extra steps of updating the maps or looking up elements.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/160443931900.1086697.6588858453575682351.stgit@localhost.localdomain
2020-11-03 17:52:37 -08:00
Alexander Duyck 0a099d1429 selftests/bpf: Migrate tcpbpf_user.c to use BPF skeleton
Update tcpbpf_user.c to make use of the BPF skeleton. Doing this we can
simplify test_tcpbpf_user and reduce the overhead involved in setting up
the test.

In addition we can clean up the remaining bits such as the one remaining
CHECK_FAIL at the end of test_tcpbpf_user so that the function only makes
use of CHECK as needed.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/160443931155.1086697.17869006617113525162.stgit@localhost.localdomain
2020-11-03 17:52:37 -08:00
Alexander Duyck d3813ea14b selftests/bpf: Replace EXPECT_EQ with ASSERT_EQ and refactor verify_results
There is already logic in test_progs.h for asserting that a value is
expected to be another value. So instead of reinventing it we should just
make use of ASSERT_EQ in tcpbpf_user.c. This will allow for better
debugging and integrates much more closely with the test_progs framework.

In addition we can refactor the code a bit to merge together the two
verify functions and tie them together into a single function. Doing this
helps to clean the code up a bit and makes it more readable as all the
verification is now done in one function.

Lastly we can relocate the verification to the end of the run_test since it
is logically part of the test itself. With this we can drop the need for a
return value from run_test since verification becomes the last step of the
call and then immediately following is the tear down of the test setup.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/160443930408.1086697.16101205859962113000.stgit@localhost.localdomain
2020-11-03 17:52:37 -08:00
Alexander Duyck 247f0ec361 selftests/bpf: Drop python client/server in favor of threads
Drop the tcp_client/server.py files in favor of using a client and server
thread within the test case. Specifically we spawn a new thread to play the
role of the server, and the main testing thread plays the role of client.

Add logic to the end of the run_test function to guarantee that the sockets
are closed when we begin verifying results.

Doing this we are able to reduce overhead since we don't have two python
workers possibly floating around. In addition we don't have to worry about
synchronization issues and as such the retry loop waiting for the threads
to close the sockets can be dropped as we will have already closed the
sockets in the local executable and synchronized the server thread.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/160443929638.1086697.2430242340980315521.stgit@localhost.localdomain
2020-11-03 17:52:37 -08:00
Alexander Duyck aaf376bddf selftests/bpf: Move test_tcppbf_user into test_progs
Recently a bug was missed due to the fact that test_tcpbpf_user is not a
part of test_progs. In order to prevent similar issues in the future move
the test functionality into test_progs. By doing this we can make certain
that it is a part of standard testing and will not be overlooked.

As a part of moving the functionality into test_progs it is necessary to
integrate with the test_progs framework and to drop any redundant code.
This patch:
1. Cleans up the include headers
2. Dropped a duplicate definition of bpf_find_map
3. Switched over to using test_progs specific cgroup functions
4. Renamed main to test_tcpbpf_user
5. Dropped return value in favor of CHECK_FAIL to check for errors

The general idea is that I wanted to keep the changes as small as possible
while moving the file into the test_progs framework. The follow-on patches
are meant to clean up the remaining issues such as the use of CHECK_FAIL.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/160443928881.1086697.17661359319919165370.stgit@localhost.localdomain
2020-11-03 17:52:37 -08:00
Eric Dumazet 8aaeed81fc bpf: Fix error path in htab_map_alloc()
syzbot was able to trigger a use-after-free in htab_map_alloc() [1]

htab_map_alloc() lacks a call to lockdep_unregister_key() in its error path.

lockdep_register_key() and lockdep_unregister_key() can not fail,
it seems better to use them right after htab allocation and before
htab freeing, avoiding more goto/labels in htab_map_alloc()

[1]
BUG: KASAN: use-after-free in lockdep_register_key+0x356/0x3e0 kernel/locking/lockdep.c:1182
Read of size 8 at addr ffff88805fa67ad8 by task syz-executor.3/2356

CPU: 1 PID: 2356 Comm: syz-executor.3 Not tainted 5.9.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:118
 print_address_description.constprop.0.cold+0xae/0x4c8 mm/kasan/report.c:385
 __kasan_report mm/kasan/report.c:545 [inline]
 kasan_report.cold+0x1f/0x37 mm/kasan/report.c:562
 lockdep_register_key+0x356/0x3e0 kernel/locking/lockdep.c:1182
 htab_init_buckets kernel/bpf/hashtab.c:144 [inline]
 htab_map_alloc+0x6c5/0x14a0 kernel/bpf/hashtab.c:521
 find_and_alloc_map kernel/bpf/syscall.c:122 [inline]
 map_create kernel/bpf/syscall.c:825 [inline]
 __do_sys_bpf+0xa80/0x5180 kernel/bpf/syscall.c:4381
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45deb9
Code: 0d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f0eafee1c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 0000000000001a00 RCX: 000000000045deb9
RDX: 0000000000000040 RSI: 0000000020000040 RDI: 405a020000000000
RBP: 000000000118bf60 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118bf2c
R13: 00007ffd3cf9eabf R14: 00007f0eafee29c0 R15: 000000000118bf2c

Allocated by task 2053:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
 kasan_set_track mm/kasan/common.c:56 [inline]
 __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:461
 kmalloc include/linux/slab.h:554 [inline]
 kzalloc include/linux/slab.h:666 [inline]
 htab_map_alloc+0xdf/0x14a0 kernel/bpf/hashtab.c:454
 find_and_alloc_map kernel/bpf/syscall.c:122 [inline]
 map_create kernel/bpf/syscall.c:825 [inline]
 __do_sys_bpf+0xa80/0x5180 kernel/bpf/syscall.c:4381
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 2053:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
 kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
 __kasan_slab_free+0x102/0x140 mm/kasan/common.c:422
 slab_free_hook mm/slub.c:1544 [inline]
 slab_free_freelist_hook+0x5d/0x150 mm/slub.c:1577
 slab_free mm/slub.c:3142 [inline]
 kfree+0xdb/0x360 mm/slub.c:4124
 htab_map_alloc+0x3f9/0x14a0 kernel/bpf/hashtab.c:549
 find_and_alloc_map kernel/bpf/syscall.c:122 [inline]
 map_create kernel/bpf/syscall.c:825 [inline]
 __do_sys_bpf+0xa80/0x5180 kernel/bpf/syscall.c:4381
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The buggy address belongs to the object at ffff88805fa67800
 which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 728 bytes inside of
 1024-byte region [ffff88805fa67800, ffff88805fa67c00)
The buggy address belongs to the page:
page:000000003c5582c4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5fa60
head:000000003c5582c4 order:3 compound_mapcount:0 compound_pincount:0
flags: 0xfff00000010200(slab|head)
raw: 00fff00000010200 ffffea0000bc1200 0000000200000002 ffff888010041140
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88805fa67980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88805fa67a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                    ^
 ffff88805fa67b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88805fa67b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: c50eb518e2 ("bpf: Use separate lockdep class for each hashtab")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20201102114100.3103180-1-eric.dumazet@gmail.com
2020-11-02 17:20:21 +01:00
Alexei Starovoitov cb5dc5b062 Merge branch 'bpf: safeguard hashtab locking in NMI context'
Song Liu says:

====================
LOCKDEP NMI warning highlighted potential deadlock of hashtab in NMI
context:

[   74.828971] ================================
[   74.828972] WARNING: inconsistent lock state
[   74.828973] 5.9.0-rc8+ #275 Not tainted
[   74.828974] --------------------------------
[   74.828975] inconsistent {INITIAL USE} -> {IN-NMI} usage.
[   74.828976] taskset/1174 [HC2[2]:SC0[0]:HE0:SE1] takes:
[...]
[   74.828999]  Possible unsafe locking scenario:
[   74.828999]
[   74.829000]        CPU0
[   74.829001]        ----
[   74.829001]   lock(&htab->buckets[i].raw_lock);
[   74.829003]   <Interrupt>
[   74.829004]     lock(&htab->buckets[i].raw_lock);

Please refer to patch 1/2 for full trace.

This warning is a false alert, as "INITIAL USE" and "IN-NMI" in the tests
are from different hashtab. On the other hand, in theory, it is possible
to deadlock when a hashtab is access from both non-NMI and NMI context.
Patch 1/2 fixes this false alert by assigning separate lockdep class to
each hashtab. Patch 2/2 introduces map_locked counters, which is similar to
bpf_prog_active counter, to avoid hashtab deadlock in NMI context.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-10-30 13:03:36 -07:00
Song Liu 20b6cc34ea bpf: Avoid hashtab deadlock with map_locked
If a hashtab is accessed in both non-NMI and NMI context, the system may
deadlock on bucket->lock. Fix this issue with percpu counter map_locked.
map_locked rejects concurrent access to the same bucket from the same CPU.
To reduce memory overhead, map_locked is not added per bucket. Instead,
8 percpu counters are added to each hashtab. buckets are assigned to these
counters based on the lower bits of its hash.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201029071925.3103400-3-songliubraving@fb.com
2020-10-30 13:03:29 -07:00
Song Liu c50eb518e2 bpf: Use separate lockdep class for each hashtab
If a hashtab is accessed in both NMI and non-NMI contexts, it may cause
deadlock in bucket->lock. LOCKDEP NMI warning highlighted this issue:

./test_progs -t stacktrace

[   74.828970]
[   74.828971] ================================
[   74.828972] WARNING: inconsistent lock state
[   74.828973] 5.9.0-rc8+ #275 Not tainted
[   74.828974] --------------------------------
[   74.828975] inconsistent {INITIAL USE} -> {IN-NMI} usage.
[   74.828976] taskset/1174 [HC2[2]:SC0[0]:HE0:SE1] takes:
[   74.828977] ffffc90000ee96b0 (&htab->buckets[i].raw_lock){....}-{2:2}, at: htab_map_update_elem+0x271/0x5a0
[   74.828981] {INITIAL USE} state was registered at:
[   74.828982]   lock_acquire+0x137/0x510
[   74.828983]   _raw_spin_lock_irqsave+0x43/0x90
[   74.828984]   htab_map_update_elem+0x271/0x5a0
[   74.828984]   0xffffffffa0040b34
[   74.828985]   trace_call_bpf+0x159/0x310
[   74.828986]   perf_trace_run_bpf_submit+0x5f/0xd0
[   74.828987]   perf_trace_urandom_read+0x1be/0x220
[   74.828988]   urandom_read_nowarn.isra.0+0x26f/0x380
[   74.828989]   vfs_read+0xf8/0x280
[   74.828989]   ksys_read+0xc9/0x160
[   74.828990]   do_syscall_64+0x33/0x40
[   74.828991]   entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   74.828992] irq event stamp: 1766
[   74.828993] hardirqs last  enabled at (1765): [<ffffffff82800ace>] asm_exc_page_fault+0x1e/0x30
[   74.828994] hardirqs last disabled at (1766): [<ffffffff8267df87>] irqentry_enter+0x37/0x60
[   74.828995] softirqs last  enabled at (856): [<ffffffff81043e7c>] fpu__clear+0xac/0x120
[   74.828996] softirqs last disabled at (854): [<ffffffff81043df0>] fpu__clear+0x20/0x120
[   74.828997]
[   74.828998] other info that might help us debug this:
[   74.828999]  Possible unsafe locking scenario:
[   74.828999]
[   74.829000]        CPU0
[   74.829001]        ----
[   74.829001]   lock(&htab->buckets[i].raw_lock);
[   74.829003]   <Interrupt>
[   74.829004]     lock(&htab->buckets[i].raw_lock);
[   74.829006]
[   74.829006]  *** DEADLOCK ***
[   74.829007]
[   74.829008] 1 lock held by taskset/1174:
[   74.829008]  #0: ffff8883ec3fd020 (&cpuctx_lock){-...}-{2:2}, at: perf_event_task_tick+0x101/0x650
[   74.829012]
[   74.829013] stack backtrace:
[   74.829014] CPU: 0 PID: 1174 Comm: taskset Not tainted 5.9.0-rc8+ #275
[   74.829015] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[   74.829016] Call Trace:
[   74.829016]  <NMI>
[   74.829017]  dump_stack+0x9a/0xd0
[   74.829018]  lock_acquire+0x461/0x510
[   74.829019]  ? lock_release+0x6b0/0x6b0
[   74.829020]  ? stack_map_get_build_id_offset+0x45e/0x800
[   74.829021]  ? htab_map_update_elem+0x271/0x5a0
[   74.829022]  ? rcu_read_lock_held_common+0x1a/0x50
[   74.829022]  ? rcu_read_lock_held+0x5f/0xb0
[   74.829023]  _raw_spin_lock_irqsave+0x43/0x90
[   74.829024]  ? htab_map_update_elem+0x271/0x5a0
[   74.829025]  htab_map_update_elem+0x271/0x5a0
[   74.829026]  bpf_prog_1fd9e30e1438d3c5_oncpu+0x9c/0xe88
[   74.829027]  bpf_overflow_handler+0x127/0x320
[   74.829028]  ? perf_event_text_poke_output+0x4d0/0x4d0
[   74.829029]  ? sched_clock_cpu+0x18/0x130
[   74.829030]  __perf_event_overflow+0xae/0x190
[   74.829030]  handle_pmi_common+0x34c/0x470
[   74.829031]  ? intel_pmu_save_and_restart+0x90/0x90
[   74.829032]  ? lock_acquire+0x3f8/0x510
[   74.829033]  ? lock_release+0x6b0/0x6b0
[   74.829034]  intel_pmu_handle_irq+0x11e/0x240
[   74.829034]  perf_event_nmi_handler+0x40/0x60
[   74.829035]  nmi_handle+0x110/0x360
[   74.829036]  ? __intel_pmu_enable_all.constprop.0+0x72/0xf0
[   74.829037]  default_do_nmi+0x6b/0x170
[   74.829038]  exc_nmi+0x106/0x130
[   74.829038]  end_repeat_nmi+0x16/0x55
[   74.829039] RIP: 0010:__intel_pmu_enable_all.constprop.0+0x72/0xf0
[   74.829042] Code: 2f 1f 03 48 8d bb b8 0c 00 00 e8 29 09 41 00 48 ...
[   74.829043] RSP: 0000:ffff8880a604fc90 EFLAGS: 00000002
[   74.829044] RAX: 000000070000000f RBX: ffff8883ec2195a0 RCX: 000000000000038f
[   74.829045] RDX: 0000000000000007 RSI: ffffffff82e72c20 RDI: ffff8883ec21a258
[   74.829046] RBP: 000000070000000f R08: ffffffff8101b013 R09: fffffbfff0a7982d
[   74.829047] R10: ffffffff853cc167 R11: fffffbfff0a7982c R12: 0000000000000000
[   74.829049] R13: ffff8883ec3f0af0 R14: ffff8883ec3fd120 R15: ffff8883e9c92098
[   74.829049]  ? intel_pmu_lbr_enable_all+0x43/0x240
[   74.829050]  ? __intel_pmu_enable_all.constprop.0+0x72/0xf0
[   74.829051]  ? __intel_pmu_enable_all.constprop.0+0x72/0xf0
[   74.829052]  </NMI>
[   74.829053]  perf_event_task_tick+0x48d/0x650
[   74.829054]  scheduler_tick+0x129/0x210
[   74.829054]  update_process_times+0x37/0x70
[   74.829055]  tick_sched_handle.isra.0+0x35/0x90
[   74.829056]  tick_sched_timer+0x8f/0xb0
[   74.829057]  __hrtimer_run_queues+0x364/0x7d0
[   74.829058]  ? tick_sched_do_timer+0xa0/0xa0
[   74.829058]  ? enqueue_hrtimer+0x1e0/0x1e0
[   74.829059]  ? recalibrate_cpu_khz+0x10/0x10
[   74.829060]  ? ktime_get_update_offsets_now+0x1a3/0x360
[   74.829061]  hrtimer_interrupt+0x1bb/0x360
[   74.829062]  ? rcu_read_lock_sched_held+0xa1/0xd0
[   74.829063]  __sysvec_apic_timer_interrupt+0xed/0x3d0
[   74.829064]  sysvec_apic_timer_interrupt+0x3f/0xd0
[   74.829064]  ? asm_sysvec_apic_timer_interrupt+0xa/0x20
[   74.829065]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[   74.829066] RIP: 0033:0x7fba18d579b4
[   74.829068] Code: 74 54 44 0f b6 4a 04 41 83 e1 0f 41 80 f9 ...
[   74.829069] RSP: 002b:00007ffc9ba69570 EFLAGS: 00000206
[   74.829071] RAX: 00007fba192084c0 RBX: 00007fba18c24d28 RCX: 00000000000007a4
[   74.829072] RDX: 00007fba18c30488 RSI: 0000000000000000 RDI: 000000000000037b
[   74.829073] RBP: 00007fba18ca5760 R08: 00007fba18c248fc R09: 00007fba18c94c30
[   74.829074] R10: 000000000000002f R11: 0000000000073c30 R12: 00007ffc9ba695e0
[   74.829075] R13: 00000000000003f3 R14: 00007fba18c21ac8 R15: 00000000000058d6

However, such warning should not apply across multiple hashtabs. The
system will not deadlock if one hashtab is used in NMI, while another
hashtab is used in non-NMI.

Use separate lockdep class for each hashtab, so that we don't get this
false alert.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201029071925.3103400-2-songliubraving@fb.com
2020-10-30 13:03:29 -07:00
Yonghong Song cf83b2d2e2 bpf: Permit cond_resched for some iterators
Commit e679654a70 ("bpf: Fix a rcu_sched stall issue with
bpf task/task_file iterator") tries to fix rcu stalls warning
which is caused by bpf task_file iterator when running
"bpftool prog".

      rcu: INFO: rcu_sched self-detected stall on CPU
      rcu: \x097-....: (20999 ticks this GP) idle=302/1/0x4000000000000000 softirq=1508852/1508852 fqs=4913
      \x09(t=21031 jiffies g=2534773 q=179750)
      NMI backtrace for cpu 7
      CPU: 7 PID: 184195 Comm: bpftool Kdump: loaded Tainted: G        W         5.8.0-00004-g68bfc7f8c1b4 #6
      Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019
      Call Trace:
      <IRQ>
      dump_stack+0x57/0x70
      nmi_cpu_backtrace.cold+0x14/0x53
      ? lapic_can_unplug_cpu.cold+0x39/0x39
      nmi_trigger_cpumask_backtrace+0xb7/0xc7
      rcu_dump_cpu_stacks+0xa2/0xd0
      rcu_sched_clock_irq.cold+0x1ff/0x3d9
      ? tick_nohz_handler+0x100/0x100
      update_process_times+0x5b/0x90
      tick_sched_timer+0x5e/0xf0
      __hrtimer_run_queues+0x12a/0x2a0
      hrtimer_interrupt+0x10e/0x280
      __sysvec_apic_timer_interrupt+0x51/0xe0
      asm_call_on_stack+0xf/0x20
      </IRQ>
      sysvec_apic_timer_interrupt+0x6f/0x80
      ...
      task_file_seq_next+0x52/0xa0
      bpf_seq_read+0xb9/0x320
      vfs_read+0x9d/0x180
      ksys_read+0x5f/0xe0
      do_syscall_64+0x38/0x60
      entry_SYSCALL_64_after_hwframe+0x44/0xa9

The fix is to limit the number of bpf program runs to be
one million. This fixed the program in most cases. But
we also found under heavy load, which can increase the wallclock
time for bpf_seq_read(), the warning may still be possible.

For example, calling bpf_delay() in the "while" loop of
bpf_seq_read(), which will introduce artificial delay,
the warning will show up in my qemu run.

  static unsigned q;
  volatile unsigned *p = &q;
  volatile unsigned long long ll;
  static void bpf_delay(void)
  {
         int i, j;

         for (i = 0; i < 10000; i++)
                 for (j = 0; j < 10000; j++)
                         ll += *p;
  }

There are two ways to fix this issue. One is to reduce the above
one million threshold to say 100,000 and hopefully rcu warning will
not show up any more. Another is to introduce a target feature
which enables bpf_seq_read() calling cond_resched().

This patch took second approach as the first approach may cause
more -EAGAIN failures for read() syscalls. Note that not all bpf_iter
targets can permit cond_resched() in bpf_seq_read() as some, e.g.,
netlink seq iterator, rcu read lock critical section spans through
seq_ops->next() -> seq_ops->show() -> seq_ops->next().

For the kernel code with the above hack, "bpftool p" roughly takes
38 seconds to finish on my VM with 184 bpf program runs.
Using the following command, I am able to collect the number of
context switches:
   perf stat -e context-switches -- ./bpftool p >& log
Without this patch,
   69      context-switches
With this patch,
   75      context-switches
This patch added additional 6 context switches, roughly every 6 seconds
to reschedule, to avoid lengthy no-rescheduling which may cause the
above RCU warnings.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201028061054.1411116-1-yhs@fb.com
2020-10-28 14:54:31 -07:00
Linus Torvalds 3cb12d27ff Fixes for 5.10-rc1 from the networking tree:
Cross-tree/merge window issues:
 
  - rtl8150: don't incorrectly assign random MAC addresses; fix late
    in the 5.9 cycle started depending on a return code from
    a function which changed with the 5.10 PR from the usb subsystem
 
 Current release - regressions:
 
  - Revert "virtio-net: ethtool configurable RXCSUM", it was causing
    crashes at probe when control vq was not negotiated/available
 
 Previous releases - regressions:
 
  - ixgbe: fix probing of multi-port 10 Gigabit Intel NICs with an MDIO
    bus, only first device would be probed correctly
 
  - nexthop: Fix performance regression in nexthop deletion by
    effectively switching from recently added synchronize_rcu()
    to synchronize_rcu_expedited()
 
  - netsec: ignore 'phy-mode' device property on ACPI systems;
    the property is not populated correctly by the firmware,
    but firmware configures the PHY so just keep boot settings
 
 Previous releases - always broken:
 
  - tcp: fix to update snd_wl1 in bulk receiver fast path, addressing
    bulk transfers getting "stuck"
 
  - icmp: randomize the global rate limiter to prevent attackers from
    getting useful signal
 
  - r8169: fix operation under forced interrupt threading, make the
    driver always use hard irqs, even on RT, given the handler is
    light and only wants to schedule napi (and do so through
    a _irqoff() variant, preferably)
 
  - bpf: Enforce pointer id generation for all may-be-null register
    type to avoid pointers erroneously getting marked as null-checked
 
  - tipc: re-configure queue limit for broadcast link
 
  - net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN
    tunnels
 
  - fix various issues in chelsio inline tls driver
 
 Misc:
 
  - bpf: improve just-added bpf_redirect_neigh() helper api to support
    supplying nexthop by the caller - in case BPF program has already
    done a lookup we can avoid doing another one
 
  - remove unnecessary break statements
 
  - make MCTCP not select IPV6, but rather depend on it
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+R+5UACgkQMUZtbf5S
 Irt9KxAAiYme2aSvMOni0NQsOgQ5mVsy7tk0/4dyRqkAx0ggrfGcFuhgZYNm8ZKY
 KoQsQyn30Wb/2wAp1vX2I4Fod67rFyBfQg/8iWiEAu47X7Bj1lpPPJexSPKhF9/X
 e0TuGxZtoaDuV9C3Su/FOjRmnShGSFQu1SCyJThshwaGsFL3YQ0Ut07VRgRF8x05
 A5fy2SVVIw0JOQgV1oH0GP5oEK3c50oGnaXt8emm56PxVIfAYY0oq69hQUzrfMFP
 zV9R0XbnbCIibT8R3lEghjtXavtQTzK5rYDKazTeOyDU87M+yuykNYj7MhgDwl9Q
 UdJkH2OpMlJylEH3asUjz/+ObMhXfOuj/ZS3INtO5omBJx7x76egDZPMQe4wlpcC
 NT5EZMS7kBdQL8xXDob7hXsvFpuEErSUGruYTHp4H52A9ke1dRTH2kQszcKk87V3
 s+aVVPtJ5bHzF3oGEvfwP0DFLTF6WvjD0Ts0LmTY2DhpE//tFWV37j60Ni5XU21X
 fCPooihQbLOsq9D8zc0ydEvCg2LLWMXM5ovCkqfIAJzbGVYhnxJSryZwpOlKDS0y
 LiUmLcTZDoNR/szx0aJhVHdUUVgXDX/GsllHoc1w7ZvDRMJn40K+xnaF3dSMwtIl
 imhfc5pPi6fdBgjB0cFYRPfhwiwlPMQ4YFsOq9JvynJzmt6P5FQ=
 =ceke
 -----END PGP SIGNATURE-----

Merge tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Cross-tree/merge window issues:

   - rtl8150: don't incorrectly assign random MAC addresses; fix late in
     the 5.9 cycle started depending on a return code from a function
     which changed with the 5.10 PR from the usb subsystem

  Current release regressions:

   - Revert "virtio-net: ethtool configurable RXCSUM", it was causing
     crashes at probe when control vq was not negotiated/available

  Previous release regressions:

   - ixgbe: fix probing of multi-port 10 Gigabit Intel NICs with an MDIO
     bus, only first device would be probed correctly

   - nexthop: Fix performance regression in nexthop deletion by
     effectively switching from recently added synchronize_rcu() to
     synchronize_rcu_expedited()

   - netsec: ignore 'phy-mode' device property on ACPI systems; the
     property is not populated correctly by the firmware, but firmware
     configures the PHY so just keep boot settings

  Previous releases - always broken:

   - tcp: fix to update snd_wl1 in bulk receiver fast path, addressing
     bulk transfers getting "stuck"

   - icmp: randomize the global rate limiter to prevent attackers from
     getting useful signal

   - r8169: fix operation under forced interrupt threading, make the
     driver always use hard irqs, even on RT, given the handler is light
     and only wants to schedule napi (and do so through a _irqoff()
     variant, preferably)

   - bpf: Enforce pointer id generation for all may-be-null register
     type to avoid pointers erroneously getting marked as null-checked

   - tipc: re-configure queue limit for broadcast link

   - net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN
     tunnels

   - fix various issues in chelsio inline tls driver

  Misc:

   - bpf: improve just-added bpf_redirect_neigh() helper api to support
     supplying nexthop by the caller - in case BPF program has already
     done a lookup we can avoid doing another one

   - remove unnecessary break statements

   - make MCTCP not select IPV6, but rather depend on it"

* tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
  tcp: fix to update snd_wl1 in bulk receiver fast path
  net: Properly typecast int values to set sk_max_pacing_rate
  netfilter: nf_fwd_netdev: clear timestamp in forwarding path
  ibmvnic: save changed mac address to adapter->mac_addr
  selftests: mptcp: depends on built-in IPv6
  Revert "virtio-net: ethtool configurable RXCSUM"
  rtnetlink: fix data overflow in rtnl_calcit()
  net: ethernet: mtk-star-emac: select REGMAP_MMIO
  net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling ether_setup
  net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC device
  bpf, libbpf: Guard bpf inline asm from bpf_tail_call_static
  bpf, selftests: Extend test_tc_redirect to use modified bpf_redirect_neigh()
  bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop
  mptcp: depends on IPV6 but not as a module
  sfc: move initialisation of efx->filter_sem to efx_init_struct()
  mpls: load mpls_gso after mpls_iptunnel
  net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels
  net/sched: act_gate: Unlock ->tcfa_lock in tc_setup_flow_action()
  net: dsa: bcm_sf2: make const array static, makes object smaller
  mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it
  ...
2020-10-23 12:05:49 -07:00
Linus Torvalds 0adc313c4f Changes in gfs2:
* Use iomap for non-journaled buffered I/O.  This largely eliminates buffer
   heads on filesystems where the block size matches the page size.  Many thanks
   to Christoph Hellwig for this patch!
 * Fixes for some more journaled data filesystem bugs, found by running xfstests
   with data journaling on for all files (chattr +j $MNT) (Bob Peterson).
 * gfs2_evict_inode refactoring (Bob Peterson).
 * Use the statfs data in the journal during recovery instead of reading it in
   from the local statfs inodes (Abhi Das).
 * Several other minor fixes by various people.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAl+S8QsUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTqSwQ/+NaKp1Gh1YSQoukoFNJGNhLvdeMf4
 Em71ewCmPmqvNzixJ33U71Iob15ghzc78OHlum4xZYdrn/19Kwfa0sQi8OKV9Cow
 uXAio4iGpEEIyWdBf6CYU58do3hZErKCSf70FxfNIw/GR/wgmviXUVIKDRq4p5zf
 2VflkEqttwHc4OmP/wyvxEs16T8cOa86X57PzWwAGybij9+MufSRJE1mWgVsVFJu
 ASTQxy+T4kHWSUVOYs7hqpjluOdLl0rIJxwkt1HS1FG5kYt49h0qpRpfNL7VW6rs
 RFkdnOg2wpXSrjk66J/P+E5hqp6nexMVyTBGds1ajASfd5/GFMD5UFgk5EpVV/ey
 5+a44oRZEToqbToVa3XOumccfxUSjOcsg5hBYPI1GssBn85Uk1VH44mchb0mUGlY
 M8YuLcp5d2wuotPqtoyt8KpV0TmR7S4pv5Chh3Som6nQ9EtR2NZjtjEW2Cx6hJK5
 m43UxwYRKw4SowZgfKk3oEks6B+QrbKqDPzVGxXX4YP4tVlap4ygk9Uqoto9QZP7
 m/VP/xBtC50xn7EeTGrgiPrsOVX0ZNegTZ18KIYJudWxzV84s1XwHhPEOc/hTJJ0
 Ac4AF3k4lyG5WnXtvfP1L/QUWQOGvx+d6qNKLXcJqzFCAz9kfDGQYALtuQsIQOc9
 30IJ6Cp4lD37R+k=
 =C893
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

 - Use iomap for non-journaled buffered I/O. This largely eliminates
   buffer heads on filesystems where the block size matches the page
   size. Many thanks to Christoph Hellwig for this patch!

 - Fixes for some more journaled data filesystem bugs, found by running
   xfstests with data journaling on for all files (chattr +j $MNT) (Bob
   Peterson)

 - gfs2_evict_inode refactoring (Bob Peterson)

 - Use the statfs data in the journal during recovery instead of reading
   it in from the local statfs inodes (Abhi Das)

 - Several other minor fixes by various people

* tag 'gfs2-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (30 commits)
  gfs2: Recover statfs info in journal head
  gfs2: lookup local statfs inodes prior to journal recovery
  gfs2: Add fields for statfs info in struct gfs2_log_header_host
  gfs2: Ignore subsequent errors after withdraw in rgrp_go_sync
  gfs2: Eliminate gl_vm
  gfs2: Only access gl_delete for iopen glocks
  gfs2: Fix comments to glock_hash_walk
  gfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders)
  gfs2: Ignore journal log writes for jdata holes
  gfs2: simplify gfs2_block_map
  gfs2: Only set PageChecked if we have a transaction
  gfs2: don't lock sd_ail_lock in gfs2_releasepage
  gfs2: make gfs2_ail1_empty_one return the count of active items
  gfs2: Wipe jdata and ail1 in gfs2_journal_wipe, formerly gfs2_meta_wipe
  gfs2: enhance log_blocks trace point to show log blocks free
  gfs2: add missing log_blocks trace points in gfs2_write_revokes
  gfs2: rename gfs2_write_full_page to gfs2_write_jdata_page, remove parm
  gfs2: add validation checks for size of superblock
  gfs2: use-after-free in sysfs deregistration
  gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump
  ...
2020-10-23 11:47:42 -07:00