1
0
Fork 0
alistair23-linux/kernel
Song Liu f1838da73c bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack()
[ Upstream commit eac9153f2b ]

bpf stackmap with build-id lookup (BPF_F_STACK_BUILD_ID) can trigger A-A
deadlock on rq_lock():

rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[...]
Call Trace:
 try_to_wake_up+0x1ad/0x590
 wake_up_q+0x54/0x80
 rwsem_wake+0x8a/0xb0
 bpf_get_stack+0x13c/0x150
 bpf_prog_fbdaf42eded9fe46_on_event+0x5e3/0x1000
 bpf_overflow_handler+0x60/0x100
 __perf_event_overflow+0x4f/0xf0
 perf_swevent_overflow+0x99/0xc0
 ___perf_sw_event+0xe7/0x120
 __schedule+0x47d/0x620
 schedule+0x29/0x90
 futex_wait_queue_me+0xb9/0x110
 futex_wait+0x139/0x230
 do_futex+0x2ac/0xa50
 __x64_sys_futex+0x13c/0x180
 do_syscall_64+0x42/0x100
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

This can be reproduced by:
1. Start a multi-thread program that does parallel mmap() and malloc();
2. taskset the program to 2 CPUs;
3. Attach bpf program to trace_sched_switch and gather stackmap with
   build-id, e.g. with trace.py from bcc tools:
   trace.py -U -p <pid> -s <some-bin,some-lib> t:sched:sched_switch

A sample reproducer is attached at the end.

This could also trigger deadlock with other locks that are nested with
rq_lock.

Fix this by checking whether irqs are disabled. Since rq_lock and all
other nested locks are irq safe, it is safe to do up_read() when irqs are
not disable. If the irqs are disabled, postpone up_read() in irq_work.

Fixes: 615755a77b ("bpf: extend stackmap to save binary_build_id+offset instead of address")
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191014171223.357174-1-songliubraving@fb.com

Reproducer:
============================ 8< ============================

char *filename;

void *worker(void *p)
{
        void *ptr;
        int fd;
        char *pptr;

        fd = open(filename, O_RDONLY);
        if (fd < 0)
                return NULL;
        while (1) {
                struct timespec ts = {0, 1000 + rand() % 2000};

                ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0);
                usleep(1);
                if (ptr == MAP_FAILED) {
                        printf("failed to mmap\n");
                        break;
                }
                munmap(ptr, 4096 * 64);
                usleep(1);
                pptr = malloc(1);
                usleep(1);
                pptr[0] = 1;
                usleep(1);
                free(pptr);
                usleep(1);
                nanosleep(&ts, NULL);
        }
        close(fd);
        return NULL;
}

int main(int argc, char *argv[])
{
        void *ptr;
        int i;
        pthread_t threads[THREAD_COUNT];

        if (argc < 2)
                return 0;

        filename = argv[1];

        for (i = 0; i < THREAD_COUNT; i++) {
                if (pthread_create(threads + i, NULL, worker, NULL)) {
                        fprintf(stderr, "Error creating thread\n");
                        return 0;
                }
        }

        for (i = 0; i < THREAD_COUNT; i++)
                pthread_join(threads[i], NULL);
        return 0;
}
============================ 8< ============================

Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31 16:44:09 +01:00
..
bpf bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack() 2019-12-31 16:44:09 +01:00
cgroup cgroup: pids: use atomic64_t for pids->limit 2019-12-17 19:56:15 +01:00
configs kvm_config: add CONFIG_VIRTIO_MENU 2018-10-24 20:55:56 -04:00
debug kgdb: don't use a notifier to enter kgdb at panic; call directly 2019-09-25 17:51:40 -07:00
dma dma-mapping: fix false positivse warnings in dma_common_free_remap() 2019-10-05 10:24:17 +02:00
events perf/core: Fix missing static inline on perf_cgroup_switch() 2019-11-13 08:16:44 +01:00
gcov um: Enable CONFIG_CONSTRUCTORS 2019-09-15 21:37:13 +02:00
irq irq/irqdomain: Update __irq_domain_alloc_fwnode() function documentation 2019-11-05 00:48:26 +01:00
livepatch livepatch: Nullify obj->mod in klp_module_coming()'s error path 2019-08-19 13:03:37 +02:00
locking Revert "locking/pvqspinlock: Don't wait if vCPU is preempted" 2019-09-25 10:22:37 +02:00
power PM: QoS: Invalidate frequency QoS requests after removal 2019-11-20 10:46:42 +01:00
printk Merge branch 'for-5.4' into for-linus 2019-09-16 12:54:25 +02:00
rcu Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-09-16 17:25:49 -07:00
sched sched/uclamp: Fix incorrect condition 2019-11-15 11:02:18 +01:00
time time: Zero the upper 32-bits in __kernel_timespec on 32-bit 2019-12-13 08:42:18 +01:00
trace tracing: Fix race in perf_trace_buf initialization 2019-10-21 19:38:28 -04:00
.gitignore Provide in-kernel headers to make extending kernel easier 2019-04-29 16:48:03 +02:00
Kconfig.freezer treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
Kconfig.hz treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
Kconfig.locks treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
Kconfig.preempt sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=y 2019-07-22 18:05:11 +02:00
Makefile Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity 2019-09-27 19:37:27 -07:00
acct.c acct_on(): don't mess with freeze protection 2019-04-04 21:04:13 -04:00
async.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
audit.c audit/stable-5.3 PR 20190702 2019-07-08 18:55:42 -07:00
audit.h audit/stable-5.3 PR 20190702 2019-07-08 18:55:42 -07:00
audit_fsnotify.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157 2019-05-30 11:26:37 -07:00
audit_tree.c fsnotify: switch send_to_group() and ->handle_event to const struct qstr * 2019-04-26 13:51:03 -04:00
audit_watch.c audit_get_nd(): don't unlock parent too early 2019-11-10 11:56:55 -05:00
auditfilter.c audit/stable-5.3 PR 20190702 2019-07-08 18:55:42 -07:00
auditsc.c audit: enforce op for string fields 2019-05-28 17:46:43 -04:00
backtracetest.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
bounds.c kbuild: fix kernel/bounds.c 'W=1' warning 2018-10-31 08:54:14 -07:00
capability.c LSM: add SafeSetID module that gates setid calls 2019-01-25 11:22:43 -08:00
compat.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
configs.c kernel/configs: Replace GPL boilerplate code with SPDX identifier 2019-07-30 18:34:15 +02:00
context_tracking.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
cpu.c cpu/speculation: Uninline and export CPU mitigations helpers 2019-11-04 12:22:02 +01:00
cpu_pm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 282 2019-06-05 17:36:37 +02:00
crash_core.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 230 2019-06-19 17:09:06 +02:00
crash_dump.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
cred.c Merge branch 'access-creds' 2019-07-25 08:36:29 -07:00
delayacct.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 25 2019-05-21 11:52:39 +02:00
dma.c
elfcore.c kernel/elfcore.c: include proper prototypes 2019-09-25 17:51:39 -07:00
exec_domain.c
exit.c futex: Mark the begin of futex exit explicitly 2019-11-29 10:10:11 +01:00
extable.c extable: Add function to search only kernel exception table 2019-08-21 22:23:48 +10:00
fail_function.c fail_function: no need to check return value of debugfs_create functions 2019-06-03 15:49:06 +02:00
fork.c futex: Split futex_mm_release() for exit/exec 2019-11-29 10:10:10 +01:00
freezer.c Revert "libata, freezer: avoid block device removal while system is frozen" 2019-10-06 09:11:37 -06:00
futex.c futex: Prevent exit livelock 2019-11-29 10:10:14 +01:00
gen_kheaders.sh kheaders: substituting --sort in archive creation 2019-10-17 09:08:19 +09:00
groups.c
hung_task.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
iomem.c mm/nvdimm: add is_ioremap_addr and use that to check ioremap address 2019-07-12 11:05:40 -07:00
irq_work.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
jump_label.c jump_label: Don't warn on __exit jump entries 2019-08-29 15:10:10 +01:00
kallsyms.c kallsyms: Don't let kallsyms_lookup_size_offset() fail on retrieving the first symbol 2019-08-27 16:19:56 +01:00
kcmp.c
kcov.c kcov: convert kcov.refcount to refcount_t 2019-03-07 18:32:02 -08:00
kexec.c kexec_load: Disable at runtime if the kernel is locked down 2019-08-19 21:54:15 -07:00
kexec_core.c kexec: bail out upon SIGKILL when allocating memory. 2019-09-25 17:51:40 -07:00
kexec_elf.c kexec_elf: support 32 bit ELF files 2019-09-06 23:58:44 +02:00
kexec_file.c Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-09-28 08:14:15 -07:00
kexec_internal.h
kheaders.c kheaders: Move from proc to sysfs 2019-05-24 20:16:01 +02:00
kmod.c
kprobes.c Tracing updates: 2019-09-20 11:19:48 -07:00
ksysfs.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 170 2019-05-30 11:26:39 -07:00
kthread.c kthread: make __kthread_queue_delayed_work static 2019-10-16 09:20:58 -07:00
latencytop.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
module-internal.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36 2019-05-24 17:27:11 +02:00
module.c Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-09-28 08:14:15 -07:00
module_signature.c MODSIGN: Export module signature definitions 2019-08-05 18:39:56 -04:00
module_signing.c MODSIGN: Export module signature definitions 2019-08-05 18:39:56 -04:00
notifier.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
nsproxy.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
padata.c padata: remove cpu_index from the parallel_queue 2019-09-13 21:15:41 +10:00
panic.c panic: ensure preemption is disabled during panic() 2019-10-07 15:47:19 -07:00
params.c lockdown: Lock down module params that specify hardware parameters (eg. ioport) 2019-08-19 21:54:16 -07:00
pid.c kernel/pid.c: convert struct pid count to refcount_t 2019-07-16 19:23:24 -07:00
pid_namespace.c proc/sysctl: add shared variables for range check 2019-07-18 17:08:07 -07:00
profile.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
ptrace.c ptrace: add PTRACE_GET_SYSCALL_INFO request 2019-07-16 19:23:24 -07:00
range.c
reboot.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
relay.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-03-12 13:27:20 -07:00
resource.c mm/memory_hotplug.c: use PFN_UP / PFN_DOWN in walk_system_ram_range() 2019-09-24 15:54:09 -07:00
rseq.c signal: Remove task parameter from force_sig 2019-05-27 09:36:28 -05:00
seccomp.c signal: Remove the signal number and task parameters from force_sig_info 2019-05-29 09:31:44 -05:00
signal.c cgroup: freezer: call cgroup_enter_frozen() with preemption disabled in ptrace_stop() 2019-10-11 08:39:57 -07:00
smp.c smp: Warn on function calls from softirq context 2019-07-20 11:27:16 +02:00
smpboot.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
smpboot.h
softirq.c Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-07-08 11:01:13 -07:00
stackleak.c stackleak: Mark stackleak_track_stack() as notrace 2018-12-05 19:31:44 -08:00
stacktrace.c stacktrace: Don't skip first entry on noncurrent tasks 2019-11-04 21:19:25 +01:00
stop_machine.c stop_machine: Avoid potential race behaviour 2019-10-17 12:47:12 +02:00
sys.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-09-17 12:35:15 -07:00
sys_ni.c arch: handle arches who do not yet define clone3 2019-06-21 01:54:53 +02:00
sysctl.c parisc: sysctl.c: Use CONFIG_PARISC instead of __hppa_ define 2019-10-14 21:43:54 +02:00
sysctl_binary.c kernel/sysctl: add panic_print into sysctl 2019-01-04 13:13:47 -08:00
task_work.c
taskstats.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157 2019-05-30 11:26:37 -07:00
test_kprobes.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 25 2019-05-21 11:52:39 +02:00
torture.c torture: Remove exporting of internal functions 2019-08-01 14:30:22 -07:00
tracepoint.c The main changes in this release include: 2019-07-18 11:51:00 -07:00
tsacct.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157 2019-05-30 11:26:37 -07:00
ucount.c proc/sysctl: add shared variables for range check 2019-07-18 17:08:07 -07:00
uid16.c
uid16.h
umh.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
up.c smp: Remove smp_call_function() and on_each_cpu() return values 2019-06-23 14:26:26 +02:00
user-return-notifier.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
user.c Keyrings namespacing 2019-07-08 19:36:47 -07:00
user_namespace.c Keyrings namespacing 2019-07-08 19:36:47 -07:00
utsname.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
utsname_sysctl.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
watchdog.c watchdog: Mark watchdog_hrtimer to expire in hard interrupt context 2019-08-01 20:51:20 +02:00
watchdog_hld.c kernel/watchdog_hld.c: hard lockup message should end with a newline 2019-04-19 09:46:05 -07:00
workqueue.c workqueue: Fix missing kfree(rescuer) in destroy_workqueue() 2019-12-17 19:56:54 +01:00
workqueue_internal.h sched/core, workqueues: Distangle worker accounting from rq lock 2019-04-16 16:55:15 +02:00