From ae23bff1d71f8b416ed740bc458df67355c77c92 Mon Sep 17 00:00:00 2001
From: Jiri Olsa
Date: Sat, 24 Aug 2013 16:45:54 +0200
Subject: [PATCH] perf: Prevent race in unthrottling code

The current throttling code triggers the WARN below via the following
workload (only hit on an AMD machine with 48 CPUs):

  # while [ 1 ]; do perf record perf bench sched messaging; done

  WARNING: at arch/x86/kernel/cpu/perf_event.c:1054 x86_pmu_start+0xc6/0x100()
  SNIP
  Call Trace:
   [] dump_stack+0x19/0x1b
   [] warn_slowpath_common+0x61/0x80
   [] warn_slowpath_null+0x1a/0x20
   [] x86_pmu_start+0xc6/0x100
   [] perf_adjust_freq_unthr_context.part.75+0x182/0x1a0
   [] perf_event_task_tick+0xc8/0xf0
   [] scheduler_tick+0xd1/0x140
   [] update_process_times+0x66/0x80
   [] tick_sched_handle.isra.15+0x25/0x60
   [] tick_sched_timer+0x41/0x60
   [] __run_hrtimer+0x74/0x1d0
   [] ? tick_sched_handle.isra.15+0x60/0x60
   [] hrtimer_interrupt+0xf7/0x240
   [] smp_apic_timer_interrupt+0x69/0x9c
   [] apic_timer_interrupt+0x6d/0x80
   [] ? __perf_event_task_sched_in+0x184/0x1a0
   [] ? kfree_skbmem+0x37/0x90
   [] ? __slab_free+0x1ac/0x30f
   [] ? kfree+0xfd/0x130
   [] kmem_cache_free+0x1b2/0x1d0
   [] kfree_skbmem+0x37/0x90
   [] consume_skb+0x34/0x80
   [] unix_stream_recvmsg+0x4e7/0x820
   [] sock_aio_read.part.7+0x116/0x130
   [] ? __perf_sw_event+0x19c/0x1e0
   [] sock_aio_read+0x21/0x30
   [] do_sync_read+0x80/0xb0
   [] vfs_read+0x145/0x170
   [] SyS_read+0x49/0xa0
   [] ? __audit_syscall_exit+0x1f6/0x2a0
   [] system_call_fastpath+0x16/0x1b
  ---[ end trace 622b7e226c4a766a ]---

The reason is a race in the perf_event_task_tick() throttling code.
The race flow (simplified code):

  - perf_throttled_count is a per-CPU variable that serves as the
    CPU throttling flag, here starting at 0
  - perf_throttled_seq is the sequence/domain for the allowed
    count of interrupts within the tick; it is increased
    on each tick

    on a single CPU (CPU bound event):

      ... workload

    perf_event_task_tick:
    |
    | T0    inc(perf_throttled_seq)
    | T1    needs_unthr = xchg(perf_throttled_count, 0) == 0
     tick gets interrupted:

            ... event gets throttled under new seq ...

      T2    last NMI comes, event is throttled - inc(perf_throttled_count)

     back to tick:
    | perf_adjust_freq_unthr_context:
    | |
    | | T3  unthrottling is skipped for event (needs_unthr == 0)
    | | T4  event is stopped and started via freq adjustment
    |
    tick ends

      ... workload
      ... no sample is hit for event ...

    perf_event_task_tick:
    |
    | T5    needs_unthr = xchg(perf_throttled_count, 0) != 0 (from T2)
    | T6    unthrottling is done on event (interrupts == MAX_INTERRUPTS)
    |       event is already started (from T4) -> WARN

Fix this by no longer checking needs_unthr in
perf_adjust_freq_unthr_context(), so that all events are checked
for unthrottling.

Signed-off-by: Jiri Olsa
Reported-by: Jan Stancek
Suggested-by: Peter Zijlstra
Cc: Corey Ashford
Cc: Frederic Weisbecker
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Andi Kleen
Cc: Stephane Eranian
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1377355554-8934-1-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar
---
 kernel/events/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f86599e8c123..258eaaffe95a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2712,7 +2712,7 @@ static void perf_adjust_freq_unthr_context(struct perf_event_context *ctx,
 
 		hwc = &event->hw;
 
-		if (needs_unthr && hwc->interrupts == MAX_INTERRUPTS) {
+		if (hwc->interrupts == MAX_INTERRUPTS) {
 			hwc->interrupts = 0;
 			perf_log_throttle(event, 1);
 			event->pmu->start(event, 0);
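
The T0..T6 race flow in the commit message can be replayed in a small
single-threaded userspace model. This is only a sketch under stated
assumptions, not kernel code: struct sim_event, sim_start(), sim_stop(),
sim_nmi_throttle(), sim_tick() and run_scenario() are hypothetical names
invented for illustration, the NMI is modelled as a plain function call
between the xchg and the unthrottle check, and the frequency adjustment
is assumed to always stop and restart the event.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INTERRUPTS (~0ULL)

/* Hypothetical stand-in for the few perf_event fields the race touches. */
struct sim_event {
	unsigned long long interrupts; /* MAX_INTERRUPTS means "throttled" */
	bool started;                  /* models the PMU counter running */
	int warnings;                  /* counts start-while-started WARNs */
};

/* Models the per-CPU perf_throttled_count flag. */
static int sim_throttled_count;

/* Models pmu->start(): starting an already started event is the WARN. */
static void sim_start(struct sim_event *e)
{
	if (e->started)
		e->warnings++;
	e->started = true;
}

/* Models pmu->stop(). */
static void sim_stop(struct sim_event *e)
{
	e->started = false;
}

/* T2: the NMI throttles the event and raises the per-CPU flag. */
static void sim_nmi_throttle(struct sim_event *e)
{
	e->interrupts = MAX_INTERRUPTS;
	sim_throttled_count++;
	sim_stop(e);
}

/*
 * One tick. "gate" selects the old check (needs_unthr && ...) versus the
 * patched check; "nmi_in_tick" injects T2 after the xchg at T1.
 */
static void sim_tick(struct sim_event *e, bool gate, bool nmi_in_tick)
{
	/* T1/T5: needs_unthr = xchg(perf_throttled_count, 0) */
	int needs_unthr = sim_throttled_count;
	sim_throttled_count = 0;

	if (nmi_in_tick)
		sim_nmi_throttle(e);

	/* T3/T6: unthrottle check, gated or not */
	if ((!gate || needs_unthr) && e->interrupts == MAX_INTERRUPTS) {
		e->interrupts = 0;
		sim_start(e);
	}

	/* T4: frequency adjustment stops and restarts the event */
	sim_stop(e);
	sim_start(e);
}

/* Replays two ticks and returns how many WARNs the model hit. */
int run_scenario(bool gate)
{
	struct sim_event e = { .interrupts = 0, .started = true, .warnings = 0 };

	sim_throttled_count = 0;
	sim_tick(&e, gate, true);  /* first tick, interrupted by the NMI */
	sim_tick(&e, gate, false); /* next tick, no sample hit in between */
	return e.warnings;
}
```

In this model, run_scenario(true) follows the old gated check and hits
the double start at T6, while run_scenario(false) follows the patched
check, unthrottles in the first tick, and never starts a running event.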