1
0
Fork 0
alistair23-linux/arch/x86/events/intel
Jiri Olsa 6f55967ad9 perf/x86/intel: Fix race in intel_pmu_disable_event()
New race in x86_pmu_stop() was introduced by replacing the
atomic __test_and_clear_bit() of cpuc->active_mask by separate
test_bit() and __clear_bit() calls in the following commit:

  3966c3feca ("x86/perf/amd: Remove need to check "running" bit in NMI handler")

The race causes panic for PEBS events with enabled callchains:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  ...
  RIP: 0010:perf_prepare_sample+0x8c/0x530
  Call Trace:
   <NMI>
   perf_event_output_forward+0x2a/0x80
   __perf_event_overflow+0x51/0xe0
   handle_pmi_common+0x19e/0x240
   intel_pmu_handle_irq+0xad/0x170
   perf_event_nmi_handler+0x2e/0x50
   nmi_handle+0x69/0x110
   default_do_nmi+0x3e/0x100
   do_nmi+0x11a/0x180
   end_repeat_nmi+0x16/0x1a
  RIP: 0010:native_write_msr+0x6/0x20
  ...
   </NMI>
   intel_pmu_disable_event+0x98/0xf0
   x86_pmu_stop+0x6e/0xb0
   x86_pmu_del+0x46/0x140
   event_sched_out.isra.97+0x7e/0x160
  ...

The event is configured to make samples from PEBS drain code,
but when it's disabled, we'll go through NMI path instead,
where data->callchain will not get allocated and we'll crash:

          x86_pmu_stop
            test_bit(hwc->idx, cpuc->active_mask)
            intel_pmu_disable_event(event)
            {
              ...
              intel_pmu_pebs_disable(event);
              ...

EVENT OVERFLOW ->  <NMI>
                     intel_pmu_handle_irq
                       handle_pmi_common
   TEST PASSES ->        test_bit(bit, cpuc->active_mask))
                           perf_event_overflow
                             perf_prepare_sample
                             {
                               ...
                               if (!(sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))
                                     data->callchain = perf_callchain(event, regs);

         CRASH ->              size += data->callchain->nr;
                             }
                   </NMI>
              ...
              x86_pmu_disable_event(event)
            }

            __clear_bit(hwc->idx, cpuc->active_mask);

Fixing this by disabling the event itself before setting
off the PEBS bit.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Arcari <darcari@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Lendacky Thomas <Thomas.Lendacky@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 3966c3feca ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
Link: http://lkml.kernel.org/r/20190504151556.31031-1-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-05 13:00:48 +02:00
..
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
bts.c perf/aux: Make perf_event accessible to setup_aux() 2019-02-06 10:00:39 -03:00
core.c perf/x86/intel: Fix race in intel_pmu_disable_event() 2019-05-05 13:00:48 +02:00
cstate.c perf/x86/intel: Update KBL Package C-state events to also include PC8/PC9/PC10 counters 2019-04-25 08:59:31 +02:00
ds.c perf/x86/kvm: Avoid unnecessary work in guest filtering 2019-02-11 08:00:39 +01:00
knc.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
lbr.c x86/events: Mark expected switch-case fall-throughs 2019-01-29 16:23:47 +01:00
p4.c x86: Fix various typos in comments 2018-12-03 10:49:13 +01:00
p6.c x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping 2018-02-15 01:15:52 +01:00
pt.c perf/x86/intel/pt: Remove software double buffering PMU capability 2019-05-03 12:46:20 +02:00
pt.h perf/x86/intel/pt: Export pt_cap_get() 2018-12-21 11:28:32 +01:00
rapl.c perf/core, arch/x86: Use PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable PMUs 2019-01-21 11:01:27 +01:00
uncore.c perf/x86/intel/uncore: Fix client IMC events return huge result 2019-03-09 14:10:31 +01:00
uncore.h perf/x86/intel/uncore: Fix client IMC events return huge result 2019-03-09 14:10:31 +01:00
uncore_nhmex.c perf/x86/intel/uncore: Correct fixed counter index check for NHM 2018-05-31 12:36:28 +02:00
uncore_snb.c perf/x86/intel/uncore: Fix client IMC events return huge result 2019-03-09 14:10:31 +01:00
uncore_snbep.c perf/x86/intel/uncore: Add Node ID mask 2019-02-04 08:44:43 +01:00