Commit graph

197 commits

Author SHA1 Message Date
Aneesh Kumar K.V 891121e6c0 powerpc/mm: Differentiate between hugetlb and THP during page walk
We need to properly identify whether a hugepage is an explicit or
a transparent hugepage in follow_huge_addr(). We used to depend
on the hugepage shift argument to do that, but in some cases that can
give wrong results. For example:

On finding a transparent hugepage we set hugepage shift to PMD_SHIFT.
But we can end up clearing the thp pte, via pmdp_huge_get_and_clear.
We do prevent reusing the pfn page via the usage of
kick_all_cpus_sync(). But that happens after we updated the pte to 0.
Hence in follow_huge_addr() we can find hugepage shift set, but the
transparent hugepage check fails for a thp pte.

NOTE: We fixed a variant of this race against thp split in commit
691e95fd73
("powerpc/mm/thp: Make page table walk safe against thp split/collapse")

Without this patch, we may hit the BUG_ON(flags & FOLL_GET) in
follow_page_mask occasionally.

In the long term, we may want to switch the ppc64 64k page size config to
enable CONFIG_ARCH_WANT_GENERAL_HUGETLB.

Reported-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-12 15:30:09 +11:00
Sukadev Bhattiprolu 8f3e5684d3 perf/core: Drop PERF_EVENT_TXN
We currently use PERF_EVENT_TXN flag to determine if we are in the middle
of a transaction. If in a transaction, we defer the schedulability checks
from pmu->add() operation to the pmu->commit() operation.

Now that we have "transaction types" (PERF_PMU_TXN_ADD, PERF_PMU_TXN_READ)
we can use the type to determine if we are in a transaction and drop the
PERF_EVENT_TXN flag.

When PERF_EVENT_TXN is dropped, the cpuhw->group_flag on some architectures
becomes unused, so drop that field as well.

This is an extension of the Powerpc patch from Peter Zijlstra to s390,
Sparc and x86 architectures.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1441336073-22750-11-git-send-email-sukadev@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13 11:27:30 +02:00
Sukadev Bhattiprolu 88a486132d powerpc, perf/powerpc/hv-24x7: Use PMU_TXN_READ interface
The 24x7 counters in Powerpc allow monitoring a large number of counters
simultaneously. They also allow reading several counters in a single
HCALL so we can get a more consistent snapshot of the system.

Use the PMU's transaction interface to monitor and read several event
counters at once. The idea is that users can group several 24x7 events
into a single group of events. We use the following logic to submit
the group of events to the PMU and read the values:

	pmu->start_txn()		// Initialize before first event

	for each event in group
		pmu->read(event);	// Queue each event to be read

	pmu->commit_txn()		// Read/update all queued counters

The ->commit_txn() also updates the event counts in the respective
perf_event objects.  The perf subsystem can then directly get the
event counts from the perf_event and can avoid submitting a new
->read() request to the PMU.
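
The queue-then-commit flow above can be sketched as a small user-space model. This is only an illustration under stated assumptions: the `rd_*` names, the fixed queue size and the simulated counter values are hypothetical and are not the hv-24x7 driver's code. It shows read() deferring work while a transaction is open and commit_txn() fetching every queued counter with a single simulated HCALL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RD_MAX_QUEUED 8	/* hypothetical queue limit for this sketch */

/* Hypothetical stand-in for a perf_event: an id plus its cached count. */
struct rd_event {
	int id;
	uint64_t count;
};

/* Hypothetical per-PMU state for a read transaction. */
struct rd_pmu {
	int in_txn;
	size_t nr_queued;
	struct rd_event *queued[RD_MAX_QUEUED];
	unsigned int nr_hcalls;	/* number of simulated HCALLs issued */
};

/* Simulated counter value; a real PMU would obtain this from the hypervisor. */
uint64_t rd_hw_value(int id)
{
	return 1000u * (uint64_t)id;
}

void rd_start_txn(struct rd_pmu *pmu)
{
	pmu->in_txn = 1;
	pmu->nr_queued = 0;
}

/* Inside a transaction read() only queues the event; outside it reads at once. */
int rd_read(struct rd_pmu *pmu, struct rd_event *ev)
{
	if (!pmu->in_txn) {
		pmu->nr_hcalls++;
		ev->count = rd_hw_value(ev->id);
		return 0;
	}
	if (pmu->nr_queued == RD_MAX_QUEUED)
		return -1;
	pmu->queued[pmu->nr_queued++] = ev;
	return 0;
}

/* One simulated HCALL fetches all queued counters and updates the events. */
int rd_commit_txn(struct rd_pmu *pmu)
{
	size_t i;

	assert(pmu->in_txn);
	pmu->nr_hcalls++;
	for (i = 0; i < pmu->nr_queued; i++)
		pmu->queued[i]->count = rd_hw_value(pmu->queued[i]->id);
	pmu->in_txn = 0;
	pmu->nr_queued = 0;
	return 0;
}
```

In this model a group of three events costs one simulated HCALL instead of three, and the counts can then be taken straight from the event objects.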

Thanks to input from Peter Zijlstra.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1441336073-22750-10-git-send-email-sukadev@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13 11:27:29 +02:00
Sukadev Bhattiprolu fbbe070115 perf/core: Add a 'flags' parameter to the PMU transactional interfaces
Currently, the PMU interface allows reading only one counter at a time.
But some PMUs, like the 24x7 counters in Power, support reading several
counters at once. To leverage this functionality, extend the transaction
interface to support a "transaction type".

The first type, PERF_PMU_TXN_ADD, refers to the existing transactions,
i.e. used to _schedule_ all the events on the PMU as a group. A second
transaction type, PERF_PMU_TXN_READ, will be used in a follow-on patch,
by the 24x7 counters to read several counters at once.

Extend the transaction interfaces to the PMU to accept a 'txn_flags'
parameter and use this parameter to ignore any transactions that are
not of type PERF_PMU_TXN_ADD.
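
As a rough illustration of the flag check described above, here is a self-contained user-space sketch. The flag names come from this message; their numeric values, the `tf_*` helpers and the `tf_cpuhw` structure are assumptions made for the example and are not the kernel's definitions.

```c
#include <assert.h>

/* Flag names as used in this message; the values are assumed for the sketch. */
#define PERF_PMU_TXN_ADD	0x1
#define PERF_PMU_TXN_READ	0x2

/* Hypothetical per-CPU PMU state. */
struct tf_cpuhw {
	unsigned int txn_flags;	/* type of the open transaction, 0 if none */
	int n_deferred;		/* events whose schedulability check is deferred */
};

void tf_start_txn(struct tf_cpuhw *hw, unsigned int txn_flags)
{
	assert(hw->txn_flags == 0);	/* no nested transactions in this sketch */
	hw->txn_flags = txn_flags;
	if (txn_flags & ~PERF_PMU_TXN_ADD)
		return;			/* not an ADD transaction: nothing to set up */
	hw->n_deferred = 0;
}

/* Returns 1 if the check was deferred to commit, 0 if it is done right away. */
int tf_add(struct tf_cpuhw *hw)
{
	if (hw->txn_flags & PERF_PMU_TXN_ADD) {
		hw->n_deferred++;
		return 1;
	}
	return 0;
}

/* Returns the number of deferred events checked as a group at commit time. */
int tf_commit_txn(struct tf_cpuhw *hw)
{
	int checked = 0;

	if (hw->txn_flags & PERF_PMU_TXN_ADD) {
		checked = hw->n_deferred;
		hw->n_deferred = 0;
	}
	hw->txn_flags = 0;
	return checked;
}
```

Only a transaction opened with PERF_PMU_TXN_ADD defers the per-event checks; a PERF_PMU_TXN_READ transaction passes through these paths without touching the scheduling state.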

Thanks to Peter Zijlstra for his input.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
[peterz: s390 compile fix]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1441336073-22750-3-git-send-email-sukadev@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13 11:27:25 +02:00
Anshuman Khandual f0322f7f1e powerpc/perf: Change type of the bhrb_users variable
This patch changes the data type of the bhrb_users variable from
int to unsigned int because it never contains a negative value.

Reported-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-07-27 14:31:44 +10:00
Sukadev Bhattiprolu 465345ca38 powerpc/perf/hv-24x7: Simplify extracting counter from result buffer
Simplify code that extracts a 24x7 counter from the HCALL's result buffer.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-07-25 10:49:43 +10:00
Sukadev Bhattiprolu 40386217cd powerpc/perf/hv-24x7: Whitespace - fix parameter alignment
Fix parameter alignment to be consistent with coding style.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-07-25 10:48:30 +10:00
Sukadev Bhattiprolu 442053e57a powerpc/perf/24x7: Fix lockdep warning
The sysfs attributes for the 24x7 counters are dynamically allocated.
Initialize the attributes using sysfs_attr_init() to fix the following
warning, which occurs when CONFIG_DEBUG_LOCK_ALLOC=y.

[    0.346249] audit: initializing netlink subsys (disabled)
[    0.346284] audit: type=2000 audit(1436295254.340:1): initialized
[    0.346489] BUG: key c0000000efe90198 not in .data!
[    0.346491] DEBUG_LOCKS_WARN_ON(1)
[    0.346502] ------------[ cut here ]------------
[    0.346504] WARNING: at ../kernel/locking/lockdep.c:3002
[    0.346506] Modules linked in:

Reported-by: Gustavo Luiz Duarte <gustavold@linux.vnet.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Tested-by: Gustavo Luiz Duarte <gustavold@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-07-08 15:18:04 +10:00
Anton Blanchard 72e349f112 powerpc/perf: Fix book3s kernel to userspace backtraces
When we take a PMU exception or a software event we call
perf_read_regs(). This overloads regs->result with a boolean that
describes if we should use the sampled instruction address register
(SIAR) or the regs.

If the exception is in kernel, we start with the kernel regs and
backtrace through the kernel stack. At this point we switch to the
userspace regs and backtrace the user stack with perf_callchain_user().

Unfortunately these regs have not got the perf_read_regs() treatment,
so regs->result could be anything. If it is non zero,
perf_instruction_pointer() decides to use the SIAR, and we get issues
like this:

0.11%  qemu-system-ppc  [kernel.kallsyms]        [k] _raw_spin_lock_irqsave
       |
       ---_raw_spin_lock_irqsave
          |
          |--52.35%-- 0
          |          |
          |          |--46.39%-- __hrtimer_start_range_ns
          |          |          kvmppc_run_core
          |          |          kvmppc_vcpu_run_hv
          |          |          kvmppc_vcpu_run
          |          |          kvm_arch_vcpu_ioctl_run
          |          |          kvm_vcpu_ioctl
          |          |          do_vfs_ioctl
          |          |          sys_ioctl
          |          |          system_call
          |          |          |
          |          |          |--67.08%-- _raw_spin_lock_irqsave <--- hi mum
          |          |          |          |
          |          |          |           --100.00%-- 0x7e714
          |          |          |                     0x7e714

Notice the bogus _raw_spin_lock_irqsave when we transition from kernel
(system_call) to userspace (0x7e714). We inserted what was in the SIAR.

Add a check in regs_use_siar() to check that the regs in question
are from a PMU exception. With this fix the backtrace makes sense:

     0.47%  qemu-system-ppc  [kernel.vmlinux]         [k] _raw_spin_lock_irqsave
            |
            ---_raw_spin_lock_irqsave
               |
               |--53.83%-- 0
               |          |
               |          |--44.73%-- hrtimer_try_to_cancel
               |          |          kvmppc_start_thread
               |          |          kvmppc_run_core
               |          |          kvmppc_vcpu_run_hv
               |          |          kvmppc_vcpu_run
               |          |          kvm_arch_vcpu_ioctl_run
               |          |          kvm_vcpu_ioctl
               |          |          do_vfs_ioctl
               |          |          sys_ioctl
               |          |          system_call
               |          |          __ioctl
               |          |          0x7e714
               |          |          0x7e714
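
The check described above can be modelled in a few lines of user-space C. This is a simplified sketch, not the kernel's code: the `siar_regs` structure and the `siar_*` helpers are hypothetical, and 0xf00 is assumed here to be the trap value of a performance monitor exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIAR_TRAP_PMU 0xf00UL	/* assumed trap value for a PMU exception */

/* Hypothetical, reduced register set for this sketch. */
struct siar_regs {
	unsigned long trap;	/* exception that produced these regs */
	unsigned long result;	/* overloaded: nonzero means "use the SIAR" */
	unsigned long nip;	/* next instruction pointer */
};

/* Trust regs->result only when the regs really come from a PMU exception. */
bool siar_regs_use_siar(const struct siar_regs *regs)
{
	assert(regs != NULL);
	return regs->trap == SIAR_TRAP_PMU && regs->result != 0;
}

unsigned long siar_instruction_pointer(const struct siar_regs *regs,
				       unsigned long siar)
{
	return siar_regs_use_siar(regs) ? siar : regs->nip;
}
```

With this guard, userspace regs that carry a stale nonzero result field fall back to their own nip instead of picking up whatever address is in the SIAR.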

Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-02 13:26:38 +10:00
Aneesh Kumar K.V 691e95fd73 powerpc/mm/thp: Make page table walk safe against thp split/collapse
We can block a THP split or a hugepage collapse by disabling irqs.
We send an IPI to all the cpus in the early part of split/collapse,
so disabling local irqs ensures that the split/collapse makes no
progress. If the THP is being split we return NULL from
find_linux_pte_or_hugepte(). For all the current callers this should be ok.
We need to be careful if we want to use the returned pte_t pointer outside
the irq disabled region. W.r.t. THP split, the pfn remains the same,
but a hugepage collapse will result in a pfn change. There are a
few steps we can take to avoid a hugepage collapse. One way is to take a page
reference inside the irq disabled region. Another option is to take
mmap_sem so that a parallel collapse will not happen. We can also
disable collapse by taking pmd_lock. Another method, used by the kvm
subsystem, is to check whether we had an mmu_notifier update in between
using mmu_notifier_retry().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-17 11:23:39 +10:00
Linus Torvalds d19d5efd8c powerpc updates for 4.1

Merge tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc updates from Michael Ellerman:

 - Numerous minor fixes, cleanups etc.

 - More EEH work from Gavin to remove its dependency on device_nodes.

 - Memory hotplug implemented entirely in the kernel from Nathan
   Fontenot.

 - Removal of redundant CONFIG_PPC_OF by Kevin Hao.

 - Rewrite of VPHN parsing logic & tests from Greg Kurz.

 - A fix from Nish Aravamudan to reduce memory usage by clamping
   nodes_possible_map.

 - Support for pstore on powernv from Hari Bathini.

 - Removal of old powerpc specific byte swap routines by David Gibson.

 - Fix from Vasant Hegde to prevent the flash driver telling you it was
   flashing your firmware when it wasn't.

 - Patch from Ben Herrenschmidt to add an OPAL heartbeat driver.

 - Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan
   Stancek.

 - Some fixes for migration from Tyrel Datwyler.

 - A new syscall to switch the cpu endian by Michael Ellerman.

 - Large series from Wei Yang to implement SRIOV, reviewed and acked by
   Bjorn.

 - A fix for the OPAL sensor driver from Cédric Le Goater.

 - Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman.

 - Large series from Daniel Axtens to make our PCI hooks per PHB rather
   than per machine.

 - Small patch from Sam Bobroff to explicitly abort non-suspended
   transactions on syscalls, plus a test to exercise it.

 - Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu.

 - Small patch to enable the hard lockup detector from Anton Blanchard.

 - Fix from Dave Olson for missing L2 cache information on some CPUs.

 - Some fixes from Michael Ellerman to get Cell machines booting again.

 - Freescale updates from Scott: Highlights include BMan device tree
   nodes, an MSI erratum workaround, a couple minor performance
   improvements, config updates, and misc fixes/cleanup.

* tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (196 commits)
  powerpc/powermac: Fix build error seen with powermac smp builds
  powerpc/pseries: Fix compile of memory hotplug without CONFIG_MEMORY_HOTREMOVE
  powerpc: Remove PPC32 code from pseries specific find_and_init_phbs()
  powerpc/cell: Fix iommu breakage caused by controller_ops change
  powerpc/eeh: Fix crash in eeh_add_device_early() on Cell
  powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH
  powerpc/perf/hv-24x7: Fail 24x7 initcall if create_events_from_catalog() fails
  powerpc/pseries: Correct memory hotplug locking
  powerpc: Fix missing L2 cache size in /sys/devices/system/cpu
  powerpc: Add ppc64 hard lockup detector support
  oprofile: Disable oprofile NMI timer on ppc64
  powerpc/perf/hv-24x7: Add missing put_cpu_var()
  powerpc/perf/hv-24x7: Break up single_24x7_request
  powerpc/perf/hv-24x7: Define update_event_count()
  powerpc/perf/hv-24x7: Whitespace cleanup
  powerpc/perf/hv-24x7: Define add_event_to_24x7_request()
  powerpc/perf/hv-24x7: Rename hv_24x7_event_update
  powerpc/perf/hv-24x7: Move debug prints to separate function
  powerpc/perf/hv-24x7: Drop event_24x7_request()
  powerpc/perf/hv-24x7: Use pr_devel() to log message
  ...

Conflicts:
	tools/testing/selftests/powerpc/Makefile
	tools/testing/selftests/powerpc/tm/Makefile
2015-04-16 13:53:32 -05:00
Linus Torvalds 6c8a53c9e6 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "Core kernel changes:

   - One of the more interesting features in this cycle is the ability
     to attach eBPF programs (user-defined, sandboxed bytecode executed
     by the kernel) to kprobes.

     This allows user-defined instrumentation on a live kernel image
     that can never crash, hang or interfere with the kernel negatively.
     (Right now it's limited to root-only, but in the future we might
     allow unprivileged use as well.)

     (Alexei Starovoitov)

   - Another non-trivial feature is per event clockid support: this
     allows, amongst other things, the selection of different clock
     sources for event timestamps traced via perf.

     This feature is sought by people who'd like to merge perf generated
     events with external events that were measured with different
     clocks:

       - cluster wide profiling

       - for system wide tracing with user-space events,

       - JIT profiling events

     etc.  Matching perf tooling support is added as well, available via
     the -k, --clockid <clockid> parameter to perf record et al.

     (Peter Zijlstra)

  Hardware enablement kernel changes:

   - x86 Intel Processor Trace (PT) support: which is a hardware tracer
     on steroids, available on Broadwell CPUs.

     The hardware trace stream is directly output into the user-space
     ring-buffer, using the 'AUX' data format extension that was added
     to the perf core to support hardware constraints such as the
     necessity to have the tracing buffer physically contiguous.

     This patch-set was developed for two years and this is the result.
     A simple way to make use of this is to use BTS tracing, the PT
     driver emulates BTS output - available via the 'intel_bts' PMU.
     More explicit PT specific tooling support is in the works as well -
     will probably be ready by 4.2.

     (Alexander Shishkin, Peter Zijlstra)

   - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware
     feature of Intel Xeon CPUs that allows the measurement and
     allocation/partitioning of caches to individual workloads.

     These kernel changes expose the measurement side as a new PMU
     driver, which exposes various QoS related PMU events.  (The
     partitioning change is work in progress and is planned to be merged
     as a cgroup extension.)

     (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P
     Waskiewicz Jr)

   - x86 Intel Haswell LBR call stack support: this is a new Haswell
     feature that allows the hardware recording of call chains, plus
     tooling support.  To activate this feature you have to enable it
     via the new 'lbr' call-graph recording option:

        perf record --call-graph lbr
        perf report

     or:

        perf top --call-graph lbr

     This hardware feature is a lot faster than stack walk or dwarf
     based unwinding, but has some limitations:

       - It reuses the current LBR facility, so LBR call stack and
         branch record can not be enabled at the same time.

       - It is only available for user-space callchains.

     (Yan, Zheng)

   - x86 Intel Broadwell CPU support and various event constraints and
     event table fixes for earlier models.

     (Andi Kleen)

   - x86 Intel HT CPUs event scheduling workarounds.  This is a complex
     CPU bug affecting the SNB,IVB,HSW families that results in counter
     value corruption.  The mitigation code is automatically enabled and
     is transparent.

     (Maria Dimakopoulou, Stephane Eranian)

  The perf tooling side had a ton of changes in this cycle as well, so
  I'm only able to list the user visible changes here, in addition to
  the tooling changes outlined above:

  User visible changes affecting all tools:

      - Improve support of compressed kernel modules (Jiri Olsa)
      - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo)
      - Bash completion for subcommands (Yunlong Song)
      - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa)
      - Support missing -f to override perf.data file ownership. (Yunlong Song)
      - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo)

  User visible changes in individual tools:

    'perf data':

        New tool for converting perf.data to other formats, initially
        for the CTF (Common Trace Format) from LTTng (Jiri Olsa,
        Sebastian Siewior)

    'perf diff':

        Add --kallsyms option (David Ahern)

    'perf list':

        Allow listing events with 'tracepoint' prefix (Yunlong Song)

        Sort the output of the command (Yunlong Song)

    'perf kmem':

        Respect -i option (Jiri Olsa)

        Print big numbers using thousands' group (Namhyung Kim)

        Allow -v option (Namhyung Kim)

        Fix alignment of slab result table (Namhyung Kim)

    'perf probe':

        Support multiple probes on different binaries on the same command line (Masami Hiramatsu)

        Support unnamed union/structure members data collection. (Masami Hiramatsu)

        Check kprobes blacklist when adding new events. (Masami Hiramatsu)

    'perf record':

        Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra)

        Support recording running/enabled time (Andi Kleen)

    'perf sched':

        Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song)

    'perf report' and 'perf top':

        Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo)

        Indicate which callchain entries are annotated in the
        TUI hists browser (Arnaldo Carvalho de Melo)

        Add pid/tid filtering to 'report' and 'script' commands (David Ahern)

        Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one
        cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT
        events (Arnaldo Carvalho de Melo)

    'perf stat':

        Report unsupported events properly (Suzuki K. Poulose)

        Output running time and run/enabled ratio in CSV mode (Andi Kleen)

    'perf trace':

        Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo)

        Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo)

        Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo)

        Dump stack on segfaults (Arnaldo Carvalho de Melo)

        No need to explicitly enable evsels for workload started from perf, let it
        be enabled via perf_event_attr.enable_on_exec, removing some events that take
        place in the 'perf trace' before a workload is really started by it.
        (Arnaldo Carvalho de Melo)

        Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo)

  There's also been a ton of infrastructure work done, such as the
  split-out of perf's build system into tools/build/ and other changes -
  see the shortlog and changelog for details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits)
  perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init()
  perf evlist: Fix type for references to data_head/tail
  perf probe: Check the orphaned -x option
  perf probe: Support multiple probes on different binaries
  perf buildid-list: Fix segfault when show DSOs with hits
  perf tools: Fix cross-endian analysis
  perf tools: Fix error path to do closedir() when synthesizing threads
  perf tools: Fix synthesizing fork_event.ppid for non-main thread
  perf tools: Add 'I' event modifier for exclude_idle bit
  perf report: Don't call map__kmap if map is NULL.
  perf tests: Fix attr tests
  perf probe: Fix ARM 32 building error
  perf tools: Merge all perf_event_attr print functions
  perf record: Add clockid parameter
  perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10
  perf sched replay: Support using -f to override perf.data file ownership
  perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files
  perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task
  perf sched replay: Fix the segmentation fault problem caused by pr_err in threads
  perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations
  ...
2015-04-14 14:37:47 -07:00
Linus Torvalds d0bbe0dd35 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "Usual trivial tree updates.  Nothing outstanding -- mostly printk()
  and comment fixes and unused identifier removals"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  goldfish: goldfish_tty_probe() is not using 'i' any more
  powerpc: Fix comment in smu.h
  qla2xxx: Fix printks in ql_log message
  lib: correct link to the original source for div64_u64
  si2168, tda10071, m88ds3103: Fix firmware wording
  usb: storage: Fix printk in isd200_log_config()
  qla2xxx: Fix printk in qla25xx_setup_mode
  init/main: fix reset_device comment
  ipwireless: missing assignment
  goldfish: remove unreachable line of code
  coredump: Fix do_coredump() comment
  stacktrace.h: remove duplicate declaration task_struct
  smpboot.h: Remove unused function prototype
  treewide: Fix typo in printk messages
  treewide: Fix typo in printk messages
  mod_devicetable: fix comment for match_flags
2015-04-14 09:50:27 -07:00
Anton Blanchard 9a5cbce421 powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH
We cap 32bit userspace backtraces to PERF_MAX_STACK_DEPTH
(currently 127), but we forgot to do the same for 64bit backtraces.
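
A bounded frame walk of the kind described above might look like the following user-space sketch. The `bt_frame` structure and `bt_walk()` are hypothetical names for illustration; only the limit of 127 is taken from this message.

```c
#include <assert.h>
#include <stddef.h>

#define BT_MAX_STACK_DEPTH 127	/* limit quoted in this message */

/* Hypothetical stack frame: a link to the caller's frame and a return address. */
struct bt_frame {
	const struct bt_frame *next;	/* NULL at the outermost frame */
	unsigned long ip;
};

/*
 * Record return addresses into out[], which must hold at least
 * BT_MAX_STACK_DEPTH entries. The depth cap stops the walk even if the
 * chain is corrupt or loops back on itself. Returns the number stored.
 */
size_t bt_walk(const struct bt_frame *fp, unsigned long *out)
{
	size_t n = 0;

	assert(out != NULL);
	while (fp != NULL && n < BT_MAX_STACK_DEPTH) {
		out[n++] = fp->ip;
		fp = fp->next;
	}
	return n;
}
```

Without the depth test, a user stack whose back chain points at itself would keep the walker looping; with it, the walk ends after 127 entries.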

Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-14 16:21:16 +10:00
Li Zhong 7debc970ae powerpc/perf/hv-24x7: Fail 24x7 initcall if create_events_from_catalog() fails
As Michael pointed out, create_events_from_catalog() fails when we
either have:
 - a kernel bug
 - some sort of hypervisor misconfiguration
 - ENOMEM

In all the above cases, we can also fail the 24x7 initcall.

For hypervisor errors, EIO is used so there is something reported
in dmesg.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-14 13:19:15 +10:00
Sukadev Bhattiprolu b816ce67fc powerpc/perf/hv-24x7: Add missing put_cpu_var()
Add missing put_cpu_var() for 24x7 requests. This went missing in
commit f34b6c7 (3.18-rc3).
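
The pairing rule behind this fix can be illustrated with a user-space model. Everything named `pc_*` below is a hypothetical stand-in: the counter imitates the preemption count that get_cpu_var() raises and put_cpu_var() lowers, and is not the kernel's implementation.

```c
#include <assert.h>

int pc_preempt_count;	/* stand-in for the kernel's preemption counter */
int pc_percpu_buf;	/* stand-in for a per-CPU request buffer */

/* Model of get_cpu_var(): disables preemption and hands out the buffer. */
int *pc_get_cpu_var(void)
{
	pc_preempt_count++;
	return &pc_percpu_buf;
}

/* Model of put_cpu_var(): re-enables preemption. */
void pc_put_cpu_var(void)
{
	assert(pc_preempt_count > 0);
	pc_preempt_count--;
}

/* Every exit path, including the error path, goes through the put. */
int pc_request(int fail)
{
	int *buf = pc_get_cpu_var();
	int ret = 0;

	if (fail) {
		ret = -1;
		goto out;
	}
	*buf = 42;
out:
	pc_put_cpu_var();
	return ret;
}
```

Routing all returns through a single label keeps the counter balanced; an early return that skipped the put would leave the model's preemption count raised.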

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:27 +10:00
Sukadev Bhattiprolu aeab199d84 powerpc/perf/hv-24x7: Break up single_24x7_request
Break up the function single_24x7_request() into smaller functions.
This would later enable us to "prepare" a multi-event request
buffer and then submit a single hcall for several events.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:26 +10:00
Sukadev Bhattiprolu 529ce8c9dd powerpc/perf/hv-24x7: Define update_event_count()
Move the code to update an event count into a new function,
update_event_count().

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:26 +10:00
Sukadev Bhattiprolu 3ca4ea71cb powerpc/perf/hv-24x7: Whitespace cleanup
Fix minor whitespace damage.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:25 +10:00
Sukadev Bhattiprolu e3ee15dc5d powerpc/perf/hv-24x7: Define add_event_to_24x7_request()
Move code that maps a perf_event to a 24x7 request buffer into a
separate function, add_event_to_24x7_request().

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:25 +10:00
Sukadev Bhattiprolu 33ba14c0d8 powerpc/perf/hv-24x7: Rename hv_24x7_event_update
For consistency with the pmu operation ->read() and with other
pmus, rename hv_24x7_event_update() to hv_24x7_event_read().

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:24 +10:00
Sukadev Bhattiprolu f954825dd9 powerpc/perf/hv-24x7: Move debug prints to separate function
To simplify/cleanup code, move the rather long printk() to a separate
function.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:23 +10:00
Sukadev Bhattiprolu 8079876497 powerpc/perf/hv-24x7: Drop event_24x7_request()
The function event_24x7_request() is essentially a wrapper to the
function single_24x7_request() and can be dropped to simplify code.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:23 +10:00
Sukadev Bhattiprolu 7aabe0cec2 powerpc/perf/hv-24x7: Use pr_devel() to log message
Use pr_devel_ratelimited() to log error message when the 24x7 HCALL
fails. Since users specify events by their sysfs name, the HCALL should
succeed. Any errors reported by the HCALL would be of interest to the
developer, rather than the user/administrator.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:22 +10:00
Sukadev Bhattiprolu f2b1237c73 powerpc/perf/hv-24x7: Remove unnecessary parameter
Remove the 'success_expected' parameter and log the message unconditionally.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:22 +10:00
Sukadev Bhattiprolu 145264e212 powerpc/perf/hv-24x7: Modify definition of request and result buffers
The parameters to the 24x7 HCALL have variable number of elements in them.
Set the minimum number of such elements to 1 rather than 0 and eliminate
the temporary structures.

This would enable us to submit multiple counter requests and process
multiple results from a single HCALL (in a follow on patch).

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:21 +10:00
Jan Stancek 68de8867ea powerpc/perf: add missing put_cpu_var in power_pmu_event_init
One path in power_pmu_event_init() calls get_cpu_var(), but is
missing matching call to put_cpu_var(), which causes preemption
imbalance and crash in user-space:

  Page fault in user mode with in_atomic() = 1 mm = c000001fefa5a280
  NIP = 3fff9bf2cae0  MSR = 900000014280f032
  Oops: Weird page fault, sig: 11 [#23]
  SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in: <snip>
  CPU: 43 PID: 10285 Comm: a.out Tainted: G      D         4.0.0-rc5+ #1
  task: c000001fe82c9200 ti: c000001fe835c000 task.ti: c000001fe835c000
  NIP: 00003fff9bf2cae0 LR: 00003fff9bee4898 CTR: 00003fff9bf2cae0
  REGS: c000001fe835fea0 TRAP: 0401   Tainted: G      D          (4.0.0-rc5+)
  MSR: 900000014280f032 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI>  CR: 22000028  XER: 00000000
  CFAR: 00003fff9bee4894 SOFTE: 1
   GPR00: 00003fff9bee494c 00003fffe01c2ee0 00003fff9c084410 0000000010020068
   GPR04: 0000000000000000 0000000000000002 0000000000000008 0000000000000001
   GPR08: 0000000000000001 00003fff9c074a30 00003fff9bf2cae0 00003fff9bf2cd70
   GPR12: 0000000052000022 00003fff9c10b700
  NIP [00003fff9bf2cae0] 0x3fff9bf2cae0
  LR [00003fff9bee4898] 0x3fff9bee4898
  Call Trace:
  ---[ end trace 5d3d952b5d4185d4 ]---

  BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41
  in_atomic(): 1, irqs_disabled(): 0, pid: 10285, name: a.out
  INFO: lockdep is turned off.
  CPU: 43 PID: 10285 Comm: a.out Tainted: G      D         4.0.0-rc5+ #1
  Call Trace:
  [c000001fe835f990] [c00000000089c014] .dump_stack+0x98/0xd4 (unreliable)
  [c000001fe835fa10] [c0000000000e4138] .___might_sleep+0x1d8/0x2e0
  [c000001fe835faa0] [c000000000888da8] .down_read+0x38/0x110
  [c000001fe835fb30] [c0000000000bf2f4] .exit_signals+0x24/0x160
  [c000001fe835fbc0] [c0000000000abde0] .do_exit+0xd0/0xe70
  [c000001fe835fcb0] [c00000000001f4c4] .die+0x304/0x450
  [c000001fe835fd60] [c00000000088e1f4] .do_page_fault+0x2d4/0x900
  [c000001fe835fe30] [c000000000008664] handle_page_fault+0x10/0x30
  note: a.out[10285] exited with preempt_count 1

Reproducer:
  #include <stdio.h>
  #include <unistd.h>
  #include <syscall.h>
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <linux/perf_event.h>
  #include <linux/hw_breakpoint.h>

  static struct perf_event_attr event = {
          .type = PERF_TYPE_RAW,
          .size = sizeof(struct perf_event_attr),
          .sample_type = PERF_SAMPLE_BRANCH_STACK,
          .branch_sample_type = PERF_SAMPLE_BRANCH_ANY_RETURN,
  };

  int main()
  {
          syscall(__NR_perf_event_open, &event, 0, -1, -1, 0);
  }
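
The imbalance can be modelled in user space. The sketch below is not the kernel code: `preempt_count`, `model_get_cpu_var()`, `model_put_cpu_var()`, `event_init_buggy()` and `event_init_fixed()` are hypothetical stand-ins. It assumes only that get_cpu_var() disables preemption and put_cpu_var() re-enables it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the task's preempt counter. */
static int preempt_count;

/* Models get_cpu_var(): taking the per-cpu reference disables preemption. */
static void model_get_cpu_var(void)
{
	preempt_count++;
}

/* Models put_cpu_var(): dropping the reference re-enables preemption. */
static void model_put_cpu_var(void)
{
	assert(preempt_count > 0);
	preempt_count--;
}

/* Buggy shape: the error path returns without the matching put. */
static int event_init_buggy(bool filter_ok)
{
	model_get_cpu_var();
	if (!filter_ok)
		return -1;	/* leaves preempt_count raised by one */
	model_put_cpu_var();
	return 0;
}

/* Fixed shape: every exit path drops the reference it took. */
static int event_init_fixed(bool filter_ok)
{
	int ret = 0;

	model_get_cpu_var();
	if (!filter_ok)
		ret = -1;
	model_put_cpu_var();
	return ret;
}
```

In this model, a failing call to event_init_buggy() leaves the counter at 1, which corresponds to the "exited with preempt_count 1" note in the log above.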

Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-27 20:07:01 +11:00
Masanari Iida f42cf8d6a3 treewide: Fix typo in printk messages
This patch fixes spelling typos in printk messages.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-03-06 23:04:40 +01:00
Ingo Molnar e9e4e44309 Linux 4.0-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJU6pFJAAoJEHm+PkMAQRiG2OwH/24nDK+l9zkaRs0xJsVh+qiW
 8A2N1od0ickz43iMk48jfeWGkFOkd4izyvan/daJshJOE1Y5lCdSs7jq/OXVOv9L
 G0+KQUoC5NL0hqYKn1XJPFluNQ1yqMvrDwQt99grDGzruNGBbwHuBhAQmgzpj1nU
 do8KrGjr7ft1Rzm4mOAdET/ExWiF+mRSJSxxOv598HbsIRdM5wgn0hHjPlqDxmLN
 KH4r3YYEm0cHyjf4Krse0+YdhqdamRGJlmYxJgEsYNwCoMwkmHlLTc71diseUhrg
 r/VYIYQvpAA6Yvgw8rJ0N5gk/sJJig+WyyPhfQuc2bD5sbL9eO7mPnz2UP7z7ss=
 =vXB6
 -----END PGP SIGNATURE-----

Merge tag 'v4.0-rc1' into perf/core, to refresh the tree

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-26 12:24:50 +01:00
Peter Zijlstra acba3c7e46 perf, powerpc: Fix up flush_branch_stack() users
The recent LBR rework for x86 left a stray flush_branch_stack() user in
the PowerPC code, fix that up.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 17:24:57 +01:00
Michael Ellerman a604c96eb0 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
Freescale updates from Scott:

"Highlights include 8xx optimizations, some more work on datapath device
tree content, e300 machine check support, t1040 corenet error reporting,
and various cleanups and fixes."
2015-02-04 12:03:21 +11:00
Cody P Schafer 97bf264018 powerpc/perf/hv-gpci: add the remaining gpci requests
Add the remaining gpci requests that contain counters suitable for use
by perf. Omit those that don't contain any counters (but note their
omission).

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-02-02 17:56:39 +11:00
Cody P Schafer 9e9f601084 powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotated
This adds (in req-gen/) a framework for defining gpci counter requests.
It uses macro magic similar to ftrace.

Also convert the existing hv-gpci request structures and enum values to
use the new framework (and adjust old users of the structs and enum
values to cope with changes in naming).

In exchange for this macro disaster, we get autogenerated event listing
for GPCI in sysfs, build time field offset checking, and zero
duplication of information about GPCI requests.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-02-02 17:56:39 +11:00
Cody P Schafer 5c5cd7b502 powerpc/perf/hv-24x7: parse catalog and populate sysfs with events
Retrieves and parses the 24x7 catalog on POWER systems that supply it
(right now, only POWER 8). Events are exposed via sysfs in the standard
fashion, and are all parameterized.

	$ cd /sys/bus/event_source/devices/hv_24x7/events

	$ cat HPM_CS_FROM_L4_LDATA__PHYS_CORE
	domain=0x2,offset=0xd58,core=?,lpar=0x0

	$ cat HPM_TLBIE__VCPU_HOME_CHIP
	domain=0x4,offset=0x358,vcpu=?,lpar=?

where user is required to specify values for the fields with '?' (like
core, vcpu, lpar above), when specifying the event with the perf tool.

Catalog is (at the moment) only parsed on boot. It needs re-parsing
when some hypervisor events occur. At that point we'll also need to
prevent old events from continuing to function (counter that is passed
in via spare space in the config values?).
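
As a rough illustration of the parameterized event strings shown above, the sketch below counts how many fields a user would still have to supply. `count_required_params()` is a hypothetical helper, not part of the driver or of the perf tool. The "name=value" syntax is taken from the sysfs output quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the comma-separated "name=value" fields whose value is "?".
 * Hypothetical helper for illustration only.
 */
static size_t count_required_params(const char *spec)
{
	size_t n = 0;
	const char *p = spec;

	assert(spec != NULL);
	while (*p) {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);
		const char *eq = memchr(p, '=', len);

		/* A field needs a user value when it reads exactly "name=?". */
		if (eq && (size_t)(p + len - eq) == 2 && eq[1] == '?')
			n++;

		p += len;
		if (*p == ',')
			p++;
	}
	return n;
}
```

For the two events shown above, this would report one required field (core) for the first and two (vcpu, lpar) for the second.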

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-02-02 17:56:38 +11:00
sukadev@linux.vnet.ibm.com e08e52824e perf: define EVENT_DEFINE_RANGE_FORMAT_LITE helper
Define a lite version of the EVENT_DEFINE_RANGE_FORMAT() that avoids
defining helper functions for the bit-field ranges.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-02-02 17:56:38 +11:00
Alexandru-Cezar Sardan 0d7d9b3a45 perf/powerpc: reset event hw state when adding it to the PMU
When adding an event to the PMU with PERF_EF_START the STOPPED and UPTODATE
flags need to be cleared in the hw.event status variable because they are
preventing the update of the event count on overflow interrupt.

Signed-off-by: Alexandru-Cezar Sardan <alexandru.sardan@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 23:44:18 -06:00
Tom Huynh d2caa3cebd powerpc/perf: fix fsl_emb_pmu_start to write correct pmc value
PMCs on PowerPC increase towards 0x80000000 and trigger an overflow
interrupt when the msb is set to collect a sample. Therefore, to set up
for the next sample collection, pmu_start should set the pmc value to
0x80000000 - left instead of left which incorrectly delays the next
overflow interrupt. Same as commit 9a45a9407c ("powerpc/perf:
power_pmu_start restores incorrect values, breaking frequency events")
for book3s.
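
The arithmetic can be checked with a small user-space sketch. `pmc_start_value()` and `events_until_overflow()` are hypothetical helpers, not functions from the driver. The sketch assumes a 32-bit counter that raises the overflow interrupt once it reaches 0x80000000.

```c
#include <assert.h>
#include <stdint.h>

#define PMC_OVERFLOW 0x80000000u	/* msb set: overflow interrupt fires */

/*
 * Value to load into a PMC so that it overflows after `left` more
 * events. Hypothetical helper; assumes 0 < left <= 0x80000000.
 */
static uint32_t pmc_start_value(uint32_t left)
{
	assert(left > 0 && left <= PMC_OVERFLOW);
	return PMC_OVERFLOW - left;
}

/* Events still to be counted before a PMC holding `val` overflows. */
static uint32_t events_until_overflow(uint32_t val)
{
	return val < PMC_OVERFLOW ? PMC_OVERFLOW - val : 0;
}
```

With left = 1000, loading 0x80000000 - left gives an overflow after 1000 events, whereas loading left itself would delay the overflow until 0x80000000 - 1000 events have been counted.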

Signed-off-by: Tom Huynh <tom.huynh@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 20:05:56 -06:00
Sukadev Bhattiprolu ec2aef5a8d power/perf/hv-24x7: Use kmem_cache_free() instead of kfree
Use kmem_cache_free() to free a buffer allocated with kmem_cache_alloc().

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-12 16:06:13 +11:00
sukadev@linux.vnet.ibm.com f34b6c72c3 powerpc/perf/hv-24x7: Use per-cpu page buffer
The 24x7 counters are continuously running and not updated on an
interrupt. So we record the event counts when stopping the event or
deleting it.

But to "read" a single counter in 24x7, we allocate a page and pass it
into the hypervisor (The HV returns the page full of counters from which
we extract the specific counter for this event).

We allocate a page using GFP_USER and when deleting the event, we end up
with the following warning because we are blocking in interrupt context.

  [  698.641709] BUG: scheduling while atomic: swapper/0/0/0x10010000

We could use GFP_ATOMIC but that could result in failures. Pre-allocate
a buffer so we don't have to allocate in interrupt context. Further as
Michael Ellerman suggested, use Per-CPU buffer so we only need to
allocate once per CPU.

Cc: stable@vger.kernel.org
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-12 16:05:18 +11:00
Christoph Lameter 69111bac42 powerpc: Replace __get_cpu_var uses
This still has not been merged and now powerpc is the only arch that does
not have this change. Sorry about missing linuxppc-dev before.

V2->V2
  - Fix up to work against 3.18-rc1

__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x).  This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.

Other use cases are for storing and retrieving data from the current
processor's percpu area.  __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.

__get_cpu_var() is defined as:

	#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.

this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.

This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset.  Thereby address calculations are avoided and fewer registers
are used when code is generated.

At the end of the patch set all uses of __get_cpu_var have been removed so
the macro is removed too.

The patch set includes passes over all arches as well. Once these operations
are used throughout then specialized macros can be defined in non-x86
arches as well in order to optimize per cpu access by f.e.  using a global
register that may be set to the per cpu base.

Transformations done to __get_cpu_var()

1. Determine the address of the percpu instance of the current processor.

	DEFINE_PER_CPU(int, y);
	int *x = &__get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

	DEFINE_PER_CPU(int, y[20]);
	int *x = __get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processor's instance of a per cpu
variable.

	DEFINE_PER_CPU(int, y);
	int x = __get_cpu_var(y);

   Converts to

	int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct

	DEFINE_PER_CPU(struct mystruct, y);
	struct mystruct x = __get_cpu_var(y);

   Converts to

	memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per cpu variable

	DEFINE_PER_CPU(int, y);
	__get_cpu_var(y) = x;

   Converts to

	__this_cpu_write(y, x);

6. Increment/Decrement etc of a per cpu variable

	DEFINE_PER_CPU(int, y);
	__get_cpu_var(y)++;

   Converts to

	__this_cpu_inc(y);

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
[mpe: Fix build errors caused by set/or_softirq_pending(), and rework
      assignment in __set_breakpoint() to use memcpy().]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-03 12:12:32 +11:00
Peter Zijlstra c719f56092 perf: Fix and clean up initialization of pmu::event_idx
Andy reported that the current state of event_idx is rather confused.
So remove all but the x86_pmu implementation and change the default to
return 0 (the safe option).

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Himangi Saraogi <himangi774@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: sukadev@linux.vnet.ibm.com <sukadev@linux.vnet.ibm.com>
Cc: Thomas Huth <thuth@linux.vnet.ibm.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux390@de.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:51:01 +01:00
sukadev@linux.vnet.ibm.com 56f12bee55 powerpc/perf/hv-24x7: Simplify catalog_read()
catalog_read() implements the read interface for the sysfs file

	/sys/bus/event_source/devices/hv_24x7/interface/catalog

It essentially takes a buffer, an offset and count as parameters
to the read() call.  It makes a hypervisor call to read a specific
page from the catalog and copies the required bytes into the given
buffer. Each call to catalog_read() returns at most one 4K page.

Given these requirements, we should be able to simplify the
catalog_read().
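
The requirements above reduce to a little offset arithmetic, sketched here in user-space C. `struct catalog_chunk`, `catalog_chunk_for()` and the `catalog_len` parameter are hypothetical names for illustration, not the driver's actual code. The hypervisor call itself is left out.

```c
#include <assert.h>
#include <stddef.h>

#define CATALOG_PAGE_SIZE 4096u

struct catalog_chunk {
	size_t page;		/* index of the 4K catalog page to fetch */
	size_t page_off;	/* first byte wanted within that page */
	size_t len;		/* bytes to copy, never crossing the page */
};

/*
 * Map a read(offset, count) request onto a single catalog page.
 * Hypothetical helper: catalog_len is the total catalog size in bytes.
 */
static struct catalog_chunk catalog_chunk_for(size_t offset, size_t count,
					      size_t catalog_len)
{
	struct catalog_chunk c = { 0, 0, 0 };
	size_t in_page, in_catalog;

	if (offset >= catalog_len)
		return c;	/* reading at or past the end copies nothing */

	c.page = offset / CATALOG_PAGE_SIZE;
	c.page_off = offset % CATALOG_PAGE_SIZE;

	in_page = CATALOG_PAGE_SIZE - c.page_off;
	in_catalog = catalog_len - offset;

	/* Copy the smallest of: bytes asked for, left in page, left in catalog. */
	c.len = count;
	if (c.len > in_page)
		c.len = in_page;
	if (c.len > in_catalog)
		c.len = in_catalog;
	return c;
}
```

A caller that wants more than one page simply issues further reads; each one resolves to a single page fetch and a single copy.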

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-07 16:57:10 +11:00
Cody P Schafer 48bee8a6c9 powerpc/perf/hv-24x7: use kmem_cache instead of aligned stack allocations
Ian pointed out the use of __aligned(4096) caused rather large stack
consumption in single_24x7_request(), so use the kmem_cache
hv_page_cache (which we've already got set up for other allocations)
instead of allocating locally.

CC: Haren Myneni <hbabu@us.ibm.com>
Reported-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Cody P Schafer <dev@codyps.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-07 16:52:58 +11:00
Anton Blanchard e51df2c170 powerpc: Make a bunch of things static
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-25 23:14:41 +10:00
Anton Blanchard 85101af13b powerpc/perf: Fix ABIv2 kernel backtraces
ABIv2 kernels are failing to backtrace through the kernel. An example:

39.30%  readseek2_proce  [kernel.kallsyms]    [k] find_get_entry
            |
            --- find_get_entry
               __GI___libc_read

The problem is in valid_next_sp() where we check that the new stack
pointer is at least STACK_FRAME_OVERHEAD below the previous one.

ABIv1 has a minimum stack frame size of 112 bytes consisting of 48 bytes
and 64 bytes of parameter save area. ABIv2 changes that to 32 bytes
with no parameter save area.

STACK_FRAME_OVERHEAD is in theory the minimum stack frame size,
but we have over 240 uses of it, some of which assume that it includes
space for the parameter area.

We need to work through all our stack defines and rationalise them
but let's fix perf now by creating STACK_FRAME_MIN_SIZE and using it
in valid_next_sp(). This fixes the issue:

30.64%  readseek2_proce  [kernel.kallsyms]    [k] find_get_entry
            |
            --- find_get_entry
               pagecache_get_page
               generic_file_read_iter
               new_sync_read
               vfs_read
               sys_read
               syscall_exit
               __GI___libc_read
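
The effect of the smaller minimum can be seen in a simplified model of the distance check. `next_sp_far_enough()` and the two `FRAME_MIN_*` constants are hypothetical names. The real valid_next_sp() performs further checks that are omitted here. The model assumes the stack grows down, so the caller's frame sits at a higher address than the callee's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FRAME_MIN_ABIV1 112u	/* 48 bytes + 64-byte parameter save area */
#define FRAME_MIN_ABIV2  32u	/* no parameter save area */

/*
 * Simplified model of the distance check only: accept the next stack
 * pointer if it is at least min_frame bytes above the previous one.
 */
static bool next_sp_far_enough(uint64_t sp, uint64_t prev_sp,
			       uint64_t min_frame)
{
	assert(sp >= prev_sp);
	return sp >= prev_sp + min_frame;
}
```

A legitimate 32-byte ABIv2 frame fails the check when the 112-byte minimum is applied, which stops the walk after the first entry, and passes once the 32-byte minimum is used.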

Cc: stable@vger.kernel.org # 3.16+
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
2014-09-09 19:02:45 +10:00
Himangi Saraogi d658972284 powerpc/perf/hv-24x7: Use kmem_cache_free
Free memory allocated using kmem_cache_zalloc using kmem_cache_free
rather than kfree.

The Coccinelle semantic patch that makes this change is as follows:

// <smpl>
@@
expression x,E,c;
@@

 x = \(kmem_cache_alloc\|kmem_cache_zalloc\|kmem_cache_alloc_node\)(c,...)
 ... when != x = E
     when != &x
?-kfree(x)
+kmem_cache_free(c,x)
// </smpl>

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 15:14:04 +10:00
Linus Torvalds f536b3cae8 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Ben Herrenschmidt:
 "This is the powerpc new goodies for 3.17.  The short story:

  The biggest bit is Michael removing all of pre-POWER4 processor
  support from the 64-bit kernel.  POWER3 and rs64.  This gets rid of a
  ton of old cruft that has been bitrotting for a long while.  It was
  broken for quite a few versions already and nobody noticed.  Nobody
  uses those machines anymore.  While at it, he cleaned up a bunch of
  old dusty cabinets, getting rid of a skeleton or two.

  Then, we have some base VFIO support for KVM, which allows assigning
  of PCI devices to KVM guests, support for large 64-bit BARs on
  "powernv" platforms, support for HMI (Hardware Management Interrupts)
  on those same platforms, some sparse-vmemmap improvements (for memory
  hotplug).

  There is the usual batch of Freescale embedded updates (summary in the
  merge commit) and fixes here or there, I think that's it for the
  highlights"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (102 commits)
  powerpc/eeh: Export eeh_iommu_group_to_pe()
  powerpc/eeh: Add missing #ifdef CONFIG_IOMMU_API
  powerpc: Reduce scariness of interrupt frames in stack traces
  powerpc: start loop at section start of start in vmemmap_populated()
  powerpc: implement vmemmap_free()
  powerpc: implement vmemmap_remove_mapping() for BOOK3S
  powerpc: implement vmemmap_list_free()
  powerpc: Fail remap_4k_pfn() if PFN doesn't fit inside PTE
  powerpc/book3s: Fix endianess issue for HMI handling on napping cpus.
  powerpc/book3s: handle HMIs for cpus in nap mode.
  powerpc/powernv: Invoke opal call to handle hmi.
  powerpc/book3s: Add basic infrastructure to handle HMI in Linux.
  powerpc/iommu: Fix comments with it_page_shift
  powerpc/powernv: Handle compound PE in config accessors
  powerpc/powernv: Handle compound PE for EEH
  powerpc/powernv: Handle compound PE
  powerpc/powernv: Split ioda_eeh_get_state()
  powerpc/powernv: Allow to freeze PE
  powerpc/powernv: Enable M64 aperatus for PHB3
  powerpc/eeh: Aux PE data for error log
  ...
2014-08-07 08:50:34 -07:00
Linus Torvalds ef35ad26f8 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "Kernel side changes:

   - Consolidate the PMU interrupt-disabled code amongst architectures
     (Vince Weaver)

   - misc fixes

  Tooling changes (new features, user visible changes):

   - Add support for pagefault tracing in 'trace', please see multiple
     examples in the changeset messages (Stanislav Fomichev).

   - Add pagefault statistics in 'trace' (Stanislav Fomichev)

   - Add header for columns in 'top' and 'report' TUI browsers (Jiri
     Olsa)

   - Add IO mode into timechart command (Stanislav Fomichev)

   - Fallback to syscalls:* when raw_syscalls:* is not available in the
     perl and python perf scripts.  (Daniel Bristot de Oliveira)

   - Add --repeat global option to 'perf bench' to be used in benchmarks
     such as the existing 'futex' one, that was modified to use it
     instead of a local option.  (Davidlohr Bueso)

   - Fix fd -> pathname resolution in 'trace', be it using /proc or a
     vfs_getname probe point.  (Arnaldo Carvalho de Melo)

   - Add suggestion of how to set perf_event_paranoid sysctl, to help
     non-root users trying tools like 'trace' to get a working
     environment.  (Arnaldo Carvalho de Melo)

   - Updates from trace-cmd for traceevent plugin_kvm plus args cleanup
     (Steven Rostedt, Jan Kiszka)

   - Support S/390 in 'perf kvm stat' (Alexander Yarygin)

  Tooling infrastructure changes:

   - Allow reserving a row for header purposes in the hists browser
     (Arnaldo Carvalho de Melo)

   - Various fixes and prep work related to supporting Intel PT (Adrian
     Hunter)

   - Introduce multiple debug variables control (Jiri Olsa)

   - Add callchain and additional sample information for python scripts
     (Joseph Schuchart)

   - More prep work to support Intel PT: (Adrian Hunter)
     - Polishing 'script' BTS output
     - 'inject' can specify --kallsym
     - VDSO is per machine, not a global var
     - Expose data addr lookup functions previously private to 'script'
     - Large mmap fixes in events processing

   - Include standard stringify macros in power pc code (Sukadev
     Bhattiprolu)

  Tooling cleanups:

   - Convert open coded equivalents to asprintf() (Andy Shevchenko)

   - Remove needless reassignments in 'trace' (Arnaldo Carvalho de Melo)

   - Cache the is_exit syscall test in 'trace' (Arnaldo Carvalho de
     Melo)

   - No need to reimplement err() in 'perf bench sched-messaging', drop
     barf().  (Davidlohr Bueso).

   - Remove ev_name argument from perf_evsel__hists_browse, can be
     obtained from the other parameters.  (Jiri Olsa)

  Tooling fixes:

   - Fix memory leak in the 'sched-messaging' perf bench test.
     (Davidlohr Bueso)

   - The -o and -n 'perf bench mem' options are mutually exclusive, emit
     error when both are specified.  (Davidlohr Bueso)

   - Fix scrollbar refresh row index in the ui browser, problem exposed
     now that headers will be added and will be allowed to be switched
     on/off.  (Jiri Olsa)

   - Handle the num array type in python properly (Sebastian Andrzej
     Siewior)

   - Fix wrong condition for allocation failure (Jiri Olsa)

   - Adjust callchain based on DWARF debug info on powerpc (Sukadev
     Bhattiprolu)

   - Fix a risk for doing free on uninitialized pointer in traceevent
     lib (Rickard Strandqvist)

   - Update attr test with PERF_FLAG_FD_CLOEXEC flag (Jiri Olsa)

   - Enable close-on-exec flag on perf file descriptor (Yann Droneaud)

   - Fix build on gcc 4.4.7 (Arnaldo Carvalho de Melo)

   - Event ordering fixes (Jiri Olsa)"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (123 commits)
  Revert "perf tools: Fix jump label always changing during tracing"
  perf tools: Fix perf usage string leftover
  perf: Check permission only for parent tracepoint event
  perf record: Store PERF_RECORD_FINISHED_ROUND only for nonempty rounds
  perf record: Always force PERF_RECORD_FINISHED_ROUND event
  perf inject: Add --kallsyms parameter
  perf tools: Expose 'addr' functions so they can be reused
  perf session: Fix accounting of ordered samples queue
  perf powerpc: Include util/util.h and remove stringify macros
  perf tools: Fix build on gcc 4.4.7
  perf tools: Add thread parameter to vdso__dso_findnew()
  perf tools: Add dso__type()
  perf tools: Separate the VDSO map name from the VDSO dso name
  perf tools: Add vdso__new()
  perf machine: Fix the lifetime of the VDSO temporary file
  perf tools: Group VDSO global variables into a structure
  perf session: Add ability to skip 4GiB or more
  perf session: Add ability to 'skip' a non-piped event stream
  perf tools: Pass machine to vdso__dso_findnew()
  perf tools: Add dso__data_size()
  ...
2014-08-04 16:09:53 -07:00
Ingo Molnar 5030c69755 Linux 3.16-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJT1VYNAAoJEHm+PkMAQRiGQJwIAKSYp1Uqz5O/e5r0V1TlZKT4
 1B4Njopl57PwSrJQWcGEuH2yHyM896vfPO4L6BJIOfyWzh8kwpQqclDt6uhXoF/v
 OsO1zb/7/j+n/pDZsePqP9AyIgErsHEBgUbhecDqzjN++ITPcZjQ6TIMPglZaumN
 jFAdAZuAaEwqAk8jqN2wlm689Fh9MuUEarHXbXLCqu5RgLrWhFGhp/cTWY62aqnZ
 XfEeQ9KtpRZmlR/IYjerbb1eRH7ZdJsZ88WngLX9dj/JdNxHWBkWQBXGAusXk5Fk
 y6LsIV3TjyBdrRKJ1Ifyg/2EIXHNBs8HxTFGXpjtp2HPuMLDxZOWOWikb9URtNg=
 =Fjf4
 -----END PGP SIGNATURE-----

Merge tag 'v3.16-rc7' into perf/core, to merge in the latest fixes before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28 10:00:33 +02:00
Michael Ellerman 9de5cb0f6d powerpc/perf: Add per-event excludes on Power8
Power8 has a new register (MMCR2), which contains individual freeze bits
for each counter. This is an improvement on previous chips as it means
we can have multiple events on the PMU at the same time with different
exclude_{user,kernel,hv} settings. Previously we had to ensure all
events on the PMU had the same exclude settings.

The core of the patch is fairly simple. We use the 207S feature flag to
indicate that the PMU backend supports per-event excludes, if it's set
we skip the generic logic that enforces the equality of excludes between
events. We also use that flag to skip setting the freeze bits in MMCR0,
the PMU backend is expected to have handled setting them in MMCR2.

The complication arises with EBB. The FCxP bits in MMCR2 are accessible
R/W to a task using EBB. Which means a task using EBB will be able to
see that we are using MMCR2 for freezing, whereas the old logic which
used MMCR0 is not user visible.

The task can not see or affect exclude_kernel & exclude_hv, so we only
need to consider exclude_user.

The table below summarises the behaviour both before and after this
commit is applied:

 exclude_user           true  false
 ------------------------------------
        | User visible |  N    N
 Before | Can freeze   |  Y    Y
        | Can unfreeze |  N    Y
 ------------------------------------
        | User visible |  Y    Y
  After | Can freeze   |  Y    Y
        | Can unfreeze |  Y/N  Y
 ------------------------------------

So firstly I assert that the simple visibility of the exclude_user
setting in MMCR2 is a non-issue. The event belongs to the task, and
was most likely created by the task. So the exclude_user setting is not
privileged information in any way.

Secondly, the behaviour in the exclude_user = false case is unchanged.
This is important as it is the case that is actually useful, ie. the
event is created with no exclude setting and the task uses MMCR2 to
implement exclusion manually.

For exclude_user = true there is no meaningful change to freezing the
event. Previously the task could use MMCR2 to freeze the event, though
it was already frozen with MMCR0. With the new code the task can use
MMCR2 to freeze the event, though it was already frozen with MMCR2.

The only real change is when exclude_user = true and the task tries to
use MMCR2 to unfreeze the event. Previously this had no effect, because
the event was already frozen in MMCR0. With the new code the task can
unfreeze the event in MMCR2, but at some indeterminate time in the
future the kernel will overwrite its setting and refreeze the event.

Therefore my final assertion is that any task using exclude_user = true
and also fiddling with MMCR2 was deeply confused before this change, and
remains so after it.
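
A minimal sketch of per-counter freeze bits, assuming a layout in which each PMC owns a 9-bit field in MMCR2 with PMC1 at the most significant end. The bit positions and the helper `mmcr2_freeze_bits()` are assumptions made for illustration; the handling of exclude_kernel when running in hypervisor mode is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMED layout, for illustration only: each PMC owns a 9-bit field in
 * MMCR2, with PMC1's field at the most significant end.
 */
#define MMCR2_FCS(pmc)	(UINT64_C(1) << (63 - (((pmc) - 1) * 9)))	/* freeze in kernel state */
#define MMCR2_FCP(pmc)	(MMCR2_FCS(pmc) >> 1)	/* freeze in user (problem) state */
#define MMCR2_FCH(pmc)	(MMCR2_FCS(pmc) >> 4)	/* freeze in hypervisor state */

/* Freeze bits for one event on counter `pmc` (1..6), from its excludes. */
static uint64_t mmcr2_freeze_bits(int pmc, bool exclude_user,
				  bool exclude_kernel, bool exclude_hv)
{
	uint64_t bits = 0;

	assert(pmc >= 1 && pmc <= 6);
	if (exclude_user)
		bits |= MMCR2_FCP(pmc);
	if (exclude_kernel)
		bits |= MMCR2_FCS(pmc);
	if (exclude_hv)
		bits |= MMCR2_FCH(pmc);
	return bits;
}
```

Because every counter has its own field, the values for two events can be ORed together without interfering, which is what allows events with different exclude settings to share the PMU.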

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-28 14:30:58 +10:00
Michael Ellerman 8abd818fc7 powerpc/perf: Pass the struct perf_events down to compute_mmcr()
To support per-event exclude settings on Power8 we need access to the
struct perf_events in compute_mmcr().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-28 14:30:47 +10:00
Michael Ellerman 79a4cb28a0 powerpc/perf: Clear all MMCR settings before calling compute_mmcr()
Because we reuse cpuhw->mmcr on each call to compute_mmcr() there's a
risk that we could forget to set one of the values and use whatever
value was in there previously.

Currently all the implementations are careful to set all the values, but
it's safer to clear them all before we call compute_mmcr().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-28 14:11:34 +10:00
Michael Ellerman 8903461c9b powerpc/perf: Fix MMCR2 handling for EBB
In the recent commit b50a6c584b "Clear MMCR2 when enabling PMU", I
screwed up the handling of MMCR2 for tasks using EBB.

We must make sure we set MMCR2 *before* ebb_switch_in(), otherwise we
overwrite the value of MMCR2 that userspace may have written. That
potentially breaks a task that uses EBB and manually uses MMCR2 for
event freezing.

Fixes: b50a6c584b ("powerpc/perf: Clear MMCR2 when enabling PMU")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-23 17:16:47 +10:00
Anton Blanchard f56029410a powerpc/perf: Never program book3s PMCs with values >= 0x80000000
We are seeing a lot of PMU warnings on POWER8:

    Can't find PMC that caused IRQ

Looking closer, the active PMC is 0 at this point and we took a PMU
exception on the transition from negative to 0. Some versions of POWER8
have an issue where they edge detect and not level detect PMC overflows.

A number of places program the PMC with (0x80000000 - period_left),
where period_left can be negative. We can either fix all of these or
just ensure that period_left is always >= 1.

This patch takes the second option.
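
The second option can be sketched as a clamp applied before the subtraction. `pmc_value_for()` is a hypothetical helper, not the function touched by the patch. It assumes period_left never exceeds 0x80000000.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Value to program into a PMC for a given period_left.
 * Hypothetical helper: clamping period_left to at least 1 keeps the
 * result strictly below 0x80000000, so the counter always has to
 * count up into the overflow bit.
 */
static uint32_t pmc_value_for(int64_t period_left)
{
	assert(period_left <= INT64_C(0x80000000));
	if (period_left < 1)
		period_left = 1;
	return (uint32_t)(INT64_C(0x80000000) - period_left);
}
```

Without the clamp, a period_left of zero or less would produce a value of 0x80000000 or higher, which is the case the patch rules out.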

Cc: <stable@vger.kernel.org>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11 13:50:47 +10:00
Joel Stanley b50a6c584b powerpc/perf: Clear MMCR2 when enabling PMU
On POWER8 when switching to a KVM guest we set bits in MMCR2 to freeze
the PMU counters. Aside from on boot they are then never reset,
resulting in stuck perf counters for any user in the guest or host.

We now set MMCR2 to 0 whenever enabling the PMU, which provides a sane
state for perf to use the PMU counters under either the guest or the
host.

This was manifesting as a bug with ppc64_cpu --frequency:

    $ sudo ppc64_cpu --frequency
    WARNING: couldn't run on cpu 0
    WARNING: couldn't run on cpu 8
      ...
    WARNING: couldn't run on cpu 144
    WARNING: couldn't run on cpu 152
    min:    18446744073.710 GHz (cpu -1)
    max:    0.000 GHz (cpu -1)
    avg:    0.000 GHz

The command uses a perf counter to measure CPU cycles over a fixed
amount of time, in order to approximate the frequency of the machine.
The counters were returning zero once a guest was started, regardless of
whether it was still running or had been shut down.

By dumping the value of MMCR2, it was observed that once a guest is
running MMCR2 is set to 1s - which stops counters from running:

    $ sudo sh -c 'echo p > /proc/sysrq-trigger'
    CPU: 0 PMU registers, ppmu = POWER8 n_counters = 6
    PMC1:  5b635e38 PMC2: 00000000 PMC3: 00000000 PMC4: 00000000
    PMC5:  1bf5a646 PMC6: 5793d378 PMC7: deadbeef PMC8: deadbeef
    MMCR0: 0000000080000000 MMCR1: 000000001e000000 MMCRA: 0000040000000000
    MMCR2: fffffffffffffc00 EBBHR: 0000000000000000
    EBBRR: 0000000000000000 BESCR: 0000000000000000
    SIAR:  00000000000a51cc SDAR:  c00000000fc40000 SIER:  0000000001000000

This is done unconditionally in book3s_hv_interrupts.S upon entering the
guest, and the original value is only saved/restored if the host has
indicated it was using the PMU. This is okay, however the user of the
PMU needs to ensure that it is in a defined state when it starts using
it.

Fixes: e05b9b9e5c ("powerpc/perf: Power8 PMU support")
Cc: stable@vger.kernel.org
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11 12:55:08 +10:00
Joel Stanley 4d9690dd56 powerpc/perf: Add PPMU_ARCH_207S define
Instead of separate bits for every POWER8 PMU feature, have a single one
for v2.07 of the architecture.

This saves us adding a MMCR2 define for a future patch.

Cc: stable@vger.kernel.org
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11 12:55:07 +10:00
Vince Weaver cc56d673a9 powerpc, perf: Use common PMU interrupt disabled code
Transition to using the new generic PERF_PMU_CAP_NO_INTERRUPT method for
failing a sampling event when no PMU interrupt is available.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1406191435440.27913@vincent-weaver-1.umelst.maine.edu
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-05 11:21:51 +02:00
Cody P Schafer bbad3e50e8 powerpc/perf/hv-24x7: Catalog version number is be64, not be32
The catalog version number was changed from a be32 (with 32 bits of
preceding padding) to a be64. Update the code to treat it as a be64.
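
To illustrate why the width matters, here is a small standalone sketch
that decodes the same 8 bytes both ways. The helper names are
hypothetical, and the kernel uses its own endian accessors rather than
byte loops like these.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not the hv-24x7 code. */

/* Decode a big-endian 64-bit value from 8 bytes, on any host endianness. */
static uint64_t demo_read_be64(const unsigned char *p)
{
	uint64_t v = 0;

	assert(p != NULL);
	for (size_t i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* The old interpretation: skip 4 bytes of padding, then decode a be32. */
static uint32_t demo_read_pad32_be32(const unsigned char *p)
{
	assert(p != NULL);
	p += 4;
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

The two readings agree while the upper 32 bits are zero and diverge as
soon as a version number uses them.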

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28 16:31:50 +10:00
Cody P Schafer 1ee9fcc1a0 powerpc/perf/hv-24x7: Remove [static 4096], sparse chokes on it
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28 13:11:27 +10:00
Cody P Schafer 78d13166b1 powerpc/perf/hv-24x7: Use (unsigned long) not (u32) values when calling plpar_hcall_norets()
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28 13:11:26 +10:00
Cody P Schafer 58a685c2d8 powerpc/perf/hv-gpci: Make device attr static
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28 13:11:26 +10:00
Cody P Schafer 0a8cf9e28c powerpc/perf/hv_gpci: Probe failures use pr_debug(), and padding reduced
fixup for "powerpc/perf: Add support for the hv gpci (get performance
counter info) interface".

Makes the "not enabled" message less awful (and hidden unless
debugging).

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28 13:11:25 +10:00
Cody P Schafer e98bf005d5 powerpc/perf/hv_24x7: Probe errors changed to pr_debug(), padding fixed
fixup for "powerpc/perf: Add support for the hv 24x7 interface"

Makes the "not enabled" message less awful (and hides it in most cases).

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28 13:11:25 +10:00
Michael Ellerman e9aaac1ac3 powerpc/perf: Fix handling of L3 events with bank == 1
Currently we reject events which have the L3 bank == 1, such as
0x000084918F, because the cache field is non-zero.

However that is incorrect, because although the bank is non-zero, the
value we would write into MMCRC is zero, and so we can count the event.

So fix the check to ignore the bank selector when checking whether the
cache selector is non-zero.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-24 09:48:33 +11:00
Cody P Schafer 30daeb6c8f powerpc/perf: Add kconfig option for hypervisor provided counters
The commit adds a Kconfig option which allows the hv_gpci and hv_24x7
PMUs, added in the preceding commits, to be built.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-24 09:48:32 +11:00
Cody P Schafer 0e93a6edd9 powerpc/perf: Add support for the hv 24x7 interface
This provides a basic interface between hv_24x7 and perf. Similar to
the one provided for gpci, it lacks transaction support and does not
list any events.

Example usage via perf tool:

	perf stat -e 'hv_24x7/domain=2,offset=8,starting_index=0,lpar=0xffffffff/' -r 0 -C 0 -x ' ' sleep 0.1

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-24 09:48:32 +11:00
Cody P Schafer 220a0c609a powerpc/perf: Add support for the hv gpci (get performance counter info) interface
This provides a basic link between perf and hv_gpci. Notably, it does
not yet support transactions and does not list any events (they can
still be manually composed).

Example usage via perf tool:

	perf stat -e 'hv_gpci/counter_info_version=3,offset=0,length=8,secondary_index=0,starting_index=0xffffffff,request=0x10/' -r 0 -C 0 -x ' ' sleep 0.1

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-24 09:48:31 +11:00
Cody P Schafer 7b43c67950 powerpc/perf: Add macros for defining event fields & formats
Add two macros which generate functions to extract the relevant bits
from event->attr.config{,1,2}.

EVENT_DEFINE_RANGE() defines an accessor for a range of bits in the
event, as well as a "max" function that gives the maximum value of the
field based on the bit width.

EVENT_DEFINE_RANGE_FORMAT() defines the accessor & max routine and also
a format attribute for use in the PMU's attr_groups.
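
A simplified sketch of the accessor-generating idea, as standalone C.
The DEMO_ macro, the generated function names, and the two example
fields (domain in bits 0-3, offset in bits 32-63) are assumptions for
illustration and are not the definitions added by this commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for the macros described above; the
 * DEMO_ names and the example fields are assumptions for illustration.
 *
 * For a field occupying bits bit_start..bit_end (inclusive) this generates
 *   demo_get_<name>_max(): the largest value the field can hold
 *   demo_get_<name>(cfg):  the field extracted from a 64-bit config word
 */
#define DEMO_EVENT_DEFINE_RANGE(name, bit_start, bit_end)               \
static_assert((bit_start) <= (bit_end) && (bit_end) <= 63,              \
	      "invalid bit range");                                     \
static inline uint64_t demo_get_##name##_max(void)                      \
{                                                                       \
	return UINT64_MAX >> (63 - ((bit_end) - (bit_start)));          \
}                                                                       \
static inline uint64_t demo_get_##name(uint64_t config)                 \
{                                                                       \
	return (config >> (bit_start)) & demo_get_##name##_max();       \
}

/* Example fields; the bit positions are made up for this sketch. */
DEMO_EVENT_DEFINE_RANGE(domain, 0, 3)
DEMO_EVENT_DEFINE_RANGE(offset, 32, 63)
```

Shifting UINT64_MAX right, rather than computing (1 << width) - 1,
keeps the max function well defined for a full 64-bit field.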

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
[mpe: move to powerpc, ugly but descriptive macro names]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-24 09:48:31 +11:00
Cody P Schafer 2d1b21ad7d powerpc/perf: Add a shared interface to get gpci version and capabilities
This exposes a simple way to grab the firmware provided
collect_priveliged, ga, expanded, and lab capability bits. All of these
bits come in from the same gpci request, so we've exposed all of them.

Only the collect_priveliged bit is really used by the hv-gpci/hv-24x7
code, the other bits are simply exposed in sysfs to inform the user.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-24 09:48:30 +11:00
Cody P Schafer a8b2c43671 powerpc/perf: Add 24x7 interface headers
24x7 (also called hv_24x7 or H_24X7) is an interface to obtain
performance counters from the hypervisor. These counters do not have a
fixed format/position and are instead documented in a "24x7 Catalog",
which is provided by the hypervisor (that interface is also documented
partially in the included hv-24x7-catalog.h and fully at
https://raw.githubusercontent.com/jmesmon/catalog-24x7/master/hv-24x7-catalog.h ).

The 24x7 data access is simply a copy operation into a 4-dimensional
array of 64-bit counters (from hypervisor to kernel memory). There is no
interrupt triggered on overflow; these are completely disjoint from the
typical power pmu.

This method of obtaining performance counters from the hypervisor is
intended to partially replace the gpci interface.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-24 09:48:29 +11:00
Cody P Schafer a67f144739 powerpc/perf: Add hv_gpci interface header
"H_GetPerformanceCounterInfo" (referred to as hv_gpci or just gpci from
here on) is an interface to retrieve specific performance counters and
other data from the hypervisor. All outputs have a fixed format. This
header only describes the portions of the interface that we plan on
using in linux at this time.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-24 09:48:29 +11:00
Michael Ellerman 76cb8a783a powerpc/perf: Enable BHRB access for EBB events
The previous commit added constraint and register handling to allow
processes using EBB (Event Based Branches) to request access to the BHRB
(Branch History Rolling Buffer).

With that in place we can allow processes using EBB to access the BHRB.
This is achieved by setting BHRBA in MMCR0 when we enable EBB access. We
must also clear BHRBA when we are disabling.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-24 09:48:27 +11:00
Michael Ellerman ba969237cf powerpc/perf: Add BHRB constraint and IFM MMCRA handling for EBB
We want a way for users of EBB (Event Based Branches) to also access the
BHRB (Branch History Rolling Buffer). EBB does not interoperate with our
existing BHRB support, which is wired into the generic Linux branch
stack sampling support.

To support EBB & BHRB we add three new bits to the event code. The first
bit indicates that the event wants access to the BHRB, and the other two
bits indicate the desired IFM (Instruction Filtering Mode).

We allow multiple events to request access to the BHRB, but they must
agree on the IFM value. Events which are not interested in the BHRB can
also interoperate with events which do.

Finally we program the desired IFM value into MMCRA. Although we do this
for every event, we know that the value will be identical for all events
that request BHRB access.
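
The agreement rule can be sketched as a plain loop over a group of
event codes. The bit positions and DEMO_ names below are assumptions
for illustration, and the loop stands in for the PMU constraint
mechanism rather than reproducing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout for this sketch only: one BHRB bit, two IFM bits. */
#define DEMO_BHRB_SHIFT	62
#define DEMO_IFM_SHIFT	60
#define DEMO_IFM_MASK	0x3ULL

static bool demo_wants_bhrb(uint64_t event)
{
	return (event >> DEMO_BHRB_SHIFT) & 1;
}

static unsigned int demo_ifm(uint64_t event)
{
	return (unsigned int)((event >> DEMO_IFM_SHIFT) & DEMO_IFM_MASK);
}

/*
 * Return true if every event that requests the BHRB asks for the same
 * IFM value. Events that do not request the BHRB are ignored, so they
 * can be mixed freely with events that do.
 */
static bool demo_ifm_compatible(const uint64_t *events, size_t n)
{
	bool seen = false;
	unsigned int ifm = 0;

	assert(events != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!demo_wants_bhrb(events[i]))
			continue;
		if (seen && demo_ifm(events[i]) != ifm)
			return false;
		ifm = demo_ifm(events[i]);
		seen = true;
	}
	return true;
}
```

Because all accepted events share one IFM value, writing it to MMCRA
once per event is redundant but harmless.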

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-24 09:48:27 +11:00
Michael Ellerman 7cbba63028 powerpc/perf: Avoid mutating event in power8_get_constraint()
We only need to mask the EBB bit out of the event for the check of the
special PMC 5 & 6 events. So use a local to do it just for that code,
rather than changing the event value for the life of the function.

While we're there move the set of mask and value after all the checks.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-24 09:48:26 +11:00
Michael Ellerman fb568d763f powerpc/perf: Clean up the EBB hash defines a little
Rather than using PERF_EVENT_CONFIG_EBB_SHIFT everywhere, add an
EVENT_EBB_SHIFT like every other event and use that.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-24 09:48:26 +11:00
Michael Ellerman 58b5fb0049 powerpc/perf: Reject EBB events which specify a sample_type
Although we already block EBB events which request sampling using
sample_period, technically it's possible for an event to set sample_type
but not sample_period.

Nothing terrible will happen if an EBB event does specify sample_type,
but it signals a major confusion on the part of userspace, and so we do
them the favor of rejecting it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-24 09:48:25 +11:00
Michael Ellerman c2e37a2626 powerpc/perf: Add lost exception workaround
Some power8 revisions have a hardware bug where we can lose a PMU
exception. This commit adds a workaround to detect the bad condition and
rectify the situation.

See the comment in the commit for a full description.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-24 09:48:25 +11:00
Anshuman Khandual 5f6d0380c6 powerpc/perf: Define perf_event_print_debug() to print PMU register values
Currently the sysrq ShowRegs command does not print any PMU registers as
we have an empty definition for perf_event_print_debug(). This patch
defines perf_event_print_debug() to print various PMU registers.

Example output:

CPU: 0 PMU registers, ppmu = POWER7 n_counters = 6
PMC1:  00000000 PMC2: 00000000 PMC3: 00000000 PMC4: 00000000
PMC5:  00000000 PMC6: 00000000 PMC7: deadbeef PMC8: deadbeef
MMCR0: 0000000080000000 MMCR1: 0000000000000000 MMCRA: 0f00000001000000
SIAR:  0000000000000000 SDAR:  0000000000000000 SIER:  0000000000000000

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Fix 32 bit build and rework formatting for compactness]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-24 09:48:23 +11:00
Anshuman Khandual 2f0695232c powerpc/perf: Make some new raw event codes available in sysfs
This patchset adds some missing POWER7 PMU raw events to the list
exported through the sysfs interface. It also updates the ABI
documentation to list all the sysfs exported raw events.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-24 09:48:23 +11:00
Anshuman Khandual b4d6c06c8d powerpc/perf: Configure BHRB filter before enabling PMU interrupts
Right now the config_bhrb() PMU specific call happens after
write_mmcr0(), which actually enables the PMU for event counting and
interrupts. So there is a small window of time where the PMU and BHRB
run without the required HW branch filter (if any) enabled in BHRB.

This can cause some of the branch samples to be collected through BHRB
without any filter applied and hence affects the correctness of
the results. This patch moves the BHRB config function call before
enabling interrupts.

Here are some data points captured via trace prints which depict how we
could get PMU interrupts with BHRB filter NOT enabled with a standard
perf record command line (asking for branch record information as well).

    $ perf record -j any_call ls

Before the patch:-

    ls-1962  [003] d...  2065.299590: .perf_event_interrupt: MMCRA: 40000000000
    ls-1962  [003] d...  2065.299603: .perf_event_interrupt: MMCRA: 40000000000
    ...

    All the PMU interrupts before this point did not have the requested
    HW branch filter enabled in the MMCRA.

    ls-1962  [003] d...  2065.299647: .perf_event_interrupt: MMCRA: 40040000000
    ls-1962  [003] d...  2065.299662: .perf_event_interrupt: MMCRA: 40040000000

After the patch:-

    ls-1850  [008] d...   190.311828: .perf_event_interrupt: MMCRA: 40040000000
    ls-1850  [008] d...   190.311848: .perf_event_interrupt: MMCRA: 40040000000

    All the PMU interrupts have the requested HW BHRB branch filter
    enabled in MMCRA.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Fixed up whitespace and cleaned up changelog]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-11 11:24:50 +11:00
Michael Ellerman 2fdd313f54 powerpc/perf: Add Power8 cache & TLB events
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-11 11:24:48 +11:00
Michael Ellerman a53b27b3ab powerpc/perf: Fix handling of FAB events
Commit 4df4899 "Add power8 EBB support" included a bug in the handling
of the FAB_CRESP_MATCH and FAB_TYPE_MATCH fields.

These values are pulled out of the event code using EVENT_THR_CTL_SHIFT,
however we were then or'ing that value directly into MMCR1.

This meant we were failing to set the FAB fields correctly, and also
potentially corrupting the value for PMC4SEL, leading to no counts for
the FAB events and incorrect counts for PMC4.

The fix is simply to shift left the FAB value correctly before or'ing it
with MMCR1.
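
A standalone sketch of the bug and the fix. The shift and mask values
and the DEMO_ names are assumptions chosen for illustration. The point
is only that a field pulled out of the event code must be shifted to
its position in the register before being or'ed in.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed positions for this sketch only. */
#define DEMO_EVENT_THR_CTL_SHIFT	32	/* FAB bits in the event code */
#define DEMO_EVENT_FAB_MASK		0xffULL
#define DEMO_MMCR1_FAB_SHIFT		36	/* FAB field in the register */

static_assert(DEMO_MMCR1_FAB_SHIFT >= 8,
	      "FAB field must not overlap the low selector byte");

/* Buggy form: the value lands in bits 0-7, which hold another field. */
static uint64_t demo_fab_buggy(uint64_t mmcr1, uint64_t event)
{
	return mmcr1 |
	       ((event >> DEMO_EVENT_THR_CTL_SHIFT) & DEMO_EVENT_FAB_MASK);
}

/* Fixed form: shift the extracted value left into the FAB field. */
static uint64_t demo_fab_fixed(uint64_t mmcr1, uint64_t event)
{
	return mmcr1 |
	       (((event >> DEMO_EVENT_THR_CTL_SHIFT) & DEMO_EVENT_FAB_MASK)
		<< DEMO_MMCR1_FAB_SHIFT);
}
```

In the buggy form the FAB field stays zero while the low byte of the
register picks up the FAB bits, which matches the symptoms described
above.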

Reported-by: Sooraj Ravindran Nair <soonair3@in.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-03 17:25:38 +10:00
Linus Torvalds 39eda2aba6 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Ben Herrenschmidt:
 "Here's the powerpc batch for this merge window.  Some of the
  highlights are:

   - A bunch of endian fixes ! We don't have full LE support yet in that
     release but this contains a lot of fixes all over arch/powerpc to
     use the proper accessors, call the firmware with the right endian
     mode, etc...

   - A few updates to our "powernv" platform (non-virtualized, the one
     to run KVM on), among other, support for bridging the P8 LPC bus
     for UARTs, support and some EEH fixes.

   - Some mpc51xx clock API cleanups in preparation for a clock API
     overhaul

   - A pile of cleanups of our old math emulation code, including better
     support for using it to emulate optional FP instructions on
     embedded chips that otherwise have a HW FPU.

   - Some infrastructure in selftest, for powerpc now, but could be
     generalized, initially used by some tests for our perf instruction
     counting code.

   - A pile of fixes for hotplug on pseries (that was seriously
     bitrotting)

   - The usual slew of freescale embedded updates, new boards, 64-bit
     hibernation support, e6500 core PMU support, etc..."

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (146 commits)
  powerpc: Correct FSCR bit definitions
  powerpc/xmon: Fix printing of set of CPUs in xmon
  powerpc/pseries: Move lparcfg.c to platforms/pseries
  powerpc/powernv: Return secondary CPUs to firmware on kexec
  powerpc/btext: Fix CONFIG_PPC_EARLY_DEBUG_BOOTX on ppc32
  powerpc: Cleanup handling of the DSCR bit in the FSCR register
  powerpc/pseries: Child nodes are not detached by dlpar_detach_node
  powerpc/pseries: Add mising of_node_put in delete_dt_node
  powerpc/pseries: Make dlpar_configure_connector parent node aware
  powerpc/pseries: Do all node initialization in dlpar_parse_cc_node
  powerpc/pseries: Fix parsing of initial node path in update_dt_node
  powerpc/pseries: Pack update_props_workarea to map correctly to rtas buffer header
  powerpc/pseries: Fix over writing of rtas return code in update_dt_node
  powerpc/pseries: Fix creation of loop in device node property list
  powerpc: Skip emulating & leave interrupts off for kernel program checks
  powerpc: Add more exception trampolines for hypervisor exceptions
  powerpc: Fix location and rename exception trampolines
  powerpc: Add more trap names to xmon
  powerpc/pseries: Add a warning in the case of cross-cpu VPA registration
  powerpc: Update the 00-Index in Documentation/powerpc
  ...
2013-09-06 10:49:42 -07:00
Ingo Molnar c9572f010d Linux 3.11-rc5
Merge tag 'v3.11-rc5' into perf/core

Merge Linux 3.11-rc5, to sync up with the latest upstream fixes since -rc1.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-15 10:00:09 +02:00
Anton Blanchard b0d436c739 powerpc: Fix a number of sparse warnings
Address some of the trivial sparse warnings in arch/powerpc.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-14 11:50:24 +10:00
Benjamin Herrenschmidt a12e4537ad Merge remote-tracking branch 'scott/next' into next
Merge some Freescale updates from Scott Wood
2013-08-09 16:01:40 +10:00
Priyanka Jain 3c83658ca9 powerpc/perf: Add e6500 PMU driver
The e6500 core performance monitor has the following features:
- 6 performance monitor counters
- 512 events supported
- no threshold events

e6500 PMU has more specific events (Data L1 cache misses, Instruction L1
cache misses, etc ) than e500 PMU (which only had Data L1 cache reloads,
etc). Where available, the more specific events have been used which will
produce slightly different results than e500 PMU equivalents.

Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Signed-off-by: Lijun Pan <Lijun.Pan@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2013-08-07 18:38:04 -05:00
Lijun Pan 5815c434fd powerpc/perf: add 2 additional performance monitor counters for e6500 core
There are 6 counters in e6500 core instead of 4 in e500 core.

Signed-off-by: Lijun Pan <Lijun.Pan@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2013-08-07 18:38:03 -05:00
Catalin Udma 96c3c9e78f powerpc/perf: increase the perf HW events to 6
This change is required after the e6500 perf support has been added.
There are 6 counters in e6500 core instead of 4 in e500 core and
the MAX_HWEVENTS counter should be changed accordingly from 4 to 6.
Also added a runtime check for counter overflow.

Signed-off-by: Catalin Udma <catalin.udma@freescale.com>
Signed-off-by: Lijun Pan <Lijun.Pan@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2013-08-07 18:38:03 -05:00
Michael Ellerman 8d7c55d01e powerpc/perf: Export PERF_EVENT_CONFIG_EBB_SHIFT to userspace
We use bit 63 of the event code for userspace to request that the event
be counted using EBB (Event Based Branches). Export this value, making
it part of the API - though only on processors that support EBB.
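
From userspace, requesting EBB then amounts to setting bit 63 in the
event code before passing it to the kernel. A minimal sketch follows.
The DEMO_ names are hypothetical, and real code would take the shift
from the exported header rather than redefining it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 63, as stated above; the DEMO_ name is local to this sketch. */
#define DEMO_EBB_SHIFT	63

static_assert(DEMO_EBB_SHIFT < sizeof(uint64_t) * 8,
	      "EBB bit must fit in a 64-bit event code");

/* Mark an event code as wanting to be counted using EBB. */
static uint64_t demo_request_ebb(uint64_t event_code)
{
	return event_code | (1ULL << DEMO_EBB_SHIFT);
}

/* Test whether an event code carries the EBB request bit. */
static bool demo_is_ebb(uint64_t event_code)
{
	return (event_code >> DEMO_EBB_SHIFT) & 1;
}

/* Recover the underlying event code without the EBB bit. */
static uint64_t demo_strip_ebb(uint64_t event_code)
{
	return event_code & ~(1ULL << DEMO_EBB_SHIFT);
}
```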

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-01 13:11:46 +10:00
Anshuman Khandual ff3d79dc12 powerpc/perf: BHRB filter configuration should follow the task
When the task moves around the system, the corresponding cpuhw
per cpu structure should be populated with the BHRB filter
request value so that the PMU can be configured appropriately with
it during the next call into power_pmu_enable().

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-24 14:42:34 +10:00
Anshuman Khandual 7689bdcab1 powerpc/perf: Ignore separate BHRB privilege state filter request
Completely ignore BHRB privilege state filter request as we are
already configuring that with privilege state filtering attribute
for the accompanying PMU event. This would help achieve cleaner
user space interaction for BHRB.

This patch fixes a situation like this

Before patch:-
------------
./perf record -j any -e branch-misses:k ls
Error:
The sys_perf_event_open() syscall returned with 95 (Operation not
supported) for event (branch-misses:k).
/bin/dmesg may provide additional information.
No CONFIG_PERF_EVENTS=y kernel support configured?

Here 'perf record' actually copies over ':k' filter request into BHRB
privilege state filter config and our previous check in kernel would
fail that.

After patch:-
-------------
./perf record -j any -e branch-misses:k ls
perf  perf.data  perf.data.old  test-mmap-ring
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (~102 samples)]

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-24 14:42:31 +10:00
Michael Ellerman 5d7ead0039 powerpc/perf: Set PPC_FEATURE2_EBB when we register the power8 PMU
The presence or absence of EBB is advertised to userspace via the presence
or absence of PPC_FEATURE2_EBB in cpu_user_features2.

Because the kernel can be built without PMU support, we should only add
PPC_FEATURE2_EBB to cpu_user_features2 when we successfully register the
power8 PMU support.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-24 14:18:45 +10:00
Ingo Molnar 5a9821321e perf/core improvements and fixes:
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

 * Add missing 'finished_round' event forwarding in 'perf inject', from Adrian Hunter.

 * Assorted tidy ups, from Adrian Hunter.

 * Fall back to sysfs event names when parsing fails, from Andi Kleen.

 * List pmu events in perf list, from Andi Kleen.

 * Cleanup some memory allocation/freeing uses, from David Ahern.

 * Add option to collapse undesired parts of call graph, from Greg Price.

 * Prep work for multi perf data file storage, from Jiri Olsa.

 * Add support for comparing more than two files in 'perf diff', from Jiri Olsa

 * A few more 'perf test' improvements, from Jiri Olsa

 * libtraceevent cleanups, from Namhyung Kim.

 * Remove odd build stall in 'perf sched' by moving a large struct initialization
   from a local variable to a global one, from Namhyung Kim.

 * Add support for callchains in the gtk UI, from Namhyung Kim.

 * Do not apply symfs for an absolute vmlinux path, fix from Namhyung Kim.

 * Use default include path notation for libtraceevent, from Robert Richter.

 * Fix 'make tools/perf', from Robert Richter.

 * Make Power7 events available, from Runzhen Wang.

 * Add --objdump option to 'perf top', from Sukadev Bhattiprolu.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-19 09:35:30 +02:00
Linus Torvalds 560ae37178 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 - fix for do_div() abuse on x86
 - locking fix in perf core
 - a pile of (build) fixes and cleanups in perf tools

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  perf/x86: Fix incorrect use of do_div() in NMI warning
  perf: Fix perf_lock_task_context() vs RCU
  perf: Remove WARN_ON_ONCE() check in __perf_event_enable() for valid scenario
  perf: Clone child context from parent context pmu
  perf script: Fix broken include in Context.xs
  perf tools: Fix -ldw/-lelf link test when static linking
  perf tools: Revert regression in configuration of Python support
  perf tools: Fix perf version generation
  perf stat: Fix per-socket output bug for uncore events
  perf symbols: Fix vdso list searching
  perf evsel: Fix missing increment in sample parsing
  perf tools: Update symbol_conf.nr_events when processing attribute events
  perf tools: Fix new_term() missing free on error path
  perf tools: Fix parse_events_terms() segfault on error path
  perf evsel: Fix count parameter to read call in event_format__new
  perf tools: fix a typo of a Power7 event name
  perf tools: Fix -x/--exclude-other option for report command
  perf evlist: Enhance perf_evlist__start_workload()
  perf record: Remove -f/--force option
  perf record: Remove -A/--append option
  ...
2013-07-13 15:35:47 -07:00
Runzhen Wang cfe0d8ba14 perf tools: Make Power7 events available for perf
Power7 supports over 530 different perf events but only a small subset
of these can be specified by name. The remaining events must be
specified by their raw code:

        perf stat -e r2003c <application>

This patch makes all the POWER7 events available in sysfs.  So we can
instead specify these as:

        perf stat -e 'cpu/PM_CMPLU_STALL_DFU/' <application>

where PM_CMPLU_STALL_DFU is the r2003c in previous example.

Before this patch is applied, the size of power7-pmu.o is:

$ size arch/powerpc/perf/power7-pmu.o
   text	   data	    bss	    dec	    hex	filename
   3073	   2720	      0	   5793	   16a1	arch/powerpc/perf/power7-pmu.o

and after the patch is applied, it is:

$ size arch/powerpc/perf/power7-pmu.o
   text	   data	    bss	    dec	    hex	filename
  15950	  31112	      0	  47062	   b7d6	arch/powerpc/perf/power7-pmu.o

To measure the run time overhead, I use two scripts. One is
"event_name.sh", which contains 50 event names. It looks like:

 # ./perf record  -e 'cpu/PM_CMPLU_STALL_DFU/' -e .....  /bin/sleep 1

The other is named "event_code.sh", which uses the corresponding raw event
codes instead of event names. It looks like:

 # ./perf record -e r2003c -e ......  /bin/sleep 1

The results are below.

Using event names:

[root@localhost perf]# time ./event_name.sh
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (~102 samples) ]

real	0m1.192s
user	0m0.028s
sys	0m0.106s

Using raw event codes:

[root@localhost perf]# time ./event_code.sh
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data (~112 samples) ]

real	0m1.198s
user	0m0.028s
sys	0m0.105s
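
To make the comparison explicit, the Python sketch below parses the two
"time" outputs quoted above and prints the per-field difference in
milliseconds. The helper name parse_time is hypothetical. Each script was
run once, so the sketch only restates the quoted figures and is not a
statistical comparison.

```python
import re


def parse_time(value: str) -> float:
    """Convert a `time` field such as '0m1.192s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unexpected time format: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Figures copied from the two runs in this message.
by_name = {"real": "0m1.192s", "user": "0m0.028s", "sys": "0m0.106s"}
by_code = {"real": "0m1.198s", "user": "0m0.028s", "sys": "0m0.105s"}

# Positive values mean the event-name run took longer than the raw-code run.
delta_ms = {
    field: round((parse_time(by_name[field]) - parse_time(by_code[field])) * 1000, 3)
    for field in by_name
}

for field, diff in delta_ms.items():
    print(f"{field}: {diff:+.3f} ms")
```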

Signed-off-by: Runzhen Wang <runzhen@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Cc: icycoder@gmail.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Runzhen Wang <runzhew@clemson.edu>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1372407297-6996-3-git-send-email-runzhen@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-07-12 13:46:09 -03:00
Runzhen Wang 7e40c92019 perf tools: fix a typo of a Power7 event name
In the Power7 PMU guide:
https://www.power.org/documentation/commonly-used-metrics-for-performance-analysis/
PM_BRU_MPRED is referred to as PM_BR_MPRED.

This patch fixes the typo by renaming the event in the kernel and the
documentation accordingly.

This patch changes the ABI, but I think that is OK for the following reasons:

- It is a relatively new interface, specific to the Power7 platform.

- No tools that we know of actually use this interface at this point
 (none are listed near the interface).

- Users of this interface (e.g. oprofile users migrating to perf)
  would be more used to "PM_BR_MPRED" than to "PM_BRU_MPRED".

- These are in ABI/testing at this point rather than ABI/stable,
  so hopefully we have some wiggle room.

Signed-off-by: Runzhen Wang <runzhen@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Cc: icycoder@gmail.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Runzhen Wang <runzhew@clemson.edu>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1372407297-6996-2-git-send-email-runzhen@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-07-08 17:40:05 -03:00
Linus Torvalds 65b97fb730 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Ben Herrenschmidt:
 "These are the powerpc changes for the 3.11 merge window.  In addition to
  the usual bug fixes and small updates, the main highlights are:

   - Support for transparent huge pages by Aneesh Kumar for 64-bit
     server processors.  This allows the use of 16M pages as transparent
     huge pages on kernels compiled with a 64K base page size.

   - Base VFIO support for KVM on power by Alexey Kardashevskiy

   - Wiring up of our nvram to the pstore infrastructure, including
     putting compressed oopses in there by Aruna Balakrishnaiah

   - Move, rework and improve our "EEH" (basically PCI error handling
     and recovery) infrastructure.  It is no longer specific to pseries
     but is now usable by the new "powernv" platform as well (no
     hypervisor) by Gavin Shan.

   - I fixed some bugs in our math-emu instruction decoding and made it
     usable to emulate some optional FP instructions on processors with
     hard FP that lack them (such as fsqrt on Freescale embedded
     processors).

   - Support for Power8 "Event Based Branch" facility by Michael
     Ellerman.  This facility allows what is basically "userspace
     interrupts" for performance monitor events.

   - A bunch of Transactional Memory vs.  Signals bug fixes and HW
     breakpoint/watchpoint fixes by Michael Neuling.

  And more ...  I apologize in advance if I've failed to highlight
  something that somebody deemed worth it."

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
  pstore: Add hsize argument in write_buf call of pstore_ftrace_call
  powerpc/fsl: add MPIC timer wakeup support
  powerpc/mpic: create mpic subsystem object
  powerpc/mpic: add global timer support
  powerpc/mpic: add irq_set_wake support
  powerpc/85xx: enable coreint for all the 64bit boards
  powerpc/8xx: Erroneous double irq_eoi() on CPM IRQ in MPC8xx
  powerpc/fsl: Enable CONFIG_E1000E in mpc85xx_smp_defconfig
  powerpc/mpic: Add get_version API both for internal and external use
  powerpc: Handle both new style and old style reserve maps
  powerpc/hw_brk: Fix off by one error when validating DAWR region end
  powerpc/pseries: Support compression of oops text via pstore
  powerpc/pseries: Re-organise the oops compression code
  pstore: Pass header size in the pstore write callback
  powerpc/powernv: Fix iommu initialization again
  powerpc/pseries: Inform the hypervisor we are using EBB regs
  powerpc/perf: Add power8 EBB support
  powerpc/perf: Core EBB support for 64-bit book3s
  powerpc/perf: Drop MMCRA from thread_struct
  powerpc/perf: Don't enable if we have zero events
  ...
2013-07-04 10:29:23 -07:00
Linus Torvalds f0bb4c0ab0 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel improvements:

   - watchdog driver improvements by Li Zefan
   - Power7 CPI stack events related improvements by Sukadev Bhattiprolu
   - event multiplexing via hrtimers and other improvements by Stephane
     Eranian
   - kernel stack use optimization by Andrew Hunter
   - AMD IOMMU uncore PMU support by Suravee Suthikulpanit
   - NMI handling rate-limits by Dave Hansen
   - various hw_breakpoint fixes by Oleg Nesterov
   - hw_breakpoint overflow period sampling and related signal handling
     fixes by Jiri Olsa
   - Intel Haswell PMU support by Andi Kleen

  Tooling improvements:

   - Reset SIGTERM handler in workload child process, fix from David
     Ahern.
   - Makefile reorganization, prep work for Kconfig patches, from Jiri
     Olsa.
   - Add automated make test suite, from Jiri Olsa.
   - Add --percent-limit option to 'top' and 'report', from Namhyung
     Kim.
   - Sorting improvements, from Namhyung Kim.
   - Expand definition of sysfs format attribute, from Michael Ellerman.

  Tooling fixes:

   - 'perf tests' fixes from Jiri Olsa.
   - Make Power7 CPI stack events available in sysfs, from Sukadev
     Bhattiprolu.
   - Handle death by SIGTERM in 'perf record', fix from David Ahern.
   - Fix printing of perf_event_paranoid message, from David Ahern.
   - Handle realloc failures in 'perf kvm', from David Ahern.
   - Fix divide by 0 in variance, from David Ahern.
   - Save parent pid in thread struct, from David Ahern.
   - Handle JITed code in shared memory, from Andi Kleen.
   - Fixes for 'perf diff', from Jiri Olsa.
   - Remove some unused struct members, from Jiri Olsa.
   - Add missing liblk.a dependency for python/perf.so, fix from Jiri
     Olsa.
   - Respect CROSS_COMPILE in liblk.a, from Rabin Vincent.
   - No need to do locking when adding hists in perf report, only 'top'
     needs that, from Namhyung Kim.
   - Fix alignment of symbol column in the hists browser (top,
     report) when -v is given, from Namhyung Kim.
   - Fix 'perf top' -E option behavior, from Namhyung Kim.
   - Fix bug in isupper() and islower(), from Sukadev Bhattiprolu.
   - Fix compile errors in bp_signal 'perf test', from Sukadev
     Bhattiprolu.

  ... and more things"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (102 commits)
  perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()
  perf/x86: Fix shared register mutual exclusion enforcement
  perf/x86/intel: Support full width counting
  x86: Add NMI duration tracepoints
  perf: Drop sample rate when sampling is too slow
  x86: Warn when NMI handlers take large amounts of time
  hw_breakpoint: Introduce "struct bp_cpuinfo"
  hw_breakpoint: Simplify *register_wide_hw_breakpoint()
  hw_breakpoint: Introduce cpumask_of_bp()
  hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths
  hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths
  perf/x86/intel: Add mem-loads/stores support for Haswell
  perf/x86/intel: Support Haswell/v4 LBR format
  perf/x86/intel: Move NMI clearing to end of PMI handler
  perf/x86/intel: Add Haswell PEBS support
  perf/x86/intel: Add simple Haswell PMU support
  perf/x86/intel: Add Haswell PEBS record support
  perf/x86/intel: Fix sparse warning
  perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation
  perf/x86/amd: Add IOMMU Performance Counter resource management
  ...
2013-07-02 16:15:23 -07:00
Michael Ellerman 4df4899911 powerpc/perf: Add power8 EBB support
Add logic to the power8 PMU code to support EBB. Future processors would
also be expected to implement similar constraints. At that time we could
possibly factor these out into common code.

Finally mark the power8 PMU as supporting EBB, which is the actual
enable switch which allows EBBs to be configured.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-01 11:50:13 +10:00