Commit graph

80 commits

Author SHA1 Message Date
Michael O'Farrell 9d2dcc8fc6 arm64: perf: Add cap_user_time aarch64
It is useful to get the running time of a thread.  Doing so in an
efficient manner can be important for performance of user applications.
Avoiding system calls in `clock_gettime` when handling
CLOCK_THREAD_CPUTIME_ID is important.  Other clocks are handled in the
VDSO, but CLOCK_THREAD_CPUTIME_ID falls back on the system call.

CLOCK_THREAD_CPUTIME_ID is not handled in the VDSO since it would have
costs associated with maintaining updated user space accessible time
offsets.  These offsets have to be updated everytime the a thread is
scheduled/descheduled.  However, for programs regularly checking the
running time of a thread, this is a performance improvement.

This patch takes a middle ground, and adds support for cap_user_time an
optional feature of the perf_event API.  This way costs are only
incurred when the perf_event api is enabled.  This is done the same way
as it is in x86.

Ultimately this allows calculating the thread running time in userspace
on aarch64 as follows (adapted from perf_event_open manpage):

u32 seq, time_mult, time_shift;
u64 running, count, time_offset, quot, rem, delta;
struct perf_event_mmap_page *pc;
pc = buf;  // buf is the perf event mmaped page as documented in the API.

if (pc->cap_usr_time) {
    do {
        seq = pc->lock;
        barrier();
        running = pc->time_running;

        count = readCNTVCT_EL0();  // Read ARM hardware clock.
        time_offset = pc->time_offset;
        time_mult   = pc->time_mult;
        time_shift  = pc->time_shift;

        barrier();
    } while (pc->lock != seq);

    quot = (count >> time_shift);
    rem = count & (((u64)1 << time_shift) - 1);
    delta = time_offset + quot * time_mult +
            ((rem * time_mult) >> time_shift);

    running += delta;
    // running now has the current nanosecond level thread time.
}

Summary of changes in the patch:

For aarch64 systems, make arch_perf_update_userpage update the timing
information stored in the perf_event page.  Requiring the following
calculations:
  - Calculate the appropriate time_mult, and time_shift factors to convert
    ticks to nano seconds for the current clock frequency.
  - Adjust the mult and shift factors to avoid shift factors of 32 bits.
    (possibly unnecessary)
  - The time_offset userspace should apply when doing calculations:
    negative the current sched time (now), because time_running and
    time_enabled fields of the perf_event page have just been updated.
Toggle bits to appropriate values:
  - Enable cap_user_time

Signed-off-by: Michael O'Farrell <micpof@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-31 10:14:00 +01:00
Suzuki K Poulose c132079053 arm64: perf: Add support for chaining event counters
Add support for 64bit event by using chained event counters
and 64bit cycle counters.

PMUv3 allows chaining a pair of adjacent 32-bit counters, effectively
forming a 64-bit counter. The low/even counter is programmed to count
the event of interest, and the high/odd counter is programmed to count
the CHAIN event, taken when the low/even counter overflows.

For CPU cycles, when 64bit mode is requested, the cycle counter
is used in 64bit mode. If the cycle counter is not available,
falls back to chaining.

Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-10 18:19:30 +01:00
Suzuki K Poulose 3cce50dfec arm64: perf: Disable PMU while processing counter overflows
The arm64 PMU updates the event counters and reprograms the
counters in the overflow IRQ handler without disabling the
PMU. This could potentially cause skews in for group counters,
where the overflowed counters may potentially loose some event
counts, while they are reprogrammed. To prevent this, disable
the PMU while we process the counter overflows and enable it
right back when we are done.

This patch also moves the PMU stop/start routines to avoid a
forward declaration.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-10 18:19:02 +01:00
Suzuki K Poulose 0c55d19c16 arm64: perf: Clean up armv8pmu_select_counter
armv8pmu_select_counter always returns the passed idx. So
let us make that void and get rid of the pointless checks.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-10 18:19:02 +01:00
Suzuki K Poulose 7dfc8db1d1 arm_pmu: Tidy up clear_event_idx call backs
The armpmu uses get_event_idx callback to allocate an event
counter for a given event, which marks the selected counter
as "used". Now, when we delete the counter, the arm_pmu goes
ahead and clears the "used" bit and then invokes the "clear_event_idx"
call back, which kind of splits the job between the core code
and the backend. To keep things tidy, mandate the implementation
of clear_event_idx() and add it for exisiting backends.
This will be useful for adding the chained event support, where
we leave the event idx maintenance to the backend.

Also, when an event is removed from the PMU, reset the hw.idx
to indicate that a counter is not allocated for this event,
to help the backends do better checks. This will be also used
for the chain counter support.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-10 18:19:02 +01:00
Suzuki K Poulose 3a95200d3f arm_pmu: Change API to support 64bit counter values
Convert the {read/write}_counter APIs to handle 64bit values
to enable supporting chained event counters. The backends still
use 32bit values and we pass them 32bit values only. So in effect
there are no functional changes.

Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-10 18:19:02 +01:00
Suzuki K Poulose 8d3e994241 arm_pmu: Clean up maximum period handling
Each PMU defines their max_period of the counter as the maximum
value that can be counted. Since all the PMU backends support
32bit counters by default, let us remove the redundant field.

No functional changes.

Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-10 18:19:02 +01:00
Mark Rutland 0788f1e973 arm_pmu: simplify arm_pmu::handle_irq
The arm_pmu::handle_irq() callback has the same prototype as a generic
IRQ handler, taking the IRQ number and a void pointer argument which it
must convert to an arm_pmu pointer.

This means that all arm_pmu::handle_irq() take an IRQ number they never
use, and all must explicitly cast the void pointer to an arm_pmu
pointer.

Instead, let's change arm_pmu::handle_irq to take an arm_pmu pointer,
allowing these casts to be removed. The redundant IRQ number parameter
is also removed.

Suggested-by: Hoeun Ryu <hoeun.ryu@lge.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-05-21 18:07:05 +01:00
Mark Rutland 0331365edb arm64: perf: correct PMUVer probing
The ID_AA64DFR0_EL1.PMUVer field doesn't follow the usual ID registers
scheme. While value 0xf indicates a non-architected PMU is implemented,
values 0x1 to 0xe indicate an increasingly featureful architected PMU,
as if the field were unsigned.

For more details, see ARM DDI 0487C.a, D10.1.4, "Alternative ID scheme
used for the Performance Monitors Extension version".

Currently, we treat the field as signed, and erroneously bail out for
values 0x8 to 0xe. Let's correct that.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-02-20 11:34:54 +00:00
Yury Norov 3aa56885e5 bitmap: replace bitmap_{from,to}_u32array
with bitmap_{from,to}_arr32 over the kernel. Additionally to it:
* __check_eq_bitmap() now takes single nbits argument.
* __check_eq_u32_array is not used in new test but may be used in
  future. So I don't remove it here, but annotate as __used.

Tested on arm64 and 32-bit BE mips.

[arnd@arndb.de: perf: arm_dsu_pmu: convert to bitmap_from_arr32]
  Link: http://lkml.kernel.org/r/20180201172508.5739-2-ynorov@caviumnetworks.com
[ynorov@caviumnetworks.com: fix net/core/ethtool.c]
  Link: http://lkml.kernel.org/r/20180205071747.4ekxtsbgxkj5b2fz@yury-thinkpad
Link: http://lkml.kernel.org/r/20171228150019.27953-2-ynorov@caviumnetworks.com
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: David Decotigny <decot@googlers.com>,
Cc: David S. Miller <davem@davemloft.net>,
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:44 -08:00
Xu YiPing f8ada18955 arm64: perf: remove unsupported events for Cortex-A73
bus access read/write events are not supported in A73, based on the
Cortex-A73 TRM r0p2, section 11.9 Events (pages 11-457 to 11-460).

Fixes: 5561b6c5e9 "arm64: perf: add support for Cortex-A73"
Acked-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Xu YiPing <xuyiping@hisilicon.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-12-01 13:05:08 +00:00
Julien Thierry e884f80cf2 arm64: perf: add support for Cortex-A35
The Cortex-A35 uses some implementation defined perf events.

The Cortex-A35 derives from the Cortex-A53 core, using the same event mapings
based on Cortex-A35 TRM r0p2, section C2.3 - Performance monitoring events
(pages C2-562 to C2-565).

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-08-10 17:46:49 +01:00
Julien Thierry 5561b6c5e9 arm64: perf: add support for Cortex-A73
The Cortex-A73 uses some implementation defined perf events.

This patch sets up the necessary mapping for Cortex-A73.

Mappings are based on Cortex-A73 TRM r0p2, section 11.9 Events
(pages 11-457 to 11-460).

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-08-10 17:46:44 +01:00
Will Deacon d0d09d4d99 arm64: perf: Remove redundant entries from CPU-specific event maps
Now that the event mapping code always looks into the PMUv3 events
before any extended mappings, the extended mappings can be reduced to
only those events that are not discoverable through the PMCEID registers.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-08-10 17:45:07 +01:00
Julien Thierry 5cf7fb26ea arm64: perf: Connect additional events to pmu counters
Last level caches and node events were almost never connected in current
supported cores.

We connect last level caches to the actual last level within the core and
node events are connected to bus accesses.

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-08-10 17:44:58 +01:00
Will Deacon 6c833bb924 arm64: perf: Allow standard PMUv3 events to be extended by the CPU type
Rather than continue adding CPU-specific event maps, instead look up by
default in the PMUv3 event map and only fallback to the CPU-specific maps
if either the event isn't described by PMUv3, or it is described but
the PMCEID registers say that it is unsupported by the current CPU.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-08-08 17:12:34 +01:00
Pratyush Anand 1031a15929 arm64: perf: Allow more than one cycle counter to be used
Currently:
$ perf stat -e cycles:u -e cycles:k  true

 Performance counter stats for 'true':

          2,24,699      cycles:u
     <not counted>      cycles:k	(0.00%)

       0.000788087 seconds time elapsed

We can not count more than one cycle counter in one instance,because we
allow to map cycle counter into PMCCNTR_EL0 only. However, if I did not
miss anything then specification do not prohibit to use PMEVCNTR<n>_EL0
for cycle count as well.

Modify the code so that it still prefers to use PMCCNTR_EL0 for cycle
counter, however allow to use PMEVCNTR<n>_EL0 if PMCCNTR_EL0 is already
in use.

After this patch:

$ perf stat -e cycles:u -e cycles:k   true

 Performance counter stats for 'true':

          2,17,310      cycles:u
          7,40,009      cycles:k

       0.000764149 seconds time elapsed

Signed-off-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-08-08 14:33:13 +01:00
Shaokun Zhang fe7296e192 arm64: perf: Extend event config for ARMv8.1
Perf has supported ARMv8.1 feature with 16-bit evtCount filed [see c210ae8
arm64: perf: Extend event mask for ARMv8.1], event config should be
extended to 16-bit too, otherwise, if use -e event_name whose event_code
is more than 0x3ff, pmu_config_term will return -EINVAL because function
pmu_format_max_value depends on event config.

This patch extends event config to 16-bit.

Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-05-30 12:15:14 +01:00
Ganapatrao Kulkarni 78a19cfdf3 arm64: perf: Ignore exclude_hv when kernel is running in HYP
commit d98ecdaca2 ("arm64: perf: Count EL2 events if the kernel is
running in HYP") returns -EINVAL when perf system call perf_event_open is
called with exclude_hv != exclude_kernel. This change breaks applications
on VHE enabled ARMv8.1 platforms. The issue was observed with HHVM
application, which calls perf_event_open with exclude_hv = 1 and
exclude_kernel = 0.

There is no separate hypervisor privilege level when VHE is enabled, the
host kernel runs at EL2. So when VHE is enabled, we should ignore
exclude_hv from the application. This behaviour is consistent with PowerPC
where the exclude_hv is ignored when the hypervisor is not present and with
x86 where this flag is ignored.

Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
[will: added comment to justify the behaviour of exclude_hv]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-05-15 18:30:37 +01:00
Florian Fainelli f5337346cd arm64: pmu: Wire-up Cortex A53 L2 cache events and DTLB refills
Add missing L2 cache events: read/write accesses and misses, as well as
the DTLB refills.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-04-28 15:23:36 +01:00
Mark Rutland faa9a08397 arm64: pmuv3: handle pmuv3+
Commit f1b36dcb5c ("arm64: pmuv3: handle !PMUv3 when probing") is
a little too restrictive, and prevents the use of of backwards
compatible PMUv3 extenstions, which have a PMUver value other than 1.

For instance, ARMv8.1 PMU extensions (as implemented by ThunderX2) are
reported with PMUver value 4.

Per the usual ID register principles, at least 0x1-0x7 imply a
PMUv3-compatible PMU. It's not currently clear whether 0x8-0xe imply the
same.

For the time being, treat the value as signed, and with 0x1-0x7 treated
as meaning PMUv3 is implemented. This may be relaxed by future patches.

Reported-by: Jayachandran C <jnair@caviumnetworks.com>
Tested-by: Jayachandran C <jnair@caviumnetworks.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-04-25 15:12:59 +01:00
Mark Rutland f00fa5f416 arm64: pmuv3: use arm_pmu ACPI framework
Now that we have a framework to handle the ACPI bits, make the PMUv3
code use this. The framework is a little different to what was
originally envisaged, and we can drop some unused support code in the
process of moving over to it.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
[will: make armv8_pmu_driver_init static]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-04-11 16:29:54 +01:00
Mark Rutland f1b36dcb5c arm64: pmuv3: handle !PMUv3 when probing
When probing via ACPI, we won't know up-front whether a CPU has a PMUv3
compatible PMU. Thus we need to consult ID registers during probe time.

This patch updates our PMUv3 probing code to test for the presence of
PMUv3 functionality before touching an PMUv3-specific registers, and
before updating the struct arm_pmu with PMUv3 data.

When a PMUv3-compatible PMU is not present, probing will return -ENODEV.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-04-11 16:29:54 +01:00
Wei Huang b112c84a6f KVM: arm64: Fix the issues when guest PMCCFILTR is configured
KVM calls kvm_pmu_set_counter_event_type() when PMCCFILTR is configured.
But this function can't deals with PMCCFILTR correctly because the evtCount
bits of PMCCFILTR, which is reserved 0, conflits with the SW_INCR event
type of other PMXEVTYPER<n> registers. To fix it, when eventsel == 0, this
function shouldn't return immediately; instead it needs to check further
if select_idx is ARMV8_PMU_CYCLE_IDX.

Another issue is that KVM shouldn't copy the eventsel bits of PMCCFILTER
blindly to attr.config. Instead it ought to convert the request to the
"cpu cycle" event type (i.e. 0x11).

To support this patch and to prevent duplicated definitions, a limited
set of ARMv8 perf event types were relocated from perf_event.c to
asm/perf_event.h.

Cc: stable@vger.kernel.org # 4.6+
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-18 09:06:58 +00:00
Jeremy Linton 85023b2e13 arm64: pmu: Hoist pmu platform device name
Move the PMU name into a common header file so it may
be referenced by other users.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-09-16 17:11:34 +01:00
Jeremy Linton 236b9b91cd arm64: pmu: Probe default hw/cache counters
ARMv8 machines can identify the micro/arch defined counters
that are available on a machine. Add all these counters to the
default armv8 perf map. At run-time disable the counters which
are not available on the given PMU.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-09-16 17:11:33 +01:00
Mark Salter dbee3a74ef arm64: pmu: add fallback probe table
In preparation for ACPI support, add a pmu_probe_info table to
the arm_pmu_device_probe() call. This table gets used when
probing in the absence of a devicetree node for PMU.

Signed-off-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-09-16 17:11:33 +01:00
Mark Rutland 569de9026c arm64: perf: move to common attr_group fields
By using a common attr_groups array, the common arm_pmu code can set up
common files (e.g. cpumask) for us in subsequent patches.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-09-09 14:51:51 +01:00
Kefeng Wang 826d05623f arm64: perf: Use the builtin_platform_driver
Use the builtin_platform_driver() to simplify code.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-08-22 10:00:48 +01:00
Will Deacon 4ba2578fa7 arm64: perf: don't expose CHAIN event in sysfs
The CHAIN event allows two 32-bit counters to be treated as a single
64-bit counter, under certain allocation restrictions on the PMU.

Whilst userspace could theoretically create CHAIN events using the raw
event syntax, we don't really want to advertise this in sysfs, since
it's useless in isolation. This patch removes the event from our /sys
entries.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-25 15:05:24 +01:00
Ashok Kumar 201a72b282 arm64/perf: Add Broadcom Vulcan PMU support
Broadcom Vulcan uses ARMv8 PMUv3 and supports most of
the ARMv8 recommended implementation defined events.

Added Vulcan events mapping for perf and perf_cache map.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ashok Kumar <ashoks@broadcom.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-25 14:11:30 +01:00
Ashok Kumar 4b1a9e6934 arm64/perf: Filter common events based on PMCEIDn_EL0
The complete common architectural and micro-architectural
event number structure is filtered based on PMCEIDn_EL0 and
exposed to /sys using is_visibile function pointer in events
attribute_group.
To filter the events in is_visible function, pmceid based bitmap
is stored in arm_pmu structure and the id field from
perf_pmu_events_attr is used to check against the bitmap.

The function which derives event bitmap from PMCEIDn_EL0 is
executed in the cpus, which has the pmu being initialized,
for heterogeneous pmu support.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ashok Kumar <ashoks@broadcom.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-25 14:11:10 +01:00
Ashok Kumar bf2d4782e7 arm64/perf: Access pmu register using <read/write>_sys_reg
changed pmu register access to make use of <read/write>_sys_reg
from sysreg.h instead of accessing them directly.

Signed-off-by: Ashok Kumar <ashoks@broadcom.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-25 14:11:06 +01:00
Ashok Kumar 0893f74545 arm64/perf: Define complete ARMv8 recommended implementation defined events
Defined all the ARMv8 recommended implementation defined events
from J3 - "ARM recommendations for IMPLEMENTATION DEFINED event numbers"
in ARM DDI 0487A.g.

Signed-off-by: Ashok Kumar <ashoks@broadcom.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-25 14:11:06 +01:00
Ashok Kumar 03598fdbc9 arm64/perf: Changed events naming as per the ARM ARM
changed all the common events name definition as per the document
ARM DDI 0487A.g

SoC specific event names follow the general naming style in
the file and doesn't reflect any document.
changed ARMV8_A53_PERFCTR_PREFETCH_LINEFILL to
ARMV8_A53_PERFCTR_PREF_LINEFILL to match with other SoC specific
event names which use _PREF_ style.

corrected typo l21 to l2i.

Signed-off-by: Ashok Kumar <ashoks@broadcom.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-25 14:11:06 +01:00
Shannon Zhao b8cfadfcef arm64: perf: Move PMU register related defines to asm/perf_event.h
To use the ARMv8 PMU related register defines from the KVM code, we move
the relevant definitions to asm/perf_event.h header file and rename them
with prefix ARMV8_PMU_. This allows us to get rid of kvm_perf_event.h.

Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-03-29 16:04:57 +01:00
Linus Torvalds 2c856e14da arm[64] perf updates for 4.6:
- Initial support for ARMv8.1 CPU PMUs
 
 - Support for the CPU PMU in Cavium ThunderX
 
 - CPU PMU support for systems running 32-bit Linux in secure mode
 
 - Support for the system PMU in ARM CCI-550 (Cache Coherent Interconnect)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCgAGBQJW794rAAoJELescNyEwWM0O5IH/0ejoUjip3n4dFZnSzAbQQZe
 VxCy3DXW5gS8YaswwX2dFw9K772/BpHlazq8AIJIhaR+b+Zzl5t0iOc12HluDilV
 pMvi0JTCxwJhsEiKZnP0cVAU9HM6MAgtMOEegkd/YNESKQey30NeDtIcz/pQfTUV
 28AF71+w5VPj/1EpHEEhHQsASRIx7eDbKzThzdlb8PnDS0o23QJhL9HjVTNIAlB8
 BGxrUBKtBu0eH2Hx33vNjc7UYx1WZQlCk5cAaXevA8mbFXzYaMQI2Cel2nbNMO9i
 eu5zPkDUCG7dq16PxK6IgM4AsDCtmmDuckLdN6UEQWYxkLbb2qHNRKtj0bKB8Sk=
 =E4PE
 -----END PGP SIGNATURE-----

Merge tag 'arm64-perf' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm[64] perf updates from Will Deacon:
 "I have another mixed bag of ARM-related perf patches here.

  It's about 25% CPU and 75% interconnect, but with drivers/bus/
  languishing without an obvious maintainer or tree, Olof and I agreed
  to keep all of these PMU patches together.  I suspect a whole load of
  code from drivers/bus/arm-* can be moved under drivers/perf/, so
  that's on the radar for the future.

  Summary:

   - Initial support for ARMv8.1 CPU PMUs

   - Support for the CPU PMU in Cavium ThunderX

   - CPU PMU support for systems running 32-bit Linux in secure mode

   - Support for the system PMU in ARM CCI-550 (Cache Coherent Interconnect)"

* tag 'arm64-perf' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (26 commits)
  drivers/perf: arm_pmu: avoid NULL dereference when not using devicetree
  arm64: perf: Extend ARMV8_EVTYPE_MASK to include PMCR.LC
  arm-cci: remove unused variable
  arm-cci: don't return value from void function
  arm-cci: make private functions static
  arm-cci: CoreLink CCI-550 PMU driver
  arm-cci500: Rearrange PMU driver for code sharing with CCI-550 PMU
  arm-cci: CCI-500: Work around PMU counter writes
  arm-cci: Provide hook for writing to PMU counters
  arm-cci: Add helper to enable PMU without synchornising counters
  arm-cci: Add routines to save/restore all counters
  arm-cci: Get the status of a counter
  arm-cci: write_counter: Remove redundant check
  arm-cci: Delay PMU counter writes to pmu::pmu_enable
  arm-cci: Refactor CCI PMU enable/disable methods
  arm-cci: Group writes to counter
  arm-cci: fix handling cpumask_any_but return value
  arm-cci: simplify sysfs attr handling
  drivers/perf: arm_pmu: implement CPU_PM notifier
  arm64: dts: Add Cavium ThunderX specific PMU
  ...
2016-03-21 13:14:16 -07:00
Will Deacon fe638401a0 arm64: perf: Extend ARMV8_EVTYPE_MASK to include PMCR.LC
Commit 7175f0591e ("arm64: perf: Enable PMCR long cycle counter bit")
added initial support for a 64-bit cycle counter enabled using PMCR.LC.

Unfortunately, that patch doesn't extend ARMV8_EVTYPE_MASK, so any
attempts to set the enable bit are ignored by armv8pmu_pmcr_write.

This patch extends the mask to include the new bit.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-02-29 23:23:59 +00:00
Marc Zyngier d98ecdaca2 arm64: perf: Count EL2 events if the kernel is running in HYP
When the kernel is running in HYP (with VHE), it is necessary to
include EL2 events if the user requests counting kernel or
hypervisor events.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29 18:34:18 +00:00
Jan Glauber c210ae80e4 arm64: perf: Extend event mask for ARMv8.1
ARMv8.1 increases the PMU event number space to 16 bit so increase
the EVTYPE mask.

Signed-off-by: Jan Glauber <jglauber@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-02-18 17:23:41 +00:00
Jan Glauber 7175f0591e arm64: perf: Enable PMCR long cycle counter bit
With the long cycle counter bit (LC) disabled the cycle counter is not
working on ThunderX SOC (ThunderX only implements Aarch64).
Also, according to documentation LC == 0 is deprecated.

To keep the code simple the patch does not introduce 64 bit wide counter
functions. Instead writing the cycle counter always sets the upper
32 bits so overflow interrupts are generated as before.

Original patch from Andrew Pinksi <Andrew.Pinksi@caviumnetworks.com>

Signed-off-by: Jan Glauber <jglauber@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-02-18 17:23:41 +00:00
Jan Glauber d0aa2bffcf arm64/perf: Add Cavium ThunderX PMU support
Support PMU events on Caviums ThunderX SOC. ThunderX supports
some additional counters compared to the default ARMv8 PMUv3:

- branch instructions counter
- stall frontend & backend counters
- L1 dcache load & store counters
- L1 icache counters
- iTLB & dTLB counters
- L1 dcache & icache prefetch counters

Signed-off-by: Jan Glauber <jglauber@cavium.com>
[will: capitalisation]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-02-18 17:23:39 +00:00
Jan Glauber 5f140ccef3 arm64: perf: Rename Cortex A57 events
The implemented Cortex A57 events are strictly-speaking not
A57 specific. They are ARM recommended implementation defined events
and can be found on other ARMv8 SOCs like Cavium ThunderX too.

Therefore rename these events to allow using them in other
implementations too.

Signed-off-by: Jan Glauber <jglauber@cavium.com>
[will: capitalisation and ordering]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-02-18 17:22:42 +00:00
Will Deacon 5d7ee87708 arm64: perf: add support for Cortex-A72
Cortex-A72 has a PMUv3 implementation that is compatible with the PMU
implemented by Cortex-A57.

This patch hooks up the new compatible string so that the Cortex-A57
event mappings are used.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-12-22 14:45:35 +00:00
Will Deacon 57d7412395 arm64: perf: add format entry to describe event -> config mapping
It's all very well providing an events directory to userspace that
details our events in terms of "event=0xNN", but if we don't define how
to encode the "event" field in the perf attr.config, then it's a waste
of time.

This patch adds a single format entry to describe that the event field
occupies the bottom 10 bits of our config field on ARMv8 (PMUv3).

Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-12-22 14:45:07 +00:00
Lorenzo Pieralisi 60792ad349 arm64: kernel: enforce pmuserenr_el0 initialization and restore
The pmuserenr_el0 register value is architecturally UNKNOWN on reset.
Current kernel code resets that register value iff the core pmu device is
correctly probed in the kernel. On platforms with missing DT pmu nodes (or
disabled perf events in the kernel), the pmu is not probed, therefore the
pmuserenr_el0 register is not reset in the kernel, which means that its
value retains the reset value that is architecturally UNKNOWN (system
may run with eg pmuserenr_el0 == 0x1, which means that PMU counters access
is available at EL0, which must be disallowed).

This patch adds code that resets pmuserenr_el0 on cold boot and restores
it on core resume from shutdown, so that the pmuserenr_el0 setup is
always enforced in the kernel.

Cc: <stable@vger.kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-12-21 14:43:04 +00:00
Drew Richardson 9e9caa6a49 arm64: perf: Add event descriptions
Add additional information about the ARM architected hardware events
to make counters self describing. This makes the hardware PMUs easier
to use as perf list contains possible events instead of users having
to refer to documentation like the ARM TRMs.

Signed-off-by: Drew Richardson <drew.richardson@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-11-16 17:09:02 +00:00
Drew Richardson 90381cba64 arm64: perf: Convert event enums to #defines
The enums are not necessary and this allows the event values to be
used to construct static strings at compile time.

Signed-off-by: Drew Richardson <drew.richardson@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-11-16 17:09:02 +00:00
Mark Rutland 62a4dda9d6 arm64: perf: add Cortex-A57 support
The Cortex-A57 PMU supports a few events outside of the required PMUv3
set that are rather useful.

This patch adds the event map data for said events.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-07 14:25:24 +01:00
Mark Rutland ac82d12772 arm64: perf: add Cortex-A53 support
The Cortex-A53 PMU supports a few events outside of the required PMUv3
set that are rather useful.

This patch adds the event map data for said events.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-07 14:25:07 +01:00