From ebcbd75e396258a5041d2b28fec02c27f65d59bb Mon Sep 17 00:00:00 2001 From: Alan Kao Date: Tue, 8 May 2018 10:59:33 +0800 Subject: [PATCH 01/12] riscv: Fix the bug in memory access fixup code A piece of fixup code is currently shared by __copy_user and __clear_user. It first disables the access to user-space memory and then returns the "n" argument, which represents #(bytes not processed). However,__copy_user's "n" is in register a2, while __clear_user's in a1, and thus it causes errors for programs like setdomainname02 testcase in LTP. This patch fixes this issue by separating their fixup code and returning the right value for the kernel to handle a relative fault properly. Signed-off-by: Alan Kao Cc: Greentime Hu Cc: Zong Li Cc: Vincent Chen Signed-off-by: Palmer Dabbelt --- arch/riscv/lib/uaccess.S | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S index 58fb2877c865..0173ea296baa 100644 --- a/arch/riscv/lib/uaccess.S +++ b/arch/riscv/lib/uaccess.S @@ -84,7 +84,7 @@ ENTRY(__clear_user) bgeu t0, t1, 2f bltu a0, t0, 4f 1: - fixup REG_S, zero, (a0), 10f + fixup REG_S, zero, (a0), 11f addi a0, a0, SZREG bltu a0, t1, 1b 2: @@ -96,12 +96,12 @@ ENTRY(__clear_user) li a0, 0 ret 4: /* Edge case: unalignment */ - fixup sb, zero, (a0), 10f + fixup sb, zero, (a0), 11f addi a0, a0, 1 bltu a0, t0, 4b j 1b 5: /* Edge case: remainder */ - fixup sb, zero, (a0), 10f + fixup sb, zero, (a0), 11f addi a0, a0, 1 bltu a0, a3, 5b j 3b @@ -109,9 +109,14 @@ ENDPROC(__clear_user) .section .fixup,"ax" .balign 4 + /* Fixup code for __copy_user(10) and __clear_user(11) */ 10: /* Disable access to user memory */ csrs sstatus, t6 - sub a0, a3, a0 + mv a0, a2 + ret +11: + csrs sstatus, t6 + mv a0, a1 ret .previous From 3ed45d7ffa565876e627c4716a8b8b4986a471b1 Mon Sep 17 00:00:00 2001 From: Palmer Dabbelt Date: Fri, 20 Apr 2018 12:37:14 -0700 Subject: [PATCH 02/12] MAINTAINERS: Add myself as a maintainer for SiFive's drivers There aren't actually any files in the tree that match these patterns right now, but we've just started submitting our drivers so I thought it would be good to make sure there's at least someone at SiFive who's listed as maintaining them. I'm leaving the RISC-V lists on here because: * As of today, all the RISC-V ASICs that people can actually buy are from SiFive -- though hopefully there'll be more soon! * The RTL for many of our devices is open source, so I anticipate these devices might make they way chips from other vendors. * We may standardize some of these devices as part of a RISC-V specification at some point in the future. I'm a bit swamped right now so I might not be the most active maintainer of these drivers, but I think it'd be good to make sure someone who has hardware access gets CC'd on updates to our drivers just as a sanity check. Hopefully that's an OK way to handle this. Signed-off-by: Palmer Dabbelt --- MAINTAINERS | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 9c125f705f78..a8f906d1d1e2 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -12764,6 +12764,14 @@ F: drivers/media/usb/siano/ F: drivers/media/usb/siano/ F: drivers/media/mmc/siano/ +SIFIVE DRIVERS +M: Palmer Dabbelt +L: linux-riscv@lists.infradead.org +T: git git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux.git +S: Supported +K: sifive +N: sifive + SILEAD TOUCHSCREEN DRIVER M: Hans de Goede L: linux-input@vger.kernel.org From 9c5217640e381294c76f5256933a113386971b7c Mon Sep 17 00:00:00 2001 From: Palmer Dabbelt Date: Fri, 27 Apr 2018 18:15:07 -0700 Subject: [PATCH 03/12] MAINTAINERS: Update Albert's email, he's back at Berkeley When I was adding a MAINTAINERS entry for SiFive's drivers I realized that Albert's email is out of date -- he's gone back to Berkeley, so his SiFive email is technically defunct. This patch updates his entry to a current email address, hosted at Berkeley. Signed-off-by: Palmer Dabbelt --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index a8f906d1d1e2..c2d32726264f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -12004,7 +12004,7 @@ F: drivers/mtd/nand/raw/r852.h RISC-V ARCHITECTURE M: Palmer Dabbelt -M: Albert Ou +M: Albert Ou L: linux-riscv@lists.infradead.org T: git git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux.git S: Supported From 178e9fc47aaec1b8952b553444e94802d7570599 Mon Sep 17 00:00:00 2001 From: Alan Kao Date: Fri, 20 Apr 2018 07:27:49 +0800 Subject: [PATCH 04/12] perf: riscv: preliminary RISC-V support This patch provide a basic PMU, riscv_base_pmu, which supports two general hardware event, instructions and cycles. Furthermore, this PMU serves as a reference implementation to ease the portings in the future. riscv_base_pmu should be able to run on any RISC-V machine that conforms to the Priv-Spec. Note that the latest qemu model hasn't fully support a proper behavior of Priv-Spec 1.10 yet, but work around should be easy with very small fixes. Please check https://github.com/riscv/riscv-qemu/pull/115 for future updates. Cc: Nick Hu Cc: Greentime Hu Signed-off-by: Alan Kao Signed-off-by: Palmer Dabbelt --- arch/riscv/Kconfig | 14 + arch/riscv/include/asm/Kbuild | 1 + arch/riscv/include/asm/perf_event.h | 84 +++++ arch/riscv/kernel/Makefile | 2 + arch/riscv/kernel/perf_event.c | 485 ++++++++++++++++++++++++++++ 5 files changed, 586 insertions(+) create mode 100644 arch/riscv/include/asm/perf_event.h create mode 100644 arch/riscv/kernel/perf_event.c diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index cd4fd85fde84..4495604394e5 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -25,6 +25,7 @@ config RISCV select HAVE_DMA_API_DEBUG select HAVE_DMA_CONTIGUOUS select HAVE_GENERIC_DMA_COHERENT + select HAVE_PERF_EVENTS select IRQ_DOMAIN select NO_BOOTMEM select RISCV_ISA_A if SMP @@ -198,6 +199,19 @@ config RISCV_ISA_C config RISCV_ISA_A def_bool y +menu "supported PMU type" + depends on PERF_EVENTS + +config RISCV_BASE_PMU + bool "Base Performance Monitoring Unit" + def_bool y + help + A base PMU that serves as a reference implementation and has limited + feature of perf. It can run on any RISC-V machines so serves as the + fallback, but this option can also be disable to reduce kernel size. + +endmenu + endmenu menu "Kernel type" diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild index 4286a5f83876..576ffdca06ba 100644 --- a/arch/riscv/include/asm/Kbuild +++ b/arch/riscv/include/asm/Kbuild @@ -25,6 +25,7 @@ generic-y += kdebug.h generic-y += kmap_types.h generic-y += kvm_para.h generic-y += local.h +generic-y += local64.h generic-y += mm-arch-hooks.h generic-y += mman.h generic-y += module.h diff --git a/arch/riscv/include/asm/perf_event.h b/arch/riscv/include/asm/perf_event.h new file mode 100644 index 000000000000..0e638a0c3feb --- /dev/null +++ b/arch/riscv/include/asm/perf_event.h @@ -0,0 +1,84 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2018 SiFive + * Copyright (C) 2018 Andes Technology Corporation + * + */ + +#ifndef _ASM_RISCV_PERF_EVENT_H +#define _ASM_RISCV_PERF_EVENT_H + +#include +#include + +#define RISCV_BASE_COUNTERS 2 + +/* + * The RISCV_MAX_COUNTERS parameter should be specified. + */ + +#ifdef CONFIG_RISCV_BASE_PMU +#define RISCV_MAX_COUNTERS 2 +#endif + +#ifndef RISCV_MAX_COUNTERS +#error "Please provide a valid RISCV_MAX_COUNTERS for the PMU." +#endif + +/* + * These are the indexes of bits in counteren register *minus* 1, + * except for cycle. It would be coherent if it can directly mapped + * to counteren bit definition, but there is a *time* register at + * counteren[1]. Per-cpu structure is scarce resource here. + * + * According to the spec, an implementation can support counter up to + * mhpmcounter31, but many high-end processors has at most 6 general + * PMCs, we give the definition to MHPMCOUNTER8 here. + */ +#define RISCV_PMU_CYCLE 0 +#define RISCV_PMU_INSTRET 1 +#define RISCV_PMU_MHPMCOUNTER3 2 +#define RISCV_PMU_MHPMCOUNTER4 3 +#define RISCV_PMU_MHPMCOUNTER5 4 +#define RISCV_PMU_MHPMCOUNTER6 5 +#define RISCV_PMU_MHPMCOUNTER7 6 +#define RISCV_PMU_MHPMCOUNTER8 7 + +#define RISCV_OP_UNSUPP (-EOPNOTSUPP) + +struct cpu_hw_events { + /* # currently enabled events*/ + int n_events; + /* currently enabled events */ + struct perf_event *events[RISCV_MAX_COUNTERS]; + /* vendor-defined PMU data */ + void *platform; +}; + +struct riscv_pmu { + struct pmu *pmu; + + /* generic hw/cache events table */ + const int *hw_events; + const int (*cache_events)[PERF_COUNT_HW_CACHE_MAX] + [PERF_COUNT_HW_CACHE_OP_MAX] + [PERF_COUNT_HW_CACHE_RESULT_MAX]; + /* method used to map hw/cache events */ + int (*map_hw_event)(u64 config); + int (*map_cache_event)(u64 config); + + /* max generic hw events in map */ + int max_events; + /* number total counters, 2(base) + x(general) */ + int num_counters; + /* the width of the counter */ + int counter_width; + + /* vendor-defined PMU features */ + void *platform; + + irqreturn_t (*handle_irq)(int irq_num, void *dev); + int irq; +}; + +#endif /* _ASM_RISCV_PERF_EVENT_H */ diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index 8586dd96c2f0..e1274fc03af4 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -39,4 +39,6 @@ obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o +obj-$(CONFIG_PERF_EVENTS) += perf_event.o + clean: diff --git a/arch/riscv/kernel/perf_event.c b/arch/riscv/kernel/perf_event.c new file mode 100644 index 000000000000..b0e10c4e9f77 --- /dev/null +++ b/arch/riscv/kernel/perf_event.c @@ -0,0 +1,485 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2008 Thomas Gleixner + * Copyright (C) 2008-2009 Red Hat, Inc., Ingo Molnar + * Copyright (C) 2009 Jaswinder Singh Rajput + * Copyright (C) 2009 Advanced Micro Devices, Inc., Robert Richter + * Copyright (C) 2008-2009 Red Hat, Inc., Peter Zijlstra + * Copyright (C) 2009 Intel Corporation, + * Copyright (C) 2009 Google, Inc., Stephane Eranian + * Copyright 2014 Tilera Corporation. All Rights Reserved. + * Copyright (C) 2018 Andes Technology Corporation + * + * Perf_events support for RISC-V platforms. + * + * Since the spec. (as of now, Priv-Spec 1.10) does not provide enough + * functionality for perf event to fully work, this file provides + * the very basic framework only. + * + * For platform portings, please check Documentations/riscv/pmu.txt. + * + * The Copyright line includes x86 and tile ones. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static const struct riscv_pmu *riscv_pmu __read_mostly; +static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events); + +/* + * Hardware & cache maps and their methods + */ + +static const int riscv_hw_event_map[] = { + [PERF_COUNT_HW_CPU_CYCLES] = RISCV_PMU_CYCLE, + [PERF_COUNT_HW_INSTRUCTIONS] = RISCV_PMU_INSTRET, + [PERF_COUNT_HW_CACHE_REFERENCES] = RISCV_OP_UNSUPP, + [PERF_COUNT_HW_CACHE_MISSES] = RISCV_OP_UNSUPP, + [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = RISCV_OP_UNSUPP, + [PERF_COUNT_HW_BRANCH_MISSES] = RISCV_OP_UNSUPP, + [PERF_COUNT_HW_BUS_CYCLES] = RISCV_OP_UNSUPP, +}; + +#define C(x) PERF_COUNT_HW_CACHE_##x +static const int riscv_cache_event_map[PERF_COUNT_HW_CACHE_MAX] +[PERF_COUNT_HW_CACHE_OP_MAX] +[PERF_COUNT_HW_CACHE_RESULT_MAX] = { + [C(L1D)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + }, + [C(L1I)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + }, + [C(LL)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + }, + [C(DTLB)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + }, + [C(ITLB)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + }, + [C(BPU)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + }, +}; + +static int riscv_map_hw_event(u64 config) +{ + if (config >= riscv_pmu->max_events) + return -EINVAL; + + return riscv_pmu->hw_events[config]; +} + +int riscv_map_cache_decode(u64 config, unsigned int *type, + unsigned int *op, unsigned int *result) +{ + return -ENOENT; +} + +static int riscv_map_cache_event(u64 config) +{ + unsigned int type, op, result; + int err = -ENOENT; + int code; + + err = riscv_map_cache_decode(config, &type, &op, &result); + if (!riscv_pmu->cache_events || err) + return err; + + if (type >= PERF_COUNT_HW_CACHE_MAX || + op >= PERF_COUNT_HW_CACHE_OP_MAX || + result >= PERF_COUNT_HW_CACHE_RESULT_MAX) + return -EINVAL; + + code = (*riscv_pmu->cache_events)[type][op][result]; + if (code == RISCV_OP_UNSUPP) + return -EINVAL; + + return code; +} + +/* + * Low-level functions: reading/writing counters + */ + +static inline u64 read_counter(int idx) +{ + u64 val = 0; + + switch (idx) { + case RISCV_PMU_CYCLE: + val = csr_read(cycle); + break; + case RISCV_PMU_INSTRET: + val = csr_read(instret); + break; + default: + WARN_ON_ONCE(idx < 0 || idx > RISCV_MAX_COUNTERS); + return -EINVAL; + } + + return val; +} + +static inline void write_counter(int idx, u64 value) +{ + /* currently not supported */ + WARN_ON_ONCE(1); +} + +/* + * pmu->read: read and update the counter + * + * Other architectures' implementation often have a xxx_perf_event_update + * routine, which can return counter values when called in the IRQ, but + * return void when being called by the pmu->read method. + */ +static void riscv_pmu_read(struct perf_event *event) +{ + struct hw_perf_event *hwc = &event->hw; + u64 prev_raw_count, new_raw_count; + u64 oldval; + int idx = hwc->idx; + u64 delta; + + do { + prev_raw_count = local64_read(&hwc->prev_count); + new_raw_count = read_counter(idx); + + oldval = local64_cmpxchg(&hwc->prev_count, prev_raw_count, + new_raw_count); + } while (oldval != prev_raw_count); + + /* + * delta is the value to update the counter we maintain in the kernel. + */ + delta = (new_raw_count - prev_raw_count) & + ((1ULL << riscv_pmu->counter_width) - 1); + local64_add(delta, &event->count); + /* + * Something like local64_sub(delta, &hwc->period_left) here is + * needed if there is an interrupt for perf. + */ +} + +/* + * State transition functions: + * + * stop()/start() & add()/del() + */ + +/* + * pmu->stop: stop the counter + */ +static void riscv_pmu_stop(struct perf_event *event, int flags) +{ + struct hw_perf_event *hwc = &event->hw; + + WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED); + hwc->state |= PERF_HES_STOPPED; + + if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) { + riscv_pmu->pmu->read(event); + hwc->state |= PERF_HES_UPTODATE; + } +} + +/* + * pmu->start: start the event. + */ +static void riscv_pmu_start(struct perf_event *event, int flags) +{ + struct hw_perf_event *hwc = &event->hw; + + if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED))) + return; + + if (flags & PERF_EF_RELOAD) { + WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE)); + + /* + * Set the counter to the period to the next interrupt here, + * if you have any. + */ + } + + hwc->state = 0; + perf_event_update_userpage(event); + + /* + * Since we cannot write to counters, this serves as an initialization + * to the delta-mechanism in pmu->read(); otherwise, the delta would be + * wrong when pmu->read is called for the first time. + */ + local64_set(&hwc->prev_count, read_counter(hwc->idx)); +} + +/* + * pmu->add: add the event to PMU. + */ +static int riscv_pmu_add(struct perf_event *event, int flags) +{ + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + struct hw_perf_event *hwc = &event->hw; + + if (cpuc->n_events == riscv_pmu->num_counters) + return -ENOSPC; + + /* + * We don't have general conunters, so no binding-event-to-counter + * process here. + * + * Indexing using hwc->config generally not works, since config may + * contain extra information, but here the only info we have in + * hwc->config is the event index. + */ + hwc->idx = hwc->config; + cpuc->events[hwc->idx] = event; + cpuc->n_events++; + + hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED; + + if (flags & PERF_EF_START) + riscv_pmu->pmu->start(event, PERF_EF_RELOAD); + + return 0; +} + +/* + * pmu->del: delete the event from PMU. + */ +static void riscv_pmu_del(struct perf_event *event, int flags) +{ + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + struct hw_perf_event *hwc = &event->hw; + + cpuc->events[hwc->idx] = NULL; + cpuc->n_events--; + riscv_pmu->pmu->stop(event, PERF_EF_UPDATE); + perf_event_update_userpage(event); +} + +/* + * Interrupt: a skeletion for reference. + */ + +static DEFINE_MUTEX(pmc_reserve_mutex); + +irqreturn_t riscv_base_pmu_handle_irq(int irq_num, void *dev) +{ + return IRQ_NONE; +} + +static int reserve_pmc_hardware(void) +{ + int err = 0; + + mutex_lock(&pmc_reserve_mutex); + if (riscv_pmu->irq >= 0 && riscv_pmu->handle_irq) { + err = request_irq(riscv_pmu->irq, riscv_pmu->handle_irq, + IRQF_PERCPU, "riscv-base-perf", NULL); + } + mutex_unlock(&pmc_reserve_mutex); + + return err; +} + +void release_pmc_hardware(void) +{ + mutex_lock(&pmc_reserve_mutex); + if (riscv_pmu->irq >= 0) + free_irq(riscv_pmu->irq, NULL); + mutex_unlock(&pmc_reserve_mutex); +} + +/* + * Event Initialization/Finalization + */ + +static atomic_t riscv_active_events = ATOMIC_INIT(0); + +static void riscv_event_destroy(struct perf_event *event) +{ + if (atomic_dec_return(&riscv_active_events) == 0) + release_pmc_hardware(); +} + +static int riscv_event_init(struct perf_event *event) +{ + struct perf_event_attr *attr = &event->attr; + struct hw_perf_event *hwc = &event->hw; + int err; + int code; + + if (atomic_inc_return(&riscv_active_events) == 1) { + err = reserve_pmc_hardware(); + + if (err) { + pr_warn("PMC hardware not available\n"); + atomic_dec(&riscv_active_events); + return -EBUSY; + } + } + + switch (event->attr.type) { + case PERF_TYPE_HARDWARE: + code = riscv_pmu->map_hw_event(attr->config); + break; + case PERF_TYPE_HW_CACHE: + code = riscv_pmu->map_cache_event(attr->config); + break; + case PERF_TYPE_RAW: + return -EOPNOTSUPP; + default: + return -ENOENT; + } + + event->destroy = riscv_event_destroy; + if (code < 0) { + event->destroy(event); + return code; + } + + /* + * idx is set to -1 because the index of a general event should not be + * decided until binding to some counter in pmu->add(). + * + * But since we don't have such support, later in pmu->add(), we just + * use hwc->config as the index instead. + */ + hwc->config = code; + hwc->idx = -1; + + return 0; +} + +/* + * Initialization + */ + +static struct pmu min_pmu = { + .name = "riscv-base", + .event_init = riscv_event_init, + .add = riscv_pmu_add, + .del = riscv_pmu_del, + .start = riscv_pmu_start, + .stop = riscv_pmu_stop, + .read = riscv_pmu_read, +}; + +static const struct riscv_pmu riscv_base_pmu = { + .pmu = &min_pmu, + .max_events = ARRAY_SIZE(riscv_hw_event_map), + .map_hw_event = riscv_map_hw_event, + .hw_events = riscv_hw_event_map, + .map_cache_event = riscv_map_cache_event, + .cache_events = &riscv_cache_event_map, + .counter_width = 63, + .num_counters = RISCV_BASE_COUNTERS + 0, + .handle_irq = &riscv_base_pmu_handle_irq, + + /* This means this PMU has no IRQ. */ + .irq = -1, +}; + +static const struct of_device_id riscv_pmu_of_ids[] = { + {.compatible = "riscv,base-pmu", .data = &riscv_base_pmu}, + { /* sentinel value */ } +}; + +int __init init_hw_perf_events(void) +{ + struct device_node *node = of_find_node_by_type(NULL, "pmu"); + const struct of_device_id *of_id; + + riscv_pmu = &riscv_base_pmu; + + if (node) { + of_id = of_match_node(riscv_pmu_of_ids, node); + + if (of_id) + riscv_pmu = of_id->data; + } + + perf_pmu_register(riscv_pmu->pmu, "cpu", PERF_TYPE_RAW); + return 0; +} +arch_initcall(init_hw_perf_events); From 0d431558d7fd1b67f81ff13a502bb803b76d6005 Mon Sep 17 00:00:00 2001 From: Alan Kao Date: Fri, 20 Apr 2018 07:27:50 +0800 Subject: [PATCH 05/12] perf: riscv: Add Document for Future Porting Guide Reviewed-by: Alex Solomatnikov Cc: Nick Hu Cc: Greentime Hu Signed-off-by: Alan Kao Signed-off-by: Palmer Dabbelt --- Documentation/riscv/pmu.txt | 249 ++++++++++++++++++++++++++++++++++++ 1 file changed, 249 insertions(+) create mode 100644 Documentation/riscv/pmu.txt diff --git a/Documentation/riscv/pmu.txt b/Documentation/riscv/pmu.txt new file mode 100644 index 000000000000..b29f03a6d82f --- /dev/null +++ b/Documentation/riscv/pmu.txt @@ -0,0 +1,249 @@ +Supporting PMUs on RISC-V platforms +========================================== +Alan Kao , Mar 2018 + +Introduction +------------ + +As of this writing, perf_event-related features mentioned in The RISC-V ISA +Privileged Version 1.10 are as follows: +(please check the manual for more details) + +* [m|s]counteren +* mcycle[h], cycle[h] +* minstret[h], instret[h] +* mhpeventx, mhpcounterx[h] + +With such function set only, porting perf would require a lot of work, due to +the lack of the following general architectural performance monitoring features: + +* Enabling/Disabling counters + Counters are just free-running all the time in our case. +* Interrupt caused by counter overflow + No such feature in the spec. +* Interrupt indicator + It is not possible to have many interrupt ports for all counters, so an + interrupt indicator is required for software to tell which counter has + just overflowed. +* Writing to counters + There will be an SBI to support this since the kernel cannot modify the + counters [1]. Alternatively, some vendor considers to implement + hardware-extension for M-S-U model machines to write counters directly. + +This document aims to provide developers a quick guide on supporting their +PMUs in the kernel. The following sections briefly explain perf' mechanism +and todos. + +You may check previous discussions here [1][2]. Also, it might be helpful +to check the appendix for related kernel structures. + + +1. Initialization +----------------- + +*riscv_pmu* is a global pointer of type *struct riscv_pmu*, which contains +various methods according to perf's internal convention and PMU-specific +parameters. One should declare such instance to represent the PMU. By default, +*riscv_pmu* points to a constant structure *riscv_base_pmu*, which has very +basic support to a baseline QEMU model. + +Then he/she can either assign the instance's pointer to *riscv_pmu* so that +the minimal and already-implemented logic can be leveraged, or invent his/her +own *riscv_init_platform_pmu* implementation. + +In other words, existing sources of *riscv_base_pmu* merely provide a +reference implementation. Developers can flexibly decide how many parts they +can leverage, and in the most extreme case, they can customize every function +according to their needs. + + +2. Event Initialization +----------------------- + +When a user launches a perf command to monitor some events, it is first +interpreted by the userspace perf tool into multiple *perf_event_open* +system calls, and then each of them calls to the body of *event_init* +member function that was assigned in the previous step. In *riscv_base_pmu*'s +case, it is *riscv_event_init*. + +The main purpose of this function is to translate the event provided by user +into bitmap, so that HW-related control registers or counters can directly be +manipulated. The translation is based on the mappings and methods provided in +*riscv_pmu*. + +Note that some features can be done in this stage as well: + +(1) interrupt setting, which is stated in the next section; +(2) privilege level setting (user space only, kernel space only, both); +(3) destructor setting. Normally it is sufficient to apply *riscv_destroy_event*; +(4) tweaks for non-sampling events, which will be utilized by functions such as +*perf_adjust_period*, usually something like the follows: + +if (!is_sampling_event(event)) { + hwc->sample_period = x86_pmu.max_period; + hwc->last_period = hwc->sample_period; + local64_set(&hwc->period_left, hwc->sample_period); +} + +In the case of *riscv_base_pmu*, only (3) is provided for now. + + +3. Interrupt +------------ + +3.1. Interrupt Initialization + +This often occurs at the beginning of the *event_init* method. In common +practice, this should be a code segment like + +int x86_reserve_hardware(void) +{ + int err = 0; + + if (!atomic_inc_not_zero(&pmc_refcount)) { + mutex_lock(&pmc_reserve_mutex); + if (atomic_read(&pmc_refcount) == 0) { + if (!reserve_pmc_hardware()) + err = -EBUSY; + else + reserve_ds_buffers(); + } + if (!err) + atomic_inc(&pmc_refcount); + mutex_unlock(&pmc_reserve_mutex); + } + + return err; +} + +And the magic is in *reserve_pmc_hardware*, which usually does atomic +operations to make implemented IRQ accessible from some global function pointer. +*release_pmc_hardware* serves the opposite purpose, and it is used in event +destructors mentioned in previous section. + +(Note: From the implementations in all the architectures, the *reserve/release* +pair are always IRQ settings, so the *pmc_hardware* seems somehow misleading. +It does NOT deal with the binding between an event and a physical counter, +which will be introduced in the next section.) + +3.2. IRQ Structure + +Basically, a IRQ runs the following pseudo code: + +for each hardware counter that triggered this overflow + + get the event of this counter + + // following two steps are defined as *read()*, + // check the section Reading/Writing Counters for details. + count the delta value since previous interrupt + update the event->count (# event occurs) by adding delta, and + event->hw.period_left by subtracting delta + + if the event overflows + sample data + set the counter appropriately for the next overflow + + if the event overflows again + too frequently, throttle this event + fi + fi + +end for + +However as of this writing, none of the RISC-V implementations have designed an +interrupt for perf, so the details are to be completed in the future. + +4. Reading/Writing Counters +--------------------------- + +They seem symmetric but perf treats them quite differently. For reading, there +is a *read* interface in *struct pmu*, but it serves more than just reading. +According to the context, the *read* function not only reads the content of the +counter (event->count), but also updates the left period to the next interrupt +(event->hw.period_left). + +But the core of perf does not need direct write to counters. Writing counters +is hidden behind the abstraction of 1) *pmu->start*, literally start counting so one +has to set the counter to a good value for the next interrupt; 2) inside the IRQ +it should set the counter to the same resonable value. + +Reading is not a problem in RISC-V but writing would need some effort, since +counters are not allowed to be written by S-mode. + + +5. add()/del()/start()/stop() +----------------------------- + +Basic idea: add()/del() adds/deletes events to/from a PMU, and start()/stop() +starts/stop the counter of some event in the PMU. All of them take the same +arguments: *struct perf_event *event* and *int flag*. + +Consider perf as a state machine, then you will find that these functions serve +as the state transition process between those states. +Three states (event->hw.state) are defined: + +* PERF_HES_STOPPED: the counter is stopped +* PERF_HES_UPTODATE: the event->count is up-to-date +* PERF_HES_ARCH: arch-dependent usage ... we don't need this for now + +A normal flow of these state transitions are as follows: + +* A user launches a perf event, resulting in calling to *event_init*. +* When being context-switched in, *add* is called by the perf core, with a flag + PERF_EF_START, which means that the event should be started after it is added. + At this stage, a general event is bound to a physical counter, if any. + The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, because it is now + stopped, and the (software) event count does not need updating. +** *start* is then called, and the counter is enabled. + With flag PERF_EF_RELOAD, it writes an appropriate value to the counter (check + previous section for detail). + Nothing is written if the flag does not contain PERF_EF_RELOAD. + The state now is reset to none, because it is neither stopped nor updated + (the counting already started) +* When being context-switched out, *del* is called. It then checks out all the + events in the PMU and calls *stop* to update their counts. +** *stop* is called by *del* + and the perf core with flag PERF_EF_UPDATE, and it often shares the same + subroutine as *read* with the same logic. + The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, again. + +** Life cycle of these two pairs: *add* and *del* are called repeatedly as + tasks switch in-and-out; *start* and *stop* is also called when the perf core + needs a quick stop-and-start, for instance, when the interrupt period is being + adjusted. + +Current implementation is sufficient for now and can be easily extended to +features in the future. + +A. Related Structures +--------------------- + +* struct pmu: include/linux/perf_event.h +* struct riscv_pmu: arch/riscv/include/asm/perf_event.h + + Both structures are designed to be read-only. + + *struct pmu* defines some function pointer interfaces, and most of them take +*struct perf_event* as a main argument, dealing with perf events according to +perf's internal state machine (check kernel/events/core.c for details). + + *struct riscv_pmu* defines PMU-specific parameters. The naming follows the +convention of all other architectures. + +* struct perf_event: include/linux/perf_event.h +* struct hw_perf_event + + The generic structure that represents perf events, and the hardware-related +details. + +* struct riscv_hw_events: arch/riscv/include/asm/perf_event.h + + The structure that holds the status of events, has two fixed members: +the number of events and the array of the events. + +References +---------- + +[1] https://github.com/riscv/riscv-linux/pull/124 +[2] https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/f19TmCNP6yA From 2861ae302f6bf7221db2dac5bd4cf0f2e4cab13b Mon Sep 17 00:00:00 2001 From: Luc Van Oostenryck Date: Fri, 1 Jun 2018 17:21:21 +0200 Subject: [PATCH 06/12] riscv: use NULL instead of a plain 0 sbi_remote_sfence_vma() & sbi_remote_fence_i() takes a pointer as first argument but some macros call them with a plain 0 which, while legal C, is frowned upon in the kernel. Change this by replacing the 0 by NULL. Signed-off-by: Luc Van Oostenryck Signed-off-by: Palmer Dabbelt --- arch/riscv/include/asm/cacheflush.h | 2 +- arch/riscv/include/asm/tlbflush.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h index efd89a88d2d0..8f13074413a7 100644 --- a/arch/riscv/include/asm/cacheflush.h +++ b/arch/riscv/include/asm/cacheflush.h @@ -47,7 +47,7 @@ static inline void flush_dcache_page(struct page *page) #else /* CONFIG_SMP */ -#define flush_icache_all() sbi_remote_fence_i(0) +#define flush_icache_all() sbi_remote_fence_i(NULL) void flush_icache_mm(struct mm_struct *mm, bool local); #endif /* CONFIG_SMP */ diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h index 7b209aec355d..85c2d8bae957 100644 --- a/arch/riscv/include/asm/tlbflush.h +++ b/arch/riscv/include/asm/tlbflush.h @@ -49,7 +49,7 @@ static inline void flush_tlb_range(struct vm_area_struct *vma, #include -#define flush_tlb_all() sbi_remote_sfence_vma(0, 0, -1) +#define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1) #define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, 0) #define flush_tlb_range(vma, start, end) \ sbi_remote_sfence_vma(mm_cpumask((vma)->vm_mm)->bits, \ From 9bf97390b3030b68a465681043a66461c7cf6a65 Mon Sep 17 00:00:00 2001 From: Luc Van Oostenryck Date: Fri, 1 Jun 2018 17:21:22 +0200 Subject: [PATCH 07/12] riscv: no __user for probe_kernel_address() In is_valid_bugaddr(), probe_kernel_address() is called with the PC casted to (bug_inst_t __user *) but this function only take a plain void* as argument, not a __user pointer. Fix this by removing the unnneded __user in the cast. Signed-off-by: Luc Van Oostenryck Signed-off-by: Palmer Dabbelt --- arch/riscv/kernel/traps.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c index 93132cb59184..4c92e5af86d3 100644 --- a/arch/riscv/kernel/traps.c +++ b/arch/riscv/kernel/traps.c @@ -160,7 +160,7 @@ int is_valid_bugaddr(unsigned long pc) if (pc < PAGE_OFFSET) return 0; - if (probe_kernel_address((bug_insn_t __user *)pc, insn)) + if (probe_kernel_address((bug_insn_t *)pc, insn)) return 0; return (insn == __BUG_INSN); } From 86406d51d3600bfa2b6f86e1e6bfce712bec0d53 Mon Sep 17 00:00:00 2001 From: Luc Van Oostenryck Date: Sat, 9 Jun 2018 02:33:51 +0200 Subject: [PATCH 08/12] riscv: split the declaration of __copy_user We use a single __copy_user assembly function to copy memory both from and to userspace. While this works, it triggers sparse errors because we're implicitly casting between the kernel and user address spaces by calling __copy_user. This patch splits the C declaration into a pair of functions, __asm_copy_{to,from}_user, that have sane semantics WRT __user. This split make things fine from sparse's point of view. The assembly implementation keeps a single definition but add a double ENTRY() for it, one for __asm_copy_to_user and another one for __asm_copy_from_user. The result is a spare-safe implementation that pays no performance or code size penalty. Signed-off-by: Luc Van Oostenryck Signed-off-by: Palmer Dabbelt --- arch/riscv/include/asm/uaccess.h | 8 +++++--- arch/riscv/kernel/riscv_ksyms.c | 3 ++- arch/riscv/lib/uaccess.S | 6 ++++-- 3 files changed, 11 insertions(+), 6 deletions(-) diff --git a/arch/riscv/include/asm/uaccess.h b/arch/riscv/include/asm/uaccess.h index 14b0b22fb578..473cfc84e412 100644 --- a/arch/riscv/include/asm/uaccess.h +++ b/arch/riscv/include/asm/uaccess.h @@ -392,19 +392,21 @@ do { \ }) -extern unsigned long __must_check __copy_user(void __user *to, +extern unsigned long __must_check __asm_copy_to_user(void __user *to, + const void *from, unsigned long n); +extern unsigned long __must_check __asm_copy_from_user(void *to, const void __user *from, unsigned long n); static inline unsigned long raw_copy_from_user(void *to, const void __user *from, unsigned long n) { - return __copy_user(to, from, n); + return __asm_copy_to_user(to, from, n); } static inline unsigned long raw_copy_to_user(void __user *to, const void *from, unsigned long n) { - return __copy_user(to, from, n); + return __asm_copy_from_user(to, from, n); } extern long strncpy_from_user(char *dest, const char __user *src, long count); diff --git a/arch/riscv/kernel/riscv_ksyms.c b/arch/riscv/kernel/riscv_ksyms.c index 551734248748..f247d6d2137c 100644 --- a/arch/riscv/kernel/riscv_ksyms.c +++ b/arch/riscv/kernel/riscv_ksyms.c @@ -13,6 +13,7 @@ * Assembly functions that may be used (directly or indirectly) by modules */ EXPORT_SYMBOL(__clear_user); -EXPORT_SYMBOL(__copy_user); +EXPORT_SYMBOL(__asm_copy_to_user); +EXPORT_SYMBOL(__asm_copy_from_user); EXPORT_SYMBOL(memset); EXPORT_SYMBOL(memcpy); diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S index 58fb2877c865..f8e6440cad6e 100644 --- a/arch/riscv/lib/uaccess.S +++ b/arch/riscv/lib/uaccess.S @@ -13,7 +13,8 @@ _epc: .previous .endm -ENTRY(__copy_user) +ENTRY(__asm_copy_to_user) +ENTRY(__asm_copy_from_user) /* Enable access to user memory */ li t6, SR_SUM @@ -63,7 +64,8 @@ ENTRY(__copy_user) addi a0, a0, 1 bltu a1, a3, 5b j 3b -ENDPROC(__copy_user) +ENDPROC(__asm_copy_to_user) +ENDPROC(__asm_copy_from_user) ENTRY(__clear_user) From 889d746edd02a4498d80df3a12017d484cc78e5c Mon Sep 17 00:00:00 2001 From: Luc Van Oostenryck Date: Thu, 31 May 2018 17:42:01 +0200 Subject: [PATCH 09/12] riscv: add riscv-specific predefines to CHECKFLAGS RISC-V uses the macro __riscv_xlen, predefined by GCC, to make the distinction between 32 or 64 bit code. However, sparse doesn't know anything about this macro which lead to wrong warnings and failures. Fix this by adding a define of __riscv_xlen to CHECKFLAGS and add one for __riscv too. Signed-off-by: Luc Van Oostenryck Signed-off-by: Palmer Dabbelt --- arch/riscv/Makefile | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile index 76e958a5414a..6d4a5f6c3f4f 100644 --- a/arch/riscv/Makefile +++ b/arch/riscv/Makefile @@ -71,6 +71,9 @@ KBUILD_CFLAGS_MODULE += $(call cc-option,-mno-relax) # architectures. It's faster to have GCC emit only aligned accesses. KBUILD_CFLAGS += $(call cc-option,-mstrict-align) +# arch specific predefines for sparse +CHECKFLAGS += -D__riscv -D__riscv_xlen=$(BITS) + head-y := arch/riscv/kernel/head.o core-y += arch/riscv/kernel/ arch/riscv/mm/ From 1dd985229d5fc19359250bc0e4235aff217672b2 Mon Sep 17 00:00:00 2001 From: Alan Kao Date: Tue, 8 May 2018 11:21:57 +0800 Subject: [PATCH 10/12] riscv/ftrace: Export _mcount when DYNAMIC_FTRACE isn't set The EXPORT_SYMBOL(_mcount) for RISC-V ended up inside a CONFIG_DYNAMIC_FTRACE ifdef. If you enable modules without enabling CONFIG_DYNAMIC_FTRACE then you'll get a build error without this patch because the modules won't be able to find _mcount. The new behavior is to export _mcount whenever CONFIG_FUNCTION_TRACER is defined. This matches what every other architecture is doing. Signed-off-by: Alan Kao Cc: Greentime Hu Cc: Zong Li Signed-off-by: Palmer Dabbelt --- arch/riscv/kernel/mcount.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/riscv/kernel/mcount.S b/arch/riscv/kernel/mcount.S index ce9bdc57a2a1..5721624886a1 100644 --- a/arch/riscv/kernel/mcount.S +++ b/arch/riscv/kernel/mcount.S @@ -126,5 +126,5 @@ do_trace: RESTORE_ABI_STATE ret ENDPROC(_mcount) -EXPORT_SYMBOL(_mcount) #endif +EXPORT_SYMBOL(_mcount) From 77aa85de16aeefd75d639737c7bfcf0d2604e471 Mon Sep 17 00:00:00 2001 From: Andreas Schwab Date: Thu, 7 Jun 2018 12:27:27 +0200 Subject: [PATCH 11/12] RISC-V: Handle R_RISCV_32 in modules With CONFIG_MODVERSIONS=y the R_RISCV_32 relocation is used by the __kcrctab section. Signed-off-by: Andreas Schwab Signed-off-by: Palmer Dabbelt --- arch/riscv/kernel/module.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c index 5dddba301d0a..1d5e9b934b8c 100644 --- a/arch/riscv/kernel/module.c +++ b/arch/riscv/kernel/module.c @@ -17,6 +17,17 @@ #include #include +static int apply_r_riscv_32_rela(struct module *me, u32 *location, Elf_Addr v) +{ + if (v != (u32)v) { + pr_err("%s: value %016llx out of range for 32-bit field\n", + me->name, v); + return -EINVAL; + } + *location = v; + return 0; +} + static int apply_r_riscv_64_rela(struct module *me, u32 *location, Elf_Addr v) { *(u64 *)location = v; @@ -265,6 +276,7 @@ static int apply_r_riscv_sub32_rela(struct module *me, u32 *location, static int (*reloc_handlers_rela[]) (struct module *me, u32 *location, Elf_Addr v) = { + [R_RISCV_32] = apply_r_riscv_32_rela, [R_RISCV_64] = apply_r_riscv_64_rela, [R_RISCV_BRANCH] = apply_r_riscv_branch_rela, [R_RISCV_JAL] = apply_r_riscv_jal_rela, From 24a130ccfe58e0ef7907ce63030ad0ff7d7c633b Mon Sep 17 00:00:00 2001 From: Palmer Dabbelt Date: Thu, 8 Mar 2018 13:57:58 -0800 Subject: [PATCH 12/12] RISC-V: Add CONFIG_HVC_RISCV_SBI=y to defconfig The SBI exists on all RISC-V systems, so there's no reason not to compile this driver in. Signed-off-by: Palmer Dabbelt --- arch/riscv/configs/defconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig index bca0eee733b0..07326466871b 100644 --- a/arch/riscv/configs/defconfig +++ b/arch/riscv/configs/defconfig @@ -44,6 +44,7 @@ CONFIG_INPUT_MOUSEDEV=y CONFIG_SERIAL_8250=y CONFIG_SERIAL_8250_CONSOLE=y CONFIG_SERIAL_OF_PLATFORM=y +CONFIG_HVC_RISCV_SBI=y # CONFIG_PTP_1588_CLOCK is not set CONFIG_DRM=y CONFIG_DRM_RADEON=y