1
0
Fork 0
alistair23-linux/arch/arm64/include/asm/memory.h

356 lines
11 KiB
C
Raw Normal View History

/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Based on arch/arm/include/asm/memory.h
*
* Copyright (C) 2000-2002 Russell King
* Copyright (C) 2012 ARM Ltd.
*
* Note: this file should not be included by non-asm/.h files
*/
#ifndef __ASM_MEMORY_H
#define __ASM_MEMORY_H
#include <linux/compiler.h>
#include <linux/const.h>
#include <linux/sizes.h>
#include <linux/types.h>
#include <asm/bug.h>
#include <asm/page-def.h>
arm64: Fix overlapping VA allocations PCI IO space was intended to be 16MiB, at 32MiB below MODULES_VADDR, but commit d1e6dc91b532d3d3 ("arm64: Add architectural support for PCI") extended this to cover the full 32MiB. The final 8KiB of this 32MiB is also allocated for the fixmap, allowing for potential clashes between the two. This change was masked by assumptions in mem_init and the page table dumping code, which assumed the I/O space to be 16MiB long through seaparte hard-coded definitions. This patch changes the definition of the PCI I/O space allocation to live in asm/memory.h, along with the other VA space allocations. As the fixmap allocation depends on the number of fixmap entries, this is moved below the PCI I/O space allocation. Both the fixmap and PCI I/O space are guarded with 2MB of padding. Sites assuming the I/O space was 16MiB are moved over use new PCI_IO_{START,END} definitions, which will keep in sync with the size of the IO space (now restored to 16MiB). As a useful side effect, the use of the new PCI_IO_{START,END} definitions prevents a build issue in the dumping code due to a (now redundant) missing include of io.h for PCI_IOBASE. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Liviu Dudau <liviu.dudau@arm.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Will Deacon <will.deacon@arm.com> [catalin.marinas@arm.com: reorder FIXADDR and PCI_IO address_markers_idx enum] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-01-22 11:20:35 -07:00
/*
* Size of the PCI I/O space. This must remain a power of two so that
* IO_SPACE_LIMIT acts as a mask for the low bits of I/O addresses.
*/
#define PCI_IO_SIZE SZ_16M
/*
* VMEMMAP_SIZE - allows the whole linear region to be covered by
* a struct page array
*
* If we are configured with a 52-bit kernel VA then our VMEMMAP_SIZE
* needs to cover the memory region from the beginning of the 52-bit
* PAGE_OFFSET all the way to PAGE_END for 48-bit. This allows us to
* keep a constant PAGE_OFFSET and "fallback" to using the higher end
* of the VMEMMAP where 52-bit support is not available in hardware.
*/
#define VMEMMAP_SIZE ((_PAGE_END(VA_BITS_MIN) - PAGE_OFFSET) \
>> (PAGE_SHIFT - STRUCT_PAGE_MAX_SHIFT))
/*
* PAGE_OFFSET - the virtual address of the start of the linear map, at the
* start of the TTBR1 address space.
* PAGE_END - the end of the linear map, where all other kernel mappings begin.
* KIMAGE_VADDR - the virtual address of the start of the kernel image.
* VA_BITS - the maximum number of bits for virtual addresses.
*/
#define VA_BITS (CONFIG_ARM64_VA_BITS)
#define _PAGE_OFFSET(va) (-(UL(1) << (va)))
#define PAGE_OFFSET (_PAGE_OFFSET(VA_BITS))
#define KIMAGE_VADDR (MODULES_END)
arm64: kasan: Switch to using KASAN_SHADOW_OFFSET KASAN_SHADOW_OFFSET is a constant that is supplied to gcc as a command line argument and affects the codegen of the inline address sanetiser. Essentially, for an example memory access: *ptr1 = val; The compiler will insert logic similar to the below: shadowValue = *(ptr1 >> KASAN_SHADOW_SCALE_SHIFT + KASAN_SHADOW_OFFSET) if (somethingWrong(shadowValue)) flagAnError(); This code sequence is inserted into many places, thus KASAN_SHADOW_OFFSET is essentially baked into many places in the kernel text. If we want to run a single kernel binary with multiple address spaces, then we need to do this with KASAN_SHADOW_OFFSET fixed. Thankfully, due to the way the KASAN_SHADOW_OFFSET is used to provide shadow addresses we know that the end of the shadow region is constant w.r.t. VA space size: KASAN_SHADOW_END = ~0 >> KASAN_SHADOW_SCALE_SHIFT + KASAN_SHADOW_OFFSET This means that if we increase the size of the VA space, the start of the KASAN region expands into lower addresses whilst the end of the KASAN region is fixed. Currently the arm64 code computes KASAN_SHADOW_OFFSET at build time via build scripts with the VA size used as a parameter. (There are build time checks in the C code too to ensure that expected values are being derived). It is sufficient, and indeed is a simplification, to remove the build scripts (and build time checks) entirely and instead provide KASAN_SHADOW_OFFSET values. This patch removes the logic to compute the KASAN_SHADOW_OFFSET in the arm64 Makefile, and instead we adopt the approach used by x86 to supply offset values in kConfig. To help debug/develop future VA space changes, the Makefile logic has been preserved in a script file in the arm64 Documentation folder. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-08-07 09:55:15 -06:00
#define BPF_JIT_REGION_START (KASAN_SHADOW_END)
#define BPF_JIT_REGION_SIZE (SZ_128M)
#define BPF_JIT_REGION_END (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)
#define MODULES_END (MODULES_VADDR + MODULES_VSIZE)
#define MODULES_VADDR (BPF_JIT_REGION_END)
arm64: add support for kernel ASLR This adds support for KASLR is implemented, based on entropy provided by the bootloader in the /chosen/kaslr-seed DT property. Depending on the size of the address space (VA_BITS) and the page size, the entropy in the virtual displacement is up to 13 bits (16k/2 levels) and up to 25 bits (all 4 levels), with the sidenote that displacements that result in the kernel image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB granule kernels, respectively) are not allowed, and will be rounded up to an acceptable value. If CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled, the module region is randomized independently from the core kernel. This makes it less likely that the location of core kernel data structures can be determined by an adversary, but causes all function calls from modules into the core kernel to be resolved via entries in the module PLTs. If CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, the module region is randomized by choosing a page aligned 128 MB region inside the interval [_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of entropy (depending on page size), independently of the kernel randomization, but still guarantees that modules are within the range of relative branch and jump instructions (with the caveat that, since the module region is shared with other uses of the vmalloc area, modules may need to be loaded further away if the module region is exhausted) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-01-26 06:12:01 -07:00
#define MODULES_VSIZE (SZ_128M)
#define VMEMMAP_START (-VMEMMAP_SIZE - SZ_2M)
#define PCI_IO_END (VMEMMAP_START - SZ_2M)
arm64: Fix overlapping VA allocations PCI IO space was intended to be 16MiB, at 32MiB below MODULES_VADDR, but commit d1e6dc91b532d3d3 ("arm64: Add architectural support for PCI") extended this to cover the full 32MiB. The final 8KiB of this 32MiB is also allocated for the fixmap, allowing for potential clashes between the two. This change was masked by assumptions in mem_init and the page table dumping code, which assumed the I/O space to be 16MiB long through seaparte hard-coded definitions. This patch changes the definition of the PCI I/O space allocation to live in asm/memory.h, along with the other VA space allocations. As the fixmap allocation depends on the number of fixmap entries, this is moved below the PCI I/O space allocation. Both the fixmap and PCI I/O space are guarded with 2MB of padding. Sites assuming the I/O space was 16MiB are moved over use new PCI_IO_{START,END} definitions, which will keep in sync with the size of the IO space (now restored to 16MiB). As a useful side effect, the use of the new PCI_IO_{START,END} definitions prevents a build issue in the dumping code due to a (now redundant) missing include of io.h for PCI_IOBASE. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Liviu Dudau <liviu.dudau@arm.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Will Deacon <will.deacon@arm.com> [catalin.marinas@arm.com: reorder FIXADDR and PCI_IO address_markers_idx enum] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-01-22 11:20:35 -07:00
#define PCI_IO_START (PCI_IO_END - PCI_IO_SIZE)
#define FIXADDR_TOP (PCI_IO_START - SZ_2M)
#if VA_BITS > 48
#define VA_BITS_MIN (48)
#else
#define VA_BITS_MIN (VA_BITS)
#endif
#define _PAGE_END(va) (-(UL(1) << ((va) - 1)))
#define KERNEL_START _text
#define KERNEL_END _end
#ifdef CONFIG_ARM64_VA_BITS_52
#define MAX_USER_VA_BITS 52
#else
#define MAX_USER_VA_BITS VA_BITS
#endif
/*
* Generic and tag-based KASAN require 1/8th and 1/16th of the kernel virtual
* address space for the shadow region respectively. They can bloat the stack
* significantly, so double the (minimum) stack size when they are in use.
*/
#ifdef CONFIG_KASAN
arm64: kasan: Switch to using KASAN_SHADOW_OFFSET KASAN_SHADOW_OFFSET is a constant that is supplied to gcc as a command line argument and affects the codegen of the inline address sanetiser. Essentially, for an example memory access: *ptr1 = val; The compiler will insert logic similar to the below: shadowValue = *(ptr1 >> KASAN_SHADOW_SCALE_SHIFT + KASAN_SHADOW_OFFSET) if (somethingWrong(shadowValue)) flagAnError(); This code sequence is inserted into many places, thus KASAN_SHADOW_OFFSET is essentially baked into many places in the kernel text. If we want to run a single kernel binary with multiple address spaces, then we need to do this with KASAN_SHADOW_OFFSET fixed. Thankfully, due to the way the KASAN_SHADOW_OFFSET is used to provide shadow addresses we know that the end of the shadow region is constant w.r.t. VA space size: KASAN_SHADOW_END = ~0 >> KASAN_SHADOW_SCALE_SHIFT + KASAN_SHADOW_OFFSET This means that if we increase the size of the VA space, the start of the KASAN region expands into lower addresses whilst the end of the KASAN region is fixed. Currently the arm64 code computes KASAN_SHADOW_OFFSET at build time via build scripts with the VA size used as a parameter. (There are build time checks in the C code too to ensure that expected values are being derived). It is sufficient, and indeed is a simplification, to remove the build scripts (and build time checks) entirely and instead provide KASAN_SHADOW_OFFSET values. This patch removes the logic to compute the KASAN_SHADOW_OFFSET in the arm64 Makefile, and instead we adopt the approach used by x86 to supply offset values in kConfig. To help debug/develop future VA space changes, the Makefile logic has been preserved in a script file in the arm64 Documentation folder. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-08-07 09:55:15 -06:00
#define KASAN_SHADOW_OFFSET _AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
#define KASAN_SHADOW_END ((UL(1) << (64 - KASAN_SHADOW_SCALE_SHIFT)) \
+ KASAN_SHADOW_OFFSET)
#define KASAN_THREAD_SHIFT 1
#else
#define KASAN_THREAD_SHIFT 0
#define KASAN_SHADOW_END (_PAGE_END(VA_BITS_MIN))
#endif /* CONFIG_KASAN */
#define MIN_THREAD_SHIFT (14 + KASAN_THREAD_SHIFT)
/*
* VMAP'd stacks are allocated at page granularity, so we must ensure that such
* stacks are a multiple of page size.
*/
#if defined(CONFIG_VMAP_STACK) && (MIN_THREAD_SHIFT < PAGE_SHIFT)
#define THREAD_SHIFT PAGE_SHIFT
#else
#define THREAD_SHIFT MIN_THREAD_SHIFT
#endif
#if THREAD_SHIFT >= PAGE_SHIFT
#define THREAD_SIZE_ORDER (THREAD_SHIFT - PAGE_SHIFT)
#endif
#define THREAD_SIZE (UL(1) << THREAD_SHIFT)
/*
* By aligning VMAP'd stacks to 2 * THREAD_SIZE, we can detect overflow by
* checking sp & (1 << THREAD_SHIFT), which we can do cheaply in the entry
* assembly.
*/
#ifdef CONFIG_VMAP_STACK
#define THREAD_ALIGN (2 * THREAD_SIZE)
#else
#define THREAD_ALIGN THREAD_SIZE
#endif
#define IRQ_STACK_SIZE THREAD_SIZE
arm64: add VMAP_STACK overflow detection This patch adds stack overflow detection to arm64, usable when vmap'd stacks are in use. Overflow is detected in a small preamble executed for each exception entry, which checks whether there is enough space on the current stack for the general purpose registers to be saved. If there is not enough space, the overflow handler is invoked on a per-cpu overflow stack. This approach preserves the original exception information in ESR_EL1 (and where appropriate, FAR_EL1). Task and IRQ stacks are aligned to double their size, enabling overflow to be detected with a single bit test. For example, a 16K stack is aligned to 32K, ensuring that bit 14 of the SP must be zero. On an overflow (or underflow), this bit is flipped. Thus, overflow (of less than the size of the stack) can be detected by testing whether this bit is set. The overflow check is performed before any attempt is made to access the stack, avoiding recursive faults (and the loss of exception information these would entail). As logical operations cannot be performed on the SP directly, the SP is temporarily swapped with a general purpose register using arithmetic operations to enable the test to be performed. This gives us a useful error message on stack overflow, as can be trigger with the LKDTM overflow test: [ 305.388749] lkdtm: Performing direct entry OVERFLOW [ 305.395444] Insufficient stack space to handle exception! [ 305.395482] ESR: 0x96000047 -- DABT (current EL) [ 305.399890] FAR: 0xffff00000a5e7f30 [ 305.401315] Task stack: [0xffff00000a5e8000..0xffff00000a5ec000] [ 305.403815] IRQ stack: [0xffff000008000000..0xffff000008004000] [ 305.407035] Overflow stack: [0xffff80003efce4e0..0xffff80003efcf4e0] [ 305.409622] CPU: 0 PID: 1219 Comm: sh Not tainted 4.13.0-rc3-00021-g9636aea #5 [ 305.412785] Hardware name: linux,dummy-virt (DT) [ 305.415756] task: ffff80003d051c00 task.stack: ffff00000a5e8000 [ 305.419221] PC is at recursive_loop+0x10/0x48 [ 305.421637] LR is at recursive_loop+0x38/0x48 [ 305.423768] pc : [<ffff00000859f330>] lr : [<ffff00000859f358>] pstate: 40000145 [ 305.428020] sp : ffff00000a5e7f50 [ 305.430469] x29: ffff00000a5e8350 x28: ffff80003d051c00 [ 305.433191] x27: ffff000008981000 x26: ffff000008f80400 [ 305.439012] x25: ffff00000a5ebeb8 x24: ffff00000a5ebeb8 [ 305.440369] x23: ffff000008f80138 x22: 0000000000000009 [ 305.442241] x21: ffff80003ce65000 x20: ffff000008f80188 [ 305.444552] x19: 0000000000000013 x18: 0000000000000006 [ 305.446032] x17: 0000ffffa2601280 x16: ffff0000081fe0b8 [ 305.448252] x15: ffff000008ff546d x14: 000000000047a4c8 [ 305.450246] x13: ffff000008ff7872 x12: 0000000005f5e0ff [ 305.452953] x11: ffff000008ed2548 x10: 000000000005ee8d [ 305.454824] x9 : ffff000008545380 x8 : ffff00000a5e8770 [ 305.457105] x7 : 1313131313131313 x6 : 00000000000000e1 [ 305.459285] x5 : 0000000000000000 x4 : 0000000000000000 [ 305.461781] x3 : 0000000000000000 x2 : 0000000000000400 [ 305.465119] x1 : 0000000000000013 x0 : 0000000000000012 [ 305.467724] Kernel panic - not syncing: kernel stack overflow [ 305.470561] CPU: 0 PID: 1219 Comm: sh Not tainted 4.13.0-rc3-00021-g9636aea #5 [ 305.473325] Hardware name: linux,dummy-virt (DT) [ 305.475070] Call trace: [ 305.476116] [<ffff000008088ad8>] dump_backtrace+0x0/0x378 [ 305.478991] [<ffff000008088e64>] show_stack+0x14/0x20 [ 305.481237] [<ffff00000895a178>] dump_stack+0x98/0xb8 [ 305.483294] [<ffff0000080c3288>] panic+0x118/0x280 [ 305.485673] [<ffff0000080c2e9c>] nmi_panic+0x6c/0x70 [ 305.486216] [<ffff000008089710>] handle_bad_stack+0x118/0x128 [ 305.486612] Exception stack(0xffff80003efcf3a0 to 0xffff80003efcf4e0) [ 305.487334] f3a0: 0000000000000012 0000000000000013 0000000000000400 0000000000000000 [ 305.488025] f3c0: 0000000000000000 0000000000000000 00000000000000e1 1313131313131313 [ 305.488908] f3e0: ffff00000a5e8770 ffff000008545380 000000000005ee8d ffff000008ed2548 [ 305.489403] f400: 0000000005f5e0ff ffff000008ff7872 000000000047a4c8 ffff000008ff546d [ 305.489759] f420: ffff0000081fe0b8 0000ffffa2601280 0000000000000006 0000000000000013 [ 305.490256] f440: ffff000008f80188 ffff80003ce65000 0000000000000009 ffff000008f80138 [ 305.490683] f460: ffff00000a5ebeb8 ffff00000a5ebeb8 ffff000008f80400 ffff000008981000 [ 305.491051] f480: ffff80003d051c00 ffff00000a5e8350 ffff00000859f358 ffff00000a5e7f50 [ 305.491444] f4a0: ffff00000859f330 0000000040000145 0000000000000000 0000000000000000 [ 305.492008] f4c0: 0001000000000000 0000000000000000 ffff00000a5e8350 ffff00000859f330 [ 305.493063] [<ffff00000808205c>] __bad_stack+0x88/0x8c [ 305.493396] [<ffff00000859f330>] recursive_loop+0x10/0x48 [ 305.493731] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.494088] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.494425] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.494649] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.494898] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.495205] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.495453] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.495708] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.496000] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.496302] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.496644] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.496894] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.497138] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.497325] [<ffff00000859f3dc>] lkdtm_OVERFLOW+0x14/0x20 [ 305.497506] [<ffff00000859f314>] lkdtm_do_action+0x1c/0x28 [ 305.497786] [<ffff00000859f178>] direct_entry+0xe0/0x170 [ 305.498095] [<ffff000008345568>] full_proxy_write+0x60/0xa8 [ 305.498387] [<ffff0000081fb7f4>] __vfs_write+0x1c/0x128 [ 305.498679] [<ffff0000081fcc68>] vfs_write+0xa0/0x1b0 [ 305.498926] [<ffff0000081fe0fc>] SyS_write+0x44/0xa0 [ 305.499182] Exception stack(0xffff00000a5ebec0 to 0xffff00000a5ec000) [ 305.499429] bec0: 0000000000000001 000000001c4cf5e0 0000000000000009 000000001c4cf5e0 [ 305.499674] bee0: 574f4c465245564f 0000000000000000 0000000000000000 8000000080808080 [ 305.499904] bf00: 0000000000000040 0000000000000038 fefefeff1b4bc2ff 7f7f7f7f7f7fff7f [ 305.500189] bf20: 0101010101010101 0000000000000000 000000000047a4c8 0000000000000038 [ 305.500712] bf40: 0000000000000000 0000ffffa2601280 0000ffffc63f6068 00000000004b5000 [ 305.501241] bf60: 0000000000000001 000000001c4cf5e0 0000000000000009 000000001c4cf5e0 [ 305.501791] bf80: 0000000000000020 0000000000000000 00000000004b5000 000000001c4cc458 [ 305.502314] bfa0: 0000000000000000 0000ffffc63f7950 000000000040a3c4 0000ffffc63f70e0 [ 305.502762] bfc0: 0000ffffa2601268 0000000080000000 0000000000000001 0000000000000040 [ 305.503207] bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 305.503680] [<ffff000008082fb0>] el0_svc_naked+0x24/0x28 [ 305.504720] Kernel Offset: disabled [ 305.505189] CPU features: 0x002082 [ 305.505473] Memory Limit: none [ 305.506181] ---[ end Kernel panic - not syncing: kernel stack overflow This patch was co-authored by Ard Biesheuvel and Mark Rutland. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-07-14 13:30:35 -06:00
#define OVERFLOW_STACK_SIZE SZ_4K
/*
* Alignment of kernel segments (e.g. .text, .data).
*/
#if defined(CONFIG_DEBUG_ALIGN_RODATA)
/*
* 4 KB granule: 1 level 2 entry
* 16 KB granule: 128 level 3 entries, with contiguous bit
* 64 KB granule: 32 level 3 entries, with contiguous bit
*/
#define SEGMENT_ALIGN SZ_2M
#else
/*
* 4 KB granule: 16 level 3 entries, with contiguous bit
* 16 KB granule: 4 level 3 entries, without contiguous bit
* 64 KB granule: 1 level 3 entry
*/
#define SEGMENT_ALIGN SZ_64K
#endif
/*
* Memory types available.
*/
#define MT_DEVICE_nGnRnE 0
#define MT_DEVICE_nGnRE 1
#define MT_DEVICE_GRE 2
#define MT_NORMAL_NC 3
#define MT_NORMAL 4
#define MT_NORMAL_WT 5
/*
* Memory types for Stage-2 translation
*/
#define MT_S2_NORMAL 0xf
#define MT_S2_DEVICE_nGnRE 0x1
/*
* Memory types for Stage-2 translation when ID_AA64MMFR2_EL1.FWB is 0001
* Stage-2 enforces Normal-WB and Device-nGnRE
*/
#define MT_S2_FWB_NORMAL 6
#define MT_S2_FWB_DEVICE_nGnRE 1
#ifdef CONFIG_ARM64_4K_PAGES
#define IOREMAP_MAX_ORDER (PUD_SHIFT)
#else
#define IOREMAP_MAX_ORDER (PMD_SHIFT)
#endif
#ifndef __ASSEMBLY__
extern u64 vabits_actual;
#define PAGE_END (_PAGE_END(vabits_actual))
#include <linux/bitops.h>
#include <linux/mmdebug.h>
extern s64 physvirt_offset;
extern s64 memstart_addr;
/* PHYS_OFFSET - the physical address of the start of memory. */
#define PHYS_OFFSET ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })
arm64: add support for kernel ASLR This adds support for KASLR is implemented, based on entropy provided by the bootloader in the /chosen/kaslr-seed DT property. Depending on the size of the address space (VA_BITS) and the page size, the entropy in the virtual displacement is up to 13 bits (16k/2 levels) and up to 25 bits (all 4 levels), with the sidenote that displacements that result in the kernel image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB granule kernels, respectively) are not allowed, and will be rounded up to an acceptable value. If CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled, the module region is randomized independently from the core kernel. This makes it less likely that the location of core kernel data structures can be determined by an adversary, but causes all function calls from modules into the core kernel to be resolved via entries in the module PLTs. If CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, the module region is randomized by choosing a page aligned 128 MB region inside the interval [_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of entropy (depending on page size), independently of the kernel randomization, but still guarantees that modules are within the range of relative branch and jump instructions (with the caveat that, since the module region is shared with other uses of the vmalloc area, modules may need to be loaded further away if the module region is exhausted) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-01-26 06:12:01 -07:00
/* the virtual base of the kernel image (minus TEXT_OFFSET) */
extern u64 kimage_vaddr;
/* the offset between the kernel virtual and physical mappings */
extern u64 kimage_voffset;
static inline unsigned long kaslr_offset(void)
{
return kimage_vaddr - KIMAGE_VADDR;
}
/*
* Allow all memory at the discovery stage. We will clip it later.
*/
#define MIN_MEMBLOCK_ADDR 0
#define MAX_MEMBLOCK_ADDR U64_MAX
/*
* PFNs are used to describe any physical page; this means
* PFN 0 == physical address 0.
*
* This is the PFN of the first RAM page in the kernel
* direct-mapped view. We assume this is the first page
* of RAM in the mem_map as well.
*/
#define PHYS_PFN_OFFSET (PHYS_OFFSET >> PAGE_SHIFT)
/*
* When dealing with data aborts, watchpoints, or instruction traps we may end
* up with a tagged userland pointer. Clear the tag to get a sane pointer to
* pass on to access_ok(), for instance.
*/
#define __untagged_addr(addr) \
((__force __typeof__(addr))sign_extend64((__force u64)(addr), 55))
#define untagged_addr(addr) ({ \
u64 __addr = (__force u64)addr; \
__addr &= __untagged_addr(__addr); \
(__force __typeof__(addr))__addr; \
})
#ifdef CONFIG_KASAN_SW_TAGS
#define __tag_shifted(tag) ((u64)(tag) << 56)
#define __tag_reset(addr) __untagged_addr(addr)
#define __tag_get(addr) (__u8)((u64)(addr) >> 56)
#else
#define __tag_shifted(tag) 0UL
#define __tag_reset(addr) (addr)
#define __tag_get(addr) 0
#endif /* CONFIG_KASAN_SW_TAGS */
static inline const void *__tag_set(const void *addr, u8 tag)
{
u64 __addr = (u64)addr & ~__tag_shifted(0xff);
return (const void *)(__addr | __tag_shifted(tag));
}
/*
* Physical vs virtual RAM address space conversion. These are
* private definitions which should NOT be used outside memory.h
* files. Use virt_to_phys/phys_to_virt/__pa/__va instead.
*/
/*
* The linear kernel range starts at the bottom of the virtual address
* space. Testing the top bit for the start of the region is a
* sufficient check and avoids having to worry about the tag.
*/
#define __is_lm_address(addr) (!(((u64)addr) & BIT(vabits_actual - 1)))
#define __lm_to_phys(addr) (((addr) + physvirt_offset))
#define __kimg_to_phys(addr) ((addr) - kimage_voffset)
#define __virt_to_phys_nodebug(x) ({ \
phys_addr_t __x = (phys_addr_t)(__tag_reset(x)); \
__is_lm_address(__x) ? __lm_to_phys(__x) : __kimg_to_phys(__x); \
})
#define __pa_symbol_nodebug(x) __kimg_to_phys((phys_addr_t)(x))
#ifdef CONFIG_DEBUG_VIRTUAL
extern phys_addr_t __virt_to_phys(unsigned long x);
extern phys_addr_t __phys_addr_symbol(unsigned long x);
#else
#define __virt_to_phys(x) __virt_to_phys_nodebug(x)
#define __phys_addr_symbol(x) __pa_symbol_nodebug(x)
#endif /* CONFIG_DEBUG_VIRTUAL */
#define __phys_to_virt(x) ((unsigned long)((x) - physvirt_offset))
#define __phys_to_kimg(x) ((unsigned long)((x) + kimage_voffset))
/*
* Convert a page to/from a physical address
*/
#define page_to_phys(page) (__pfn_to_phys(page_to_pfn(page)))
#define phys_to_page(phys) (pfn_to_page(__phys_to_pfn(phys)))
/*
* Note: Drivers should NOT use these. They are the wrong
* translation for translating DMA addresses. Use the driver
* DMA support - see dma-mapping.h.
*/
#define virt_to_phys virt_to_phys
static inline phys_addr_t virt_to_phys(const volatile void *x)
{
return __virt_to_phys((unsigned long)(x));
}
#define phys_to_virt phys_to_virt
static inline void *phys_to_virt(phys_addr_t x)
{
return (void *)(__phys_to_virt(x));
}
/*
* Drivers should NOT use these either.
*/
#define __pa(x) __virt_to_phys((unsigned long)(x))
#define __pa_symbol(x) __phys_addr_symbol(RELOC_HIDE((unsigned long)(x), 0))
#define __pa_nodebug(x) __virt_to_phys_nodebug((unsigned long)(x))
#define __va(x) ((void *)__phys_to_virt((phys_addr_t)(x)))
#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT)
#define virt_to_pfn(x) __phys_to_pfn(__virt_to_phys((unsigned long)(x)))
#define sym_to_pfn(x) __phys_to_pfn(__pa_symbol(x))
/*
* virt_to_page(x) convert a _valid_ virtual address to struct page *
* virt_addr_valid(x) indicates whether a virtual address is valid
*/
#define ARCH_PFN_OFFSET ((unsigned long)PHYS_PFN_OFFSET)
#if !defined(CONFIG_SPARSEMEM_VMEMMAP) || defined(CONFIG_DEBUG_VIRTUAL)
#define virt_to_page(x) pfn_to_page(virt_to_pfn(x))
#else
#define page_to_virt(x) ({ \
__typeof__(x) __page = x; \
u64 __idx = ((u64)__page - VMEMMAP_START) / sizeof(struct page);\
u64 __addr = PAGE_OFFSET + (__idx * PAGE_SIZE); \
(void *)__tag_set((const void *)__addr, page_kasan_tag(__page));\
})
#define virt_to_page(x) ({ \
u64 __idx = (__tag_reset((u64)x) - PAGE_OFFSET) / PAGE_SIZE; \
u64 __addr = VMEMMAP_START + (__idx * sizeof(struct page)); \
(struct page *)__addr; \
})
#endif /* !CONFIG_SPARSEMEM_VMEMMAP || CONFIG_DEBUG_VIRTUAL */
#define virt_addr_valid(addr) ({ \
__typeof__(addr) __addr = addr; \
__is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr)); \
})
#endif /* !ASSEMBLY */
arm64, mm, efi: Account for GICv3 LPI tables in static memblock reserve table In the irqchip and EFI code, we have what basically amounts to a quirk to work around a peculiarity in the GICv3 architecture, which permits the system memory address of LPI tables to be programmable only once after a CPU reset. This means kexec kernels must use the same memory as the first kernel, and thus ensure that this memory has not been given out for other purposes by the time the ITS init code runs, which is not very early for secondary CPUs. On systems with many CPUs, these reservations could overflow the memblock reservation table, and this was addressed in commit: eff896288872 ("efi/arm: Defer persistent reservations until after paging_init()") However, this turns out to have made things worse, since the allocation of page tables and heap space for the resized memblock reservation table itself may overwrite the regions we are attempting to reserve, which may cause all kinds of corruption, also considering that the ITS will still be poking bits into that memory in response to incoming MSIs. So instead, let's grow the static memblock reservation table on such systems so it can accommodate these reservations at an earlier time. This will permit us to revert the above commit in a subsequent patch. [ mingo: Minor cleanups. ] Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20190215123333.21209-2-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-15 05:33:32 -07:00
/*
* Given that the GIC architecture permits ITS implementations that can only be
* configured with a LPI table address once, GICv3 systems with many CPUs may
* end up reserving a lot of different regions after a kexec for their LPI
* tables (one per CPU), as we are forced to reuse the same memory after kexec
* (and thus reserve it persistently with EFI beforehand)
*/
#if defined(CONFIG_EFI) && defined(CONFIG_ARM_GIC_V3_ITS)
# define INIT_MEMBLOCK_RESERVED_REGIONS (INIT_MEMBLOCK_REGIONS + NR_CPUS + 1)
#endif
#include <asm-generic/memory_model.h>
#endif /* __ASM_MEMORY_H */