1
0
Fork 0
alistair23-linux/arch/mips/include/asm/fpu_emulator.h

192 lines
4.9 KiB
C
Raw Normal View History

/* SPDX-License-Identifier: GPL-2.0-only */
/*
*
* Further private data for which no space exists in mips_fpu_struct.
* This should be subsumed into the mips_fpu_struct structure as
* defined in processor.h as soon as the absurd wired absolute assembler
* offsets become dynamic at compile time.
*
* Kevin D. Kissell, kevink@mips.com and Carsten Langgaard, carstenl@mips.com
* Copyright (C) 2000 MIPS Technologies, Inc. All rights reserved.
*/
#ifndef _ASM_FPU_EMULATOR_H
#define _ASM_FPU_EMULATOR_H
#include <linux/sched.h>
MIPS: Use per-mm page to execute branch delay slot instructions In some cases the kernel needs to execute an instruction from the delay slot of an emulated branch instruction. These cases include: - Emulated floating point branch instructions (bc1[ft]l?) for systems which don't include an FPU, or upon which the kernel is run with the "nofpu" parameter. - MIPSr6 systems running binaries targeting older revisions of the architecture, which may include branch instructions whose encodings are no longer valid in MIPSr6. Executing instructions from such delay slots is done by writing the instruction to memory followed by a trap, as part of an "emuframe", and executing it. This avoids the requirement of an emulator for the entire MIPS instruction set. Prior to this patch such emuframes are written to the user stack and executed from there. This patch moves FP branch delay emuframes off of the user stack and into a per-mm page. Allocating a page per-mm leaves userland with access to only what it had access to previously, and compared to other solutions is relatively simple. When a thread requires a delay slot emulation, it is allocated a frame. A thread may only have one frame allocated at any one time, since it may only ever be executing one instruction at any one time. In order to ensure that we can free up allocated frame later, its index is recorded in struct thread_struct. In the typical case, after executing the delay slot instruction we'll execute a break instruction with the BRK_MEMU code. This traps back to the kernel & leads to a call to do_dsemulret which frees the allocated frame & moves the user PC back to the instruction that would have executed following the emulated branch. In some cases the delay slot instruction may be invalid, such as a branch, or may trigger an exception. In these cases the BRK_MEMU break instruction will not be hit. In order to ensure that frames are freed this patch introduces dsemul_thread_cleanup() and calls it to free any allocated frame upon thread exit. If the instruction generated an exception & leads to a signal being delivered to the thread, or indeed if a signal simply happens to be delivered to the thread whilst it is executing from the struct emuframe, then we need to take care to exit the frame appropriately. This is done by either rolling back the user PC to the branch or advancing it to the continuation PC prior to signal delivery, using dsemul_thread_rollback(). If this were not done then a sigreturn would return to the struct emuframe, and if that frame had meanwhile been used in response to an emulated branch instruction within the signal handler then we would execute the wrong user code. Whilst a user could theoretically place something like a compact branch to self in a delay slot and cause their thread to become stuck in an infinite loop with the frame never being deallocated, this would: - Only affect the users single process. - Be architecturally invalid since there would be a branch in the delay slot, which is forbidden. - Be extremely unlikely to happen by mistake, and provide a program with no more ability to harm the system than a simple infinite loop would. If a thread requires a delay slot emulation & no frame is available to it (ie. the process has enough other threads that all frames are currently in use) then the thread joins a waitqueue. It will sleep until a frame is freed by another thread in the process. Since we now know whether a thread has an allocated frame due to our tracking of its index, the cookie field of struct emuframe is removed as we can be more certain whether we have a valid frame. Since a thread may only ever have a single frame at any given time, the epc field of struct emuframe is also removed & the PC to continue from is instead stored in struct thread_struct. Together these changes simplify & shrink struct emuframe somewhat, allowing twice as many frames to fit into the page allocated for them. The primary benefit of this patch is that we are now free to mark the user stack non-executable where that is possible. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com> Cc: Maciej Rozycki <maciej.rozycki@imgtec.com> Cc: Faraz Shahbazker <faraz.shahbazker@imgtec.com> Cc: Raghu Gandham <raghu.gandham@imgtec.com> Cc: Matthew Fortune <matthew.fortune@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13764/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-08 04:06:19 -06:00
#include <asm/dsemul.h>
#include <asm/thread_info.h>
#include <asm/inst.h>
#include <asm/local.h>
#include <asm/processor.h>
#ifdef CONFIG_DEBUG_FS
struct mips_fpu_emulator_stats {
unsigned long emulated;
unsigned long loads;
unsigned long stores;
unsigned long branches;
unsigned long cp1ops;
unsigned long cp1xops;
unsigned long errors;
unsigned long ieee754_inexact;
unsigned long ieee754_underflow;
unsigned long ieee754_overflow;
unsigned long ieee754_zerodiv;
unsigned long ieee754_invalidop;
unsigned long ds_emul;
MIPS: math-emu: Add FP emu debugfs stats for individual instructions Add FP emulation debugfs statistics for individual instructions. The debugfs files that contain counter values are placed in a separate directory called "instructions". This means that the default path for these new stat is "/sys/kernel/debug/mips/fpuemustats/instructions". Each instruction counter is mapped to the debugfs file that has the same name as instruction name. The lowercase is choosen as more commonly used case for instruction names. One example of usage: mips_host::/sys/kernel/debug/mips/fpuemustats/instructions # grep "" * The shortened output of this command is: abs.d:34 abs.s:5711 add.d:10401 add.s:399307 bc1eqz:3199 ... ... ... sub.s:167211 trunc.l.d:375 trunc.l.s:8054 trunc.w.d:421 trunc.w.s:27032 The limitation of this patch is that it handles R6 FP emulation instructions only. There are altogether 114 handled instructions. Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com> Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com> Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com> Cc: Douglas Leung <douglas.leung@imgtec.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Maciej W. Rozycki <macro@imgtec.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Petar Jovanovic <petar.jovanovic@imgtec.com> Cc: Raghu Gandham <raghu.gandham@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/17145/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-08-21 06:24:52 -06:00
unsigned long abs_s;
unsigned long abs_d;
unsigned long add_s;
unsigned long add_d;
unsigned long bc1eqz;
unsigned long bc1nez;
unsigned long ceil_w_s;
unsigned long ceil_w_d;
unsigned long ceil_l_s;
unsigned long ceil_l_d;
unsigned long class_s;
unsigned long class_d;
unsigned long cmp_af_s;
unsigned long cmp_af_d;
unsigned long cmp_eq_s;
unsigned long cmp_eq_d;
unsigned long cmp_le_s;
unsigned long cmp_le_d;
unsigned long cmp_lt_s;
unsigned long cmp_lt_d;
unsigned long cmp_ne_s;
unsigned long cmp_ne_d;
unsigned long cmp_or_s;
unsigned long cmp_or_d;
unsigned long cmp_ueq_s;
unsigned long cmp_ueq_d;
unsigned long cmp_ule_s;
unsigned long cmp_ule_d;
unsigned long cmp_ult_s;
unsigned long cmp_ult_d;
unsigned long cmp_un_s;
unsigned long cmp_un_d;
unsigned long cmp_une_s;
unsigned long cmp_une_d;
unsigned long cmp_saf_s;
unsigned long cmp_saf_d;
unsigned long cmp_seq_s;
unsigned long cmp_seq_d;
unsigned long cmp_sle_s;
unsigned long cmp_sle_d;
unsigned long cmp_slt_s;
unsigned long cmp_slt_d;
unsigned long cmp_sne_s;
unsigned long cmp_sne_d;
unsigned long cmp_sor_s;
unsigned long cmp_sor_d;
unsigned long cmp_sueq_s;
unsigned long cmp_sueq_d;
unsigned long cmp_sule_s;
unsigned long cmp_sule_d;
unsigned long cmp_sult_s;
unsigned long cmp_sult_d;
unsigned long cmp_sun_s;
unsigned long cmp_sun_d;
unsigned long cmp_sune_s;
unsigned long cmp_sune_d;
unsigned long cvt_d_l;
unsigned long cvt_d_s;
unsigned long cvt_d_w;
unsigned long cvt_l_s;
unsigned long cvt_l_d;
unsigned long cvt_s_d;
unsigned long cvt_s_l;
unsigned long cvt_s_w;
unsigned long cvt_w_s;
unsigned long cvt_w_d;
unsigned long div_s;
unsigned long div_d;
unsigned long floor_w_s;
unsigned long floor_w_d;
unsigned long floor_l_s;
unsigned long floor_l_d;
unsigned long maddf_s;
unsigned long maddf_d;
unsigned long max_s;
unsigned long max_d;
unsigned long maxa_s;
unsigned long maxa_d;
unsigned long min_s;
unsigned long min_d;
unsigned long mina_s;
unsigned long mina_d;
unsigned long mov_s;
unsigned long mov_d;
unsigned long msubf_s;
unsigned long msubf_d;
unsigned long mul_s;
unsigned long mul_d;
unsigned long neg_s;
unsigned long neg_d;
unsigned long recip_s;
unsigned long recip_d;
unsigned long rint_s;
unsigned long rint_d;
unsigned long round_w_s;
unsigned long round_w_d;
unsigned long round_l_s;
unsigned long round_l_d;
unsigned long rsqrt_s;
unsigned long rsqrt_d;
unsigned long sel_s;
unsigned long sel_d;
unsigned long seleqz_s;
unsigned long seleqz_d;
unsigned long selnez_s;
unsigned long selnez_d;
unsigned long sqrt_s;
unsigned long sqrt_d;
unsigned long sub_s;
unsigned long sub_d;
unsigned long trunc_w_s;
unsigned long trunc_w_d;
unsigned long trunc_l_s;
unsigned long trunc_l_d;
};
DECLARE_PER_CPU(struct mips_fpu_emulator_stats, fpuemustats);
#define MIPS_FPU_EMU_INC_STATS(M) \
do { \
preempt_disable(); \
__this_cpu_inc(fpuemustats.M); \
preempt_enable(); \
} while (0)
#else
#define MIPS_FPU_EMU_INC_STATS(M) do { } while (0)
#endif /* CONFIG_DEBUG_FS */
extern int fpu_emulator_cop1Handler(struct pt_regs *xcp,
struct mips_fpu_struct *ctx, int has_fpu,
void __user **fault_addr);
2016-10-28 01:21:03 -06:00
void force_fcr31_sig(unsigned long fcr31, void __user *fault_addr,
struct task_struct *tsk);
int process_fpemu_return(int sig, void __user *fault_addr,
unsigned long fcr31);
MIPS: Use per-mm page to execute branch delay slot instructions In some cases the kernel needs to execute an instruction from the delay slot of an emulated branch instruction. These cases include: - Emulated floating point branch instructions (bc1[ft]l?) for systems which don't include an FPU, or upon which the kernel is run with the "nofpu" parameter. - MIPSr6 systems running binaries targeting older revisions of the architecture, which may include branch instructions whose encodings are no longer valid in MIPSr6. Executing instructions from such delay slots is done by writing the instruction to memory followed by a trap, as part of an "emuframe", and executing it. This avoids the requirement of an emulator for the entire MIPS instruction set. Prior to this patch such emuframes are written to the user stack and executed from there. This patch moves FP branch delay emuframes off of the user stack and into a per-mm page. Allocating a page per-mm leaves userland with access to only what it had access to previously, and compared to other solutions is relatively simple. When a thread requires a delay slot emulation, it is allocated a frame. A thread may only have one frame allocated at any one time, since it may only ever be executing one instruction at any one time. In order to ensure that we can free up allocated frame later, its index is recorded in struct thread_struct. In the typical case, after executing the delay slot instruction we'll execute a break instruction with the BRK_MEMU code. This traps back to the kernel & leads to a call to do_dsemulret which frees the allocated frame & moves the user PC back to the instruction that would have executed following the emulated branch. In some cases the delay slot instruction may be invalid, such as a branch, or may trigger an exception. In these cases the BRK_MEMU break instruction will not be hit. In order to ensure that frames are freed this patch introduces dsemul_thread_cleanup() and calls it to free any allocated frame upon thread exit. If the instruction generated an exception & leads to a signal being delivered to the thread, or indeed if a signal simply happens to be delivered to the thread whilst it is executing from the struct emuframe, then we need to take care to exit the frame appropriately. This is done by either rolling back the user PC to the branch or advancing it to the continuation PC prior to signal delivery, using dsemul_thread_rollback(). If this were not done then a sigreturn would return to the struct emuframe, and if that frame had meanwhile been used in response to an emulated branch instruction within the signal handler then we would execute the wrong user code. Whilst a user could theoretically place something like a compact branch to self in a delay slot and cause their thread to become stuck in an infinite loop with the frame never being deallocated, this would: - Only affect the users single process. - Be architecturally invalid since there would be a branch in the delay slot, which is forbidden. - Be extremely unlikely to happen by mistake, and provide a program with no more ability to harm the system than a simple infinite loop would. If a thread requires a delay slot emulation & no frame is available to it (ie. the process has enough other threads that all frames are currently in use) then the thread joins a waitqueue. It will sleep until a frame is freed by another thread in the process. Since we now know whether a thread has an allocated frame due to our tracking of its index, the cookie field of struct emuframe is removed as we can be more certain whether we have a valid frame. Since a thread may only ever have a single frame at any given time, the epc field of struct emuframe is also removed & the PC to continue from is instead stored in struct thread_struct. Together these changes simplify & shrink struct emuframe somewhat, allowing twice as many frames to fit into the page allocated for them. The primary benefit of this patch is that we are now free to mark the user stack non-executable where that is possible. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com> Cc: Maciej Rozycki <maciej.rozycki@imgtec.com> Cc: Faraz Shahbazker <faraz.shahbazker@imgtec.com> Cc: Raghu Gandham <raghu.gandham@imgtec.com> Cc: Matthew Fortune <matthew.fortune@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13764/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-08 04:06:19 -06:00
int isBranchInstr(struct pt_regs *regs, struct mm_decoded_insn dec_insn,
unsigned long *contpc);
int mm_isBranchInstr(struct pt_regs *regs, struct mm_decoded_insn dec_insn,
unsigned long *contpc);
2016-10-28 01:21:03 -06:00
/*
* Mask the FCSR Cause bits according to the Enable bits, observing
* that Unimplemented is always enabled.
*/
static inline unsigned long mask_fcr31_x(unsigned long fcr31)
{
return fcr31 & (FPU_CSR_UNI_X |
((fcr31 & FPU_CSR_ALL_E) <<
(ffs(FPU_CSR_ALL_X) - ffs(FPU_CSR_ALL_E))));
}
#endif /* _ASM_FPU_EMULATOR_H */