1
0
Fork 0
alistair23-linux/arch/x86/include/asm/desc.h

627 lines
16 KiB
C
Raw Normal View History

#ifndef _ASM_X86_DESC_H
#define _ASM_X86_DESC_H
#include <asm/desc_defs.h>
#include <asm/ldt.h>
#include <asm/mmu.h>
x86: Remap GDT tables in the fixmap section Each processor holds a GDT in its per-cpu structure. The sgdt instruction gives the base address of the current GDT. This address can be used to bypass KASLR memory randomization. With another bug, an attacker could target other per-cpu structures or deduce the base of the main memory section (PAGE_OFFSET). This patch relocates the GDT table for each processor inside the fixmap section. The space is reserved based on number of supported processors. For consistency, the remapping is done by default on 32 and 64-bit. Each processor switches to its remapped GDT at the end of initialization. For hibernation, the main processor returns with the original GDT and switches back to the remapping at completion. This patch was tested on both architectures. Hibernation and KVM were both tested specially for their usage of the GDT. Thanks to Boris Ostrovsky <boris.ostrovsky@oracle.com> for testing and recommending changes for Xen support. Signed-off-by: Thomas Garnier <thgarnie@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Luis R . Rodriguez <mcgrof@kernel.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: kasan-dev@googlegroups.com Cc: kernel-hardening@lists.openwall.com Cc: kvm@vger.kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-pm@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: zijun_hu <zijun_hu@htc.com> Link: http://lkml.kernel.org/r/20170314170508.100882-2-thgarnie@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-14 11:05:07 -06:00
#include <asm/fixmap.h>
#include <linux/smp.h>
#include <linux/percpu.h>
static inline void fill_ldt(struct desc_struct *desc, const struct user_desc *info)
{
desc->limit0 = info->limit & 0x0ffff;
desc->base0 = (info->base_addr & 0x0000ffff);
desc->base1 = (info->base_addr & 0x00ff0000) >> 16;
desc->type = (info->read_exec_only ^ 1) << 1;
desc->type |= info->contents << 2;
desc->s = 1;
desc->dpl = 0x3;
desc->p = info->seg_not_present ^ 1;
desc->limit = (info->limit & 0xf0000) >> 16;
desc->avl = info->useable;
desc->d = info->seg_32bit;
desc->g = info->limit_in_pages;
desc->base2 = (info->base_addr & 0xff000000) >> 24;
/*
* Don't allow setting of the lm bit. It would confuse
* user_64bit_mode and would get overridden by sysret anyway.
*/
desc->l = 0;
}
extern struct desc_ptr idt_descr;
extern gate_desc idt_table[];
extern const struct desc_ptr debug_idt_descr;
extern gate_desc debug_idt_table[];
x86: Remap GDT tables in the fixmap section Each processor holds a GDT in its per-cpu structure. The sgdt instruction gives the base address of the current GDT. This address can be used to bypass KASLR memory randomization. With another bug, an attacker could target other per-cpu structures or deduce the base of the main memory section (PAGE_OFFSET). This patch relocates the GDT table for each processor inside the fixmap section. The space is reserved based on number of supported processors. For consistency, the remapping is done by default on 32 and 64-bit. Each processor switches to its remapped GDT at the end of initialization. For hibernation, the main processor returns with the original GDT and switches back to the remapping at completion. This patch was tested on both architectures. Hibernation and KVM were both tested specially for their usage of the GDT. Thanks to Boris Ostrovsky <boris.ostrovsky@oracle.com> for testing and recommending changes for Xen support. Signed-off-by: Thomas Garnier <thgarnie@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Luis R . Rodriguez <mcgrof@kernel.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: kasan-dev@googlegroups.com Cc: kernel-hardening@lists.openwall.com Cc: kvm@vger.kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-pm@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: zijun_hu <zijun_hu@htc.com> Link: http://lkml.kernel.org/r/20170314170508.100882-2-thgarnie@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-14 11:05:07 -06:00
extern pgprot_t pg_fixmap_gdt_flags;
struct gdt_page {
struct desc_struct gdt[GDT_ENTRIES];
} __attribute__((aligned(PAGE_SIZE)));
DECLARE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page);
x86: Remap GDT tables in the fixmap section Each processor holds a GDT in its per-cpu structure. The sgdt instruction gives the base address of the current GDT. This address can be used to bypass KASLR memory randomization. With another bug, an attacker could target other per-cpu structures or deduce the base of the main memory section (PAGE_OFFSET). This patch relocates the GDT table for each processor inside the fixmap section. The space is reserved based on number of supported processors. For consistency, the remapping is done by default on 32 and 64-bit. Each processor switches to its remapped GDT at the end of initialization. For hibernation, the main processor returns with the original GDT and switches back to the remapping at completion. This patch was tested on both architectures. Hibernation and KVM were both tested specially for their usage of the GDT. Thanks to Boris Ostrovsky <boris.ostrovsky@oracle.com> for testing and recommending changes for Xen support. Signed-off-by: Thomas Garnier <thgarnie@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Luis R . Rodriguez <mcgrof@kernel.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: kasan-dev@googlegroups.com Cc: kernel-hardening@lists.openwall.com Cc: kvm@vger.kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-pm@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: zijun_hu <zijun_hu@htc.com> Link: http://lkml.kernel.org/r/20170314170508.100882-2-thgarnie@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-14 11:05:07 -06:00
/* Provide the original GDT */
static inline struct desc_struct *get_cpu_gdt_rw(unsigned int cpu)
{
return per_cpu(gdt_page, cpu).gdt;
}
x86: Remap GDT tables in the fixmap section Each processor holds a GDT in its per-cpu structure. The sgdt instruction gives the base address of the current GDT. This address can be used to bypass KASLR memory randomization. With another bug, an attacker could target other per-cpu structures or deduce the base of the main memory section (PAGE_OFFSET). This patch relocates the GDT table for each processor inside the fixmap section. The space is reserved based on number of supported processors. For consistency, the remapping is done by default on 32 and 64-bit. Each processor switches to its remapped GDT at the end of initialization. For hibernation, the main processor returns with the original GDT and switches back to the remapping at completion. This patch was tested on both architectures. Hibernation and KVM were both tested specially for their usage of the GDT. Thanks to Boris Ostrovsky <boris.ostrovsky@oracle.com> for testing and recommending changes for Xen support. Signed-off-by: Thomas Garnier <thgarnie@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Luis R . Rodriguez <mcgrof@kernel.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: kasan-dev@googlegroups.com Cc: kernel-hardening@lists.openwall.com Cc: kvm@vger.kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-pm@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: zijun_hu <zijun_hu@htc.com> Link: http://lkml.kernel.org/r/20170314170508.100882-2-thgarnie@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-14 11:05:07 -06:00
static inline unsigned long get_cpu_gdt_rw_vaddr(unsigned int cpu)
{
return (unsigned long)get_cpu_gdt_rw(cpu);
}
/* Provide the current original GDT */
static inline struct desc_struct *get_current_gdt_rw(void)
{
return this_cpu_ptr(&gdt_page)->gdt;
}
static inline unsigned long get_current_gdt_rw_vaddr(void)
{
return (unsigned long)get_current_gdt_rw();
}
/* Get the fixmap index for a specific processor */
static inline unsigned int get_cpu_gdt_ro_index(int cpu)
{
return FIX_GDT_REMAP_BEGIN + cpu;
}
/* Provide the fixmap address of the remapped GDT */
static inline struct desc_struct *get_cpu_gdt_ro(int cpu)
{
unsigned int idx = get_cpu_gdt_ro_index(cpu);
return (struct desc_struct *)__fix_to_virt(idx);
}
static inline unsigned long get_cpu_gdt_ro_vaddr(int cpu)
{
return (unsigned long)get_cpu_gdt_ro(cpu);
}
/* Provide the current read-only GDT */
static inline struct desc_struct *get_current_gdt_ro(void)
{
return get_cpu_gdt_ro(smp_processor_id());
}
static inline unsigned long get_current_gdt_ro_vaddr(void)
{
return (unsigned long)get_current_gdt_ro();
}
#ifdef CONFIG_X86_64
static inline void pack_gate(gate_desc *gate, unsigned type, unsigned long func,
unsigned dpl, unsigned ist, unsigned seg)
{
gate->offset_low = PTR_LOW(func);
gate->segment = __KERNEL_CS;
gate->ist = ist;
gate->p = 1;
gate->dpl = dpl;
gate->zero0 = 0;
gate->zero1 = 0;
gate->type = type;
gate->offset_middle = PTR_MIDDLE(func);
gate->offset_high = PTR_HIGH(func);
}
#else
static inline void pack_gate(gate_desc *gate, unsigned char type,
unsigned long base, unsigned dpl, unsigned flags,
unsigned short seg)
{
gate->a = (seg << 16) | (base & 0xffff);
gate->b = (base & 0xffff0000) | (((0x80 | type | (dpl << 5)) & 0xff) << 8);
}
#endif
static inline int desc_empty(const void *ptr)
{
const u32 *desc = ptr;
return !(desc[0] | desc[1]);
}
#ifdef CONFIG_PARAVIRT
#include <asm/paravirt.h>
#else
#define load_TR_desc() native_load_tr_desc()
#define load_gdt(dtr) native_load_gdt(dtr)
#define load_idt(dtr) native_load_idt(dtr)
#define load_tr(tr) asm volatile("ltr %0"::"m" (tr))
#define load_ldt(ldt) asm volatile("lldt %0"::"m" (ldt))
#define store_gdt(dtr) native_store_gdt(dtr)
#define store_idt(dtr) native_store_idt(dtr)
#define store_tr(tr) (tr = native_store_tr())
#define load_TLS(t, cpu) native_load_tls(t, cpu)
#define set_ldt native_set_ldt
#define write_ldt_entry(dt, entry, desc) native_write_ldt_entry(dt, entry, desc)
#define write_gdt_entry(dt, entry, desc, type) native_write_gdt_entry(dt, entry, desc, type)
#define write_idt_entry(dt, entry, g) native_write_idt_entry(dt, entry, g)
static inline void paravirt_alloc_ldt(struct desc_struct *ldt, unsigned entries)
{
}
static inline void paravirt_free_ldt(struct desc_struct *ldt, unsigned entries)
{
}
#endif /* CONFIG_PARAVIRT */
#define store_ldt(ldt) asm("sldt %0" : "=m"(ldt))
static inline void native_write_idt_entry(gate_desc *idt, int entry, const gate_desc *gate)
{
memcpy(&idt[entry], gate, sizeof(*gate));
}
static inline void native_write_ldt_entry(struct desc_struct *ldt, int entry, const void *desc)
{
memcpy(&ldt[entry], desc, 8);
}
static inline void
native_write_gdt_entry(struct desc_struct *gdt, int entry, const void *desc, int type)
{
unsigned int size;
switch (type) {
case DESC_TSS: size = sizeof(tss_desc); break;
case DESC_LDT: size = sizeof(ldt_desc); break;
default: size = sizeof(*gdt); break;
}
memcpy(&gdt[entry], desc, size);
}
static inline void pack_descriptor(struct desc_struct *desc, unsigned long base,
unsigned long limit, unsigned char type,
unsigned char flags)
{
desc->a = ((base & 0xffff) << 16) | (limit & 0xffff);
desc->b = (base & 0xff000000) | ((base & 0xff0000) >> 16) |
(limit & 0x000f0000) | ((type & 0xff) << 8) |
((flags & 0xf) << 20);
desc->p = 1;
}
static inline void set_tssldt_descriptor(void *d, unsigned long addr, unsigned type, unsigned size)
{
#ifdef CONFIG_X86_64
struct ldttss_desc64 *desc = d;
memset(desc, 0, sizeof(*desc));
desc->limit0 = size & 0xFFFF;
desc->base0 = PTR_LOW(addr);
desc->base1 = PTR_MIDDLE(addr) & 0xFF;
desc->type = type;
desc->p = 1;
desc->limit1 = (size >> 16) & 0xF;
desc->base2 = (PTR_MIDDLE(addr) >> 8) & 0xFF;
desc->base3 = PTR_HIGH(addr);
#else
pack_descriptor((struct desc_struct *)d, addr, size, 0x80 | type, 0);
#endif
}
static inline void __set_tss_desc(unsigned cpu, unsigned int entry, void *addr)
{
x86: Remap GDT tables in the fixmap section Each processor holds a GDT in its per-cpu structure. The sgdt instruction gives the base address of the current GDT. This address can be used to bypass KASLR memory randomization. With another bug, an attacker could target other per-cpu structures or deduce the base of the main memory section (PAGE_OFFSET). This patch relocates the GDT table for each processor inside the fixmap section. The space is reserved based on number of supported processors. For consistency, the remapping is done by default on 32 and 64-bit. Each processor switches to its remapped GDT at the end of initialization. For hibernation, the main processor returns with the original GDT and switches back to the remapping at completion. This patch was tested on both architectures. Hibernation and KVM were both tested specially for their usage of the GDT. Thanks to Boris Ostrovsky <boris.ostrovsky@oracle.com> for testing and recommending changes for Xen support. Signed-off-by: Thomas Garnier <thgarnie@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Luis R . Rodriguez <mcgrof@kernel.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: kasan-dev@googlegroups.com Cc: kernel-hardening@lists.openwall.com Cc: kvm@vger.kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-pm@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: zijun_hu <zijun_hu@htc.com> Link: http://lkml.kernel.org/r/20170314170508.100882-2-thgarnie@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-14 11:05:07 -06:00
struct desc_struct *d = get_cpu_gdt_rw(cpu);
tss_desc tss;
set_tssldt_descriptor(&tss, (unsigned long)addr, DESC_TSS,
__KERNEL_TSS_LIMIT);
write_gdt_entry(d, entry, &tss, DESC_TSS);
}
#define set_tss_desc(cpu, addr) __set_tss_desc(cpu, GDT_ENTRY_TSS, addr)
static inline void native_set_ldt(const void *addr, unsigned int entries)
{
if (likely(entries == 0))
asm volatile("lldt %w0"::"q" (0));
else {
unsigned cpu = smp_processor_id();
ldt_desc ldt;
set_tssldt_descriptor(&ldt, (unsigned long)addr, DESC_LDT,
entries * LDT_ENTRY_SIZE - 1);
x86: Remap GDT tables in the fixmap section Each processor holds a GDT in its per-cpu structure. The sgdt instruction gives the base address of the current GDT. This address can be used to bypass KASLR memory randomization. With another bug, an attacker could target other per-cpu structures or deduce the base of the main memory section (PAGE_OFFSET). This patch relocates the GDT table for each processor inside the fixmap section. The space is reserved based on number of supported processors. For consistency, the remapping is done by default on 32 and 64-bit. Each processor switches to its remapped GDT at the end of initialization. For hibernation, the main processor returns with the original GDT and switches back to the remapping at completion. This patch was tested on both architectures. Hibernation and KVM were both tested specially for their usage of the GDT. Thanks to Boris Ostrovsky <boris.ostrovsky@oracle.com> for testing and recommending changes for Xen support. Signed-off-by: Thomas Garnier <thgarnie@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Luis R . Rodriguez <mcgrof@kernel.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: kasan-dev@googlegroups.com Cc: kernel-hardening@lists.openwall.com Cc: kvm@vger.kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-pm@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: zijun_hu <zijun_hu@htc.com> Link: http://lkml.kernel.org/r/20170314170508.100882-2-thgarnie@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-14 11:05:07 -06:00
write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_LDT,
&ldt, DESC_LDT);
asm volatile("lldt %w0"::"q" (GDT_ENTRY_LDT*8));
}
}
x86: Make the GDT remapping read-only on 64-bit This patch makes the GDT remapped pages read-only, to prevent accidental (or intentional) corruption of this key data structure. This change is done only on 64-bit, because 32-bit needs it to be writable for TSS switches. The native_load_tr_desc function was adapted to correctly handle a read-only GDT. The LTR instruction always writes to the GDT TSS entry. This generates a page fault if the GDT is read-only. This change checks if the current GDT is a remap and swap GDTs as needed. This function was tested by booting multiple machines and checking hibernation works properly. KVM SVM and VMX were adapted to use the writeable GDT. On VMX, the per-cpu variable was removed for functions to fetch the original GDT. Instead of reloading the previous GDT, VMX will reload the fixmap GDT as expected. For testing, VMs were started and restored on multiple configurations. Signed-off-by: Thomas Garnier <thgarnie@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Luis R . Rodriguez <mcgrof@kernel.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: kasan-dev@googlegroups.com Cc: kernel-hardening@lists.openwall.com Cc: kvm@vger.kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-pm@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: zijun_hu <zijun_hu@htc.com> Link: http://lkml.kernel.org/r/20170314170508.100882-3-thgarnie@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-14 11:05:08 -06:00
static inline void native_load_gdt(const struct desc_ptr *dtr)
{
asm volatile("lgdt %0"::"m" (*dtr));
}
static inline void native_load_idt(const struct desc_ptr *dtr)
{
asm volatile("lidt %0"::"m" (*dtr));
}
static inline void native_store_gdt(struct desc_ptr *dtr)
{
asm volatile("sgdt %0":"=m" (*dtr));
}
static inline void native_store_idt(struct desc_ptr *dtr)
{
asm volatile("sidt %0":"=m" (*dtr));
}
/*
* The LTR instruction marks the TSS GDT entry as busy. On 64-bit, the GDT is
* a read-only remapping. To prevent a page fault, the GDT is switched to the
* original writeable version when needed.
*/
#ifdef CONFIG_X86_64
static inline void native_load_tr_desc(void)
{
x86: Make the GDT remapping read-only on 64-bit This patch makes the GDT remapped pages read-only, to prevent accidental (or intentional) corruption of this key data structure. This change is done only on 64-bit, because 32-bit needs it to be writable for TSS switches. The native_load_tr_desc function was adapted to correctly handle a read-only GDT. The LTR instruction always writes to the GDT TSS entry. This generates a page fault if the GDT is read-only. This change checks if the current GDT is a remap and swap GDTs as needed. This function was tested by booting multiple machines and checking hibernation works properly. KVM SVM and VMX were adapted to use the writeable GDT. On VMX, the per-cpu variable was removed for functions to fetch the original GDT. Instead of reloading the previous GDT, VMX will reload the fixmap GDT as expected. For testing, VMs were started and restored on multiple configurations. Signed-off-by: Thomas Garnier <thgarnie@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Luis R . Rodriguez <mcgrof@kernel.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: kasan-dev@googlegroups.com Cc: kernel-hardening@lists.openwall.com Cc: kvm@vger.kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-pm@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: zijun_hu <zijun_hu@htc.com> Link: http://lkml.kernel.org/r/20170314170508.100882-3-thgarnie@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-14 11:05:08 -06:00
struct desc_ptr gdt;
int cpu = raw_smp_processor_id();
bool restore = 0;
struct desc_struct *fixmap_gdt;
native_store_gdt(&gdt);
fixmap_gdt = get_cpu_gdt_ro(cpu);
/*
* If the current GDT is the read-only fixmap, swap to the original
* writeable version. Swap back at the end.
*/
if (gdt.address == (unsigned long)fixmap_gdt) {
load_direct_gdt(cpu);
restore = 1;
}
asm volatile("ltr %w0"::"q" (GDT_ENTRY_TSS*8));
x86: Make the GDT remapping read-only on 64-bit This patch makes the GDT remapped pages read-only, to prevent accidental (or intentional) corruption of this key data structure. This change is done only on 64-bit, because 32-bit needs it to be writable for TSS switches. The native_load_tr_desc function was adapted to correctly handle a read-only GDT. The LTR instruction always writes to the GDT TSS entry. This generates a page fault if the GDT is read-only. This change checks if the current GDT is a remap and swap GDTs as needed. This function was tested by booting multiple machines and checking hibernation works properly. KVM SVM and VMX were adapted to use the writeable GDT. On VMX, the per-cpu variable was removed for functions to fetch the original GDT. Instead of reloading the previous GDT, VMX will reload the fixmap GDT as expected. For testing, VMs were started and restored on multiple configurations. Signed-off-by: Thomas Garnier <thgarnie@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Luis R . Rodriguez <mcgrof@kernel.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: kasan-dev@googlegroups.com Cc: kernel-hardening@lists.openwall.com Cc: kvm@vger.kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-pm@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: zijun_hu <zijun_hu@htc.com> Link: http://lkml.kernel.org/r/20170314170508.100882-3-thgarnie@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-14 11:05:08 -06:00
if (restore)
load_fixmap_gdt(cpu);
}
#else
static inline void native_load_tr_desc(void)
{
asm volatile("ltr %w0"::"q" (GDT_ENTRY_TSS*8));
}
#endif
static inline unsigned long native_store_tr(void)
{
unsigned long tr;
asm volatile("str %0":"=r" (tr));
return tr;
}
static inline void native_load_tls(struct thread_struct *t, unsigned int cpu)
{
struct desc_struct *gdt = get_cpu_gdt_rw(cpu);
unsigned int i;
for (i = 0; i < GDT_ENTRY_TLS_ENTRIES; i++)
gdt[GDT_ENTRY_TLS_MIN + i] = t->tls_array[i];
}
DECLARE_PER_CPU(bool, __tss_limit_invalid);
static inline void force_reload_TR(void)
{
x86: Remap GDT tables in the fixmap section Each processor holds a GDT in its per-cpu structure. The sgdt instruction gives the base address of the current GDT. This address can be used to bypass KASLR memory randomization. With another bug, an attacker could target other per-cpu structures or deduce the base of the main memory section (PAGE_OFFSET). This patch relocates the GDT table for each processor inside the fixmap section. The space is reserved based on number of supported processors. For consistency, the remapping is done by default on 32 and 64-bit. Each processor switches to its remapped GDT at the end of initialization. For hibernation, the main processor returns with the original GDT and switches back to the remapping at completion. This patch was tested on both architectures. Hibernation and KVM were both tested specially for their usage of the GDT. Thanks to Boris Ostrovsky <boris.ostrovsky@oracle.com> for testing and recommending changes for Xen support. Signed-off-by: Thomas Garnier <thgarnie@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Luis R . Rodriguez <mcgrof@kernel.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: kasan-dev@googlegroups.com Cc: kernel-hardening@lists.openwall.com Cc: kvm@vger.kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-pm@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: zijun_hu <zijun_hu@htc.com> Link: http://lkml.kernel.org/r/20170314170508.100882-2-thgarnie@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-14 11:05:07 -06:00
struct desc_struct *d = get_current_gdt_rw();
tss_desc tss;
memcpy(&tss, &d[GDT_ENTRY_TSS], sizeof(tss_desc));
/*
* LTR requires an available TSS, and the TSS is currently
* busy. Make it be available so that LTR will work.
*/
tss.type = DESC_TSS;
write_gdt_entry(d, GDT_ENTRY_TSS, &tss, DESC_TSS);
load_TR_desc();
this_cpu_write(__tss_limit_invalid, false);
}
/*
* Call this if you need the TSS limit to be correct, which should be the case
* if and only if you have TIF_IO_BITMAP set or you're switching to a task
* with TIF_IO_BITMAP set.
*/
static inline void refresh_tss_limit(void)
{
DEBUG_LOCKS_WARN_ON(preemptible());
if (unlikely(this_cpu_read(__tss_limit_invalid)))
force_reload_TR();
}
/*
* If you do something evil that corrupts the cached TSS limit (I'm looking
* at you, VMX exits), call this function.
*
* The optimization here is that the TSS limit only matters for Linux if the
* IO bitmap is in use. If the TSS limit gets forced to its minimum value,
* everything works except that IO bitmap will be ignored and all CPL 3 IO
* instructions will #GP, which is exactly what we want for normal tasks.
*/
static inline void invalidate_tss_limit(void)
{
DEBUG_LOCKS_WARN_ON(preemptible());
if (unlikely(test_thread_flag(TIF_IO_BITMAP)))
force_reload_TR();
else
this_cpu_write(__tss_limit_invalid, true);
}
/* This intentionally ignores lm, since 32-bit apps don't have that field. */
#define LDT_empty(info) \
((info)->base_addr == 0 && \
(info)->limit == 0 && \
(info)->contents == 0 && \
(info)->read_exec_only == 1 && \
(info)->seg_32bit == 0 && \
(info)->limit_in_pages == 0 && \
(info)->seg_not_present == 1 && \
(info)->useable == 0)
x86, tls: Interpret an all-zero struct user_desc as "no segment" The Witcher 2 did something like this to allocate a TLS segment index: struct user_desc u_info; bzero(&u_info, sizeof(u_info)); u_info.entry_number = (uint32_t)-1; syscall(SYS_set_thread_area, &u_info); Strictly speaking, this code was never correct. It should have set read_exec_only and seg_not_present to 1 to indicate that it wanted to find a free slot without putting anything there, or it should have put something sensible in the TLS slot if it wanted to allocate a TLS entry for real. The actual effect of this code was to allocate a bogus segment that could be used to exploit espfix. The set_thread_area hardening patches changed the behavior, causing set_thread_area to return -EINVAL and crashing the game. This changes set_thread_area to interpret this as a request to find a free slot and to leave it empty, which isn't *quite* what the game expects but should be close enough to keep it working. In particular, using the code above to allocate two segments will allocate the same segment both times. According to FrostbittenKing on Github, this fixes The Witcher 2. If this somehow still causes problems, we could instead allocate a limit==0 32-bit data segment, but that seems rather ugly to me. Fixes: 41bdc78544b8 x86/tls: Validate TLS entries to protect espfix Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: stable@vger.kernel.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/0cb251abe1ff0958b8e468a9a9a905b80ae3a746.1421954363.git.luto@amacapital.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 12:27:59 -07:00
/* Lots of programs expect an all-zero user_desc to mean "no segment at all". */
static inline bool LDT_zero(const struct user_desc *info)
{
return (info->base_addr == 0 &&
info->limit == 0 &&
info->contents == 0 &&
info->read_exec_only == 0 &&
info->seg_32bit == 0 &&
info->limit_in_pages == 0 &&
info->seg_not_present == 0 &&
info->useable == 0);
}
static inline void clear_LDT(void)
{
set_ldt(NULL, 0);
}
static inline unsigned long get_desc_base(const struct desc_struct *desc)
{
return (unsigned)(desc->base0 | ((desc->base1) << 16) | ((desc->base2) << 24));
}
static inline void set_desc_base(struct desc_struct *desc, unsigned long base)
{
desc->base0 = base & 0xffff;
desc->base1 = (base >> 16) & 0xff;
desc->base2 = (base >> 24) & 0xff;
}
static inline unsigned long get_desc_limit(const struct desc_struct *desc)
{
return desc->limit0 | (desc->limit << 16);
}
static inline void set_desc_limit(struct desc_struct *desc, unsigned long limit)
{
desc->limit0 = limit & 0xffff;
desc->limit = (limit >> 16) & 0xf;
}
#ifdef CONFIG_X86_64
static inline void set_nmi_gate(int gate, void *addr)
{
gate_desc s;
pack_gate(&s, GATE_INTERRUPT, (unsigned long)addr, 0, 0, __KERNEL_CS);
write_idt_entry(debug_idt_table, gate, &s);
}
#endif
x86, trace: Add irq vector tracepoints [Purpose of this patch] As Vaibhav explained in the thread below, tracepoints for irq vectors are useful. http://www.spinics.net/lists/mm-commits/msg85707.html <snip> The current interrupt traces from irq_handler_entry and irq_handler_exit provide when an interrupt is handled. They provide good data about when the system has switched to kernel space and how it affects the currently running processes. There are some IRQ vectors which trigger the system into kernel space, which are not handled in generic IRQ handlers. Tracing such events gives us the information about IRQ interaction with other system events. The trace also tells where the system is spending its time. We want to know which cores are handling interrupts and how they are affecting other processes in the system. Also, the trace provides information about when the cores are idle and which interrupts are changing that state. <snip> On the other hand, my usecase is tracing just local timer event and getting a value of instruction pointer. I suggested to add an argument local timer event to get instruction pointer before. But there is another way to get it with external module like systemtap. So, I don't need to add any argument to irq vector tracepoints now. [Patch Description] Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. But there is an above use case to trace specific irq_vector rather than tracing all events. In this case, we are concerned about overhead due to unwanted events. So, add following tracepoints instead of introducing irq_vector_entry/exit. so that we can enable them independently. - local_timer_vector - reschedule_vector - call_function_vector - call_function_single_vector - irq_work_entry_vector - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty makes a zero when tracepoints are disabled. Detailed explanations are as follows. - Create trace irq handlers with entering_irq()/exiting_irq(). - Create a new IDT, trace_idt_table, at boot time by adding a logic to _set_gate(). It is just a copy of original idt table. - Register the new handlers for tracpoints to the new IDT by introducing macros to alloc_intr_gate() called at registering time of irq_vector handlers. - Add checking, whether irq vector tracing is on/off, into load_current_idt(). This has to be done below debug checking for these reasons. - Switching to debug IDT may be kicked while tracing is enabled. - On the other hands, switching to trace IDT is kicked only when debugging is disabled. In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being used for other purposes. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 09:46:53 -06:00
#ifdef CONFIG_TRACING
extern struct desc_ptr trace_idt_descr;
extern gate_desc trace_idt_table[];
static inline void write_trace_idt_entry(int entry, const gate_desc *gate)
{
write_idt_entry(trace_idt_table, entry, gate);
}
static inline void _trace_set_gate(int gate, unsigned type, void *addr,
unsigned dpl, unsigned ist, unsigned seg)
{
gate_desc s;
pack_gate(&s, type, (unsigned long)addr, dpl, ist, seg);
/*
* does not need to be atomic because it is only done once at
* setup time
*/
write_trace_idt_entry(gate, &s);
}
x86, trace: Add irq vector tracepoints [Purpose of this patch] As Vaibhav explained in the thread below, tracepoints for irq vectors are useful. http://www.spinics.net/lists/mm-commits/msg85707.html <snip> The current interrupt traces from irq_handler_entry and irq_handler_exit provide when an interrupt is handled. They provide good data about when the system has switched to kernel space and how it affects the currently running processes. There are some IRQ vectors which trigger the system into kernel space, which are not handled in generic IRQ handlers. Tracing such events gives us the information about IRQ interaction with other system events. The trace also tells where the system is spending its time. We want to know which cores are handling interrupts and how they are affecting other processes in the system. Also, the trace provides information about when the cores are idle and which interrupts are changing that state. <snip> On the other hand, my usecase is tracing just local timer event and getting a value of instruction pointer. I suggested to add an argument local timer event to get instruction pointer before. But there is another way to get it with external module like systemtap. So, I don't need to add any argument to irq vector tracepoints now. [Patch Description] Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. But there is an above use case to trace specific irq_vector rather than tracing all events. In this case, we are concerned about overhead due to unwanted events. So, add following tracepoints instead of introducing irq_vector_entry/exit. so that we can enable them independently. - local_timer_vector - reschedule_vector - call_function_vector - call_function_single_vector - irq_work_entry_vector - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty makes a zero when tracepoints are disabled. Detailed explanations are as follows. - Create trace irq handlers with entering_irq()/exiting_irq(). - Create a new IDT, trace_idt_table, at boot time by adding a logic to _set_gate(). It is just a copy of original idt table. - Register the new handlers for tracpoints to the new IDT by introducing macros to alloc_intr_gate() called at registering time of irq_vector handlers. - Add checking, whether irq vector tracing is on/off, into load_current_idt(). This has to be done below debug checking for these reasons. - Switching to debug IDT may be kicked while tracing is enabled. - On the other hands, switching to trace IDT is kicked only when debugging is disabled. In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being used for other purposes. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 09:46:53 -06:00
#else
static inline void write_trace_idt_entry(int entry, const gate_desc *gate)
{
}
#define _trace_set_gate(gate, type, addr, dpl, ist, seg)
x86, trace: Add irq vector tracepoints [Purpose of this patch] As Vaibhav explained in the thread below, tracepoints for irq vectors are useful. http://www.spinics.net/lists/mm-commits/msg85707.html <snip> The current interrupt traces from irq_handler_entry and irq_handler_exit provide when an interrupt is handled. They provide good data about when the system has switched to kernel space and how it affects the currently running processes. There are some IRQ vectors which trigger the system into kernel space, which are not handled in generic IRQ handlers. Tracing such events gives us the information about IRQ interaction with other system events. The trace also tells where the system is spending its time. We want to know which cores are handling interrupts and how they are affecting other processes in the system. Also, the trace provides information about when the cores are idle and which interrupts are changing that state. <snip> On the other hand, my usecase is tracing just local timer event and getting a value of instruction pointer. I suggested to add an argument local timer event to get instruction pointer before. But there is another way to get it with external module like systemtap. So, I don't need to add any argument to irq vector tracepoints now. [Patch Description] Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. But there is an above use case to trace specific irq_vector rather than tracing all events. In this case, we are concerned about overhead due to unwanted events. So, add following tracepoints instead of introducing irq_vector_entry/exit. so that we can enable them independently. - local_timer_vector - reschedule_vector - call_function_vector - call_function_single_vector - irq_work_entry_vector - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty makes a zero when tracepoints are disabled. Detailed explanations are as follows. - Create trace irq handlers with entering_irq()/exiting_irq(). - Create a new IDT, trace_idt_table, at boot time by adding a logic to _set_gate(). It is just a copy of original idt table. - Register the new handlers for tracpoints to the new IDT by introducing macros to alloc_intr_gate() called at registering time of irq_vector handlers. - Add checking, whether irq vector tracing is on/off, into load_current_idt(). This has to be done below debug checking for these reasons. - Switching to debug IDT may be kicked while tracing is enabled. - On the other hands, switching to trace IDT is kicked only when debugging is disabled. In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being used for other purposes. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 09:46:53 -06:00
#endif
static inline void _set_gate(int gate, unsigned type, void *addr,
unsigned dpl, unsigned ist, unsigned seg)
{
gate_desc s;
pack_gate(&s, type, (unsigned long)addr, dpl, ist, seg);
/*
* does not need to be atomic because it is only done once at
* setup time
*/
write_idt_entry(idt_table, gate, &s);
x86, trace: Add irq vector tracepoints [Purpose of this patch] As Vaibhav explained in the thread below, tracepoints for irq vectors are useful. http://www.spinics.net/lists/mm-commits/msg85707.html <snip> The current interrupt traces from irq_handler_entry and irq_handler_exit provide when an interrupt is handled. They provide good data about when the system has switched to kernel space and how it affects the currently running processes. There are some IRQ vectors which trigger the system into kernel space, which are not handled in generic IRQ handlers. Tracing such events gives us the information about IRQ interaction with other system events. The trace also tells where the system is spending its time. We want to know which cores are handling interrupts and how they are affecting other processes in the system. Also, the trace provides information about when the cores are idle and which interrupts are changing that state. <snip> On the other hand, my usecase is tracing just local timer event and getting a value of instruction pointer. I suggested to add an argument local timer event to get instruction pointer before. But there is another way to get it with external module like systemtap. So, I don't need to add any argument to irq vector tracepoints now. [Patch Description] Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. But there is an above use case to trace specific irq_vector rather than tracing all events. In this case, we are concerned about overhead due to unwanted events. So, add following tracepoints instead of introducing irq_vector_entry/exit. so that we can enable them independently. - local_timer_vector - reschedule_vector - call_function_vector - call_function_single_vector - irq_work_entry_vector - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty makes a zero when tracepoints are disabled. Detailed explanations are as follows. - Create trace irq handlers with entering_irq()/exiting_irq(). - Create a new IDT, trace_idt_table, at boot time by adding a logic to _set_gate(). It is just a copy of original idt table. - Register the new handlers for tracpoints to the new IDT by introducing macros to alloc_intr_gate() called at registering time of irq_vector handlers. - Add checking, whether irq vector tracing is on/off, into load_current_idt(). This has to be done below debug checking for these reasons. - Switching to debug IDT may be kicked while tracing is enabled. - On the other hands, switching to trace IDT is kicked only when debugging is disabled. In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being used for other purposes. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 09:46:53 -06:00
write_trace_idt_entry(gate, &s);
}
/*
* This needs to use 'idt_table' rather than 'idt', and
* thus use the _nonmapped_ version of the IDT, as the
* Pentium F0 0F bugfix can have resulted in the mapped
* IDT being write-protected.
*/
#define set_intr_gate_notrace(n, addr) \
do { \
BUG_ON((unsigned)n > 0xFF); \
_set_gate(n, GATE_INTERRUPT, (void *)addr, 0, 0, \
__KERNEL_CS); \
} while (0)
#define set_intr_gate(n, addr) \
do { \
set_intr_gate_notrace(n, addr); \
_trace_set_gate(n, GATE_INTERRUPT, (void *)trace_##addr,\
0, 0, __KERNEL_CS); \
} while (0)
extern int first_system_vector;
/* used_vectors is BITMAP for irq is not managed by percpu vector_irq */
extern unsigned long used_vectors[];
static inline void alloc_system_vector(int vector)
{
if (!test_bit(vector, used_vectors)) {
set_bit(vector, used_vectors);
if (first_system_vector > vector)
first_system_vector = vector;
} else {
BUG();
}
}
x86, trace: Add irq vector tracepoints [Purpose of this patch] As Vaibhav explained in the thread below, tracepoints for irq vectors are useful. http://www.spinics.net/lists/mm-commits/msg85707.html <snip> The current interrupt traces from irq_handler_entry and irq_handler_exit provide when an interrupt is handled. They provide good data about when the system has switched to kernel space and how it affects the currently running processes. There are some IRQ vectors which trigger the system into kernel space, which are not handled in generic IRQ handlers. Tracing such events gives us the information about IRQ interaction with other system events. The trace also tells where the system is spending its time. We want to know which cores are handling interrupts and how they are affecting other processes in the system. Also, the trace provides information about when the cores are idle and which interrupts are changing that state. <snip> On the other hand, my usecase is tracing just local timer event and getting a value of instruction pointer. I suggested to add an argument local timer event to get instruction pointer before. But there is another way to get it with external module like systemtap. So, I don't need to add any argument to irq vector tracepoints now. [Patch Description] Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. But there is an above use case to trace specific irq_vector rather than tracing all events. In this case, we are concerned about overhead due to unwanted events. So, add following tracepoints instead of introducing irq_vector_entry/exit. so that we can enable them independently. - local_timer_vector - reschedule_vector - call_function_vector - call_function_single_vector - irq_work_entry_vector - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty makes a zero when tracepoints are disabled. Detailed explanations are as follows. - Create trace irq handlers with entering_irq()/exiting_irq(). - Create a new IDT, trace_idt_table, at boot time by adding a logic to _set_gate(). It is just a copy of original idt table. - Register the new handlers for tracpoints to the new IDT by introducing macros to alloc_intr_gate() called at registering time of irq_vector handlers. - Add checking, whether irq vector tracing is on/off, into load_current_idt(). This has to be done below debug checking for these reasons. - Switching to debug IDT may be kicked while tracing is enabled. - On the other hands, switching to trace IDT is kicked only when debugging is disabled. In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being used for other purposes. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 09:46:53 -06:00
#define alloc_intr_gate(n, addr) \
do { \
alloc_system_vector(n); \
set_intr_gate(n, addr); \
x86, trace: Add irq vector tracepoints [Purpose of this patch] As Vaibhav explained in the thread below, tracepoints for irq vectors are useful. http://www.spinics.net/lists/mm-commits/msg85707.html <snip> The current interrupt traces from irq_handler_entry and irq_handler_exit provide when an interrupt is handled. They provide good data about when the system has switched to kernel space and how it affects the currently running processes. There are some IRQ vectors which trigger the system into kernel space, which are not handled in generic IRQ handlers. Tracing such events gives us the information about IRQ interaction with other system events. The trace also tells where the system is spending its time. We want to know which cores are handling interrupts and how they are affecting other processes in the system. Also, the trace provides information about when the cores are idle and which interrupts are changing that state. <snip> On the other hand, my usecase is tracing just local timer event and getting a value of instruction pointer. I suggested to add an argument local timer event to get instruction pointer before. But there is another way to get it with external module like systemtap. So, I don't need to add any argument to irq vector tracepoints now. [Patch Description] Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. But there is an above use case to trace specific irq_vector rather than tracing all events. In this case, we are concerned about overhead due to unwanted events. So, add following tracepoints instead of introducing irq_vector_entry/exit. so that we can enable them independently. - local_timer_vector - reschedule_vector - call_function_vector - call_function_single_vector - irq_work_entry_vector - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty makes a zero when tracepoints are disabled. Detailed explanations are as follows. - Create trace irq handlers with entering_irq()/exiting_irq(). - Create a new IDT, trace_idt_table, at boot time by adding a logic to _set_gate(). It is just a copy of original idt table. - Register the new handlers for tracpoints to the new IDT by introducing macros to alloc_intr_gate() called at registering time of irq_vector handlers. - Add checking, whether irq vector tracing is on/off, into load_current_idt(). This has to be done below debug checking for these reasons. - Switching to debug IDT may be kicked while tracing is enabled. - On the other hands, switching to trace IDT is kicked only when debugging is disabled. In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being used for other purposes. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 09:46:53 -06:00
} while (0)
/*
* This routine sets up an interrupt gate at directory privilege level 3.
*/
static inline void set_system_intr_gate(unsigned int n, void *addr)
{
BUG_ON((unsigned)n > 0xFF);
_set_gate(n, GATE_INTERRUPT, addr, 0x3, 0, __KERNEL_CS);
}
static inline void set_system_trap_gate(unsigned int n, void *addr)
{
BUG_ON((unsigned)n > 0xFF);
_set_gate(n, GATE_TRAP, addr, 0x3, 0, __KERNEL_CS);
}
static inline void set_trap_gate(unsigned int n, void *addr)
{
BUG_ON((unsigned)n > 0xFF);
_set_gate(n, GATE_TRAP, addr, 0, 0, __KERNEL_CS);
}
static inline void set_task_gate(unsigned int n, unsigned int gdt_entry)
{
BUG_ON((unsigned)n > 0xFF);
_set_gate(n, GATE_TASK, (void *)0, 0, 0, (gdt_entry<<3));
}
static inline void set_intr_gate_ist(int n, void *addr, unsigned ist)
{
BUG_ON((unsigned)n > 0xFF);
_set_gate(n, GATE_INTERRUPT, addr, 0, ist, __KERNEL_CS);
}
static inline void set_system_intr_gate_ist(int n, void *addr, unsigned ist)
{
BUG_ON((unsigned)n > 0xFF);
_set_gate(n, GATE_INTERRUPT, addr, 0x3, ist, __KERNEL_CS);
}
#ifdef CONFIG_X86_64
DECLARE_PER_CPU(u32, debug_idt_ctr);
static inline bool is_debug_idt_enabled(void)
{
if (this_cpu_read(debug_idt_ctr))
return true;
return false;
}
static inline void load_debug_idt(void)
{
load_idt((const struct desc_ptr *)&debug_idt_descr);
}
#else
static inline bool is_debug_idt_enabled(void)
{
return false;
}
static inline void load_debug_idt(void)
{
}
#endif
x86, trace: Add irq vector tracepoints [Purpose of this patch] As Vaibhav explained in the thread below, tracepoints for irq vectors are useful. http://www.spinics.net/lists/mm-commits/msg85707.html <snip> The current interrupt traces from irq_handler_entry and irq_handler_exit provide when an interrupt is handled. They provide good data about when the system has switched to kernel space and how it affects the currently running processes. There are some IRQ vectors which trigger the system into kernel space, which are not handled in generic IRQ handlers. Tracing such events gives us the information about IRQ interaction with other system events. The trace also tells where the system is spending its time. We want to know which cores are handling interrupts and how they are affecting other processes in the system. Also, the trace provides information about when the cores are idle and which interrupts are changing that state. <snip> On the other hand, my usecase is tracing just local timer event and getting a value of instruction pointer. I suggested to add an argument local timer event to get instruction pointer before. But there is another way to get it with external module like systemtap. So, I don't need to add any argument to irq vector tracepoints now. [Patch Description] Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. But there is an above use case to trace specific irq_vector rather than tracing all events. In this case, we are concerned about overhead due to unwanted events. So, add following tracepoints instead of introducing irq_vector_entry/exit. so that we can enable them independently. - local_timer_vector - reschedule_vector - call_function_vector - call_function_single_vector - irq_work_entry_vector - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty makes a zero when tracepoints are disabled. Detailed explanations are as follows. - Create trace irq handlers with entering_irq()/exiting_irq(). - Create a new IDT, trace_idt_table, at boot time by adding a logic to _set_gate(). It is just a copy of original idt table. - Register the new handlers for tracpoints to the new IDT by introducing macros to alloc_intr_gate() called at registering time of irq_vector handlers. - Add checking, whether irq vector tracing is on/off, into load_current_idt(). This has to be done below debug checking for these reasons. - Switching to debug IDT may be kicked while tracing is enabled. - On the other hands, switching to trace IDT is kicked only when debugging is disabled. In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being used for other purposes. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 09:46:53 -06:00
#ifdef CONFIG_TRACING
extern atomic_t trace_idt_ctr;
static inline bool is_trace_idt_enabled(void)
{
if (atomic_read(&trace_idt_ctr))
return true;
return false;
}
static inline void load_trace_idt(void)
{
load_idt((const struct desc_ptr *)&trace_idt_descr);
}
#else
static inline bool is_trace_idt_enabled(void)
{
return false;
}
static inline void load_trace_idt(void)
{
}
#endif
/*
trace,x86: Do not call local_irq_save() in load_current_idt() As load_current_idt() is now what is used to update the IDT for the switches needed for NMI, lockdep debug, and for tracing, it must not call local_irq_save(). This is because one of the users of this is lockdep, which does tracing of local_irq_save() and when the debug trap is hit, we need to update the IDT before tracing interrupts being disabled. As load_current_idt() is used to do this, calling local_irq_save() which lockdep traces, defeats the point of calling load_current_idt(). As interrupts are already disabled when used by lockdep and NMI, the only other user is tracing that can disable interrupts itself. Simply have the tracing update disable interrupts before calling load_current_idt() instead of breaking the other users. Here's the dump that happened: ------------[ cut here ]------------ WARNING: at /work/autotest/nobackup/linux-test.git/kernel/fork.c:1196 copy_process+0x2c3/0x1398() DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled) Modules linked in: CPU: 1 PID: 4570 Comm: gdm-simple-gree Not tainted 3.10.0-rc3-test+ #5 Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006 ffffffff81d2a7a5 ffff88006ed13d50 ffffffff8192822b ffff88006ed13d90 ffffffff81035f25 ffff8800721c6000 ffff88006ed13da0 0000000001200011 0000000000000000 ffff88006ed5e000 ffff8800721c6000 ffff88006ed13df0 Call Trace: [<ffffffff8192822b>] dump_stack+0x19/0x1b [<ffffffff81035f25>] warn_slowpath_common+0x67/0x80 [<ffffffff81035fe1>] warn_slowpath_fmt+0x46/0x48 [<ffffffff812bfc5d>] ? __raw_spin_lock_init+0x31/0x52 [<ffffffff810341f7>] copy_process+0x2c3/0x1398 [<ffffffff8103539d>] do_fork+0xa8/0x260 [<ffffffff810ca7b1>] ? trace_preempt_on+0x2a/0x2f [<ffffffff812afb3e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff81937fe7>] ? sysret_check+0x1b/0x56 [<ffffffff81937fe7>] ? sysret_check+0x1b/0x56 [<ffffffff810355cf>] SyS_clone+0x16/0x18 [<ffffffff81938369>] stub_clone+0x69/0x90 [<ffffffff81937fc2>] ? system_call_fastpath+0x16/0x1b ---[ end trace 8b157a9d20ca1aa2 ]--- in fork.c: #ifdef CONFIG_PROVE_LOCKING DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled); <-- bug here DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled); #endif Cc: Seiji Aguchi <seiji.aguchi@hds.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-22 11:16:19 -06:00
* The load_current_idt() must be called with interrupts disabled
* to avoid races. That way the IDT will always be set back to the expected
trace,x86: Do not call local_irq_save() in load_current_idt() As load_current_idt() is now what is used to update the IDT for the switches needed for NMI, lockdep debug, and for tracing, it must not call local_irq_save(). This is because one of the users of this is lockdep, which does tracing of local_irq_save() and when the debug trap is hit, we need to update the IDT before tracing interrupts being disabled. As load_current_idt() is used to do this, calling local_irq_save() which lockdep traces, defeats the point of calling load_current_idt(). As interrupts are already disabled when used by lockdep and NMI, the only other user is tracing that can disable interrupts itself. Simply have the tracing update disable interrupts before calling load_current_idt() instead of breaking the other users. Here's the dump that happened: ------------[ cut here ]------------ WARNING: at /work/autotest/nobackup/linux-test.git/kernel/fork.c:1196 copy_process+0x2c3/0x1398() DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled) Modules linked in: CPU: 1 PID: 4570 Comm: gdm-simple-gree Not tainted 3.10.0-rc3-test+ #5 Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006 ffffffff81d2a7a5 ffff88006ed13d50 ffffffff8192822b ffff88006ed13d90 ffffffff81035f25 ffff8800721c6000 ffff88006ed13da0 0000000001200011 0000000000000000 ffff88006ed5e000 ffff8800721c6000 ffff88006ed13df0 Call Trace: [<ffffffff8192822b>] dump_stack+0x19/0x1b [<ffffffff81035f25>] warn_slowpath_common+0x67/0x80 [<ffffffff81035fe1>] warn_slowpath_fmt+0x46/0x48 [<ffffffff812bfc5d>] ? __raw_spin_lock_init+0x31/0x52 [<ffffffff810341f7>] copy_process+0x2c3/0x1398 [<ffffffff8103539d>] do_fork+0xa8/0x260 [<ffffffff810ca7b1>] ? trace_preempt_on+0x2a/0x2f [<ffffffff812afb3e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff81937fe7>] ? sysret_check+0x1b/0x56 [<ffffffff81937fe7>] ? sysret_check+0x1b/0x56 [<ffffffff810355cf>] SyS_clone+0x16/0x18 [<ffffffff81938369>] stub_clone+0x69/0x90 [<ffffffff81937fc2>] ? system_call_fastpath+0x16/0x1b ---[ end trace 8b157a9d20ca1aa2 ]--- in fork.c: #ifdef CONFIG_PROVE_LOCKING DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled); <-- bug here DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled); #endif Cc: Seiji Aguchi <seiji.aguchi@hds.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-22 11:16:19 -06:00
* descriptor. It's also called when a CPU is being initialized, and
* that doesn't need to disable interrupts, as nothing should be
* bothering the CPU then.
*/
static inline void load_current_idt(void)
{
if (is_debug_idt_enabled())
load_debug_idt();
x86, trace: Add irq vector tracepoints [Purpose of this patch] As Vaibhav explained in the thread below, tracepoints for irq vectors are useful. http://www.spinics.net/lists/mm-commits/msg85707.html <snip> The current interrupt traces from irq_handler_entry and irq_handler_exit provide when an interrupt is handled. They provide good data about when the system has switched to kernel space and how it affects the currently running processes. There are some IRQ vectors which trigger the system into kernel space, which are not handled in generic IRQ handlers. Tracing such events gives us the information about IRQ interaction with other system events. The trace also tells where the system is spending its time. We want to know which cores are handling interrupts and how they are affecting other processes in the system. Also, the trace provides information about when the cores are idle and which interrupts are changing that state. <snip> On the other hand, my usecase is tracing just local timer event and getting a value of instruction pointer. I suggested to add an argument local timer event to get instruction pointer before. But there is another way to get it with external module like systemtap. So, I don't need to add any argument to irq vector tracepoints now. [Patch Description] Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. But there is an above use case to trace specific irq_vector rather than tracing all events. In this case, we are concerned about overhead due to unwanted events. So, add following tracepoints instead of introducing irq_vector_entry/exit. so that we can enable them independently. - local_timer_vector - reschedule_vector - call_function_vector - call_function_single_vector - irq_work_entry_vector - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty makes a zero when tracepoints are disabled. Detailed explanations are as follows. - Create trace irq handlers with entering_irq()/exiting_irq(). - Create a new IDT, trace_idt_table, at boot time by adding a logic to _set_gate(). It is just a copy of original idt table. - Register the new handlers for tracpoints to the new IDT by introducing macros to alloc_intr_gate() called at registering time of irq_vector handlers. - Add checking, whether irq vector tracing is on/off, into load_current_idt(). This has to be done below debug checking for these reasons. - Switching to debug IDT may be kicked while tracing is enabled. - On the other hands, switching to trace IDT is kicked only when debugging is disabled. In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being used for other purposes. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 09:46:53 -06:00
else if (is_trace_idt_enabled())
load_trace_idt();
else
load_idt((const struct desc_ptr *)&idt_descr);
}
#endif /* _ASM_X86_DESC_H */