1
0
Fork 0
Commit Graph

89 Commits (5f54c8b2d4fad95d1f8ecbe023ebe6038e6d3760)

Author SHA1 Message Date
Thomas Gleixner 84393817db x86/mtrr: Prevent CPU hotplug lock recursion
Larry reported a CPU hotplug lock recursion in the MTRR code.

============================================
WARNING: possible recursive locking detected

systemd-udevd/153 is trying to acquire lock:
 (cpu_hotplug_lock.rw_sem){.+.+.+}, at: [<c030fc26>] stop_machine+0x16/0x30
 
 but task is already holding lock:
  (cpu_hotplug_lock.rw_sem){.+.+.+}, at: [<c0234353>] mtrr_add_page+0x83/0x470

....

 cpus_read_lock+0x48/0x90
 stop_machine+0x16/0x30
 mtrr_add_page+0x18b/0x470
 mtrr_add+0x3e/0x70

mtrr_add_page() holds the hotplug rwsem already and calls stop_machine()
which acquires it again.

Call stop_machine_cpuslocked() instead.

Reported-and-tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708140920250.1865@nanos
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@suse.de>
2017-08-15 13:03:47 +02:00
Sebastian Andrzej Siewior 547efeadd4 x86/mtrr: Remove get_online_cpus() from mtrr_save_state()
mtrr_save_state() is invoked from native_cpu_up() which is in the context
of a CPU hotplug operation and therefor calling get_online_cpus() is
pointless.

While this works in the current get_online_cpus() implementation it
prevents from converting the hotplug locking to percpu rwsems.

Remove it.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170524081547.651378834@linutronix.de
2017-05-26 10:10:38 +02:00
Ingo Molnar 66441bd3cf x86/boot/e820: Move asm/e820.h to asm/e820/api.h
In line with asm/e820/types.h, move the e820 API declarations to
asm/e820/api.h and update all usage sites.

This is just a mechanical, obviously correct move & replace patch,
there will be subsequent changes to clean up the code and to make
better use of the new header organization.

Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28 09:31:13 +01:00
Kees Cook 404f6aac9b x86: Apply more __ro_after_init and const
Guided by grsecurity's analogous __read_only markings in arch/x86,
this applies several uses of __ro_after_init to structures that are
only updated during __init, and const for some structures that are
never updated.  Additionally extends __init markings to some functions
that are only used during __init, and cleans up some missing C99 style
static initializers.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Link: http://lkml.kernel.org/r/20160808232906.GA29731@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 14:55:05 +02:00
Paul Gortmaker eb008eb6f8 x86: Audit and remove any remaining unnecessary uses of module.h
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends.  That changed
when we forked out support for the latter into the export.h file.

This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig.  In the case of
some of these which are modular, we can extend that to also include
files that are building basic support functionality but not related
to loading or registering the final module; such files also have
no need whatsoever for module.h

The advantage in removing such instances is that module.h itself
sources about 15 other headers; adding significantly to what we feed
cpp, and it can obscure what headers we are effectively using.

Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each instance for the
presence of either and replace as needed.

In the case of crypto/glue_helper.c we delete a redundant instance
of MODULE_LICENSE in order to delete module.h -- the license info
is already present at the top of the file.

The uncore change warrants a mention too; it is uncore.c that uses
module.h and not uncore.h; hence the relocation done there.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160714001901.31603-9-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 15:07:00 +02:00
Toshi Kani ad025a73f0 x86/mtrr: Fix PAT init handling when MTRR is disabled
get_mtrr_state() calls pat_init() on BSP even if MTRR is disabled.
This results in calling pat_init() on BSP only since APs do not call
pat_init() when MTRR is disabled.  This inconsistency between BSP
and APs leads to undefined behavior.

Make BSP's calling condition to pat_init() consistent with AP's,
mtrr_ap_init() and mtrr_aps_init().

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-6-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-29 12:23:26 +02:00
Toshi Kani edfe63ec97 x86/mtrr: Fix Xorg crashes in Qemu sessions
A Xorg failure on qemu32 was reported as a regression [1] caused by
commit 9cd25aac1f ("x86/mm/pat: Emulate PAT when it is disabled").

This patch fixes the Xorg crash.

Negative effects of this regression were the following two failures [2]
in Xorg on QEMU with QEMU CPU model "qemu32" (-cpu qemu32), which were
triggered by the fact that its virtual CPU does not support MTRRs.

 #1. copy_process() failed in the check in reserve_pfn_range()

    copy_process
     copy_mm
      dup_mm
       dup_mmap
        copy_page_range
         track_pfn_copy
          reserve_pfn_range

 A WC map request was tracked as WC in memtype, which set a PTE as
 UC (pgprot) per __cachemode2pte_tbl[].  This led to this error in
 reserve_pfn_range() called from track_pfn_copy(), which obtained
 a pgprot from a PTE.  It converts pgprot to page_cache_mode, which
 does not necessarily result in the original page_cache_mode since
 __cachemode2pte_tbl[] redirects multiple types to UC.

 #2. error path in copy_process() then hit WARN_ON_ONCE in
     untrack_pfn().

     x86/PAT: Xorg:509 map pfn expected mapping type uncached-
     minus for [mem 0xfd000000-0xfdffffff], got write-combining
      Call Trace:
     dump_stack
     warn_slowpath_common
     ? untrack_pfn
     ? untrack_pfn
     warn_slowpath_null
     untrack_pfn
     ? __kunmap_atomic
     unmap_single_vma
     ? pagevec_move_tail_fn
     unmap_vmas
     exit_mmap
     mmput
     copy_process.part.47
     _do_fork
     SyS_clone
     do_syscall_32_irqs_on
     entry_INT80_32

These negative effects are caused by two separate bugs, but they
can be addressed in separate patches.  Fixing the pat_init() issue
described below addresses the root cause, and avoids Xorg to hit
these cases.

When the CPU does not support MTRRs, MTRR does not call pat_init(),
which leaves PAT enabled without initializing PAT.  This pat_init()
issue is a long-standing issue, but manifested as issue #1 (and then
hit issue #2) with the above-mentioned commit because the memtype
now tracks cache attribute with 'page_cache_mode'.

This pat_init() issue existed before the commit, but we used pgprot
in memtype.  Hence, we did not have issue #1 before.  But WC request
resulted in WT in effect because WC pgrot is actually WT when PAT
is not initialized.  This is not how it was designed to work.  When
PAT is set to disable properly, WC is converted to UC.  The use of
WT can result in a system crash if the target range does not support
WT.  Fortunately, nobody ran into such issue before.

To fix this pat_init() issue, PAT code has been enhanced to provide
pat_disable() interface.  Call this interface when MTRRs are disabled.
By setting PAT to disable properly, PAT bypasses the memtype check,
and avoids issue #1.

  [1]: https://lkml.org/lkml/2016/3/3/828
  [2]: https://lkml.org/lkml/2016/3/4/775

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-5-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-29 12:23:26 +02:00
Linus Torvalds ba33ea811e Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "This is another big update. Main changes are:

   - lots of x86 system call (and other traps/exceptions) entry code
     enhancements.  In particular the complex parts of the 64-bit entry
     code have been migrated to C code as well, and a number of dusty
     corners have been refreshed.  (Andy Lutomirski)

   - vDSO special mapping robustification and general cleanups (Andy
     Lutomirski)

   - cpufeature refactoring, cleanups and speedups (Borislav Petkov)

   - lots of other changes ..."

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  x86/cpufeature: Enable new AVX-512 features
  x86/entry/traps: Show unhandled signal for i386 in do_trap()
  x86/entry: Call enter_from_user_mode() with IRQs off
  x86/entry/32: Change INT80 to be an interrupt gate
  x86/entry: Improve system call entry comments
  x86/entry: Remove TIF_SINGLESTEP entry work
  x86/entry/32: Add and check a stack canary for the SYSENTER stack
  x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
  x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed
  x86/entry: Vastly simplify SYSENTER TF (single-step) handling
  x86/entry/traps: Clear DR6 early in do_debug() and improve the comment
  x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions
  x86/entry/32: Restore FLAGS on SYSEXIT
  x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
  x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
  selftests/x86: In syscall_nt, test NT|TF as well
  x86/asm-offsets: Remove PARAVIRT_enabled
  x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled
  uprobes: __create_xol_area() must nullify xol_mapping.fault
  x86/cpufeature: Create a new synthetic cpu capability for machine check recovery
  ...
2016-03-15 09:32:27 -07:00
Chen Yucong 1b74dde7c4 x86/cpu: Convert printk(KERN_<LEVEL> ...) to pr_<level>(...)
- Use the more current logging style pr_<level>(...) instead of the old
   printk(KERN_<LEVEL> ...).

 - Convert pr_warning() to pr_warn().

Signed-off-by: Chen Yucong <slaoub@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1454384702-21707-1-git-send-email-slaoub@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-03 10:30:03 +01:00
Borislav Petkov cd4d09ec6f x86/cpufeature: Carve out X86_FEATURE_*
Move them to a separate header and have the following
dependency:

  x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h

This makes it easier to use the header in asm code and not
include the whole cpufeature.h and add guards for asm.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1453842730-28463-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30 11:22:17 +01:00
Borislav Petkov 362f924b64 x86/cpufeature: Remove unused and seldomly used cpu_has_xx macros
Those are stupid and code should use static_cpu_has_safe() or
boot_cpu_has() instead. Kill the least used and unused ones.

The remaining ones need more careful inspection before a conversion can
happen. On the TODO.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1449481182-27541-4-git-send-email-bp@alien8.de
Cc: David Sterba <dsterba@suse.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 11:49:55 +01:00
Luis R. Rodriguez 2baa891e42 x86/mm/mtrr: Remove kernel internal MTRR interfaces: unexport mtrr_add() and mtrr_del()
The effort to replace mtrr_add() with architecture agnostic
arch_phys_wc_add() is complete, this will ensure write-combining
implementations (PAT on x86) is taken advantage instead of using
MTRR. With the effort done now, hide direct MTRR access for
drivers.

The legacy user-space /proc/mtrr ABI is not affected.

Update x86 documentation on MTRR to reflect the completion of
the phasing out of direct access to MTRR, also add a note on
platform firmware code use of MTRRs based on the obituary
discussion of MTRRs on Linux [0].

  [0] http://lkml.kernel.org/r/1438991330.3109.196.camel@hp.com

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: <syrjala@sci.fi>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Doug Ledford <dledford@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: airlied@linux.ie
Cc: benh@kernel.crashing.org
Cc: bhelgaas@google.com
Cc: dan.j.williams@intel.com
Cc: konrad.wilk@oracle.com
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: mst@redhat.com
Cc: netdev@vger.kernel.org
Cc: vinod.koul@intel.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/1440443613-13696-12-git-send-email-mcgrof@do-not-panic.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-28 10:09:28 +02:00
Luis R. Rodriguez cb32edf65b x86/mm/pat: Wrap pat_enabled into a function API
We use pat_enabled in x86-specific code to see if PAT is enabled
or not but we're granting full access to it even though readers
do not need to set it. If, for instance, we granted access to it
to modules later they then could override the variable
setting... no bueno.

This renames pat_enabled to a new static variable __pat_enabled.
Folks are redirected to use pat_enabled() now.

Code that sets this can only be internal to pat.c. Apart from
the early kernel parameter "nopat" to disable PAT, we also have
a few cases that disable it later and make use of a helper
pat_disable(). It is wrapped under an ifdef but since that code
cannot run unless PAT was enabled its not required to wrap it
with ifdefs, unwrap that. Likewise, since "nopat" doesn't really
change non-PAT systems just remove that ifdef as well.

Although we could add and use an early_param_off(), these
helpers don't use __read_mostly but we want to keep
__read_mostly for __pat_enabled as this is a hot path -- upon
boot, for instance, a simple guest may see ~4k accesses to
pat_enabled(). Since __read_mostly early boot params are not
that common we don't add a helper for them just yet.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1430425520-22275-3-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-13-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:41:01 +02:00
Luis R. Rodriguez f9626104a5 x86/mm/mtrr: Generalize runtime disabling of MTRRs
It is possible to enable CONFIG_MTRR and CONFIG_X86_PAT and end
up with a system with MTRR functionality disabled but PAT
functionality enabled. This can happen, for instance, when the
Xen hypervisor is used where MTRRs are not supported but PAT is.
This can happen on Linux as of commit

  47591df505 ("xen: Support Xen pv-domains using PAT")

by Juergen, introduced in v3.19.

Technically, we should assume the proper CPU bits would be set
to disable MTRRs but we can't always rely on this. At least on
the Xen Hypervisor, for instance, only X86_FEATURE_MTRR was
disabled as of Xen 4.4 through Xen commit 586ab6a [0], but not
X86_FEATURE_K6_MTRR, X86_FEATURE_CENTAUR_MCR, or
X86_FEATURE_CYRIX_ARR for instance.

Roger Pau Monné has clarified though that although this is
technically true we will never support PVH on these CPU types so
Xen has no need to disable these bits on those systems. As per
Roger, AMD K6, Centaur and VIA chips don't have the necessary
hardware extensions to allow running PVH guests [1].

As per Toshi it is also possible for the BIOS to disable MTRR
support, in such cases get_mtrr_state() would update the MTRR
state as per the BIOS, we need to propagate this information as
well.

x86 MTRR code relies on quite a bit of checks for mtrr_if being
set to check to see if MTRRs did get set up. Instead, lets
provide a generic getter for that. This also adds a few checks
where they were not before which could potentially safeguard
ourselves against incorrect usage of MTRR where this was not
desirable.

Where possible match error codes as if MTRRs were disabled on
arch/x86/include/asm/mtrr.h.

Lastly, since disabling MTRRs can happen at run time and we
could end up with PAT enabled, best record now in our logs when
MTRRs are disabled.

[0] ~/devel/xen (git::stable-4.5)$ git describe --contains 586ab6a 4.4.0-rc1~18
[1] http://lists.xenproject.org/archives/html/xen-devel/2015-03/msg03460.html

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: bhelgaas@google.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: konrad.wilk@oracle.com
Cc: venkatesh.pallipadi@intel.com
Cc: ville.syrjala@linux.intel.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/1426893517-2511-3-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-12-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:41:01 +02:00
Luis R. Rodriguez 7d010fdf29 x86/mm/mtrr: Avoid #ifdeffery with phys_wc_to_mtrr_index()
There is only one user but since we're going to bury MTRR next
out of access to drivers, expose this last piece of API to
drivers in a general fashion only needing io.h for access to
helpers.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Abhilash Kesavan <a.kesavan@samsung.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Cristian Stoica <cristian.stoica@freescale.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: dri-devel@lists.freedesktop.org
Link: http://lkml.kernel.org/r/1429722736-4473-1-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-11-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:41:00 +02:00
Luis R. Rodriguez 2f9e897353 x86/mm/mtrr, pat: Document Write Combining MTRR type effects on PAT / non-PAT pages
As part of the effort to phase out MTRR use document
write-combining MTRR effects on pages with different non-PAT
page attributes flags and different PAT entry values. Extend
arch_phys_wc_add() documentation to clarify power of two sizes /
boundary requirements as we phase out mtrr_add() use.

Lastly hint towards ioremap_uc() for corner cases on device
drivers working with devices with mixed regions where MTRR size
requirements would otherwise not enable write-combining
effective memory types.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-fbdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1430343851-967-3-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-10-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:40:59 +02:00
Dave Hansen 9298b815ef x86: Add more disabled features
The original motivation for these patches was for an Intel CPU
feature called MPX.  The patch to add a disabled feature for it
will go in with the other parts of the support.

But, in the meantime, there are a few other features than MPX
that we can make assumptions about at compile-time based on
compile options.  Add them to disabled-features.h and check them
with cpu_feature_enabled().

Note that this gets rid of the last things that needed an #ifdef
CONFIG_X86_64 in cpufeature.h.  Yay!

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140911211524.C0EC332A@viggo.jf.intel.com
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-09-11 14:30:17 -07:00
Linus Torvalds 2e17c5a97e Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "Okay this is the big one, I was stalled on the fbdev pull req as I
  stupidly let fbdev guys merge a patch I required to fix a warning with
  some patches I had, they ended up merging the patch from the wrong
  place, but the warning should be fixed.  In future I'll just take the
  patch myself!

  Outside drm:

  There are some snd changes for the HDMI audio interactions on haswell,
  they've been acked for inclusion via my tree.  This relies on the
  wound/wait tree from Ingo which is already merged.

  Major changes:

  AMD finally released the dynamic power management code for all their
  GPUs from r600->present day, this is great, off by default for now but
  also a huge amount of code, in fact it is most of this pull request.

  Since it landed there has been a lot of community testing and Alex has
  sent a lot of fixes for any bugs found so far.  I suspect radeon might
  now be the biggest kernel driver ever :-P p.s.  radeon.dpm=1 to enable
  dynamic powermanagement for anyone.

  New drivers:

  Renesas r-car display unit.

  Other highlights:

   - core: GEM CMA prime support, use new w/w mutexs for TTM
     reservations, cursor hotspot, doc updates
   - dvo chips: chrontel 7010B support
   - i915: Haswell (fbc, ips, vecs, watermarks, audio powerwell),
     Valleyview (enabled by default, rc6), lots of pll reworking, 30bpp
     support (this time for sure)
   - nouveau: async buffer object deletion, context/register init
     updates, kernel vp2 engine support, GF117 support, GK110 accel
     support (with external nvidia ucode), context cleanups.
   - exynos: memory leak fixes, Add S3C64XX SoC series support, device
     tree updates, common clock framework support,
   - qxl: cursor hotspot support, multi-monitor support, suspend/resume
     support
   - mgag200: hw cursor support, g200 mode limiting
   - shmobile: prime support
   - tegra: fixes mostly

  I've been banging on this quite a lot due to the size of it, and it
  seems to okay on everything I've tested it on."

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (811 commits)
  drm/radeon/dpm: implement vblank_too_short callback for si
  drm/radeon/dpm: implement vblank_too_short callback for cayman
  drm/radeon/dpm: implement vblank_too_short callback for btc
  drm/radeon/dpm: implement vblank_too_short callback for evergreen
  drm/radeon/dpm: implement vblank_too_short callback for 7xx
  drm/radeon/dpm: add checks against vblank time
  drm/radeon/dpm: add helper to calculate vblank time
  drm/radeon: remove stray line in old pm code
  drm/radeon/dpm: fix display_gap programming on rv7xx
  drm/nvc0/gr: fix gpc firmware regression
  drm/nouveau: fix minor thinko causing bo moves to not be async on kepler
  drm/radeon/dpm: implement force performance level for TN
  drm/radeon/dpm: implement force performance level for ON/LN
  drm/radeon/dpm: implement force performance level for SI
  drm/radeon/dpm: implement force performance level for cayman
  drm/radeon/dpm: implement force performance levels for 7xx/eg/btc
  drm/radeon/dpm: add infrastructure to force performance levels
  drm/radeon: fix surface setup on r1xx
  drm/radeon: add support for 3d perf states on older asics
  drm/radeon: set default clocks for SI when DPM is disabled
  ...
2013-07-09 16:04:31 -07:00
Yinghai Lu d5c78673b1 x86: Fix /proc/mtrr with base/size more than 44bits
On one sytem that mtrr range is more then 44bits, in dmesg we have
[    0.000000] MTRR default type: write-back
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-DFFFF write-through
[    0.000000]   E0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 [000080000000-0000FFFFFFFF] mask 3FFF80000000 uncachable
[    0.000000]   1 [380000000000-38FFFFFFFFFF] mask 3F0000000000 uncachable
[    0.000000]   2 [000099000000-000099FFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   3 [00009A000000-00009AFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   4 [381FFA000000-381FFBFFFFFF] mask 3FFFFE000000 write-through
[    0.000000]   5 [381FFC000000-381FFC0FFFFF] mask 3FFFFFF00000 write-through
[    0.000000]   6 [0000AD000000-0000ADFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   7 [0000BD000000-0000BDFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   8 disabled
[    0.000000]   9 disabled

but /proc/mtrr report wrong:
reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
reg01: base=0x80000000000 (8388608MB), size=1048576MB, count=1: uncachable
reg02: base=0x099000000 ( 2448MB), size=   16MB, count=1: write-through
reg03: base=0x09a000000 ( 2464MB), size=   16MB, count=1: write-through
reg04: base=0x81ffa000000 (8519584MB), size=   32MB, count=1: write-through
reg05: base=0x81ffc000000 (8519616MB), size=    1MB, count=1: write-through
reg06: base=0x0ad000000 ( 2768MB), size=   16MB, count=1: write-through
reg07: base=0x0bd000000 ( 3024MB), size=   16MB, count=1: write-through
reg08: base=0x09b000000 ( 2480MB), size=   16MB, count=1: write-combining

so bit 44 and bit 45 get cut off.

We have problems in arch/x86/kernel/cpu/mtrr/generic.c::generic_get_mtrr().
1. for base, we miss cast base_lo to 64bit before shifting.
Fix that by adding u64 casting.

2. for size, it only can handle 44 bits aka 32bits + page_shift
Fix that with 64bit mask instead of 32bit mask_lo, then range could be
more than 44bits.
At the same time, we need to update size_or_mask for old cpus that does
support cpuid 0x80000008 to get phys_addr. Need to set high 32bits
to all 1s, otherwise will not get correct size for them.

Also fix mtrr_add_page: it should check base and (base + size - 1)
instead of base and size, as base and size could be small but
base + size could bigger enough to be out of boundary. We can
use boot_cpu_data.x86_phys_bits directly to avoid size_or_mask.

So When are we going to have size more than 44bits? that is 16TiB.

after patch we have right ouput:
reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
reg01: base=0x380000000000 (58720256MB), size=1048576MB, count=1: uncachable
reg02: base=0x099000000 ( 2448MB), size=   16MB, count=1: write-through
reg03: base=0x09a000000 ( 2464MB), size=   16MB, count=1: write-through
reg04: base=0x381ffa000000 (58851232MB), size=   32MB, count=1: write-through
reg05: base=0x381ffc000000 (58851264MB), size=    1MB, count=1: write-through
reg06: base=0x0ad000000 ( 2768MB), size=   16MB, count=1: write-through
reg07: base=0x0bd000000 ( 3024MB), size=   16MB, count=1: write-through
reg08: base=0x09b000000 ( 2480MB), size=   16MB, count=1: write-combining

-v2: simply checking in mtrr_add_page according to hpa.

[ hpa: This probably wants to go into -stable only after having sat in
  mainline for a bit.  It is not a regression. ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1371162815-29931-1-git-send-email-yinghai@kernel.org
Cc: <stable@vger.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-25 13:08:10 -07:00
Andy Lutomirski d0d98eedee Add arch_phys_wc_{add, del} to manipulate WC MTRRs if needed
Several drivers currently use mtrr_add through various #ifdef guards
and/or drm wrappers.  The vast majority of them want to add WC MTRRs
on x86 systems and don't actually need the MTRR if PAT (i.e.
ioremap_wc, etc) are working.

arch_phys_wc_add and arch_phys_wc_del are new functions, available
on all architectures and configurations, that add WC MTRRs on x86 if
needed (and handle errors) and do nothing at all otherwise.  They're
also easier to use than mtrr_add and mtrr_del, so the call sites can
be simplified.

As an added benefit, this will avoid wasting MTRRs and possibly
warning pointlessly on PAT-supporting systems.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-05-31 13:02:52 +10:00
Linus Torvalds a2013a13e6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial branch from Jiri Kosina:
 "Usual stuff -- comment/printk typo fixes, documentation updates, dead
  code elimination."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  HOWTO: fix double words typo
  x86 mtrr: fix comment typo in mtrr_bp_init
  propagate name change to comments in kernel source
  doc: Update the name of profiling based on sysfs
  treewide: Fix typos in various drivers
  treewide: Fix typos in various Kconfig
  wireless: mwifiex: Fix typo in wireless/mwifiex driver
  messages: i2o: Fix typo in messages/i2o
  scripts/kernel-doc: check that non-void fcts describe their return value
  Kernel-doc: Convention: Use a "Return" section to describe return values
  radeon: Fix typo and copy/paste error in comments
  doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
  various: Fix spelling of "asynchronous" in comments.
  Fix misspellings of "whether" in comments.
  eisa: Fix spelling of "asynchronous".
  various: Fix spelling of "registered" in comments.
  doc: fix quite a few typos within Documentation
  target: iscsi: fix comment typos in target/iscsi drivers
  treewide: fix typo of "suport" in various comments and Kconfig
  treewide: fix typo of "suppport" in various comments
  ...
2012-12-13 12:00:02 -08:00
Olaf Hering 4319eb4fc7 x86 mtrr: fix comment typo in mtrr_bp_init
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-12-07 11:19:19 +01:00
Fenghua Yu 30242aa602 x86, hotplug: The first online processor saves the MTRR state
Ask the first online CPU to save mtrr instead of asking BSP. BSP could be
offline when mtrr_save_state() is called.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352835171-3958-12-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-14 15:28:10 -08:00
Tejun Heo cbbfa38fcb mtrr: fix UP breakage caused during switch to stop_machine
While removing custom rendezvous code and switching to stop_machine,
commit 192d885742 ("x86, mtrr: use stop_machine APIs for doing MTRR
rendezvous") completely dropped mtrr setting code on !CONFIG_SMP
breaking MTRR settting on UP.

Fix it by removing the incorrect CONFIG_SMP.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Anders Eriksson <aeriksson@fastmail.fm>
Tested-and-acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25 11:02:29 -07:00
Sergei Shtylyov 50c31e4a24 x86, mtrr: Use pci_dev->revision
This code uses PCI_CLASS_REVISION instead of PCI_REVISION_ID, so
it wasn't converted by commit 44c10138fd ("PCI: Change all
drivers to use pci_device->revision") before being moved to
arch/x86/...

Do it now at last -- and save one level of indentation...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/201107012242.08347.sshtylyov@ru.mvista.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-02 11:10:07 +02:00
Suresh Siddha 192d885742 x86, mtrr: use stop_machine APIs for doing MTRR rendezvous
MTRR rendezvous sequence is not implemened using stop_machine() before, as this
gets called both from the process context aswell as the cpu online paths
(where the cpu has not come online and the interrupts are disabled etc).

Now that we have a new stop_machine_from_inactive_cpu() API, use it for
rendezvous during mtrr init of a logical processor that is coming online.

For the rest (runtime MTRR modification, system boot, resume paths), use
stop_machine() to implement the rendezvous sequence. This will consolidate and
cleanup the code.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/20110623182057.076997177@sbsiddha-MOBL3.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-06-27 15:17:13 -07:00
Suresh Siddha 6d3321e8e2 x86, mtrr: lock stop machine during MTRR rendezvous sequence
MTRR rendezvous sequence using stop_one_cpu_nowait() can potentially
happen in parallel with another system wide rendezvous using
stop_machine(). This can lead to deadlock (The order in which
works are queued can be different on different cpu's. Some cpu's
will be running the first rendezvous handler and others will be running
the second rendezvous handler. Each set waiting for the other set to join
for the system wide rendezvous, leading to a deadlock).

MTRR rendezvous sequence is not implemented using stop_machine() as this
gets called both from the process context aswell as the cpu online paths
(where the cpu has not come online and the interrupts are disabled etc).
stop_machine() works with only online cpus.

For now, take the stop_machine mutex in the MTRR rendezvous sequence that
gets called from an online cpu (here we are in the process context
and can potentially sleep while taking the mutex). And the MTRR rendezvous
that gets triggered during cpu online doesn't need to take this stop_machine
lock (as the stop_machine() already ensures that there is no cpu hotplug
going on in parallel by doing get_online_cpus())

    TBD: Pursue a cleaner solution of extending the stop_machine()
         infrastructure to handle the case where the calling cpu is
         still not online and use this for MTRR rendezvous sequence.

fixes: https://bugzilla.novell.com/show_bug.cgi?id=672008

Reported-by: Vadim Kotelnikov <vadimuzzz@inbox.ru>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/20110623182056.807230326@sbsiddha-MOBL3.sc.intel.com
Cc: stable@kernel.org # 2.6.35+, backport a week or two after this gets more testing in mainline
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-06-27 14:00:46 -07:00
Suresh Siddha 84ac7cdbdd x86, mtrr, pat: Fix one cpu getting out of sync during resume
On laptops with core i5/i7, there were reports that after resume
graphics workloads were performing poorly on a specific AP, while
the other cpu's were ok. This was observed on a 32bit kernel
specifically.

Debug showed that the PAT init was not happening on that AP
during resume and hence it contributing to the poor workload
performance on that cpu.

On this system, resume flow looked like this:

1. BP starts the resume sequence and we reinit BP's MTRR's/PAT
   early on using mtrr_bp_restore()

2. Resume sequence brings all AP's online

3. Resume sequence now kicks off the MTRR reinit on all the AP's.

4. For some reason, between point 2 and 3, we moved from BP
   to one of the AP's. My guess is that printk() during resume
   sequence is contributing to this. We don't see similar
   behavior with the 64bit kernel but there is no guarantee that
   at this point the remaining resume sequence (after AP's bringup)
   has to happen on BP.

5. set_mtrr() was assuming that we are still on BP and skipped the
   MTRR/PAT init on that cpu (because of 1 above)

6. But we were on an AP and this led to not reprogramming PAT
   on this cpu leading to bad performance.

Fix this by doing unconditional mtrr_if->set_all() in set_mtrr()
during MTRR/PAT init. This might be unnecessary if we are still
running on BP. But it is of no harm and will guarantee that after
resume, all the cpu's will be in sync with respect to the
MTRR/PAT registers.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1301438292-28370-1-git-send-email-eric@anholt.net>
Signed-off-by: Eric Anholt <eric@anholt.net>
Tested-by: Keith Packard <keithp@keithp.com>
Cc: stable@kernel.org	[v2.6.32+]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-03-29 16:17:42 -07:00
Rafael J. Wysocki f3c6ea1b06 x86: Use syscore_ops instead of sysdev classes and sysdevs
Some subsystems in the x86 tree need to carry out suspend/resume and
shutdown operations with one CPU on-line and interrupts disabled and
they define sysdev classes and sysdevs or sysdev drivers for this
purpose.  This leads to unnecessarily complicated code and excessive
memory usage, so switch them to using struct syscore_ops objects for
this purpose instead.

Generally, there are three categories of subsystems that use
sysdevs for implementing PM operations: (1) subsystems whose
suspend/resume callbacks ignore their arguments entirely (the
majority), (2) subsystems whose suspend/resume callbacks use their
struct sys_device argument, but don't really need to do that,
because they can be implemented differently in an arguably simpler
way (io_apic.c), and (3) subsystems whose suspend/resume callbacks
use their struct sys_device argument, but the value of that argument
is always the same and could be ignored (microcode_core.c).  In all
of these cases the subsystems in question may be readily converted to
using struct syscore_ops objects for power management and shutdown.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
2011-03-23 22:15:54 +01:00
Suresh Siddha f7448548a9 x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platforms
Markus Kohn ran into a hard hang regression on an acer aspire
1310, when acpi is enabled. git bisect showed the following
commit as the bad one that introduced the boot regression.

	commit d0af9eed5a
	Author: Suresh Siddha <suresh.b.siddha@intel.com>
	Date:   Wed Aug 19 18:05:36 2009 -0700

	    x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init

Because of the UP configuration of that platform,
native_smp_prepare_cpus() bailed out (in smp_sanity_check())
before doing the set_mtrr_aps_delayed_init()

Further down the boot path, native_smp_cpus_done() will call the
delayed MTRR initialization for the AP's (mtrr_aps_init()) with
mtrr_aps_delayed_init not set. This resulted in the boot
processor reprogramming its MTRR's to the values seen during the
start of the OS boot. While this is not needed ideally, this
shouldn't have caused any side-effects. This is because the
reprogramming of MTRR's (set_mtrr_state() that gets called via
set_mtrr()) will check if the live register contents are
different from what is being asked to write and will do the actual
write only if they are different.

BP's mtrr state is read during the start of the OS boot and
typically nothing would have changed when we ask to reprogram it
on BP again because of the above scenario on an UP platform. So
on a normal UP platform no reprogramming of BP MTRR MSR's
happens and all is well.

However, on this platform, bios seems to be modifying the fixed
mtrr range registers between the start of OS boot and when we
double check the live registers for reprogramming BP MTRR
registers. And as the live registers are modified, we end up
reprogramming the MTRR's to the state seen during the start of
the OS boot.

During ACPI initialization, something in the bios (probably smi
handler?) don't like this fact and results in a hard lockup.

We didn't see this boot hang issue on this platform before the
commit d0af9eed5a, because only
the AP's (if any) will program its MTRR's to the value that BP
had at the start of the OS boot.

Fix this issue by checking mtrr_aps_delayed_init before
continuing further in the mtrr_aps_init(). Now, only AP's (if
any) will program its MTRR's to the BP values during boot.

Addresses https://bugzilla.novell.com/show_bug.cgi?id=623393

  [ By the way, this behavior of the bios modifying MTRR's after the start
    of the OS boot is not common and the kernel is not prepared to
    handle this situation well. Irrespective of this issue, during
    suspend/resume, linux kernel will try to reprogram the BP's MTRR values
    to the values seen during the start of the OS boot. So suspend/resume might
    be already broken on this platform for all linux kernel versions. ]

Reported-and-bisected-by: Markus Kohn <jabber@gmx.org>
Tested-by: Markus Kohn <jabber@gmx.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Thomas Renninger <trenn@novell.com>
Cc: Rafael Wysocki <rjw@novell.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: stable@kernel.org # [v2.6.32+]
LKML-Reference: <1296694975.4418.402.camel@sbsiddha-MOBL3.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-03 12:10:38 +01:00
Suresh Siddha 68f202e4e8 x86, mtrr: Use stop machine context to rendezvous all the cpu's
Use the stop machine context rather than IPI's to rendezvous all the cpus for
MTRR initialization that happens during cpu bringup or for MTRR modifications
during runtime.

This avoids deadlock scenario (reported by Prarit) like:

cpu A holds a read_lock (tasklist_lock for example) with irqs enabled
cpu B waits for the same lock with irqs disabled using write_lock_irq
cpu C doing set_mtrr() (during AP bringup for example), which will try to
rendezvous all the cpus using IPI's

This will result in C and A come to the rendezvous point and waiting
for B. B is stuck forever waiting for the lock and thus not
reaching the rendezvous point.

Using stop cpu (run in the process context of per cpu based keventd) to do
this rendezvous, avoids this deadlock scenario.

Also make sure all the cpu's are in the rendezvous handler before we proceed
with the local_irq_save() on each cpu. This lock step disabling irqs on all
the cpus will avoid other deadlock scenarios (for example involving
with the blocking smp_call_function's etc).

   [ This problem is very old. Marking -stable only for 2.6.35 as the
     stop_one_cpu_nowait() API is present only in 2.6.35. Any older
     kernel interested in this fix need to do some more work in backporting
     this patch. ]

Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1280515602.2682.10.camel@sbsiddha-MOBL3.sc.intel.com>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@kernel.org	[2.6.35]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-30 15:59:49 -07:00
Randy Dunlap 6c550ee415 x86: fix mtrr missing kernel-doc
Fix missing kernel-doc notation in mtrr/main.c:

Warning(arch/x86/kernel/cpu/mtrr/main.c:152): No description found for parameter 'info'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-05 11:46:03 -08:00
Emese Revfy 3b9cfc0a99 x86, mtrr: Constify struct mtrr_ops
This is part of the ops structure constification
effort started by Arjan van de Ven et al.

Benefits of this constification:

 * prevents modification of data that is shared
   (referenced) by many other structure instances
   at runtime

 * detects/prevents accidental (but not intentional)
   modification attempts on archs that enforce
   read-only kernel data at runtime

 * potentially better optimized code as the compiler
   can assume that the const data cannot be changed

 * the compiler/linker move const data into .rodata
   and therefore exclude them from false sharing

Signed-off-by: Emese Revfy <re.emese@gmail.com>
LKML-Reference: <4B65D712.3080804@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-01 11:20:43 -08:00
H. Peter Anvin 5400743db5 x86, mtrr: make mtrr_aps_delayed_init static bool
mtr_aps_delayed_init was declared u32 and made global, but it only
ever takes boolean values and is only ever used in
arch/x86/kernel/cpu/mtrr/main.c.  Declare it "static bool" and remove
external references.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
2009-08-21 17:00:02 -07:00
Suresh Siddha d0af9eed5a x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init
SDM Vol 3a section titled "MTRR considerations in MP systems" specifies
the need for synchronizing the logical cpu's while initializing/updating
MTRR.

Currently Linux kernel does the synchronization of all cpu's only when
a single MTRR register is programmed/updated. During an AP online
(during boot/cpu-online/resume)  where we initialize all the MTRR/PAT registers,
we don't follow this synchronization algorithm.

This can lead to scenarios where during a dynamic cpu online, that logical cpu
is initializing MTRR/PAT with cache disabled (cr0.cd=1) etc while other logical
HT sibling continue to run (also with cache disabled because of cr0.cd=1
on its sibling).

Starting from Westmere, VMX transitions with cr0.cd=1 don't work properly
(because of some VMX performance optimizations) and the above scenario
(with one logical cpu doing VMX activity and another logical cpu coming online)
can result in system crash.

Fix the MTRR initialization by doing rendezvous of all the cpus. During
boot and resume, we delay the MTRR/PAT init for APs till all the
logical cpu's come online and the rendezvous process at the end of AP's bringup,
will initialize the MTRR/PAT for all AP's.

For dynamic single cpu online, we synchronize all the logical cpus and
do the MTRR/PAT init on the AP that is coming online.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-21 16:25:55 -07:00
Jaswinder Singh Rajput dbd51be026 x86: Clean up mtrr/main.c
Fix following trivial style problems:

  ERROR: trailing whitespace X 25
  WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
  WARNING: Use #include <linux/kvm_para.h> instead of <asm/kvm_para.h>
  ERROR: do not initialise externals to 0 or NULL X 2
  ERROR: "foo * bar" should be "foo *bar" X 5
  ERROR: do not use assignment in if condition X 2
  WARNING: line over 80 characters X 8
  ERROR: return is not a function, parentheses are not required
  WARNING: braces {} are not necessary for any arm of this statement
  ERROR: space required before the open parenthesis '(' X 2
  ERROR: open brace '{' following function declarations go on the next line
  ERROR: space required after that ',' (ctx:VxV) X 8
  ERROR: space required before the open parenthesis '(' X 3
  ERROR: else should follow close brace '}'
  WARNING: space prohibited between function name and open parenthesis '('
  WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable X 2

Also use pr_debug and pr_warning where possible.

total: 50 errors, 14 warnings

arch/x86/kernel/cpu/mtrr/main.o:

   text	   data	    bss	    dec	    hex	filename
   3668	    116	   4156	   7940	   1f04	main.o.before
   3668	    116	   4156	   7940	   1f04	main.o.after

md5:
   e01af2fd28deef77c8d01e71acfbd365  main.o.before.asm
   e01af2fd28deef77c8d01e71acfbd365  main.o.after.asm

Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090703164225.GA21447@elte.hu>
Cc: Avi Kivity <avi@redhat.com> # Avi, please have a look at the kvm_para.h bit
[ More cleanups ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-04 11:19:55 +02:00
Jaswinder Singh Rajput d9bcc01d58 x86, mtrr: replace MTRRcap_MSR with msr-index's MSR_MTRRcap
Use standard msr-index.h's MSR declaration and no need to declare again.

[ Impact: cleanup, no object code change ]

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-15 07:49:00 -07:00
Yinghai Lu f0348c438c x86: MTRR workaround for system with stange var MTRRs
Impact: don't trim e820 according to wrong mtrr

Ozan reports that his server emits strange warning.
it turns out the BIOS sets the MTRRs incorrectly.

Ignore those strange ranges, and don't trim e820,
just emit one warning about BIOS

Reported-by: Ozan Çağlayan <ozan@pardus.org.tr>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49BEE1E7.7020706@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-17 10:47:47 +01:00
Yinghai Lu 0d890355bf x86: separate mtrr cleanup/mtrr_e820 trim to separate file
Impact: cleanup

mtrr main.c is too big, seperate mtrr cleanup and mtrr e820 trim
code to another file.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49B87C7B.80809@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 02:52:19 +01:00
Yinghai Lu c1ab7e93c6 x86: print out mtrr_range_state when user specify size
Impact: print more debug info

Keep it consistent with autodetect version.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49B87C0A.4010105@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13 02:52:18 +01:00
Ingo Molnar bf3647c44b x86: tone down mtrr_trim_uncached_memory() warning
kerneloops.org is reporting a lot of these warnings that come due to
vmware not setting up any MTRRs for emulated CPUs:

| Reported 709 times (14696 total reports)
| BIOS bug (often in VMWare) where the MTRR's are set up incorrectly
| or not at all
|
| This warning was last seen in version 2.6.29-rc2-git1, and first
| seen in 2.6.24.
|
| More info:
|   http://www.kerneloops.org/searchweek.php?search=mtrr_trim_uncached_memory

Keep a one-liner KERN_INFO about it - so that we have so notice if empty
MTRRs are caused by native hardware/BIOS weirdness.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-29 11:45:35 +01:00
Ingo Molnar 923a789b49 Merge branch 'linus' into x86/cleanups
Conflicts:
	arch/x86/kernel/reboot.c
2009-01-02 22:41:36 +01:00
Sheng Yang b558bc0a25 x86: Rename mtrr_state struct and macro names
Prepare for exporting them.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:51:44 +02:00
Ingo Molnar 34bf5d0ff5 Merge branch 'x86/core' into x86/cleanups 2008-12-27 11:30:05 +01:00
Rusty Russell 71ab6b245f x86: remove impossible test in mtrr/main.c
Impact: cleanup

enable_mtrr_cleanup is static, and is never set to anything but 0 or 1.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-12-25 12:46:06 -08:00
Yinghai Lu 30604bb410 x86: break up mtrr_cleanup() into several small functions.
Ingo said mtrr_cleanup() is big and ugly.

so break it up into more functions and make it more readable.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-28 16:30:57 +01:00
Ingo Molnar e496e3d645 Merge branches 'x86/alternatives', 'x86/cleanups', 'x86/commandline', 'x86/crashdump', 'x86/debug', 'x86/defconfig', 'x86/doc', 'x86/exports', 'x86/fpu', 'x86/gart', 'x86/idle', 'x86/mm', 'x86/mtrr', 'x86/nmi-watchdog', 'x86/oprofile', 'x86/paravirt', 'x86/reboot', 'x86/sparse-fixes', 'x86/tsc', 'x86/urgent' and 'x86/vmalloc' into x86-v28-for-linus-phase1 2008-10-06 18:17:07 +02:00
Ingo Molnar 0962f402af Merge branch 'x86/prototypes' into x86-v28-for-linus-phase1
Conflicts:
	arch/x86/kernel/process_32.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-06 18:06:53 +02:00
Yinghai Lu dd5523552c x86: mtrr_cleanup: treat WRPROT as UNCACHEABLE
For the purpose of MTRR canonicalization, treat WRPROT as UNCACHEABLE.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-10-04 20:10:22 -07:00
Yinghai Lu 99e1aa17ce x86: mtrr_cleanup: first 1M may be covered in var mtrrs
The first 1M is don't care when it comes to the variables MTRRs.
Cover it as WB as a heuristic approximation; this is generally what we
want to minimize the number of registers.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-10-04 20:09:14 -07:00