Commit graph

2333 commits

Author SHA1 Message Date
Izik Eidus 3fe913e7c5 KVM: x86: task switch: fix wrong bit setting for the busy flag
The busy bit is bit 1 of the type field, not bit 8.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:43 +03:00
Sheng Yang 1439442c7b KVM: VMX: Enable EPT feature for KVM
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:42 +03:00
Sheng Yang b7ebfb0509 KVM: VMX: Prepare an identity page table for EPT in real mode
[aliguory: plug leak]

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:41 +03:00
Sheng Yang 1ac593c97e KVM: MMU: Remove #ifdef CONFIG_X86_64 to support 4 level EPT
Currently EPT level is 4 for both pae and x86_64. The patch remove the #ifdef
for alloc root_hpa and free root_hpa to support EPT.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:39 +03:00
Sheng Yang 7b52345e2c KVM: MMU: Add EPT support
Enable kvm_set_spte() to generate EPT entries.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:38 +03:00
Sheng Yang 67253af52e KVM: Add kvm_x86_ops get_tdp_level()
The function get_tdp_level() provided the number of tdp level for EPT and
NPT rather than the NPT specific macro.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:34 +03:00
Sheng Yang 8c6d6adc6b KVM: MMU: Move some definitions to a header file
Move some definitions to mmu.h in order to allow building common table
entries between EPT and non-EPT.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 12:26:38 +03:00
Sheng Yang d56f546db9 KVM: VMX: EPT Feature Detection
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 12:26:38 +03:00
Ulrich Drepper d35c7b0e54 unified (weak) sys_pipe implementation
This replaces the duplicated arch-specific versions of "sys_pipe()" with
one unified implementation.  This removes almost 250 lines of duplicated
code.

It's marked __weak, so that *if* an architecture wants to override the
default implementation it can do so by simply having its own replacement
version, since many architectures use alternate calling conventions for
the 'pipe()' system call for legacy reasons (ie traditional UNIX
implementations often return the two file descriptors in registers)

I still haven't changed the cris version even though Linus says the BKL
isn't needed.  The arch maintainer can easily do it if there are really
no obstacles.

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-03 13:50:33 -07:00
Roman Zippel 6f6d6a1a6a rename div64_64 to div64_u64
Rename div64_64 to div64_u64 to make it consistent with the other divide
functions, so it clearly includes the type of the divide.  Move its definition
to math64.h as currently no architecture overrides the generic implementation.
 They can still override it of course, but the duplicated declarations are
avoided.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:58 -07:00
Linus Torvalds 6d98ca7364 x86: Mark OPTIMIZE_INLINING broken
So Ingo finally did figure out why UML broke with this option: UML
passes gcc the -fno-unit-at-a-time flag, and apparently that wreaks
havoc with gcc's inlining.

We could turn off -fno-unit-at-a-time for UML for gcc4+ (which is what
x86 does), but there's bad blood about this whole option, and it does
show that the thing is just fragile as heck.

So let tempers cool, and disable the thing, and we can revisit the
decision later.

Cc: Adrian Bunk <bunk@kernel.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 20:07:22 -07:00
Ingo Molnar 895d30935e x86: numaq fix
do not override the existing pci-y rule when adding visws or
numaq rules.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-30 23:15:35 +02:00
Ingo Molnar 6b8e1c7ec4 x86: 8K stacks by default
Switch back to 8K stacks as the safer default. Out-of-memory
situations are less problematic than silent and hard to debug
stack corruption.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:35 +02:00
Andres Salomon cb8ab687c3 x86: ioremap ram check fix
bdd3cee2e4 (x86: ioremap(), extend check
to all RAM pages) breaks OLPC's ioremap call.  The ioremap that OLPC uses is:

        romsig = ioremap(0xffffffc0, 16);

The commit that breaks it is basically:

-       for (pfn = phys_addr >> PAGE_SHIFT; pfn < max_pfn_mapped &&
-            (pfn << PAGE_SHIFT) < last_addr; pfn++) {
+       for (pfn = phys_addr >> PAGE_SHIFT;
+                               (pfn << PAGE_SHIFT) < last_addr; pfn++) {
+

Previously, the 'pfn < max_pfn_mapped' check would've caused us to not
enter the loop.  Removing that check means we loop infinitely.  The
reason for that is because pfn is 0xfffff, and last_addr is 0xffffffcf.
The remaining check that is used to exit the loop is not sufficient;
when pfn<<PAGE_SHIFT is 0xfffff000, that is less than 0xffffffcf; when
we increment pfn and it overflows (pfn == 0x100000), pfn<<PAGE_SHIFT
ends up being 0.  That, of course, is less than last_addr.  In effect,
pfn<<PAGE_SHIFT is never lower than last_addr.

The simple fix for this is to limit the last_addr check to the PAGE_MASK;
a patch is below.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:35 +02:00
Ingo Molnar 5de8f68b43 x86: optimize inlining off
default to inline optimizing off.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:35 +02:00
Ingo Molnar acbaa93e3d x86: CONFIG_X86_ELAN fix
move the X86_CPU section out of the !X86_ELAN branch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:35 +02:00
Ingo Molnar c9af1e3323 x86: Kconfig fix
Andrew noticed that OPTIMIZE_INLINING appeared in the toplevel
menu - fix it.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:35 +02:00
Suresh Siddha de33c442ed x86 PAT: fix performance drop for glx, use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()
Use UC_MINUS for ioremap(), ioremap_nocache() instead of strong UC.
Once all the X drivers move to ioremap_wc(), we can go back to strong
UC semantics for ioremap() and ioremap_nocache().

To avoid attribute aliasing issues, pci_mmap_page_range() will also
use UC_MINUS for default non write-combining mapping request.

Next steps:
	a) change all the video drivers using ioremap() or ioremap_nocache()
	   and adding WC MTTR using mttr_add() to ioremap_wc()

	b) for strict usage, we can go back to strong uc semantics
	   for ioremap() and ioremap_nocache() after some grace period for
	   completing step-a.

	c) user level X server needs to use the appropriate method for setting
	   up WC mapping (like using resourceX_wc sysfs file instead of
	   adding MTRR for WC and using /dev/mem or resourceX under /sys)

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:35 +02:00
Sam Ravnborg b9b39bfba5 x86: use defconfigs from x86/configs/*
Daniel Drake <dsd@gentoo.org> reported:

In 2.6.23, if you unpacked a kernel source tarball and then
ran "make menuconfig" you'd be presented with this message:
    # using defaults found in arch/i386/defconfig

and the default options would be set.

The same thing in 2.6.24 does not give you any "using defaults" message, and
the default config options within menuconfig are rather blank (e.g. no PCI
support). You can work around this by explicitly running "make defconfig"
before menuconfig, but it would be nice to have the behaviour the way it was
for 2.6.23 (and the way it still is for other archs).

Fixed by adding a x86 specific defconfig list to Kconfig.

Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=10470
Tested-by: dsd@gentoo.org
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Ingo Molnar 2544a873ab revert: "x86: ioremap(), extend check to all RAM pages"
Vegard Nossum reported a large (150 seconds) boot delay during bootup,
and bisected it to "x86: ioremap(), extend check to all RAM pages"
(commit bdd3cee2e4). Revert this commit for now.

Bisected-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Jeremy Fitzhardinge a4c863f497 x86: don't bother printing compat vdso address
The kernel prints the compat vdso address regardless of whether compat
vdso mode is enabled or not, which is confusing.  Given that this
isn't very interesting information anyway, just remove the printk.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Gerhard Mack <gmack@innerfire.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Andi Kleen f6c133f7d5 fix: x86: support for new UV apic
Don't warn in read_apic_id() when preemptible but only one CPU online.

Signed-off-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Vegard Nossum 575ca7351b x86: fix early-BUG message
The .asciz directive takes any number of strings, but each one is zero-
terminated, and string pasting is not done as in C. That results in only the
first line being output.

Replace .asciz with multiple .ascii directives and terminate with .asciz.

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Dmitri Vorobiev b4cdc4300d x86: iommu_sac_force can become static
The iommu_sac_force variable is needlessly defined global,
and this patch makes it static. Additionally, this variable
needs not be explicitly initialized.

Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Dmitri Vorobiev 4412620fc2 x86: add proper header for reboot_force
This patch fixes one sparse warning by including the appropriate
header for the reboot_force symbol.

Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Ingo Molnar 3e8f7e35f3 x86 VISWS: build fix
the 'reboot_force' flag is a notion that non-PC subarchitectures do
not have.

also, unify the X86_BIOS_REBOOT option between 32-bit and 64-bit
and get rid of a few unnecessary Kconfig and Makefile complications
that way.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Ingo Molnar ed5e233284 x86, voyager: fix ioremap_nocache()
James Bottomley reported that the following commit:

| commit 6371b49599
| Author: Ingo Molnar <mingo@elte.hu>
| Date:   Wed Jan 30 13:33:40 2008 +0100
|
|     x86: change ioremap() to default to uncached

broke Voyager.

James says:

" it broke a class of voyager machines: those which
  rely on the quad interrupt controller (QIC).  The precis of why they
  broke is because the QIC does IPIs (or CPIs in its terminology) via
  cache line interference: you interrupt a processor by moving a
  designated memory area to write exclusive in the cache (by simply
  writing to the line) and the CPU acks the interrupt by moving it back to
  read shared (by reading from it).  That area, is, of course, mapped by
  ioremap, so reversing the ioremap semantics and adding the uncached bit
  completely breaks the QIC. "

Sorry about that!

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Ingo Molnar fc3fbc4509 hpet: fix
Al Viro pointed out that there's a missing readl() of timer->hpet_config,
found by Sparse.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Adrian Bunk b9e017e04b x86: unexport kmap_atomic_to_page
This patch removes the no longer used export of kmap_atomic_to_page.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Linus Torvalds 08acd4f8af Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (179 commits)
  ACPI: Fix acpi_processor_idle and idle= boot parameters interaction
  acpi: fix section mismatch warning in pnpacpi
  intel_menlo: fix build warning
  ACPI: Cleanup: Remove unneeded, multiple local dummy variables
  ACPI: video - fix permissions on some proc entries
  ACPI: video - properly handle errors when registering proc elements
  ACPI: video - do not store invalid entries in attached_array list
  ACPI: re-name acpi_pm_ops to acpi_suspend_ops
  ACER_WMI/ASUS_LAPTOP: fix build bug
  thinkpad_acpi: fix possible NULL pointer dereference if kstrdup failed
  ACPI: check a return value correctly in acpi_power_get_context()
  #if 0 acpi/bay.c:eject_removable_drive()
  eeepc-laptop: add hwmon fan control
  eeepc-laptop: add backlight
  eeepc-laptop: add base driver
  ACPI: thinkpad-acpi: bump up version to 0.20
  ACPI: thinkpad-acpi: fix selects in Kconfig
  ACPI: thinkpad-acpi: use a private workqueue
  ACPI: thinkpad-acpi: fluff really minor fix
  ACPI: thinkpad-acpi: use uppercase for "LED" on user documentation
  ...

Fixed conflicts in drivers/acpi/video.c and drivers/misc/intel_menlow.c
manually.
2008-04-30 11:52:52 -07:00
Len Brown 96916090f4 Merge branches 'release', 'acpica', 'bugzilla-10224', 'bugzilla-9772', 'bugzilla-9916', 'ec', 'eeepc', 'idle', 'misc', 'pm-legacy', 'sysfs-links-2.6.26', 'thermal', 'thinkpad' and 'video' into release 2008-04-30 13:58:00 -04:00
Roland McGrath 5a8da0ea82 signals: x86 TS_RESTORE_SIGMASK
Replace TIF_RESTORE_SIGMASK with TS_RESTORE_SIGMASK and define our own
set_restore_sigmask() function.  This saves the costly SMP-safe set_bit
operation, which we do not need for the sigmask flag since TIF_SIGPENDING
always has to be set too.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:37 -07:00
Yinghai Lu 70b9f7dc14 x86/pci: remove flag in pci_cfg_space_size_ext
so let pci_cfg_space_size call it directly without flag.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-04-29 15:34:05 -07:00
Sam Ravnborg 98db6f193c x86: fix section mismatch in pci_scan_bus
Fix following section mismatch warning:
WARNING: vmlinux.o(.text+0x275616): Section mismatch in reference from the function pci_scan_bus() to the function .devinit.text:pci_scan_bus_parented()

The warning was seen with a CONFIG_DEBUG_SECTION_MISMATCH=y build.
The inline function pci_scan_bus refer to functions annotated
__devinit - so annotate it __devinit too.
This revealed a few x86 specific functions that were only
used from __init or __devinit context.
So annotate these __devinit and the warning was killed.

The added include in pci.h was not strictly required but
added to avoid being dependent on indirect includes.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Jesse Barnes <jbarnes@hobbes.lan>
2008-04-29 13:41:59 -07:00
Linus Torvalds 1f43c53930 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes:
  x86: fix PCI MSI breaks when booting with nosmp
  x86: vget_cycles() __always_inline
  x86: add more boot protocol documentation
  bootprotocol: cleanup
  x86: fix warning in "x86: clean up vSMP detection"
  x86: !x & y typo in mtrr code
2008-04-29 09:03:19 -07:00
Linus Torvalds 5f78e4d339 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-pci
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-pci:
  x86: add pci=check_enable_amd_mmconf and dmi check
  x86: work around io allocation overlap of HT links
  acpi: get boot_cpu_id as early for k8_scan_nodes
  x86_64: don't need set default res if only have one root bus
  x86: double check the multi root bus with fam10h mmconf
  x86: multi pci root bus with different io resource range, on 64-bit
  x86: use bus conf in NB conf fun1 to get bus range on, on 64-bit
  x86: get mp_bus_to_node early
  x86 pci: remove checking type for mmconfig probe
  x86: remove unneeded check in mmconf reject
  driver core: try parent numa_node at first before using default
  x86: seperate mmconf for fam10h out from setup_64.c
  x86: if acpi=off, force setting the mmconf for fam10h
  x86_64: check MSR to get MMCONFIG for AMD Family 10h
  x86_64: check and enable MMCONFIG for AMD Family 10h
  x86_64: set cfg_size for AMD Family 10h in case MMCONFIG
  x86: mmconf enable mcfg early
  x86: clear pci_mmcfg_virt when mmcfg get rejected
  x86: validate against acpi motherboard resources

Fixed up fairly trivial conflicts in arch/x86/pci/{init.c,pci.h} due to
OLPC support manually.
2008-04-29 08:26:51 -07:00
Linus Torvalds 44473d9913 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] state info wrong after resume
  [CPUFREQ] allow use of the powersave governor as the default one
  [CPUFREQ] document the currently undocumented parts of the sysfs interface
  [CPUFREQ] expose cpufreq coordination requirements regardless of coordination mechanism
2008-04-29 08:18:49 -07:00
Christoph Lameter 66916cd267 x86: use kbuild.h
Drop the macro definitions in asm-offsets_*.c and use kbuild.h

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:29 -07:00
Tim Gardner 8c4dd60682 edd: add default mode CONFIG_EDD_OFF=n, override with edd={on,off}
Add a kernel parameter option to 'edd' to enable/disable BIOS Enhanced Disk
Drive Services.  CONFIG_EDD_OFF disables EDD while still compiling EDD into
the kernel.  Default behavior can be forced using 'edd=on' or 'edd=off' as
a kernel parameter.

[akpm@linux-foundation.org: fix kernel-parameters.txt]
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:23 -07:00
Alexey Dobriyan c74c120a21 proc: remove proc_root from drivers
Remove proc_root export.  Creation and removal works well if parent PDE is
supplied as NULL -- it worked always that way.

So, one useless export removed and consistency added, some drivers created
PDEs with &proc_root as parent but removed them as NULL and so on.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:18 -07:00
Andres Salomon 3ef0e1f8ca x86: olpc: add One Laptop Per Child architecture support
This adds support for OLPC XO hardware.  Open Firmware on XOs don't contain
the VSA, so it is necessary to emulate the PCI BARs in the kernel.  This also
adds functionality for running EC commands, and a CONFIG_OLPC.

A number of OLPC drivers depend upon CONFIG_OLPC.

olpc_ec_timeout is a hack to work around Embedded Controller bugs.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: geode_has_vsa build fix]
[akpm@linux-foundation.org: olpc_register_battery_callback doesn't exist]
Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:07 -07:00
FUJITA Tomonori a852250920 swiotlb: use iommu_is_span_boundary helper function
iommu_is_span_boundary in lib/iommu-helper.c was exported for PARISC IOMMUs
(commit 3715863aa1).  SWIOTLB can use it instead
of the homegrown function.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:05 -07:00
Adrian Bunk 7d195a5409 proper extern for late_time_init
Add a proper extern for late_time_init in include/linux/init.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:03 -07:00
Adrian Bunk eb0f1c442d proper __do_softirq() prototype
Add a proper prototype for __do_softirq() in include/linux/interrupt.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:02 -07:00
Jesse Barnes e90955c26d x86: fix PCI MSI breaks when booting with nosmp
set up sane APIC state even in the nosmp case.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-29 13:45:24 +02:00
Ingo Molnar 781fe2ebc0 bootprotocol: cleanup
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-29 13:45:24 +02:00
Alexander van Heukelum 8008abbd87 x86: fix warning in "x86: clean up vSMP detection"
The function detect_vsmp_box is a void function in the PCI case.
Change the !PCI stub to void too.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-29 13:45:24 +02:00
Harvey Harrison e686d34156 x86: !x & y typo in mtrr code
As written, this can never be true.

Spotted by the Sparse checker.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-29 13:45:24 +02:00
Roland McGrath d9dedc1385 x86_64 vDSO: use initdata
The 64-bit vDSO image is in a special ".vdso" section for no reason
I can determine.  Furthermore, the location of the vdso_end symbol
includes some wrongly-calculated padding space in the image, which
is then (correctly) rounded to page size, resulting in an extra page
of zeros in the image mapped in to user processes.

This changes it to put the vdso.so image into normal initdata as we
have always done for the 32-bit vDSO images.  The extra padding is
gone, so the user VMA is one page instead of two.  The image that
was already copied around at boot time is now in initdata, so we
recover that wasted space after boot.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 13:49:35 -07:00
Darrick J. Wong e8628dd06d [CPUFREQ] expose cpufreq coordination requirements regardless of coordination mechanism
Currently, affected_cpus shows which CPUs need to have their frequency
coordinated in software.  When hardware coordination is in use, the contents
of this file appear the same as when no coordination is required.  This can
lead to some confusion among user-space programs, for example, that do not
know that extra coordination is required to force a CPU core to a particular
speed to control power consumption.

To fix this, create a "related_cpus" attribute that always displays the
coordination map regardless of whatever coordination strategy the cpufreq
driver uses (sw or hw).  If the cpufreq driver does not provide a value, fall
back to policy->cpus.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-04-28 16:27:08 -04:00
Venkatesh Pallipadi e56a727b02 [CPUFREQ] Make acpi-cpufreq more robust against BIOS freq changes behind our back.
We checked the hardware freq with OS cached freq value in get_cur_freqon_cpu().

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-04-28 15:16:46 -04:00
PJ Waskiewicz 9d9ad4b51d x86: Fix 32-bit MSI-X allocation leakage
This bug was introduced in the 2.6.24 i386/x86_64 tree merge, where
MSI-X vector allocation will eventually fail.  The cause is the new
bit array tracking used vectors is not getting cleared properly on
IRQ destruction on the 32-bit APIC code.

This can be seen easily using the ixgbe 10 GbE driver on multi-core
systems by simply loading and unloading the driver a few times.
Depending on the number of available vectors on the host system, the
MSI-X allocation will eventually fail, and the driver will only be
able to use legacy interrupts.

I am generating the same patch for both stable trees for 2.6.24 and
2.6.25.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 10:49:17 -07:00
Andres Salomon 32bf87e369 x86: geode: MSR cleanup
This cleans up a few MSR-using drivers in the following manner:
  - Ensures MSRs are all defined in asm/geode.h, rather than in misc
    places
  - Makes the naming consistent; cs553[56] ones begin with MSR_,
    GX-specific ones start with MSR_GX_, and LX-specific ones start
    with MSR_LX_.  Also, make the names match the data sheet.
  - Use MSR names rather than numbers in source code
  - Document the fact that the LX's MSR_PADSEL has the wrong value
    in the data sheet.  That's, uh, good to note.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:35 -07:00
Thomas Petazzoni 7ae9392c0a x86: configurable DMI scanning code
Turn CONFIG_DMI into a selectable option if EMBEDDED is defined, in
order to be able to remove the DMI table scanning code if it's not
needed, and then reduce the kernel code size.

With CONFIG_DMI (i.e before) :

   text    data     bss     dec     hex filename
1076076  128656   98304 1303036  13e1fc vmlinux

Without CONFIG_DMI (i.e after) :

   text    data     bss     dec     hex filename
1068092  126308   98304 1292704  13b9a0 vmlinux

Result:

   text    data     bss     dec     hex filename
  -7984   -2348       0  -10332   -285c vmlinux

The new option appears in "Processor type and features", only when
CONFIG_EMBEDDED is defined.

This patch is part of the Linux Tiny project, and is based on previous work
done by Matt Mackall <mpm@selenic.com>.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Anvin" <hpa@zytor.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:30 -07:00
Christoph Lameter d60cd46bbd pageflags: use proper page flag functions in Xen
Xen uses bitops to manipulate page flags.  Make it use proper page flag
functions.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:22 -07:00
Christoph Lameter 2301696932 vmallocinfo: add caller information
Add caller information so that /proc/vmallocinfo shows where the allocation
request for a slice of vmalloc memory originated.

Results in output like this:

0xffffc20000000000-0xffffc20000801000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages
0xffffc20000801000-0xffffc20000806000   20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc
0xffffc20000806000-0xffffc20000c07000 4198400 alloc_large_system_hash+0x127/0x246 pages=1024 vmalloc vpages
0xffffc20000c07000-0xffffc20000c0a000   12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc
0xffffc20000c0a000-0xffffc20000c0c000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c0c000-0xffffc20000c0f000   12288 acpi_os_map_memory+0x13/0x1c phys=cff64000 ioremap
0xffffc20000c10000-0xffffc20000c15000   20480 acpi_os_map_memory+0x13/0x1c phys=cff65000 ioremap
0xffffc20000c16000-0xffffc20000c18000    8192 acpi_os_map_memory+0x13/0x1c phys=cff69000 ioremap
0xffffc20000c18000-0xffffc20000c1a000    8192 acpi_os_map_memory+0x13/0x1c phys=fed1f000 ioremap
0xffffc20000c1a000-0xffffc20000c1c000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c1c000-0xffffc20000c1e000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c1e000-0xffffc20000c20000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c20000-0xffffc20000c22000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c22000-0xffffc20000c24000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c24000-0xffffc20000c26000    8192 acpi_os_map_memory+0x13/0x1c phys=e0081000 ioremap
0xffffc20000c26000-0xffffc20000c28000    8192 acpi_os_map_memory+0x13/0x1c phys=e0080000 ioremap
0xffffc20000c28000-0xffffc20000c2d000   20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc
0xffffc20000c2d000-0xffffc20000c31000   16384 tcp_init+0xd5/0x31c pages=3 vmalloc
0xffffc20000c31000-0xffffc20000c34000   12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc
0xffffc20000c34000-0xffffc20000c36000    8192 init_vdso_vars+0xde/0x1f1
0xffffc20000c36000-0xffffc20000c38000    8192 pci_iomap+0x8a/0xb4 phys=d8e00000 ioremap
0xffffc20000c38000-0xffffc20000c3a000    8192 usb_hcd_pci_probe+0x139/0x295 [usbcore] phys=d8e00000 ioremap
0xffffc20000c3a000-0xffffc20000c3e000   16384 sys_swapon+0x509/0xa15 pages=3 vmalloc
0xffffc20000c40000-0xffffc20000c61000  135168 e1000_probe+0x1c4/0xa32 phys=d8a20000 ioremap
0xffffc20000c61000-0xffffc20000c6a000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20000c6a000-0xffffc20000c73000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20000c73000-0xffffc20000c7c000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20000c7c000-0xffffc20000c7f000   12288 e1000e_setup_tx_resources+0x29/0xbe pages=2 vmalloc
0xffffc20000c80000-0xffffc20001481000 8392704 pci_mmcfg_arch_init+0x90/0x118 phys=e0000000 ioremap
0xffffc20001481000-0xffffc20001682000 2101248 alloc_large_system_hash+0x127/0x246 pages=512 vmalloc
0xffffc20001682000-0xffffc20001e83000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages
0xffffc20001e83000-0xffffc20002204000 3674112 alloc_large_system_hash+0x127/0x246 pages=896 vmalloc vpages
0xffffc20002204000-0xffffc2000220d000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc2000220d000-0xffffc20002216000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20002216000-0xffffc2000221f000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc2000221f000-0xffffc20002228000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20002228000-0xffffc20002231000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20002231000-0xffffc20002234000   12288 e1000e_setup_rx_resources+0x35/0x122 pages=2 vmalloc
0xffffc20002240000-0xffffc20002261000  135168 e1000_probe+0x1c4/0xa32 phys=d8a60000 ioremap
0xffffc20002261000-0xffffc2000270c000 4894720 sys_swapon+0x509/0xa15 pages=1194 vmalloc vpages
0xffffffffa0000000-0xffffffffa0022000  139264 module_alloc+0x4f/0x55 pages=33 vmalloc
0xffffffffa0022000-0xffffffffa0029000   28672 module_alloc+0x4f/0x55 pages=6 vmalloc
0xffffffffa002b000-0xffffffffa0034000   36864 module_alloc+0x4f/0x55 pages=8 vmalloc
0xffffffffa0034000-0xffffffffa003d000   36864 module_alloc+0x4f/0x55 pages=8 vmalloc
0xffffffffa003d000-0xffffffffa0049000   49152 module_alloc+0x4f/0x55 pages=11 vmalloc
0xffffffffa0049000-0xffffffffa0050000   28672 module_alloc+0x4f/0x55 pages=6 vmalloc

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:21 -07:00
Pekka Enberg 1b27d05b6e mm: move cache_line_size() to <linux/cache.h>
Not all architectures define cache_line_size() so as suggested by Andrew move
the private implementations in mm/slab.c and mm/slob.c to <linux/cache.h>.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:19 -07:00
Jeremy Fitzhardinge 180c06efce hotplug-memory: make online_page() common
All architectures use an effectively identical definition of online_page(), so
just make it common code.  x86-64, ia64, powerpc and sh are actually
identical; x86-32 is slightly different.

x86-32's differences arise because it puts its hotplug pages in the highmem
zone.  We can handle this in the generic code by inspecting the page to see if
its in highmem, and update the totalhigh_pages count appropriately.  This
leaves init_32.c:free_new_highpage with a single caller, so I folded it into
add_one_highpage_init.

I also removed an incorrect comment referring to the NUMA case; any NUMA
details have already been dealt with by the time online_page() is called.

[akpm@linux-foundation.org: fix indenting]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com>
Tested-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:17 -07:00
Ingo Molnar f022bfd582 x86: PAT fix
Adrian Bunk noticed the following Coverity report:

> Commit e7f260a276
> (x86: PAT use reserve free memtype in mmap of /dev/mem)
> added the following gem to arch/x86/mm/pat.c:
>
> <--  snip  -->
>
> ...
> int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
>                                 unsigned long size, pgprot_t *vma_prot)
> {
>         u64 offset = ((u64) pfn) << PAGE_SHIFT;
>         unsigned long flags = _PAGE_CACHE_UC_MINUS;
>         unsigned long ret_flags;
> ...
> ...  (nothing that touches ret_flags)
> ...
>         if (flags != _PAGE_CACHE_UC_MINUS) {
>                 retval = reserve_memtype(offset, offset + size, flags, NULL);
>         } else {
>                 retval = reserve_memtype(offset, offset + size, -1, &ret_flags);
>         }
>
>         if (retval < 0)
>                 return 0;
>
>         flags = ret_flags;
>
>         if (pfn <= max_pfn_mapped &&
>             ioremap_change_attr((unsigned long)__va(offset), size, flags) < 0) {
>                 free_memtype(offset, offset + size);
>                 printk(KERN_INFO
>                 "%s:%d /dev/mem ioremap_change_attr failed %s for %Lx-%Lx\n",
>                         current->comm, current->pid,
>                         cattr_name(flags),
>                         offset, offset + size);
>                 return 0;
>         }
>
>         *vma_prot = __pgprot((pgprot_val(*vma_prot) & ~_PAGE_CACHE_MASK) |
>                              flags);
>         return 1;
> }
>
> <--  snip  -->
>
> If (flags != _PAGE_CACHE_UC_MINUS) we pass garbage from the stack to
> ioremap_change_attr() and/or __pgprot().
>
> Spotted by the Coverity checker.

the fix simplifies the code as we get rid of the 'ret_flags'
complication.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:15:16 -07:00
Linus Torvalds 86cf02f8ea x86 PAT: tone down debugging messages some more
Ingo already fixed one of these at my request (in "x86 PAT: tone down
debugging messages", commit 1ebcc654f0),
but there was another one he missed.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-27 11:59:30 -07:00
Linus Torvalds 42cadc8600 Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (147 commits)
  KVM: kill file->f_count abuse in kvm
  KVM: MMU: kvm_pv_mmu_op should not take mmap_sem
  KVM: SVM: remove selective CR0 comment
  KVM: SVM: remove now obsolete FIXME comment
  KVM: SVM: disable CR8 intercept when tpr is not masking interrupts
  KVM: SVM: sync V_TPR with LAPIC.TPR if CR8 write intercept is disabled
  KVM: export kvm_lapic_set_tpr() to modules
  KVM: SVM: sync TPR value to V_TPR field in the VMCB
  KVM: ppc: PowerPC 440 KVM implementation
  KVM: Add MAINTAINERS entry for PowerPC KVM
  KVM: ppc: Add DCR access information to struct kvm_run
  ppc: Export tlb_44x_hwater for KVM
  KVM: Rename debugfs_dir to kvm_debugfs_dir
  KVM: x86 emulator: fix lea to really get the effective address
  KVM: x86 emulator: fix smsw and lmsw with a memory operand
  KVM: x86 emulator: initialize src.val and dst.val for register operands
  KVM: SVM: force a new asid when initializing the vmcb
  KVM: fix kvm_vcpu_kick vs __vcpu_run race
  KVM: add ioctls to save/store mpstate
  KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_*
  ...
2008-04-27 10:13:52 -07:00
Marcelo Tosatti 960b399169 KVM: MMU: kvm_pv_mmu_op should not take mmap_sem
kvm_pv_mmu_op should not take mmap_sem. All gfn_to_page() callers down
in the MMU processing will take it if necessary, so as it is it can
deadlock.

Apparently a leftover from the days before slots_lock.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:45 +03:00
Joerg Roedel 1336028b9a KVM: SVM: remove selective CR0 comment
There is not selective cr0 intercept bug. The code in the comment sets the
CR0.PG bit. But KVM sets the CR4.PG bit for SVM always to implement the paged
real mode. So the 'mov %eax,%cr0' instruction does not change the CR0.PG bit.
Selective CR0 intercepts only occur when a bit is actually changed. So its the
right behavior that there is no intercept on this instruction.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:44 +03:00
Joerg Roedel aaf697e4e0 KVM: SVM: remove now obsolete FIXME comment
With the usage of the V_TPR field this comment is now obsolete.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:43 +03:00
Joerg Roedel aaacfc9ae2 KVM: SVM: disable CR8 intercept when tpr is not masking interrupts
This patch disables the intercept of CR8 writes if the TPR is not masking
interrupts. This reduces the total number CR8 intercepts to below 1 percent of
what we have without this patch using Windows 64 bit guests.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:43 +03:00
Joerg Roedel d7bf8221a3 KVM: SVM: sync V_TPR with LAPIC.TPR if CR8 write intercept is disabled
If the CR8 write intercept is disabled the V_TPR field of the VMCB needs to be
synced with the TPR field in the local apic.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:42 +03:00
Joerg Roedel ec7cf6903f KVM: export kvm_lapic_set_tpr() to modules
This patch exports the kvm_lapic_set_tpr() function from the lapic code to
modules. It is required in the kvm-amd module to optimize CR8 intercepts.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:41 +03:00
Joerg Roedel 649d68643e KVM: SVM: sync TPR value to V_TPR field in the VMCB
This patch adds syncing of the lapic.tpr field to the V_TPR field of the VMCB.
With this change we can safely remove the CR8 read intercept.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:40 +03:00
Avi Kivity f9b7aab35c KVM: x86 emulator: fix lea to really get the effective address
We never hit this, since there is currently no reason to emulate lea.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:35 +03:00
Avi Kivity 16286d082d KVM: x86 emulator: fix smsw and lmsw with a memory operand
lmsw and smsw were implemented only with a register operand.  Extend them
to support a memory operand as well.  Fixes Windows running some display
compatibility test on AMD hosts.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:34 +03:00
Avi Kivity 66b8550573 KVM: x86 emulator: initialize src.val and dst.val for register operands
This lets us treat the case where mod == 3 in the same manner as other cases.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:33 +03:00
Avi Kivity a79d2f1805 KVM: SVM: force a new asid when initializing the vmcb
Shutdown interception clears the vmcb, leaving the asid at zero (which is
illegal.  so force a new asid on vmcb initialization.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:32 +03:00
Marcelo Tosatti e9571ed54b KVM: fix kvm_vcpu_kick vs __vcpu_run race
There is a window open between testing of pending IRQ's
and assignment of guest_mode in __vcpu_run.

Injection of IRQ's can race with __vcpu_run as follows:

CPU0                                CPU1
kvm_x86_ops->run()
vcpu->guest_mode = 0                SET_IRQ_LINE ioctl
..
kvm_x86_ops->inject_pending_irq
kvm_cpu_has_interrupt()

                                    apic_test_and_set_irr()
                                    kvm_vcpu_kick
                                    if (vcpu->guest_mode)
                                        send_ipi()

vcpu->guest_mode = 1

So move guest_mode=1 assignment before ->inject_pending_irq, and make
sure that it won't reorder after it.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:32 +03:00
Marcelo Tosatti 62d9f0dbc9 KVM: add ioctls to save/store mpstate
So userspace can save/restore the mpstate during migration.

[avi: export the #define constants describing the value]
[christian: add s390 stubs]
[avi: ditto for ia64]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:16 +03:00
Avi Kivity a45352908b KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_*
We wish to export it to userspace, so move it into the kvm namespace.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:04:13 +03:00
Marcelo Tosatti 3d80840d96 KVM: hlt emulation should take in-kernel APIC/PIT timers into account
Timers that fire between guest hlt and vcpu_block's add_wait_queue() are
ignored, possibly resulting in hangs.

Also make sure that atomic_inc and waitqueue_active tests happen in the
specified order, otherwise the following race is open:

CPU0                                        CPU1
                                            if (waitqueue_active(wq))
add_wait_queue()
if (!atomic_read(pit_timer->pending))
    schedule()
                                            atomic_inc(pit_timer->pending)

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:04:11 +03:00
Joerg Roedel 3564990af1 KVM: SVM: do not intercept task switch with NPT
When KVM uses NPT there is no reason to intercept task switches. This patch
removes the intercept for it in that case.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:23 +03:00
Feng(Eric) Liu d4c9ff2d1b KVM: Add kvm trace userspace interface
This interface allows user a space application to read the trace of kvm
related events through relayfs.

Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:22 +03:00
Feng (Eric) Liu 2714d1d3d6 KVM: Add trace markers
Trace markers allow userspace to trace execution of a virtual machine
in order to monitor its performance.

Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:19 +03:00
Joerg Roedel 53371b5098 KVM: SVM: add intercept for machine check exception
To properly forward a MCE occured while the guest is running to the host, we
have to intercept this exception and call the host handler by hand. This is
implemented by this patch.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:18 +03:00
Joerg Roedel 6394b6494c KVM: SVM: align shadow CR4.MCE with host
This patch aligns the host version of the CR4.MCE bit with the CR4 active in
the guest. This is necessary to get MCE exceptions when the guest is running.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:18 +03:00
Joerg Roedel ec077263b2 KVM: SVM: indent svm_set_cr4 with tabs instead of spaces
The svm_set_cr4 function is indented with spaces. This patch replaces
them with tabs.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:17 +03:00
Anthony Liguori 35149e2129 KVM: MMU: Don't assume struct page for x86
This patch introduces a gfn_to_pfn() function and corresponding functions like
kvm_release_pfn_dirty().  Using these new functions, we can modify the x86
MMU to no longer assume that it can always get a struct page for any given gfn.

We don't want to eliminate gfn_to_page() entirely because a number of places
assume they can do gfn_to_page() and then kmap() the results.  When we support
IO memory, gfn_to_page() will fail for IO pages although gfn_to_pfn() will
succeed.

This does not implement support for avoiding reference counting for reserved
RAM or for IO memory.  However, it should make those things pretty straight
forward.

Since we're only introducing new common symbols, I don't think it will break
the non-x86 architectures but I haven't tested those.  I've tested Intel,
AMD, NPT, and hugetlbfs with Windows and Linux guests.

[avi: fix overflow when shifting left pfns by adding casts]

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:15 +03:00
Marcelo Tosatti bed1d1dfc4 KVM: MMU: prepopulate guest pages after write-protecting
Zdenek reported a bug where a looping "dmsetup status" eventually hangs
on SMP guests.

The problem is that kvm_mmu_get_page() prepopulates the shadow MMU
before write protecting the guest page tables. By doing so, it leaves a
window open where the guest can mark a pte as present while the host has
shadow cached such pte as "notrap". Accesses to such address will fault
in the guest without the host having a chance to fix the situation.

Fix by moving the write protection before the pte prefetch.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:58 +03:00
Avi Kivity fcd6dbac92 KVM: MMU: Only mark_page_accessed() if the page was accessed by the guest
If the accessed bit is not set, the guest has never accessed this page
(at least through this spte), so there's no need to mark the page
accessed.  This provides more accurate data for the eviction algortithm.

Noted by Andrea Arcangeli.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:57 +03:00
Avi Kivity 3d45830c2b KVM: Free apic access page on vm destruction
Noticed by Marcelo Tosatti.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:54 +03:00
Izik Eidus 3ee16c8145 KVM: MMU: allow the vm to shrink the kvm mmu shadow caches
Allow the Linux memory manager to reclaim memory in the kvm shadow cache.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:53 +03:00
Marcelo Tosatti 3200f405a1 KVM: MMU: unify slots_lock usage
Unify slots_lock acquision around vcpu_run(). This is simpler and less
error-prone.

Also fix some callsites that were not grabbing the lock properly.

[avi: drop slots_lock while in guest mode to avoid holding the lock
      for indefinite periods]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:52 +03:00
Sheng Yang 25c5f225be KVM: VMX: Enable MSR Bitmap feature
MSR Bitmap controls whether the accessing of an MSR causes VM Exit.
Eliminating exits on automatically saved and restored MSRs yields a
small performance gain.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:52 +03:00
Izik Eidus 37817f2982 KVM: x86: hardware task switching support
This emulates the x86 hardware task switch mechanism in software, as it is
unsupported by either vmx or svm.  It allows operating systems which use it,
like freedos, to run as kvm guests.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:39 +03:00
Izik Eidus 2e4d265349 KVM: x86: add functions to get the cpl of vcpu
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:38 +03:00
Avi Kivity 4c9fc8ef50 KVM: VMX: Add module option to disable flexpriority
Useful for debugging.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:37 +03:00
Avi Kivity 268fe02ae0 KVM: no longer EXPERIMENTAL
Long overdue.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:36 +03:00
Avi Kivity 0b49ea8659 KVM: MMU: Introduce and use spte_to_page()
Encapsulate the pte mask'n'shift in a function.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:35 +03:00
Izik Eidus 855149aaa9 KVM: MMU: fix dirty bit setting when removing write permissions
When mmu_set_spte() checks if a page related to spte should be release as
dirty or clean, it check if the shadow pte was writeble, but in case
rmap_write_protect() is called called it is possible for shadow ptes that were
writeble to become readonly and therefor mmu_set_spte will release the pages
as clean.

This patch fix this issue by marking the page as dirty inside
rmap_write_protect().

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:34 +03:00
Avi Kivity 947da53830 KVM: MMU: Set the accessed bit on non-speculative shadow ptes
If we populate a shadow pte due to a fault (and not speculatively due to a
pte write) then we can set the accessed bit on it, as we know it will be
set immediately on the next guest instruction.  This saves a read-modify-write
operation.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:33 +03:00
Glauber Costa 1e977aa12d x86: KVM guest: disable clock before rebooting.
This patch writes 0 (actually, what really matters is that the
LSB is cleared) to the system time msr before shutting down
the machine for kexec.

Without it, we can have a random memory location being written
when the guest comes back

It overrides the functions shutdown, used in the path of kernel_kexec() (sys.c)
and crash_shutdown, used in the path of crash_kexec() (kexec.c)

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:31 +03:00
Glauber Costa 3c62c62502 x86: make native_machine_shutdown non-static
it will allow external users to call it. It is mainly
useful for routines that will override its machine_ops
field for its own special purposes, but want to call the
normal shutdown routine after they're done

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:30 +03:00
Glauber Costa ed23dc6f5b x86: allow machine_crash_shutdown to be replaced
This patch a llows machine_crash_shutdown to
be replaced, just like any of the other functions
in machine_ops

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:29 +03:00
Marcelo Tosatti 096d14a3b5 x86: KVM guest: hypercall batching
Batch pte updates and tlb flushes in lazy MMU mode.

[avi:
 - adjust to mmu_op
 - helper for getting para_state without debug warnings]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:28 +03:00